Re: [HACKERS] Replication vs. float timestamps is a disaster

2017-02-21 Thread Andres Freund
On 2017-02-22 00:10:35 -0600, Jim Nasby wrote:
> On 2/20/17 5:04 AM, Andres Freund wrote:
> > On 2017-02-20 11:58:12 +0100, Petr Jelinek wrote:
> > > That being said, I did wonder myself if we should just deprecate float
> > > timestamps as well.
> > 
> > I think we need a proper deprecation period for that, given that the
> > conversion away will be painful for pg_upgrade using people with big
> > clusters.  So I think we should fix this regardless... :(
> 
> I wonder if a separate "floatstamp" data type might fit the bill there. It
> might not be completely seamless, but it would be binary compatible.

I don't really see what that'd solve.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] tablesample with partitioned tables

2017-02-21 Thread Amit Langote
The attached patch fixes an oversight whereby tablesample cannot be used
with partitioned tables:

create table p (a int) partition by list (a);
select * from p tablesample bernoulli (50);
ERROR:  TABLESAMPLE clause can only be applied to tables and materialized views
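
With the patch, the same query is accepted and each leaf partition gets
sampled (a minimal sketch; the regression test in the patch below shows
the resulting Append of Sample Scans):

create table p1 partition of p for values in (1);
select * from p tablesample bernoulli (50);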

Thanks,
Amit
>From d9c412ea14b005b4c6013026c6c62eab97727c3c Mon Sep 17 00:00:00 2001
From: amit 
Date: Wed, 22 Feb 2017 16:27:35 +0900
Subject: [PATCH] Partitioned tables support tablesample

---
 src/backend/parser/parse_clause.c |  3 ++-
 src/test/regress/expected/tablesample.out | 18 ++
 src/test/regress/sql/tablesample.sql  |  8 
 3 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/src/backend/parser/parse_clause.c b/src/backend/parser/parse_clause.c
index fecc1d6598..b5eae56006 100644
--- a/src/backend/parser/parse_clause.c
+++ b/src/backend/parser/parse_clause.c
@@ -907,7 +907,8 @@ transformFromClauseItem(ParseState *pstate, Node *n,
 		rte = rt_fetch(rtr->rtindex, pstate->p_rtable);
 		/* We only support this on plain relations and matviews */
 		if (rte->relkind != RELKIND_RELATION &&
-			rte->relkind != RELKIND_MATVIEW)
+			rte->relkind != RELKIND_MATVIEW &&
+			rte->relkind != RELKIND_PARTITIONED_TABLE)
 			ereport(ERROR,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					 errmsg("TABLESAMPLE clause can only be applied to tables and materialized views"),
diff --git a/src/test/regress/expected/tablesample.out b/src/test/regress/expected/tablesample.out
index 7e91b958ae..b18e420e9b 100644
--- a/src/test/regress/expected/tablesample.out
+++ b/src/test/regress/expected/tablesample.out
@@ -313,3 +313,21 @@ SELECT q.* FROM (SELECT * FROM test_tablesample) as q TABLESAMPLE BERNOULLI (5);
 ERROR:  syntax error at or near "TABLESAMPLE"
 LINE 1: ...CT q.* FROM (SELECT * FROM test_tablesample) as q TABLESAMPL...
  ^
+-- check partitioned tables support tablesample
+create table parted_sample (a int) partition by list (a);
+create table parted_sample_1 partition of parted_sample for values in (1);
+create table parted_sample_2 partition of parted_sample for values in (2);
+explain (costs off)
+  select * from parted_sample tablesample bernoulli (100);
+                 QUERY PLAN                  
+---------------------------------------------
+ Append
+   ->  Sample Scan on parted_sample
+         Sampling: bernoulli ('100'::real)
+   ->  Sample Scan on parted_sample_1
+         Sampling: bernoulli ('100'::real)
+   ->  Sample Scan on parted_sample_2
+         Sampling: bernoulli ('100'::real)
+(7 rows)
+
+drop table parted_sample, parted_sample_1, parted_sample_2;
diff --git a/src/test/regress/sql/tablesample.sql b/src/test/regress/sql/tablesample.sql
index eec9793496..c39fe4b750 100644
--- a/src/test/regress/sql/tablesample.sql
+++ b/src/test/regress/sql/tablesample.sql
@@ -100,3 +100,11 @@ WITH query_select AS (SELECT * FROM test_tablesample)
 SELECT * FROM query_select TABLESAMPLE BERNOULLI (5.5) REPEATABLE (1);
 
 SELECT q.* FROM (SELECT * FROM test_tablesample) as q TABLESAMPLE BERNOULLI (5);
+
+-- check partitioned tables support tablesample
+create table parted_sample (a int) partition by list (a);
+create table parted_sample_1 partition of parted_sample for values in (1);
+create table parted_sample_2 partition of parted_sample for values in (2);
+explain (costs off)
+  select * from parted_sample tablesample bernoulli (100);
+drop table parted_sample, parted_sample_1, parted_sample_2;
-- 
2.11.0


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allow pg_dumpall to work without pg_authid

2017-02-21 Thread Robins Tharakan
Stephen,

On 20 February 2017 at 08:50, Stephen Frost  wrote:

> The other changes to use pg_roles instead of pg_authid when rolpassword
> isn't being used look like they should just be changed to use pg_roles
> instead of using one or the other.  That should be an independent patch
> from the one which adds the option we are discussing.
>

Sure. Attached are 2 patches, of which 1 patch just replaces pg_authid
with pg_roles in pg_dumpall. The only exceptions there are buildShSecLabels()
& pg_catalog.binary_upgrade_set_next_pg_authid_oid(), which I thought
should still use pg_authid.
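
For reference, a quick illustration (not from the patch) of why pg_roles
works wherever rolpassword isn't needed:

-- as a non-superuser:
select rolname, rolpassword from pg_authid;  -- ERROR: permission denied
select rolname, rolpassword from pg_roles;   -- ok; rolpassword is masked as ********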


> Perhaps --no-role-passwords instead?

Makes sense. The updated patch uses this name.


> > pg_dumpall --no-pgauthid --globals-only > a.sql
>
> Does that then work with a non-superuser account on a regular PG
> instance also?  If not, I'd like to suggest that we consider follow-on
> patches to provide options for whatever else currently requires
> superuser on a regular install.

If I understand that correctly, the answer is yes. I didn't test all db
objects, but doing a pg_dumpall as a non-privileged user does successfully
complete, with all existing users dumped successfully:

pg_dumpall --globals-only --no-role-passwords > a.sql


> Yes, please do create a commitfest entry for this.

Created commitfest entry.

-
robins


use_pgroles_instead_of_pg_authid.diff.gz
Description: GNU Zip compressed data


no_role_password.diff.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Index-only scan

2017-02-21 Thread Rafia Sabih
On Sun, Feb 19, 2017 at 4:02 PM, Robert Haas  wrote:

> Committed, although I neglected to incorporate this change.  Not sure
> if I should go back and do that; it doesn't read too badly as-is.
>
Thanks Robert for committing, and thanks Rahila, Amit, and Tushar for
reviewing and testing the patch.

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Passing query string to workers

2017-02-21 Thread Rafia Sabih
On Wed, Feb 22, 2017 at 12:25 PM, Robert Haas  wrote:
> Looks fine to me.  Committed.  I did move es_queryText to what I think
> is a more appropriate location in the structure definition.
>
> Thanks.
>
Many thanks to Robert for committing and to Kuntal and Amit for reviewing.

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Radix tree for character conversion

2017-02-21 Thread Michael Paquier
On Fri, Feb 3, 2017 at 1:18 PM, Kyotaro HORIGUCHI
 wrote:
> Thanks to that Heikki have pushed the first two patches and a
> part of the third, only one patch is remaining now.
>
> # Sorry for not separating KOI8 stuffs.
>
> At Tue, 31 Jan 2017 19:06:09 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
>  wrote in 
> <20170131.190609.254672218.horiguchi.kyot...@lab.ntt.co.jp>
>> > Thanks for the new version, I'll look at it once I am done with the
>> > cleanup of the current CF. For now I have moved it to the CF 2017-03.
>>
>> Agreed. Thank you.
>
> Attached is the latest version on the current master (555494d).
>
> Note: since this patch was created with git diff --irreversible-delete,
> the three files mb/Unicode/*.(txt|xml) to be deleted are left alone.

Thanks for the rebase. I have been spending some time looking at this
patch. The new stuff in convutils.pm is by far the most interesting part
of the patch; the building of the radix trees using a byte structure
looks to be in pretty good shape after eyeballing the logic for a
couple of hours.

+# ignore backup files of editors
+/*[~#]
+
This does not belong in Postgres core code. You could always set that up
in a global exclude file with core.excludesfile.

In order to conduct sanity checks on the shape of the radix tree maps
compared to the existing maps, having map_checker surely makes sense.
Now in the final result I don't think we need it. The existing map
files ought to be replaced by their radix versions at the end, and
map_checker should be removed. This leads to a couple of
simplifications, like Makefile, and reduces the maintenance to one
mechanism.

+sub print_radix_trees
+{
+   my ($this_script, $csname, $charset) = @_;
+
+   _radix_map($this_script, $csname, "from_unicode", $charset, 78);
+   _radix_map($this_script, $csname, "to_unicode",   $charset, 78);
+}
There is no need for the table width to be defined as a variable (5th
argument). Similarly, to_unicode/from_unicode requires checks in
multiple places; this could be just a simple boolean flag. Or if you
want to go down the road of non-simple things, you could have two
arguments: an origin and a target. If one is UTF8 the other is the
mapping name.

+sub dump_charset
+{
+   my ($list, $filt) = @_;
+
+   foreach my $i (@$list)
+   {
+   next if (defined $filt && !&$filt($i));
+   if (!defined $i->{ucs}) { $i->{ucs} = ($i->{utf8}); }
+   printf "ucs=%x, code=%x, direction=%s %s:%d %s\n",
+ $i->{ucs}, $i->{code}, $i->{direction},
+ $i->{f},   $i->{l},$i->{comment};
+   }
+}
This is used nowhere. Perhaps it was useful for debugging at some point?

+# make_charmap - convert charset table to charmap hash
+# with checking duplicate source code
Maybe this should be "with checking of duplicated source codes".

+# print_radix_map($this_script, $csname, $direction, \%charset, $tblwidth)
+#
+# this_script - the name of the *caller script* of this feature
$this_script is not needed at all, you could just use basename($0) and
reduce the number of arguments of the different functions of the
stack.

+   ### amount of zeros that can be ovarlaid.
s/ovarlaid/overlaid/

+# make_mapchecker.pl - Gerates map_checker.h file included by map_checker.c
s/gerates/generates/

+   if (s < 0x80)
+   {
+   fprintf(stderr, "\nASCII character ? (%x)", s);
+   exit(1);
+   }
Most likely a newline at the end of the error string is better here.

+   $charmap{ ucs2utf($src) } = $dst;
+   }
+
+   }
Unnecessary newline here.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] GRANT EXECUTE ON FUNCTION foo() TO bar();

2017-02-21 Thread Joel Jacobson
Hi Hackers,

Currently, it's only possible to grant/revoke execute on functions to roles.

I think it would be useful in many situations, both for documentation
purposes and for increased security, to control in a precise way which
other function(s) are allowed to execute a specific function.

This would be useful for functions that are not supposed to be called
manually by any human, or from any function(s) other than the few places
where the function makes sense to use.
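
As a sketch, hypothetical syntax (not implemented; the function names are
made up) might look like:

revoke execute on function internal_transfer(int, int, numeric) from public;
grant execute on function internal_transfer(int, int, numeric)
  to function api_transfer(int, int, numeric);

The closest approximation today is hiding the internal function behind a
SECURITY DEFINER wrapper and revoking EXECUTE on it from everyone else.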

Thoughts?

/Joel


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Passing query string to workers

2017-02-21 Thread Robert Haas
On Tue, Feb 21, 2017 at 9:18 AM, Rafia Sabih
 wrote:
> Done.

Looks fine to me.  Committed.  I did move es_queryText to what I think
is a more appropriate location in the structure definition.

Thanks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Push down more UPDATEs/DELETEs in postgres_fdw

2017-02-21 Thread Etsuro Fujita

On 2017/02/21 19:31, Rushabh Lathia wrote:

On Mon, Feb 20, 2017 at 1:41 PM, Etsuro Fujita wrote:



On 2017/02/13 18:24, Rushabh Lathia wrote:
Here are a few comments:

1)

@@ -211,6 +211,12 @@ typedef struct PgFdwDirectModifyState
 PGresult   *result;    /* result for query */
 int        num_tuples; /* # of result tuples */
 int        next_tuple; /* index of next one to return */
+Relation   resultRel;  /* relcache entry for the target table */


Why do we need resultRel? Can't we directly use dmstate->rel?


The reason why we need that is because in get_returning_data, we
pass dmstate->rel to make_tuple_from_result_row, which requires that
dmstate->rel be NULL when the scan tuple is described by
fdw_scan_tlist.  So in that case we set dmstate->rel to NULL and
have dmstate->resultRel that is the relcache entry for the target
relation in postgresBeginDirectModify.



Thanks for the explanation. We might do something here by using
fdw_scan_tlist or changing the assumption of
make_tuple_from_result_row(), and that way we can avoid two similar
variable pointers in PgFdwDirectModifyState.


I agree that the two similar variables are annoying, to some extent, but 
ISTM that is not that bad because that makes the handling of 
dmstate->rel consistent with that of PgFdwScanState's rel.  As you know, 
PgFdwDirectModifyState is defined in a similar way as PgFdwScanState, in 
which the rel is a relcache entry for the foreign table for a simple 
foreign table scan and NULL for a foreign join scan (see comments for 
the definition of PgFdwScanState).



I am okay with the current approach too, but adding a note somewhere
about this would be great. Also, let's keep this point open for the
committer; if the committer feels this is good, then let's go ahead
with it.


Agreed.


Here are a few other cosmetic changes:

1)

+ *
+ * 'target_rel' is either zero or the rangetable index of a target relation.
+ * In the latter case this construncts FROM clause of UPDATE or USING clause
+ * of DELETE by simply ignoring the target relation while deparsing the given

Spell correction: s/construncts/constructs/

2)

+/*
+ * If either input is the target relation, get all the joinclauses.
+ * Otherwise extract conditions mentioning the target relation from
+ * the joinclauses.
+ */


A space is needed in "joinclauses" (it should be "join clauses").

3)

+	/*
+	 * If UPDATE/DELETE on a join, create a RETURINING list used in the
+	 * remote query.
+	 */
+	if (fscan->scan.scanrelid == 0)
+		returningList = make_explicit_returning_list(resultRelation,
+													 rel,
+													 returningList);


Spell correction: s/RETURINING/RETURNING/


Good catch!


I did the above changes in the attached patch. Please have a look once, and


I'm fine with that.  Thanks for the patch!


then I feel like this patch is ready for committer.


Thanks for reviewing!

Best regards,
Etsuro Fujita




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pageinspect: Hash index support

2017-02-21 Thread Robert Haas
On Tue, Feb 21, 2017 at 3:14 PM, Ashutosh Sharma  wrote:
> Thanks for reporting it. This is because of incorrect data typecasting.
> Attached is the patch that fixes this issue.

Oops, that's probably my fault.  Thanks for the patch; committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replication vs. float timestamps is a disaster

2017-02-21 Thread Jim Nasby

On 2/20/17 5:04 AM, Andres Freund wrote:

On 2017-02-20 11:58:12 +0100, Petr Jelinek wrote:

That being said, I did wonder myself if we should just deprecate float
timestamps as well.


I think we need a proper deprecation period for that, given that the
conversion away will be painful for pg_upgrade using people with big
clusters.  So I think we should fix this regardless... :(


I wonder if a separate "floatstamp" data type might fit the bill there. 
It might not be completely seamless, but it would be binary compatible.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replication vs. float timestamps is a disaster

2017-02-21 Thread Jim Nasby

On 2/21/17 4:52 PM, James Cloos wrote:

"TL" == Tom Lane  writes:

TL> The question to be asked is whether there is still anybody out there
TL> using float timestamps.

Gentoo's ebuild includes:

   $(use_enable !pg_legacytimestamp integer-datetimes) \


FWIW, last time I looked it was also an option in FreeBSD's ports, 
though I think it's defaulted to int since forever ago (like, 7.4 era).

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Change in "policy" on dump ordering?

2017-02-21 Thread Jim Nasby

On 2/21/17 4:25 PM, Peter Eisentraut wrote:

On 2/21/17 14:58, Jim Nasby wrote:

AFAICT in older versions only object types that absolutely had to wait
for DO_POST_DATA_BOUNDARY would do so. More recently though, objects are
being added after that (presumably because it's easier than renumbering
everything in dbObjectTypePriority).


Is there any specific assignment that you have concerns about?


Originally, no, but reviewing the list again I'm kind of wondering about 
DO_DEFAULT_ACL, especially since the acl code in pg_dump looks at 
defaults as part of what removes the need to explicitly dump 
permissions. I'm also wondering if DO_POLICY could potentially affect 
matviews?


Actually, I think matviews really need to be the absolute last thing. 
What if you had a matview that referenced publications or subscriptions? 
I'm guessing that would be broken right now.
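
For example (an illustrative sketch; any matview whose query reads the
publication catalog would do):

create materialized view pubs as select pubname from pg_publication;

If the restore refreshes the matview before the publications have been
recreated, it comes back empty.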



Is this change a good or bad idea? Should there be an official guide for
where new things go?


The comment above dbObjectTypePriority explains it, doesn't it?


Not really; it just makes reference to needing to be in-sync with 
pg_dump.c. My concern is that clearly people went to lengths in the past 
to put everything possible before DO_PRE_DATA_BOUNDARY (ie, text search 
and FDW) but most recently added stuff has gone after 
DO_POST_DATA_BOUNDARY, even though there's no reason it couldn't be 
pre-data. That's certainly a change, and I suspect it's not intentional 
(other than it's obviously less work to stick stuff at the end, but that 
could be fixed by having an array of the actual enum values and just 
having pg_dump sort that when it starts).

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Performance degradation in TPC-H Q18

2017-02-21 Thread Kuntal Ghosh
Hello everyone,

While conducting the experiments for parallelism, Rafia came across a
hang in Q18 when the plan uses partial and finalize hash aggregate. This
could be seen at both scale factors - 20 and 300 - when setting work_mem
high enough that the query uses hash aggregate. It seems that
commit b81b5a96f424531b97cdd1dba97d9d1b9c9d372e does not solve the
issue completely.

I've tested on scale factor 10 with work_mem 100MB and the issue is
reproducible. Below is the simplified part of the query which uses the
hash aggregate.

tpch10GB=# explain select l_orderkey from lineitem group by l_orderkey;

                              QUERY PLAN
------------------------------------------------------------------------
 Finalize HashAggregate  (cost=1612872.73..1616801.32 rows=392859 width=8)
   Group Key: l_orderkey
   ->  Gather  (cost=1528408.04..1610908.43 rows=785718 width=8)
         Workers Planned: 2
         ->  Partial HashAggregate  (cost=1527408.04..1531336.63 rows=392859 width=8)
               Group Key: l_orderkey
               ->  Parallel Seq Scan on lineitem  (cost=0.00..1464922.43 rows=24994243 width=8)
(7 rows)

I've also tried to execute the same query with a different column name.

tpch10GB=# explain analyze select l_suppkey from lineitem group by l_suppkey;
                              QUERY PLAN
------------------------------------------------------------------------
 Finalize HashAggregate  (cost=1549511.80..1550493.37 rows=98157 width=8) (actual time=22363.901..22380.361 rows=10 loops=1)
   Group Key: l_suppkey
   ->  Gather  (cost=1528408.04..1549021.01 rows=196314 width=8) (actual time=22107.387..22244.848 rows=30 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial HashAggregate  (cost=1527408.04..1528389.61 rows=98157 width=8) (actual time=22100.776..22135.063 rows=10 loops=3)
               Group Key: l_suppkey
               ->  Parallel Seq Scan on lineitem  (cost=0.00..1464922.43 rows=24994243 width=8) (actual time=0.846..10664.258 rows=19995351 loops=3)
 Planning time: 0.171 ms
 Execution time: 22393.030 ms
(10 rows)

And it finished in time, although the stats in both plans are exactly
the same. Hence, the problem is with the query itself. As Robert
suggested offline, I've produced the perf report for the query.

99.44% 0.01%  postgres  postgres   [.] LookupTupleHashEntry
74.31%53.55%  postgres  postgres   [.] tuplehash_insert
17.05%17.04%  postgres  libc-2.17.so   [.] __memcpy_ssse3_back
11.06%11.05%  postgres  postgres   [.] tuplehash_prev
10.81%10.80%  postgres  postgres   [.] tuplehash_next

It's clear from the perf report that something terrible is going on in
the simple hash insert. I've enabled the logging for hash
ops (SH_STAT()) during the hash-growing phase.

2017-02-21 16:42:39.634 IST [9372] LOG:  size: 1048576, members:
838860, filled: 0.79,
total chain: 2008236860, max chain: 66743, avg chain: 2394.007176,
total_collisions: 278348, max_collisions: 8, avg_collisions: 0.331817
2017-02-21 16:42:39.634 IST [9372] STATEMENT:  explain analyze select
l_orderkey from lineitem group by l_orderkey;

2017-02-21 16:42:39.741 IST [9372] LOG:  size: 2097152, members:
838860, filled: 0.40,
total chain: 788175, max chain: 124, avg chain: 0.939579,
total_collisions: 216052, max_collisions: 6, avg_collisions: 0.257554
2017-02-21 16:42:39.741 IST [9372] STATEMENT:  explain analyze select
l_orderkey from lineitem group by l_orderkey;

The max chain length is really high, which hampers the query's
performance. To further understand the scenario, I've gathered the
following results:

tpch10GB=# select count(distinct l_orderkey) from lineitem;
  count
--
 1500

tpch10GB=# select correlation from pg_stats where tablename='lineitem'
and attname='l_orderkey';
 correlation
-
   1

tpch10GB=# select count(distinct l_suppkey) from lineitem;
 count

 10

tpch10GB=# select correlation from pg_stats where tablename='lineitem'
and attname='l_suppkey';
 correlation
-
  0.00900259

It seems that l_orderkey produces a pretty bad case for hashing with
linear probing and a bad choice of hash function (linear probing is
sensitive to elements with nearby hash codes). In [1], Andres
proposed a hack to resize the hashtable when a chain is growing
quickly. Below is a comment from the patch explaining the objective:

+/*
+ * To avoid negative consequences from overly imbalanced
+ * hashtables, grow the hashtable if collisions lead to large
+ * runs. The most likely cause of such imbalance is filling a
+ * (currently) small table, from a currently big one, in
+ 

Re: [HACKERS] mat views stats

2017-02-21 Thread Jim Nasby

On 2/21/17 4:22 PM, Peter Eisentraut wrote:

Attached is a patch to trigger autovacuum based on a matview refresh
along with a system view pg_stat_all_matviews to show information more
meaningful for materialized views.

It might be easier to include materialized views into pg_stat_*_tables.


Certainly easier, but I don't think it'd be better. Matviews really 
aren't the same thing as tables. Off-hand (without reviewing the patch), 
update and delete counts certainly wouldn't make any sense. "Insert" 
counts might, in as much as it's how many rows have been added by 
refreshes. You'd want a refresh count too.



I think these should be two separate patches.  We might want to
backpatch the first one.


+1; definitely sounds like a bug to me.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dropping partitioned tables without CASCADE

2017-02-21 Thread Amit Langote
On 2017/02/22 13:46, Ashutosh Bapat wrote:
> Looks good to me. In the attached patch I have added a comment
> explaining the reason to make partition tables "Auto" dependent upon
> the corresponding partitioned tables.

Good call.

+   /*
+* Unlike inheritance children, partition tables are expected to be 
dropped
+* when the parent partitioned table gets dropped.
+*/

Hmm.  Partitions *are* inheritance children, so we perhaps don't need the
part before the comma.  Also, adding "automatically" somewhere in there
would be nice.

Or, one could just write: /* add an auto dependency for partitions */

> In the tests we are firing commands to drop partitioned table, but are
> not checking whether those tables or the partitions are getting
> dropped or not. Except for drop_if_exists.sql, I did not find that we
> really check this. Should we try a query on pg_class to ensure that
> the tables get really dropped?

I don't see why this patch should do it if dependency.sql itself does
not.  I mean, dropping AUTO-dependent objects is one of the contracts of
dependency.c, so perhaps it would make sense to query pg_class in
dependency.sql to check if AUTO dependencies work correctly.
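
For illustration, such a check (with made-up table names) could look like:

create table parted (a int) partition by list (a);
create table parted1 partition of parted for values in (1);
drop table parted;  -- no CASCADE needed with the patch
select relname from pg_class where relname in ('parted', 'parted1');
-- expect zero rows: the partition was dropped via the AUTO dependency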

Thanks,
Amit
>From 682624f4562087bb05b2ff9f282080bcfcfb5233 Mon Sep 17 00:00:00 2001
From: amit 
Date: Thu, 16 Feb 2017 15:56:44 +0900
Subject: [PATCH] Allow dropping partitioned table without CASCADE

Currently, a normal dependency is created between an inheritance
parent and child when creating the child.  That means one must
specify CASCADE to drop the parent table if a child table exists.
When creating partitions as inheritance children, create auto
dependency instead, so that partitions are dropped automatically
when the parent is dropped i.e., without specifying CASCADE.
---
 src/backend/commands/tablecmds.c   | 26 ++
 src/test/regress/expected/alter_table.out  | 10 --
 src/test/regress/expected/create_table.out |  9 ++---
 src/test/regress/expected/inherit.out  | 22 ++
 src/test/regress/expected/insert.out   |  7 ++-
 src/test/regress/expected/update.out   |  7 +--
 src/test/regress/sql/alter_table.sql   | 10 --
 src/test/regress/sql/create_table.sql  |  9 ++---
 src/test/regress/sql/inherit.sql   |  4 ++--
 src/test/regress/sql/insert.sql|  7 ++-
 src/test/regress/sql/update.sql|  2 +-
 11 files changed, 40 insertions(+), 73 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 3cea220421..cf566f974b 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -289,9 +289,11 @@ static List *MergeAttributes(List *schema, List *supers, char relpersistence,
 static bool MergeCheckConstraint(List *constraints, char *name, Node *expr);
 static void MergeAttributesIntoExisting(Relation child_rel, Relation parent_rel);
 static void MergeConstraintsIntoExisting(Relation child_rel, Relation parent_rel);
-static void StoreCatalogInheritance(Oid relationId, List *supers);
+static void StoreCatalogInheritance(Oid relationId, List *supers,
+		bool child_is_partition);
 static void StoreCatalogInheritance1(Oid relationId, Oid parentOid,
-		 int16 seqNumber, Relation inhRelation);
+		 int16 seqNumber, Relation inhRelation,
+		 bool child_is_partition);
 static int	findAttrByName(const char *attributeName, List *schema);
 static void AlterIndexNamespaces(Relation classRel, Relation rel,
    Oid oldNspOid, Oid newNspOid, ObjectAddresses *objsMoved);
@@ -725,7 +727,7 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,
 		  typaddress);
 
 	/* Store inheritance information for new rel. */
-	StoreCatalogInheritance(relationId, inheritOids);
+	StoreCatalogInheritance(relationId, inheritOids, stmt->partbound != NULL);
 
 	/*
 	 * We must bump the command counter to make the newly-created relation
@@ -2240,7 +2242,8 @@ MergeCheckConstraint(List *constraints, char *name, Node *expr)
  * supers is a list of the OIDs of the new relation's direct ancestors.
  */
 static void
-StoreCatalogInheritance(Oid relationId, List *supers)
+StoreCatalogInheritance(Oid relationId, List *supers,
+		bool child_is_partition)
 {
 	Relation	relation;
 	int16		seqNumber;
@@ -2270,7 +2273,8 @@ StoreCatalogInheritance(Oid relationId, List *supers)
 	{
 		Oid			parentOid = lfirst_oid(entry);
 
-		StoreCatalogInheritance1(relationId, parentOid, seqNumber, relation);
+		StoreCatalogInheritance1(relationId, parentOid, seqNumber, relation,
+ child_is_partition);
 		seqNumber++;
 	}
 
@@ -2283,7 +2287,8 @@ StoreCatalogInheritance(Oid relationId, List *supers)
  */
 static void
 StoreCatalogInheritance1(Oid relationId, Oid parentOid,
-		 int16 seqNumber, Relation inhRelation)
+		 int16 seqNumber, Relation inhRelation,
+		 bool child_is_partition)
 {
 	

Re: [HACKERS] Adding new output parameter of pg_stat_statements to identify operation of the query.

2017-02-21 Thread Jim Nasby

On 2/19/17 10:02 PM, Tom Lane wrote:

Jim Nasby  writes:

Something that needs to be considered with doing this in
pg_stat_statement is that a query that's reported there can contain
multiple SQL statements. I don't remember offhand if all statements get
parsed as a whole before anything else happens; if that's the case then
you could potentially have an array in pg_stat_statements indicating
what the command tags are.

I think that's been addressed as of 83f2061dd.

My own concern here is that pg_stat_statements shared hashtable entries
(pgssEntry) are currently 200 bytes, if I counted correctly.  It's hard
to see how to implement this feature without adding COMPLETION_TAG_BUFSIZE
(64 bytes) to that, which is kind of a large percentage bump for a feature
request that AFAIR nobody else has ever made.


AFAIK the only variable part of any tag is the rowcount from SELECT (if 
that's even part of the tag?)... so couldn't tags be switched over to an 
enum, at least internally?

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign partition DDL regression tests

2017-02-21 Thread Amit Langote
On 2017/02/22 13:26, Ashutosh Bapat wrote:
> Please add this to the upcoming commitfest.

Done.

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Enabling parallelism for queries coming from SQL or other PL functions

2017-02-21 Thread Rafia Sabih
Hello everybody,

In the current version, queries in SQL or PL functions cannot
leverage parallelism. To relax this restriction I prepared a patch;
the approach used in the patch is explained next.

Approach:

1. Allow parallelism for queries in PL functions by passing
CURSOR_OPT_PARALLEL_OK instead of 0 to exec_prepare_plan when called from
exec_stmt_execsql or exec_stmt_dynexecute. Similarly, pass
CURSOR_OPT_PARALLEL_OK instead of 0 to SPI_execute and exec_run_select
when called from exec_stmt_dynexecute. CURSOR_OPT_PARALLEL_OK is passed to
these functions only after checking that the statement is not part of a
trigger, since in that case using parallelism may not be efficient.

2. In ExecutePlan there is an additional check to see if the query is
coming from a SQL or PL function and has a parallel plan. In that
case we ignore the numberTuples check, since it is always one for
these functions and the existing checks restrict parallelism for these
cases. Though I understand this may not be the most elegant way to do
this, and I would be pleased to hear of a better alternative.
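
For illustration, the kind of function this change affects (a sketch; the
table and function names are made up):

create function count_events() returns bigint language plpgsql as $$
begin
  -- with the patch, this query may get a parallel plan, since the
  -- statement is now planned with CURSOR_OPT_PARALLEL_OK
  return (select count(*) from events);
end;
$$;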


I have attached a sql file containing cases for some plpgsql, perl, and
python functions, and an .out file which contains the parallel plans
for the queries in these functions after this patch. This might be
helpful in understanding the level of parallelism this patch is
enabling for PL functions.

Thanks to my colleagues Amit Kapila and Dilip Kumar for discussions in
this regard.

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


pl_parallel_v1.patch
Description: Binary data


pl_parallel.out
Description: Binary data


pl_parallel_test.sql
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dropping partitioned tables without CASCADE

2017-02-21 Thread Ashutosh Bapat
Looks good to me. In the attached patch I have added a comment
explaining the reason to make partition tables "Auto" dependent upon
the corresponding partitioned tables.

In the tests we are firing commands to drop partitioned tables, but are
not checking whether those tables or their partitions actually get
dropped. Except for drop_if_exists.sql, I did not find that we
really check this. Should we try a query on pg_class to ensure that
the tables really get dropped?

On Wed, Feb 22, 2017 at 7:26 AM, Amit Langote
 wrote:
> On 2017/02/22 10:49, Amit Langote wrote:
>> On 2017/02/21 20:17, Ashutosh Bapat wrote:
>>> Are you sure you have attached the right patch?
>>
>> Oops, really fixed this time.
>
> Sorry again, 3rd time's a charm.  I copy-paste the hunk below from the
> patch file before I attach it to make sure:
>
> -   recordDependencyOn(&childobject, &parentobject, DEPENDENCY_NORMAL);
> +   if (child_is_partition)
> +       recordDependencyOn(&childobject, &parentobject, DEPENDENCY_AUTO);
> +   else
> +       recordDependencyOn(&childobject, &parentobject, DEPENDENCY_NORMAL);
>
> Thanks,
> Amit



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


0001-Allow-dropping-partitioned-table-without-CASCADE_v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign partition DDL regression tests

2017-02-21 Thread Ashutosh Bapat
Please add this to the upcoming commitfest.

On Wed, Feb 22, 2017 at 7:10 AM, Amit Langote
 wrote:
> Ashutosh Bapat pointed out [0] that regression tests are missing for the
> foreign partition DDL commands.  Attached patch takes care of that.
>
> Thanks,
> Amit
>
> [0]
> https://www.postgresql.org/message-id/CAFjFpRcrdzBRj0cZ%2BJAQmfSa2Tv8wSEcWAeYtDpV-YZnNna2sA%40mail.gmail.com
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] "may be unused" warnings for gcc

2017-02-21 Thread Andres Freund
On 2017-02-21 17:20:44 -0500, Peter Eisentraut wrote:
> On 2/20/17 09:41, Andres Freund wrote:
> > When building with a new-ish gcc (6.3.0 right now, but I've seen this
> > for a while) with optimization I get a number of warnings:
> 
> These all look like related to inlining/-O3.
> 
> I have attempted to fix these in the past, but I have found that -O3
> doesn't get any performance improvement, so I haven't bothered lately.

I've not run comparisons this year, but late last year I was seeing > 5%
< 10% benefits - that seems plenty enough to care.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Statement timeout behavior in extended queries

2017-02-21 Thread Tatsuo Ishii
Last year I proposed an enhancement regarding the behavior of the
statement timeout in extended queries.

https://www.postgresql.org/message-id/20160528.220442.1489791680347556026.t-ishii%40sraoss.co.jp

IMO the current behavior is counterintuitive and I would like to
change it for PostgreSQL 10.0.

For example, suppose that the timeout is set to 4 seconds, the
first query takes 2 seconds, and the second query takes 3 seconds. Then
the statement timeout is triggered if a sync message is sent to the
backend after the second query.

Moreover, log_duration or log_min_duration_statement shows that each
query took 2 or 3 seconds of course, which is not very consistent with
the statement timeout, IMO.
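
To put it another way, the goal is to match the simple query protocol,
where each statement gets its own timeout (a sketch of that behavior,
assuming the 4-second setting above):

SET statement_timeout = '4s';
SELECT pg_sleep(2);  -- completes; the timer covers only this statement
SELECT pg_sleep(3);  -- completes; the timer was restarted
-- under the extended protocol today, these two statements share one timer
-- that stops only at the Sync message, so together they would be canceled.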

The attached patch tries to change the behavior by checking the statement
timeout against each phase of an extended query.

To test the patch, I have created a small tool called "pgproto", which
can issue an arbitrary sequence of frontend/backend messages, reading
from a text file.

https://github.com/tatsuo-ishii/pgproto
(to build the program, you need C compiler and libpq)

A test data is here:
--
#
# Test case for statement timeout patch.
#
'Q' "SET statement_timeout = '4s'"

# Receive response from backend
'Y'

# Execute statement which takes 3 seconds.
'P'	"S1"	"SELECT pg_sleep(3)"	0
'B'	""	"S1"	0	0	0
'E'	""	0
'C'	'S'	"S1"

# Execute statement which takes 2 seconds.
'P'	"S2"	"SELECT pg_sleep(2)"	0
'B'	""	"S2"	0	0	0
'E'	""	0
'C'	'S'	"S2"

# Issue Sync message
'S'

# Receive response from backend
'Y'

# Send terminate message
'X'
--

In each row, the first column corresponds to the message type defined
in the frontend/backend protocol (except 'Y', which asks pgproto to
collect responses from the backend). Columns are separated with a tab
character.

To run the test:

pgproto -f data_file -p port_number -d database_name

With the attached patch, "SELECT pg_sleep(3)" and "SELECT pg_sleep(2)"
do not trigger the statement timeout, as expected, while the existing
code triggers the statement timeout after the sync message is sent.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index b07d6c6..e35a1dd 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -149,6 +149,11 @@ static bool doing_extended_query_message = false;
 static bool ignore_till_sync = false;
 
 /*
+ * Flag to keep track of whether we have started statement timeout timer.
+ */
+static bool st_timeout = false;
+
+/*
  * If an unnamed prepared statement exists, it's stored here.
  * We keep it separate from the hashtable kept by commands/prepare.c
  * in order to reduce overhead for short-lived queries.
@@ -188,6 +193,8 @@ static bool IsTransactionStmtList(List *pstmts);
 static void drop_unnamed_stmt(void);
 static void SigHupHandler(SIGNAL_ARGS);
 static void log_disconnections(int code, Datum arg);
+static void enable_statement_timeout(void);
+static void disable_statement_timeout(void);
 
 
 /* 
@@ -1234,6 +1241,11 @@ exec_parse_message(const char *query_string,	/* string to execute */
 	start_xact_command();
 
 	/*
+	 * Set statement timeout running, if any
+	 */
+	enable_statement_timeout();
+
+	/*
 	 * Switch to appropriate context for constructing parsetrees.
 	 *
 	 * We have two strategies depending on whether the prepared statement is
@@ -1521,6 +1533,11 @@ exec_bind_message(StringInfo input_message)
 	 */
 	start_xact_command();
 
+	/*
+	 * Set statement timeout running, if any
+	 */
+	enable_statement_timeout();
+
 	/* Switch back to message context */
 	MemoryContextSwitchTo(MessageContext);
 
@@ -1937,6 +1954,11 @@ exec_execute_message(const char *portal_name, long max_rows)
 	start_xact_command();
 
 	/*
+	 * Set statement timeout running, if any
+	 */
+	enable_statement_timeout();
+
+	/*
 	 * If we re-issue an Execute protocol request against an existing portal,
 	 * then we are only fetching more rows rather than completely re-executing
 	 * the query from the start. atStart is never reset for a v3 portal, so we
@@ -2008,6 +2030,11 @@ exec_execute_message(const char *portal_name, long max_rows)
 			 * those that start or end a transaction block.
 			 */
 			CommandCounterIncrement();
+
+			/*
+			 * We need to reset statement timeout if already set.
+			 */
+			disable_statement_timeout();
 		}
 
 		/* Send appropriate CommandComplete to client */
@@ -2437,14 +2464,10 @@ start_xact_command(void)
 	{
 		StartTransactionCommand();
 
-		/* Set statement timeout running, if any */
-		/* NB: this mustn't be enabled until we are within an xact */
-		if (StatementTimeout > 0)
-			

Re: [HACKERS] Documentation improvements for partitioning

2017-02-21 Thread Robert Haas
On Tue, Feb 21, 2017 at 10:27 PM, Yugo Nagata  wrote:
> There is a very small typo that a comma is missing.
> Attached is the patch to fix it.

Thank you.  I have committed your patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch proposal

2017-02-21 Thread Venkata B Nagothi
On Tue, Jan 31, 2017 at 10:41 AM, Michael Paquier  wrote:

> On Tue, Jan 31, 2017 at 4:49 AM, David Steele  wrote:
> > On 1/27/17 3:19 AM, Venkata B Nagothi wrote:
> >> I have split the patch into two different
> >> pieces. One is to determine if the recovery can start at all and other
> >> patch deals with the incomplete recovery situation.
> >
> > I think the first patch looks promising and I would rather work through
> > that before considering the second patch.  Once there are tests for the
> > first patch I will complete my review.
>
> Based on that, I am moving the patch to next CF with "Needs Review".
> Venkata, please be careful in updating correctly the patch status, it
> was still on "Waiting on Author".
>

Apologies. Sure. Will make a note.

Regards,

Venkata B N
Database Consultant


Re: [HACKERS] Documentation improvements for partitioning

2017-02-21 Thread Robert Haas
On Tue, Feb 21, 2017 at 2:03 AM, Bruce Momjian  wrote:
> I have to admit my reaction was similar to Simon's, meaning that the
> lack of docs is a problem, and that the limitations are kind of a
> surprise, and I wonder what other surprises there are.

Did you read my message upthread pointing out that the initial commit
contained hundreds of lines of documentation?  I agree that it would
be bad if table partitioning got committed with no documentation, but
that did not happen.

> I am thinking this is a result of small teams, often from the same
> company, working on a features in isolation and then making them public.
> It is often not clear what decisions were made and why.

That also did not happen, or at least certainly not with this patch.
All of the discussion was public and on the mailing list.  I never
communicated with Amit Langote off-list about this work, except
shortly before I committed his patches I added him on Skype and gave
him a heads up that I was planning to do so real soon.   At no time
have the two of us worked for the same company.  Also, the patch had 7
other reviewers credited in the commit messages spread across, I
think, 4 different companies.

I think the issue here might be that with this feature, as with some
other features I've committed over the last few years, the email
discussion got very long.  On the one hand, that does make it hard for
others to keep up, but not because anything is secret, only because
reading hundreds of email messages takes a lot of time.  However, the
large number of emails on a public mailing list makes it absolutely
clear that this wasn't developed in isolation and presented as a done
deal.  It was written and rewritten multiple times in response to
feedback, not only from me but from other people who did take the time
to keep up with the discussion.  As Ashutosh Bapat and Amit Langote
already pointed out, I even posted (on a separate thread with a clear
subject line) some thoughts about the overall design of this feature,
in response to concerns articulated on an unrelated thread by Tom and
Alvaro.  I did that in an attempt to give people a separate thread on
which to discuss those issues - without having to dive into the main
thread where things were being hashed out in detail - but it got no
responses, either because the logic was unarguable or because nobody
took the time to write back.

I think there's certainly room to criticize cases where a feature is
designed, developed, and committed almost entirely by people who work
at a single company, or where a significant amount of discussion
happens off-list, but it would be difficult to find a more clear-cut
case of where that DIDN'T happen than this patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch proposal

2017-02-21 Thread Venkata B Nagothi
On Tue, Jan 31, 2017 at 6:49 AM, David Steele  wrote:

> On 1/27/17 3:19 AM, Venkata B Nagothi wrote:
>
> > I will be adding the tests in
> > src/test/recovery/t/003_recovery_targets.pl. My tests would look
> > more or less similar to recovery_target_action. I do not see any
> > tests added for the parameter recovery_target_action? Anyway, I will
> > work on adding tests for recovery_target_incomplete.
>
> Do you know when those will be ready?
>

Apologies for the delayed response. Actually, I am working on adding
the tests for both patches, and started with adding the tests for the
recovery_target_incomplete parameter.
I have hit an issue while adding the tests, which I have explained below.


> >
> >
> > 4) I'm not convinced that fatal errors by default are the way to go.
> > It's a pretty big departure from what we have now.
> >
> >
> > I have changed the code to generate ERRORs instead of FATALs. Which is
> > in the patch recoveryStartPoint.patch
>
> I think at this point in recovery any ERROR at all will probably be
> fatal.  Once you have some tests in place we'll know for sure.
>

Sure. I can add the tests for the first patch.

Either way, the goal would be to keep recovery from proceeding and let
> the user adjust their targets.  Since this is a big behavioral change
> which will need buy in from the community.
>

Sure. Agreed.


> > Also, I don't think it's very intuitive how
> recovery_target_incomplete
> > works.  For instance, if I set recovery_target_incomplete=promote
> and
> > set recovery_target_time, but pick a backup after the specified time,
> > then there will be an error "recovery_target_time %s is older..."
> rather
> > than a promotion.
> >
> >
> > Yes, that is correct and that is the expected behaviour. Let me explain -
>
> My point was that this combination of options could lead to confusion on
> the part of users.  However, I'd rather leave that alone for the time
> being and focus on the first patch.
>

Sure.


> > I have split the patch into two different
> > pieces. One is to determine if the recovery can start at all and other
> > patch deals with the incomplete recovery situation.
>
> I think the first patch looks promising and I would rather work through
> that before considering the second patch.  Once there are tests for the
> first patch I will complete my review.
>

Sure. Will submit the first patch along with tests once I can get through
the issue I am facing.

*The issue I am facing while writing the tests*

Below is the problem I am facing while adding the tests to
003_recovery_targets.pl.

1. Test case: test incomplete recovery with the parameters
restore_command, recovery_target_xid and recovery_target_incomplete
= 'shutdown' (the options 'pause' and 'promote' work fine).

2. Expected test result: the standby node successfully shuts down if the
recovery process gets midway and does not reach the intended recovery
target due to unavailability of WALs.

3. All good. Everything works as expected. The problem is that the test
exits with exit code 256, bails out, and shows up as "pg_ctl failed".

Is there any way I can ensure the test does not bail out, and exits with
exit code '0', when "pg_ctl start" starts the server and the server then
shuts down for any reason?

If you look at the log file, pg_ctl actually starts successfully and then
shuts down cleanly because the intended recovery target is not reached. The
messages in the log file are also appropriate.
Since it is a clean start and a clean shutdown (which means the test is
successful), the test should return success, which is not happening.

Below is the log (a test for the second patch). I believe Perl generates
exit code 256 because pg_ctl responds with an error - "pg_ctl: could not
start the server".
I would need my test case to return success in this scenario.  Any
suggestions would be great.

[dba@buildhost ~]$ /data/PostgreSQL-Repo/postgresql/tmp_install/disk1/pginstall/pgsql-10-dev-master-xlog-2-wal/bin/pg_ctl -D /data/PostgreSQL-Repo/postgresql/src/test/recovery/tmp_check/data_pitr_3_U_oA/pgdata start
waiting for server to start
2017-02-22 12:06:41.773 AEDT [23858] LOG:  database system was interrupted while in recovery at log time 2017-02-15 16:40:41 AEDT
2017-02-22 12:06:41.773 AEDT [23858] HINT:  If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
2017-02-22 12:06:41.773 AEDT [23858] LOG:  starting point-in-time recovery to XID 559
2017-02-22 12:06:41.782 AEDT [23858] LOG:  restored log file "00010002" from archive
2017-02-22 12:06:41.784 AEDT [23858] LOG:  redo starts at 0/228
2017-02-22 12:06:41.794 AEDT [23858] LOG:  restored log file "00010003" from archive
2017-02-22 12:06:41.814 AEDT [23858] LOG:  restored log file "00010004" from archive
2017-02-22 12:06:41.835 AEDT [23858] 

Re: [HACKERS] Speedup twophase transactions

2017-02-21 Thread Michael Paquier
On Wed, Feb 22, 2017 at 7:03 AM, Alvaro Herrera
 wrote:
> I have pushed the TAP test file, which is already passing, at least for
> me.  Let's see what the buildfarm says.

Thanks. I still have on my sheet to look at the latest 2PC patch.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-02-21 Thread Petr Jelinek
On 13/12/16 00:38, Petr Jelinek wrote:
> On 12/12/16 23:33, Andres Freund wrote:
>> On 2016-12-12 23:27:30 +0100, Petr Jelinek wrote:
>>> On 12/12/16 22:42, Andres Freund wrote:
 Hi,

 On 2016-12-10 23:10:19 +0100, Petr Jelinek wrote:
> Hi,
> First one is outright bug, which has to do with how we track running
> transactions. What snapbuild basically does while doing initial snapshot
> is read the xl_running_xacts record, store the list of running txes and
> then wait until they all finish. The problem with this is that
> xl_running_xacts does not ensure that it only logs transactions that are
> actually still running (to avoid locking PGPROC) so there might be xids
> in xl_running_xacts that already committed before it was logged.

 I don't think that's actually true?  Notice how LogStandbySnapshot()
 only releases the lock *after* the LogCurrentRunningXacts() iff
 wal_level >= WAL_LEVEL_LOGICAL.  So the explanation for the problem you
 observed must actually be a bit more complex :(

>>>
>>> Hmm, interesting, I did see the transaction commit in the WAL before the
>>> xl_running_xacts that contained the xid as running. I only seen it on
>>> production system though, didn't really manage to easily reproduce it
>>> locally.
>>
>> I suspect the reason for that is that RecordTransactionCommit() doesn't
>> conflict with ProcArrayLock in the first place - only
>> ProcArrayEndTransaction() does.  So they're still running in the PGPROC
>> sense, just not the crash-recovery sense...
>>
> 
> That looks like reasonable explanation. BTW I realized my patch needs
> bit more work, currently it will break the actual snapshot as it behaves
> same as if the xl_running_xacts was empty which is not correct AFAICS.
> 

Hi,

I got to work on this again. Unfortunately I haven't found a solution
that I would be very happy with. What I did is: in case we read an
xl_running_xacts in which all the transactions we track have finished, we
start tracking from that new xl_running_xacts again, with the difference
that we clean up the running transactions based on previously seen
committed ones. That means that on a busy server we may wait for multiple
xl_running_xacts rather than just one, but at least we have a chance to
finish, unlike with the current coding, which basically waits for an
empty xl_running_xacts. I also removed the additional locking for logical
wal_level in LogStandbySnapshot() since it does not work.

I also identified another bug in snapbuild while looking at the code.
That is, logical decoding will try to use an on-disk serialized snapshot
for the initial snapshot export when it can. The problem is that these
snapshots are quite special and are not really usable as snapshots for
data (i.e., the logical decoding snapshots regularly have xmax smaller
than xmin). So when a client tries to use this exported snapshot it
gets completely wrong data, as the snapshot is broken. I think this is
the explanation for Erik Rijkers's problems with the initial COPY patch
for logical replication. At least for me the issues go away when I
disable use of the on-disk snapshots.

I didn't really find a better solution than that, though (disabling the
use of on-disk snapshots for the initial consistent snapshot).

So to summarize the attached patches:
0001 - Fixes a performance issue where we build tons of snapshots that we
don't need, which kills the CPU.

0002 - Disables the use of on-disk historical snapshots for initial
consistent snapshot export, as it may result in corrupt data. This
definitely needs a backport.

0003 - Fixes a bug where we might never reach a consistent snapshot on a
busy server, due to a race condition in xl_running_xacts logging. The
original use of extra locking does not seem to be enough in practice.
Once we have an agreed fix for this it's probably worth backpatching.
There are still some comments that need updating; this is more of a PoC.

Thoughts or better ideas?

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
From e8d4dc52bc9dc7e5b17f4e374319fda229b19e61 Mon Sep 17 00:00:00 2001
From: Petr Jelinek 
Date: Tue, 21 Feb 2017 19:58:18 +0100
Subject: [PATCH 1/3] Skip unnecessary snapshot builds

When doing the initial snapshot build during logical decoding
initialization, don't build snapshots for transactions where we know the
transaction didn't do any catalog changes. Otherwise we might end up
with thousands of useless snapshots on a busy server, which can be quite
expensive.
---
 src/backend/replication/logical/snapbuild.c | 82 +++--
 1 file changed, 53 insertions(+), 29 deletions(-)

diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index ed9f69f..c2476a9 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -972,6 +972,7 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
 	

Re: [HACKERS] dropping partitioned tables without CASCADE

2017-02-21 Thread Amit Langote
On 2017/02/22 10:49, Amit Langote wrote:
> On 2017/02/21 20:17, Ashutosh Bapat wrote:
>> Are you sure you have attached the right patch?
> 
> Oops, really fixed this time.

Sorry again, 3rd time's a charm.  I copy-pasted the hunk below from the
patch file before attaching it, to make sure:

-   recordDependencyOn(&childobject, &parentobject, DEPENDENCY_NORMAL);
+   if (child_is_partition)
+       recordDependencyOn(&childobject, &parentobject, DEPENDENCY_AUTO);
+   else
+       recordDependencyOn(&childobject, &parentobject, DEPENDENCY_NORMAL);
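
To make the intended behavior change concrete, a small sketch in SQL
(invented table names):

-- with the auto dependency, dropping a partitioned parent just works:
CREATE TABLE parent (a int) PARTITION BY LIST (a);
CREATE TABLE part1 PARTITION OF parent FOR VALUES IN (1);
DROP TABLE parent;    -- part1 is dropped automatically, no CASCADE needed

-- plain inheritance children keep the normal dependency:
CREATE TABLE p (a int);
CREATE TABLE c () INHERITS (p);
DROP TABLE p;         -- still fails unless CASCADE is given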

Thanks,
Amit
>From 682624f4562087bb05b2ff9f282080bcfcfb5233 Mon Sep 17 00:00:00 2001
From: amit 
Date: Thu, 16 Feb 2017 15:56:44 +0900
Subject: [PATCH] Allow dropping partitioned table without CASCADE

Currently, a normal dependency is created between an inheritance
parent and child when creating the child.  That means one must
specify CASCADE to drop the parent table if a child table exists.
When creating partitions as inheritance children, create an auto
dependency instead, so that partitions are dropped automatically
when the parent is dropped, i.e., without specifying CASCADE.
---
 src/backend/commands/tablecmds.c   | 26 ++
 src/test/regress/expected/alter_table.out  | 10 --
 src/test/regress/expected/create_table.out |  9 ++---
 src/test/regress/expected/inherit.out  | 22 ++
 src/test/regress/expected/insert.out   |  7 ++-
 src/test/regress/expected/update.out   |  7 +--
 src/test/regress/sql/alter_table.sql   | 10 --
 src/test/regress/sql/create_table.sql  |  9 ++---
 src/test/regress/sql/inherit.sql   |  4 ++--
 src/test/regress/sql/insert.sql|  7 ++-
 src/test/regress/sql/update.sql|  2 +-
 11 files changed, 40 insertions(+), 73 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 3cea220421..cf566f974b 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -289,9 +289,11 @@ static List *MergeAttributes(List *schema, List *supers, char relpersistence,
 static bool MergeCheckConstraint(List *constraints, char *name, Node *expr);
 static void MergeAttributesIntoExisting(Relation child_rel, Relation parent_rel);
 static void MergeConstraintsIntoExisting(Relation child_rel, Relation parent_rel);
-static void StoreCatalogInheritance(Oid relationId, List *supers);
+static void StoreCatalogInheritance(Oid relationId, List *supers,
+		bool child_is_partition);
 static void StoreCatalogInheritance1(Oid relationId, Oid parentOid,
-		 int16 seqNumber, Relation inhRelation);
+		 int16 seqNumber, Relation inhRelation,
+		 bool child_is_partition);
 static int	findAttrByName(const char *attributeName, List *schema);
 static void AlterIndexNamespaces(Relation classRel, Relation rel,
    Oid oldNspOid, Oid newNspOid, ObjectAddresses *objsMoved);
@@ -725,7 +727,7 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,
 		  typaddress);
 
 	/* Store inheritance information for new rel. */
-	StoreCatalogInheritance(relationId, inheritOids);
+	StoreCatalogInheritance(relationId, inheritOids, stmt->partbound != NULL);
 
 	/*
 	 * We must bump the command counter to make the newly-created relation
@@ -2240,7 +2242,8 @@ MergeCheckConstraint(List *constraints, char *name, Node *expr)
  * supers is a list of the OIDs of the new relation's direct ancestors.
  */
 static void
-StoreCatalogInheritance(Oid relationId, List *supers)
+StoreCatalogInheritance(Oid relationId, List *supers,
+		bool child_is_partition)
 {
 	Relation	relation;
 	int16		seqNumber;
@@ -2270,7 +2273,8 @@ StoreCatalogInheritance(Oid relationId, List *supers)
 	{
 		Oid			parentOid = lfirst_oid(entry);
 
-		StoreCatalogInheritance1(relationId, parentOid, seqNumber, relation);
+		StoreCatalogInheritance1(relationId, parentOid, seqNumber, relation,
+ child_is_partition);
 		seqNumber++;
 	}
 
@@ -2283,7 +2287,8 @@ StoreCatalogInheritance(Oid relationId, List *supers)
  */
 static void
 StoreCatalogInheritance1(Oid relationId, Oid parentOid,
-		 int16 seqNumber, Relation inhRelation)
+		 int16 seqNumber, Relation inhRelation,
+		 bool child_is_partition)
 {
 	TupleDesc	desc = RelationGetDescr(inhRelation);
 	Datum		values[Natts_pg_inherits];
@@ -2317,7 +2322,10 @@ StoreCatalogInheritance1(Oid relationId, Oid parentOid,
 	childobject.objectId = relationId;
 	childobject.objectSubId = 0;
 
-	recordDependencyOn(&childobject, &parentobject, DEPENDENCY_NORMAL);
+	if (child_is_partition)
+		recordDependencyOn(&childobject, &parentobject, DEPENDENCY_AUTO);
+	else
+		recordDependencyOn(&childobject, &parentobject, DEPENDENCY_NORMAL);
 
 	/*
 	 * Post creation hook of this inheritance. Since object_access_hook
@@ -10744,7 +10752,9 @@ CreateInheritance(Relation child_rel, Relation parent_rel)
 	StoreCatalogInheritance1(RelationGetRelid(child_rel),
 			 RelationGetRelid(parent_rel),
 			 inhseqno + 1,
-			 catalogRelation);
+			 

Re: [HACKERS] dropping partitioned tables without CASCADE

2017-02-21 Thread Amit Langote
On 2017/02/21 20:17, Ashutosh Bapat wrote:
> On Tue, Feb 21, 2017 at 12:05 PM, Amit Langote wrote:
>> On 2017/02/20 21:49, Ashutosh Bapat wrote:
>>> Here are some comments
>>>
>>> For the sake of readability you may want reverse the if and else order.
>>> -recordDependencyOn(&childobject, &parentobject, DEPENDENCY_NORMAL);
>>> +if (!child_is_partition)
>>> +recordDependencyOn(&childobject, &parentobject, DEPENDENCY_NORMAL);
>>> +else
>>> +recordDependencyOn(&childobject, &parentobject, DEPENDENCY_AUTO);
>>> like
>>> +if (child_is_partition)
>>> +recordDependencyOn(&childobject, &parentobject, DEPENDENCY_AUTO);
>>> +else
>>> +recordDependencyOn(&childobject, &parentobject, DEPENDENCY_NORMAL);
>>
>> Sure, done.
> 
> I still see
> -recordDependencyOn(&childobject, &parentobject, DEPENDENCY_NORMAL);
> +if (!child_is_partition)
> +recordDependencyOn(&childobject, &parentobject, DEPENDENCY_NORMAL);
> +else
> +recordDependencyOn(&childobject, &parentobject, DEPENDENCY_AUTO);
> 
> Are you sure you have attached the right patch?

Oops, really fixed this time.

>>> --- cleanup: avoid using CASCADE
>>> -DROP TABLE list_parted, part_1;
>>> -DROP TABLE list_parted2, part_2, part_5, part_5_a;
>>> -DROP TABLE range_parted, part1, part2;
>>> +-- cleanup
>>> +DROP TABLE list_parted, list_parted2, range_parted;
>>> Testcases usually drop one table at a time, I guess, to reduce the 
>>> differences
>>> when we add or remove tables from testcases. All such blocks should probably
>>> follow the same policy.
>>
>> Hmm, I see this in src/test/regress/sql/inherit.sql:141
>>
>> DROP TABLE firstparent, secondparent, jointchild, thirdparent, otherchild;
> 
> Hmm, I can spot some more such usages. Let's keep this for the
> committer to decide.

Sure.

>>> BTW, I noticed that although we are allowing foreign tables to be
>>> partitions, there are no tests in foreign_data.sql for testing it. If
>>> there were, we would test DROP TABLE on a partitioned table
>>> with foreign partitions as well. That file has testcases for testing
>>> foreign table inheritance, and should have tests for foreign table
>>> partitions.
>>
>> That makes sense.  Patch 0002 is for that (I'm afraid this should be
>> posted separately though).  I didn't add/repeat all the tests that were
>> added by the foreign table inheritance patch again for foreign partitions
>> (common inheritance rules apply to both cases), only added those for the
>> new partitioning commands and certain new rules.
> 
> Thanks. Yes, a separate thread would do. I will review it there. May
> be you want to add it to the commitfest too.

Posted in a new thread titled "foreign partition DDL regression tests".

Thanks,
Amit
>From 525da2525b69e5191abac68f074d1d36c78bef8c Mon Sep 17 00:00:00 2001
From: amit 
Date: Thu, 16 Feb 2017 15:56:44 +0900
Subject: [PATCH] Allow dropping partitioned table without CASCADE

Currently, a normal dependency is created between an inheritance
parent and child when creating the child.  That means one must
specify CASCADE to drop the parent table if a child table exists.
When creating partitions as inheritance children, create an auto
dependency instead, so that partitions are dropped automatically
when the parent is dropped, i.e., without specifying CASCADE.
---
 src/backend/commands/tablecmds.c   | 26 ++
 src/test/regress/expected/alter_table.out  | 10 --
 src/test/regress/expected/create_table.out |  9 ++---
 src/test/regress/expected/inherit.out  | 22 ++
 src/test/regress/expected/insert.out   |  7 ++-
 src/test/regress/expected/update.out   |  7 +--
 src/test/regress/sql/alter_table.sql   | 10 --
 src/test/regress/sql/create_table.sql  |  9 ++---
 src/test/regress/sql/inherit.sql   |  4 ++--
 src/test/regress/sql/insert.sql|  7 ++-
 src/test/regress/sql/update.sql|  2 +-
 11 files changed, 40 insertions(+), 73 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 3cea220421..31b50ad77f 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -289,9 +289,11 @@ static List *MergeAttributes(List *schema, List *supers, char relpersistence,
 static bool MergeCheckConstraint(List *constraints, char *name, Node *expr);
 static void MergeAttributesIntoExisting(Relation child_rel, Relation parent_rel);
 static void MergeConstraintsIntoExisting(Relation child_rel, Relation parent_rel);
-static void StoreCatalogInheritance(Oid relationId, List *supers);
+static void StoreCatalogInheritance(Oid relationId, List *supers,
+		bool child_is_partition);
 static void StoreCatalogInheritance1(Oid relationId, Oid parentOid,
-		 int16 seqNumber, Relation inhRelation);
+		 int16 seqNumber, Relation inhRelation,
+		 bool child_is_partition);
 static int	findAttrByName(const char *attributeName, List *schema);
 static void AlterIndexNamespaces(Relation classRel, Relation rel,
    Oid oldNspOid, Oid newNspOid, ObjectAddresses *objsMoved);
@@ 

[HACKERS] foreign partition DDL regression tests

2017-02-21 Thread Amit Langote
Ashutosh Bapat pointed out [0] that regression tests are missing for the
foreign partition DDL commands.  Attached patch takes care of that.

Thanks,
Amit

[0]
https://www.postgresql.org/message-id/CAFjFpRcrdzBRj0cZ%2BJAQmfSa2Tv8wSEcWAeYtDpV-YZnNna2sA%40mail.gmail.com
>From 236c357b94af848663ed3d0ace10dd22167b7d08 Mon Sep 17 00:00:00 2001
From: amit 
Date: Tue, 21 Feb 2017 15:06:04 +0900
Subject: [PATCH] Add regression tests for foreign partition DDL

Commands like CREATE FOREIGN TABLE .. PARTITION OF, ATTACH PARTITION,
DETACH PARTITION foreign_table didn't get any tests so far.  Per
suggestion from Ashutosh Bapat.
---
 src/test/regress/expected/foreign_data.out | 195 +
 src/test/regress/sql/foreign_data.sql  |  71 +++
 2 files changed, 266 insertions(+)

diff --git a/src/test/regress/expected/foreign_data.out b/src/test/regress/expected/foreign_data.out
index 3a9fb8f558..a0f969f3e5 100644
--- a/src/test/regress/expected/foreign_data.out
+++ b/src/test/regress/expected/foreign_data.out
@@ -1751,6 +1751,201 @@ DETAIL:  user mapping for regress_test_role on server s5 depends on server s5
 HINT:  Use DROP ... CASCADE to drop the dependent objects too.
 DROP OWNED BY regress_test_role2 CASCADE;
 NOTICE:  drop cascades to user mapping for regress_test_role on server s5
+-- Foreign partition DDL stuff
+CREATE TABLE pt2 (
+	c1 integer NOT NULL,
+	c2 text,
+	c3 date
+) PARTITION BY LIST (c1);
+CREATE FOREIGN TABLE pt2_1 PARTITION OF pt2 FOR VALUES IN (1)
+  SERVER s0 OPTIONS (delimiter ',', quote '"', "be quoted" 'value');
+\d+ pt2
+Table "public.pt2"
+ Column |  Type   | Collation | Nullable | Default | Storage  | Stats target | Description 
++-+---+--+-+--+--+-
+ c1 | integer |   | not null | | plain|  | 
+ c2 | text|   |  | | extended |  | 
+ c3 | date|   |  | | plain|  | 
+Partition key: LIST (c1)
+Partitions: pt2_1 FOR VALUES IN (1)
+
+\d+ pt2_1
+  Foreign table "public.pt2_1"
+ Column |  Type   | Collation | Nullable | Default | FDW Options | Storage  | Stats target | Description 
++-+---+--+-+-+--+--+-
+ c1 | integer |   | not null | | | plain|  | 
+ c2 | text|   |  | | | extended |  | 
+ c3 | date|   |  | | | plain|  | 
+Partition of: pt2 FOR VALUES IN (1)
+Server: s0
+FDW Options: (delimiter ',', quote '"', "be quoted" 'value')
+
+-- partition cannot have additional columns
+DROP FOREIGN TABLE pt2_1;
+CREATE FOREIGN TABLE pt2_1 (
+	c1 integer NOT NULL,
+	c2 text,
+	c3 date,
+	c4 char
+) SERVER s0 OPTIONS (delimiter ',', quote '"', "be quoted" 'value');
+\d+ pt2_1
+ Foreign table "public.pt2_1"
+ Column | Type | Collation | Nullable | Default | FDW Options | Storage  | Stats target | Description 
++--+---+--+-+-+--+--+-
+ c1 | integer  |   | not null | | | plain|  | 
+ c2 | text |   |  | | | extended |  | 
+ c3 | date |   |  | | | plain|  | 
+ c4 | character(1) |   |  | | | extended |  | 
+Server: s0
+FDW Options: (delimiter ',', quote '"', "be quoted" 'value')
+
+ALTER TABLE pt2 ATTACH PARTITION pt2_1 FOR VALUES IN (1);   -- ERROR
+ERROR:  table "pt2_1" contains column "c4" not found in parent "pt2"
+DETAIL:  New partition should contain only the columns present in parent.
+DROP FOREIGN TABLE pt2_1;
+\d+ pt2
+Table "public.pt2"
+ Column |  Type   | Collation | Nullable | Default | Storage  | Stats target | Description 
++-+---+--+-+--+--+-
+ c1 | integer |   | not null | | plain|  | 
+ c2 | text|   |  | | extended |  | 
+ c3 | date|   |  | | plain|  | 
+Partition key: LIST (c1)
+
+CREATE FOREIGN TABLE pt2_1 (
+	c1 integer NOT NULL,
+	c2 text,
+	c3 date
+) SERVER s0 OPTIONS (delimiter ',', quote '"', "be quoted" 'value');
+\d+ pt2_1
+  Foreign table "public.pt2_1"
+ Column |  Type   | Collation | Nullable | Default | FDW Options | Storage  | Stats target | Description 

Re: [HACKERS] [doc fix] Really trivial fix for BRIN documentation

2017-02-21 Thread Tsunakawa, Takayuki
From: Simon Riggs [mailto:si...@2ndquadrant.com]
> Pushed, but using "heap" rather than "table", for clarity. Thanks for the
> patch.

Thank you for responding so quickly.  I'm comfortable with "heap."  On the
other hand, I had chosen "table" for a few reasons.  First,
src/backend/access/brin/README uses "table" as follows.  Second, I thought
users would feel more familiar with the general term "table."  Third, I
supposed PostgreSQL might add support for table structures other than heap
in the future, the way SQL Server provides heaps (non-clustered tables) and
clustered tables.

At index creation time, the whole table is scanned; for each page range the
summarizing values of each indexed column and nulls bitmap are collected and
stored in the index.

I should have written these reasons for choosing "table" to begin with.  Anyway, I'm OK with heap.

Regards
Takayuki Tsunakawa




Re: [HACKERS] Bold itemized list entries

2017-02-21 Thread Robert Haas
On Tue, Feb 21, 2017 at 10:26 PM, Magnus Hagander  wrote:
> Did this one get fixed with the changes that went on a couple of days ago?

The lists-being-boldfaced thing seems to be fixed, at least it looks
like it to me, but the devel documentation still doesn't look right
compared to the 9.6 documentation.  For example, in
https://www.postgresql.org/docs/devel/static/ddl-partitioning.html the
section heading is "5.11 Partitioning" and the font color is grey,
whereas in https://www.postgresql.org/docs/current/static/ddl-partitioning.html
the heading is "5.10 Partitioning" and the font color is orange.  The
font size is also slightly larger.  The subheadings, like
"5.11.1 Overview", are also slightly larger in current than in devel.

The formatting of the grey-brown boxes in the middle of the page has
changed too.  If you compare
https://www.postgresql.org/docs/current/static/geqo.html vs
https://www.postgresql.org/docs/devel/static/geqo.html you can see
that current has "Author" in black boldface, a colon, and then the
text, while devel has "Author" in grey, no colon but instead a blank
line, and then the text.  I think some font property of the text is
different, too; anyway, it seems clear that the new formatting does
not look as good as the old, and it looks like every "Note" and "Tip"
in our documentation is similarly affected.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] SERIALIZABLE with parallel query

2017-02-21 Thread Robert Haas
On Tue, Feb 21, 2017 at 12:55 AM, Thomas Munro
 wrote:
> On Thu, Feb 16, 2017 at 6:19 PM, Thomas Munro
>  wrote:
>> Specifically, DeleteChildTargetLocks() assumes it can walk
>> MySerializableXact->predicateLocks and throw away locks that are
>> covered by a new lock (ie throw away tuple locks because a covering
>> page lock has been acquired) without let or hindrance until it needs
>> to modify the locks themselves.  That assumption doesn't hold up with
>> that last patch and will require a new kind of mutual exclusion.  I
>> wonder if the solution is to introduce an LWLock into each
>> SERIALIZABLEXACT object, so DeleteChildTargetLocks() can prevent
>> workers from stepping on each others' toes during lock cleanup.  An
>> alternative would be to start taking SerializablePredicateLockListLock
>> in exclusive rather than shared mode, but that seems unnecessarily
>> coarse.
>
> Here is a patch to do that, for discussion.  It adds an LWLock to each
> SERIALIZABLEXACT, and acquires it after SerializablePredicateListLock
> and before any predicate lock partition lock.  It doesn't bother with
> that if not in parallel mode, or in the cases where
> SerializablePredicateListLock is held exclusively.  This prevents
> parallel query workers and leader from stepping on each others' toes
> when manipulating the predicate list.
>
> The case in CheckTargetForConflictsIn is theoretical for now since we
> don't support writing in parallel query yet.  The case in
> CreatePredicateLock is reachable by running a simple parallel
> sequential scan.  The case in DeleteChildTargetLocks is for when we've
> acquired a new predicate lock that covers finer grained locks which
> can be dropped; that is reachable the same way again.  I don't think
> it's required in ReleaseOneSerializableXact since it was already
> called in several places with an sxact other than the caller's, and
> deals with finished transactions.

I don't think I know enough about the serializable code to review
this, or at least not quickly, but it seems very cool if it works.
Have you checked what effect it has on shared memory consumption?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Proposal : For Auto-Prewarm.

2017-02-21 Thread Robert Haas
On Wed, Feb 22, 2017 at 4:06 AM, Peter Eisentraut
 wrote:
> On 2/10/17 15:12, Robert Haas wrote:
>> Right.  I can't see why you'd want to be able to separately control
>> those two things.  If you're not dumping, you don't want to load; if
>> you're not loading, you don't want to dump.
>
> What about the case where you want to prewarm a standby from the info
> from the primary (or another standby)?

I think it's OK to treat that as something of a corner case.  There's
nothing to keep you from doing that today: just use pg_buffercache to
dump a list of blocks on one server, and then pass those blocks to
pg_prewarm on another server.  The point of this patch, AIUI, is to
automate a particularly common case of that, which is to dump before
shutdown and load upon startup.  It doesn't preclude other things that
people want to do.
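
For the record, a rough sketch of that manual workflow (assuming the
pg_buffercache and pg_prewarm extensions are installed; the file path and
block number below are invented):

-- on the source server: dump buffered blocks of the current database
COPY (SELECT c.oid::regclass::text AS rel, b.relblocknumber
      FROM pg_buffercache b
      JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
      WHERE b.reldatabase =
            (SELECT oid FROM pg_database WHERE datname = current_database())
        AND b.relforknumber = 0)   -- main fork only
  TO '/tmp/warm_blocks.csv' WITH (FORMAT csv);

-- on the target server: load each dumped block into shared_buffers
SELECT pg_prewarm('my_table'::regclass, 'buffer', 'main', 1234, 1234);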

I suppose we could have an SQL-callable function that does an
immediate dump (without starting a worker).  That wouldn't hurt
anything, and might be useful in a case like the one you mention.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Json support for array pseudotypes

2017-02-21 Thread Andrew Dunstan


On 02/21/2017 06:01 PM, Tom Lane wrote:
> Andrew Dunstan  writes:
>> The attached tiny patch lets the to_json(b) routines convert objects
>> with array pseudotypes to json{b}. The main impetus for this came from
>> my trying to convert the anyarray columns in pg_stats to json and
>> finding that they ended up as text instead of json arrays. This way
>> to_json on these columns does what is expected.
> Looks reasonable offhand, but maybe you should add some test cases?
>
>   


Yeah, I will. Not sure how to add a test for the RECORDARRAYOID case. We
don't seem to have any of those hanging around.

cheers

andrew

-- 

Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] Json support for array pseudotypes

2017-02-21 Thread Tom Lane
Andrew Dunstan  writes:
> The attached tiny patch lets the to_json(b) routines convert objects
> with array pseudotypes to json{b}. The main impetus for this came from
> my trying to convert the anyarray columns in pg_stats to json and
> finding that they ended up as text instead of json arrays. This way
> to_json on these columns does what is expected.

Looks reasonable offhand, but maybe you should add some test cases?

regards, tom lane




Re: [HACKERS] safer node casting

2017-02-21 Thread Tom Lane
Jeff Janes  writes:
> With commit 38d103763d14baddf3cbfe4b00b501059fc9447f, I'm now getting a
> compiler warning:
> indxpath.c: In function 'generate_bitmap_or_paths':
> indxpath.c:1312: warning: unused variable 'rinfo'

Me too, without asserts.  Fixed.

regards, tom lane




Re: [HACKERS] Replication vs. float timestamps is a disaster

2017-02-21 Thread James Cloos
> "TL" == Tom Lane  writes:

TL> The question to be asked is whether there is still anybody out there
TL> using float timestamps.

Gentoo's ebuild includes:

   $(use_enable !pg_legacytimestamp integer-datetimes) \

meaning that by default --enable-integer-datetimes is passed to configure,
but if the pg_legacytimestamp use flag is set, then --disable-integer-datetimes
is passed instead.

They document it as:


Use double precision floating-point numbers instead of 64-bit
integers for timestamp storage.


Ie, w/o any kind of deprecation notice.

I don't know how many (how few?) add pg_legacytimestamp to USE when
merging postgresql.  But it is still available as of 9.6 and also
with their live build of git://git.postgresql.org/git/postgresql.git.

-JimC
-- 
James Cloos  OpenPGP: 0x997A9F17ED7DAEA6




Re: [HACKERS] safer node casting

2017-02-21 Thread Jeff Janes
On Tue, Feb 21, 2017 at 9:00 AM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 1/26/17 16:15, Andres Freund wrote:
> > On 2017-01-25 19:21:40 -0500, Tom Lane wrote:
> >> Andres Freund  writes:
> >>> On 2016-12-31 12:08:22 -0500, Peter Eisentraut wrote:
>  RestrictInfo *rinfo = castNode(RestrictInfo, lfirst(lc));
> >>
> >>> Are you planning to add this / update this patch? Because I really
> would
> >>> have liked this a number of times already...  I can update it according
> >>> to my suggestions (to avoid multiple eval scenarios) if helpful
> >>
> >> Yeah, I'd like that in sooner rather than later, too.  But we do need
> >> it to be foolproof - no multiple evals.  The first draft had
> >> inadequate-parenthesization hazards,
> >
> > How about something like the attached? The first patch just contains
> > castNode(), the second one a rebased version of Peter's changes (with
> > some long lines broken up).
>
> Thanks for finishing that up.  I have committed more uses that I had
> lying around partially done.  Looks nice now.
>
>
With commit 38d103763d14baddf3cbfe4b00b501059fc9447f, I'm now getting a
compiler warning:

indxpath.c: In function 'generate_bitmap_or_paths':
indxpath.c:1312: warning: unused variable 'rinfo'


I have: gcc version 4.4.7 20120313 (Red Hat 4.4.7-17) (GCC)

No arguments to ./configure are needed.

Cheers,

Jeff


[HACKERS] Json support for array pseudotypes

2017-02-21 Thread Andrew Dunstan

The attached tiny patch lets the to_json(b) routines convert objects
with array pseudotypes to json{b}. The main impetus for this came from
my trying to convert the anyarray columns in pg_stats to json and
finding that they ended up as text instead of json arrays. This way
to_json on these columns does what is expected.
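
To make the change concrete, a quick example against pg_stats
(histogram_bounds is one of the anyarray columns in question):

-- previously this came out as a JSON string such as "{1,2,3}";
-- with the patch it is a proper JSON array like [1,2,3]:
SELECT to_json(histogram_bounds)
  FROM pg_stats
 WHERE histogram_bounds IS NOT NULL
 LIMIT 1;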


cheers


andrew

-- 

Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 628e9de..0ed6a10 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -1397,9 +1397,10 @@ json_categorize_type(Oid typoid,
 
 		default:
 			/* Check for arrays and composites */
-			if (OidIsValid(get_element_type(typoid)))
+			if (OidIsValid(get_element_type(typoid)) || typoid == ANYARRAYOID
+				|| typoid == RECORDARRAYOID)
 *tcategory = JSONTYPE_ARRAY;
-			else if (type_is_rowtype(typoid))
+			else if (type_is_rowtype(typoid)) /* includes RECORDOID */
 *tcategory = JSONTYPE_COMPOSITE;
 			else
 			{
diff --git a/src/backend/utils/adt/jsonb.c b/src/backend/utils/adt/jsonb.c
index b9bf18f..5b6178b 100644
--- a/src/backend/utils/adt/jsonb.c
+++ b/src/backend/utils/adt/jsonb.c
@@ -644,9 +644,10 @@ jsonb_categorize_type(Oid typoid,
 
 		default:
 			/* Check for arrays and composites */
-			if (OidIsValid(get_element_type(typoid)))
+			if (OidIsValid(get_element_type(typoid)) || typoid == ANYARRAYOID
+				|| typoid == RECORDARRAYOID)
 *tcategory = JSONBTYPE_ARRAY;
-			else if (type_is_rowtype(typoid))
+			else if (type_is_rowtype(typoid)) /* includes RECORDOID */
 *tcategory = JSONBTYPE_COMPOSITE;
 			else
 			{



Re: [HACKERS] Proposal : For Auto-Prewarm.

2017-02-21 Thread Peter Eisentraut
On 2/10/17 15:12, Robert Haas wrote:
> Right.  I can't see why you'd want to be able to separately control
> those two things.  If you're not dumping, you don't want to load; if
> you're not loading, you don't want to dump.

What about the case where you want to prewarm a standby from the info
from the primary (or another standby)?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Change in "policy" on dump ordering?

2017-02-21 Thread Peter Eisentraut
On 2/21/17 14:58, Jim Nasby wrote:
> AFAICT in older versions only object types that absolutely had to wait 
> for DO_POST_DATA_BOUNDARY would do so. More recently though, objects are 
> being added after that (presumably because it's easier than renumbering 
> everything in dbObjectTypePriority).

Is there any specific assignment that you have concerns about?

> Is this change a good or bad idea? Should there be an official guide for 
> where new things go?

The comment above dbObjectTypePriority explains it, doesn't it?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] mat views stats

2017-02-21 Thread Peter Eisentraut
On 2/20/17 10:06, Jim Mlodgenski wrote:
> I've come across a number of cases where the statistics on materialized
> views become stale, producing bad plans. It turns out that autovacuum
> only touches a materialized view when it is first created and ignores it
> on a refresh. When you have a materialized view like yesterdays_sales,
> the data in the materialized view turns over every day.

That sounds like a bug.

> Attached is a patch to trigger autovacuum based on a matview refresh
> along with a system view pg_stat_all_matviews to show information more
> meaningful for materialized views.

It might be easier to include materialized views into pg_stat_*_tables.

I think these should be two separate patches.  We might want to
backpatch the first one.
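
Until a fix lands, the usual workaround is presumably a manual ANALYZE
after each refresh -- a minimal sketch, using the view from the example
upthread:

REFRESH MATERIALIZED VIEW yesterdays_sales;
ANALYZE yesterdays_sales;  -- recollect the stats that autovacuum currently misses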

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] "may be unused" warnings for gcc

2017-02-21 Thread Peter Eisentraut
On 2/20/17 09:41, Andres Freund wrote:
> When building with a new-ish gcc (6.3.0 right now, but I've seen this
> for a while) with optimization I get a number of warnings:

These all look like they are related to inlining/-O3.

I have attempted to fix these in the past, but I have found that -O3
doesn't get any performance improvement, so I haven't bothered lately.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Speedup twophase transactions

2017-02-21 Thread Alvaro Herrera
I have pushed the TAP test file, which is already passing, at least for
me.  Let's see what the buildfarm says.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Measuring replay lag

2017-02-21 Thread Thomas Munro
On Tue, Feb 21, 2017 at 6:21 PM, Simon Riggs  wrote:
> And happier again, leading me to move to the next stage of review,
> focusing on the behaviour emerging from the design.
>
> So my current understanding is that this doesn't rely upon LSN
> arithmetic to measure lag, which is good. That means logical
> replication should "just work" and future mechanisms to filter
> physical WAL will also just work. This is important, so please comment
> if you see that isn't the case.

Yes, my understanding (based on
https://www.postgresql.org/message-id/f453caad-0396-1bdd-c5c1-5094371f4...@2ndquadrant.com
) is that this should in principle work for logical replication, it
just might show the same number in 2 or 3 of the lag columns because
of the way it reports LSNs.

However, I think a call like LagTrackerWrite(SendRqstPtr,
GetCurrentTimestamp()) needs to go into XLogSendLogical, to mirror
what happens in XLogSendPhysical.  I'm not sure about that.

> I notice that LagTrackerRead() doesn't do anything to interpolate the
> time given, so at present any attempt to prune the lag sample buffer
> would result in falsely increasing the lag times reported. Which is
> probably the reason why you say "There may be better adaptive
> compaction algorithms."  We need to look at this some more, an initial
> guess might be that we need to insert fewer samples as the buffer
> fills since the LagTrackerRead() algorithm is O(N) on number of
> samples and thus increasing the buffer itself isn't a great plan.

Interesting idea about interpolation.  The lack of it didn't "result
in falsely increasing the lag times reported", it resulted in reported
lag staying static for a period of time even though we were falling
further behind.  I finished up looking into fixing this with
interpolation.  See below.

About adaptive sampling:  This patch does in fact "insert fewer
samples once the buffer fills".  Normally, the sender records a sample
every time it sends a message.  Now imagine that the standby's
recovery is very slow and the buffer fills up.  The sender starts
repeatedly overwriting the same buffer element because the write head
has crashed into the slow moving read head.  Every time the standby
makes some progress and reports it, the read head finally advances
releasing some space, so the sender is able to advance to the next
element and record a new sample (and probably overwrite that one many
times).  So effectively we reduce our sampling rate for all new
samples.  We finish up with a sampling rate that is determined by the
rate of standby progress.  I expect you can make something a bit
smoother and more sophisticated that starts lowering the sampling rate
sooner and perhaps thins out the pre-existing samples when the buffer
fills up, and I'm open to ideas, but my intuition is that it would be
complicated and no one would even notice the difference.
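
To illustrate the overwrite-when-full behavior described above, here is a
simplified sketch in C (buffer size and all names invented; the real patch
keeps a read head per reported position and lives in shared memory):

/* XLogRecPtr and TimestampTz are the usual PostgreSQL typedefs. */
typedef struct
{
    XLogRecPtr  lsn;
    TimestampTz time;
} LagSample;

#define LAG_BUFFER_SIZE 8192

static LagSample buffer[LAG_BUFFER_SIZE];
static int  write_head = 0;
static int  read_head = 0;

static void
lag_tracker_write(XLogRecPtr lsn, TimestampTz now)
{
    int         next = (write_head + 1) % LAG_BUFFER_SIZE;

    buffer[write_head].lsn = lsn;
    buffer[write_head].time = now;

    /*
     * Advance only if the next slot is free.  When the buffer is full we
     * keep overwriting this slot, so the effective sampling rate drops to
     * match the standby's rate of progress.
     */
    if (next != read_head)
        write_head = next;
}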

LagTrackerRead() is O(N) not in the total number of samples, but in
the number of samples whose LSN is <= the LSN in the reply message
we're processing.  Given that the sender records samples as it sends
messages, and the standby sends replies on write/flush of those
messages, I think the N in question here will typically be a very
small number except in the case below called 'overwhelm.png' when the
WAL sender would be otherwise completely idle.

> It would be very nice to be able to say something like that the +/-
> confidence limits of the lag are no more than 50% of the lag time, so
> we have some idea of how accurate the value is at any point. We need
> to document the accuracy of the result, otherwise we'll be answering
> questions on that for some time. So lets think about that now.

The only source of inaccuracy I can think of right now is that if
XLogSendPhysical doesn't run very often, then we won't notice the
local flushed LSN moving until a bit later, and to the extent that
we're late noticing that, we could underestimate the lag numbers.  But
actually it runs very frequently and is woken whenever WAL is flushed.
This gap could be closed by recording the system time in shared memory
whenever local WAL is flushed; as described in a large comment in the
patch, I figured this wasn't worth that.

> Given LagTrackerRead() is reading the 3 positions in order, it seems
> sensible to start reading the LAG_TRACKER_FLUSH_HEAD from the place
> you finished reading LAG_TRACKER_WRITE_HEAD etc.. Otherwise we end up
> doing way too much work with larger buffers.

Hmm.  I was under the impression that we'd nearly always be eating a
very small number of samples with each reply message, since standbys
usually report progress frequently.  But yeah, if the buffer is full
AND the standby is sending very infrequent replies because the primary
is idle, then perhaps we could try to figure out how to skip ahead
faster than one at a time.

> Which makes me think about the read more. The present design
> calculates everything on receipt of standby messages. I think we
> should 

Re: [HACKERS] pg_dump does not refresh matviews from extensions

2017-02-21 Thread Stephen Frost
Jim,

* Jim Nasby (jim.na...@bluetreble.com) wrote:
> I think $SUBJECT is a bug. While it would be unusual for an
> extension to have a matview, it's still allowed, and as it stands
> right now that view won't be refreshed at the end of a restore,
> unlike other matviews.
> 
> I started looking into a patch for this, but I'm not sure of the
> best way to handle it. One possibility is to glom this in with the
> code that handles extension config tables, but that doesn't feel
> right since matviews aren't really config.
> 
> Would the best option be to change selectDumpableTable()? I
> suspect that'd have to use the same "dumpobj" logic that
> processExtensionTables() uses.

As I noted elsewhere, I'd kind of like to see processExtensionTables()
go away in favor of doing more in selectDumpableTable(), since we can
now track the fact that we should restore just the data for a given
table through the dump bitmap.

As for $SUBJECT, I feel like it really depends, doesn't it?  If the
extension creates the matview w/ no data in it, and doesn't mark it as a
config table, should we really refresh it?  On the other hand, if the
extension creates the matview and either refreshes it, or something
else refreshes it later, then perhaps we should do so too, to get us
back to the same state.

Of course, we also need to consider pg_upgrade and make sure that,
whatever we do, what happens with pg_upgrade makes sense.

Thanks!

Stephen




[HACKERS] Change in "policy" on dump ordering?

2017-02-21 Thread Jim Nasby
AFAICT in older versions only object types that absolutely had to wait 
for DO_POST_DATA_BOUNDARY would do so. More recently though, objects are 
being added after that (presumably because it's easier than renumbering 
everything in dbObjectTypePriority).


Is this change a good or bad idea? Should there be an official guide for 
where new things go?

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)




[HACKERS] pg_dump does not refresh matviews from extensions

2017-02-21 Thread Jim Nasby
I think $SUBJECT is a bug. While it would be unusual for an extension to 
have a matview, it's still allowed, and as it stands right now that view 
won't be refreshed at the end of a restore, unlike other matviews.


I started looking into a patch for this, but I'm not sure of the best 
way to handle it. One possibility is to glom this in with the code that 
handles extension config tables, but that doesn't feel right since 
matviews aren't really config.


Would the best option be to change selectDumpableTable()? I suspect
that'd have to use the same "dumpobj" logic that
processExtensionTables() uses.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)




Re: [HACKERS] IF (NOT) EXISTS in psql-completion

2017-02-21 Thread Pavel Stehule
2017-02-14 11:51 GMT+01:00 Kyotaro HORIGUCHI <
horiguchi.kyot...@lab.ntt.co.jp>:

> Thank you for the comment.
>
> At Mon, 6 Feb 2017 17:10:43 +0100, Pavel Stehule 
> wrote in  mail.gmail.com>
> > > 0001-Refactoring-tab-complete-to-make-psql_completion-cod.patch
> > >   - Just a refactoring of psql_completion
> > >
> > > 0002-Make-keywords-case-follow-to-input.patch
> > >   - The letter case of additional suggestions for
> > > COMPLETION_WITH_XX follows input.
> > >
> > > 0003-Introduce-word-shift-and-removal-feature-to-psql-com.patch
> > >   - A feature to ignore preceding words. And a feature to remove
> > > intermediate words.
> > >
> > > 0004-Add-README-for-tab-completion.patch
> > >   - README
> > >
> > > 0005-Make-SET-RESET-SHOW-varialble-follow-input-letter-ca.patch
> > > 0006-Allow-complete-schema-elements-in-more-natural-way.patch
> > > 0007-Allow-CREATE-RULE-to-use-command-completion-recursiv.patch
> > > 0008-Allow-completing-the-body-of-EXPLAIN.patch
> > > 0009-Simpilfy-ALTER-TABLE-ALTER-COLUMN-completion.patch
> > > 0010-Simplify-completion-for-CLUSTER-VERBOSE.patch
> > > 0011-Simplify-completion-for-COPY.patch
> > > 0012-Simplify-completion-for-CREATE-INDEX.patch
> > > 0013-Simplify-completion-for-CREATE-SEQUENCE.patch
> > > 0014-Simplify-completion-for-DROP-INDEX.patch
> > > 0015-Add-CURRENT_USER-to-some-completions-of-role.patch
> > > 0016-Refactor-completion-for-ALTER-DEFAULT-PRIVILEGES.patch
> > > 0017-Add-suggestions-of-IF-NOT-EXISTS.patch
> > >   - A kind of sample refctoring (or augmenting) suggestion code
> > > based on the new infrastructure.
> > >
> > > 0018-Debug-output-of-psql-completion.patch
> > >   - Debug logging for psql_completion (described above)
> > >
> > > 0019-Add-suggestion-of-OR-REPLACE.patch
> > >   - Suggestion of CREATE OR REPLACE.
> > >
> > >
> > > # I hear the footsteps of another conflict..
> > >
> > The patch 0018 could not be applied.
>
> The fear came true. fd6cd69 conflicts with it but on a
> comment. The attached patch set applies on top of the current
> master head (ae0e550).
>

Now the first patch is broken :(

It is pretty sensitive to any changes. Isn't it possible to commit the first
four patches separately, maybe outside the commitfest window?



>
> > Few other notes from testing - probably these notes should not be related
> > to your patch set
> >
> > 1. When we have a set of keywords, the upper or lower case should
> > follow the previous keyword. Is that possible? It should have an impact
> > only on keywords.
>
> It sounds reasonable, more flexible than "upper"/"lower" of
> COMP_KEYWORD_CASE. The additional 20th(!) patch does that. It
> adds a new value 'follow-first' to COMP_KEYWORD_CASE. All
> keywords in a command line will be in the case of the first
> letter of the first word. ("CREATE" in the following case. I
> think it is enough for the purpose.)
>
> postgres=# \set COMP_KEYWORD_CASE follow-first
> postgres=# CREATE in
>=># CREATE INDEX hoge 
>=># CREATE INDEX hoge ON emp
>=># CREATE INDEX hoge ON employee ..
> postgres=# create IN
>=># create index
>
> Typing tab at the first in a command line shows all available
> keywords in upper case.
>

It is great - from my perspective, the best step in this area in years.


>
> > 2. the list of possible functions after EXECUTE PROCEDURE in CREATE
> TRIGGER
> > statement should be reduced to functions that return trigger only.
>
> Actually Query_for_list_of_trigger_functions returns too many
> candidates. The suggested restriction reduces them to a
> reasonable number. The 21st patch does that.
>
> > CREATE OR REPLACE FUNCTIONS works great, thank you!
>
> Thanks. It was easier than expected.
>
> As the result, 21 paches are attached to this message. 1 - 19th
> are described above and others are described below.
>
> 0020-New-COMP_KEYWORD_CASE-mode-follow-first.patch
>
>   - Add new COMP_KEYWORD_CASE mode "follow-first". The completion
> works with the case of the first word. (This doesn't rely on
> this patchset but works in more cases with 0002)
>
> 0021-Suggest-only-trigger-functions-for-CREAET-TRIGGER.EX.patch
>   - Restrict suggestion for the syntax to ones acutually usable
> there. (This relies on none of this patchset, though..)
>
> regards,
>

Thank you very much for your work on this.

Regards

Pavel


>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
>


Re: [HACKERS] Resolved typo in a comment

2017-02-21 Thread Fujii Masao
On Mon, Feb 20, 2017 at 7:34 AM, neha khatri  wrote:
> Hi,
>
> Attached is a patch that fixes a comment typo in varlena.c

Thanks! Pushed.

Regards,

-- 
Fujii Masao




Re: [HACKERS] SUBSCRIPTION command hangs up, segfault occurs in the server process and to cancel the execution.

2017-02-21 Thread Fujii Masao
On Wed, Feb 22, 2017 at 1:54 AM, Masahiko Sawada  wrote:
> On Tue, Feb 21, 2017 at 7:11 AM, nuko yokohama  
> wrote:
>> Hi.
>> While verifying the logical replication of PostgreSQL 10-devel, I found the
>> following problem.

Thanks for the problem report! This problem was also reported before.
https://www.postgresql.org/message-id/a1e9cb90-1fac-4cad-8dba-9aa62a6e9...@postgrespro.ru

>> * When you run the SUBSCRIPTION command against a table in the same
>> PostgreSQL server, the command hangs.
>> * Canceling the hung SUBSCRIPTION command with CTRL + C causes a segfault
>> in a server process, and the PostgreSQL server restarts.
>>
>> 
>> [Steps to Reproduce]
>>
>> $ createdb db1 -U postgres
>> $ createdb db2 -U postgres
>> $ psql -U postgres db1 -c "CREATE TABLE test (id int primary key, data
>> text)"
>> $ psql -U postgres db2 -c "CREATE TABLE test (id int primary key, data
>> text)"
>> $ psql -U postgres db1 -c "CREATE PUBLICATION pub_db1_test FOR TABLE test"
>> $ psql -U postgres db2 -c "CREATE SUBSCRIPTION sub_db2_test CONNECTION
>> 'dbname=db1 port=5432 user=postgres' PUBLICATION pub_db1_test"
>>
>> The SUBSCRIPTION command does not end; it hangs.
>> At this time, the following logs are output in the server log.
>>
>> 2017-02-21 06:58:05.082 JST [22060] LOG:  logical decoding found initial
>> starting point at 0/1C06660
>> 2017-02-21 06:58:05.082 JST [22060] DETAIL:  1 transaction needs to finish.
>>
>> Suspending psql (running SUBSCRIPTION) with CTRL + C causes a Segfault in
>> the server process.
>> At this time, the following message is output to the server log.
>>
>> 2017-02-21 07:01:00.246 JST [22059] ERROR:  canceling statement due to user
>> request
>> 2017-02-21 07:01:00.246 JST [22059] STATEMENT:  CREATE SUBSCRIPTION
>> sub_db2_test CONNECTION 'dbname=db1 port=5432 user=postgres' PUBLICATION
>> pub_db1_test
>> 2017-02-21 07:01:01.006 JST [21445] LOG:  server process (PID 22060) was
>> terminated by signal 11: Segmentation fault
>> 2017-02-21 07:01:01.007 JST [21445] LOG:  terminating any other active
>> server processes
>>
>> 
>> [Environment]
>>
>> postgres=# SELECT version();
>>   version
>> 
>>  PostgreSQL 10devel on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.2
>> 20140120 (Red Hat 4.8.2-16), 64-bit
>> (1 row)
>>
>> postgres=# SHOW wal_level;
>>  wal_level
>> ---
>>  logical
>> (1 row)
>>
>> postgres=# SHOW max_wal_senders;;
>>  max_wal_senders
>> -
>>  10
>> (1 row)
>>
>> postgres=# SHOW max_replication_slots;
>>  max_replication_slots
>> ---
>>  10
>> (1 row)
>>
>> postgres=# TABLE pg_hba_file_rules ;
>>  line_number | type  |   database| user_name  |  address  |
>> netmask | auth_method | options | error
>> -+---+---++---+-+-+-+---
>>2 | local | {all} | {all}  |   |
>> | trust   | |
>>4 | host  | {all} | {all}  | 127.0.0.1 |
>> 255.255.255.255 | trust   | |
>>6 | host  | {all} | {all}  | ::1   |
>> ::::::: | trust   | |
>>9 | local | {replication} | {postgres} |   |
>> | trust   | |
>> (4 rows)
>>
>>
>
> As the log message says, the reason why CREATE SUBSCRIPTION hangs is
> that it's waiting for another transaction to finish.
>
> Regarding the SEGV, it occurs in the wal sender process creating the
> replication slot. When the SEGV occurs, the backtrace is as follows.
>
> (gdb) bt
> #0  0x006e3d47 in resetStringInfo (str=0xea1c60) at stringinfo.c:96
> #1  0x007e4738 in ProcessRepliesIfAny () at walsender.c:1347
> #2  0x007e42c5 in WalSndWaitForWal (loc=23400264) at walsender.c:1142
> #3  0x007e38f6 in logical_read_xlog_page (state=0x2732008,
> targetPagePtr=23396352, reqLen=3912, targetRecPtr=23400240,
> cur_page=0x2733d58 "\225\320\005", pageTLI=0x27328b4) at
> walsender.c:717
> #4  0x005458c8 in ReadPageInternal (state=0x2732008,
> pageptr=23396352, reqLen=3912) at xlogreader.c:556
> #5  0x00545068 in XLogReadRecord (state=0x2732008,
> RecPtr=23400240, errormsg=0x7fff7cf34cf8) at xlogreader.c:255
> #6  0x007d09db in DecodingContextFindStartpoint
> (ctx=0x2731d48) at logical.c:432
> #7  0x007e3a87 in CreateReplicationSlot (cmd=0x26edfa0) at
> walsender.c:796
> #8  0x007e45ae in exec_replication_command
> (cmd_string=0x268b5f8 "CREATE_REPLICATION_SLOT \"sub_db2_test\"
> LOGICAL pgoutput") at walsender.c:1272
> #9  0x00846783 in PostgresMain (argc=1, argv=0x2697f38,
> dbname=0x2697e68 "db1", username=0x2697e40 "masahiko") at
> 

Re: [HACKERS] Walsender crash

2017-02-21 Thread Fujii Masao
On Mon, Feb 13, 2017 at 8:55 PM, Stas Kelvich  wrote:
> Hello.
>
> While playing with logical publications/subscriptions I’ve noted that 
> walsender crashes
> when I’m trying to CREATE SUBSCRIPTION to the current server.
>
> So having one running postgres instance on a standard port:
>
> stas=# CREATE SUBSCRIPTION mysub CONNECTION 'dbname=stas host=127.0.0.1' 
> PUBLICATION mypub;
> ^CCancel request sent
> ERROR:  canceling statement due to user request
> stas=# CREATE SUBSCRIPTION mysub CONNECTION 'dbname=stas host=127.0.0.1' 
> PUBLICATION mypub;
> WARNING:  terminating connection because of crash of another server process
>
> Crash happens due to uninitialised StringBuffer reply_message in walsender.c:
>
> (lldb) bt
> * thread #1: tid = 0x13e3f, 0x00010e74ebaf 
> postgres`resetStringInfo(str=0x00010ecd12a0) + 15 at stringinfo.c:96, 
> queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, 
> address=0x0)
>   * frame #0: 0x00010e74ebaf 
> postgres`resetStringInfo(str=0x00010ecd12a0) + 15 at stringinfo.c:96
> frame #1: 0x00010e8778c9 postgres`ProcessRepliesIfAny + 169 at 
> walsender.c:1347
> frame #2: 0x00010e8770fd postgres`WalSndWaitForWal(loc=35929928) + 
> 221 at walsender.c:1142
> frame #3: 0x00010e876cf2 
> postgres`logical_read_xlog_page(state=0x7fbcac0446f8, 
> targetPagePtr=35921920, reqLen=8008, targetRecPtr=35929904, cur_page="\x95?", 
> pageTLI=0x7fbcac044fa4) + 50 at walsender.c:717
> frame #4: 0x00010e571e5e 
> postgres`ReadPageInternal(state=0x7fbcac0446f8, pageptr=35921920, 
> reqLen=8008) + 542 at xlogreader.c:556
> frame #5: 0x00010e571336 
> postgres`XLogReadRecord(state=0x7fbcac0446f8, RecPtr=35929904, 
> errormsg=0x7fff5178f1e0) + 326 at xlogreader.c:255
> frame #6: 0x00010e85ea9b 
> postgres`DecodingContextFindStartpoint(ctx=0x7fbcac044438) + 139 at 
> logical.c:432
> frame #7: 0x00010e874dfd 
> postgres`CreateReplicationSlot(cmd=0x7fbcad8004d0) + 333 at 
> walsender.c:796
> frame #8: 0x00010e8747a4 
> postgres`exec_replication_command(cmd_string="CREATE_REPLICATION_SLOT 
> \"mysub\" LOGICAL pgoutput") + 532 at walsender.c:1272
> frame #9: 0x00010e8e6adc postgres`PostgresMain(argc=1, 
> argv=0x7fbcad005f50, dbname="stas", username="stas") + 2332 at 
> postgres.c:4064
> frame #10: 0x00010e839abe 
> postgres`BackendRun(port=0x7fbcabc033a0) + 654 at postmaster.c:4310
> frame #11: 0x00010e838eb3 
> postgres`BackendStartup(port=0x7fbcabc033a0) + 483 at postmaster.c:3982
> frame #12: 0x00010e837ea5 postgres`ServerLoop + 597 at 
> postmaster.c:1722
> frame #13: 0x00010e8356b5 postgres`PostmasterMain(argc=3, 
> argv=0x7fbcabc02b90) + 5397 at postmaster.c:1330
> frame #14: 0x00010e76371f postgres`main(argc=3, 
> argv=0x7fbcabc02b90) + 751 at main.c:228
> frame #15: 0x7fffa951c255 libdyld.dylib`start + 1
> frame #16: 0x7fffa951c255 libdyld.dylib`start + 1
>
> Patch with lacking initStringInfo() attached.

Thanks for the report and patch!
I pushed the modified version of the patch.
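
For reference, the crash boils down to the StringInfo contract: the buffer
must be allocated with initStringInfo() before resetStringInfo() may touch
it.  A minimal sketch of the pattern (placement simplified relative to the
actual walsender code):

static StringInfoData reply_message;

initStringInfo(&reply_message);   /* palloc the buffer first */
resetStringInfo(&reply_message);  /* now safe; crashed when this was skipped */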

Regards,

-- 
Fujii Masao




Re: [HACKERS] pg_monitor role

2017-02-21 Thread Masahiko Sawada
On Mon, Feb 20, 2017 at 8:48 PM, Dave Page  wrote:
> Further to the patch I just submitted
> (https://www.postgresql.org/message-id/CA%2BOCxow-X%3DD2fWdKy%2BHP%2BvQ1LtrgbsYQ%3DCshzZBqyFT5jOYrFw%40mail.gmail.com)
> I'd like to propose the addition of a default role, pg_monitor.
>
> The intent is to make it easy for users to set up a role for fully
> monitoring their servers, without requiring superuser-level privileges,
> which is a problem for many users working within strict security
> policies.
>
> At present, functions or system config info that divulge any
> installation path related info typically require superuser privileges.
> This makes monitoring for unexpected changes in configuration or
> filesystem level monitoring (e.g. checking for large numbers of WAL
> files or log file info) impossible for non-privileged roles.
>
> A similar example is the restriction on the pg_stat_activity.query
> column, which prevents non-superusers seeing any query strings other
> than their own.
>
> Using ACLs is a problem for a number of reasons:
>
> - Users often don't like their database schemas to be modified
> (cluttered with GRANTs).
> - ACL modifications would potentially have to be made in every
> database in a cluster.
> - Using a pre-defined role minimises the setup that different tools
> would have to require.
> - Not all functionality has an ACL (e.g. SHOW)
>
> Other DBMSs solve this problem in a similar way.
>
> Initially I would propose that permission be granted to the role to:
>
> - Execute pg_ls_logdir() and pg_ls_waldir()
> - Read pg_stat_activity, including the query column for all queries.
> - Allow "SELECT pg_tablespace_size('pg_global')"
> - Read all GUCs
>

Thank you for working on this.

What about granting the role the ability to read other statistics views
such as pg_stat_replication and pg_stat_wal_receiver? Since this
information can only be seen by superusers, monitoring and clustering
tools, for example, face the same concern.
And what about diagnostic tools such as pageinspect and pgstattuple?
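
For context, the intended usage once such a role exists would presumably be
as simple as this sketch (role name invented):

CREATE ROLE monitoring LOGIN;
GRANT pg_monitor TO monitoring;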

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] DROP SUBSCRIPTION and ROLLBACK

2017-02-21 Thread Masahiko Sawada
On Wed, Feb 22, 2017 at 1:52 AM, Fujii Masao  wrote:
> On Tue, Feb 21, 2017 at 7:52 PM, Masahiko Sawada  
> wrote:
>> On Tue, Feb 21, 2017 at 3:42 AM, Fujii Masao  wrote:
>>> On Thu, Feb 16, 2017 at 11:40 AM, Masahiko Sawada  
>>> wrote:
 On Thu, Feb 16, 2017 at 7:52 AM, Petr Jelinek
  wrote:
> On 15/02/17 06:43, Masahiko Sawada wrote:
>> On Tue, Feb 14, 2017 at 1:13 AM, Fujii Masao  
>> wrote:
>>> On Mon, Feb 13, 2017 at 4:05 PM, Masahiko Sawada 
>>>  wrote:
 On Sat, Feb 11, 2017 at 8:18 PM, Petr Jelinek
  wrote:
> On 10/02/17 19:55, Masahiko Sawada wrote:
>> On Thu, Feb 9, 2017 at 12:44 AM, Petr Jelinek
>>  wrote:
>>> On 08/02/17 07:40, Masahiko Sawada wrote:
 On Wed, Feb 8, 2017 at 9:01 AM, Michael Paquier
  wrote:
> On Wed, Feb 8, 2017 at 1:30 AM, Fujii Masao 
>  wrote:
>> On Wed, Feb 8, 2017 at 12:26 AM, Petr Jelinek
>>  wrote:
>>> For example what happens if apply crashes during the DROP
>>> SUBSCRIPTION/COMMIT and is not started because the delete from 
>>> catalog
>>> is now visible so the subscription is no longer there?
>>
>> Another idea is to treat DROP SUBSCRIPTION in the same way as 
>> VACUUM, i.e.,
>> make it emit an error if it's executed within user's transaction 
>> block.
>
> It seems to me that this is exactly Petr's point: using
> PreventTransactionChain() to prevent things to happen.

 Agreed. It's better to prevent it from being executed inside a user
 transaction block. And I understood there are too many failure scenarios
 we need to handle.

>> Also DROP SUBSCRIPTION should call CommitTransactionCommand() 
>> just
>> after removing the entry from pg_subscription, then connect to 
>> the publisher
>> and remove the replication slot.
>
> For consistency that may be important.

 Agreed.

 Attached patch, please give me feedback.

>>>
>>> This looks good (and similar to what initial patch had btw). Works 
>>> fine
>>> for me as well.
>>>
>>> Remaining issue is, what to do about CREATE SUBSCRIPTION then, 
>>> there are
>>> similar failure scenarios there, should we prevent it from running
>>> inside transaction as well?
>>>
>>
>> Hmm, on second thought I have doubts about the approach currently being
>> discussed. For example, please imagine the case where CREATE SUBSCRIPTION
>> creates the subscription successfully but fails to create the replication
>> slot for whatever reason, and then DROP SUBSCRIPTION drops the
>> subscription but dropping the replication slot fails. In such a case,
>> CREATE SUBSCRIPTION and DROP SUBSCRIPTION return ERROR but the
>> subscription is created and dropped successfully. I think that this
>> behaviour confuses the user.
>>
>> I think we should just prevent calling DROP SUBSCRIPTION in a user's
>> transaction block. Or I guess it could be better to separate
>> starting/stopping logical replication from subscription management.
>>
>
> We need to stop the replication worker(s) in order to be able to drop
> the slot. There is no such issue with startup of the worker, as that
> one is launched by the launcher after the transaction has committed.
>
> IMO the best option is to just not allow DROP/CREATE SUBSCRIPTION inside a
> transaction block and not do any commits inside of those (so that there
> are no rollbacks, which solves your initial issue, I believe). That way a
> failure to create/drop the slot will result in the subscription not being
> created/dropped, which is what we want.
>>>
>>> On second thought, +1.
>>>
 I basically agree with this option, but why do we need to change CREATE
 SUBSCRIPTION as well?
>>>
>>> Because the window between the creation of replication slot and the 
>>> transaction
>>> commit of CREATE SUBSCRIPTION should be short. Otherwise, if any error 
>>> happens
>>> during that window, the replication slot unexpectedly remains while 
>>> there is no
>>> corresponding subscription. Of course, even if we prevent CREATE
>>> SUBSCRIPTION from being executed within a user's transaction block, there
>>> is still such a window. But we can reduce the possibility of that problem.

Re: [HACKERS] drop support for Python 2.3

2017-02-21 Thread Peter Eisentraut
On 2/21/17 10:18, Tom Lane wrote:
> Based on some not-fun I had along the way to that, I think it would be
> a good idea to do the Python version check a bit earlier than you have
> here.  The shlib search in PGAC_CHECK_PYTHON_EMBED_SETUP is pretty fragile
> and version-dependent, which means that a person could waste a lot of time
> trying to get past the "could not find shared library for Python" error
> before being told that their Python is too old.  I propose to move the
> minimum-version check into _PGAC_CHECK_PYTHON_DIRS, say right after the
> [Python configuration directory] stanza.  Unless there was some specific
> reason for not wanting it to happen in python.m4?

It's fine to move it.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] safer node casting

2017-02-21 Thread Peter Eisentraut
On 1/26/17 16:15, Andres Freund wrote:
> On 2017-01-25 19:21:40 -0500, Tom Lane wrote:
>> Andres Freund  writes:
>>> On 2016-12-31 12:08:22 -0500, Peter Eisentraut wrote:
 RestrictInfo *rinfo = castNode(RestrictInfo, lfirst(lc));
>>
>>> Are you planning to add this / update this patch? Because I really would
>>> have liked this a number of times already...  I can update it according
>>> to my suggestions (to avoid multiple eval scenarios) if helpful
>>
>> Yeah, I'd like that in sooner rather than later, too.  But we do need
>> it to be foolproof - no multiple evals.  The first draft had
>> inadequate-parenthesization hazards,
> 
> How about something like the attached? The first patch just contains
> castNode(), the second one a rebased version of Peter's changes (with
> some long lines broken up).

Thanks for finishing that up.  I have committed more uses that I had
lying around partially done.  Looks nice now.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DROP SUBSCRIPTION and ROLLBACK

2017-02-21 Thread Fujii Masao
On Tue, Feb 21, 2017 at 7:52 PM, Masahiko Sawada  wrote:
> On Tue, Feb 21, 2017 at 3:42 AM, Fujii Masao  wrote:
>> On Thu, Feb 16, 2017 at 11:40 AM, Masahiko Sawada  
>> wrote:
>>> On Thu, Feb 16, 2017 at 7:52 AM, Petr Jelinek
>>>  wrote:
 On 15/02/17 06:43, Masahiko Sawada wrote:
> On Tue, Feb 14, 2017 at 1:13 AM, Fujii Masao  
> wrote:
>> On Mon, Feb 13, 2017 at 4:05 PM, Masahiko Sawada  
>> wrote:
>>> On Sat, Feb 11, 2017 at 8:18 PM, Petr Jelinek
>>>  wrote:
 On 10/02/17 19:55, Masahiko Sawada wrote:
> On Thu, Feb 9, 2017 at 12:44 AM, Petr Jelinek
>  wrote:
>> On 08/02/17 07:40, Masahiko Sawada wrote:
>>> On Wed, Feb 8, 2017 at 9:01 AM, Michael Paquier
>>>  wrote:
 On Wed, Feb 8, 2017 at 1:30 AM, Fujii Masao 
  wrote:
> On Wed, Feb 8, 2017 at 12:26 AM, Petr Jelinek
>  wrote:
>> For example what happens if apply crashes during the DROP
>> SUBSCRIPTION/COMMIT and is not started because the delete from 
>> catalog
>> is now visible so the subscription is no longer there?
>
> Another idea is to treat DROP SUBSCRIPTION in the same way as 
> VACUUM, i.e.,
> make it emit an error if it's executed within user's transaction 
> block.

 It seems to me that this is exactly Petr's point: using
 PreventTransactionChain() to prevent things to happen.
>>>
>>> Agreed. It's better to prevent it from being executed inside a user
>>> transaction block. And I understood there are too many failure scenarios
>>> we need to handle.
>>>
> Also DROP SUBSCRIPTION should call CommitTransactionCommand() just
> after removing the entry from pg_subscription, then connect to 
> the publisher
> and remove the replication slot.

 For consistency that may be important.
>>>
>>> Agreed.
>>>
>>> Attached patch, please give me feedback.
>>>
>>
>> This looks good (and similar to what initial patch had btw). Works 
>> fine
>> for me as well.
>>
>> Remaining issue is, what to do about CREATE SUBSCRIPTION then, there 
>> are
>> similar failure scenarios there, should we prevent it from running
>> inside transaction as well?
>>
>
> Hmm, on further thought I have doubts about the approach currently under
> discussion. For example, please imagine the case where CREATE SUBSCRIPTION
> creates the subscription successfully but fails to create the replication
> slot for whatever reason, and then DROP SUBSCRIPTION drops the subscription
> but dropping the replication slot fails. In such a case, CREATE SUBSCRIPTION
> and DROP SUBSCRIPTION return ERROR but the subscription is created and
> dropped successfully. I think that this behaviour confuses the user.
>
> I think we should just prevent calling DROP SUBSCRIPTION in user's
> transaction block. Or I guess that it could be better to separate the
> starting/stopping logical replication from subscription management.
>

 We need to stop the replication worker(s) in order to be able to drop
 the slot. There is no such issue with startup of the worker as that one
 is launched by launcher after the transaction has committed.

 IMO the best option is to just not allow DROP/CREATE SUBSCRIPTION inside a
 transaction block and not do any commits inside of those (so that there
 are no rollbacks, which solves your initial issue, I believe). That way a
 failure to create/drop the slot will result in the subscription not being
 created/dropped, which is what we want.
>>
>> On second thought, +1.
>>
>>> I basically agree with this option, but why do we need to change CREATE
>>> SUBSCRIPTION as well?
>>
>> Because the window between the creation of replication slot and the 
>> transaction
>> commit of CREATE SUBSCRIPTION should be short. Otherwise, if any error 
>> happens
>> during that window, the replication slot unexpectedly remains while 
>> there is no
>> corresponding subscription. Of course, even if we prevent CREATE
>> SUBSCRIPTION from being executed within a user's transaction block, there
>> is still such a window. But we can reduce the possibility of that problem.
>
> Thank you for the explanation. I understood and agree.
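
For concreteness, the behavior being converged on above would look like this
(a sketch only; "mysub" is a placeholder subscription name and the error text
is illustrative, not the committed wording):

    BEGIN;
    DROP SUBSCRIPTION mysub;
    -- ERROR:  DROP SUBSCRIPTION cannot run inside a transaction block
    ROLLBACK;

    -- outside a transaction block the command runs: it stops the workers,
    -- removes the catalog entry, and drops the remote slot in one step
    DROP SUBSCRIPTION mysub;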

Re: [HACKERS] Documentation improvements for partitioning

2017-02-21 Thread Yugo Nagata
Hi,

There is a very small typo that a comma is missing.
Attached is the patch to fix it.

On Wed, 15 Feb 2017 07:57:53 -0500
Robert Haas  wrote:

> On Wed, Feb 15, 2017 at 4:26 AM, Ashutosh Bapat
>  wrote:
> > Noticed some typos in the documentation. Here's patch to correct
> > those. Sorry, if it has been already taken care of.
> 
> Thanks.  That is indeed nonstandard capitalization.  Committed.
> 
> -- 
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
> 
> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Yugo Nagata 
diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 5779eac..ef0f7cf 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -2899,7 +2899,7 @@ VALUES ('Albany', NULL, NULL, 'NY');
 
  
   
-   Since primary keys are not supported on partitioned tables
+   Since primary keys are not supported on partitioned tables,
foreign keys referencing partitioned tables are not supported, nor
are foreign key references from a partitioned table to some other table.
   

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bold itemized list entries

2017-02-21 Thread Magnus Hagander
On Feb 13, 2017 8:45 AM, "Ashutosh Bapat" 
wrote:

Hi
Bulleted lists in the devel branch are all showing up bold, as seen in the
attached screenshot of [1]. But the same page for the 9.6 branch shows
normal bulleted lists. If I try to build the docs from the latest
sources (at commit ae0e550ce1703621efb3b75f4558df253cef5bca), the
HTML renders the bulleted items normally. Does anybody else notice this?

Sorry if there was some previous report of this and the issue has
been fixed but the HTML pages on postgresql.org haven't been refreshed.


Hi!

Did this one get fixed with the changes that went on a couple of days ago?

/Magnus


Re: [HACKERS] SUBSCRIPTION command hangs up; canceling the execution causes a segfault in the server process

2017-02-21 Thread Masahiko Sawada
On Tue, Feb 21, 2017 at 7:11 AM, nuko yokohama  wrote:
> Hi.
> While verifying the logical replication of PostgreSQL 10-devel, I found the
> following problems.
>
> * When you run the SUBSCRIPTION command against a table in the same
> PostgreSQL server, it hangs.
> * Canceling the hung SUBSCRIPTION command with CTRL+C causes a segfault
> in a server process, and the PostgreSQL server restarts.
>
> 
> [Steps to Reproduce]
>
> $ createdb db1 -U postgres
> $ createdb db2 -U postgres
> $ psql -U postgres db1 -c "CREATE TABLE test (id int primary key, data
> text)"
> $ psql -U postgres db2 -c "CREATE TABLE test (id int primary key, data
> text)"
> $ psql -U postgres db1 -c "CREATE PUBLICATION pub_db1_test FOR TABLE test"
> $ psql -U postgres db2 -c "CREATE SUBSCRIPTION sub_db2_test CONNECTION
> 'dbname=db1 port=5432 user=postgres' PUBLICATION pub_db1_test"
>
> The SUBSCRIPTION command does not end, it hangs up.
> At this time, the following logs are output in the server log.
>
> 2017-02-21 06:58:05.082 JST [22060] LOG:  logical decoding found initial
> starting point at 0/1C06660
> 2017-02-21 06:58:05.082 JST [22060] DETAIL:  1 transaction needs to finish.
>
> Suspending psql (running SUBSCRIPTION) with CTRL + C causes a Segfault in
> the server process.
> At this time, the following message is output to the server log.
>
> 2017-02-21 07:01:00.246 JST [22059] ERROR:  canceling statement due to user
> request
> 2017-02-21 07:01:00.246 JST [22059] STATEMENT:  CREATE SUBSCRIPTION
> sub_db2_test CONNECTION 'dbname=db1 port=5432 user=postgres' PUBLICATION
> pub_db1_test
> 2017-02-21 07:01:01.006 JST [21445] LOG:  server process (PID 22060) was
> terminated by signal 11: Segmentation fault
> 2017-02-21 07:01:01.007 JST [21445] LOG:  terminating any other active
> server processes
>
> 
> [Environment]
>
> postgres=# SELECT version();
>   version
> 
>  PostgreSQL 10devel on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.2
> 20140120 (Red Hat 4.8.2-16), 64-bit
> (1 row)
>
> postgres=# SHOW wal_level;
>  wal_level
> ---
>  logical
> (1 row)
>
> postgres=# SHOW max_wal_senders;;
>  max_wal_senders
> -
>  10
> (1 row)
>
> postgres=# SHOW max_replication_slots;
>  max_replication_slots
> ---
>  10
> (1 row)
>
> postgres=# TABLE pg_hba_file_rules ;
>  line_number | type  |   database| user_name  |  address  |
> netmask | auth_method | options | error
> -+---+---++---+-+-+-+---
>2 | local | {all} | {all}  |   |
> | trust   | |
>4 | host  | {all} | {all}  | 127.0.0.1 |
> 255.255.255.255 | trust   | |
>6 | host  | {all} | {all}  | ::1   |
> ::::::: | trust   | |
>9 | local | {replication} | {postgres} |   |
> | trust   | |
> (4 rows)
>
>

As the log message says, the reason why CREATE SUBSCRIPTION hangs is
that it's waiting for another transaction to finish.
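
While it waits, the open transactions it is waiting for can be checked from
another session (a sketch; in this reproducer the blocker appears to be the
CREATE SUBSCRIPTION command's own transaction, which is why pointing a
subscription at its own server deadlocks):

    SELECT pid, backend_xid, backend_xmin, state, query
    FROM pg_stat_activity
    WHERE backend_xid IS NOT NULL OR backend_xmin IS NOT NULL;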

Regarding the SEGV, it occurs in the wal sender process while creating the
replication slot. When the SEGV occurs, the backtrace is as follows.

(gdb) bt
#0  0x006e3d47 in resetStringInfo (str=0xea1c60) at stringinfo.c:96
#1  0x007e4738 in ProcessRepliesIfAny () at walsender.c:1347
#2  0x007e42c5 in WalSndWaitForWal (loc=23400264) at walsender.c:1142
#3  0x007e38f6 in logical_read_xlog_page (state=0x2732008,
targetPagePtr=23396352, reqLen=3912, targetRecPtr=23400240,
cur_page=0x2733d58 "\225\320\005", pageTLI=0x27328b4) at
walsender.c:717
#4  0x005458c8 in ReadPageInternal (state=0x2732008,
pageptr=23396352, reqLen=3912) at xlogreader.c:556
#5  0x00545068 in XLogReadRecord (state=0x2732008,
RecPtr=23400240, errormsg=0x7fff7cf34cf8) at xlogreader.c:255
#6  0x007d09db in DecodingContextFindStartpoint
(ctx=0x2731d48) at logical.c:432
#7  0x007e3a87 in CreateReplicationSlot (cmd=0x26edfa0) at
walsender.c:796
#8  0x007e45ae in exec_replication_command
(cmd_string=0x268b5f8 "CREATE_REPLICATION_SLOT \"sub_db2_test\"
LOGICAL pgoutput") at walsender.c:1272
#9  0x00846783 in PostgresMain (argc=1, argv=0x2697f38,
dbname=0x2697e68 "db1", username=0x2697e40 "masahiko") at
postgres.c:4064
#10 0x007b34e0 in BackendRun (port=0x268f4b0) at postmaster.c:4310
#11 0x007b2c60 in BackendStartup (port=0x268f4b0) at postmaster.c:3982
#12 0x007af2e0 in ServerLoop () at postmaster.c:1722
#13 0x007ae9a4 in PostmasterMain (argc=5, argv=0x266c020) at
postmaster.c:1330
#14 0x006f487a in main (argc=5, argv=0x266c020) at main.c:228

I 

[HACKERS] pg_upgrade loses security labels and COMMENTs on blobs

2017-02-21 Thread Stephen Frost
Greetings,

When pg_upgrade calls pg_dump, it passes in "--schema-only", which is
generally correct, except that this causes everything having to do with
large objects to be excluded.  That's still usually correct, because
pg_upgrade will simply copy the pg_largeobject and
pg_largeobject_metadata tables through to the new cluster as-is
(essentially treating them as if they were user tables).

Unfortunately, those tables aren't, actually, the only places that we
store information about large objects; the general-purpose tables like
pg_seclabel and pg_description can hold information about large objects
too.

What this means is that performing a pg_upgrade will result in any
security labels or comments on large objects being dropped.  This
seems to go back at least as far as 9.2, though I found it through the
pg_dump regression testing that I've been working on.
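
A minimal way to see the loss (a sketch; the OID and comment are arbitrary):

    -- before running pg_upgrade, in the old cluster:
    SELECT lo_create(4242);
    COMMENT ON LARGE OBJECT 4242 IS 'this comment is lost by pg_upgrade';

    -- after the upgrade, in the new cluster, this returns NULL:
    SELECT obj_description(4242, 'pg_largeobject');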

I haven't looked at trying to fix this yet, but I'm thinking the
approach to use will probably be to modify pg_dump to still call
getBlobs() when in binary-upgrade mode (regardless of the schema-only
flag) but then have dumpBlobs(), when in binary-upgrade mode, only
output the security labels and comments.  I hope that doesn't end up
causing some kind of chicken-and-egg problem..  Presumably the large
object tables are in place and correct before the dump is restored, so I
think this will work.

Just wanted to get a note out to -hackers about the issue, I'll see
about getting a fix written up for it soon.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] drop support for Python 2.3

2017-02-21 Thread Tom Lane
Peter Eisentraut  writes:
> On 2/19/17 23:33, Devrim Gündüz wrote:
>> Thanks! Looks like buildfarm is green again.

> Thank.  I have committed the patch to drop Python 2.3 support.

I spent some time last night building a (rather makeshift) python 2.4.1
installation on prairiedog, which I will now switch it to use in HEAD,
but keeping it pointed at 2.3 for 9.6 and older.

Based on some not-fun I had along the way to that, I think it would be
a good idea to do the Python version check a bit earlier than you have
here.  The shlib search in PGAC_CHECK_PYTHON_EMBED_SETUP is pretty fragile
and version-dependent, which means that a person could waste a lot of time
trying to get past the "could not find shared library for Python" error
before being told that their Python is too old.  I propose to move the
minimum-version check into _PGAC_CHECK_PYTHON_DIRS, say right after the
[Python configuration directory] stanza.  Unless there was some specific
reason for not wanting it to happen in python.m4?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GUC for cleanup indexes threshold.

2017-02-21 Thread Amit Kapila
On Tue, Feb 21, 2017 at 1:09 AM, Simon Riggs  wrote:
> On 20 February 2017 at 10:27, Amit Kapila  wrote:
>> On Mon, Feb 20, 2017 at 3:01 PM, Simon Riggs  wrote:
>>> On 20 February 2017 at 09:15, Amit Kapila  wrote:
 On Mon, Feb 20, 2017 at 7:26 AM, Masahiko Sawada  
 wrote:
> On Fri, Feb 17, 2017 at 3:41 AM, Robert Haas  
> wrote:
>> On Thu, Feb 16, 2017 at 6:17 AM, Simon Riggs  
>> wrote:
>>> On 15 February 2017 at 08:07, Masahiko Sawada  
>>> wrote:
 It's a bug. Attached latest version patch, which passed make check.
>>>
>>> 2. The current btree vacuum code requires 2 vacuums to fully reuse
>>> half-dead pages. So skipping an index vacuum might mean that second
>>> index scan never happens at all, which would be bad.
>>
>> Maybe.  If there are a tiny number of those half-dead pages in a huge
>> index, it probably doesn't matter.  Also, I don't think it would never
>> happen, unless the table just never gets any more updates or deletes -
>> but that case could also happen today.  It's just a matter of
>> happening less frequently.
>

 Yeah, that's right, and I am not sure it is worth performing a
 complete pass to reclaim dead/deleted pages unless we somehow know
 that there are many such pages.
>>>
>>> Agreed which is why
>>> On 16 February 2017 at 11:17, Simon Riggs  wrote:
 I suggest that we store the number of half-dead pages in the metapage
 after each VACUUM, so we can decide whether to skip the scan or not.
>>>
>>>
 Also, I think we do reclaim the
 complete page while allocating a new page in btree.
>>>
>>> That's not how it works according to the README at least.
>>>
>>
>> I am referring to code (_bt_getbuf()->if (_bt_page_recyclable(page))),
>> won't that help us in reclaiming the space?
>
> Not unless the README is incorrect, no.
>

Just to ensure that we both have the same understanding, let me try to
write what I understand about this reclaim algorithm.  AFAIU, in the
first pass vacuum will mark the half dead pages as Deleted and in the
second pass, it will record such pages as free in FSM so that they can
be reused as new pages when the indexam asked for a new block instead
of extending the index relation.  Now, if we introduce this new GUC,
then there are chances that sometimes we skip the second pass where it
would not have been skipped.

Note that we do perform the second pass in the same vacuum cycle when
index has not been scanned for deleting the tuples as per below code:

btvacuumcleanup()
{
..
if (stats == NULL)
{
stats = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
btvacuumscan(info, stats, NULL, NULL, 0);
..
}

In the above code, stats won't be NULL if the vacuum has scanned the index
to delete tuples (btbulkdelete).  So, based on this, I think it will skip
scanning the index (and recycling pages marked as deleted) in the second
vacuum only when there were no dead tuple removals in that vacuum.  Do we
agree up to here?
I understand that there could be some delay in reclaiming dead pages
but do you think it is such a big deal that we completely scan the
index for such cases or even try to change the metapage format?

> That section of code is just a retest of pages retrieved from FSM;
>

Yes, I think you are right.  In short, I agree that only vacuum can
reclaim half-dead pages.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] drop support for Python 2.3

2017-02-21 Thread Peter Eisentraut
On 2/19/17 23:33, Devrim Gündüz wrote:
> Thanks! Looks like buildfarm is green again.

Thank.  I have committed the patch to drop Python 2.3 support.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LWLock optimization for multicore Power machines

2017-02-21 Thread Bernd Helmle
Am Dienstag, den 14.02.2017, 15:53 +0300 schrieb Alexander Korotkov:
> +1
> And you could try to use pg_wait_sampling
> to sample wait events.

I've tried this with your example from your blog post[1] and got this:

(pgbench scale 1000)

pgbench -Mprepared -S -n -c 100 -j 100 -T 300 -P2 pgbench2

SELECT-only:

SELECT * FROM profile_log ;
 ts |  event_type   | event | count 
+---+---+---
 2017-02-21 15:21:52.45719  | LWLockNamed   | ProcArrayLock | 8
 2017-02-21 15:22:11.19594  | LWLockTranche | lock_manager  | 1
 2017-02-21 15:22:11.19594  | LWLockNamed   | ProcArrayLock |24
 2017-02-21 15:22:31.220803 | LWLockNamed   | ProcArrayLock | 1
 2017-02-21 15:23:01.255969 | LWLockNamed   | ProcArrayLock | 1
 2017-02-21 15:23:11.272254 | LWLockNamed   | ProcArrayLock | 2
 2017-02-21 15:23:41.313069 | LWLockNamed   | ProcArrayLock | 1
 2017-02-21 15:24:31.37512  | LWLockNamed   | ProcArrayLock |19
 2017-02-21 15:24:41.386974 | LWLockNamed   | ProcArrayLock | 1
 2017-02-21 15:26:41.530399 | LWLockNamed   | ProcArrayLock | 1
(10 rows)

Write pgbench runs have far more events logged; see the attached text
file. Maybe this is of interest...


[1] http://akorotkov.github.io/blog/2016/03/25/wait_monitoring_9_6/

             ts             |  event_type   |      event       | count 
+---+--+---
 2017-02-21 15:45:18.875842 | Lock  | tuple|18
 2017-02-21 15:45:18.875842 | LWLockTranche | lock_manager |35
 2017-02-21 15:45:18.875842 | LWLockTranche | buffer_mapping   | 1
 2017-02-21 15:45:18.875842 | LWLockTranche | buffer_content   |   402
 2017-02-21 15:45:18.875842 | LWLockTranche | wal_insert   |  1880
 2017-02-21 15:45:18.875842 | Lock  | transactionid|   716
 2017-02-21 15:45:18.875842 | LWLockNamed   | ProcArrayLock|   644
 2017-02-21 15:45:18.875842 | LWLockNamed   | CLogControlLock  |   255
 2017-02-21 15:45:18.875842 | LWLockNamed   | XidGenLock   |   104
 2017-02-21 15:45:18.875842 | Lock  | extend   |   647
 2017-02-21 15:45:28.889785 | Lock  | tuple|65
 2017-02-21 15:45:28.889785 | LWLockTranche | lock_manager |96
 2017-02-21 15:45:28.889785 | LWLockTranche | buffer_content   |   811
 2017-02-21 15:45:28.889785 | LWLockTranche | wal_insert   |  2542
 2017-02-21 15:45:28.889785 | Lock  | transactionid|  1686
 2017-02-21 15:45:28.889785 | LWLockNamed   | ProcArrayLock|  1194
 2017-02-21 15:45:28.889785 | LWLockNamed   | CLogControlLock  |  1244
 2017-02-21 15:45:28.889785 | LWLockNamed   | XidGenLock   |   401
 2017-02-21 15:45:28.889785 | Lock  | extend   |   818
 2017-02-21 15:45:38.904761 | Lock  | tuple|54
 2017-02-21 15:45:38.904761 | LWLockTranche | lock_manager |87
 2017-02-21 15:45:38.904761 | LWLockTranche | buffer_content   |   756
 2017-02-21 15:45:38.904761 | LWLockTranche | wal_insert   |  2161
 2017-02-21 15:45:38.904761 | Lock  | transactionid|  1624
 2017-02-21 15:45:38.904761 | LWLockNamed   | ProcArrayLock|  1154
 2017-02-21 15:45:38.904761 | LWLockNamed   | CLogControlLock  |  1418
 2017-02-21 15:45:38.904761 | LWLockNamed   | XidGenLock   |   450
 2017-02-21 15:45:38.904761 | Lock  | extend   |   650
 2017-02-21 15:45:48.917774 | Lock  | tuple|51
 2017-02-21 15:45:48.917774 | LWLockTranche | lock_manager |85
 2017-02-21 15:45:48.917774 | LWLockTranche | buffer_content   |   947
 2017-02-21 15:45:48.917774 | LWLockTranche | wal_insert   |  2050
 2017-02-21 15:45:48.917774 | Lock  | transactionid|  1625
 2017-02-21 15:45:48.917774 | LWLockNamed   | ProcArrayLock|  1094
 2017-02-21 15:45:48.917774 | LWLockNamed   | CLogControlLock  |  2575
 2017-02-21 15:45:48.917774 | LWLockNamed   | XidGenLock   |   372
 2017-02-21 15:45:48.917774 | Lock  | extend   |   528
 2017-02-21 15:45:58.930732 | Lock  | tuple|63
 2017-02-21 15:45:58.930732 | LWLockTranche | lock_manager |86
 2017-02-21 15:45:58.930732 | LWLockTranche | buffer_content   |  1026
 2017-02-21 15:45:58.930732 | LWLockTranche | wal_insert   |  1543
 2017-02-21 15:45:58.930732 | Lock  | transactionid|  1794
 2017-02-21 15:45:58.930732 | LWLockNamed   | ProcArrayLock|  1154
 2017-02-21 15:45:58.930732 | LWLockNamed   | CLogControlLock  |  3850
 2017-02-21 15:45:58.930732 | LWLockNamed   | XidGenLock   |   315
 2017-02-21 15:45:58.930732 | Lock  | extend   |   

Re: [HACKERS] pg_monitor role

2017-02-21 Thread Thomas Reiss
Le 20/02/2017 à 12:48, Dave Page a écrit :
> Further to the patch I just submitted
> (https://www.postgresql.org/message-id/CA%2BOCxow-X%3DD2fWdKy%2BHP%2BvQ1LtrgbsYQ%3DCshzZBqyFT5jOYrFw%40mail.gmail.com)
> I'd like to propose the addition of a default role, pg_monitor.
> 
> The intent is to make it easy for users to setup a role for fully
> monitoring their servers, without requiring superuser level privileges
> which is a problem for many users working within strict security
> policies.
> 
> At present, functions or system config info that divulge any
> installation path related info typically require superuser privileges.
> This makes monitoring for unexpected changes in configuration or
> filesystem level monitoring (e.g. checking for large numbers of WAL
> files or log file info) impossible for non-privileged roles.
> 
> A similar example is the restriction on the pg_stat_activity.query
> column, which prevents non-superusers seeing any query strings other
> than their own.
> 
> Using ACLs is a problem for a number of reasons:
> 
> - Users often don't like their database schemas to be modified
> (cluttered with GRANTs).
> - ACL modifications would potentially have to be made in every
> database in a cluster.
> - Using a pre-defined role minimises the setup that different tools
> would have to require.
> - Not all functionality has an ACL (e.g. SHOW)
> 
> Other DBMSs solve this problem in a similar way.
> 
> Initially I would propose that permission be granted to the role to:
> 
> - Execute pg_ls_logdir() and pg_ls_waldir()
> - Read pg_stat_activity, including the query column for all queries.
> - Allow "SELECT pg_tablespace_size('pg_global')"
> - Read all GUCs
> 
> In the future I would also like to see us add additional roles for
> system administration functions, for example, a backup operator role
> that would have the appropriate rights to make and restore backups.
> 
> Comments?

Hello,

That's something really useful. Some customers would like to use a
non-privileged user to connect their monitoring tools.

I've come up with a set of hacks to give such features to a particular
customer, but that remains a hack. It only works if the monitoring
tool does not explicitly prefix each view or function with the schema
pg_catalog.
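
The usual shape of such a hack is a SECURITY DEFINER wrapper plus a shadowing
view (a sketch; "monitor" and "nagios" are placeholder names):

    -- as a superuser:
    CREATE SCHEMA monitor;

    CREATE FUNCTION monitor.pg_stat_activity()
    RETURNS SETOF pg_catalog.pg_stat_activity
    LANGUAGE sql SECURITY DEFINER
    AS 'SELECT * FROM pg_catalog.pg_stat_activity';

    CREATE VIEW monitor.pg_stat_activity AS
        SELECT * FROM monitor.pg_stat_activity();

    GRANT USAGE ON SCHEMA monitor TO nagios;
    GRANT SELECT ON monitor.pg_stat_activity TO nagios;
    ALTER ROLE nagios SET search_path = monitor, pg_catalog;

Listing pg_catalog explicitly after monitor is what makes the shadowing work,
and also why a tool that writes pg_catalog.pg_stat_activity defeats it.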

I'm really looking forward to such a feature. Let me know if I can help in
some way.

Regards


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Provide list of subscriptions and publications in psql's completion

2017-02-21 Thread Petr Jelinek
On 21/02/17 07:50, Michael Paquier wrote:
> On Tue, Feb 21, 2017 at 12:48 AM, Stephen Frost  wrote:
>> * Fujii Masao (masao.fu...@gmail.com) wrote:
>>> On Fri, Feb 17, 2017 at 11:17 PM, Peter Eisentraut wrote:
>>> 1) Expose all the columns except subconninfo in pg_subscription to
>>> non-superusers. In this idea, the tab-completion and psql meta-command
>>> for subscription still sees pg_subscription. One good point of
>>> idea is that even non-superusers can see all the useful information
>>> about subscriptions other than sensitive information like conninfo.
>>>
>>> 2) Change pg_stat_subscription so that it also shows all the columns except
>>> subconninfo in pg_subscription. Also change the tab-completion and
>>> psql meta-command for subscription so that they see pg_stat_subscription
>>> instead of pg_subscription. One good point is that pg_stat_subscription
>>> exposes all the useful information about subscription to even
>>> non-superusers. OTOH, pg_subscription and pg_stat_subscription have
>>> to have several same columns. This would be redundant and a bit 
>>> confusing.
>>>
>>> 3) Expose subdbid in pg_stat_subscription. Also change the tab-completion
>>> and psql meta-command for subscription so that they see
>>> pg_stat_subscription. This change is very simple. But non-superusers 
>>> cannot
>>> see useful information like subslotname because of privilege problem.
>>>
>>> I like #1, but any better approach?
>>
>> #1 seems alright to me, at least.  We could start using column-level
>> privs in other places also, as I mentioned up-thread.
> 
> Among the three, this looks like a good one.
> 
>> I don't particularly like the suggestions to have psql use pg_stat_X
>> views or to put things into pg_stat_X views which are object definitions
>> for non-superusers.  If we really don't want to use column-level
>> privileges then we should have an appropriate view create instead which
>> provides non-superusers with the non-sensitive object information.
> 
> Neither do I, those are by definition tables for statistics. Having
> tab completion depend on them is not right.
> 
> So what do you think about the patch attached? This does the following:
> - complete subscription list for CREATE/ALTER SUBSCRIPTION
> - complete publication list for CREATE/ALTER PUBLICATION
> - complete to nothing for PUBLICATION in CREATE/ALTER SUBSCRIPTION as
> this refers to remote objects defined in subconninfo.
> - authorize read access to public for all columns of pg_subscription
> except subconninfo
> Thoughts?
> 

Maybe it would make sense to also have a pg_subscriptions view which
selects everything but subconninfo (or does some removal of the sensitive
info there, if we have a function for that)? The reason why I am thinking
about this is that it's much more convenient to do SELECT * than to
specify all the columns you have access to.
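
Something along these lines, perhaps (a sketch; the column list is
hypothetical and would need to track the real catalog definition):

    CREATE VIEW pg_catalog.pg_subscriptions AS
        SELECT oid, subdbid, subname, subowner, subenabled,
               subslotname, subpublications   -- everything except subconninfo
        FROM pg_catalog.pg_subscription;

    GRANT SELECT ON pg_catalog.pg_subscriptions TO PUBLIC;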

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partitioned tables and relfilenode

2017-02-21 Thread Ashutosh Bapat
Some comments about 0003 patch.

@@ -996,10 +996,20 @@ inheritance_planner(PlannerInfo *root)
Index   rti;
+   RangeTblEntry *parent_rte;
There's already another variable declared in that function within a loop
foreach(lc, root->append_rel_list)
{
...
RangeTblEntry *parent_rte;
RangeTblEntry *child_rte;

You might want to choose a different name or delete the one within the loop.

I am wondering whether we should deal with the inh flag reset in a
slightly different way. Let expand_inherited_rtentry() mark inh =
false for partitioned tables without any partitions and deal with
those at the time of estimating size by marking them as dummy. That
might be better than multiple changes here. I will try to cook up a
patch for that soon.

Also, we should add tests to make sure that scans on partitioned tables
without any partitions do not run into problems. PFA a patch which adds
those tests.
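
The case in question is simply this (a sketch of what the added tests
exercise; the table name is arbitrary):

    CREATE TABLE parted_no_parts (a int) PARTITION BY LIST (a);
    -- should plan (as an empty result) rather than error out:
    EXPLAIN (COSTS OFF) SELECT * FROM parted_no_parts;
    DROP TABLE parted_no_parts;
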
-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


0003-Always-plan-partitioned-tables-as-inheritance-sets_v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should we cacheline align PGXACT?

2017-02-21 Thread Ashutosh Sharma
On Tue, Feb 21, 2017 at 5:52 PM, Alexander Korotkov
 wrote:
> On Tue, Feb 21, 2017 at 2:37 PM, Andres Freund  wrote:
>>
>> On 2017-02-21 16:57:36 +0530, Ashutosh Sharma wrote:
>> > Yes, there is still some regression; however, it has come down by a
>> > small margin. I am not doing initdb for each run; instead I am doing
>> > dropdb-->createdb-->pgbench -i. Is dropping the old database and creating
>> > a new one for every run not okay? Do I have to do initdb every time?
>> > Okay, I can change the order of reading and let you know the results.
>>
>> That does make a difference, primarily because WAL writes in a new
>> cluster are more expensive than in an old one, because of segment
>> recycling.
>
>
> +1

okay, understood. Will take care of it when I start with next round of
testing. Thanks.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Measuring replay lag

2017-02-21 Thread Simon Riggs
On 17 February 2017 at 07:45, Thomas Munro
 wrote:
> On Fri, Feb 17, 2017 at 12:45 AM, Simon Riggs  wrote:
>> Feeling happier about this for now at least.
>
> Thanks!

And happier again, leading me to move to the next stage of review,
focusing on the behaviour emerging from the design.

So my current understanding is that this doesn't rely upon LSN
arithmetic to measure lag, which is good. That means logical
replication should "just work" and future mechanisms to filter
physical WAL will also just work. This is important, so please comment
if you see that isn't the case.

I notice that LagTrackerRead() doesn't do anything to interpolate the
time given, so at present any attempt to prune the lag sample buffer
would result in falsely increasing the lag times reported. Which is
probably the reason why you say "There may be better adaptive
compaction algorithms."  We need to look at this some more, an initial
guess might be that we need to insert fewer samples as the buffer
fills since the LagTrackerRead() algorithm is O(N) on number of
samples and thus increasing the buffer itself isn't a great plan.

It would be very nice to be able to say something like "the +/-
confidence limits of the lag are no more than 50% of the lag time", so
we have some idea of how accurate the value is at any point. We need
to document the accuracy of the result, otherwise we'll be answering
questions on that for some time. So let's think about that now.

Given LagTrackerRead() is reading the 3 positions in order, it seems
sensible to start reading the LAG_TRACKER_FLUSH_HEAD from the place
you finished reading LAG_TRACKER_WRITE_HEAD etc.. Otherwise we end up
doing way too much work with larger buffers.

Which makes me think about the read more. The present design
calculates everything on receipt of standby messages. I think we
should simply record the last few messages and do the lag calculation
when the values are later read, if indeed they are ever read. That
would allow us a much better diagnostic view, at least. And it would
allow you to show a) latest value, b) smoothed in various ways, or c)
detail of last few messages for diagnostics. The latest value would be
the default value in pg_stat_replication - I agree we shouldn't make
that overly wide, so we'd need another function to access the details.

What is critical is that we report stable values as lag increases.
i.e. we need to iron out any usage cases so we don't have to fix them
in PG11 and spend a year telling people "yeh, it does that" (like
we've been doing for some time). So the diagnostics will help us
investigate this patch over various use cases...

I think what we need to show some test results with the graph of lag
over time for these cases:
1. steady state - pgbench on master, so we can see how that responds
2. blocked apply on standby - so we can see how the lag increases but
also how the accuracy goes down as the lag increases and whether the
reported value changes (depending upon algo)
3. burst mode - where we go from not moving to moving at high speed
and then stop again quickly
+other use cases you or others add

Does the proposed algo work for these cases? What goes wrong with it?
It's the examination of these downsides, if any, are the things we
need to investigate now to allow this to get committed.
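
For the test runs, the value under discussion would be read with something
like this (a sketch; it assumes the patch ends up exposing interval-typed
lag columns on pg_stat_replication, which is one of the designs on the
table):

    SELECT application_name, write_lag, flush_lag, replay_lag
    FROM pg_stat_replication;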


Some minor points on code...
Why are things defined in walsender.c and not in .h?
Why is LAG_TRACKER_NUM_READ_HEADS not the same as NUM_SYNC_REP_WAIT_MODE?
...and other related constants shouldn't be redefined either.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LWLock optimization for multicore Power machines

2017-02-21 Thread Alexander Korotkov
On Tue, Feb 21, 2017 at 1:47 PM, Bernd Helmle  wrote:

> Am Samstag, den 11.02.2017, 15:42 +0300 schrieb Alexander Korotkov:
> > I think it would make sense to run more kinds of tests.  Could you
> > try set
> > of tests provided by Tomas Vondra?
> > If even we wouldn't see win some of the tests, it would be still
> > valuable
> > to see that there is no regression there.
>
> Sorry for the delay.
>
> But here are the results after having run Tomas' benchmarks scripts.
> The pgbench-ro benchmark shows a clear win of the lwlock-power-3 patch,
> also you see a slight advantage of this patch in the pgbench-rw graph.
> At least, i don't see any signifikant regression somewhere.
>
> The graphs are showing the average values over all five runs for each
> client count.
>

Thank you very much for testing, results look good.
I'll keep trying to access bigger Power machine where advantage of patch is
expected to be larger.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Should we cacheline align PGXACT?

2017-02-21 Thread Alexander Korotkov
On Tue, Feb 21, 2017 at 2:37 PM, Andres Freund  wrote:

> On 2017-02-21 16:57:36 +0530, Ashutosh Sharma wrote:
> > Yes, there is still some regression; however, it has come down by a
> > small margin. I am not doing initdb for each run; instead I am doing
> > dropdb-->createdb-->pgbench -i. Is dropping the old database and creating
> > a new one for every run not okay? Do I have to do initdb every time?
> > Okay, I can change the order of reading and let you know the results.
>
> That does make a difference, primarily because WAL writes in a new
> cluster are more expensive than in an old one, because of segment
> recycling.
>

+1

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-02-21 Thread Pavan Deolasee
Hi Tom,

On Wed, Feb 1, 2017 at 3:51 AM, Tom Lane  wrote:

>
> (I'm a little more concerned by Alvaro's apparent position that WARM
> is a done deal; I didn't think so.


Are there any specific aspects of the design that you're not comfortable
with? I'm sure there could be some rough edges in the implementation that
I'm hoping will get handled during the further review process. But if there
are some obvious things I'm overlooking please let me know.

Probably the same question goes to Andres/Robert, who have flagged concerns.
On my side, I've run some very long tests with data validation and haven't
found any new issues with the most recent patches.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Should we cacheline align PGXACT?

2017-02-21 Thread Andres Freund
Hi,

On 2017-02-21 16:57:36 +0530, Ashutosh Sharma wrote:
> Yes, there is still some regression; however, it has come down by a
> small margin. I am not doing initdb for each run; instead I am doing
> dropdb-->createdb-->pgbench -i. Is dropping the old database and creating
> a new one for every run not okay? Do I have to do initdb every time?
> Okay, I can change the order of reading and let you know the results.

That does make a difference, primarily because WAL writes in a new
cluster are more expensive than in an old one, because of segment
recycling.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should we cacheline align PGXACT?

2017-02-21 Thread Ashutosh Sharma
On Tue, Feb 21, 2017 at 4:21 PM, Alexander Korotkov
 wrote:
>
> Hi, Ashutosh!
> On Mon, Feb 20, 2017 at 1:20 PM, Ashutosh Sharma  
> wrote:
>>
>> Following are the pgbench results for read-write workload, I got with 
>> pgxact-align-3 patch. The results are for 300 scale factor with 8GB of 
>> shared buffer i.e. when data fits into the shared buffer. For 1000 scale 
>> factor with 8GB shared buffer the test is still running, once it is 
>> completed I will share the results for that as well.
>>
>> pgbench settings:
>> pgbench -i -s 300 postgres
>> pgbench -M prepared -c $thread -j $thread -T $time_for_reading  postgres
>>
>> where, time_for_reading = 30mins
>>
>> non default GUC param:
>> shared_buffers=8GB
>> max_connections=300
>>
>> pg_xlog is located in SSD.
>
>
> Thank you for testing.
> It seems that there is still a regression.  Given that the padding was
> reduced from 116 bytes to 4 bytes, it makes me think that there is probably
> something wrong in the testing methodology.
> Are you doing re-initdb and pgbench -i before each run?  I would ask you to
> do both.
> Also, if the regression still exists, let's change the order of the
> versions.  That is, if you ran master before the patched version, I would
> ask you to run the patched version before master.


Yes, there is still some regression; however, it has come down by a
small margin. I am not doing initdb for each run; instead I am doing
dropdb-->createdb-->pgbench -i. Is dropping the old database and creating
a new one for every run not okay? Do I have to do initdb every time?
Okay, I can change the order of reading and let you know the results.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dropping partitioned tables without CASCADE

2017-02-21 Thread Ashutosh Bapat
On Tue, Feb 21, 2017 at 12:05 PM, Amit Langote
 wrote:
> Hi Ashutosh,
>
> Thanks for taking a look at the patch.
>
> On 2017/02/20 21:49, Ashutosh Bapat wrote:
>> Thanks for working on all the follow on work for partitioning feature.
>>
>> May be you should add all those patches in the next commitfest, so
>> that we don't forget those.
>
> I think adding these as one of the PostgreSQL 10 Open Items [0] might be
> better.  I've done that.
>
>> On Mon, Feb 20, 2017 at 7:46 AM, Amit Langote wrote:
>>> So I count more than a few votes saying that we should be able to DROP
>>> partitioned tables without specifying CASCADE.
>>>
>>> I tried to implement that using the attached patch by having
>>> StoreCatalogInheritance1() create DEPENDENCY_AUTO dependency between
>>> parent and child if the child is a partition, instead of DEPENDENCY_NORMAL
>>> that would otherwise be created.  Now it seems that that is one way of
>>> making sure that partitions are dropped when the root partitioned table is
>>> dropped, not sure if the best; why create the pg_depend entries at all one
>>> might ask.  I chose it for now because that's the one with fewest lines of
>>> change.  Adjusted regression tests as well, since we recently tweaked
>>> tests [1] to work around the irregularities of test output when using 
>>> CASCADE.
>>
>> The patch applies cleanly and regression does not show any failures.
>>
>> Here are some comments
>>
>> For the sake of readability you may want to reverse the if and else order.
>> -recordDependencyOn(&childobject, &parentobject, DEPENDENCY_NORMAL);
>> +if (!child_is_partition)
>> +recordDependencyOn(&childobject, &parentobject, DEPENDENCY_NORMAL);
>> +else
>> +recordDependencyOn(&childobject, &parentobject, DEPENDENCY_AUTO);
>> like
>> +if (child_is_partition)
>> +recordDependencyOn(&childobject, &parentobject, DEPENDENCY_AUTO);
>> +else
>> +recordDependencyOn(&childobject, &parentobject, DEPENDENCY_NORMAL);
>
> Sure, done.

I still see
-recordDependencyOn(&childobject, &parentobject, DEPENDENCY_NORMAL);
+if (!child_is_partition)
+recordDependencyOn(&childobject, &parentobject, DEPENDENCY_NORMAL);
+else
+recordDependencyOn(&childobject, &parentobject, DEPENDENCY_AUTO);

Are you sure you have attached the right patch?

To avoid duplication you could actually write
recordDependencyOn(&childobject, &parentobject,
                   child_is_partition ?
                   DEPENDENCY_AUTO : DEPENDENCY_NORMAL);

>
>> It's weird that somebody can perform DROP TABLE on a partition without
>> referring to its parent. That may be a useful feature, as it allows one to
>> detach the partition as well as remove the table in one command. But it
>> looks weird for someone familiar with the partitioning features of other
>> DBMSes. But then our partition creation command is CREATE TABLE ...
>> PARTITION OF, so maybe this is an expected difference.
>
> There is a line on the CREATE TABLE page in the description of PARTITION
> OF clause:
>
> "Note that dropping a partition with DROP TABLE requires taking an ACCESS
> EXCLUSIVE lock on the parent table."
>
> In earlier proposals I had included ALTER TABLE parent ADD/DROP
> PARTITION commands, but CREATE TABLE PARTITION OF / DROP TABLE prevailed.

Ok.

>
>> --- cleanup: avoid using CASCADE
>> -DROP TABLE list_parted, part_1;
>> -DROP TABLE list_parted2, part_2, part_5, part_5_a;
>> -DROP TABLE range_parted, part1, part2;
>> +-- cleanup
>> +DROP TABLE list_parted, list_parted2, range_parted;
>> Testcases usually drop one table at a time, I guess, to reduce the 
>> differences
>> when we add or remove tables from testcases. All such blocks should probably
>> follow same policy.
>
> Hmm, I see this in src/test/regress/sql/inherit.sql:141
>
> DROP TABLE firstparent, secondparent, jointchild, thirdparent, otherchild;

Hmm, I can spot some more such usages. Let's keep this for the
committer to decide.

>
>>  drop table list_parted cascade;
>> -NOTICE:  drop cascades to 3 other objects
>> -DETAIL:  drop cascades to table part_ab_cd
>> probably we should remove cascade from there, unless you are testing CASCADE
>> functionality. Similarly for other blocks like
>>  drop table range_parted cascade;
>>
>> BTW, I noticed that although we are allowing foreign tables to be
>> partitions, there are no tests in foreign_data.sql for testing it. If
>> there had been, we would have tested DROP TABLE on a partitioned table
>> with foreign partitions as well. That file has testcases for testing
>> foreign table inheritance, and should have tests for foreign table
>> partitions.
>
> That makes sense.  Patch 0002 is for that (I'm afraid this should be
> posted separately though).  I didn't add/repeat all the tests that were
> added by the foreign table inheritance patch again for foreign partitions
> (common inheritance rules apply to both cases), only added those for the
> new partitioning commands and certain new rules.

Thanks. Yes, a separate thread would do. I will review it there. Maybe
you want to add it to the commitfest too.
-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
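
For reference, the user-visible effect of the DEPENDENCY_AUTO approach
discussed in this thread is (a sketch; names are arbitrary):

    CREATE TABLE parent (a int) PARTITION BY LIST (a);
    CREATE TABLE part1 PARTITION OF parent FOR VALUES IN (1);

    DROP TABLE parent;  -- no CASCADE needed; part1 is dropped automatically

    -- and, as discussed above, DROP TABLE on a partition alone still works,
    -- detaching and removing it in one command (it takes an ACCESS EXCLUSIVE
    -- lock on the parent).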

Re: [HACKERS] DROP SUBSCRIPTION and ROLLBACK

2017-02-21 Thread Masahiko Sawada
On Tue, Feb 21, 2017 at 3:42 AM, Fujii Masao  wrote:
> On Thu, Feb 16, 2017 at 11:40 AM, Masahiko Sawada  
> wrote:
>> On Thu, Feb 16, 2017 at 7:52 AM, Petr Jelinek
>>  wrote:
>>> On 15/02/17 06:43, Masahiko Sawada wrote:
 On Tue, Feb 14, 2017 at 1:13 AM, Fujii Masao  wrote:
> On Mon, Feb 13, 2017 at 4:05 PM, Masahiko Sawada  
> wrote:
>> On Sat, Feb 11, 2017 at 8:18 PM, Petr Jelinek
>>  wrote:
>>> On 10/02/17 19:55, Masahiko Sawada wrote:
 On Thu, Feb 9, 2017 at 12:44 AM, Petr Jelinek
  wrote:
> On 08/02/17 07:40, Masahiko Sawada wrote:
>> On Wed, Feb 8, 2017 at 9:01 AM, Michael Paquier
>>  wrote:
>>> On Wed, Feb 8, 2017 at 1:30 AM, Fujii Masao  
>>> wrote:
 On Wed, Feb 8, 2017 at 12:26 AM, Petr Jelinek
  wrote:
> For example what happens if apply crashes during the DROP
> SUBSCRIPTION/COMMIT and is not started because the delete from 
> catalog
> is now visible so the subscription is no longer there?

 Another idea is to treat DROP SUBSCRIPTION in the same way as 
 VACUUM, i.e.,
 make it emit an error if it's executed within user's transaction 
 block.
>>>
>>> It seems to me that this is exactly Petr's point: using
>>> PreventTransactionChain() to prevent things to happen.
>>
>> Agreed. It's better to prevent it from being executed inside a user
>> transaction block. And I understood there are too many failure scenarios
>> we need to handle.
>>
 Also DROP SUBSCRIPTION should call CommitTransactionCommand() just
 after removing the entry from pg_subscription, then connect to the 
 publisher
 and remove the replication slot.
>>>
>>> For consistency that may be important.
>>
>> Agreed.
>>
>> Attached patch, please give me feedback.
>>
>
> This looks good (and similar to what initial patch had btw). Works 
> fine
> for me as well.
>
> Remaining issue is, what to do about CREATE SUBSCRIPTION then, there 
> are
> similar failure scenarios there, should we prevent it from running
> inside transaction as well?
>

 Hmm, on further thought I have doubts about the approach currently under
 discussion. For example, please imagine the case where CREATE SUBSCRIPTION
 creates the subscription successfully but fails to create the replication
 slot for whatever reason, and then DROP SUBSCRIPTION drops the subscription
 but dropping the replication slot fails. In such a case, CREATE SUBSCRIPTION
 and DROP SUBSCRIPTION return ERROR but the subscription is created and
 dropped successfully. I think that this behaviour confuses the user.

 I think we should just prevent calling DROP SUBSCRIPTION in user's
 transaction block. Or I guess that it could be better to separate the
 starting/stopping logical replication from subscription management.

>>>
>>> We need to stop the replication worker(s) in order to be able to drop
>>> the slot. There is no such issue with startup of the worker as that one
>>> is launched by launcher after the transaction has committed.
>>>
>>> IMO the best option is to just not allow DROP/CREATE SUBSCRIPTION inside a
>>> transaction block and not do any commits inside of those (so that there
>>> are no rollbacks, which solves your initial issue, I believe). That way a
>>> failure to create/drop the slot will result in the subscription not being
>>> created/dropped, which is what we want.
>
> On second thought, +1.
>
>> I basically agree with this option, but why do we need to change CREATE
>> SUBSCRIPTION as well?
>
> Because the window between the creation of replication slot and the 
> transaction
> commit of CREATE SUBSCRIPTION should be short. Otherwise, if any error 
> happens
> during that window, the replication slot unexpectedly remains while there 
> is no
> corresponding subscription. Of course, even if we prevent CREATE
> SUBSCRIPTION from being executed within a user's transaction block, there
> is still such a window. But we can reduce the possibility of that problem.

 Thank you for the explanation. I understood and agree.

 I think we should disallow calling ALTER SUBSCRIPTION inside a user's
 transaction block as well.
>>>
>>> Why? ALTER SUBSCRIPTION does not create/drop anything on remote server ever.
>>>
>>

Re: [HACKERS] Should we cacheline align PGXACT?

2017-02-21 Thread Alexander Korotkov
Hi, Ashutosh!

On Mon, Feb 20, 2017 at 1:20 PM, Ashutosh Sharma 
wrote:

> Following are the pgbench results for read-write workload, I got with
> pgxact-align-3 patch. The results are for 300 scale factor with 8GB of
> shared buffer i.e. when data fits into the shared buffer. For 1000 scale
> factor with 8GB shared buffer the test is still running, once it is
> completed I will share the results for that as well.
>
> *pgbench settings:*
> pgbench -i -s 300 postgres
> pgbench -M prepared -c $thread -j $thread -T $time_for_reading  postgres
>
> where, time_for_reading = 30mins
>
> *non default GUC param:*
> shared_buffers=8GB
> max_connections=300
>
> pg_xlog is located in SSD.
>

Thank you for testing.
It seems that there is still regression.  While padding was reduced from
116 bytes to 4 bytes, it makes me think that probably there is something
wrong in testing methodology.
Are you doing re-initdb and pgbench -i before each run?  I would ask you to
do the both.
Also, if regression would still exist, let's change the order of versions.
Thus, if you run master before patched version, I would ask you to run
patched version before master.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Push down more UPDATEs/DELETEs in postgres_fdw

2017-02-21 Thread Rushabh Lathia
On Mon, Feb 20, 2017 at 1:41 PM, Etsuro Fujita 
wrote:

> On 2017/02/13 18:24, Rushabh Lathia wrote:
>
>> I started reviewing the patch again. Patch applied cleanly on latest
>> source
>> as well as regression pass through with the patch. I also performed
>> few manual testing and haven't found any regression. Patch look
>> much cleaner the earlier version, and don't have any major concern as
>> such.
>>
>
> Thanks for the review!
>
> Here are few comments:
>>
>> 1)
>>
>> @@ -211,6 +211,12 @@ typedef struct PgFdwDirectModifyState
>>  PGresult   *result;/* result for query */
>>  intnum_tuples;/* # of result tuples */
>>  intnext_tuple;/* index of next one to return */
>> +RelationresultRel;/* relcache entry for the target table
>> */
>>
>
> Why we need resultRel? Can't we directly use dmstate->rel ?
>>
>
> The reason why we need that is because in get_returning_data, we pass
> dmstate->rel to make_tuple_from_result_row, which requires that
> dmstate->rel be NULL when the scan tuple is described by fdw_scan_tlist.
> So in that case we set dmstate->rel to NULL and have dmstate->resultRel
> that is the relcache entry for the target relation in
> postgresBeginDirectModify.
>
>
Thanks for the explanation. We might do something here by using
fdw_scan_tlist or by changing the assumption of make_tuple_from_result_row(),
and that way we can avoid having two similar variable pointers in
PgFdwDirectModifyState.

I am okay with the current approach as well, but adding a note somewhere
about this would be great. Also, let's keep this point open for the
committer; if the committer feels this is good, then let's go ahead with it.
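
For reference, the class of statement whose pushdown this patch extends is
(a sketch; ft1 and ft2 stand for foreign tables on the same remote server):

    UPDATE ft1 SET c2 = ft1.c2 + 1
    FROM ft2
    WHERE ft1.c1 = ft2.c1
    RETURNING ft1.*;

With the patch, the join, the update, and the RETURNING list are all meant
to execute in a single remote statement instead of fetching rows and
updating them one by one.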

Here are few other cosmetic changes:

1)

+ *
+ * 'target_rel' is either zero or the rangetable index of a target
relation.
+ * In the latter case this construncts FROM clause of UPDATE or USING
clause
+ * of DELETE by simply ignoring the target relation while deparsing the
given

Spell correction: - construncts

2)

+/*
+ * If either input is the target relation, get all the joinclauses.
+ * Otherwise extract conditions mentioning the target relation from
+ * the joinclauses.
+ */


A space is needed in "joinclauses" (it should read "join clauses").

3)

+/*
+ * If UPDATE/DELETE on a join, create a RETURINING list used in the
+ * remote query.
+ */
+if (fscan->scan.scanrelid == 0)
+returningList = make_explicit_returning_list(resultRelation,
+ rel,
+ returningList);


Spell correction: RETURINING

I did the above changes in the attached patch. Please have a look once;
after that, I feel this patch is ready for committer.

Thanks,
Rushabh Lathia


postgres-fdw-more-update-pushdown-v3_cosmetic_changes.patch
Description: application/download

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [patch] reorder tablespaces in basebackup tar stream for backup_label

2017-02-21 Thread Michael Banck
Hi,

currently, the backup_label and (I think) the tablespace_map files are
(by design) conveniently located at the beginning of the main tablespace
tarball when making a basebackup. However, (by accident or also by
design?) the main tablespace is the last tarball[1] to be streamed via
the BASE_BACKUP replication protocol command. 

For pg_basebackup, this is not a real problem, as either each tablespace
is its own tarfile (in tar format mode), or the files are extracted
anyway (in plain mode).

However, third party tools using the BASE_BACKUP command might want to
extract the backup_label, e.g. in order to figure out the START WAL
LOCATION. If they make a big tarball for the whole cluster potentially
including all external tablespaces, then the backup_label file is
somewhere in the middle of it and it takes a long time for tar to
extract it.

So I am proposing the attached patch, which sends the base tablespace
first and all the other external tablespaces afterwards, thus
having backup_label be the first file in the tar in all cases. Does
anybody see a problem with that?


Michael

[1] Chapter 52.3 of the documentation says "one or more CopyResponse
results will be sent, one for the main data directory and one for each
additional tablespace other than pg_default and pg_global.", which makes
it sound like the main data directory is first, but in my testing, this
is not the case.

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 09ecc15..a5a96b7 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -230,10 +230,10 @@ perform_base_backup(basebackup_options *opt, DIR *tblspcdir)
 		else
 			statrelpath = pgstat_stat_directory;
 
-		/* Add a node for the base directory at the end */
+		/* Add a node for the base directory at the beginning */
 		ti = palloc0(sizeof(tablespaceinfo));
 		ti->size = opt->progress ? sendDir(".", 1, true, tablespaces, true) : -1;
-		tablespaces = lappend(tablespaces, ti);
+		tablespaces = lcons(ti, tablespaces);
 
 		/* Send tablespace header */
 		SendBackupHeader(tablespaces);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] powerpc(32) point/polygon regression failures on Debian Jessie

2017-02-21 Thread Christoph Berg
Re: To Tom Lane 2017-02-20 <20170220161556.5ukosuj5o572b...@msg.credativ.de>
> I was compiling the program on jessie and on sid, and running the
> jessie binary on sid made it output the same as the sid binary, so the
> difference isn't in the binary, but in some system library.

Fwiw, the problem will be fixed in Jessie's glibc by backporting this update:

2015-02-12  Joseph Myers  

[BZ #17964]
* sysdeps/powerpc/fpu/e_sqrt.c (__slow_ieee754_sqrt): Use
__builtin_fma instead of relying on contraction of a * b + c.

https://anonscm.debian.org/cgit/pkg-glibc/glibc.git/commit/?h=jessie&id=b26c084f6eba0057b1cd93e0caf424a1d06bd97e

(Upstream it's probably one of these, didn't dig deeper:
https://sourceware.org/git/?p=glibc.git&a=search&h=HEAD&st=commit&s=__builtin_fma)
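
For anyone curious why contraction matters: fma(a, b, c) rounds once,
while a * b + c without contraction rounds the product first, so the two
can differ in the low bits. A toy standalone illustration, unrelated to
the glibc sqrt code itself (build with -lm):

#include <stdio.h>
#include <math.h>

int main(void)
{
    /* a*b = 1 + 2^-26 + 2^-54, which is not exactly representable. */
    double a = 1.0 + 0x1p-27;
    double b = 1.0 + 0x1p-27;
    double c = -1.0;

    double fused    = fma(a, b, c); /* one rounding: keeps the 2^-54 term */
    double separate = a * b + c;    /* product rounded first, 2^-54 lost */

    /* Expect 0x1.0000001p-26 vs 0x1p-26; note that with a contracting
     * compiler (e.g. GCC's -ffp-contract=fast) the two lines may match. */
    printf("fma(a,b,c) = %a\n", fused);
    printf("a*b + c    = %a\n", separate);
    return 0;
}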

Thanks for the input,
Christoph
-- 
Senior Berater, Tel.: +49 2166 9901 187
credativ GmbH, HRB Mönchengladbach 12080, USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
pgp fingerprint: 5C48 FE61 57F4 9179 5970  87C6 4C5A 6BAB 12D2 A7AE


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] A better way to expand hash indexes.

2017-02-21 Thread Mithun Cy
Thanks, Amit

On Mon, Feb 20, 2017 at 9:51 PM, Amit Kapila  wrote:
> How will high and lowmask calculations work in this new strategy?
> Till now they always work on doubling strategy and I don't see you
> have changed anything related to that code.  Check below places.

It is important that the mask be (2^x) - 1 if we are to retain the
same hash map function, so the mask variables will take the same
values as before. The only place I think we need a change is
_hash_metapinit(); unfortunately, I had not tested the case where we
build the hash index on already existing tuples. I have now fixed
that in the latest patch.
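
For context, here is why the masks must keep the form (2^x) - 1: the
bucket mapping masks the hash value down to the current table size and
falls back to the smaller mask for buckets that don't exist yet. A
standalone paraphrase of _hash_hashkey2bucket() follows (from memory;
the real code is in src/backend/access/hash/hashutil.c):

#include <stdio.h>
#include <stdint.h>

/*
 * Paraphrase of _hash_hashkey2bucket(): mask the hash value with the
 * larger mask; if that picks a bucket past the last one that actually
 * exists, fall back to the previous (half-size) mask.  This only works
 * if highmask and lowmask are of the form (2^x) - 1.
 */
static uint32_t
hashkey2bucket(uint32_t hashkey, uint32_t maxbucket,
               uint32_t highmask, uint32_t lowmask)
{
    uint32_t bucket = hashkey & highmask;

    if (bucket > maxbucket)
        bucket &= lowmask;
    return bucket;
}

int main(void)
{
    /* Six buckets exist (maxbucket = 5): highmask = 7, lowmask = 3. */
    for (uint32_t h = 0; h < 16; h++)
        printf("hash %2u -> bucket %u\n", h, hashkey2bucket(h, 5, 7, 3));
    return 0;
}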


> Till now, we have worked hard for not changing the page format in a
> backward incompatible way, so it will be better if we could find some
> way of doing this without changing the meta page format in a backward
> incompatible way.

We are not adding or deleting any variables; we increase the size of
hashm_spares, and hence the mapping functions have to be adjusted. The
problem is the block allocation: its management is based on the fact
that all of the buckets (2^x in number) belonging to a particular
split point are allocated at once, together. hashm_spares records
those allocations, and the map functions then use it to reach a
particular block in the file. If we change the way we allocate the
buckets, then hashm_spares changes, and hence so does the mapping
function. So I do not think we can avoid the incompatibility issue.
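
To make that dependency concrete, the physical mapping goes through
hashm_spares. A standalone paraphrase of _hash_log2() and
BUCKET_TO_BLKNO() follows (from memory; the real definitions live in
src/include/access/hash.h):

#include <stdio.h>
#include <stdint.h>

/* Paraphrase of _hash_log2(): smallest i such that 2^i >= num. */
static uint32_t
hash_log2(uint32_t num)
{
    uint32_t i = 0, limit = 1;

    while (limit < num)
    {
        i++;
        limit <<= 1;
    }
    return i;
}

/*
 * Paraphrase of BUCKET_TO_BLKNO(): the bucket number, plus the spare
 * (overflow) pages allocated before this bucket's split point, plus 1
 * for the metapage.  Change how many buckets a split point allocates
 * and this mapping, and the meaning of spares[], change with it.
 */
static uint32_t
bucket_to_blkno(const uint32_t *spares, uint32_t bucket)
{
    return bucket + (bucket ? spares[hash_log2(bucket + 1) - 1] : 0) + 1;
}

int main(void)
{
    /* Hypothetical spares[] contents, one entry per split point. */
    uint32_t spares[] = {0, 0, 0, 1, 9};

    for (uint32_t b = 0; b < 16; b++)
        printf("bucket %2u -> block %u\n", b, bucket_to_blkno(spares, b));
    return 0;
}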

One thing I can think of is bumping the hashm_version of the hash
index; then for old indexes we can continue to use the doubling method
and its mapping, and for new indexes we can use the new scheme
described above.

> Have you considered storing some information in
> shared memory based on which we can decide what percentage of
> buckets are allocated in the current table half?  I think we might be
> able to construct this information after crash recovery as well.

I think all of the above data has to be persistent. I am not able to
see what should, or even can, be stored in shared buffers. Can you
please correct me if I am wrong?


-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


expand_hashbucket_efficiently_02
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pageinspect: Hash index support

2017-02-21 Thread Ashutosh Sharma
Thanks for reporting this. It is caused by an incorrect typecast.
Attached is a patch that fixes the issue.
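
The reported values are consistent with an unsigned 32-bit spare count
being pushed through a signed one-byte type somewhere on the output
path. A minimal standalone illustration of the symptom (not the actual
pageinspect code):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t spares[] = {129, 255, 252};   /* plausible real spare counts */

    for (int i = 0; i < 3; i++)
    {
        int8_t bad = (int8_t) spares[i];   /* truncating, sign-flipping cast */
        printf("%u misread as int8 -> %d\n", spares[i], bad);
    }
    /* prints 129 -> -127, 255 -> -1, 252 -> -4, matching the report */
    return 0;
}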

On Tue, Feb 21, 2017 at 2:58 PM, Mithun Cy wrote:

> On Fri, Feb 10, 2017 at 1:06 AM, Robert Haas 
> wrote:
>
> > Alright, committed with a little further hacking.
>
> I pulled the latest code and tried the following test:
> 
> create table t1(t int);
> create index i1 on t1 using hash(t);
> insert into t1 select generate_series(1, 1000);
>
> postgres=# SELECT spares FROM hash_metapage_info(get_raw_page('i1', 0));
>spares
> 
> 
>  {0,0,0,1,9,17,33,65,-127,1,1,0,1,-1,-1,-4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0}
>
> The spares are showing negative numbers; I think the wrong type has been
> chosen, since the values seem to wrap at 127. The spares are defined as
> follows:
> uint32 hashm_spares[HASH_MAX_SPLITPOINTS]; /* spare pages before each
> splitpoint */
>
> They should always be positive.
>
> --
> Thanks and Regards
> Mithun C Y
> EnterpriseDB: http://www.enterprisedb.com
>


typecast_fix.patch
Description: invalid/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pageinspect: Hash index support

2017-02-21 Thread Mithun Cy
On Fri, Feb 10, 2017 at 1:06 AM, Robert Haas  wrote:

> Alright, committed with a little further hacking.

I pulled the latest code and tried the following test:

create table t1(t int);
create index i1 on t1 using hash(t);
insert into t1 select generate_series(1, 1000);

postgres=# SELECT spares FROM hash_metapage_info(get_raw_page('i1', 0));
   spares

 {0,0,0,1,9,17,33,65,-127,1,1,0,1,-1,-1,-4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0}

The spares are showing negative numbers; I think the wrong type has been
chosen, since the values seem to wrap at 127. The spares are defined as
follows:
uint32 hashm_spares[HASH_MAX_SPLITPOINTS]; /* spare pages before each
splitpoint */

They should always be positive.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [doc fix] Really trivial fix for BRIN documentation

2017-02-21 Thread Simon Riggs
On 21 February 2017 at 06:34, Tsunakawa, Takayuki wrote:
> Hello,
>
> This is just a correction from "index" to "table".  I was a bit confused when 
> I first read this part.

Pushed, but using "heap" rather than "table", for clarity. Thanks for the patch.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers