Re: proposal - plpgsql unique statement id

2019-01-17 Thread Peter Eisentraut
On 04/01/2019 15:07, Pavel Stehule wrote:
> pá 4. 1. 2019 v 13:58 odesílatel Peter Eisentraut
>  > napsal:
> 
> On 13/11/2018 14:35, Pavel Stehule wrote:
> > I am trying to enhance plpgsql_check with profiler functionality, and I
> > have to solve how to identify every plpgsql statement across different
> > sessions. There is lineno, but it is surely not unique. I propose
> > introducing a stmtid in every statement structure. This number will be
> > unique inside a function.
> 
> That seems reasonable enough, although I wouldn't use 0 as a valid
> stmtid.
> 
> done

Related, should stmtid be unsigned int rather than signed int?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: using expression syntax for partition bounds

2019-01-17 Thread Peter Eisentraut
On 16/01/2019 08:41, Amit Langote wrote:
> OK, will change it back to partition_bound_expr.  Removing "bound" from it
> makes the term ambiguous?

Yeah, let's leave it in.

> How about the following note in the documentation:
> 
> +  Although volatile expressions such as
> +  CURRENT_TIMESTAMP can be used
> +  for this, be careful when using them, because
> +  PostgreSQL will evaluate them only once
> +  when adding the partition.

I don't think we have to phrase it in this warning way.  Just say that
volatile expressions are evaluated at the time of the DDL statement.

> Sorry but I'm not sure how or what I would test about this.  Maybe, just
> add a test in create_table.sql/alter_table.sql that shows that using
> volatile expression doesn't cause an error?

Possibilities: Create a range partition with current_timestamp as the
upper bound and then in a separate transaction insert current_timestamp
and have it appear in the default partition.  Or create a list partition
with session_user as one partition's value and then insert session_user
and have it appear in that table.  Or something like those.
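
A rough sketch of the first idea, assuming the patch is applied (the table
names here are made up):

create table tstest (ts timestamptz) partition by range (ts);
create table tstest_old partition of tstest
  for values from (minvalue) to (current_timestamp);  -- bound evaluated once, at DDL time
create table tstest_default partition of tstest default;

-- later, in a separate transaction:
insert into tstest values (current_timestamp);
select tableoid::regclass from tstest;  -- expected to show tstest_default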

>> I think that needs more refinement.  In v8, the following errors
>>
>> CREATE TABLE t2 ( a name COLLATE "POSIX" ) PARTITION BY RANGE (a);
>> CREATE TABLE t2a PARTITION OF t2 FOR VALUES FROM (name 'foo') TO (name
>> 'xyz');
>> ERROR:  collation of partition bound value for column "a" does not match
>> partition key collation "POSIX"
>>
>> The problem here is that the "name" type has a collation that is neither
>> the one of the column nor the default collation.  We can allow that.
> 
> So, should the "name" type's collation should simply be discarded in favor
> of "POSIX" that's being used for partitioning?

In that specific case, yes, I think so.

>> What we don't want is someone writing an explicit COLLATE clause.  I
>> think we just need to check that there is no top-level COLLATE clause.
>> This would then sort of match the logic in parse_collate.c for combining
>> collations.
> 
> Maybe I'm missing something, but isn't it OK to allow the COLLATE clause
> as long as it specifies the same collation as the parent?

Yes, that should be OK.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Protect syscache from bloating with negative cache entries

2019-01-17 Thread Kyotaro HORIGUCHI
Hello.

At Fri, 18 Jan 2019 11:46:03 +1300, Gavin Flower 
 wrote in 
<4e62e6b7-0ffb-54ae-3757-5583fcca3...@archidevsys.co.nz>
> On 18/01/2019 08:48, Bruce Momjian wrote:
> > On Thu, Jan 17, 2019 at 11:33:35AM -0500, Robert Haas wrote:
> >> The flaw in your thinking, as it seems to me, is that in your concern
> >> for "the likelihood that cache flushes will simply remove entries
> >> we'll soon have to rebuild," you're apparently unwilling to consider
> >> the possibility of workloads where cache flushes will remove entries
> >> we *won't* soon have to rebuild.  Every time that issue gets raised,
> >> you seem to blow it off as if it were not a thing that really happens.
> >> I can't make sense of that position.  Is it really so hard to imagine
> >> a connection pooler that switches the same connection back and forth
> >> between two applications with different working sets?  Or a system
> >> that keeps persistent connections open even when they are idle?  Do
> >> you really believe that a connection that has not accessed a cache
> >> entry in 10 minutes still derives more benefit from that cache entry
> >> than it would from freeing up some memory?
> > Well, I think everyone agrees there are workloads that cause undesired
> > cache bloat.  What we have not found is a solution that doesn't cause
> > code complexity or undesired overhead, or one that >1% of users will
> > know how to use.
> >
> > Unfortunately, because we have not found something we are happy with,
> > we
> > have done nothing.  I agree LRU can be expensive.  What if we do some
> > kind of clock sweep and expiration like we do for shared buffers?  I

So, it doesn't use LRU but a kind of clock-sweep method. When the
current hash is filled up and resizing (doubling) would push it past
the threshold, it tries to trim away entries that have been left
unused for a duration corresponding to their usage count. This is not
a hard limit but seems to be a good compromise.

> > think the trick is figuring how frequently to do the sweep.  What if
> > we
> > mark entries as unused every 10 queries, mark them as used on first
> > use,
> > and delete cache entries that have not be used in the past 10 queries.

As above, it tries pruning at every resize. So the only complexity
added to the frequent paths is setting the last-accessed time and
incrementing the access counter. It scans the whole hash at resize
time, but that doesn't add much compared to the resize itself.

> If you take that approach, then this number should be configurable. 
> What if I had 12 common queries I used in rotation?

This basically has two knobs, per catcache: the minimum hash size at
which pruning is attempted, and the idle time before unused entries
are reaped.

> The ARM3 processor cache logic was to simply eject an entry at random,
> as Acorn obviously felt that the silicon required for a more
> sophisticated algorithm would reduce the cache size too much!
> 
> I upgraded my Acorn Archimedes, which had an 8MHz bus, from an 8MHz ARM2
> to a 25MHz ARM3.  That is a clock rate improvement of about 3 times.
> However, BASIC programs ran about 7 times faster, which I put down to
> the ARM3 having a cache.
> 
> Obviously for Postgres this is not directly relevant, but I think it
> suggests that it may be worth considering replacing cache items at
> random, as there are no pathological corner cases and the logic is
> very simple.

Memory was more expensive than nowadays by about 10^3 times?  An
obvious advantage of random reaping is that it requires less silicon.
I don't think we need to be that stingy, but perhaps clock-sweep is
the most we can afford.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Python versions (was Re: RHEL 8.0 build)

2019-01-17 Thread Peter Eisentraut
On 16/01/2019 17:30, Tom Lane wrote:
>> The following are listed but don't affect any other tests, so I didn't
>> include them:
> 
>> BISON
>> DTRACE
>> DTRACEFLAGS
>> FLEX
>> XML2_CONFIG
> 
> (slightly confused ...)  Surely XML2_CONFIG feeds into what we choose for
> CPPFLAGS?  If that doesn't matter, why not?

Committed with that addition.

(I think I had thought that XML2_CONFIG only affects separate variables
like ICU_CFLAGS etc. but that's not the case.)

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



RE: Libpq support to connect to standby server as priority

2019-01-17 Thread Tsunakawa, Takayuki
From: Tsunakawa, Takayuki [mailto:tsunakawa.ta...@jp.fujitsu.com]
> As for prefer-standby/prefer-read, if host parameter specifies host1,host2
> in this order, and host1 is the primary with
> default_transaction_read_only=true, does the app get a connection to host1?
> I want to connect to host2 (standby) only if host1 is down.

Oops, reverse -- I wanted to say "I want to connect to host1 (primary) only if 
host2 is down."

Regards
Takayuki Tsunakawa



Re: pg_dump multi VALUES INSERT

2019-01-17 Thread Surafel Temesgen
On Fri, Jan 18, 2019 at 7:14 AM David Rowley 
wrote:

> On Fri, 18 Jan 2019 at 01:15, Surafel Temesgen 
> wrote:
> > The attache patch use your method mostly
>
> I disagree with the "mostly" part.  As far as I can see, you took the
> idea and then made a series of changes to completely break it.  For
> bonus points, you put back my comment change to make it incorrect
> again.
>
> Here's what I got after applying your latest patch:
>
> $ pg_dump --table=t --inserts --rows-per-insert=4 postgres
>
> [...]
> INSERT INTO public.t VALUES (1);
> )
> INSERT INTO public.t VALUES (, ( 2);
> )
> INSERT INTO public.t VALUES (, ( 3);
> )
> INSERT INTO public.t VALUES (, ( 4);
> );
> INSERT INTO public.t VALUES (5);
> )
> INSERT INTO public.t VALUES (, ( 6);
> )
> INSERT INTO public.t VALUES (, ( 7);
> )
> INSERT INTO public.t VALUES (, ( 8);
> );
> INSERT INTO public.t VALUES (9);
> )
> ;
>
> I didn't test, but I'm pretty sure that's not valid INSERT syntax.
>

This happens because I don't disallow the usage of the --inserts and
--rows-per-insert options together.  It should error out in that case; I
corrected it in the attached patch.


> I'd suggest taking my changes and doing the plumbing work to tie the
> rows_per_statement into the command line arg instead of how I left it
> hardcoded as 3.
>
> >> +When using --inserts, this allows the maximum number
> >> +of rows per INSERT statement to be specified.
> >> +This setting defaults to 1.
> >
> > I changed it too, except for "This setting defaults to 1", because it
> > doesn't have a default value.  1 row per statement means the --inserts
> > option.
>
> If it does not default to 1 then what happens when the option is not
> specified?


If the --inserts option is specified, it uses single-row INSERT statements;
otherwise it uses the COPY command.

regards
Surafel
diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 9e0bb93f08..4195fb81a2 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -775,6 +775,16 @@ PostgreSQL documentation
   
  
 
+ 
+  --rows-per-insert
+  
+   
+When using --rows-per-insert, this allows the maximum number
+of rows per INSERT statement to be specified.
+   
+  
+ 
+
  
   --load-via-partition-root
   
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index 4a2e122e2d..73a243ecb0 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -72,6 +72,7 @@ typedef struct _restoreOptions
 	int			dropSchema;
 	int			disable_dollar_quoting;
 	int			dump_inserts;
+	int			dump_inserts_multiple;
 	int			column_inserts;
 	int			if_exists;
 	int			no_comments;	/* Skip comments */
@@ -144,6 +145,7 @@ typedef struct _dumpOptions
 	/* flags for various command-line long options */
 	int			disable_dollar_quoting;
 	int			dump_inserts;
+	int			dump_inserts_multiple;
 	int			column_inserts;
 	int			if_exists;
 	int			no_comments;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 0e129f9654..7a2a9789f9 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -313,6 +313,7 @@ main(int argc, char **argv)
 	int			plainText = 0;
 	ArchiveFormat archiveFormat = archUnknown;
 	ArchiveMode archiveMode;
+	char   *p;
 
 	static DumpOptions dopt;
 
@@ -359,6 +360,7 @@ main(int argc, char **argv)
 		{"exclude-table-data", required_argument, NULL, 4},
 		{"if-exists", no_argument, _exists, 1},
 		{"inserts", no_argument, _inserts, 1},
+		{"rows-per-insert", required_argument, NULL, 8},
 		{"lock-wait-timeout", required_argument, NULL, 2},
 		{"no-tablespaces", no_argument, , 1},
 		{"quote-all-identifiers", no_argument, _all_identifiers, 1},
@@ -557,6 +559,27 @@ main(int argc, char **argv)
 dosync = false;
 break;
 
+			case 8:			/* inserts row number */
+errno = 0;
+dopt.dump_inserts_multiple = strtol(optarg, &p, 10);
+if (p == optarg || *p != '\0')
+{
+	write_msg(NULL, "argument of --rows-per-insert must be a number\n");
+	exit_nicely(1);
+}
+if (errno == ERANGE)
+{
+	write_msg(NULL, "argument of --rows-per-insert exceeds integer range.\n");
+	exit_nicely(1);
+}
+if (dopt.dump_inserts_multiple <= 0)
+{
+	write_msg(NULL, "argument of --rows-per-insert must be positive number\n");
+	exit_nicely(1);
+}
+
+break;
+
 			default:
 fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
 exit_nicely(1);
@@ -580,6 +603,12 @@ main(int argc, char **argv)
 		exit_nicely(1);
 	}
 
+	if (dopt.dump_inserts && dopt.dump_inserts_multiple)
+	{
+		write_msg(NULL, "options --inserts and --rows-per-insert cannot be used together\n");
+		exit_nicely(1);
+	}
+
 	/* --column-inserts implies --inserts */
 	if (dopt.column_inserts)
 		dopt.dump_inserts = 1;
@@ -607,8 +636,9 @@ main(int argc, char **argv)
 	if (dopt.if_exists && !dopt.outputClean)
 		exit_horribly(NULL, "option --if-exists requires 

Re: Libpq support to connect to standby server as priority

2019-01-17 Thread Tatsuo Ishii
>> If you need some input from me regarding finding a primary node,
>> please say so.  While working on Pgpool-II project, I learned the
>> necessity in a hard way.
>>
>>
> I would really like to have a consistent way of doing this, and consistent
> terms for the connection parameters.
> 
> that said yes, I would like input from you.

Sure, no problem.

- Upon Pgpool-II starting up or receiving a failover or switchover
  event, primary node finding is executed.

- It repeats the following until the timeout parameter
  ("search_primary_node_timeout") expires:

do until the timeout is expired
{
for all_live_backends
{
connect to the backend.
execute "SELECT pg_is_in_recovery()".

if it returns false, then we have found the primary node. Assume
the other backends are standbys and we are done.
disconnect from the backend
}
sleep 1 second;
}

If no primary node was found, all backends are regarded as standbys.

In addition to the above, recent Pgpool-II versions do optional
checking to verify backend status, for example to find a case where
there are two primary nodes.

- If there are two primaries, check the connectivity between each
  primary and standbys using pg_stat_wal_receiver() (so this can not
  be executed with PostgreSQL version 9.5 or before)

- If there's a primary (call it "A") which is not connected to any of
  standbys while there's a primary (call it "B") which is connected to
  all of standbys, then A is regarded as a "false primary" (and
  Pgpool-II detaches it from the streaming replication cluster managed
  by Pgpool-II if detach_false_primary is enabled).

See Pgpool-II manual "detach_false_primary" section in
http://tatsuo-ishii.github.io/pgpool-II/current/runtime-config-failover.html 
for more details.
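
For reference, those checks boil down to simple queries (a sketch; run
pg_stat_wal_receiver on each standby, PostgreSQL 9.6 or later):

select pg_is_in_recovery();              -- false on a primary, true on a standby
select status, conninfo from pg_stat_wal_receiver;  -- which upstream this standby pulls WAL from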

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



RE: Libpq support to connect to standby server as priority

2019-01-17 Thread Tsunakawa, Takayuki
From: Haribabu Kommi [mailto:kommi.harib...@gmail.com]
> IMO, if we try to use only pg_is_in_recovery() to decide where to connect,
> we may not support all the target_session_attrs that are possible.  How
> about using both to decide?

I favor adding a new parameter like target_server_type whose purpose is to 
select the server role.  That aligns better with PgJDBC's naming, which 
conveys consistency to PostgreSQL users.  Again, the original desire should 
have been to connect to a standby to eliminate the burden on the primary, where 
the primary may be read-only by default and only a limited group of users are 
allowed to write to the database.

I don't know what kind of realistic use cases there are that request a read-only 
session in a logical replication configuration.  I think we can just leave 
target_session_attrs as what it is now in PostgreSQL 11, for compatibility and 
possibly future new use cases.


> Master/read-write -- recovery = false and default_transaction_read_only = false
> Standby/read-only -- recovery = true
> prefer-standby/prefer-read -- recovery = true or default_transaction_read_only = true
> any -- Nothing to be verified
> 
> 
> I feel above verifications can cover for both physical and logical
> replication.

As for prefer-standby/prefer-read, if host parameter specifies host1,host2 in 
this order, and host1 is the primary with default_transaction_read_only=true, 
does the app get a connection to host1?  I want to connect to host2 (standby) 
only if host1 is down.


Regards
Takayuki Tsunakawa



Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-17 Thread Tom Lane
BTW, if you're wondering why curculio is still failing the pgbench
test, all is explained here:

https://man.openbsd.org/srandom

Or at least most is explained there.  While curculio is unsurprisingly
failing all four seeded_random tests, when I try it locally on an
OpenBSD 6.4 installation, only the uniform, exponential, and gaussian
cases reliably "fail".  zipfian usually doesn't.  It looks like the
zipfian code almost always produces 4000 regardless of the seed value,
though occasionally it produces 4001.  Bad parameters for that
algorithm, perhaps?

regards, tom lane



Re: Delay locking partitions during INSERT and UPDATE

2019-01-17 Thread sho kato
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:   tested, failed
Spec compliant:   tested, failed
Documentation:tested, failed

Hi,

Increasing the number of clients, I benchmarked with a table partitioned into 
1k partitions.
I confirmed that this patch improves performance by 10 times or more.

master (commit: 90525d7b4e Date:   Tue Jan 15 12:19:21 2019 -0800)

cpu:
Xeon(R) CPU E5-2667 v3 * 2

setup:
create table history(aid int, delta int, mtime timestamp without time zone) 
partition by range(aid);
\o /dev/null
select 'create table history_' || x || ' partition of history for values from(' 
|| x ||') to(' || x+1 || ')' from generate_series(1, 1000) x;
\gexec

benchmark.sql:
\set aid random(1, 1000)
\set delta random(-5000, 5000)
INSERT INTO history VALUES (:aid, :delta, CURRENT_TIMESTAMP);

command line:
pgbench -d testdb -f benchmark.sql -c number_of_clients -T 60 -r -n

Results:

 clients | tps_patched | tps_unpatched | tps_patched / tps_unpatched
---------+-------------+---------------+-----------------------------
       1 |        8890 |           841 |                          11
       2 |       17484 |          1470 |                          12
       4 |       29218 |          2474 |                          12
       8 |       48789 |          3876 |                          13
      16 |       68794 |          4030 |                          17
      32 |       69550 |          2888 |                          24
      64 |       71196 |          2555 |                          28
     128 |       71739 |          2295 |                          31
     256 |       66548 |          2105 |                          32

I wonder why performance does not increase much once the number of clients 
exceeds 16.
Even if it is caused by contention on lightweight locks, I thought 16 clients 
would still be a small number for that.
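
One way to check whether lightweight lock contention really is the bottleneck
is to sample wait events while the benchmark is running, e.g. (a sketch, not
part of the patch):

select wait_event_type, wait_event, count(*)
  from pg_stat_activity
 where state = 'active'
 group by 1, 2
 order by 3 desc;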


Also, I ran make installcheck-world, but the partition_prune test failed.
However, this test case failed even before applying the patch, so it is 
not a problem with this patch.
One of the results is as follows.

 explain (analyze, costs off, summary off, timing off) execute ab_q1 (2, 2, 3);
-   QUERY PLAN
--
+  QUERY PLAN  
+--
  Append (actual rows=0 loops=1)
-   Subplans Removed: 6
->  Seq Scan on ab_a2_b1 (actual rows=0 loops=1)
- Filter: ((a >= $1) AND (a <= $2) AND (b <= $3))
+ Filter: ((a >= 2) AND (a <= 2) AND (b <= 3))
->  Seq Scan on ab_a2_b2 (actual rows=0 loops=1)
- Filter: ((a >= $1) AND (a <= $2) AND (b <= $3))
+ Filter: ((a >= 2) AND (a <= 2) AND (b <= 3))
->  Seq Scan on ab_a2_b3 (actual rows=0 loops=1)
- Filter: ((a >= $1) AND (a <= $2) AND (b <= $3))
-(8 rows)
+ Filter: ((a >= 2) AND (a <= 2) AND (b <= 3))
+(7 rows)

regards,
sho kato

Re: Libpq support to connect to standby server as priority

2019-01-17 Thread Haribabu Kommi
On Fri, Jan 18, 2019 at 2:34 PM Tsunakawa, Takayuki <
tsunakawa.ta...@jp.fujitsu.com> wrote:

> From: Laurenz Albe [mailto:laurenz.a...@cybertec.at]
> > I think that transaction_read_only is good.
> >
> > If it is set to false, we are sure to be on a replication primary or
> > stand-alone server, which is enough to know for the load balancing use
> case.
>
> As Tatsuo-san said, setting default_transaction_read_only leads to a
> misjudgement of the primary.
>
>
> > I deem it unlikely that someone will set default_transaction_read_only to
> > FALSE and then complain that the feature is not working as expected, but
> > again
> > I cannot prove that claim.
>
> I wonder what default_transaction_read_only exists for.  For making the
> database read-only by default and allowing only specific users to write to the
> database with "CREATE/ALTER USER SET default_transaction_read_only = false"?
>

default_transaction_read_only is a user-settable parameter; even if it is set
to true by default, any user can change it later.  Deciding the server type
based on it -- whether the server is read-write or read-only -- can therefore
go wrong, because the user can change it later.


> I'm sorry to repeat myself, but anyway, I think we need a method to
> connect to a standby as the original desire, because the primary instance
> may be read only by default while only limited users update data.  That's
> for reducing the burden on the primary and minimizing the impact on users
> who update data.  For example,
>
> * run data reporting on the standby
> * backup the database from the standby
> * cascade replication from the standby
>

IMO, if we try to use only pg_is_in_recovery() to decide where to connect,
we may not support all the target_session_attrs that are possible.  How about
using both to decide?

Master/read-write -- recovery = false and default_transaction_read_only = false
Standby/read-only -- recovery = true
prefer-standby/prefer-read -- recovery = true or default_transaction_read_only = true
any -- Nothing to be verified

I feel the above verifications can cover both physical and logical
replication.  We can decide what type of options we want to support.  Also, if
we don't want to rely on the user-settable default_transaction_read_only
parameter, we could add a new parameter that can only be changed with a
server restart?
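
For reference, the two server-side signals being weighed here can be checked
with plain SQL, e.g.:

select pg_is_in_recovery();            -- true on a physical standby
show default_transaction_read_only;    -- user-settable, so not reliable on its own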

Regards,
Haribabu Kommi
Fujitsu Australia


Simplify set of flags used by MyXactFlags

2019-01-17 Thread Michael Paquier
Hi all,

c5660e0 has introduced a new flag for MyXactFlags to restrict two-phase
commit from working with temporary objects, and XACT_FLAGS_ACCESSEDTEMPREL
has been kept around so that the error message stays compatible with past
versions.  Still, it is weird to keep both ACCESSEDTEMPNAMESPACE and
ACCESSEDTEMPREL, as the latter is really just a special case of the former,
so attached is a cleanup patch for HEAD.

Keeping both messages makes the error handling at PREPARE time perhaps a bit
cleaner, as it distinguishes schema-level access from table-level access.
Still, I'd rather simplify the code and only keep the schema-level case as
something we complain about.  Another thing is that ACCESSEDTEMPREL is used in
PreCommit_on_commit_actions() to skip the truncates for ON COMMIT DELETE ROWS
if no temporary tables have been accessed; dropping it would just mean that
truncation is attempted on empty tables for nothing even if, say, only a
function is created in pg_temp.
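
For context, the kind of scenario both flags guard against looks like this
(a sketch; assumes max_prepared_transactions > 0):

begin;
create temp table tt (a int);     -- marks the transaction as having touched the temp namespace
prepare transaction 'uses_temp';  -- rejected because of that flag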

Thoughts?
--
Michael
From 52c23f7a3f63dffa57d57ceeac7c1e2ef5ba68ad Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Wed, 16 Jan 2019 10:34:13 +0900
Subject: [PATCH] Simplify 2PC restriction handling for temporary objects

There are two flags used to control access to temporary tables and
access to the temporary namespace of a session, however the first
control flag is actually a concept included in the second.  This removes
the flag for temporary table tracking, keeping around only the one at
namespace level.
---
 src/backend/access/heap/heapam.c   |  4 ++--
 src/backend/access/transam/xact.c  | 19 ++-
 src/backend/commands/lockcmds.c|  2 +-
 src/backend/commands/tablecmds.c   |  2 +-
 src/include/access/xact.h  | 12 +++-
 src/test/regress/expected/temp.out | 14 +++---
 6 files changed, 20 insertions(+), 33 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 9afbc8228d..dd54810052 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -1144,7 +1144,7 @@ relation_open(Oid relationId, LOCKMODE lockmode)
 
 	/* Make note that we've accessed a temporary relation */
 	if (RelationUsesLocalBuffers(r))
-		MyXactFlags |= XACT_FLAGS_ACCESSEDTEMPREL;
+		MyXactFlags |= XACT_FLAGS_ACCESSEDTEMPNAMESPACE;
 
 	pgstat_initstats(r);
 
@@ -1194,7 +1194,7 @@ try_relation_open(Oid relationId, LOCKMODE lockmode)
 
 	/* Make note that we've accessed a temporary relation */
 	if (RelationUsesLocalBuffers(r))
-		MyXactFlags |= XACT_FLAGS_ACCESSEDTEMPREL;
+		MyXactFlags |= XACT_FLAGS_ACCESSEDTEMPNAMESPACE;
 
 	pgstat_initstats(r);
 
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 18467d96d2..0c20729c68 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2259,13 +2259,18 @@ PrepareTransaction(void)
 	/* NOTIFY will be handled below */
 
 	/*
-	 * Don't allow PREPARE TRANSACTION if we've accessed a temporary table in
+	 * Don't allow PREPARE TRANSACTION if we've accessed a temporary object in
 	 * this transaction.  Having the prepared xact hold locks on another
 	 * backend's temp table seems a bad idea --- for instance it would prevent
 	 * the backend from exiting.  There are other problems too, such as how to
 	 * clean up the source backend's local buffers and ON COMMIT state if the
 	 * prepared xact includes a DROP of a temp table.
 	 *
+	 * Other objects types, like functions, operators or extensions, share the
+	 * same restriction as they should not be created, locked or dropped as
+	 * this can mess up with this session or even a follow-up session trying
+	 * to use the same temporary namespace.
+	 *
 	 * We must check this after executing any ON COMMIT actions, because they
 	 * might still access a temp relation.
 	 *
@@ -2273,18 +2278,6 @@ PrepareTransaction(void)
 	 * cases, such as a temp table created and dropped all within the
 	 * transaction.  That seems to require much more bookkeeping though.
 	 */
-	if ((MyXactFlags & XACT_FLAGS_ACCESSEDTEMPREL))
-		ereport(ERROR,
-(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("cannot PREPARE a transaction that has operated on temporary tables")));
-
-	/*
-	 * Similarly, PREPARE TRANSACTION is not allowed if the temporary
-	 * namespace has been involved in this transaction as we cannot allow it
-	 * to create, lock, or even drop objects within the temporary namespace
-	 * as this can mess up with this session or even a follow-up session
-	 * trying to use the same temporary namespace.
-	 */
 	if ((MyXactFlags & XACT_FLAGS_ACCESSEDTEMPNAMESPACE))
 		ereport(ERROR,
 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
diff --git a/src/backend/commands/lockcmds.c b/src/backend/commands/lockcmds.c
index 4cd7ec2d5f..bea99b59a2 100644
--- a/src/backend/commands/lockcmds.c
+++ b/src/backend/commands/lockcmds.c
@@ -108,7 +108,7 @@ RangeVarCallbackForLockTable(const RangeVar *rv, Oid relid, Oid oldrelid,
 	 */
 

Re: speeding up planning with partitions

2019-01-17 Thread Amit Langote
Tsunakawa-san,

On 2019/01/18 14:12, Tsunakawa, Takayuki wrote:
> From: Amit Langote [mailto:langote_amit...@lab.ntt.co.jp]
>> Are you saying that, when using auto mode, all executions of the query
>> starting from 7th are slower than the first 5 executions?  That is, the
>> latency of creating and using a custom plan increases *after* a generic
>> plan is created and discarded on the 6th execution of the query?  If so,
>> that is inexplicable to me.
> 
> Isn't CheckCachedPlan() (and AcquireExecutorLocks() therein) called in every
> EXECUTE after the 6th one due to some unknown issue?

CheckCachedPlan is only called if choose_custom_plan() returns false,
resulting in a generic plan being created or reused.  With plan_cache_mode =
auto, I see that it always returns true, because a custom plan containing a
single partition to scan is way cheaper than the generic plan.
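
For what it's worth, which plan type an EXECUTE ended up using can be seen
from EXPLAIN output; a sketch (with 'history' standing in for the benchmark
table):

set plan_cache_mode = auto;
prepare q (int) as select * from history where aid = $1;
explain (costs off) execute q(1);
-- a filter showing the literal value (aid = 1) means a custom plan was built;
-- a filter still showing $1 means the cached generic plan was reused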

> Does choose_custom_plan() always return true after 6th EXECUTE?

Yes.

Thanks,
Amit




Re: problems with foreign keys on partitioned tables

2019-01-17 Thread Amit Langote
On 2019/01/18 7:54, Alvaro Herrera wrote:
> On 2019-Jan-09, Amit Langote wrote:
> 
>> 1. Foreign keys of partitions stop working correctly after being detached
>> from the parent table
> 
>> This happens because the action triggers defined on the PK relation (pk)
>> refers to p as the referencing relation.  On detaching p1 from p, p1's
>> data is no longer accessible to that trigger.
> 
> Ouch.
> 
>> To fix this problem, we need create action triggers on PK relation
>> that refer to p1 when it's detached (unless such triggers already
>> exist which might be true in some cases).  Attached patch 0001 shows
>> this approach.
> 
> Hmm, okay.  I'm not in love with the idea that such triggers might
> already exist -- seems unclean.  We should remove redundant action
> triggers when we attach a table as a partition, no?

OK, I agree.  I have updated the patch to make things work that way.  With
the patch:

create table pk (a int primary key);
create table p (a int references pk) partition by list (a);

-- this query shows the action triggers on the referenced rel ('pk'), name
-- of the constraint that the trigger is part of and the foreign key rel
-- ('p', etc.)

select tgrelid::regclass as pkrel, c.conname as fkconname,
tgconstrrelid::regclass as fkrel from pg_trigger t, pg_constraint c where
tgrelid = 'pk'::regclass and tgconstraint = c.oid;
 pkrel │ fkconname │ fkrel
───┼───┼───
 pk│ p_a_fkey  │ p
 pk│ p_a_fkey  │ p
(2 rows)

create table p1 (
   a int references pk,
   foreign key (a) references pk (a) ON UPDATE CASCADE ON DELETE CASCADE
DEFERRABLE,
   foreign key (a) references pk (a) MATCH FULL ON UPDATE CASCADE ON
DELETE CASCADE
) partition by list (a);

-- p1_a_fkey on 'p1' is equivalent to p_a_fkey on 'p', but they're not
-- attached yet

select tgrelid::regclass as pkrel, conname as fkconname,
tgconstrrelid::regclass as fkrel from pg_trigger t, pg_constraint c where
tgrelid = 'pk'::regclass and tgconstraint = c.oid;
 pkrel │ fkconname  │ fkrel
───┼┼───
 pk│ p_a_fkey   │ p
 pk│ p_a_fkey   │ p
 pk│ p1_a_fkey  │ p1
 pk│ p1_a_fkey  │ p1
 pk│ p1_a_fkey1 │ p1
 pk│ p1_a_fkey1 │ p1
 pk│ p1_a_fkey2 │ p1
 pk│ p1_a_fkey2 │ p1
(8 rows)

create table p11 (like p1, foreign key (a) references pk);

-- again, p11_a_fkey, p1_a_fkey, and p_a_fkey are equivalent

select tgrelid::regclass as pkrel, conname as fkconname,
tgconstrrelid::regclass as fkrel from pg_trigger t, pg_constraint c where
tgrelid = 'pk'::regclass and tgconstraint = c.oid;
 pkrel │ fkconname  │ fkrel
───┼┼───
 pk│ p_a_fkey   │ p
 pk│ p_a_fkey   │ p
 pk│ p1_a_fkey  │ p1
 pk│ p1_a_fkey  │ p1
 pk│ p1_a_fkey1 │ p1
 pk│ p1_a_fkey1 │ p1
 pk│ p1_a_fkey2 │ p1
 pk│ p1_a_fkey2 │ p1
 pk│ p11_a_fkey │ p11
 pk│ p11_a_fkey │ p11
(10 rows)


alter table p1 attach partition p11 for values in (1);

-- p1_a_fkey and p11_a_fkey merged, so triggers for the latter dropped

select tgrelid::regclass as pkrel, conname as fkconname,
tgconstrrelid::regclass as fkrel from pg_trigger t, pg_constraint c where
tgrelid = 'pk'::regclass and tgconstraint = c.oid;
 pkrel │ fkconname  │ fkrel
───┼┼───
 pk│ p_a_fkey   │ p
 pk│ p_a_fkey   │ p
 pk│ p1_a_fkey  │ p1
 pk│ p1_a_fkey  │ p1
 pk│ p1_a_fkey1 │ p1
 pk│ p1_a_fkey1 │ p1
 pk│ p1_a_fkey2 │ p1
 pk│ p1_a_fkey2 │ p1
(8 rows)

-- p_a_fkey and p1_a_fkey merged, so triggers for the latter dropped

alter table p attach partition p1 for values in (1);

select tgrelid::regclass as pkrel, conname as fkconname,
tgconstrrelid::regclass as fkrel from pg_trigger t, pg_constraint c where
tgrelid = 'pk'::regclass and tgconstraint = c.oid;
 pkrel │ fkconname  │ fkrel
───┼┼───
 pk│ p_a_fkey   │ p
 pk│ p_a_fkey   │ p
 pk│ p1_a_fkey1 │ p1
 pk│ p1_a_fkey1 │ p1
 pk│ p1_a_fkey2 │ p1
 pk│ p1_a_fkey2 │ p1
(6 rows)


alter table p detach partition p1;

-- p1_a_fkey needs its own triggers again

select tgrelid::regclass as pkrel, conname as fkconname,
tgconstrrelid::regclass as fkrel from pg_trigger t, pg_constraint c where
tgrelid = 'pk'::regclass and tgconstraint = c.oid;
 pkrel │ fkconname  │ fkrel
───┼┼───
 pk│ p_a_fkey   │ p
 pk│ p_a_fkey   │ p
 pk│ p1_a_fkey1 │ p1
 pk│ p1_a_fkey1 │ p1
 pk│ p1_a_fkey2 │ p1
 pk│ p1_a_fkey2 │ p1
 pk│ p1_a_fkey  │ p1
 pk│ p1_a_fkey  │ p1
(8 rows)

alter table p1 detach partition p11;

-- p11_a_fkey needs its own triggers again

select tgrelid::regclass as pkrel, conname as fkconname,
tgconstrrelid::regclass as fkrel from pg_trigger t, pg_constraint c where
tgrelid = 'pk'::regclass and tgconstraint = c.oid;
 pkrel │ fkconname  │ fkrel
───┼┼───
 pk│ p_a_fkey   │ p
 pk│ p_a_fkey   │ p
 pk│ p1_a_fkey1 │ p1
 pk│ p1_a_fkey1 │ p1
 pk│ p1_a_fkey2 │ p1
 pk│ p1_a_fkey2 │ p1
 pk│ p1_a_fkey  │ p1
 pk│ 

RE: speeding up planning with partitions

2019-01-17 Thread Tsunakawa, Takayuki
Imai-san,

From: Amit Langote [mailto:langote_amit...@lab.ntt.co.jp]
> On 2019/01/09 11:08, Imai, Yoshikazu wrote:
> > I wonder why force_custom_plan is faster than auto after applying the patch.
> >
> > When we use PREPARE-EXECUTE, a generic plan is created and used if its
> > cost is cheaper than creating and using a custom plan with
> > plan_cache_mode='auto', while a custom plan is always created and used
> > with plan_cache_mode='force_custom_plan'.  So one can think the
> > difference in the above results is because of creating or using a
> > generic plan.
> >
> > I checked how many times a generic plan is created while executing
> > pgbench and found that a generic plan is created only once, and custom
> > plans are created at all other times, with plan_cache_mode='auto'.  I
> > also checked the time of creating a generic plan, but it didn't take
> > that much (250ms or so with 4096 partitions).  So the time of creating
> > a generic plan does not affect the performance.
> >
> > Currently, a generic plan is created at the sixth execution of the
> > EXECUTE query.  I changed it to happen much later (e.g. at the
> > 400,000th execution of the EXECUTE query on master with 4096
> > partitions, because 7000 TPS x 60 sec = 420,000 transactions are run
> > while executing pgbench), and then there was almost no difference
> > between auto and force_custom_plan.  I think that the creation of a
> > generic plan affects the time of executing the queries that come after
> > the generic plan is created.
> >
> > If my assumption is right, we can expect some additional processing to
> > occur when executing queries after a generic plan has been created,
> > which would make auto slower than force_custom_plan because of that
> > additional processing.  But looking at the above results, on master
> > with 4096 partitions, auto is faster than force_custom_plan.  So now I
> > am confused.
> >
> > Do you have any ideas about what affects the performance?
>
> Are you saying that, when using auto mode, all executions of the query
> starting from the 7th are slower than the first 5 executions?  That is, the
> latency of creating and using a custom plan increases *after* a generic
> plan is created and discarded on the 6th execution of the query?  If so,
> that is inexplicable to me.

Isn't CheckCachedPlan() (and AcquireExecutorLocks() therein) called in every 
EXECUTE after the 6th one due to some unknown issue?
Does choose_custom_plan() always return true after the 6th EXECUTE?

Regards
Takayuki Tsunakawa




Re: pg_dump multi VALUES INSERT

2019-01-17 Thread David G. Johnston
On Thu, Jan 17, 2019 at 5:15 AM Surafel Temesgen  wrote:
> On Fri, Jan 4, 2019 at 3:08 PM David Rowley  
> wrote:
>>
>> On Mon, 31 Dec 2018 at 18:58, Surafel Temesgen  wrote:
>>
>>
>> 2. Is --insert-multi a good name? What if they do --insert-multi=1?
>> That's not very "multi".  Is --rows-per-insert better?
>>
>
> --rows-per-insert is better for me .

Some thoughts/suggestions:

+ int dump_inserts_multiple;

The option name uses rows, seems like this should mirror that and be
named "dump_inserts_max_rows"

+ 
+  --rows-per-insert
+  
+   
+When using --rows-per-insert, this allows
the maximum number
+of rows per INSERT statement to be specified.
+   
+  
+ 

"When using ..." - no other
option description uses this redundant language and this should not
either.  Just say what it does.

This specifies the maximum number of rows (default 1) that will be
attached to each INSERT command generated by the
--inserts or --column-inserts
options; exactly one of which must be specified as well. (see my note
at the end)

+ printf(_("  --rows-per-insertnumber of row per INSERT
command\n"));

(maximum?) number of row[s] per INSERT command

+ qr/\Qpg_dump: option --on-conflict-do-nothing requires option
--inserts , --rows-per-insert or --column-inserts\E/,
+ 'pg_dump: option --on-conflict-do-nothing requires option --inserts
, --rows-per-insert or --column-inserts');

You don't put spaces on both sides of the comma, just after; and add a
comma before the "or" (I think...not withstanding the below)

I suggest we require that --rows-per-insert be dependent upon exactly
one of --inserts or --column-inserts being set and not let it be set
by itself (in which case the original message for
--on-conflict-do-nothing is OK).

David J.



Re: Fix function name in commet in vacuumlazy.c

2019-01-17 Thread Michael Paquier
On Fri, Jan 18, 2019 at 12:50:15PM +0900, Masahiko Sawada wrote:
> Attached small patch fixes the function name heap_vacuum_rel in the comment.
> 
> s/vacuum_heap_rel/heap_vacuum_rel/

Fixed, thanks.
--
Michael




Re: Pluggable Storage - Andres's take

2019-01-17 Thread Amit Khandekar
On Tue, 15 Jan 2019 at 17:58, Dmitry Dolgov <9erthali...@gmail.com> wrote:
>
> > On Tue, Jan 15, 2019 at 10:52 AM Amit Khandekar  
> > wrote:
> >
> > Need to bump K_VERS_MINOR as well.
>
> I've bumped it up, but somehow this change escaped the previous version. Now
> should be there, thanks!
>
> > On Mon, 14 Jan 2019 at 18:36, Amit Khandekar  wrote:
> > > +static void _selectTableAccessMethod(ArchiveHandle *AH, const char
> > > *tablespace);
> > > tablespace => tableam
> >
> > This is yet to be addressed.
>
> Fixed.

Thanks, the patch looks good to me. Of course there's the other thread
about ArchiveEntry arguments which may alter this patch, but
otherwise, I have no more comments on this patch.

>
> Also I guess another attached patch should address the psql part, namely
> displaying a table access method with \d+ and possibility to hide it with a
> psql variable (HIDE_TABLEAM, but I'm open for suggestion about the name).

Will have a look at this one.


-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company



Re: ArchiveEntry optional arguments refactoring

2019-01-17 Thread Amit Khandekar
On Wed, 16 Jan 2019 at 17:45, Dmitry Dolgov <9erthali...@gmail.com> wrote:
>
> Hi,
>
> During the discussion in [1] an idea about refactoring ArchiveEntry was
> suggested. The reason is that currently this function has significant number 
> of
> arguments that are "optional", and every change that has to deal with it
> introduces a lot of useless diffs. In the thread, mentioned above, such an
> example is tracking current table access method, and I guess "Remove WITH 
> OIDS"
> commit 578b229718e is also similar.
>
> Proposed idea is to refactor out all/optional arguments into a separate data
> structure, so that adding/removing a new argument wouldn't change that much of
> unrelated code. Then for every invocation of ArchiveEntry this structure needs
> to be prepared before the call, or as Andres suggested:
>
> ArchiveEntry((ArchiveArgs){.tablespace = 3,
>.dumpFn = somefunc,
>...});

I didn't know we could do it this way. I thought we would have to
declare a variable and have to initialize fields with non-const values
separately. This looks nice. We could even initialize fields with
non-const values. +1 from me.

I think we could use the same TocEntry structure as the parameter, rather
than a new structure. Most of the arguments already resemble fields of
this structure. Also, we could pass a pointer to that structure:

 ArchiveEntry( &(TocEntry){.tablespace = 3,
   .dumpFn = somefunc,
   ...});



-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company



Re: draft patch for strtof()

2019-01-17 Thread Andrew Gierth
This one builds ok on appveyor with at least three different VS
versions. Though I've not tried the exact combination of commands
used by cfbot... yet.

-- 
Andrew (irc:RhodiumToad)

diff --git a/configure b/configure
index 06fc3c6835..e3176e24e9 100755
--- a/configure
+++ b/configure
@@ -15802,6 +15802,19 @@ esac
 
 fi
 
+ac_fn_c_check_func "$LINENO" "strtof" "ac_cv_func_strtof"
+if test "x$ac_cv_func_strtof" = xyes; then :
+  $as_echo "#define HAVE_STRTOF 1" >>confdefs.h
+
+else
+  case " $LIBOBJS " in
+  *" strtof.$ac_objext "* ) ;;
+  *) LIBOBJS="$LIBOBJS strtof.$ac_objext"
+ ;;
+esac
+
+fi
+
 
 
 case $host_os in
diff --git a/configure.in b/configure.in
index 4efb912c4d..bdaab717d7 100644
--- a/configure.in
+++ b/configure.in
@@ -1703,6 +1703,7 @@ AC_REPLACE_FUNCS(m4_normalize([
 	strlcat
 	strlcpy
 	strnlen
+	strtof
 ]))
 
 case $host_os in
diff --git a/src/backend/utils/adt/float.c b/src/backend/utils/adt/float.c
index 117ded8d1d..31f53878a2 100644
--- a/src/backend/utils/adt/float.c
+++ b/src/backend/utils/adt/float.c
@@ -104,13 +104,39 @@ is_infinite(double val)
 
 /*
  *		float4in		- converts "num" to float4
+ *
+ * Note that this code now uses strtof(), where it used to use strtod().
+ *
+ * The motivation for using strtof() is to avoid a double-rounding problem:
+ * for certain decimal inputs, if you round the input correctly to a double,
+ * and then round the double to a float, the result is incorrect in that it
+ * does not match the result of rounding the decimal value to float directly.
+ *
+ * One of the best examples is 7.038531e-26:
+ *
+ * 0xAE43FDp-107 = 7.03853069185120912085...e-26
+ *  midpoint   7.038531022281...e-26
+ * 0xAE43FEp-107 = 7.03853130814879132477...e-26
+ *
+ * making 0xAE43FDp-107 the correct float result, but if you do the conversion
+ * via a double, you get
+ *
+ * 0xAE43FD.7FF8p-107 = 7.038530907487...e-26
+ *   midpoint   7.038530964884...e-26
+ * 0xAE43FD.8000p-107 = 7.038531022281...e-26
+ * 0xAE43FD.8008p-107 = 7.038531137076...e-26
+ *
+ * so the value rounds to the double exactly on the midpoint between the two
+ * nearest floats, and then rounding again to a float gives the incorrect
+ * result of 0xAE43FEp-107.
+ *
  */
 Datum
 float4in(PG_FUNCTION_ARGS)
 {
 	char	   *num = PG_GETARG_CSTRING(0);
 	char	   *orig_num;
-	double		val;
+	float		val;
 	char	   *endptr;
 
 	/*
@@ -135,7 +161,7 @@ float4in(PG_FUNCTION_ARGS)
 		"real", orig_num)));
 
 	errno = 0;
-	val = strtod(num, &endptr);
+	val = strtof(num, &endptr);
 
 	/* did we not see anything that looks like a double? */
 	if (endptr == num || errno != 0)
@@ -143,14 +169,14 @@ float4in(PG_FUNCTION_ARGS)
 		int			save_errno = errno;
 
 		/*
-		 * C99 requires that strtod() accept NaN, [+-]Infinity, and [+-]Inf,
+		 * C99 requires that strtof() accept NaN, [+-]Infinity, and [+-]Inf,
 		 * but not all platforms support all of these (and some accept them
 		 * but set ERANGE anyway...)  Therefore, we check for these inputs
-		 * ourselves if strtod() fails.
+		 * ourselves if strtof() fails.
 		 *
 		 * Note: C99 also requires hexadecimal input as well as some extended
 		 * forms of NaN, but we consider these forms unportable and don't try
-		 * to support them.  You can use 'em if your strtod() takes 'em.
+		 * to support them.  You can use 'em if your strtof() takes 'em.
 		 */
 		if (pg_strncasecmp(num, "NaN", 3) == 0)
 		{
@@ -195,8 +221,17 @@ float4in(PG_FUNCTION_ARGS)
 			 * precision).  We'd prefer not to throw error for that, so try to
 			 * detect whether it's a "real" out-of-range condition by checking
 			 * to see if the result is zero or huge.
+			 *
+			 * Use isinf() rather than HUGE_VALF on VS2013 because it generates
+			 * a spurious overflow warning for -HUGE_VALF.
 			 */
-			if (val == 0.0 || val >= HUGE_VAL || val <= -HUGE_VAL)
+			if (val == 0.0 ||
+#if defined(_MSC_VER) && (_MSC_VER < 1900)
+isinf(val)
+#else
+(val >= HUGE_VALF || val <= -HUGE_VALF)
+#endif
+)
 ereport(ERROR,
 		(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
 		 errmsg("\"%s\" is out of range for type real",
@@ -232,13 +267,7 @@ float4in(PG_FUNCTION_ARGS)
  errmsg("invalid input syntax for type %s: \"%s\"",
 		"real", orig_num)));
 
-	/*
-	 * if we get here, we have a legal double, still need to check to see if
-	 * it's a legal float4
-	 */
-	check_float4_val((float4) val, isinf(val), val == 0);
-
-	PG_RETURN_FLOAT4((float4) val);
+	PG_RETURN_FLOAT4(val);
 }
 
 /*
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index 9d99816eae..84ca929439 100644
--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -555,6 +555,9 @@
 /* Define to 1 if you have the `strsignal' function. */
 #undef HAVE_STRSIGNAL
 
+/* Define to 1 if you have the `strtof' function. */
+#undef HAVE_STRTOF
+
 /* Define to 1 if you have the `strtoll' function. */
 #undef HAVE_STRTOLL
 
diff --git a/src/include/pg_config.h.win32 

Re: Tid scan improvements

2019-01-17 Thread Edmund Horner
Hi all,

I am a bit stuck and I think it's best to try to explain where.

I'm still rebasing the patches for the changes Tom made to support
parameterised TID paths for joins.  While the addition of join support
itself does not really touch the same code, the modernisation -- in
particular, returning a list of RestrictInfos rather than raw quals -- does
rewrite quite a bit of tidpath.c.

The original code returned:

List (with OR semantics)
  CTID = ?   or   CTID = ANY (...)   or   IS CURRENT OF
  (more items)

That changed recently to return:

List (with OR semantics)
  RestrictInfo
CTID = ?   or ...
  (more items)

My last set of patches extended the tidqual extraction to pull out lists
(with AND semantics) of range quals of the form CTID < ?, etc.  Each list
of more than one item was converted into an AND clause before being added
to the tidqual list; a single range qual can be added to tidquals as is.

This required looking across multiple RestrictInfos at the top level, for
example:

  - "WHERE ctid > ? AND ctid < ?" would arrive at tidpath as a list of two
RestrictInfos, from which we extract a single tidqual in the form of an AND
clause.
  - "WHERE ctid = ? OR (ctid > ? AND ctid < ?)" arrives as only one
RestrictInfo, but we extract two tidquals (an OpExpr, and an AND clause).

The code could also ignore additional unusable quals from a list of
top-level RestrictInfos, or from a list of quals from an AND clause, for
example:

  - "WHERE foo = ? AND ctid > ? AND ctid < ?" gives us the single tidqual
"ctid > ? AND ctid < ?".
  - "WHERE (ctid = ? AND bar = ?) OR (foo = ? AND ctid > ? AND ctid < ?)"
gives us the two tidquals "ctid = ?" and "ctid > ? AND ctid < ?".

As the extracted tidquals no longer match the original query quals, they
aren't removed from scan_clauses in createplan.c, and hence are correctly
checked by the filter.

Aside: The analogous situation with an indexed user attribute "x" behaves a
bit differently:
  - "WHERE x = ? OR (x > ? AND x < ?)", won't use a regular index scan, but
might use a bitmap index scan.

My patch uses the same path type and executor for all extractable tidquals.

This worked pretty well, but I am finding it difficult to reimplement it in
the new tidpath.c code.

In the query information given to the path generator, there is no existing
RestrictInfo relating to the whole expression "ctid > ? AND ctid < ?".  I
am still learning about RestrictInfos, but my understanding is it doesn't
make sense to have a RestrictInfo for an AND clause, anyway; you're
supposed to have them for the sub-expressions of it.

And it doesn't seem a good idea to try to create new RestrictInfos in the
path generation just to pass the tidquals back to plan creation.  They're
complicated objects.

There's also the generation of scan_clauses in create_tidscan_plan
(createplan.c:3107).  This now uses RestrictInfos -- I'd image we'd need
each AND clause to be wrapped in a RestrictInfo to be able to check it
properly.

To summarise, I'm not sure what kind of structure I should add to the
tidquals list to represent a compound range expression.  Maybe it's better
to create a different path (either a new path type, or a flag in TidPath to
say what kind of quals are attached) ?

Edmund


Re: pg_dump multi VALUES INSERT

2019-01-17 Thread David Rowley
On Fri, 18 Jan 2019 at 01:15, Surafel Temesgen  wrote:
> The attache patch use your method mostly

I disagree with the "mostly" part.  As far as I can see, you took the
idea and then made a series of changes to completely break it.  For
bonus points, you put back my comment change to make it incorrect
again.

Here's what I got after applying your latest patch:

$ pg_dump --table=t --inserts --rows-per-insert=4 postgres

[...]
INSERT INTO public.t VALUES (1);
)
INSERT INTO public.t VALUES (, ( 2);
)
INSERT INTO public.t VALUES (, ( 3);
)
INSERT INTO public.t VALUES (, ( 4);
);
INSERT INTO public.t VALUES (5);
)
INSERT INTO public.t VALUES (, ( 6);
)
INSERT INTO public.t VALUES (, ( 7);
)
INSERT INTO public.t VALUES (, ( 8);
);
INSERT INTO public.t VALUES (9);
)
;

I didn't test, but I'm pretty sure that's not valid INSERT syntax.
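
For reference, the output one would expect with --rows-per-insert=4 here is
multi-row VALUES lists along these lines:

INSERT INTO public.t VALUES (1), (2), (3), (4);
INSERT INTO public.t VALUES (5), (6), (7), (8);
INSERT INTO public.t VALUES (9);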

I'd suggest taking my changes and doing the plumbing work to tie the
rows_per_statement into the command line arg instead of how I left it
hardcoded as 3.

>> +When using --inserts, this allows the maximum 
>> number
>> +of rows per INSERT statement to be specified.
>> +This setting defaults to 1.
>>
> I changed it too, except for "This setting defaults to 1", because it doesn't
> have a default value.
> 1 row per statement means the --inserts option.

If it does not default to 1 then what happens when the option is not
specified?

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: current_logfiles not following group access and instead follows log_file_mode permissions

2019-01-17 Thread Haribabu Kommi
On Thu, Jan 17, 2019 at 5:49 AM Stephen Frost  wrote:

> Greetings,
>
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
> > Stephen Frost  writes:
> > > * Michael Paquier (mich...@paquier.xyz) wrote:
> > >> On Tue, Jan 15, 2019 at 10:53:30AM -0500, Tom Lane wrote:
> > >>> On reflection, maybe the problem is not that we're giving the file
> > >>> the wrong permissions, but that we're putting it in the wrong place?
> >
> > > I'm not really sure putting it into the logfile directory is such a hot
> > > idea as users might have set up external log file rotation of files in
> > > that directory.  Of course, in that case they'd probably signal PG
> right
> > > afterwards and PG would go write out a new file, but it still seems
> > > pretty awkward.  I'm not terribly against solving this issue that way
> > > either though, but I tend to think the originally proposed patch is
> more
> > > sensible.
> >
> > I dunno, I think that the current design was made without any thought
> > whatsoever about the log-files-outside-the-data-directory case.  If
> > you're trying to set things up that way, it's because you want to give
> > logfile read access to people who shouldn't be able to look into the
> > data directory proper.  That makes current_logfiles pretty useless
> > to such people, as it's now designed.
>
> ... or you just want to move the log files to a more sensible location
> than the data directory.  The justification for log_file_mode existing
> is because you might want to have log files with different privileges,
> but that's quite a different thing.
>

Thanks for sharing your opinions.

current_logfiles is used to store metadata about the log files that are
currently being written to.  That is different from the log files themselves,
so giving it the log files' permissions may not be correct.

> Now, if the expectation is that current_logfiles is just an internal
> > working file that users shouldn't access directly, then this argument
> > is wrong --- but then why is it documented in user-facing docs?
>
> I really couldn't say why it's documented in the user-facing docs, and
> for my 2c I don't really think it should be- there's a function to get
> that information.  Sprinkling the data directory with files for users to
> access directly doesn't exactly fit my view of what a good API looks
> like.
>
> The fact that there isn't any discussion about where that file actually
> lives does make me suspect you're right that log files outside the data
> directory wasn't really contemplated.
>

The only case I can think of for reading this file directly is when the server
is not available, but I can't find any scenario where that is actually
required.
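
When the server is up, the function Stephen refers to already covers this
without reading the file directly, e.g.:

select pg_current_logfile();          -- current log file for the first configured format
select pg_current_logfile('csvlog');  -- or for a specific log format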



> > If we're going to accept the patch as-is, then it logically follows
> > that we should de-document current_logfiles, because we're taking the
> > position that it's an internal temporary file not meant for user access.
>
> ... and hopefully we'd get rid of it one day entirely.
>

If there is no use for it when the server is offline, it would be better to
remove that file and provide an alternative way to get the current log file
name.

With group access mode, the default value of log_file_mode has changed; the
attached patch reflects that in the docs.

Regards,
Haribabu Kommi
Fujitsu Australia


0001-log_file_mode-default-value-update.patch
Description: Binary data


Thread-unsafe coding in ecpg

2019-01-17 Thread Tom Lane
I've found that a couple of different OpenBSD 6.4 machines fall over
badly in the ecpg regression tests, with output like

test sql/parser   ... ok
test thread/thread... stdout stderr FAILED (test process was 
terminated by signal 6: Abort trap)
test thread/thread_implicit   ... stdout FAILED (test process was 
terminated by signal 10: Bus error)
test thread/prep  ... ok (test process was terminated by signal 
10: Bus error)
test thread/alloc ... stderr FAILED (test process was 
terminated by signal 6: Abort trap)
test thread/descriptor... ok

It's somewhat variable as to which tests fail, but it's always thread
tests.  Examining the core dumps shows traces like

#0  thrkill () at -:3
#1  0x0c04f427dd6e in _libc_abort () at /usr/src/lib/libc/stdlib/abort.c:51
#2  0x0c04f425f7e9 in wrterror (d=Variable "d" is not available.
)
at /usr/src/lib/libc/stdlib/malloc.c:291
#3  0x0c04f42628fb in find_chunknum (d=Variable "d" is not available.
)
at /usr/src/lib/libc/stdlib/malloc.c:1043
#4  0x0c04f425fe23 in ofree (argpool=Variable "argpool" is not available.
)
at /usr/src/lib/libc/stdlib/malloc.c:1359
#5  0x0c04f425f8ec in free (ptr=0xc04df0e26e0)
at /usr/src/lib/libc/stdlib/malloc.c:1419
#6  0x0c04f427ec83 in freegl (oldgl=0xc05a022d080)
at /usr/src/lib/libc/locale/setlocale.c:32
#7  0x0c04f427eb49 in _libc_setlocale (category=4, 
locname=0xc059b605180 "C") at /usr/src/lib/libc/locale/setlocale.c:177
#8  0x0c0531a6f955 in ecpg_do_epilogue (stmt=0xc0587bb0c00)
at execute.c:1986
#9  0x0c0531a6fa65 in ecpg_do (lineno=Variable "lineno" is not available.
) at execute.c:2018
#10 0x0c0531a6fb31 in ECPGdo (lineno=Variable "lineno" is not available.
) at execute.c:2037
#11 0x0c02a9f00b19 in test_thread (arg=Variable "arg" is not available.
) at thread.pgc:131
#12 0x0c04b180b26e in _rthread_start (v=Variable "v" is not available.
)
at /usr/src/lib/librthread/rthread.c:96
#13 0x0c04f42ba77b in __tfork_thread ()
at /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:75
#14 0x in ?? ()

or this one:

#0  _libc_strlcpy (dst=0x2c61e133ee0 "C", 
src=0xdfdfdfdfdfdfdfdf , 
dsize=256) at /usr/src/lib/libc/string/strlcpy.c:36
#1  0x02c61de99a71 in _libc_setlocale (category=4, locname=0x0)
   from /usr/lib/libc.so.92.5
#2  0x02c5ae2693a8 in ecpg_do_prologue (lineno=59, compat=0, 
force_indicator=1, connection_name=0x0, questionmarks=false, 
statement_type=ECPGst_execute, query=0x2c333701418 "i", 
args=0x2c61a96f6d0, stmt_out=0x2c61a96f5e0) at execute.c:1776
#3  0x02c5ae269a20 in ecpg_do (lineno=Variable "lineno" is not available.
) at execute.c:2001
#4  0x02c5ae269b31 in ECPGdo (lineno=Variable "lineno" is not available.
) at execute.c:2037
#5  0x02c333600b47 in fn (arg=Variable "arg" is not available.
) at prep.pgc:59
#6  0x02c56a00b26e in _rthread_start (v=Variable "v" is not available.
)
at /usr/src/lib/librthread/rthread.c:96
#7  0x02c61ded577b in __tfork_thread ()
at /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:75
#8  0x in ?? ()


The common denominator is always a call to setlocale(), and
that generally is calling malloc or free or some other libc
function that is unhappy.  When output appears on stderr,
it's usually free complaining about double-frees or some such.

So my conclusion is that this version of setlocale() has some
thread-safety issues.  I was all set to go file a bug report
when I noticed this in the POSIX spec for setlocale:

The setlocale() function need not be thread-safe.

as well as this:

The global locale established using setlocale() shall only be used
in threads for which no current locale has been set using
uselocale() or whose current locale has been set to the global
locale using uselocale(LC_GLOBAL_LOCALE).

IOW, not only is setlocale() not necessarily thread-safe itself,
but using it to change locales in a multithread program is seriously
unsafe because of concurrent effects on other threads.

Therefore, it's plain crazy for ecpg to be calling setlocale() inside
threaded code.  It looks to me like what ecpg is doing is trying to defend
itself against non-C LC_NUMERIC settings, which is laudable, but this
implementation of that is totally unsafe.

Don't know what's the best way out of this.  The simplest thing would
be to just remove that code and document that you'd better run ecpg
in LC_NUMERIC locale, but it'd be nice if we could do better.
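
One thread-safe possibility, assuming uselocale() and friends from POSIX 2008
are available on the platforms we care about, would be to switch only the
calling thread's LC_NUMERIC around the numeric formatting, roughly like this
sketch (not ecpg's actual code):

    #include <locale.h>

    /* Sketch only: change LC_NUMERIC for this thread, not the whole process */
    locale_t    c_locale = newlocale(LC_NUMERIC_MASK, "C", (locale_t) 0);

    if (c_locale != (locale_t) 0)
    {
        locale_t    save = uselocale(c_locale);

        /* ... format or parse the numeric values here ... */

        uselocale(save);
        freelocale(c_locale);
    }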

regards, tom lane



Fix function name in comment in vacuumlazy.c

2019-01-17 Thread Masahiko Sawada
Hi,

Attached small patch fixes the function name heap_vacuum_rel in the comment.

s/vacuum_heap_rel/heap_vacuum_rel/

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


fix_heap_vacuum_rel.patch
Description: Binary data


RE: Libpq support to connect to standby server as priority

2019-01-17 Thread Tsunakawa, Takayuki
From: Laurenz Albe [mailto:laurenz.a...@cybertec.at]
> I think that transaction_read_only is good.
> 
> If it is set to false, we are sure to be on a replication primary or
> stand-alone server, which is enough to know for the load balancing use case.

As Tatsuo-san said, setting default_transaction_read_only leads to a 
misjudgement of the primary.


> I deem it unlikely that someone will set default_transaction_read_only to
> FALSE and then complain that the feature is not working as expected, but
> again
> I cannot prove that claim.

I wonder what default_transaction_read_only exists for.  For making the database 
read-only by default and allowing only specific users to write to it with 
"CREATE/ALTER USER SET default_transaction_read_only = false"?

I'm sorry to repeat myself, but anyway, I think we need a method to connect to 
a standby, as originally desired, because the primary instance may be read-only 
by default while only limited users update data.  That's for reducing the 
burden on the primary and minimizing the impact on users who update data.  For 
example,

* run data reporting on the standby
* backup the database from the standby
* cascade replication from the standby



> As Robert said, transaction_read_only might even give you the option to
> use the feature for more than just load balancing between replication master
> and standby.

What use case do you have in mind?  If you want to load balance the workload 
between the primary and standbys, we can follow PgJDBC -- targetServerType=any.


Regards
Takayuki Tsunakawa




Shouldn't current_schema() be at least PARALLEL RESTRICTED?

2019-01-17 Thread Michael Paquier
Hi all,

While working on the recent issues with 2PC and temporary objects I
have added a test case based on current_schema() for the first time in
history, and the buildfarm complained about it, as mentioned here:
https://www.postgresql.org/message-id/20190118005949.gd1...@paquier.xyz

This has been causing crake and lapwing to complain about
current_schema() failing to create a temporary schema, which can
happen when it is invoked for the first time in a session:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2019-01-18%2000%3A34%3A20
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lapwing&dt=2019-01-18%2001%3A20%3A01

Here is the problem:
SET search_path TO 'pg_temp';
BEGIN;
SELECT current_schema() ~ 'pg_temp' AS is_temp_schema;
- is_temp_schema
-
- t
-(1 row)
-
+ERROR:  cannot create temporary tables during a parallel operation
PREPARE TRANSACTION 'twophase_search';
-ERROR:  cannot PREPARE a transaction that has operated on temporary namespace

current_schemas() also has this problem.

For now I have stabilized the test by making sure that non-parallel
plans get used, which makes the buildfarm happy, but that's just a
workaround in my opinion.  One of the reasons why current_schema()
can create temporary schemas is that there are strong dependencies
with search_path and the way sessions use it, hence it seems to me
that it would be better to mark the function at least parallel
restricted on HEAD to avoid any unstable behavior.

Thoughts?
--
Michael


signature.asc
Description: PGP signature


Re: Feature: temporary materialized views

2019-01-17 Thread Mitar
Hi!

On Thu, Jan 17, 2019 at 2:40 PM Andreas Karlsson  wrote:
> I did some functional testing today and everything seems to work as
> expected other than that the tab completion for psql seems to be missing.

Thanks. I can add those as soon as I figure out how. :-)

So what are the next steps here besides tab autocompletion? Is it OK to
remove that security check? If I understand correctly, there is some
general refactoring of the code that Tom is proposing, but I am not sure
if I am able to do that or fully understand it.


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m



Re: Feature: temporary materialized views

2019-01-17 Thread Mitar
Hi!

On Thu, Jan 17, 2019 at 9:53 AM Andreas Karlsson  wrote:
> > What is the stumbling block to just leaving that alone?
>
> I think the issue Mitar ran into is that the temporary materialized view
> is created in the rStartup callback of the receiver which happens after
> SECURITY_RESTRICTED_OPERATION is set in ExecCreateTableAs(), so the
> creation of the view itself is denied.

Yes, the error without that change is:

ERROR:  cannot create temporary table within security-restricted operation


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m



Re: [HACKERS] Block level parallel vacuum

2019-01-17 Thread Haribabu Kommi
On Tue, Jan 15, 2019 at 6:00 PM Masahiko Sawada 
wrote:

>
> Rebased.
>

I started reviewing the patch, I didn't finish my review yet.
Following are some of the comments.

+PARALLEL N
+
+ 
+  Execute index vacuum and cleanup index in parallel with

I doubt that users can understand the terms index vacuum and cleanup index.
Maybe it needs some more detailed information.


- VACOPT_DISABLE_PAGE_SKIPPING = 1 << 7 /* don't skip any pages */
+ VACOPT_PARALLEL = 1 << 7, /* do lazy VACUUM in parallel */
+ VACOPT_DISABLE_PAGE_SKIPPING = 1 << 8 /* don't skip any pages */
+} VacuumOptionFlag;

Any specific reason for not adding it as the last member of the enum?


-typedef enum VacuumOption
+typedef enum VacuumOptionFlag
 {

I don't find the new name very good; how about VacuumFlags?


+typedef struct VacuumOption
+{

How about VacuumOptions? This structure can contain all the
options provided to the vacuum operation.



+ vacopt1->flags |= vacopt2->flags;
+ if (vacopt2->flags == VACOPT_PARALLEL)
+ vacopt1->nworkers = vacopt2->nworkers;
+ pfree(vacopt2);
+ $$ = vacopt1;
+ }

As the above statement indicates, only the last specified number of parallel
workers is taken into account; can we explain that in the docs?


postgres=# vacuum (parallel 2, verbose) tbl;

With verbose, no information related to parallel workers is shown.
I feel that information should be provided even when it is not a parallel
vacuum.


Regards,
Haribabu Kommi
Fujitsu Australia


Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)

2019-01-17 Thread Peter Geoghegan
On Thu, Jan 17, 2019 at 5:09 PM Peter Geoghegan  wrote:
> In the kludgey patch that I posted, the 4-byte value is manufactured
> artificially within a backend in descending order. That may have a
> slight advantage over object oid, even after the pg_depend correctness
> issues are addressed. A fixed order within a backend or originating
> transaction seems like it might avoid a few remaining instability
> issues. Not sure. I seem to recall there being some disagreement
> between you and Andres on this very point (is object oid a perfectly
> stable tie-breaker in practice?) on a similar thread from 2017.

Here are your remarks about it on that 2017 thread:
https://www.postgresql.org/message-id/11852.1501610262%40sss.pgh.pa.us

-- 
Peter Geoghegan



Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)

2019-01-17 Thread Peter Geoghegan
On Thu, Jan 17, 2019 at 4:40 PM Tom Lane  wrote:
> Yeah, that's the policy we've followed so far, but I remain concerned
> about its effects on the regression tests.  There are a lot of places
> where we print full DROP CASCADE output because "it hasn't failed yet".
> I fear every one of those is a gotcha that's waiting to bite us.

Right. I don't want to have to play whack-a-mole with this later on,
and risk suppressing the wrong thing.

> Also, is index scan order really guaranteed for equal-keyed items?
> It isn't today, and I didn't think your patches were going to make
> it so.

The idea is that you'd have an extra column, so they wouldn't be
equal-keyed (the keys that the scan was interested in would be
duplicated, but we could still rely on the ordering). Are you
suggesting that there'd be a stability issue regardless?

My patch does guarantee order for equal-keyed items, since heap TID is
treated as just another attribute, at least as far as the nbtree key
space is concerned. The big unknown is how exactly that works out when
the regression tests are run in parallel, on a busy system, since heap
TID isn't just another column outside of nbtree. I think that this
will probably cause test flappiness if we don't nail it down now. That
was certainly my experience before I nailed it down in my own
tentative way. My sense is that without addressing this issue, we're
going to be sensitive to concurrent heap TID recycling by VACUUM in a
way that might become an annoyance. I see no reason to not go fix it
once and for all, since the vast majority of all problems are around
the two pg_depend indexes.

> We're going to stick all these items into an ObjectAddress array anyway,
> so at worst it'd be 2X growth, most likely a lot less since we'd only
> be sorting one level of dependency at a time.

It sounds like we don't have a good reason to not just sort them
explicitly, then. I'm happy to go that way. I mostly just wanted to be
sure that you were aware that we could add a tie-breaker column
without any storage overhead.

> > As I've pointed out a couple of times already, we can add a 4 byte
> > tie-breaker column to both pg_depend indexes without increasing the
> > size of the on-disk representation, since the extra space is already
> > lost to alignment (we could even add a new 4 byte column to the table
> > without any storage overhead, if that happened to make sense).
>
> Meh ... where do you get the 4-byte value from?

In the kludgey patch that I posted, the 4-byte value is manufactured
artificially within a backend in descending order. That may have a
slight advantage over object oid, even after the pg_depend correctness
issues are addressed. A fixed order within a backend or originating
transaction seems like it might avoid a few remaining instability
issues. Not sure. I seem to recall there being some disagreement
between you and Andres on this very point (is object oid a perfectly
stable tie-breaker in practice?) on a similar thread from 2017.

-- 
Peter Geoghegan



Re: Libpq support to connect to standby server as priority

2019-01-17 Thread Dave Cramer
On Thu, 17 Jan 2019 at 19:56, Tatsuo Ishii  wrote:

> >> > I'm curious; under what circumstances would the above occur?
> >>
> >> Former primary goes down and one of standbys is promoting but it is
> >> not promoted to new primary yet.
> >>
> >
> > seems like JDBC might have some work to do...Thanks
> >
> > I'm going to wait to implement until we resolve this discussion
>
> If you need some input from me regarding finding a primary node,
> please say so.  While working on Pgpool-II project, I learned the
> necessity in a hard way.
>
>
I would really like to have a consistent way of doing this, and consistent
terms for the connection parameters.

That said, yes, I would like input from you.

Thanks,

Dave


Re: Libpq support to connect to standby server as priority

2019-01-17 Thread Tatsuo Ishii
>> > I'm curious; under what circumstances would the above occur?
>>
>> Former primary goes down and one of standbys is promoting but it is
>> not promoted to new primary yet.
>>
> 
> seems like JDBC might have some work to do...Thanks
> 
> I'm going to wait to implement until we resolve this discussion

If you need some input from me regarding finding a primary node,
please say so.  While working on Pgpool-II project, I learned the
necessity in a hard way.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-17 Thread Michael Paquier
On Thu, Jan 17, 2019 at 07:21:08PM -0500, Tom Lane wrote:
> Sorry, I don't buy this line of argument.  Reasonable test design requires
> making cost/benefit tradeoffs: the cost to run the test over and over,
> and the cost to maintain the test itself (e.g. fix portability issues in
> it) have to be balanced against the probability of it finding something
> useful.  I judge that the chance of this particular test finding something
> is small, and I've had quite enough of the maintenance costs.

Yes, I agree with Tom's line of thoughts here.  It seems to me that
just dropping this part of the test is just but fine.
--
Michael


signature.asc
Description: PGP signature


Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)

2019-01-17 Thread Tom Lane
Peter Geoghegan  writes:
> On Thu, Jan 17, 2019 at 12:42 PM Tom Lane  wrote:
>> Now, perhaps we should make such stability a design goal, as it'd allow
>> us to get rid of some "suppress the cascade outputs" hacks in the
>> regression tests.  But it's a bit of a new feature.  If we wanted to
>> do that, I'd be inclined to do it by absorbing all the pg_depend entries
>> for a particular object into an ObjectAddress array and then sorting
>> them before we process them.  The main stumbling block here is "what
>> would the sort order be?".  The best idea I can come up with offhand
>> is to sort by OID, which at least for regression test purposes would
>> mean objects would be listed/processed more or less in creation order.

> I think that we might as well have a stable order. Maybe an explicit
> sort step is unnecessary -- we can actually rely on scan order, while
> accepting you'll get a different order with "ignore_system_indexes=on"
> (though without getting substantively different/incorrect messages).

Yeah, that's the policy we've followed so far, but I remain concerned
about its effects on the regression tests.  There are a lot of places
where we print full DROP CASCADE output because "it hasn't failed yet".
I fear every one of those is a gotcha that's waiting to bite us.

Also, is index scan order really guaranteed for equal-keyed items?
It isn't today, and I didn't think your patches were going to make
it so.

> I'm slightly concerned that an explicit sort step might present
> difficulties in extreme cases. How much memory are we prepared to
> allocate, just to get a stable order?

We're going to stick all these items into an ObjectAddress array anyway,
so at worst it'd be 2X growth, most likely a lot less since we'd only
be sorting one level of dependency at a time.
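
For what it's worth, the per-level sort by OID could be as simple as a qsort()
comparator over the ObjectAddress array, along these lines (just a sketch; the
secondary keys and the names in the usage comment are arbitrary choices, not
part of the proposal):

    /* sketch only: order one level of pending deletions by object OID */
    static int
    object_address_oid_cmp(const void *a, const void *b)
    {
        const ObjectAddress *obja = (const ObjectAddress *) a;
        const ObjectAddress *objb = (const ObjectAddress *) b;

        if (obja->objectId != objb->objectId)
            return (obja->objectId < objb->objectId) ? -1 : 1;
        if (obja->classId != objb->classId)
            return (obja->classId < objb->classId) ? -1 : 1;
        return obja->objectSubId - objb->objectSubId;
    }

    /* usage, roughly:
     *   qsort(addrs, naddrs, sizeof(ObjectAddress), object_address_oid_cmp);
     */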

> As I've pointed out a couple of times already, we can add a 4 byte
> tie-breaker column to both pg_depend indexes without increasing the
> size of the on-disk representation, since the extra space is already
> lost to alignment (we could even add a new 4 byte column to the table
> without any storage overhead, if that happened to make sense).

Meh ... where do you get the 4-byte value from?

regards, tom lane



Re: Libpq support to connect to standby server as priority

2019-01-17 Thread Dave Cramer
On Thu, 17 Jan 2019 at 19:38, Tatsuo Ishii  wrote:

> >> >> >> > From: Tatsuo Ishii [mailto:is...@sraoss.co.jp]
> >> >> >> >> But pg_is_in_recovery() returns true even for a promoting
> >> standby. So
> >> >> >> >> you have to wait and retry to send pg_is_in_recovery() until it
> >> >> >> >> finishes the promotion to find out it is now a primary. I am
> not
> >> sure
> >> >> >> >> if backend out to be responsible for this process. If not,
> libpq
> >> >> would
> >> >> >> >> need to handle it but I doubt it would be possible.
> >> >> >> >
> >> >> >> > Yes, the application needs to retry connection attempts until
> >> success.
> >> >> >> That's not different from PgJDBC and other DBMSs.
> >> >> >>
> >> >> >> I don't know what PgJDBC is doing, however I think libpq needs to
> do
> >> >> >> more than just retrying.
> >> >> >>
> >> >> >> 1) Try to find a node on which pg_is_in_recovery() returns false.
> If
> >> >> >>found, then we assume that is the primary. We also assume that
> >> >> >>other nodes are standbys. done.
> >> >> >>
> >> >> >> 2) If there's no node on which pg_is_in_recovery() returns false,
> >> then
> >> >> >>we need to retry until we find it. To not retry forever, there
> >> >> >>should be a timeout counter parameter.
> >> >> >>
> >> >> >>
> >> >> > IIRC this is essentially what pgJDBC does.
> >> >>
> >> >> Thanks for clarifying that. Pgpool-II also does that too. Seems like
> a
> >> >> common technique to find out a primary node.
> >> >>
> >> >>
> >> > Checking the code I see we actually use show transaction_read_only.
> >> >
> >> > Sorry for the confusion
> >>
> >> So if all PostgreSQL servers returns transaction_read_only = on, how
> >> does pgJDBC find the primary node?
> >>
> >> well preferSecondary would return a connection.
>
> This is not my message :-)
>
> > I'm curious; under what circumstances would the above occur?
>
> Former primary goes down and one of standbys is promoting but it is
> not promoted to new primary yet.
>

seems like JDBC might have some work to do...Thanks

I'm going to wait to implement until we resolve this discussion

Dave

>
>


Re: Libpq support to connect to standby server as priority

2019-01-17 Thread Tatsuo Ishii
>> >> >> > From: Tatsuo Ishii [mailto:is...@sraoss.co.jp]
>> >> >> >> But pg_is_in_recovery() returns true even for a promoting
>> standby. So
>> >> >> >> you have to wait and retry to send pg_is_in_recovery() until it
>> >> >> >> finishes the promotion to find out it is now a primary. I am not
>> sure
>> >> >> >> if backend out to be responsible for this process. If not, libpq
>> >> would
>> >> >> >> need to handle it but I doubt it would be possible.
>> >> >> >
>> >> >> > Yes, the application needs to retry connection attempts until
>> success.
>> >> >> That's not different from PgJDBC and other DBMSs.
>> >> >>
>> >> >> I don't know what PgJDBC is doing, however I think libpq needs to do
>> >> >> more than just retrying.
>> >> >>
>> >> >> 1) Try to find a node on which pg_is_in_recovery() returns false. If
>> >> >>found, then we assume that is the primary. We also assume that
>> >> >>other nodes are standbys. done.
>> >> >>
>> >> >> 2) If there's no node on which pg_is_in_recovery() returns false,
>> then
>> >> >>we need to retry until we find it. To not retry forever, there
>> >> >>should be a timeout counter parameter.
>> >> >>
>> >> >>
>> >> > IIRC this is essentially what pgJDBC does.
>> >>
>> >> Thanks for clarifying that. Pgpool-II also does that too. Seems like a
>> >> common technique to find out a primary node.
>> >>
>> >>
>> > Checking the code I see we actually use show transaction_read_only.
>> >
>> > Sorry for the confusion
>>
>> So if all PostgreSQL servers returns transaction_read_only = on, how
>> does pgJDBC find the primary node?
>>
>> well preferSecondary would return a connection.

This is not my message :-)

> I'm curious; under what circumstances would the above occur?

Former primary goes down and one of standbys is promoting but it is
not promoted to new primary yet.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



Re: Libpq support to connect to standby server as priority

2019-01-17 Thread Dave Cramer
On Thu, 17 Jan 2019 at 19:09, Tsunakawa, Takayuki <
tsunakawa.ta...@jp.fujitsu.com> wrote:

> From: Dave Cramer [mailto:p...@fastcrypt.com]
> >   >> 2) If there's no node on which pg_is_in_recovery() returns
> false,
> > then
> >   >>we need to retry until we find it. To not retry forever,
> there
> >   >>should be a timeout counter parameter.
>
> > Checking the code I see we actually use show transaction_read_only.
>
> Also, does PgJDBC really repeat connection attempts for a user-specified
> duration?  Having a quick look at the code, it seemed to try each host once
> in a while loop.
>

You are correct; looking at the code again, on the initial connection
attempt we only try once.


Dave Cramer

da...@postgresintl.com
www.postgresintl.com

>
>


Re: Libpq support to connect to standby server as priority

2019-01-17 Thread Dave Cramer
On Thu, 17 Jan 2019 at 19:15, Tatsuo Ishii  wrote:

> > On Thu, 17 Jan 2019 at 18:03, Tatsuo Ishii  wrote:
> >
> >> > On Wed, 16 Jan 2019 at 01:02, Tatsuo Ishii 
> wrote:
> >> >
> >> >> > From: Tatsuo Ishii [mailto:is...@sraoss.co.jp]
> >> >> >> But pg_is_in_recovery() returns true even for a promoting
> standby. So
> >> >> >> you have to wait and retry to send pg_is_in_recovery() until it
> >> >> >> finishes the promotion to find out it is now a primary. I am not
> sure
> >> >> >> if backend out to be responsible for this process. If not, libpq
> >> would
> >> >> >> need to handle it but I doubt it would be possible.
> >> >> >
> >> >> > Yes, the application needs to retry connection attempts until
> success.
> >> >> That's not different from PgJDBC and other DBMSs.
> >> >>
> >> >> I don't know what PgJDBC is doing, however I think libpq needs to do
> >> >> more than just retrying.
> >> >>
> >> >> 1) Try to find a node on which pg_is_in_recovery() returns false. If
> >> >>found, then we assume that is the primary. We also assume that
> >> >>other nodes are standbys. done.
> >> >>
> >> >> 2) If there's no node on which pg_is_in_recovery() returns false,
> then
> >> >>we need to retry until we find it. To not retry forever, there
> >> >>should be a timeout counter parameter.
> >> >>
> >> >>
> >> > IIRC this is essentially what pgJDBC does.
> >>
> >> Thanks for clarifying that. Pgpool-II also does that too. Seems like a
> >> common technique to find out a primary node.
> >>
> >>
> > Checking the code I see we actually use show transaction_read_only.
> >
> > Sorry for the confusion
>
> So if all PostgreSQL servers returns transaction_read_only = on, how
> does pgJDBC find the primary node?
>
> well preferSecondary would return a connection.

I'm curious; under what circumstances would the above occur?

Regards,
Dave


Re: pgsql: Remove references to Majordomo

2019-01-17 Thread Michael Paquier
On Thu, Jan 17, 2019 at 01:04:44PM +, Magnus Hagander wrote:
> Remove references to Majordomo
> 
> Lists are not handled by Majordomo anymore and haven't been for a while,
> so remove the reference and instead direct people to the list server.

Wouldn't it be better to also switch the references to pgsql-bugs in
all the C code for the different --help outputs?
--
Michael


signature.asc
Description: PGP signature


Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-17 Thread Tom Lane
Fabien COELHO  writes:
>> I am, TBH, inclined to fix this by removing that test case rather
>> than teaching it another spelling to accept.  I think it's very
>> hard to make the case that tests like this one are anything but
>> a waste of developer and buildfarm time.  When they are also a
>> portability hazard, it's time to cut our losses.  (I also note
>> that this test has caused us problems before, cf 869aa40a2 and
>> 933851033.)

> I'd rather keep it by simply adding the "|unknown" alternative. 30 years 
> of programming have taught me that testing limit & error cases is useful, 
> although you never know when it will be proven so.

Sorry, I don't buy this line of argument.  Reasonable test design requires
making cost/benefit tradeoffs: the cost to run the test over and over,
and the cost to maintain the test itself (e.g. fix portability issues in
it) have to be balanced against the probability of it finding something
useful.  I judge that the chance of this particular test finding something
is small, and I've had quite enough of the maintenance costs.

Just to point up that we're still not clearly done with the maintenance
costs of supposing that we know how every version of getopt_long will
spell this error message, I note that my Linux box seems to have two
variants of it:

$ pgbench -z 
pgbench: invalid option -- 'z'
Try "pgbench --help" for more information.
$ pgbench --z
pgbench: unrecognized option '--z'
Try "pgbench --help" for more information.

of which the "invalid" alternative is also not in our list right now.
Who's to say how many more versions of getopt_long are out there,
or what the maintainers thereof might do in the future?

> I agree that some tests can be useless, but I do not think that it applies 
> to this one. This test also checks that under a bad option pgbench stops 
> with an appropriate 1 exit status.

It's possible that it's worth the trouble to check for exit status 1,
but I entirely fail to see the point of checking exactly what is the
spelling of a message that is issued by code not under our control.

Looking closer at the test case:

[
'bad option',
'-h home -p 5432 -U calvin -d --bad-option',
[ qr{(unrecognized|illegal) option}, qr{--help.*more information} ]
],

ISTM that just removing the first qr{} pattern, and checking only that
we get the text that *is* under our control, is a reasonable compromise
here.

regards, tom lane



Re: Libpq support to connect to standby server as priority

2019-01-17 Thread Tatsuo Ishii
> On Thu, 17 Jan 2019 at 18:03, Tatsuo Ishii  wrote:
> 
>> > On Wed, 16 Jan 2019 at 01:02, Tatsuo Ishii  wrote:
>> >
>> >> > From: Tatsuo Ishii [mailto:is...@sraoss.co.jp]
>> >> >> But pg_is_in_recovery() returns true even for a promoting standby. So
>> >> >> you have to wait and retry to send pg_is_in_recovery() until it
>> >> >> finishes the promotion to find out it is now a primary. I am not sure
>> >> >> if backend out to be responsible for this process. If not, libpq
>> would
>> >> >> need to handle it but I doubt it would be possible.
>> >> >
>> >> > Yes, the application needs to retry connection attempts until success.
>> >> That's not different from PgJDBC and other DBMSs.
>> >>
>> >> I don't know what PgJDBC is doing, however I think libpq needs to do
>> >> more than just retrying.
>> >>
>> >> 1) Try to find a node on which pg_is_in_recovery() returns false. If
>> >>found, then we assume that is the primary. We also assume that
>> >>other nodes are standbys. done.
>> >>
>> >> 2) If there's no node on which pg_is_in_recovery() returns false, then
>> >>we need to retry until we find it. To not retry forever, there
>> >>should be a timeout counter parameter.
>> >>
>> >>
>> > IIRC this is essentially what pgJDBC does.
>>
>> Thanks for clarifying that. Pgpool-II also does that too. Seems like a
>> common technique to find out a primary node.
>>
>>
> Checking the code I see we actually use show transaction_read_only.
> 
> Sorry for the confusion

So if all PostgreSQL servers returns transaction_read_only = on, how
does pgJDBC find the primary node?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



RE: Libpq support to connect to standby server as priority

2019-01-17 Thread Tsunakawa, Takayuki
From: Dave Cramer [mailto:p...@fastcrypt.com]
>   >> 2) If there's no node on which pg_is_in_recovery() returns false,
> then
>   >>we need to retry until we find it. To not retry forever, there
>   >>should be a timeout counter parameter.

> Checking the code I see we actually use show transaction_read_only.

Also, does PgJDBC really repeat connection attempts for a user-specified 
duration?  Having a quick look at the code, it seemed to try each host once in 
a while loop.


Regards
Takayuki Tsunakawa




Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)

2019-01-17 Thread Peter Geoghegan
On Thu, Jan 17, 2019 at 12:42 PM Tom Lane  wrote:
> So I poked around for awhile with running the regression tests under
> ignore_system_indexes.  There seem to be a number of issues involved
> here.  To a significant extent, they aren't bugs, at least not according
> to the original conception of the dependency code: it was not a design
> goal that different dependencies of the same object-to-be-deleted would
> be processed in a fixed order.

I agree that it's the exceptional cases that are of concern here. The
vast majority of the changes you'll see with
"ignore_system_indexes=on" are noise.

> Now, perhaps we should make such stability a design goal, as it'd allow
> us to get rid of some "suppress the cascade outputs" hacks in the
> regression tests.  But it's a bit of a new feature.  If we wanted to
> do that, I'd be inclined to do it by absorbing all the pg_depend entries
> for a particular object into an ObjectAddress array and then sorting
> them before we process them.  The main stumbling block here is "what
> would the sort order be?".  The best idea I can come up with offhand
> is to sort by OID, which at least for regression test purposes would
> mean objects would be listed/processed more or less in creation order.

I think that we might as well have a stable order. Maybe an explicit
sort step is unnecessary -- we can actually rely on scan order, while
accepting you'll get a different order with "ignore_system_indexes=on"
(though without getting substantively different/incorrect messages).
I'm slightly concerned that an explicit sort step might present
difficulties in extreme cases. How much memory are we prepared to
allocate, just to get a stable order?

It probably won't really matter what the specific order is, once the
current problems (the DEPENDENCY_INTERNAL_AUTO issue and the issue
you'll fix with DEPFLAG_IS_SUBOBJECT) are handled in a direct manner.
As I've pointed out a couple of times already, we can add a 4 byte
tie-breaker column to both pg_depend indexes without increasing the
size of the on-disk representation, since the extra space is already
lost to alignment (we could even add a new 4 byte column to the table
without any storage overhead, if that happened to make sense).

What is the likelihood that somebody will ever find a better use for
this alignment padding? These two indexes are typically the largest
system catalog indexes by far, so the opportunity cost matters. I
don't think that the direct cost (more cycles) is worth worrying
about, though. Nobody has added a pg_depend column since it was first
introduced back in 2002.

-- 
Peter Geoghegan



Re: Libpq support to connect to standby server as priority

2019-01-17 Thread Dave Cramer
On Thu, 17 Jan 2019 at 18:03, Tatsuo Ishii  wrote:

> > On Wed, 16 Jan 2019 at 01:02, Tatsuo Ishii  wrote:
> >
> >> > From: Tatsuo Ishii [mailto:is...@sraoss.co.jp]
> >> >> But pg_is_in_recovery() returns true even for a promoting standby. So
> >> >> you have to wait and retry to send pg_is_in_recovery() until it
> >> >> finishes the promotion to find out it is now a primary. I am not sure
> >> >> if backend out to be responsible for this process. If not, libpq
> would
> >> >> need to handle it but I doubt it would be possible.
> >> >
> >> > Yes, the application needs to retry connection attempts until success.
> >> That's not different from PgJDBC and other DBMSs.
> >>
> >> I don't know what PgJDBC is doing, however I think libpq needs to do
> >> more than just retrying.
> >>
> >> 1) Try to find a node on which pg_is_in_recovery() returns false. If
> >>found, then we assume that is the primary. We also assume that
> >>other nodes are standbys. done.
> >>
> >> 2) If there's no node on which pg_is_in_recovery() returns false, then
> >>we need to retry until we find it. To not retry forever, there
> >>should be a timeout counter parameter.
> >>
> >>
> > IIRC this is essentially what pgJDBC does.
>
> Thanks for clarifying that. Pgpool-II also does that too. Seems like a
> common technique to find out a primary node.
>
>
Checking the code I see we actually use show transaction_read_only.

Sorry for the confusion

Dave Cramer

da...@postgresintl.com
www.postgresintl.com

>
>


Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)

2019-01-17 Thread Tom Lane
I wrote:
> Also, I am suspicious that this implementation fails on point 3
> anyway ... If nothing else, it looks like ALTER OBJECT DEPENDS ON
> EXTENSION can be used to attach a random dependency to just
> about anything.

Yup:

regression=# drop table if exists idxpart cascade;
DROP TABLE
regression=# create table idxpart (a int) partition by range (a);
CREATE TABLE
regression=# create index on idxpart (a);
CREATE INDEX
regression=# create table idxpart1 partition of idxpart for values from (0) to 
(10);
CREATE TABLE
regression=# drop index idxpart1_a_idx;  -- no way
ERROR:  cannot drop index idxpart1_a_idx because index idxpart_a_idx requires it
HINT:  You can drop index idxpart_a_idx instead.
regression=# \d idxpart1
  Table "public.idxpart1"
 Column |  Type   | Collation | Nullable | Default 
+-+---+--+-
 a  | integer |   |  | 
Partition of: idxpart FOR VALUES FROM (0) TO (10)
Indexes:
"idxpart1_a_idx" btree (a)
regression=# create extension cube;
CREATE EXTENSION
regression=# alter index idxpart1_a_idx depends on extension cube;
ALTER INDEX
regression=# drop extension cube;
DROP EXTENSION
regression=# \d idxpart1
  Table "public.idxpart1"
 Column |  Type   | Collation | Nullable | Default 
+-+---+--+-
 a  | integer |   |  | 
Partition of: idxpart FOR VALUES FROM (0) TO (10)


regards, tom lane



Re: draft patch for strtof()

2019-01-17 Thread Andrew Gierth
> "Andrew" == Andrew Gierth  writes:

 Andrew> Because it turns out that Windows (at least the version running
 Andrew> on Appveyor) completely fucks this up; strtof() is apparently
 Andrew> returning infinity or zero _without setting errno_ for values
 Andrew> out of range for float: input of "10e70" returns +inf with no
 Andrew> error, input of "10e-70" returns (exactly) 0.0 with no error.

This bug turns out to be dependent on compiler/SDK versions, not
surprisingly. So far I have figured out how to invoke these combinations
on appveyor:

VS2013 / SDK 7.1 (as per cfbot): fails
VS2015 / SDK 8.1: works

Trying to figure out how to get other combinations to test.
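
For reference, one possible belt-and-braces check (just a sketch, not the
draft patch itself) is to distrust an infinity/zero result that comes back
with errno unset and cross-check it against strtod(); an explicit "inf" or
"nan" literal would need to be filtered out before the overflow branches:

    #include <errno.h>
    #include <float.h>
    #include <math.h>
    #include <stdlib.h>

    static float
    strtof_with_fallback(const char *str, char **endptr)
    {
        float   fval;
        double  dval;

        errno = 0;
        fval = strtof(str, endptr);

        /* suspicious: infinity or zero reported without any error */
        if (errno == 0 && (isinf(fval) || fval == 0.0f))
        {
            errno = 0;
            dval = strtod(str, NULL);

            if (isinf(fval) && isfinite(dval) && fabs(dval) <= (double) FLT_MAX)
                fval = (float) dval;    /* strtof() overflowed spuriously */
            else if (fval == 0.0f && dval != 0.0 && fabs(dval) >= (double) FLT_MIN)
                fval = (float) dval;    /* strtof() underflowed spuriously */
            else if (isinf(fval) && errno == 0)
                errno = ERANGE;         /* genuine overflow, unreported */
            else if (fval == 0.0f && dval != 0.0 && errno == 0)
                errno = ERANGE;         /* genuine underflow, unreported */
        }
        return fval;
    }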

-- 
Andrew (irc:RhodiumToad)



Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-17 Thread Mikael Kjellström



On 2019-01-18 00:31, Tom Lane wrote:

Mikael Kjellström  writes:

And now also the NetBSD machine failed on pgbenchCheck.


Indeed, as expected.


Ok.



should I leave it as it is for now?


Please.  I'll push a fix for the broken test case in a bit --- I
just wanted to confirm that somebody else's machines agreed that
it's broken.


Ok, I will leave it on then.

/Mikael




Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)

2019-01-17 Thread Peter Geoghegan
On Thu, Jan 17, 2019 at 3:20 PM Tom Lane  wrote:
> Alvaro Herrera  writes:
> > There is a symmetry to these that led me to have the same kind of
> > dependency from the index partition to the other two.
>
> It's symmetric as long as you suppose that the above are the only
> requirements.  However, there's another requirement, which is that
> if you do try to drop the index partition directly, we would like
> the error message to suggest dropping the master index, not the
> table.  The only way to be sure about what will be suggested is
> if there can be only one "owning object".

+1. This is certainly a necessary requirement. Absurd error messages
are not okay.

-- 
Peter Geoghegan



Re: [PATCH] get rid of StdRdOptions, use individual binary reloptions representation for each relation kind instead

2019-01-17 Thread Alvaro Herrera
You introduced new macros IsHeapRelation and IsViewRelation, but I don't
want to introduce such an API.  Such things have been heavily contested and
I don't want to have one more thing to worry about for this patch, so please
just put the relkind directly in the code.

On 2019-Jan-07, Nikolay Shaplov wrote:

> I also reworked some comments. BTW, do you know what this comment speaks
> about:
> 
>  * RelationGetFillFactor() and RelationGetTargetPageFreeSpace() can only
>  * be applied to relations that use this format or a superset for
>  * private options data.
> 
> Is it speaking about some old time when there could be some other type of
> options?  Because I do not understand what it is talking about.
> I've removed it; I hope I did not remove anything important.

As I understand it, it's talking about the varlenas being StdRdOptions and
not anything else.  I think extensibility could cause some relkinds to
use different options.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-17 Thread Tom Lane
Mikael Kjellström  writes:
> And now also the NetBSD machine failed on pgbenchCheck.

Indeed, as expected.

> should I leave it as it is for now?

Please.  I'll push a fix for the broken test case in a bit --- I
just wanted to confirm that somebody else's machines agreed that
it's broken.

regards, tom lane



Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)

2019-01-17 Thread Tom Lane
Alvaro Herrera  writes:
> On 2019-Jan-17, Tom Lane wrote:
>> Hm.  Still, I can't believe that it's appropriate for a partitioned index
>> to have exactly the same kind of dependency on the master index as it
>> does on the associated table.

> So there are three necessary features:
> * The partition can be dropped, which takes down the index partition
> * The master index can be dropped, which takes down the index partition
> * The index partition must never be allowed to be dropped on its own.

> There is a symmetry to these that led me to have the same kind of
> dependency from the index partition to the other two.

It's symmetric as long as you suppose that the above are the only
requirements.  However, there's another requirement, which is that
if you do try to drop the index partition directly, we would like
the error message to suggest dropping the master index, not the
table.  The only way to be sure about what will be suggested is
if there can be only one "owning object".

Also, I am suspicious that this implementation fails on point 3
anyway.  It looks to me like if a recursive drop reaches the
index partition from a dependency other than either the table
partition or the master index, it will let the index partition
be dropped by itself.  Ordinarily, this would likely not matter
because the index partition would share any other dependencies
(opclasses, for instance) with one or the other owning object.
But I don't think it's too hard to construct cases where that's
not true.  If nothing else, it looks like ALTER OBJECT DEPENDS ON
EXTENSION can be used to attach a random dependency to just
about anything.

regards, tom lane



Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-17 Thread Tom Lane
Mikael Kjellström  writes:
>> But it looks like in NetBSD the options are called:

Sorry about that, I copied-and-pasted from the openbsd machine I was
looking at without remembering that netbsd is just a shade different.

> but the OpenBSD machine went further and now fails on:
> pgbenchCheck instead.
> Is that the failure you expected to get?

Yup, sure is.  Thanks!

regards, tom lane



Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-17 Thread Mikael Kjellström



On 2019-01-18 00:00, Mikael Kjellström wrote:


I just started another run on sidewinder (NetBSD 7), let's see how that 
goes.


but the OpenBSD machine went further and now fails on:

pgbenchCheck instead.

Is that the failure you expected to get?


And now also the NetBSD machine failed on pgbenchCheck.

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sidewinder&dt=2019-01-17%2022%3A57%3A14

should I leave it as it is for now?

/Mikael



Re: Libpq support to connect to standby server as priority

2019-01-17 Thread Tatsuo Ishii
> On Wed, 16 Jan 2019 at 01:02, Tatsuo Ishii  wrote:
> 
>> > From: Tatsuo Ishii [mailto:is...@sraoss.co.jp]
>> >> But pg_is_in_recovery() returns true even for a promoting standby. So
>> >> you have to wait and retry to send pg_is_in_recovery() until it
>> >> finishes the promotion to find out it is now a primary. I am not sure
>> >> if backend out to be responsible for this process. If not, libpq would
>> >> need to handle it but I doubt it would be possible.
>> >
>> > Yes, the application needs to retry connection attempts until success.
>> That's not different from PgJDBC and other DBMSs.
>>
>> I don't know what PgJDBC is doing, however I think libpq needs to do
>> more than just retrying.
>>
>> 1) Try to find a node on which pg_is_in_recovery() returns false. If
>>found, then we assume that is the primary. We also assume that
>>other nodes are standbys. done.
>>
>> 2) If there's no node on which pg_is_in_recovery() returns false, then
>>we need to retry until we find it. To not retry forever, there
>>should be a timeout counter parameter.
>>
>>
> IIRC this is essentially what pgJDBC does.

Thanks for clarifying that. Pgpool-II also does that too. Seems like a
common technique to find out a primary node.
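
For illustration, a minimal client-side sketch of that probing loop using
libpq (the connection strings, one-second retry interval, and timeout
handling are simplified assumptions):

    #include <string.h>
    #include <unistd.h>
    #include <libpq-fe.h>

    /*
     * Probe each candidate and return a connection to the first server where
     * pg_is_in_recovery() reports false; retry for a bounded time, since a
     * promoting standby still reports true.
     */
    static PGconn *
    connect_to_primary(const char *const *conninfos, int nhosts, int timeout_secs)
    {
        for (int elapsed = 0; elapsed < timeout_secs; elapsed++)
        {
            for (int i = 0; i < nhosts; i++)
            {
                PGconn     *conn = PQconnectdb(conninfos[i]);

                if (PQstatus(conn) == CONNECTION_OK)
                {
                    PGresult   *res = PQexec(conn, "SELECT pg_is_in_recovery()");

                    if (PQresultStatus(res) == PGRES_TUPLES_OK &&
                        strcmp(PQgetvalue(res, 0, 0), "f") == 0)
                    {
                        PQclear(res);
                        return conn;    /* found the primary */
                    }
                    PQclear(res);
                }
                PQfinish(conn);         /* standby, promoting, or unreachable */
            }
            sleep(1);                   /* everything still in recovery; retry */
        }
        return NULL;                    /* timed out without finding a primary */
    }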

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-17 Thread Mikael Kjellström




On 2019-01-17 23:54, Mikael Kjellström wrote:


But it looks like in NetBSD the options are called:

netbsd7-pgbf# sysctl -a | grep semmn
kern.ipc.semmni = 10
kern.ipc.semmns = 60
kern.ipc.semmnu = 30

so I will try and set that in /etc/sysctl.conf and reboot and see what 
happens.


That seems to have done the trick:

netbsd7-pgbf# sysctl -a | grep semmn
kern.ipc.semmni = 100
kern.ipc.semmns = 2000
kern.ipc.semmnu = 30

I just started another run on sidewinder (NetBSD 7), let's see how that 
goes.


but the OpenBSD machine went further and now fails on:

pgbenchCheck instead.

Is that the failure you expected to get?

/Mikael



Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-17 Thread Mikael Kjellström



On 2019-01-17 23:37, Mikael Kjellström wrote:


On 2019-01-17 23:23, Tom Lane wrote:


Yeah, you might've been able to get by with OpenBSD/NetBSD's default
semaphore settings before, but they really only let one postmaster
run at a time; and the TAP tests want to start more than one.
For me it seems to work to append this to /etc/sysctl.conf:

kern.seminfo.semmni=100
kern.seminfo.semmns=2000

and either reboot, or install those settings manually with sysctl.


Looks that way.

I've increased the values and rebooted the machines.

Let's hope 5th time is the charm :-)


Nope!

But it looks like in NetBSD the options are called:

netbsd7-pgbf# sysctl -a | grep semmn
kern.ipc.semmni = 10
kern.ipc.semmns = 60
kern.ipc.semmnu = 30

so I will try and set that in /etc/sysctl.conf and reboot and see what 
happens.


/Mikael




Re: problems with foreign keys on partitioned tables

2019-01-17 Thread Alvaro Herrera
On 2019-Jan-09, Amit Langote wrote:

> 1. Foreign keys of partitions stop working correctly after being detached
> from the parent table

> This happens because the action triggers defined on the PK relation (pk)
> refer to p as the referencing relation.  On detaching p1 from p, p1's
> data is no longer accessible to that trigger.

Ouch.

> To fix this problem, we need create action triggers on PK relation
> that refer to p1 when it's detached (unless such triggers already
> exist which might be true in some cases).  Attached patch 0001 shows
> this approach.

Hmm, okay.  I'm not in love with the idea that such triggers might
already exist -- seems unclean.  We should remove redundant action
triggers when we attach a table as a partition, no?

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Protect syscache from bloating with negative cache entries

2019-01-17 Thread Gavin Flower

On 18/01/2019 08:48, Bruce Momjian wrote:

On Thu, Jan 17, 2019 at 11:33:35AM -0500, Robert Haas wrote:

The flaw in your thinking, as it seems to me, is that in your concern
for "the likelihood that cache flushes will simply remove entries
we'll soon have to rebuild," you're apparently unwilling to consider
the possibility of workloads where cache flushes will remove entries
we *won't* soon have to rebuild.  Every time that issue gets raised,
you seem to blow it off as if it were not a thing that really happens.
I can't make sense of that position.  Is it really so hard to imagine
a connection pooler that switches the same connection back and forth
between two applications with different working sets?  Or a system
that keeps persistent connections open even when they are idle?  Do
you really believe that a connection that has not accessed a cache
entry in 10 minutes still derives more benefit from that cache entry
than it would from freeing up some memory?

Well, I think everyone agrees there are workloads that cause undesired
cache bloat.  What we have not found is a solution that doesn't cause
code complexity or undesired overhead, or one that >1% of users will
know how to use.

Unfortunately, because we have not found something we are happy with, we
have done nothing.  I agree LRU can be expensive.  What if we do some
kind of clock sweep and expiration like we do for shared buffers?  I
think the trick is figuring out how frequently to do the sweep.  What if we
mark entries as unused every 10 queries, mark them as used on first use,
and delete cache entries that have not been used in the past 10 queries.

If you take that approach, then this number should be configurable.  
What if I had 12 common queries I used in rotation?


The ARM3 processor cache logic was to simply eject an entry at random, 
as Acorn obviously felt that the silicon required for a more 
sophisticated algorithm would reduce the cache size too much!


I upgraded my Acorn Archimedes, which had an 8MHz bus, from an 8MHz ARM2 
to a 25MHz ARM3.  That is a clock rate improvement of about 3 times.  
However, BASIC programs ran about 7 times faster, which I put down to the 
ARM3 having a cache.


Obviously for Postgres this is not directly relevant, but I think it 
suggests that it may be worth considering replacing cache items at 
random, as there are no pathological corner cases and the logic is 
very simple.
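
A toy sketch of that idea, purely for illustration (hypothetical structures, 
not the real syscache code):

    #include <stdbool.h>
    #include <stdlib.h>

    typedef struct ToyCacheSlot
    {
        bool        valid;      /* does this slot currently hold an entry? */
    } ToyCacheSlot;

    /* When over budget, invalidate one randomly chosen occupied slot. */
    static void
    evict_one_at_random(ToyCacheSlot *slots, int nslots)
    {
        for (int tries = 0; tries < nslots; tries++)
        {
            int     victim = rand() % nslots;

            if (slots[victim].valid)
            {
                slots[victim].valid = false;
                return;
            }
        }
    }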



Cheers,
Gavin






Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)

2019-01-17 Thread Alvaro Herrera
On 2019-Jan-17, Tom Lane wrote:

> Alvaro Herrera  writes:
> > On 2019-Jan-17, Tom Lane wrote:
> >> DEPENDENCY_INTERNAL_AUTO, however, broke this completely, as the code
> >> has no hesitation about making multiple entries of that kind.   After
> >> rather cursorily looking at that code, I'm leaning to the position
> >> that DEPENDENCY_INTERNAL_AUTO is broken-by-design and needs to be
> >> nuked from orbit.  In the cases where it's being used, such as
> >> partitioned indexes, I think that probably the right design is one
> >> DEPENDENCY_INTERNAL dependency on the partition master index, and
> >> then one DEPENDENCY_AUTO dependency on the matching partitioned table.
> 
> > As I recall, the problem with that approach is that you can't drop the
> > partition when a partitioned index exists, because it follows the link
> > to the parent index and tries to drop that.
> 
> Hm.  Still, I can't believe that it's appropriate for a partitioned index
> to have exactly the same kind of dependency on the master index as it
> does on the associated table.

So there are three necessary features:
* The partition can be dropped, which takes down the index partition
* The master index can be dropped, which takes down the index partition
* The index partition must never be allowed to be dropped on its own.

There is a symmetry to these that led me to have the same kind of
dependency from the index partition to the other two.

I'm now wondering if it would work to have the one from index partition
to table partition as DEPENDENCY_AUTO and the one from index partition
to parent index as DEPENDENCY_INTERNAL_AUTO.  I stared at a lot of
possible dependency graphs, but I'm not sure if this one was one of
them.
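
In code terms, the wiring being contemplated would look roughly like this
sketch (untested; the three OID variables are placeholders for the index
partition, the table partition, and the parent index):

    ObjectAddress   idxpart,
                    tblpart,
                    parentidx;

    ObjectAddressSet(idxpart, RelationRelationId, indexPartitionOid);
    ObjectAddressSet(tblpart, RelationRelationId, tablePartitionOid);
    ObjectAddressSet(parentidx, RelationRelationId, parentIndexOid);

    /* dropping the table partition takes the index partition with it */
    recordDependencyOn(&idxpart, &tblpart, DEPENDENCY_AUTO);
    /* the parent index "owns" the index partition; no standalone drop */
    recordDependencyOn(&idxpart, &parentidx, DEPENDENCY_INTERNAL_AUTO);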

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Feature: temporary materialized views

2019-01-17 Thread Andreas Karlsson

On 1/11/19 8:47 PM, Mitar wrote:

Thanks for doing the review!


I did some functional testing today and everything seems to work as 
expected other than that the tab completion for psql seems to be missing.


Andreas




Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-17 Thread Mikael Kjellström



On 2019-01-17 23:23, Tom Lane wrote:


Yeah, you might've been able to get by with OpenBSD/NetBSD's default
semaphore settings before, but they really only let one postmaster
run at a time; and the TAP tests want to start more than one.
For me it seems to work to append this to /etc/sysctl.conf:

kern.seminfo.semmni=100
kern.seminfo.semmns=2000

and either reboot, or install those settings manually with sysctl.


Looks that way.

I've increased the values and rebooted the machines.

Let's hope 5th time is the charm :-)

/Mikael



Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)

2019-01-17 Thread Tom Lane
Alvaro Herrera  writes:
> On 2019-Jan-17, Tom Lane wrote:
>> DEPENDENCY_INTERNAL_AUTO, however, broke this completely, as the code
>> has no hesitation about making multiple entries of that kind.   After
>> rather cursorily looking at that code, I'm leaning to the position
>> that DEPENDENCY_INTERNAL_AUTO is broken-by-design and needs to be
>> nuked from orbit.  In the cases where it's being used, such as
>> partitioned indexes, I think that probably the right design is one
>> DEPENDENCY_INTERNAL dependency on the partition master index, and
>> then one DEPENDENCY_AUTO dependency on the matching partitioned table.

> As I recall, the problem with that approach is that you can't drop the
> partition when a partitioned index exists, because it follows the link
> to the parent index and tries to drop that.

Hm.  Still, I can't believe that it's appropriate for a partitioned index
to have exactly the same kind of dependency on the master index as it
does on the associated table.

regards, tom lane



Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-17 Thread Tom Lane
Mikael Kjellström  writes:
>> Let's see if it works better this time.

> Hmmm, nope:

> 2019-01-17 23:09:20.343 CET [9129] FATAL:  could not create semaphores: 
> No space left on device

Yeah, you might've been able to get by with OpenBSD/NetBSD's default
semaphore settings before, but they really only let one postmaster
run at a time; and the TAP tests want to start more than one.
For me it seems to work to append this to /etc/sysctl.conf:

kern.seminfo.semmni=100
kern.seminfo.semmns=2000

and either reboot, or install those settings manually with sysctl.

regards, tom lane



Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)

2019-01-17 Thread Alvaro Herrera
On 2019-Jan-17, Tom Lane wrote:

> DEPENDENCY_INTERNAL_AUTO, however, broke this completely, as the code
> has no hesitation about making multiple entries of that kind.   After
> rather cursorily looking at that code, I'm leaning to the position
> that DEPENDENCY_INTERNAL_AUTO is broken-by-design and needs to be
> nuked from orbit.  In the cases where it's being used, such as
> partitioned indexes, I think that probably the right design is one
> DEPENDENCY_INTERNAL dependency on the partition master index, and
> then one DEPENDENCY_AUTO dependency on the matching partitioned table.

As I recall, the problem with that approach is that you can't drop the
partition when a partitioned index exists, because it follows the link
to the parent index and tries to drop that.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-17 Thread Mikael Kjellström

On 2019-01-17 22:47, Mikael Kjellström wrote:



On 2019-01-17 22:42, Tom Lane wrote:


Mikael Kjellström  writes:

It says:
configure: error: Additional Perl modules are required to run TAP tests
so how do I find out with Perl modules that are required?


If you look into the configure log it should say just above that,
but I'm betting you just need p5-IPC-Run.


Yes it seems to be IPC::Run that is missing.

I've installed it manually through CPAN.

Let's see if it works better this time.


Hmmm, nope:

=== pgsql.build/src/bin/pg_ctl/tmp_check/log/003_promote_standby.log ===
2019-01-17 23:09:20.343 CET [9129] LOG:  listening on Unix socket 
"/tmp/g66P1fpMFK/.s.PGSQL.64980"
2019-01-17 23:09:20.343 CET [9129] FATAL:  could not create semaphores: 
No space left on device
2019-01-17 23:09:20.343 CET [9129] DETAIL:  Failed system call was 
semget(64980002, 17, 03600).
2019-01-17 23:09:20.343 CET [9129] HINT:  This error does *not* mean 
that you have run out of disk space.  It occurs when either the system 
limit for the maximum number of semaphore sets (SEMMNI), or the system 
wide maximum number of semaphores (SEMMNS), would be exceeded.  You need 
to raise the respective kernel parameter.  Alternatively, reduce 
PostgreSQL's consumption of semaphores by reducing its max_connections 
parameter.
	The PostgreSQL documentation contains more information about 
configuring your system for PostgreSQL.

2019-01-17 23:09:20.345 CET [9129] LOG:  database system is shut down

will try and increase SEMMNI and see if that helps.

/Mikael




Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)

2019-01-17 Thread Tom Lane
I wrote:
> I propose that we handle this case by adding a new DEPFLAG_IS_SUBOBJECT
> flag to the column object's flags, denoting that we know the whole table
> will be dropped later.  The only effect of this flag is to suppress
> reporting of the column object in reportDependentObjects.

Here's a proposed patch for that bit.  As expected, it seems to eliminate
the variation in number-of-cascaded-drops-reported under
ignore_system_indexes.  I do not see any regression outputs change
otherwise.

regards, tom lane

diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index 3529084..6b8c735 100644
*** a/src/backend/catalog/dependency.c
--- b/src/backend/catalog/dependency.c
*** typedef struct
*** 102,107 
--- 102,108 
  #define DEPFLAG_INTERNAL	0x0008	/* reached via internal dependency */
  #define DEPFLAG_EXTENSION	0x0010	/* reached via extension dependency */
  #define DEPFLAG_REVERSE		0x0020	/* reverse internal/extension link */
+ #define DEPFLAG_SUBOBJECT	0x0040	/* subobject of another deletable object */
  
  
  /* expansible list of ObjectAddresses */
*** reportDependentObjects(const ObjectAddre
*** 886,891 
--- 887,896 
  		if (extra->flags & DEPFLAG_ORIGINAL)
  			continue;
  
+ 		/* Also ignore sub-objects; we'll report the whole object elsewhere */
+ 		if (extra->flags & DEPFLAG_SUBOBJECT)
+ 			continue;
+ 
  		objDesc = getObjectDescription(obj);
  
  		/*
*** object_address_present_add_flags(const O
*** 2320,2332 
   * DROP COLUMN action even though we know we're gonna delete
   * the table later.
   *
   * Because there could be other subobjects of this object in
   * the array, this case means we always have to loop through
   * the whole array; we cannot exit early on a match.
   */
  ObjectAddressExtra *thisextra = addrs->extras + i;
  
! thisextra->flags |= flags;
  			}
  		}
  	}
--- 2325,2343 
   * DROP COLUMN action even though we know we're gonna delete
   * the table later.
   *
+  * What we can do, though, is mark this as a subobject so that
+  * we don't report it separately, which is confusing because
+  * it's unpredictable whether it happens or not.  But do so
+  * only if flags != 0 (flags == 0 is a read-only probe).
+  *
   * Because there could be other subobjects of this object in
   * the array, this case means we always have to loop through
   * the whole array; we cannot exit early on a match.
   */
  ObjectAddressExtra *thisextra = addrs->extras + i;
  
! if (flags)
! 	thisextra->flags |= (flags | DEPFLAG_SUBOBJECT);
  			}
  		}
  	}
*** stack_address_present_add_flags(const Ob
*** 2374,2380 
   * object_address_present_add_flags(), we should propagate
   * flags for the whole object to each of its subobjects.
   */
! stackptr->flags |= flags;
  			}
  		}
  	}
--- 2385,2392 
   * object_address_present_add_flags(), we should propagate
   * flags for the whole object to each of its subobjects.
   */
! if (flags)
! 	stackptr->flags |= (flags | DEPFLAG_SUBOBJECT);
  			}
  		}
  	}


Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-17 Thread Mikael Kjellström




On 2019-01-17 22:42, Tom Lane wrote:


Mikael Kjellström  writes:

It says:
configure: error: Additional Perl modules are required to run TAP tests
so how do I find out which Perl modules are required?


If you look into the configure log it should say just above that,
but I'm betting you just need p5-IPC-Run.


Yes it seems to be IPC::Run that is missing.

I've installed it manually through CPAN.

Let's see if it works better this time.

/Mikael




Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-17 Thread Tom Lane
Mikael Kjellström  writes:
> It says:
> configure: error: Additional Perl modules are required to run TAP tests
> so how do I find out which Perl modules are required?

If you look into the configure log it should say just above that,
but I'm betting you just need p5-IPC-Run.

regards, tom lane



Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-17 Thread Mikael Kjellström



On 2019-01-17 22:19, Mikael Kjellström wrote:


On 2019-01-17 22:16, Tom Lane wrote:


For what it's worth I've enabled tap-tests for my OpenBSD 5.9 (curculio)
and NetBSD 7 (sidewinder) animals now.


Oh, thanks!  I'm guessing they'll fail their next runs, but I'll
wait to see confirmation of that before I do anything about the
test bug.


They should run the next time within the hour or hour and a half so I 
guess we will find out soon enough.


Hm, that didn't go so well.

It says:

configure: error: Additional Perl modules are required to run TAP tests

so how do I find out which Perl modules are required?

/Mikael



Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-17 Thread Mikael Kjellström



On 2019-01-17 22:16, Tom Lane wrote:


For what it's worth I've enabled tap-tests for my OpenBSD 5.9 (curculio)
and NetBSD 7 (sidewinder) animals now.


Oh, thanks!  I'm guessing they'll fail their next runs, but I'll
wait to see confirmation of that before I do anything about the
test bug.


They should run the next time within the hour or hour and a half so I 
guess we will find out soon enough.


/Mikael




Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-17 Thread Tom Lane
Mikael Kjellström  writes:
> On 2019-01-17 06:04, Tom Lane wrote:
>> Although we've got a few NetBSD and OpenBSD buildfarm critters,
>> none of them are doing --enable-tap-tests.  If they were, we'd
>> have noticed the pgbench regression tests falling over:

> For what it's worth I've enabled tap-tests for my OpenBSD 5.9 (curculio) 
> and NetBSD 7 (sidewinder) animals now.

Oh, thanks!  I'm guessing they'll fail their next runs, but I'll
wait to see confirmation of that before I do anything about the
test bug.

regards, tom lane



Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-17 Thread Mikael Kjellström

On 2019-01-17 06:04, Tom Lane wrote:

Although we've got a few NetBSD and OpenBSD buildfarm critters,
none of them are doing --enable-tap-tests.  If they were, we'd
have noticed the pgbench regression tests falling over:


For what it's worth I've enabled tap-tests for my OpenBSD 5.9 (curculio) 
and NetBSD 7 (sidewinder) animals now.


/Mikael



Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-01-17 Thread Tomas Vondra
s:
>
> * If asked to build both MCV and histogram, first build the MCV part
> * and then histogram on the remaining rows.
>
> I guess that means we'll get different estimates with:
>
> create statistic a_stats (mcv,histogram) on a,b from t;
>
> vs
>
> create statistic a_stats1 (mcv) on a,b from t;
> create statistic a_stats2 (histogram) on a,b from t;
>
> Is that going to be surprising to people?

Well, I don't have a good answer to this, except for mentioning this in
the SGML docs.

> 5. serialize_histogram() and statext_histogram_deserialize(), should
> these follow the same function naming format?

Perhaps, although serialize_histogram() is static and so it's kinda
internal API.

> 8. Looking at statext_clauselist_selectivity() I see it calls
> choose_best_statistics() passing requiredkinds as STATS_EXT_INFO_MCV |
> STATS_EXT_INFO_HISTOGRAM, do you think the function now needs to
> attempt to find the best match plus the one with the most statistics
> kinds?
>
> It might only matter if someone had:
>
> create statistic a_stats1 (mcv) on a,b from t;
> create statistic a_stats2 (histogram) on a,b from t;
> create statistic a_stats3 (mcv,histogram) on a,b from t;
>
> Is it fine to just return a_stats1 and ignore the fact that a_stats3
> is probably better? Or too corner case to care?

I don't know. My assumption is people will not create such overlapping
statistics.

> 9. examine_equality_clause() assumes it'll get a Var. I see we should
> only allow clauses that pass statext_is_compatible_clause_internal(),
> so maybe it's worth an Assert(IsA(var, Var)) along with a comment to
> mention anything else could not have been allowed.

Maybe.

> 10. Does examine_equality_clause need 'root' as an argument?

Probably not. I guess it's a residue of some older version.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


0001-multivariate-MCV-lists-20190117.patch.gz
Description: application/gzip


0002-multivariate-histograms-20190117.patch.gz
Description: application/gzip


Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)

2019-01-17 Thread Tom Lane
Peter Geoghegan  writes:
> On Tue, Dec 18, 2018 at 2:26 PM Tom Lane  wrote:
>> Hm, that definitely leads me to feel that we've got bug(s) in
>> dependency.c.  I'll take a look sometime soon.

> Thanks!
> I'm greatly relieved that I probably won't have to become an expert on
> dependency.c after all.   :-)

So I poked around for awhile with running the regression tests under
ignore_system_indexes.  There seem to be a number of issues involved
here.  To a significant extent, they aren't bugs, at least not according
to the original conception of the dependency code: it was not a design
goal that different dependencies of the same object-to-be-deleted would
be processed in a fixed order.  That leads to the well-known behavior
that cascade drops report the dropped objects in an unstable order.

Now, perhaps we should make such stability a design goal, as it'd allow
us to get rid of some "suppress the cascade outputs" hacks in the
regression tests.  But it's a bit of a new feature.  If we wanted to
do that, I'd be inclined to do it by absorbing all the pg_depend entries
for a particular object into an ObjectAddress array and then sorting
them before we process them.  The main stumbling block here is "what
would the sort order be?".  The best idea I can come up with offhand
is to sort by OID, which at least for regression test purposes would
mean objects would be listed/processed more or less in creation order.
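
For illustration, such a pre-sort could be as simple as a qsort over the
collected addresses.  The comparator and the dependees/ndependees locals
below are just a sketch, not existing dependency.c code:

/*
 * Sketch only: sort collected dependency entries by OID so that
 * cascaded drops would be reported in roughly creation order.
 */
static int
object_address_cmp_by_oid(const void *a, const void *b)
{
	const ObjectAddress *oa = (const ObjectAddress *) a;
	const ObjectAddress *ob = (const ObjectAddress *) b;

	if (oa->objectId != ob->objectId)
		return (oa->objectId < ob->objectId) ? -1 : 1;
	if (oa->classId != ob->classId)
		return (oa->classId < ob->classId) ? -1 : 1;
	return oa->objectSubId - ob->objectSubId;
}

	/* after absorbing the pg_depend entries for one target object: */
	qsort(dependees, ndependees, sizeof(ObjectAddress),
		  object_address_cmp_by_oid);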

However, there are a couple of places where the ignore_system_indexes
output does something weird like

 DROP TABLE PKTABLE;
 ERROR:  cannot drop table pktable because other objects depend on it
-DETAIL:  constraint constrname2 on table fktable depends on table pktable
+DETAIL:  constraint constrname2 on table fktable depends on index pktable_pkey
 HINT:  Use DROP ... CASCADE to drop the dependent objects too.

The reason for this is that the reported "nearest dependency" is the
object that we first reached the named object by recursing from.
If there are multiple dependency paths to the same object then it's
order-traversal-dependent which path we take first.  Sorting the
pg_depend entries before processing, as I suggested above, would
remove the instability, but it's not real clear whether we'd get a
desirable choice of reported object that way.  Perhaps we could
improve matters by having the early exits from findDependentObjects
(at stack_address_present_add_flags and object_address_present_add_flags)
consider replacing the already-recorded dependee with the current
source object, according to some set of rules.  One rule that I think
would be useful is to compare dependency types: I think a normal
dependency is more interesting than an auto dependency which is
more interesting than an internal one.  Beyond that, though, I don't
see anything we could do but compare OIDs.  (If we do compare OIDs,
then the result should be stable with or without pre-sorting of the
pg_depend entries.)

I also noticed some places where the output reports a different number
of objects dropped by a cascade.  This happens when a table column
is reached by some dependency path, while the whole table is reached
by another one.  If we find the whole table first, then when we find
the table column we just ignore it.  But in the other order, they're
reported as two separate droppable objects.  The reasoning is
explained in object_address_present_add_flags:

 * We get here if we find a need to delete a whole table after
 * having already decided to drop one of its columns.  We
 * can't report that the whole object is in the array, but we
 * should mark the subobject with the whole object's flags.
 *
 * It might seem attractive to physically delete the column's
 * array entry, or at least mark it as no longer needing
 * separate deletion.  But that could lead to, e.g., dropping
 * the column's datatype before we drop the table, which does
 * not seem like a good idea.  This is a very rare situation
 * in practice, so we just take the hit of doing a separate
 * DROP COLUMN action even though we know we're gonna delete
 * the table later.

I think this reasoning is sound, and we should continue to do the separate
DROP COLUMN step, but it occurs to me that just because we drop the column
separately doesn't mean we have to report it separately to the user.
I propose that we handle this case by adding a new DEPFLAG_IS_SUBOBJECT
flag to the column object's flags, denoting that we know the whole table
will be dropped later.  The only effect of this flag is to suppress
reporting of the column object in reportDependentObjects.

Another order-dependent effect that can be seen in the regression tests
comes from the fact that the first loop in findDependentObjects (over
the target's referenced objects) errors out immediately on the first

Re: Protect syscache from bloating with negative cache entries

2019-01-17 Thread Bruce Momjian
On Thu, Jan 17, 2019 at 11:33:35AM -0500, Robert Haas wrote:
> The flaw in your thinking, as it seems to me, is that in your concern
> for "the likelihood that cache flushes will simply remove entries
> we'll soon have to rebuild," you're apparently unwilling to consider
> the possibility of workloads where cache flushes will remove entries
> we *won't* soon have to rebuild.  Every time that issue gets raised,
> you seem to blow it off as if it were not a thing that really happens.
> I can't make sense of that position.  Is it really so hard to imagine
> a connection pooler that switches the same connection back and forth
> between two applications with different working sets?  Or a system
> that keeps persistent connections open even when they are idle?  Do
> you really believe that a connection that has not accessed a cache
> entry in 10 minutes still derives more benefit from that cache entry
> than it would from freeing up some memory?

Well, I think everyone agrees there are workloads that cause undesired
cache bloat.  What we have not found is a solution that doesn't cause
code complexity or undesired overhead, or one that >1% of users will
know how to use.

Unfortunately, because we have not found something we are happy with, we
have done nothing.  I agree LRU can be expensive.  What if we do some
kind of clock sweep and expiration like we do for shared buffers?  I
think the trick is figuring out how frequently to do the sweep.  What if we
mark entries as unused every 10 queries, mark them as used on first use,
and delete cache entries that have not been used in the past 10 queries?
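
To make that concrete, here is a rough sketch of the kind of thing I mean
(the ct_lastused field, the query counter, and CatCacheMaybeSweep() are
invented names, not existing catcache code; think of it as living in
catcache.c):

#define CATCACHE_SWEEP_INTERVAL 10		/* queries between sweeps */

static uint64 catcache_query_counter = 0;	/* bumped once per query */

/* hypothetical: called whenever a catcache search hits an entry */
static inline void
CatCacheMarkUsed(CatCTup *ct)
{
	ct->ct_lastused = catcache_query_counter;	/* new field in CatCTup */
}

/* hypothetical: called once per query, e.g. at statement start */
static void
CatCacheMaybeSweep(void)
{
	slist_iter	iter;

	if (++catcache_query_counter % CATCACHE_SWEEP_INTERVAL != 0)
		return;

	slist_foreach(iter, &CacheHdr->ch_caches)
	{
		CatCache   *cache = slist_container(CatCache, cc_next, iter.cur);

		for (int i = 0; i < cache->cc_nbuckets; i++)
		{
			dlist_mutable_iter miter;

			dlist_foreach_modify(miter, &cache->cc_bucket[i])
			{
				CatCTup    *ct = dlist_container(CatCTup, cache_elem, miter.cur);

				/* evict entries not touched during the last interval */
				if (ct->refcount == 0 && !ct->dead && ct->c_list == NULL &&
					catcache_query_counter - ct->ct_lastused >= CATCACHE_SWEEP_INTERVAL)
					CatCacheRemoveCTup(cache, ct);
			}
		}
	}
}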

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Feature: temporary materialized views

2019-01-17 Thread Tom Lane
Andreas Karlsson  writes:
> On 1/17/19 4:57 PM, Tom Lane wrote:
>> What is the stumbling block to just leaving that alone?

> I think the issue Mitar ran into is that the temporary materialized view 
> is created in the rStartup callback of the receiver which happens after 
> SECURITY_RESTRICTED_OPERATION is set in ExecCreateTableAs(), so the 
> creation of the view itself is denied.

Hm.

>  From a cursory glance it looks like it would be possible to move the 
> setting of SECURITY_RESTRICTED_OPERATION to inside the rStartup 
callback, other than that the code for resetting the security context 
> might get a bit ugly. Do you see any flaws with that solution?

Don't think that works: the point here is to restrict what can happen
during planning/execution of the view query, so letting planning and
query startup happen first is no good.

Creating the view object inside the rStartup callback is itself pretty
much of a kluge; you'd expect that to happen earlier.  I think the
reason it was done that way was it was easier to find out the view's
column set there, but I'm sure we can find another way --- doing the
object creation more like a regular view does it is the obvious approach.

regards, tom lane



Re: Proposal for Signal Detection Refactoring

2019-01-17 Thread Chris Travers
On Thu, Jan 17, 2019 at 7:08 PM Andres Freund  wrote:

> Hi,
>
> On 2018-10-09 16:04:35 +0200, Peter Eisentraut wrote:
> > More generally, I'd like this material to be code comments.  It's the
> > kind of stuff that gets outdated before long if it's kept separate.
>
> I'm not sure I buy this here - we don't have (but perhaps should?) a
> convenient location for an overview comment around this. There's no
> "signal_handling.c" where it'd clearly belong - given the lack of a
> clear point to look to, I don't think a README.SIGNAL_HANDLING would get
> out-of-date more quickly than code comments in mildly related place (say
> postgres.c or miscinit.c) would get out of date at a different pace.
>

The other point is that "This is the way to do it" is a little easier when
in a README rather than in code comments, since those might be scattered in a
bunch of different places.

>
> Greetings,
>
> Andres Freund
>


-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: ArchiveEntry optional arguments refactoring

2019-01-17 Thread Andres Freund
Hi,

On 2019-01-17 09:29:04 -0800, Andres Freund wrote:
> On 2019-01-17 10:23:39 -0500, Tom Lane wrote:
> > I don't buy the argument that this would move the goalposts in terms
> > of how much work it is to add a new argument.  You'd still end up
> > touching every call site.
> 
> Why? A lot of arguments that'd be potentially added or removed would not
> be set by each callsite.
> 
> If you e.g. look at
> 
> you can see that a lot of changes were like
> ArchiveEntry(fout, nilCatalogId, createDumpId(),
>  "pg_largeobject", NULL, NULL, "",
> -false, "pg_largeobject", SECTION_PRE_DATA,
> +"pg_largeobject", SECTION_PRE_DATA,
>  loOutQry->data, "", NULL,
>  NULL, 0,
>  NULL, NULL);
> 
> i.e. just removing a 'false' argument. In like 70+ callsites. With the
> above scheme, we'd instead just have removed a single .withoids = true,
> from dumpTableSchema()'s ArchiveEntry() call.

the "at" I was trying to reference above is
578b229718e8f15fa779e20f086c4b6bb3776106 / the WITH OID removal, and
therein specifically the pg_dump changes.

Greetings,

Andres Freund



Re: ArchiveEntry optional arguments refactoring

2019-01-17 Thread Stephen Frost
Greetings,

* Andres Freund (and...@anarazel.de) wrote:
> On 2019-01-17 10:23:39 -0500, Tom Lane wrote:
> > Alvaro Herrera  writes:
> > > On 2019-Jan-16, Dmitry Dolgov wrote:
> > >> ArchiveEntry((ArchiveArgs){.tablespace = 3,
> > >> .dumpFn = somefunc,
> > >> ...});
> >
> > > Is there real savings to be had by doing this?  What would be the
> > > arguments to each function?  Off-hand, I'm not liking this idea too
> > > much.
> >
> > I'm not either.  What this looks like it will mainly do is create
> > a back-patching barrier, with little if any readability improvement.
> 
> I don't really buy this. We've whacked around the arguments to
> ArchiveEntry() repeatedly over the last few releases, so there's already
> a hindrance to backpatching. And given the current setup we've to whack
> around all 70+ callsites whenever a single argument is added. With the
> setup I'd suggested you don't, because the designated initializer syntax
> allows you to omit items that ought to be zero-initialized.
> 
> And given the number of arguments to ArchiveEntry() having a name for
> each argument would help for readability too. It's currently not exactly
> obvious what is an argument for what:
>   ArchiveEntry(AH, nilCatalogId, createDumpId(),
>"ENCODING", NULL, NULL, "",
>"ENCODING", SECTION_PRE_DATA,
>qry->data, "", NULL,
>NULL, 0,
>NULL, NULL);
> 
> If you compare that with
> 
>   ArchiveEntry(AH,
>  (ArchiveEntry){.catalogId = nilCatalogId,
> .dumpId = createDumpId(),
> .tag = "ENCODING",
> .desc = "ENCODING",
> .section = SECTION_PRE_DATA,
> .defn = qry->data});
> 
> it's definitely easier to see what argument is what.

+1.  I was on the fence about this approach when David started using it
in pgBackRest but I've come to find that it's actually pretty nice and
being able to omit things which should be zero/default is very nice.  I
feel like it's quite similar to what we do in other places too- just
look for things like:

utils/adt/jsonfuncs.c:600

sem = palloc0(sizeof(JsonSemAction));

...

sem->semstate = (void *) state;
sem->array_start = okeys_array_start;
sem->scalar = okeys_scalar;
sem->object_field_start = okeys_object_field_start;
/* remainder are all NULL, courtesy of palloc0 above */

pg_parse_json(lex, sem);

...

pfree(sem);

> > I don't buy the argument that this would move the goalposts in terms
> > of how much work it is to add a new argument.  You'd still end up
> > touching every call site.
> 
> Why? A lot of arguments that'd be potentially added or removed would not
> be set by each callsite.
> 
> If you e.g. look at
> 
> you can see that a lot of changes were like
> ArchiveEntry(fout, nilCatalogId, createDumpId(),
>  "pg_largeobject", NULL, NULL, "",
> -false, "pg_largeobject", SECTION_PRE_DATA,
> +"pg_largeobject", SECTION_PRE_DATA,
>  loOutQry->data, "", NULL,
>  NULL, 0,
>  NULL, NULL);
> 
> i.e. just removing a 'false' argument. In like 70+ callsites. With the
> above scheme, we'd instead just have removed a single .withoids = true,
> from dumpTableSchema()'s ArchiveEntry() call.

Agreed.  Using this approach in more places, when appropriate and
sensible, seems like a good direction to go in.  To be clear, I don't
think we should go rewrite pieces of code just for the sake of it as
that would make back-patching more difficult, but when we're making
changes anyway, or where it wouldn't really change the landscape for
back-patching, then it seems like a good change.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Proposal for Signal Detection Refactoring

2019-01-17 Thread Andres Freund
Hi,

On 2018-10-09 16:04:35 +0200, Peter Eisentraut wrote:
> More generally, I'd like this material to be code comments.  It's the
> kind of stuff that gets outdated before long if it's kept separate.

I'm not sure I buy this here - we don't have (but perhaps should?) a
convenient location for an overview comment around this. There's no
"signal_handling.c" where it'd clearly belong - given the lack of a
clear point to look to, I don't think a README.SIGNAL_HANDLING would get
out-of-date more quickly than code comments in mildly related place (say
postgres.c or miscinit.c) would get out of date at a different pace.

Greetings,

Andres Freund



Re: Feature: temporary materialized views

2019-01-17 Thread Andreas Karlsson

On 1/17/19 4:57 PM, Tom Lane wrote:

Andreas Karlsson  writes:

On 1/11/19 8:47 PM, Mitar wrote:

Is it really ok to just remove SECURITY_RESTRICTED_OPERATION from
ExecCreateTableAs()?



The comment there said that this is not really necessary for security:
"This is not necessary for security, but this keeps the behavior
similar to REFRESH MATERIALIZED VIEW.  Otherwise, one could create a
materialized view not possible to refresh."



Hm, I am still not convinced just removing it is a good idea. Sure, it
is not a security issue but usability is also important.


Indeed.  I don't buy the argument that this should work differently
for temp views.  The fact that they're only accessible in the current
session is no excuse for that: security considerations still matter,
because you can have different privilege contexts within a single
session (consider SECURITY DEFINER functions etc).

What is the stumbling block to just leaving that alone?


I think the issue Mitar ran into is that the temporary materialized view 
is created in the rStartup callback of the receiver which happens after 
SECURITY_RESTRICTED_OPERATION is set in ExecCreateTableAs(), so the 
creation of the view itself is denied.


From a cursory glance it looks like it would be possible to move the 
setting of SECURITY_RESTRICTED_OPERATION to inside the rStartup 
callback, other than that the code for resetting the security context 
might get a bit ugly. Do you see any flaws with that solution?
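
For reference, the save/set/restore dance in question looks roughly like
this (simplified from what ExecCreateTableAs() does today; moving it into
the callback would mean stashing save_userid/save_sec_context somewhere
the receiver can reach, e.g. its DR_intorel state):

	Oid			save_userid;
	int			save_sec_context;

	/* switch into restricted mode around planning/running the query */
	GetUserIdAndSecContext(&save_userid, &save_sec_context);
	SetUserIdAndSecContext(save_userid,
						   save_sec_context | SECURITY_RESTRICTED_OPERATION);

	/* ... plan and execute the SELECT, filling the new relation ... */

	/* restore the previous security context */
	SetUserIdAndSecContext(save_userid, save_sec_context);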


Andreas



Re: WIP: Avoid creation of the free space map for small tables

2019-01-17 Thread John Naylor
On Wed, Jan 16, 2019 at 10:35 PM Amit Kapila  wrote:
> Yes, I think it would be good if you can explain the concept of
> local-map with the help of this example.

> Then let's not add a reference to the version number in this case.  I

Okay, done in v14. I kept your spelling of the new macro. One minor
detail added: use uint8 rather than char for the local map array. This
seems to be preferred, especially in this file.

-- 
John Naylor                   https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 38bc2d9b4ae3f0d1ab6d21c54ed638c5aee6d637 Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Thu, 17 Jan 2019 12:25:30 -0500
Subject: [PATCH v14 1/2] Avoid creation of the free space map for small heap 
 relations.

Previously, all heaps had FSMs. For very small tables, this means that
the FSM took up more space than the heap did. This is wasteful, so now
we refrain from creating the FSM for heaps with 4 pages or fewer. If the
last known target block has insufficient space, we still try to insert
into some other page before before giving up and extending the relation,
since doing otherwise leads to table bloat. Testing showed that trying
every page penalized performance slightly, so we compromise and try
every other page. This way, we visit at most two pages. Any pages with
wasted free space become visible at next relation extension, so we still
control table bloat. As a bonus, directly attempting one or two pages
can even be faster than consulting the FSM would have been.
---
 contrib/pageinspect/expected/page.out |  77 +++---
 contrib/pageinspect/sql/page.sql  |  33 +--
 doc/src/sgml/storage.sgml |  13 +-
 src/backend/access/brin/brin.c|   2 +-
 src/backend/access/brin/brin_pageops.c|  10 +-
 src/backend/access/heap/hio.c |  47 ++--
 src/backend/access/heap/vacuumlazy.c  |  17 +-
 src/backend/access/transam/xact.c |  14 ++
 src/backend/storage/freespace/README  |  32 ++-
 src/backend/storage/freespace/freespace.c | 272 +-
 src/backend/storage/freespace/indexfsm.c  |   6 +-
 src/include/storage/freespace.h   |   9 +-
 src/test/regress/expected/fsm.out |  62 +
 src/test/regress/parallel_schedule|   6 +
 src/test/regress/serial_schedule  |   1 +
 src/test/regress/sql/fsm.sql  |  46 
 16 files changed, 545 insertions(+), 102 deletions(-)
 create mode 100644 src/test/regress/expected/fsm.out
 create mode 100644 src/test/regress/sql/fsm.sql

diff --git a/contrib/pageinspect/expected/page.out b/contrib/pageinspect/expected/page.out
index 3fcd9fbe6d..83e5910453 100644
--- a/contrib/pageinspect/expected/page.out
+++ b/contrib/pageinspect/expected/page.out
@@ -1,48 +1,69 @@
 CREATE EXTENSION pageinspect;
-CREATE TABLE test1 (a int, b int);
-INSERT INTO test1 VALUES (16777217, 131584);
-VACUUM test1;  -- set up FSM
+CREATE TABLE test_rel_forks (a int);
+-- Make sure there are enough blocks in the heap for the FSM to be created.
+INSERT INTO test_rel_forks SELECT g from generate_series(1,1) g;
+-- set up FSM and VM
+VACUUM test_rel_forks;
 -- The page contents can vary, so just test that it can be read
 -- successfully, but don't keep the output.
-SELECT octet_length(get_raw_page('test1', 'main', 0)) AS main_0;
+SELECT octet_length(get_raw_page('test_rel_forks', 'main', 0)) AS main_0;
  main_0 
 
8192
 (1 row)
 
-SELECT octet_length(get_raw_page('test1', 'main', 1)) AS main_1;
-ERROR:  block number 1 is out of range for relation "test1"
-SELECT octet_length(get_raw_page('test1', 'fsm', 0)) AS fsm_0;
+SELECT octet_length(get_raw_page('test_rel_forks', 'main', 100)) AS main_100;
+ERROR:  block number 100 is out of range for relation "test_rel_forks"
+SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 0)) AS fsm_0;
  fsm_0 
 ---
   8192
 (1 row)
 
-SELECT octet_length(get_raw_page('test1', 'fsm', 1)) AS fsm_1;
- fsm_1 

-  8192
-(1 row)
-
-SELECT octet_length(get_raw_page('test1', 'vm', 0)) AS vm_0;
+SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 10)) AS fsm_10;
+ERROR:  block number 10 is out of range for relation "test_rel_forks"
+SELECT octet_length(get_raw_page('test_rel_forks', 'vm', 0)) AS vm_0;
  vm_0 
 --
  8192
 (1 row)
 
-SELECT octet_length(get_raw_page('test1', 'vm', 1)) AS vm_1;
-ERROR:  block number 1 is out of range for relation "test1"
+SELECT octet_length(get_raw_page('test_rel_forks', 'vm', 1)) AS vm_1;
+ERROR:  block number 1 is out of range for relation "test_rel_forks"
 SELECT octet_length(get_raw_page('xxx', 'main', 0));
 ERROR:  relation "xxx" does not exist
-SELECT octet_length(get_raw_page('test1', 'xxx', 0));
+SELECT octet_length(get_raw_page('test_rel_forks', 'xxx', 0));
 ERROR:  invalid fork name
 HINT:  Valid fork names are "main", "fsm", "vm", and "init".
-SELECT get_raw_page('test1', 0) = get_raw_page('test1', 'main', 0);
+SELECT * FROM 

Re: Proposal for Signal Detection Refactoring

2019-01-17 Thread Andres Freund
Hi,

On 2019-01-17 10:50:56 +0100, Chris Travers wrote:
> diff --git a/src/backend/utils/init/globals.c 
> b/src/backend/utils/init/globals.c
> index c6939779b9..5ed715589e 100644
> --- a/src/backend/utils/init/globals.c
> +++ b/src/backend/utils/init/globals.c
> @@ -27,12 +27,35 @@
>  
>  ProtocolVersion FrontendProtocol;
>  
> +/*
> + * Signal Handling variables here are set in signal handlers and can be
> + * checked by code to determine the implications of calling
> + * CHECK_FOR_INTERRUPTS(). While all are supported, in practice
> + * InterruptPending, QueryCancelPending, and ProcDiePending appear to be
> + * sufficient for almost all use cases.
> + */

Especially the latter part of comment seems like a bad idea.


> +/* Whether CHECK_FOR_INTERRUPTS will do anything */
>  volatile sig_atomic_t InterruptPending = false;

That's not actually a correct description. CFI is dependent on other
things than just InterruptPending, namely HOLD_INTERRUPTS() (even though
it's inside ProcessInterrupts()). It also always does more on windows.
I think the variable name is at least as good a description as the
comment you added.


> +/* Whether we are planning on cancelling the current query */
>  volatile sig_atomic_t QueryCancelPending = false;

> +/* Whether we are planning to exit the current process when safe to do so.*/
>  volatile sig_atomic_t ProcDiePending = false;
> +
> +/* Whether we have detected a lost client connection */
>  volatile sig_atomic_t ClientConnectionLost = false;
> +
> +/*
> + * Whether the process is dying because idle transaction timeout has been
> + * reached.
> + */
>  volatile sig_atomic_t IdleInTransactionSessionTimeoutPending = false;
> +
> +/* Whether we will reload the configuration at next opportunity */
>  volatile sig_atomic_t ConfigReloadPending = false;
> +

These comments all seem to add no information above the variable name.


I'm not quite understanding what this patch is intended to solve, sorry.

Greetings,

Andres Freund



Re: ArchiveEntry optional arguments refactoring

2019-01-17 Thread Andres Freund
On 2019-01-17 10:23:39 -0500, Tom Lane wrote:
> Alvaro Herrera  writes:
> > On 2019-Jan-16, Dmitry Dolgov wrote:
> >> ArchiveEntry((ArchiveArgs){.tablespace = 3,
> >> .dumpFn = somefunc,
> >> ...});
>
> > Is there real savings to be had by doing this?  What would be the
> > arguments to each function?  Off-hand, I'm not liking this idea too
> > much.
>
> I'm not either.  What this looks like it will mainly do is create
> a back-patching barrier, with little if any readability improvement.

I don't really buy this. We've whacked around the arguments to
ArchiveEntry() repeatedly over the last few releases, so there's already
a hindrance to backpatching. And given the current setup we've to whack
around all 70+ callsites whenever a single argument is added. With the
setup I'd suggested you don't, because the designated initializer syntax
allows you to omit items that ought to be zero-initialized.

And given the number of arguments to ArchiveEntry() having a name for
each argument would help for readability too. It's currently not exactly
obvious what is an argument for what:
ArchiveEntry(AH, nilCatalogId, createDumpId(),
 "ENCODING", NULL, NULL, "",
 "ENCODING", SECTION_PRE_DATA,
 qry->data, "", NULL,
 NULL, 0,
 NULL, NULL);

If you compare that with

ArchiveEntry(AH,
 (ArchiveEntry){.catalogId = nilCatalogId,
.dumpId = createDumpId(),
.tag = "ENCODING",
.desc = "ENCODING",
.section = SECTION_PRE_DATA,
.defn = qry->data});

it's definitely easier to see what argument is what.


> I don't buy the argument that this would move the goalposts in terms
> of how much work it is to add a new argument.  You'd still end up
> touching every call site.

Why? A lot of arguments that'd be potentially added or removed would not
be set by each callsite.

If you e.g. look at

you can see that a lot of changes were like
ArchiveEntry(fout, nilCatalogId, createDumpId(),
 "pg_largeobject", NULL, NULL, "",
-false, "pg_largeobject", SECTION_PRE_DATA,
+"pg_largeobject", SECTION_PRE_DATA,
 loOutQry->data, "", NULL,
 NULL, 0,
 NULL, NULL);

i.e. just removing a 'false' argument. In like 70+ callsites. With the
above scheme, we'd instead just have removed a single .withoids = true,
from dumpTableSchema()'s ArchiveEntry() call.

Greetings,

Andres Freund



Re: Protect syscache from bloating with negative cache entries

2019-01-17 Thread Robert Haas
On Sun, Jan 13, 2019 at 11:41 AM Tom Lane  wrote:
> Putting a limit on the size of the syscaches doesn't accomplish anything
> except to add cycles if your cache working set is below the limit, or
> make performance fall off a cliff if it's above the limit.

If you're running on a Turing machine, sure.  But real machines have
finite memory, or at least all the ones I use do.  Horiguchi-san is
right that this is a real, not theoretical problem.  It is one of the
most frequent operational concerns that EnterpriseDB customers have.
I'm not against solving specific cases with more targeted fixes, but I
really believe we need something more.  Andres mentioned one problem
case: connection poolers that eventually end up with a cache entry for
every object in the system.  Another case is that of people who keep
idle connections open for long periods of time; those connections can
gobble up large amounts of memory even though they're not going to use
any of their cache entries any time soon.

The flaw in your thinking, as it seems to me, is that in your concern
for "the likelihood that cache flushes will simply remove entries
we'll soon have to rebuild," you're apparently unwilling to consider
the possibility of workloads where cache flushes will remove entries
we *won't* soon have to rebuild.  Every time that issue gets raised,
you seem to blow it off as if it were not a thing that really happens.
I can't make sense of that position.  Is it really so hard to imagine
a connection pooler that switches the same connection back and forth
between two applications with different working sets?  Or a system
that keeps persistent connections open even when they are idle?  Do
you really believe that a connection that has not accessed a cache
entry in 10 minutes still derives more benefit from that cache entry
than it would from freeing up some memory?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Feature: temporary materialized views

2019-01-17 Thread Tom Lane
Andreas Karlsson  writes:
> On 1/11/19 8:47 PM, Mitar wrote:
>>> Is it really ok to just remove SECURITY_RESTRICTED_OPERATION from
>>> ExecCreateTableAs()?

>> The comment there said that this is not really necessary for security:
>> "This is not necessary for security, but this keeps the behavior
>> similar to REFRESH MATERIALIZED VIEW.  Otherwise, one could create a
>> materialized view not possible to refresh."

> Hm, I am still not convinced just removing it is a good idea. Sure, it 
> is not a security issue but usability is also important.

Indeed.  I don't buy the argument that this should work differently
for temp views.  The fact that they're only accessible in the current
session is no excuse for that: security considerations still matter,
because you can have different privilege contexts within a single
session (consider SECURITY DEFINER functions etc).

What is the stumbling block to just leaving that alone?

regards, tom lane



Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

2019-01-17 Thread James Coleman
On Wed, Jan 16, 2019 at 8:49 AM James Coleman  wrote:
>
> On Tue, Jan 15, 2019 at 11:37 PM Tom Lane  wrote:
> > Well, as I said upthread, it seems like we need to think a bit more
> > carefully about what it is that clause_is_strict_for is testing ---
> > and if that ends up finding that some other name is more apposite,
> > I'd not have any hesitation about renaming it.  But we're really
> > missing a bet if the ScalarArrayOp-specific check isn't inside that
> > recursive check of an "atomic" expression's properties.  The
> > routines above there only recurse through things that act like
> > AND or OR, but clause_is_strict_for is capable of descending
> > through other stuff as long as it's strict in some sense.  What
> > we need to be clear about here is exactly what that sense is.
>
> All right, I'll look over all of the other callers and see what makes
> sense as far as naming (or perhaps consider a new parallel function;
> unsure at this point.)

I investigated all of the callers of clause_is_strict_for and
confirmed they fall into two buckets:
- Recursive case inside the function itself.
- Callers proving a variant of IS [NOT] NULL.

Given all cases appear to be safe to extend to scalar array ops, I've
tentatively renamed the function clause_proved_for_null_test and moved
the code back into that function rather than be in the caller. This
also guarantees that beyond proving implication of IS NOT NULL and
refutation of IS NULL we also get proof of weak refutation of strict
predicates (since IS NOT NULL refutes them and we strongly imply IS
NOT NULL); I've added tests for this case.

I added many more comments explaining the proofs here, but it boils
down to the fact that non-empty array ops are really just a strict
clause, and the empty array case isn't strict, but when not returning
NULL returns false, and therefore we can still make the same proofs.

Since the empty array case is really the only interesting one, I
brought back the empty array tests, but modified them to use
calculated/non-constant arrays so that they actually hit the new code
paths. I also verified that the proofs they make are a subset of the
ones we get from existing code for constant empty arrays (it makes
sense we'd be able to prove less about possibly empty arrays than
known empty arrays.)

> I might also try to see if we can edit the tests slightly to require
> the recursion case to be exercised.

I didn't tackle this, but if you think it would add real value, then I
can look into it further.

I also updated the commit message to better match the more extended
implications of this change over just the IS NOT NULL case I was
originally pursuing.

One other thing: for known non-empty arrays we could further extend
this to prove that an IS NULL clause weakly implies the ScalarArrayOp.
I'm not sure this is worth is it; if someone things otherwise let me
know.

James Coleman


saop_is_not_null-v7.patch
Description: Binary data


Re: Early WIP/PoC for inlining CTEs

2019-01-17 Thread Tom Lane
Andreas Karlsson  writes:
> On 1/11/19 8:10 PM, Robert Haas wrote:
>> WITH cte_name [[NOT] MATERIALIZED] AS (query) main_query...

> Hm, when would one want "NOT MATERIALIZED"? I am not sure I see the 
> usefulness of forcing inlining other than if we by default do not inline 
> when a CTE is referenced multiple times.

I'm also concerned about what we do if the user says NOT MATERIALIZED
but there are semantic or implementation reasons not to inline.  Either
we throw an error or do something the user didn't expect, and neither
is very nice.  So I'm not in favor of having that option.

regards, tom lane



Re: Feature: temporary materialized views

2019-01-17 Thread Andreas Karlsson

On 1/11/19 8:47 PM, Mitar wrote:

In create_ctas_internal() why do you copy the relation even when you do
not modify it?


I was modelling this after code in view.c [1]. I can move copy into the "if".


Makes sense.


Is it really ok to just remove SECURITY_RESTRICTED_OPERATION from
ExecCreateTableAs()? I feel it is there for a good reason and that we
preferably want to reduce the duration of SECURITY_RESTRICTED_OPERATION
to only include when we actually execute the query.


The comment there said that this is not really necessary for security:

"This is not necessary for security, but this keeps the behavior
similar to REFRESH MATERIALIZED VIEW.  Otherwise, one could create a
materialized view not possible to refresh."

Based on my experimentation, this is required to be able to use
temporary materialized views, but it does mean one has to pay
attention from where one can refresh. For example, you cannot refresh
from outside of the current session, because temporary object is not
available there. I have not seen any other example where refresh would
not be possible.

This is why I felt comfortable removing this. Also, no test failed
after removing this.


Hm, I am still not convinced just removing it is a good idea. Sure, it 
is not a security issue but usability is also important. The question is 
how much this worsens usability and how much extra work it would be to 
keep the restriction.


Btw, if we are going to remove SECURITY_RESTRICTED_OPERATION we should 
remove more code. There is no reason to save and reset the bitmask if we 
do not alter it.


Andreas



Re: Early WIP/PoC for inlining CTEs

2019-01-17 Thread Andreas Karlsson

On 1/11/19 8:10 PM, Robert Haas wrote:

WITH cte_name [[NOT] MATERIALIZED] AS (query) main_query...


Hm, when would one want "NOT MATERIALIZED"? I am not sure I see the 
usefulness of forcing inlining other than if we by default do not inline 
when a CTE is referenced multiple times.


Do you imaging it working something like the below?

1. Default

# Not inlined

- Referenced multiple times
- Includes FOR UPDATE|SHARE
- Includes volatile functions
- Recursive
- DML

# Inlined

- Simple case (no side effects, referenced once)

2. MATERIALIZED

# Not inlined

- Everything

# Inlined

- (none)

3. NOT MATERIALIZED

# Not inlined

- Recursive
- DML

# Inlined

- Everything else

Andreas




Re: ArchiveEntry optional arguments refactoring

2019-01-17 Thread Tom Lane
Alvaro Herrera  writes:
> On 2019-Jan-16, Dmitry Dolgov wrote:
>> ArchiveEntry((ArchiveArgs){.tablespace = 3,
>> .dumpFn = somefunc,
>> ...});

> Is there real savings to be had by doing this?  What would be the
> arguments to each function?  Off-hand, I'm not liking this idea too
> much.

I'm not either.  What this looks like it will mainly do is create
a back-patching barrier, with little if any readability improvement.

I don't buy the argument that this would move the goalposts in terms
of how much work it is to add a new argument.  You'd still end up
touching every call site.

regards, tom lane



Re: Acceptable/Best formatting of callbacks (for pluggable storage)

2019-01-17 Thread Alvaro Herrera
On 2019-Jan-11, Andres Freund wrote:

> Just as an example of why I think this isn't great:

Hmm, to me, the first example is much better because of *vertical* space
-- I can have the whole API in one screenful.  In the other example, the
same number of functions take many more lines.  The fact that the
arguments are indented differently doesn't bother me.


> typedef Size (*EstimateDSMForeignScan_function) (ForeignScanState *node,
>  ParallelContext *pcxt);
> typedef void (*InitializeDSMForeignScan_function) (ForeignScanState *node,
>ParallelContext *pcxt,
>void *coordinate);
> typedef void (*ReInitializeDSMForeignScan_function) (ForeignScanState *node,
>  ParallelContext *pcxt,
>  void *coordinate);
> typedef void (*InitializeWorkerForeignScan_function) (ForeignScanState *node,
>   shm_toc *toc,
>   void *coordinate);
> typedef void (*ShutdownForeignScan_function) (ForeignScanState *node);
> typedef bool (*IsForeignScanParallelSafe_function) (PlannerInfo *root,
> RelOptInfo *rel,
> RangeTblEntry *rte);
> typedef List *(*ReparameterizeForeignPathByChild_function) (PlannerInfo *root,
> List *fdw_private,
> RelOptInfo 
> *child_rel);
> 
> that's a lot of indentation variability in a small amount of space - I
> find it noticably slower to mentally parse due to that. Compare that
> with:
> 
> typedef Size (*EstimateDSMForeignScan_function)
> (ForeignScanState *node,
>  ParallelContext *pcxt);
> 
> typedef void (*InitializeDSMForeignScan_function)
> (ParallelContext *pcxt,
>  void *coordinate);
> 
> typedef void (*ReInitializeDSMForeignScan_function)
> (ForeignScanState *node,
>  ParallelContext *pcxt,
>  void *coordinate);
> 
> typedef void (*InitializeWorkerForeignScan_function)
> (ForeignScanState *node,
>  shm_toc *toc,
>  void *coordinate);
> 
> typedef void (*ShutdownForeignScan_function)
> (ForeignScanState *node);
> 
> typedef bool (*IsForeignScanParallelSafe_function)
> (PlannerInfo *root,
>  RelOptInfo *rel,
>  RangeTblEntry *rte);
> 
> typedef List *(*ReparameterizeForeignPathByChild_function)
> (PlannerInfo *root,
>  List *fdw_private,
>  RelOptInfo *child_rel);
> 
> I find the second formatting considerably easier to read, albeit not for
> the first few seconds.


-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: ArchiveEntry optional arguments refactoring

2019-01-17 Thread Alvaro Herrera
On 2019-Jan-16, Dmitry Dolgov wrote:


> Proposed idea is to refactor out all/optional arguments into a separate data
> structure, so that adding/removing a new argument wouldn't change that much of
> unrelated code. Then for every invocation of ArchiveEntry this structure needs
> to be prepared before the call, or as Andres suggested:
> 
> ArchiveEntry((ArchiveArgs){.tablespace = 3,
>.dumpFn = somefunc,
>...});

Prepping the struct before the call would be our natural style, I think.
This one where the struct is embedded in the function call does not look
*too* horrible, but I'm curious as to what pgindent does with it.
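
Something like this, presumably (ArchiveArgs and its members are the
struct being proposed in this thread, not an existing API):

	ArchiveArgs args = {0};			/* unset members stay zero/NULL */

	args.catalogId = nilCatalogId;
	args.dumpId = createDumpId();
	args.tag = "ENCODING";
	args.desc = "ENCODING";
	args.section = SECTION_PRE_DATA;
	args.defn = qry->data;

	ArchiveEntry(AH, &args);		/* or pass by value, as in Andres' variant */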

> Another suggestion from Amit is to have an ArchiveEntry() function with 
> limited
> number of parameters, and an ArchiveEntryEx() with those extra parameters 
> which
> are not needed in usual cases.

Is there real savings to be had by doing this?  What would be the
arguments to each function?  Off-hand, I'm not liking this idea too
much.  But maybe we can combine both ideas and have one "normal"
function with only the most common args, and create ArchiveEntryExtended
to use the struct as proposed by Andres.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [PROPOSAL] Shared Ispell dictionaries

2019-01-17 Thread Arthur Zakirov

I attached the files of the new version of the patch, with your tweaks applied.

> XXX All dictionaries, but only when there's invalid dictionary?

I've made a little optimization. I introduced hashvalue into 
TSDictionaryCacheEntry. Now only the DSM segments of altered or dropped 
dictionaries are released.


>  > /* XXX not really a pointer, so the name is misleading */
>
> I think we don't need DictPointerData struct anymore, because only
> ts_dict_shmem_release function needs it (see comments above) and we only
> need it to hash search. I'll move all fields of DictPointerData to
> TsearchDictKey struct.

I was wrong, DictInitData also needs DictPointerData. I didn't remove 
DictPointerData, I renamed it to DictEntryData. Hope that it is a more 
appropriate name.


--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
>From c20c171c2107efc6f87b688af0feecf2f98fcd69 Mon Sep 17 00:00:00 2001
From: Arthur Zakirov 
Date: Thu, 17 Jan 2019 14:27:32 +0300
Subject: [PATCH 1/4] Fix-ispell-memory-handling

---
 src/backend/tsearch/spell.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/src/backend/tsearch/spell.c b/src/backend/tsearch/spell.c
index eb39466b22..eb8416ce7f 100644
--- a/src/backend/tsearch/spell.c
+++ b/src/backend/tsearch/spell.c
@@ -78,6 +78,8 @@
 #define tmpalloc(sz)  MemoryContextAlloc(Conf->buildCxt, (sz))
 #define tmpalloc0(sz)  MemoryContextAllocZero(Conf->buildCxt, (sz))
 
+#define tmpstrdup(str)	MemoryContextStrdup(Conf->buildCxt, (str))
+
 /*
  * Prepare for constructing an ISpell dictionary.
  *
@@ -498,7 +500,7 @@ NIAddSpell(IspellDict *Conf, const char *word, const char *flag)
 	Conf->Spell[Conf->nspell] = (SPELL *) tmpalloc(SPELLHDRSZ + strlen(word) + 1);
 	strcpy(Conf->Spell[Conf->nspell]->word, word);
 	Conf->Spell[Conf->nspell]->p.flag = (*flag != '\0')
-		? cpstrdup(Conf, flag) : VoidString;
+		? tmpstrdup(flag) : VoidString;
 	Conf->nspell++;
 }
 
@@ -1040,7 +1042,7 @@ setCompoundAffixFlagValue(IspellDict *Conf, CompoundAffixFlag *entry,
 		entry->flag.i = i;
 	}
 	else
-		entry->flag.s = cpstrdup(Conf, s);
+		entry->flag.s = tmpstrdup(s);
 
 	entry->flagMode = Conf->flagMode;
 	entry->value = val;
@@ -1541,6 +1543,9 @@ nextline:
 	return;
 
 isnewformat:
+	pfree(recoded);
+	pfree(pstr);
+
 	if (oldformat)
 		ereport(ERROR,
 (errcode(ERRCODE_CONFIG_FILE_ERROR),
-- 
2.20.1

>From ca45e4ca314bdf8bed1a47796afb3e86c6fcd684 Mon Sep 17 00:00:00 2001
From: Arthur Zakirov 
Date: Thu, 17 Jan 2019 15:05:44 +0300
Subject: [PATCH 2/4] Change-tmplinit-argument

---
 contrib/dict_int/dict_int.c  |  4 +-
 contrib/dict_xsyn/dict_xsyn.c|  4 +-
 contrib/unaccent/unaccent.c  |  4 +-
 src/backend/commands/tsearchcmds.c   | 10 -
 src/backend/snowball/dict_snowball.c |  4 +-
 src/backend/tsearch/dict_ispell.c|  4 +-
 src/backend/tsearch/dict_simple.c|  4 +-
 src/backend/tsearch/dict_synonym.c   |  4 +-
 src/backend/tsearch/dict_thesaurus.c |  4 +-
 src/backend/utils/cache/ts_cache.c   | 13 +-
 src/include/tsearch/ts_cache.h   |  4 ++
 src/include/tsearch/ts_public.h  | 67 ++--
 12 files changed, 105 insertions(+), 21 deletions(-)

diff --git a/contrib/dict_int/dict_int.c b/contrib/dict_int/dict_int.c
index 628b9769c3..ddde55eee4 100644
--- a/contrib/dict_int/dict_int.c
+++ b/contrib/dict_int/dict_int.c
@@ -30,7 +30,7 @@ PG_FUNCTION_INFO_V1(dintdict_lexize);
 Datum
 dintdict_init(PG_FUNCTION_ARGS)
 {
-	List	   *dictoptions = (List *) PG_GETARG_POINTER(0);
+	DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0);
 	DictInt*d;
 	ListCell   *l;
 
@@ -38,7 +38,7 @@ dintdict_init(PG_FUNCTION_ARGS)
 	d->maxlen = 6;
 	d->rejectlong = false;
 
-	foreach(l, dictoptions)
+	foreach(l, init_data->dict_options)
 	{
 		DefElem*defel = (DefElem *) lfirst(l);
 
diff --git a/contrib/dict_xsyn/dict_xsyn.c b/contrib/dict_xsyn/dict_xsyn.c
index 509e14aee0..15b1a0033a 100644
--- a/contrib/dict_xsyn/dict_xsyn.c
+++ b/contrib/dict_xsyn/dict_xsyn.c
@@ -140,7 +140,7 @@ read_dictionary(DictSyn *d, const char *filename)
 Datum
 dxsyn_init(PG_FUNCTION_ARGS)
 {
-	List	   *dictoptions = (List *) PG_GETARG_POINTER(0);
+	DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0);
 	DictSyn*d;
 	ListCell   *l;
 	char	   *filename = NULL;
@@ -153,7 +153,7 @@ dxsyn_init(PG_FUNCTION_ARGS)
 	d->matchsynonyms = false;
 	d->keepsynonyms = true;
 
-	foreach(l, dictoptions)
+	foreach(l, init_data->dict_options)
 	{
 		DefElem*defel = (DefElem *) lfirst(l);
 
diff --git a/contrib/unaccent/unaccent.c b/contrib/unaccent/unaccent.c
index fc5176e338..f3663cefd0 100644
--- a/contrib/unaccent/unaccent.c
+++ b/contrib/unaccent/unaccent.c
@@ -270,12 +270,12 @@ PG_FUNCTION_INFO_V1(unaccent_init);
 Datum
 unaccent_init(PG_FUNCTION_ARGS)
 {
-	List	   *dictoptions = (List *) PG_GETARG_POINTER(0);
+	DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0);
 	TrieChar   *rootTrie = NULL;
 	bool		fileloaded 

Re: Libpq support to connect to standby server as priority

2019-01-17 Thread Dave Cramer
On Thu, 17 Jan 2019 at 05:59, Laurenz Albe  wrote:

> Tsunakawa, Takayuki wrote:
> > From: Tom Lane [mailto:t...@sss.pgh.pa.us]
> > > The problem here of course is that whoever invented
> target_session_attrs
> > > was unconcerned with following that precedent, so what we have is
> > > "target_session_attrs=(any | read-write)".
> > > Are we prepared to add some aliases in service of unifying these names?
> >
> > I think "yes".
> >
> > > 2. Whether or not you want to follow pgJDBC's naming, it seems like we
> > > ought to have both "require read only" and "prefer read only" behaviors
> > > in this patch, and maybe likewise "require read write" versus "prefer
> > > read write".
>

I just had a look at the JDBC code; there is no "prefer read write". There is
a "preferSecondary".
The logic behind this is that the connection would presumably be doing only
reads, so ideally it would like a secondary,
but if it can't find one it will connect to a primary.

To be clear, there are 4 target server types in pgJDBC: "any",
"master", "secondary", and "preferSecondary" (looking at this, I need to
alias master to primary, but that's another discussion).

I have no idea where "I want to write but I'm OK if I cannot" came from.

Dave


Re: Libpq support to connect to standby server as priority

2019-01-17 Thread Dave Cramer
On Wed, 16 Jan 2019 at 01:02, Tatsuo Ishii  wrote:

> > From: Tatsuo Ishii [mailto:is...@sraoss.co.jp]
> >> But pg_is_in_recovery() returns true even for a promoting standby. So
> >> you have to wait and retry to send pg_is_in_recovery() until it
> >> finishes the promotion to find out it is now a primary. I am not sure
> >> if backend out to be responsible for this process. If not, libpq would
> >> need to handle it but I doubt it would be possible.
> >
> > Yes, the application needs to retry connection attempts until success.
> That's not different from PgJDBC and other DBMSs.
>
> I don't know what PgJDBC is doing, however I think libpq needs to do
> more than just retrying.
>
> 1) Try to find a node on which pg_is_in_recovery() returns false. If
>found, then we assume that is the primary. We also assume that
>other nodes are standbys. done.
>
> 2) If there's no node on which pg_is_in_recovery() returns false, then
>we need to retry until we find it. To not retry forever, there
>should be a timeout counter parameter.
>
>
IIRC this is essentially what pgJDBC does.
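
A minimal client-side sketch of that loop, written against plain libpq
(the connection strings, timeout handling and error reporting are
simplified for illustration only):

#include <string.h>
#include <time.h>
#include <unistd.h>
#include <libpq-fe.h>

/*
 * Try each node until one reports pg_is_in_recovery() = false (the
 * primary), retrying until the timeout expires.  Returns NULL on timeout.
 */
static PGconn *
connect_to_primary(const char **conninfos, int n, int timeout_sec)
{
	time_t		deadline = time(NULL) + timeout_sec;

	while (time(NULL) < deadline)
	{
		for (int i = 0; i < n; i++)
		{
			PGconn	   *conn = PQconnectdb(conninfos[i]);

			if (PQstatus(conn) == CONNECTION_OK)
			{
				PGresult   *res = PQexec(conn, "SELECT pg_is_in_recovery()");

				if (PQresultStatus(res) == PGRES_TUPLES_OK &&
					strcmp(PQgetvalue(res, 0, 0), "f") == 0)
				{
					PQclear(res);
					return conn;	/* found the primary */
				}
				PQclear(res);
			}
			PQfinish(conn);			/* standby or failed connection: try next */
		}
		sleep(1);					/* every node is in recovery; retry */
	}
	return NULL;					/* no primary found within the timeout */
}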


Dave Cramer

da...@postgresintl.com
www.postgresintl.com

>
>


Re: Libpq support to connect to standby server as priority

2019-01-17 Thread Dave Cramer
On Tue, 15 Jan 2019 at 23:21, Tsunakawa, Takayuki <
tsunakawa.ta...@jp.fujitsu.com> wrote:

> From: Dave Cramer [mailto:p...@fastcrypt.com]
> >   The original desire should have been the ability to connect to a
> > primary or a standby.  So, I think we should go back to the original
> thinking
> > (and not complicate the feature), and create a read only GUC_REPORT
> variable,
> > say, server_role, that identifies whether the server is a primary or a
> > standby.
> >
> >
> >
> > I'm confused as to how this would work. Who or what determines if the
> server
> > is a primary or standby?
>
> Overall, the server determines the server role (primary or standby) using
> the same mechanism as pg_is_in_recovery(), and set the server_role GUC
> parameter.  As the parameter is GUC_REPORT, the change is reported to the
> clients using the ParameterStatus ('S') message.  The clients also get the
> value at connection.
>

Thanks, that clarifies it.


Dave Cramer

da...@postgresintl.com
www.postgresintl.com

>
>


Re: pg_dump multi VALUES INSERT

2019-01-17 Thread Surafel Temesgen
On Fri, Jan 4, 2019 at 3:08 PM David Rowley 
wrote:

> On Mon, 31 Dec 2018 at 18:58, Surafel Temesgen 
> wrote:
> > On Fri, Dec 28, 2018 at 6:46 PM Fabien COELHO 
> wrote:
> >> > At first i also try to do it like that but it seems the function will
> >> > became long and more complex to me
> >>
> >> Probably. But calling it with size 100 should result in the same
> behavior,
> >> so it is really just an extension of the preceeding one? Or am I missing
> >> something?
> >>
> >
> > Specifying table data using single value insert statement and user
> specified values insert statement
> > have enough difference that demand to be separate function and they are
> not the same thing that should implement
> > with the same function. Regarding code duplication i think the solution
> is making those code separate function
> > and call at appropriate place.
>
> I don't really buy this. I've just hacked up a version of
> dumpTableData_insert() which supports a variable number rows per
> statement. It seems fairly clean and easy to me. Likely the fact that
> this is very possible greatly increases the chances of this getting in
> since it gets rid of the code duplication. I did also happen to move
> the row building code out of the function into its own function, but
> that's not really required. I just did that so I could see all the
> code in charge of terminating each statement on my screen without
> having to scroll.  I've not touched any of the plumbing work to plug
> the rows_per_statement variable into the command line argument. So
> it'll need a bit of merge work with the existing patch.   There will
> need to be some code that ensures that the user does not attempt to
> have 0 rows per statement. The code I wrote won't behave well if that
> happens.
>
>
The attached patch mostly uses your method.

... Checks existing patch ...
>
> I see you added a test, but missed checking for 0. That needs to be fixed.
>
> + if (dopt.dump_inserts_multiple < 0)
> + {
> + write_msg(NULL, "argument of --insert-multi must be positive number\n");
> + exit_nicely(1);
> + }
>
>
fixed

I also didn't adopt passing the rows-per-statement into the FETCH.  I
> think that's a very bad idea and we should keep that strictly at 100.
> I don't see any reason to tie the two together. If a user wants 10
> rows per statement, do we really want to FETCH 10 times more often?
> And what happens when they want 1 million rows per statement? We've no
> reason to run out of memory from this since we're just dumping the
> rows out to the archive on each row.
>
>
okay

>
> +Specify the number of values per INSERT
> command.
> +This will make the dump file smaller than
> --inserts
> +and it is faster to reload but lack per row data lost on error
> +instead entire affected insert statement data lost.
>
> Unsure what you mean about "data lost".  It also controls the number
> of "rows" per INSERT statement, not the number of
> values.
>
> I think it would be fine just to say:
>
> +When using --inserts, this allows the maximum
> number
> +of rows per INSERT statement to be specified.
> +This setting defaults to 1.
>
>
>
I changed it too, except for "This setting defaults to 1", because it doesn't
have a default value.
1 row per statement means the --inserts option.


>
> 2. Is --insert-multi a good name? What if they do --insert-multi=1?
> That's not very "multi".  Is --rows-per-insert better?
>
>
--rows-per-insert is better for me.

regards
Surafel
diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 9e0bb93f08..4195fb81a2 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -775,6 +775,16 @@ PostgreSQL documentation
   
  
 
+ 
+  --rows-per-insert
+  
+   
+When using --rows-per-insert, this allows the maximum number
+of rows per INSERT statement to be specified.
+   
+  
+ 
+
  
   --load-via-partition-root
   
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index 4a2e122e2d..73a243ecb0 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -72,6 +72,7 @@ typedef struct _restoreOptions
 	int			dropSchema;
 	int			disable_dollar_quoting;
 	int			dump_inserts;
+	int			dump_inserts_multiple;
 	int			column_inserts;
 	int			if_exists;
 	int			no_comments;	/* Skip comments */
@@ -144,6 +145,7 @@ typedef struct _dumpOptions
 	/* flags for various command-line long options */
 	int			disable_dollar_quoting;
 	int			dump_inserts;
+	int			dump_inserts_multiple;
 	int			column_inserts;
 	int			if_exists;
 	int			no_comments;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 0e129f9654..e49e2206e7 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -313,6 +313,7 @@ main(int argc, char **argv)
 	int			plainText = 0;
 	ArchiveFormat archiveFormat = archUnknown;
 	ArchiveMode archiveMode;
+	char   *p;
 
 	static 
