Re: pgsql: Integrate recovery.conf into postgresql.conf

2018-11-26 Thread David Steele

On 11/26/18 10:35 AM, Abhijit Menon-Sen wrote:

At 2018-11-26 10:18:52 -0500, sfr...@snowman.net wrote:



This came originally to check that recovery.conf handles correctly
its recovery targets when multiple ones are defined with the last
one winning […]


Ugh, no, I don't agree that this property makes sense to keep and
since we're making these changes, I'd argue that it's exactly the
right time to do away with that.


I strongly agree with everything Stephen said here.


+1.

--
-David
da...@pgmasters.net



Re: Constraint documentation

2018-11-26 Thread Alvaro Herrera
I have pushed this.

On 2018-Nov-26, David Fetter wrote:

> On Thu, Nov 22, 2018 at 03:16:11PM +0100, Patrick Francelle wrote:
>
> > To address your remark, I added a small message in the CREATE TABLE
> > reference page to be more explicit about the topic, so that it would be
> > a warning for the users reading the section. And then a reference to the
> > CHECK constraint page where the full explanation is to be located.
> > 
> > That way, the caveat is mentioned in both pages, but the full
> > explanation is located only on a single page.

That was a good idea, but your third sentence repeated what was being
said in the first sentence in the same paragraph.  I edited that to put
the cross-reference next to the first sentence instead.

> I believe that features F671 (subqueries in CHECK constraints) and
> possibly F673 (reads SQL-data routine invocations in CHECK
> constraints) from the standard should be referred to here.
> 
> We haven't implemented either one of them, but we might some day.

I don't necessarily disagree, but I don't think we put many feature
references in the docs.  I suppose we can edit it when we implement
F671 and F673.  Or maybe you want to submit a followup patch.  It didn't
seem worth blocking this patch for your proposed change (particularly
since Lætitia seems to have given up on it already).

Thanks,

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pgsql: Integrate recovery.conf into postgresql.conf

2018-11-26 Thread Abhijit Menon-Sen
At 2018-11-26 10:18:52 -0500, sfr...@snowman.net wrote:
>
> > This came originally to check that recovery.conf handles correctly
> > its recovery targets when multiple ones are defined with the last
> > one winning […]
> 
> Ugh, no, I don't agree that this property makes sense to keep and
> since we're making these changes, I'd argue that it's exactly the
> right time to do away with that.

I strongly agree with everything Stephen said here.

-- Abhijit



Re: csv format for psql

2018-11-26 Thread David G. Johnston
On Sun, Nov 25, 2018 at 6:27 PM Tom Lane  wrote:
> I think there are two remaining points to settle:
>
> 1. Are we limiting the separator to be a single-byte character or not?

I agree with what others have said that expanding functionality in
this direction is more likely to mask errors than be useful.

> I'm a bit inclined to the former viewpoint.
> If we were in the business of being restrictive, why would we allow
> the field separator to be changed at all?  The name of the format is
> *comma* separated values, not something else.

I still stand by the more inclusive, and arguably more modern, name
"character separated values" for the abbreviation... which can quite easily
be taken to mean any single character, and is how it appears to be used
these days in any case.

> 2. Speaking of the field separator, I'm pretty desperately unhappy
> with the choice of "fieldsep_csv" as the parameter name.[...]
> We could avoid this self-inflicted confusion by choosing a different
> parameter name.  I'd be good with "csv_fieldsep" or "csvfieldsep".

Makes sense to me - with the underscore, personally.
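
As a rough sketch of what a session could look like (assuming the csv format
and the csv_fieldsep spelling end up being what's committed):

\pset format csv
\pset csv_fieldsep ';'
SELECT 1 AS a, 'x;y' AS b;
a;b
1;"x;y"

i.e. a value containing the separator simply gets quoted, so any
single-character separator stays unambiguous.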

David J.



Re: pgsql: Integrate recovery.conf into postgresql.conf

2018-11-26 Thread Stephen Frost
Greetings,

* Michael Paquier (mich...@paquier.xyz) wrote:
> On Mon, Nov 26, 2018 at 02:06:36PM +0100, Peter Eisentraut wrote:
> > What is the reason for allowing multiple recovery_target_* settings and
> > taking the last one?
> 
> This came originally to check that recovery.conf handles correctly its
> recovery targets when multiple ones are defined with the last one
> winning, as users may want to append additional targets in an existing
> recovery.conf.  I have not checked the code in detail, but I think that
> we should be careful to keep that property, even with EXEC_BACKEND.

Ugh, no, I don't agree that this property makes sense to keep and since
we're making these changes, I'd argue that it's exactly the right time
to do away with that.

I would think we'd have the different GUCs and then the check functions
would only validate that they're valid inputs and then when we get to
the point where we're starting to do recovery we check and make sure
we've been given a sane overall configuration- which means that only
*one* is set, and it matches the recovery target requested.

As with backup, recovery should be extremely clear and straight-forward,
with alarms going off if there's something unclear or inconsistent.  One
of the arguments around using postgresql.auto.conf (or any other managed
config file, as discussed elsewhere) is that it can be machine-processed
and therefore tooling around it can make sure that what goes into that
file is correct and sane and doesn't need to rely on being able to just
append things to the file and hope that it works.
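
For illustration, such a managed file could end up containing exactly one
target (contents made up here; the header is what ALTER SYSTEM writes today):

# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
recovery_target_time = '2018-11-26 12:00:00+00'

and the tooling that generates the file can refuse to emit more than one
recovery_target_* setting in the first place.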

Thanks!

Stephen




Re: Updated backup APIs for non-exclusive backups

2018-11-26 Thread Stephen Frost
Greetings,

* David Steele (da...@pgmasters.net) wrote:
> On 11/26/18 12:31 AM, Laurenz Albe wrote:
> >If there is a crash during the backup procedure, the backup is bad.
> >Doesn't matter during which part of the backup procedure it happens.
> 
> Yes, but in this case with exclusive backups your cluster also will not
> start.  That's a lot different than a bad backup because you should have
> many backups and a way to mark when they are complete.  If your cluster
> requires manual intervention to start because a backup was in progress
> that's not a good thing.
> 
> I'm of the opinion that we should remove the exclusive option for PG12. The
> recovery changes recently committed will completely break any automated
> recovery written for PG <= 11, so we might as well go all the way.

Yeah, after chatting with David about that this morning, I agree.
Breaking everything around how restores work means that anyone doing
backups today needs to re-evaluate their restore procedures, even if
it's not an automated script, so it makes sense to also remove the
deprecated exclusive backup mode at the same time.

> Requiring non-exclusive backup will be an easy change for tools that already
> understand non-exclusive backup, which has been available in the last three
> versions of Postgres.
> 
> The exclusive backup method tries to make a hard problem look simple but
> does not succeed and causes issues besides.  Performing backups properly
> *is* hard and we don't do anyone a service by making it look like something
> can be done with a trivial bash script.

+1

Thanks!

Stephen




Re: Updated backup APIs for non-exclusive backups

2018-11-26 Thread David Steele

On 11/26/18 12:31 AM, Laurenz Albe wrote:

On Sun, 2018-11-25 at 16:04 -0500, Stephen Frost wrote:

There isn’t any need to write the backup label before you restore the database,
just as you write recovery.conf then.


Granted.
But it is pretty convenient, and writing it to the data directory right away
is a good thing on top, because it reduces the danger of inadvertently
starting the backup without recovery.


Writing it into the data directory is *not* a good thing because as soon as
you do that you risk there being an issue if there’s a crash.  Writing into
the backup isn’t a bad idea but if you’re managing your backups then writing
it somewhere else (such as where you write your WAL) and associating it with
the backup (presumably it has a label) should make it easy to pull back when
you restore.


If there is a crash during the backup procedure, the backup is bad.
Doesn't matter during which part of the backup procedure it happens.


Yes, but in this case with exclusive backups your cluster also will not 
start.  That's a lot different than a bad backup because you should have 
many backups and a way to mark when they are complete.  If your cluster 
requires manual intervention to start because a backup was in progress 
that's not a good thing.


I'm of the opinion that we should remove the exclusive option for PG12. 
The recovery changes recently committed will completely break any 
automated recovery written for PG <= 11, so we might as well go all the way.


Requiring non-exclusive backup will be an easy change for tools that 
already understand non-exclusive backup, which has been available in the 
last three versions of Postgres.


The exclusive backup method tries to make a hard problem look simple but 
does not succeed and causes issues besides.  Performing backups properly 
*is* hard and we don't do anyone a service by making it look like 
something can be done with a trivial bash script.
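
For comparison, a minimal sketch of the non-exclusive procedure looks like
this (label and query made up; the label file handling is up to the tool):

SELECT pg_start_backup('nightly', false, false);  -- third argument false = non-exclusive
-- copy the data directory while this session stays connected
SELECT labelfile FROM pg_stop_backup(false);      -- store this with the backup, not in PGDATA

Keeping the session open and storing the label file alongside the backup is
exactly the part a trivial script tends to get wrong.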


Regards,

--
-David
da...@pgmasters.net



Re: Doc patch on psql output formats

2018-11-26 Thread Tom Lane
"Daniel Verite"  writes:
>   Tom Lane wrote:
>> As of HEAD, it's impossible to select latex output format
>> at all:
>> regression=# \pset format latex
>> \pset: ambiguous abbreviation "latex" matches both "latex" and
>> "latex-longtable"

> Oops!

>> We could fix that by adding a special case to accept an exact match
>> immediately.

> Personally I would favor that one, but to me the problem is that
> allowing abbreviations doesn't really work well in the long run,
> that is, if new format names are going to keep appearing in the
> future.

Perhaps, but that ship sailed years ago.  I do not think we can or
should remove the ability to write abbreviations here.  People
writing scripts are obviously taking a risk of future breakage
if they abbreviate, but how is it an improvement in their lives
if we change the risk from "uncertain" to "100%"?
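
For illustration, with such an exact-match special case

\pset format latex

would again select latex, while a truly ambiguous abbreviation such as "lat"
would still be rejected because it matches both latex and latex-longtable.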

regards, tom lane



Re: COPY FROM WHEN condition

2018-11-26 Thread Surafel Temesgen
On Sat, Nov 24, 2018 at 5:09 AM Tomas Vondra 
wrote:

>
> 1) While comparing this to the FILTER clause we already have for
> aggregates, I've noticed the aggregate version is
>
> FILTER '(' WHERE a_expr ')'
>
> while here we have
>
> FILTER '(' a_expr ')'
>
> For a while I was thinking that maybe we should use the same syntax
> here, but I don't think so. The WHERE bit seems rather unnecessary and
> we probably implemented it only because it's required by SQL standard,
> which does not apply to COPY. So I think this is fine.


ok
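
For reference, the two forms being compared look roughly like this (table,
file path and condition are made up):

SELECT count(*) FILTER (WHERE a > 0) FROM t;   -- aggregate form, per SQL standard
COPY t FROM '/tmp/t.data' FILTER (a > 0);      -- proposed COPY syntax, without WHERE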

>
> 2) The various parser checks emit errors like this:
>
> case EXPR_KIND_COPY_FILTER:
> err = _("cannot use subquery in copy from FILTER condition");
> break;
>
> I think the "copy from" should be capitalized, to make it clear that it
> refers to a COPY FROM command and not a copy of something.
>
>
> 3) I think there should be regression tests for these prohibited things,
> i.e. for a set-returning function, for a non-existent column, etc.
>
>
fixed

>
> 4) What might be somewhat confusing for users is that the filter uses a
> single snapshot to evaluate the conditions for all rows. That is, one
> might do this
>
> create or replace function f() returns int as $$
> select count(*)::int from t;
> $$ language sql;
>
> and hope that
>
> copy t from '/...' filter (f() <= 100);
>
> only ever imports the first 100 rows - but that's not true, of course,
> because f() uses the snapshot acquired at the very beginning. For
> example INSERT SELECT does behave differently:
>
> test=# copy t from '/home/user/t.data' filter (f() < 100);
> COPY 81
> test=# insert into t select * from t where f() < 100;
> INSERT 0 19
>
> Obviously, this is not an issue when the filter clause references only
> the input row (which I assume will be the primary use case).
>
> Not sure if this is expected / appropriate behavior, and if the patch
> needs to do something else here.
>
Comparing with the overhead of setting a snapshot before evaluating every
row, and considering that this kind of usage is not frequent, the behavior
seems acceptable to me.

regards
Surafel
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 411941ed31..15b6ddebed 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -25,6 +25,7 @@ PostgreSQL documentation
 COPY table_name [ ( column_name [, ...] ) ]
 FROM { 'filename' | PROGRAM 'command' | STDIN }
 [ [ WITH ] ( option [, ...] ) ]
+[ FILTER ( condition ) ]
 
 COPY { table_name [ ( column_name [, ...] ) ] | ( query ) }
 TO { 'filename' | PROGRAM 'command' | STDOUT }
@@ -353,6 +354,30 @@ COPY { table_name [ ( 

 
+   
+FILTER
+
+   
+The optional FILTER clause has the general form
+
+FILTER condition
+
+where condition is
+any expression that evaluates to a result of type
+boolean.  Any row that does not satisfy this
+condition will not be inserted to the table.  A row satisfies the
+condition if it returns true when the actual row values are
+substituted for any variable references.
+   
+
+   
+Currently, FILTER expressions cannot contain
+subqueries.
+   
+
+
+   
+
   
  
 
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 4aa8890fe8..bb260c41fd 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -40,7 +40,11 @@
 #include "miscadmin.h"
 #include "optimizer/clauses.h"
 #include "optimizer/planner.h"
+#include "optimizer/prep.h"
 #include "nodes/makefuncs.h"
+#include "parser/parse_coerce.h"
+#include "parser/parse_collate.h"
+#include "parser/parse_expr.h"
 #include "parser/parse_relation.h"
 #include "port/pg_bswap.h"
 #include "rewrite/rewriteHandler.h"
@@ -150,6 +154,7 @@ typedef struct CopyStateData
 	bool		convert_selectively;	/* do selective binary conversion? */
 	List	   *convert_select; /* list of column names (can be NIL) */
 	bool	   *convert_select_flags;	/* per-column CSV/TEXT CS flags */
+	Node	   *filterClause;	/* FILTER condition (or NULL) */
 
 	/* these are just for error messages, see CopyFromErrorCallback */
 	const char *cur_relname;	/* table name for error messages */
@@ -180,6 +185,7 @@ typedef struct CopyStateData
 	ExprState **defexprs;		/* array of default att expressions */
 	bool		volatile_defexprs;	/* is any of defexprs volatile? */
 	List	   *range_table;
+	ExprState  *qualexpr;
 
 	TransitionCaptureState *transition_capture;
 
@@ -801,6 +807,7 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
 	Relation	rel;
 	Oid			relid;
 	RawStmt*query = NULL;
+	Node*filterClause = NULL;
 
 	/*
 	 * Disallow COPY to/from file or program except to users with the
@@ -854,6 +861,25 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
 			NULL, false, false);
 		rte->requiredPerms = (is_from ? ACL_INSERT : ACL_SELECT);
 
+		if (stmt->filterClause)
+		{
+			/* add rte to column namespace  */
+			addRTEtoQuery(pstate, rte, false, true, 

Re: pgsql: Integrate recovery.conf into postgresql.conf

2018-11-26 Thread Sergei Kornilov
Hi

> What is the reason for allowing multiple recovery_target_* settings and
> taking the last one? Could we change this aspect to make this behave
> better?
It is the correct response to the user's configuration, similar to the old
recovery.conf logic or any other GUC - we allow redefinition with a "latest
wins" rule.
I think we could disallow using multiple recovery_target_* settings, but
currently I have no good idea how such a check could be done in the check_*
callbacks.
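
For example, with

recovery_target_xid  = '607'
recovery_target_time = '2018-11-26 12:00:00'

the time target takes effect, exactly as a later entry overrides an earlier
one for any other GUC (the values here are only illustrative).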

regards, Sergei



Re: pgsql: Integrate recovery.conf into postgresql.conf

2018-11-26 Thread Michael Paquier
On Mon, Nov 26, 2018 at 02:06:36PM +0100, Peter Eisentraut wrote:
> What is the reason for allowing multiple recovery_target_* settings and
> taking the last one?

This came originally to check that recovery.conf handles correctly its
recovery targets when multiple ones are defined with the last one
winning, as users may want to append additional targets in an existing
recovery.conf.  I have not checked the code in detail, but I think that
we should be careful to keep that property, even with EXEC_BACKEND.
--
Michael




Re: pgsql: Integrate recovery.conf into postgresql.conf

2018-11-26 Thread Peter Eisentraut
On 26/11/2018 13:32, Sergei Kornilov wrote:
>> Timed out while waiting for standby to catch up at
>> t/003_recovery_targets.pl line 34.
> 
> I can reproduce this, and I notice that the settings are assigned in the
> wrong order. For example standby_6 has
>>  recovery_target_name = '$recovery_name'
>>  recovery_target_xid  = '$recovery_txid'
>>  recovery_target_time = '$recovery_time'
> but recoveryTarget was set to RECOVERY_TARGET_XID.
> Without -DEXEC_BACKEND everything is fine.
> 
> As far as I understand the guc.c code, with EXEC_BACKEND all processes use
> different config processing logic: we serialize all GUCs and restore them at
> process start. And we sort the GUCs by name in build_guc_variables, so we
> restore the settings in the wrong order. I was afraid that the GUC system
> does not guarantee the order of settings...
> 
> What is the preferable solution? It seems we cannot simply change this logic.

What is the reason for allowing multiple recovery_target_* settings and
taking the last one?  Could we change this aspect to make this behave
better?

> I think I can reorganize all the new recovery_target_* GUCs into a single
> one with a combined format, for example recovery_target = "xid:607" (this
> was mentioned in the patch discussion).

That would be another option.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: COPY FROM WHEN condition

2018-11-26 Thread Surafel Temesgen
On Sat, Nov 24, 2018 at 12:02 PM Dean Rasheed 
wrote:

> Right now we have 2 syntaxes for filtering rows in queries, both of
> which use WHERE immediately before the condition:
>
> 1). SELECT ... FROM ... WHERE condition
>
> 2). SELECT agg_fn FILTER (WHERE condition) FROM ...
>
> I'm not a huge fan of (2), but that's the SQL standard, so we're stuck
> with it. There's a certain consistency in its use of WHERE to
> introduce the condition, and preceding that with FILTER helps to
> distinguish it from any later WHERE clause. But what you'd be adding
> here would be a 3rd syntax
>
> 3). COPY ... FROM ... FILTER condition
>
> which IMO will just lead to confusion.
>


Your cases are for retrieving data, but this is for deciding which data to
insert, and I think the word FILTER describes that better and does not lead
to confusion.

regards
Surafel


Re: A WalSnd issue related to state WALSNDSTATE_STOPPING

2018-11-26 Thread Michael Paquier
On Thu, Nov 22, 2018 at 06:31:04PM +0800, Paul Guo wrote:
> Yes, please see the attached patch.

Thanks, I have reviewed your patch, and could not resist changing
SyncRepReleaseWaiters() on top of the rest, with a comment so as not to be
confused about which WAL sender states are allowed to release waiters.

Previous experience has shown that it is unwise to rely on the order of the
elements in WalSndState, so I have tweaked things so that the allowed states
are listed explicitly in all the code paths involved.

Paul, what do you think?
--
Michael
diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index 9a13c50ce8..5b8a268fa1 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -425,10 +425,12 @@ SyncRepReleaseWaiters(void)
 	 * If this WALSender is serving a standby that is not on the list of
 	 * potential sync standbys then we have nothing to do. If we are still
 	 * starting up, still running base backup or the current flush position is
-	 * still invalid, then leave quickly also.
+	 * still invalid, then leave quickly also.  Streaming or stopping WAL
+	 * senders are allowed to release waiters.
 	 */
 	if (MyWalSnd->sync_standby_priority == 0 ||
-		MyWalSnd->state < WALSNDSTATE_STREAMING ||
+		(MyWalSnd->state != WALSNDSTATE_STREAMING &&
+		 MyWalSnd->state != WALSNDSTATE_STOPPING) ||
 		XLogRecPtrIsInvalid(MyWalSnd->flush))
 	{
 		announce_next_takeover = true;
@@ -730,8 +732,9 @@ SyncRepGetSyncStandbysQuorum(bool *am_sync)
 		if (pid == 0)
 			continue;
 
-		/* Must be streaming */
-		if (state != WALSNDSTATE_STREAMING)
+		/* Must be streaming or stopping */
+		if (state != WALSNDSTATE_STREAMING &&
+			state != WALSNDSTATE_STOPPING)
 			continue;
 
 		/* Must be synchronous */
@@ -809,8 +812,9 @@ SyncRepGetSyncStandbysPriority(bool *am_sync)
 		if (pid == 0)
 			continue;
 
-		/* Must be streaming */
-		if (state != WALSNDSTATE_STREAMING)
+		/* Must be streaming or stopping */
+		if (state != WALSNDSTATE_STREAMING &&
+			state != WALSNDSTATE_STOPPING)
 			continue;
 
 		/* Must be synchronous */




Re: SQL/JSON: functions

2018-11-26 Thread Dmitry Dolgov
> On Tue, Jul 3, 2018 at 2:57 PM Pavel Stehule  wrote:
>
> 2018-07-03 14:30 GMT+02:00 Nikita Glukhov :
>>
>> Attached 16th version of the patches:
>>  * changed type of new SQL keyword STRING
>>(STRING is used as a function parameter name in Pl/Tcl tests)
>>  * removed implicit coercion via I/O from JSON_VALUE (see below)

Unfortunately, the current version of patch 0010-add-invisible-coercion-form
doesn't apply anymore without conflicts; could you please rebase it?



Re: SQL/JSON: JSON_TABLE

2018-11-26 Thread Dmitry Dolgov
> On Tue, Jul 3, 2018 at 4:50 PM Nikita Glukhov  wrote:
>
> Attached 16th version of JSON_TABLE patches.
>
> Changed only results of regression tests after the implicit coercion via I/O
> was removed from JSON_VALUE.

Thank you for working on this patch! Unfortunately, the current version of
patch 0015-json_table doesn't apply anymore without conflicts; could you
please rebase it? In the meantime I'll try to provide some review.



Re: pgsql: Integrate recovery.conf into postgresql.conf

2018-11-26 Thread Sergei Kornilov
Hi

> The buildfarm is unhappy after this one:
> https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=culicidae&br=HEAD
>
> If I use -DEXEC_BACKEND when compiling the tests complain about a
> timeout in 003_recovery_targets. Here is what the buildfarm reports, I
> can see the failure by myself as well:
> # Postmaster PID for node "standby_6" is 26668
> # poll_query_until timed out executing this query:
> # SELECT '0/3022690'::pg_lsn <= pg_last_wal_replay_lsn()
> # expecting this output:
> # t
> # last actual query output:
> # f
> # with stderr:
> Timed out while waiting for standby to catch up at
> t/003_recovery_targets.pl line 34.

I can reproduce this, and I notice that the settings are assigned in the
wrong order. For example standby_6 has
>   recovery_target_name = '$recovery_name'
>   recovery_target_xid  = '$recovery_txid'
>   recovery_target_time = '$recovery_time'
but recoveryTarget was set to RECOVERY_TARGET_XID.
Without -DEXEC_BACKEND everything is fine.

As far as I understand the guc.c code, with EXEC_BACKEND all processes use
different config processing logic: we serialize all GUCs and restore them at
process start. And we sort the GUCs by name in build_guc_variables, so we
restore the settings in the wrong order (here ..._name, ..._time, ..._xid,
which is why the xid target wins). I was afraid that the GUC system does not
guarantee the order of settings...

What is the preferable solution? It seems we cannot simply change this logic.
I think I can reorganize all the new recovery_target_* GUCs into a single one
with a combined format, for example recovery_target = "xid:607" (this was
mentioned in the patch discussion).

regards, Sergei



RE: Global shared meta cache

2018-11-26 Thread Ideriha, Takeshi
Hi,

>From: Ideriha, Takeshi [mailto:ideriha.take...@jp.fujitsu.com]
>Sent: Wednesday, October 3, 2018 3:18 PM
>At this moment this patch only allocates catalog cache header and CatCache 
>data on
>the shared memory area.
I'm trying to handle this allocation stuff in another thread [1] in a
broader way.

>The followings are major items I haven't touched:
>- how to treat cache visibility and invalidation coming from transactions 
>including DDL

Some of you gave me comments on this point before, but at that time I had
less knowledge and couldn't reply to them immediately. Sorry for that.

Right now I have hit upon two approaches.
Plan A is that all of the work is done in shared memory and no local cache
is used.
Plan B is that both a shared cache and a local cache are used.
Based on the discussion several months ago in this thread, plan B would
probably be better, but there are some variations of plan B, so I'd like to
hear opinions.

A. Use only shared memory
Because everything has to be done inside shared memory, this needs the same
machinery as the current shared_buffers.  That is, handling transactions
(including DDL) properly needs MVCC, and cleaning up obsolete cache entries
needs vacuum.  Taking advantage of MVCC and vacuum would work, but it seems
pretty tough to implement.  So another option is plan B, which handles cache
versioning and cleanup in a different way.

B. Use both shared memory and local memory
The basic policy is that shared memory keeps the latest version of the cache
as much as possible, and each cache entry carries version information (xmin,
xmax).  The local cache is a cache of the shared one, and its lifetime is
temporary.

[Search cache]
When a backend wants to use a relation or catalog cache entry in a
transaction, it looks for it in the following order:
1. local cache
2. shared cache
3. disk

At first there is no local cache, so it searches the shared cache and, if the
entry is found, loads it into local memory.  If the wanted entry is not found
in shared memory, the backend fetches it from disk.

[Lifetime of local cache]
When ALTER TABLE/DROP TABLE is issued in a transaction, the relevant local
cache entry has to differ from the original one.  On this point I'm
considering two cases.
B-1: Create a local cache entry at the first reference and keep it until the
transaction ends.  The relevant local cache entry is updated or deleted when
the DROP/ALTER is issued, and it is freed when the transaction commits or
aborts.
B-2: The local cache lives only for the duration of one snapshot.  If the
isolation level is read committed, the local cache is deleted every time a
new command is issued.

In case B-1 the sinval message machinery is necessary to update the local
cache, which is the same as the current machinery.  Case B-2, on the other
hand, doesn't need sinval messages, because the local cache is deleted once
the snapshot's duration has expired.  There is a trade-off between B-1 and
B-2: B-1 would outperform B-2, but B-2 would use less memory.

[Invalidation of shared cache]
I'm thinking that invalidating the shared cache can be the responsibility of
a backend that wants to see the latest version, rather than of the one that
committed the DROP/ALTER command.  In my sketch each cache entry has its own
version information, so a transaction can compare its snapshot with the
shared cache version, and if the cached entry is not the wanted one, we can
obtain it from disk.

Do you have any thoughts?

[1] 
https://www.postgresql.org/message-id/flat/4E72940DA2BF16479384A86D54D0988A6F1EE452%40G01JPEXMBKW04
 
Regards,
Takeshi Ideriha



Re: New function pg_stat_statements_reset_query() to reset statistics of a specific query

2018-11-26 Thread Vik Fearing
On 23/11/2018 00:09, Haribabu Kommi wrote:
> 
> On Fri, Nov 23, 2018 at 1:54 AM Peter Eisentraut
>  > wrote:
> 
> On 19/11/2018 06:18, Haribabu Kommi wrote:
> > Amit suggested another option in another mail, so total viable 
> > solutions that are discussed as of now are,
> >
> > 1. Single API with NULL input treat as invalid value
> > 2. Multiple API to deal with NULL input of other values
> > 3. Single API with NULL value to treat them as current user, current
> > database
> >  and NULL queryid.
> > 4. Single API with -1 as invalid value, treat NULL as no matching.
> (Only
> > problem
> >  with this approach is till now -1 is also a valid queryid, but
> setting
> > -1 as queryid
> > needs to be avoided.
> 
> Can you show examples of what these would look like?
> 
> 
> Following are some of the examples how the various options
> work.
> 
> Option -1:
> 
> select pg_stat_statements_reset(NULL,NULL,NULL);
> -- No change, just return. Because the function is strict.
> 
> select pg_stat_statements_reset(10,10,NULL);
> -- Resets all the statements that are executed by userid 10 on database
> id 10.
> 
> select pg_stat_statements_reset(NULL,10,NULL);
> -- Resets all the statements executed on database id 10

If the function is strict, none of these will do anything.

My preference is for NULL to mean *all* so this is my favorite option,
except that the first query should reset everything.

+1 with that change.
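
Under that interpretation the calls would behave roughly like this (a sketch
only, since the semantics are still being discussed):

SELECT pg_stat_statements_reset(NULL, NULL, NULL); -- resets everything
SELECT pg_stat_statements_reset(10, NULL, NULL);   -- resets everything executed by userid 10
SELECT pg_stat_statements_reset(NULL, 10, 1234);   -- resets query 1234 on database 10 for any user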

> Option - 2:
> 
> select pg_stat_statements_reset(NULL,NULL,NULL);
> -- No change, just return. Function are of type strict.
> 
> select pg_stat_statements_reset(10,10,1234)
> -- Resets the query statement 1234 executed by userid 10 on database id 10
> 
> select pg_stat_statements_reset_by_userid(10)
> -- Resets all the statements executed by userid 10
> 
> select pg_stat_statements_reset_by_dbid(10);
> -- Resets all the statements executed on database id 10.
> 
> select pg_stat_statements_reset_by_userid_and_dbid(10, 10);
> -- Resets all the statements executed by userid 10 on datbase id 10
> 
> Likewise total 7 API's.

I could live with this, but I prefer as described above.

I vote +0.

> Option -3:
> 
> select pg_stat_statements_reset(NULL,NULL,NULL);
> --Similar like option-1, but when the userid or databaseid are NULL, it uses
> the current_role and current_database as the options. But for the queryid
> when the options is NULL, it deletes all the queries.
> 
> Rest of the details are same as option-1.

I don't like this one at all.  Something like resetting stats doesn't need
shortcuts, and the DBA can easily specify current_user and current_catalog.

-1 from me.

> Option -4:
> 
> select pg_stat_statements_reset(NULL,NULL,NULL);
> -- No change, just return. Because the function is strict.
> 
> select pg_stat_statements_reset(0,0,0);
> -- Resets all the statements
> 
> select pg_stat_statements_reset(0,10,0);
> -- Resets all the statements executed on database id 10
> 
> select pg_stat_statements_reset(10,0,0);
> -- Resets all the statements executed on userid 10.

This one is my very least favorite.

-1 from me.
-- 
Vik Fearing  +33 6 46 75 15 36
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: Doc patch on psql output formats

2018-11-26 Thread Daniel Verite
Tom Lane wrote:

> As of HEAD, it's impossible to select latex output format
> at all:
> 
> regression=# \pset format latex
> \pset: ambiguous abbreviation "latex" matches both "latex" and
> "latex-longtable"

Oops!

> We could fix that by adding a special case to accept an exact match
> immediately.

Personally I would favor that one, but to me the problem is that
allowing abbreviations doesn't really work well in the long run,
that is, if new format names are going to keep appearing in the
future.

In interactive mode, tab-completion seems good enough (maybe \pset was
not tab-completed in the past, when abbreviations were initially
implemented?).

In non-interactive mode, anything abbreviated may clash in the future
with another format, so there's a forward-compatibility hazard that
should motivate avoiding them in scripts as well.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: New function pg_stat_statements_reset_query() to reset statistics of a specific query

2018-11-26 Thread Amit Kapila
On Fri, Nov 23, 2018 at 4:40 AM Haribabu Kommi  wrote:
>
>
> On Fri, Nov 23, 2018 at 1:54 AM Peter Eisentraut 
>  wrote:
>>
>> On 19/11/2018 06:18, Haribabu Kommi wrote:
>> > Amit suggested another option in another mail, so total viable
>> > solutions that are discussed as of now are,
>> >
>> > 1. Single API with NULL input treat as invalid value
>> > 2. Multiple API to deal with NULL input of other values
>> > 3. Single API with NULL value to treat them as current user, current
>> > database
>> >  and NULL queryid.
>> > 4. Single API with -1 as invalid value, treat NULL as no matching. (Only
>> > problem
>> >  with this approach is till now -1 is also a valid queryid, but setting
>> > -1 as queryid
>> > needs to be avoided.
>>
>> Can you show examples of what these would look like?
>
>
> Following are some of the examples how the various options
> work.
>

Do the examples provided by Haribabu help in understanding the proposal?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: New function pg_stat_statements_reset_query() to reset statistics of a specific query

2018-11-26 Thread Amit Kapila
On Thu, Nov 22, 2018 at 7:51 PM Amit Kapila  wrote:
>
> On Thu, Nov 22, 2018 at 7:48 PM Amit Kapila  wrote:
> >
> > On Thu, Nov 22, 2018 at 7:02 PM Alvaro Herrera  
> > wrote:
> > >
> > > On 2018-Nov-20, Haribabu Kommi wrote:
> > >
> > > > > > 4. Single API with -1 as invalid value, treat NULL as no matching. 
> > > > > > (Only
> > > > > problem
> > > > > >  with this approach is till now -1 is also a valid queryid, but 
> > > > > > setting
> > > > > -1 as queryid
> > > > > > needs to be avoided.
> > > > >
> > > > > Hmm, can we use 0 as default value without any such caveat?
> > > >
> > > > Yes, with strict and 0 as default value can work.
> > > > If there is no problem, I can go ahead with the above changes?
> > >
> > > I'm not sure I understand this proposal.  Does this mean that passing -1
> > > as databaseid / userid would match all databases/users, and passing 0 as
> > > queryid would match all queries?
> > >
> >
> > No, for userid/databaseid also it will be 0 (aka InvalidOid).
> >
>
> Not sure if the above statement is clear,  but what I wanted to say
> was "for all the three userid/databaseid/queryid, default will be 0
> (aka InvalidOid)."
>

Is the proposal clear?  If so, can you please share your opinion, even
if it is the same as before, because I think that will help us move
forward with this patch.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WIP: Avoid creation of the free space map for small tables

2018-11-26 Thread Amit Kapila
On Mon, Nov 26, 2018 at 3:46 PM John Naylor  wrote:
>
> On 11/24/18, Amit Kapila  wrote:
> > On Fri, Nov 23, 2018 at 11:56 AM John Naylor  wrote:
> >> On 11/16/18, Amit Kapila  wrote:
> >>
> >> >
> >> > 3.
> >> >  static int fsm_set_and_search(Relation rel, FSMAddress addr, uint16
> >> > slot,
> >> > -uint8 newValue, uint8 minValue);
> >> > + uint8 newValue, uint8 minValue);
> >> >
> >> > This appears to be a spurious change.
> >>
> >> 2 and 3 are separated into 0001.
> >>
> >
> > Is the point 3 change related to pgindent?  I think even if you want
> > these, then don't prepare other patches on top of this, keep it
> > entirely separate.
>
> There are some places in the codebase that have spaces after tabs for
> no apparent reason (not to line up function parameters or pointer
> declarations). pgindent hasn't been able to fix this. If you wish, you
> are of course free to apply 0001 separately at any time.
>

I am not sure that I am interested in generally changing the parts of
code that are not directly related to this patch, feel free to post
them separately.

> >> Also, we don't quite have a consensus on the threshold value, but I
> >> have set it to 4 pages for v8. If this is still considered too
> >> expensive (and basic tests show it shouldn't be), I suspect it'd be
> >> better to interleave the available block numbers as described a couple
> >> days ago than lower the threshold further.
> >>
> >
> > Can you please repeat the copy test you have done above with
> > fillfactor as 20 and 30?
>
> I did two kinds of tests. The first had a fill-factor of 10 [1], the
> second had the default storage, but I prevented the backend from
> caching the target block [2], to fully exercise the free space code.
> Would you like me to repeat the first one with 20 and 30?
>

Yes.

> And do you
> think it is useful enough to test the copying of 4 blocks and not
> smaller numbers?
>

You can try that, but I am mainly interested in 4 as the threshold, or
maybe higher, to see where we start losing.  I think 4 should be a
reasonable default for this patch; if anyone later wants to extend it, we
might want to provide a table-level knob.

> > Few more comments:
> > ---
> > 1. I think we can add some test(s) to test the new functionality, may
> > be something on the lines of what Robert has originally provided as an
> > example of this behavior [1].
>
> Maybe the SQL script attached to [3] (which I probably based on
> Robert's report) can be cleaned up into a regression test.
>

Yeah, something on those lines works for me.

> > 3.
> > GetPageWithFreeSpace(Relation rel, Size spaceNeeded)
> > {
> > ..
> > + if (target_block == InvalidBlockNumber &&
> > + rel->rd_rel->relkind == RELKIND_RELATION)
> > + {
> > + nblocks = RelationGetNumberOfBlocks(rel);
> > +
> > + if (nblocks > HEAP_FSM_CREATION_THRESHOLD)
> > + {
> > + /*
> > + * If the FSM knows nothing of the rel, try the last page before
> > + * we give up and extend.  This avoids one-tuple-per-page syndrome
> > + * during bootstrapping or in a recently-started system.
> > + */
> > + target_block = nblocks - 1;
> > + }
> > ..
> > }
> >
> > Moving this check inside GetPageWithFreeSpace has one disadvantage, we
> > will always consider last block which can have some inadvertent
> > effects.  Consider when this function gets called from
> > RelationGetBufferForTuple just before extension, it can consider to
> > check for the last block even though that is already being done in the
> beginning when GetPageWithFreeSpace was called.  I am not completely
> > sure how much this is a case to worry because  it will help to check
> > last block when the same is concurrently added and FSM is not updated
> > for same.  I am slightly worried because the unpatched code doesn't
> > care for such case and we have no intention to change this behaviour.
> > What do you think?
>
> I see what you mean. If the other backend extended by 1 block, the
> intention is to keep it out of the FSM at first, and by extension, not
> visible in other ways. The comment implies that's debatable, but I
> agree we shouldn't change that without a reason to. One simple idea is
> add a 3rd boolean parameter to GetPageWithFreeSpace() to control
> whether it gives up if the FSM fork doesn't indicate free space, like
>
> if (target_block == InvalidBlockNumber &&
> rel->rd_rel->relkind == RELKIND_RELATION &&
> !check_fsm_only)
> {
> nblocks = RelationGetNumberOfBlocks(rel);
>

I have the exact fix in my mind, so let's do it that way.

> > 4. You have mentioned above that "system catalogs created during
> > bootstrap still have a FSM if they have any data." and I can also see
> > this behavior, have you investigated this point further?
>
> Code reading didn't uncover the cause. I might have to step through
> with a debugger or something similar. I should find time for that next
> month.
>

Okay.

> > 5. Your logic to update FSM 

Re: csv format for psql

2018-11-26 Thread Daniel Verite
Tom Lane wrote:

> And, in fact, right now *none* of psql's table output formats is both
> unambiguous and reasonably simple/popular to use.  So the astonishing
> thing about this patch, IMO, is that we didn't do it a decade ago.

Yeah, that's what motivated this submission in the first place.

> I feel that if we allow multi-byte characters here, we might as well
> take the training wheels off and just say you can use any separator
> string you want, as long as it doesn't contain double quote, \r, or \n.

The reason behind disallowing multiple characters was the
likeliness of them being mistakes. For instance, this example came
up at some point in the discussion:
  \pset fieldsep_csv ,,
To me the probability that a user has fat-fingered this is pretty high,
and this would silently produce a really bogus file.

Another kind of mistake comes from the difficulty of properly
quoting on the command line:
psql --csv -P fieldsep_csv='\t'
would be interpreted as a two-character separator despite being
obviously not the user's intention.

About disallowing characters beyond US-ASCII, I can't find a similar
justification. COPY does not allow them, but it's justified (in the
archives) by the fear of being slower when importing, which is not a
concern here.

> We could avoid this self-inflicted confusion by choosing a different
> parameter name.  I'd be good with "csv_fieldsep" or "csvfieldsep".

+1

> Or we could kill both issues by hard-wiring the separator as ','.

Ideally people would understand that they can use -A for any delimiter
but no quoting, or --csv with strict quoting and in that case a fixed
delimiter is fine, since it's going to be safely quoted, its presence in
the data is not a problem. But I'm not too confident that everyone
would understand it that way, even if it's well explained in the doc.

When one is told "please produce CSV files with semi-colons as
separators", it's simpler to just produce that rather than arguing
that these requirements are probably ill-advised.
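
In other words, the goal is that something like

psql --csv -P csv_fieldsep=';' -c 'SELECT * FROM t' > report.csv

(with whichever spelling of the option we finally settle on) just works, with
the separator safely quoted wherever it appears in the data.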


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



RE: Shared Memory: How to use SYSV rather than MMAP ?

2018-11-26 Thread REIX, Tony
Hi Thomas,


About reliability, I've compiled/tested with GCC/XLC on 2 machines in order
to check that my patches are OK (no impact on the PostgreSQL tests, OK both
with GCC & XLC).


We do not yet have a performance comparison between GCC & XLC since, though
we experimented with both, we moved from v11beta1 to beta4 to 11.0 and now
to 11.1.  We'll do that ASAP.


About performance, we have deeply compared MMAP (4KB) vs SysV (64KB) shared
memory, for the dynamic and main shared memory segments, with the SAME exact
HW + SW environment, using XLC -O2 + tune=pwr9.


We have not yet experimented with Large Pages (16MB); however, the flags
added to the 3rd parameter of shmget() are said to have no impact on
performance unless Large Pages are really used.

Same with Huge Pages (16GB). We'll study this later.


So, the +37% (maximum value seen; +29% on average) improvement is the result
of the single change: MMAP 4K to SysV 64K.

(This improvement is due to 2 things: mmap on AIX has performance drawbacks
vs SysV shared memory, and 64K vs 4K.)

That's for 64-bit only, on AIX 7.2 only. For 32-bit, we have not done any
measurements.


We'll have to discuss your last paragraph in more depth: how to handle this
in the PostgreSQL code, and not only for AIX.


Regards,


Cordialement,

Tony Reix

tony.r...@atos.net

ATOS / Bull SAS
ATOS Expert
IBM Coop Architect & Technical Leader
Office : +33 (0) 4 76 29 72 67
1 rue de Provence - 38432 Échirolles - France
www.atos.net

De : Thomas Munro 
Envoyé : vendredi 23 novembre 2018 22:07:23
À : REIX, Tony
Cc : Andres Freund; Robert Haas; Pg Hackers; EMPEREUR-MOT, SYLVIE; BERGAMINI, 
DAMIEN
Objet : Re: Shared Memory: How to use SYSV rather than MMAP ?

On Sat, Nov 24, 2018 at 4:54 AM REIX, Tony  wrote:
> Here is a patch for enabling SystemV Shared Memory on AIX, for 64K or bigger 
> page size, rather than using MMAP shared memory, which is slower on AIX.

> We have tested this code with 64K pages and pgbench, on AIX 7.2 TL2 Power 9, 
> and it provided a maximum of +37% improvement.

You also mentioned changing from XLC to GCC.  Did you test the various
changes in isolation?  XLC->GCC, mmap->shmget, with/without
SHM_LGPAGE.  37% is a bigger performance change than I expected from
large pages, since reports from other architectures are single-digit
percentage increases with pgbench -S.

If just changing to GCC gives you a big speed-up, it could of course
just be different/better code generation (though that'd be a bit sad
for XLC), but I also wonder if the different atomics support in our
tree could be implicated.

> We'll test this code with Large Pages (SHM_LGPAGE | SHM_PIN | S_IRUSR | 
> S_IWUSR flags of shmget() ) ASAP.
>
>
> However, I wanted first to get your comments about this change in order to 
> improve it for acceptance.

I think we should respect the huge_pages GUC, as we do on Linux and
Windows (since there are downsides to using large pages, maybe not
everyone would want that).  It could even be useful to allow different
page sizes to be requested by GUC (I see that DB2 has an option to use
16GB pages -- yikes).  It also seems like a good idea to have a
shared_memory_type GUC as Andres proposed (see his link), instead of
using a compile time option.  I guess it was made a compile time
option because nobody could imagine wanting to go back to SysV shm!
(I'm still kinda surprised that MAP_ANONYMOUS memory can't be coaxed
into large pages by environment variables or loader controls, since
apparently other things like data segments etc. can, though
I can't find any text that says that's the case and I have no AIX
system).
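
For illustration, the configuration-driven approach would look something like
this (shared_memory_type is the proposed name, not an existing setting):

shared_memory_type = sysv    # proposed GUC, per Andres' suggestion
huge_pages = try             # existing GUC, currently honored on Linux/Windows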

--
Thomas Munro
http://www.enterprisedb.com


Re: WIP: Avoid creation of the free space map for small tables

2018-11-26 Thread John Naylor
On 11/24/18, Amit Kapila  wrote:
> On Fri, Nov 23, 2018 at 11:56 AM John Naylor  wrote:
>> On 11/16/18, Amit Kapila  wrote:
>>
>> >
>> > 3.
>> >  static int fsm_set_and_search(Relation rel, FSMAddress addr, uint16
>> > slot,
>> > -uint8 newValue, uint8 minValue);
>> > + uint8 newValue, uint8 minValue);
>> >
>> > This appears to be a spurious change.
>>
>> 2 and 3 are separated into 0001.
>>
>
> Is the point 3 change related to pgindent?  I think even if you want
> these, then don't prepare other patches on top of this, keep it
> entirely separate.

There are some places in the codebase that have spaces after tabs for
no apparent reason (not to line up function parameters or pointer
declarations). pgindent hasn't been able to fix this. If you wish, you
are of course free to apply 0001 separately at any time.

>> Also, we don't quite have a consensus on the threshold value, but I
>> have set it to 4 pages for v8. If this is still considered too
>> expensive (and basic tests show it shouldn't be), I suspect it'd be
>> better to interleave the available block numbers as described a couple
>> days ago than lower the threshold further.
>>
>
> Can you please repeat the copy test you have done above with
> fillfactor as 20 and 30?

I did two kinds of tests. The first had a fill-factor of 10 [1], the
second had the default storage, but I prevented the backend from
caching the target block [2], to fully exercise the free space code.
Would you like me to repeat the first one with 20 and 30? And do you
think it is useful enough to test the copying of 4 blocks and not
smaller numbers?

> Few more comments:
> ---
> 1. I think we can add some test(s) to test the new functionality, may
> be something on the lines of what Robert has originally provided as an
> example of this behavior [1].

Maybe the SQL script attached to [3] (which I probably based on
Robert's report) can be cleaned up into a regression test.

> 3.
> GetPageWithFreeSpace(Relation rel, Size spaceNeeded)
> {
> ..
> + if (target_block == InvalidBlockNumber &&
> + rel->rd_rel->relkind == RELKIND_RELATION)
> + {
> + nblocks = RelationGetNumberOfBlocks(rel);
> +
> + if (nblocks > HEAP_FSM_CREATION_THRESHOLD)
> + {
> + /*
> + * If the FSM knows nothing of the rel, try the last page before
> + * we give up and extend.  This avoids one-tuple-per-page syndrome
> + * during bootstrapping or in a recently-started system.
> + */
> + target_block = nblocks - 1;
> + }
> ..
> }
>
> Moving this check inside GetPageWithFreeSpace has one disadvantage, we
> will always consider last block which can have some inadvertent
> effects.  Consider when this function gets called from
> RelationGetBufferForTuple just before extension, it can consider to
> check for the last block even though that is already being done in the
> beginning when GetPageWithFreeSpace was called.  I am not completely
> sure how much this is a case to worry because  it will help to check
> last block when the same is concurrently added and FSM is not updated
> for same.  I am slightly worried because the unpatched code doesn't
> care for such case and we have no intention to change this behaviour.
> What do you think?

I see what you mean. If the other backend extended by 1 block, the
intention is to keep it out of the FSM at first, and by extension, not
visible in other ways. The comment implies that's debatable, but I
agree we shouldn't change that without a reason to. One simple idea is
add a 3rd boolean parameter to GetPageWithFreeSpace() to control
whether it gives up if the FSM fork doesn't indicate free space, like

if (target_block == InvalidBlockNumber &&
rel->rd_rel->relkind == RELKIND_RELATION &&
!check_fsm_only)
{
nblocks = RelationGetNumberOfBlocks(rel);

> 4. You have mentioned above that "system catalogs created during
> bootstrap still have a FSM if they have any data." and I can also see
> this behavior, have you investigated this point further?

Code reading didn't uncover the cause. I might have to step through
with a debugger or something similar. I should find time for that next
month.

> 5. Your logic to update FSM on standby seems okay, but can you show
> some tests which proves its sanity?

I believe to convince myself it was working, I used the individual
commands in the sql file in [3], then used the size function on the
secondary. I'll redo that to verify.
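
(For reference, one way to check is to compare the size of the FSM fork on
the primary and the standby, e.g. for a hypothetical table t:

SELECT pg_relation_size('t', 'fsm');

which should stay at zero on both nodes while the heap is below the
threshold.)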

--
[1] 
https://www.postgresql.org/message-id/CAJVSVGWCRMyi8sSqguf6PfFcpM3hwNY5YhPZTt-8Q3ZGv0UGYw%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/CAJVSVGWMXzsqYpPhO3Snz4n5y8Tq-QiviuSCKyB5czCTnq9rzA%40mail.gmail.com
[3] 
https://www.postgresql.org/message-id/CAJVSVGWvB13PzpbLEecFuGFc5V2fsO736BsdTakPiPAcdMM5tQ%40mail.gmail.com


-John Naylor



Re: Continue work on changes to recovery.conf API

2018-11-26 Thread Michael Paquier
On Sun, Nov 25, 2018 at 04:56:13PM +0100, Peter Eisentraut wrote:
> Committed with that addition.

Thanks for adding that!
--
Michael




Re: allow online change primary_conninfo

2018-11-26 Thread Sergei Kornilov
Hi

>>  I think we have no further reason to have a copy of the conninfo and slot name
>> in WalRcvData, so I propose removing these fields from
>> pg_stat_get_wal_receiver() (and the pg_stat_wal_receiver view). This data can
>> be accessible now via regular GUC commands.
>
> Is that wise? It seems possible that wal receivers run for a while with
> the old connection information, and without this information that seems
> hard to debug.
Hmm... I assumed SIGHUP processing was in a fast loop and therefore shutdown
would be fast. But I rechecked the code and found a possible long loop without
SIGHUP processing (in case we receive new data faster than we can write it to
disk). OK, I will revert that.
How about writing only the clobbered conninfo to WalRcvData?

> Should probably note something like
>
> "This parameter can only be set in the postgresql.conf
> file or on the server command line."
It seems other PGC_SIGHUP settings have such a note; fixed, thank you.

> I'd strongly advocate moving this to a separate function.
Done

regards, Sergei

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index db1a2d4..c3f19d8 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3797,7 +3797,8 @@ ANY num_sync ( walRcvState == WALRCV_STOPPED ||
 		   walrcv->walRcvState == WALRCV_WAITING);
 
-	if (conninfo != NULL)
-		strlcpy((char *) walrcv->conninfo, conninfo, MAXCONNINFO);
-	else
-		walrcv->conninfo[0] = '\0';
-
-	if (slotname != NULL)
-		strlcpy((char *) walrcv->slotname, slotname, NAMEDATALEN);
-	else
-		walrcv->slotname[0] = '\0';
-
 	if (walrcv->walRcvState == WALRCV_STOPPED)
 	{
 		launch = true;
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 6497393..301a4b5 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3435,7 +3435,7 @@ static struct config_string ConfigureNamesString[] =
 	},
 
 	{
-		{"primary_conninfo", PGC_POSTMASTER, REPLICATION_STANDBY,
+		{"primary_conninfo", PGC_SIGHUP, REPLICATION_STANDBY,
 			gettext_noop("Sets the connection string to be used to connect to the sending server."),
 			NULL,
 			GUC_SUPERUSER_ONLY
@@ -3446,7 +3446,7 @@ static struct config_string ConfigureNamesString[] =
 	},
 
 	{
-		{"primary_slot_name", PGC_POSTMASTER, REPLICATION_STANDBY,
+		{"primary_slot_name", PGC_SIGHUP, REPLICATION_STANDBY,
 			gettext_noop("Sets the name of the replication slot to use on the sending server."),
 			NULL
 		},
diff --git a/src/include/replication/walreceiver.h b/src/include/replication/walreceiver.h
index 5913b58..510c7f6 100644
--- a/src/include/replication/walreceiver.h
+++ b/src/include/replication/walreceiver.h
@@ -104,7 +104,7 @@ typedef struct
 	TimestampTz latestWalEndTime;
 
 	/*
-	 * connection string; initially set to connect to the primary, and later
+	 * connection string used by runned walreceiver;
 	 * clobbered to hide security-sensitive fields.
 	 */
 	char		conninfo[MAXCONNINFO];
@@ -117,8 +117,7 @@ typedef struct
 	int			sender_port;
 
 	/*
-	 * replication slot name; is also used for walreceiver to connect with the
-	 * primary
+	 * replication slot name used by runned walreceiver
 	 */
 	char		slotname[NAMEDATALEN];
 
@@ -306,8 +305,7 @@ extern void WalRcvShmemInit(void);
 extern void ShutdownWalRcv(void);
 extern bool WalRcvStreaming(void);
 extern bool WalRcvRunning(void);
-extern void RequestXLogStreaming(TimeLineID tli, XLogRecPtr recptr,
-	 const char *conninfo, const char *slotname);
+extern void RequestXLogStreaming(TimeLineID tli, XLogRecPtr recptr);
 extern XLogRecPtr GetWalRcvWriteRecPtr(XLogRecPtr *latestChunkStart, TimeLineID *receiveTLI);
 extern int	GetReplicationApplyDelay(void);
 extern int	GetReplicationTransferLatency(void);
diff --git a/src/test/recovery/t/001_stream_rep.pl b/src/test/recovery/t/001_stream_rep.pl
index beb4555..5b9bc77 100644
--- a/src/test/recovery/t/001_stream_rep.pl
+++ b/src/test/recovery/t/001_stream_rep.pl
@@ -3,7 +3,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 26;
+use Test::More tests => 27;
 
 # Initialize master node
 my $node_master = get_new_node('master');
@@ -146,7 +146,7 @@ $node_standby_2->append_conf('postgresql.conf',
 	"primary_slot_name = $slotname_2");
 $node_standby_2->append_conf('postgresql.conf',
 	"wal_receiver_status_interval = 1");
-$node_standby_2->restart;
+$node_standby_2->reload; # should have effect without restart
 
 # Fetch xmin columns from slot's pg_replication_slots row, after waiting for
 # given boolean condition to be true to ensure we've reached a quiescent state
@@ -282,3 +282,21 @@ is($catalog_xmin, '',
 is($xmin, '', 'xmin of cascaded slot null with hs feedback reset');
 is($catalog_xmin, '',
 	'catalog xmin of cascaded slot still null with hs_feedback reset');
+
+note "check change primary_conninfo without restart";
+$node_standby_2->append_conf('postgresql.conf',
+	"primary_slot_name = ''");
+$node_standby_2->enable_streaming($node_master);
+$node_standby_2->reload;
+
+# be 

Re: Updated backup APIs for non-exclusive backups

2018-11-26 Thread Magnus Hagander
On Mon, Nov 26, 2018 at 6:44 AM Laurenz Albe 
wrote:

> On Sun, 2018-11-25 at 22:01 +0100, Magnus Hagander wrote:
> [about managing backups from pre- and post-file-system-backup scripts]
> > I agree with your point that it's not an uncommon thing to need. If a
> > good solution for it can be proposed that requires the exclusive backup
> > interface, then I wouldn't be against un-deprecating that.  But that's
> > going to require a lot more than just a documentation change, IMHO. What
> > could perhaps be handled with a documentation change, however, is to
> > document a good way for this type of setup to use the new interfaces.
>
> Good point, and it puts the ball in my court :^)
>

Enjoy :)



> > > I'm arguing on behalf of users that run a few databases, want a simple
> > > backup solution and are ready to deal with the shortcomings.
> >
> > Those that want a simple backup solution have one -- pg_basebackup.
> >
> > The exclusive backup API is *not* simple. It is convenient, but it is
> > not simple.
> >
> > Actually having a simple API that worked with the pre/post backup
> > scripts, that would be an improvement that we should definitely want!
>
> Right; unfortunately it is not simple to come up with one that doesn't
> exhibit the problems of the existing exclusive backup.
>

Right, it turns out to actually be a hard problem. The old API pretended it
wasn't, which wasn't really very helpful in the long run...


> Perhaps it's theoretically impossible.  The goal is to disambiguate what a
> file
> system backup sees in backup mode and what the startup process sees after
> a crash
> in backup mode, and I can't see how that could be done.
>

Not if it's in the same physical location, no, I think that's really hard.
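
(For readers following along: a minimal sketch, not from this thread, of
what the non-exclusive sequence looks like from a single client session.
The script path and label are hypothetical; the point it illustrates is
that the session which starts the backup must stay connected until
pg_stop_backup(false) returns, which is exactly what makes a plain
pre-script/post-script split awkward.)

#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

int
main(void)
{
	PGconn	   *conn = PQconnectdb("dbname=postgres");
	PGresult   *res;

	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
		return 1;
	}

	/* third argument false => non-exclusive mode; session must stay open */
	res = PQexec(conn, "SELECT pg_start_backup('fs-snapshot', true, false)");
	if (PQresultStatus(res) != PGRES_TUPLES_OK)
	{
		fprintf(stderr, "pg_start_backup failed: %s", PQerrorMessage(conn));
		PQclear(res);
		PQfinish(conn);
		return 1;
	}
	PQclear(res);

	/* run the file-system copy while this session remains connected */
	if (system("/usr/local/bin/run-fs-backup.sh") != 0)	/* hypothetical script */
		fprintf(stderr, "file-system backup failed\n");

	/* the returned labelfile must be stored as backup_label in the backup */
	res = PQexec(conn, "SELECT labelfile FROM pg_stop_backup(false)");
	if (PQresultStatus(res) == PGRES_TUPLES_OK)
		printf("%s", PQgetvalue(res, 0, 0));
	PQclear(res);
	PQfinish(conn);
	return 0;
}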

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: csv format for psql

2018-11-26 Thread Magnus Hagander
On Mon, Nov 26, 2018 at 5:53 AM Pavel Stehule 
wrote:

>
>
> po 26. 11. 2018 v 5:36 odesílatel Andrew Gierth <
> and...@tao11.riddles.org.uk> napsal:
>
>> > "Tom" == Tom Lane  writes:
>>
>>  Tom> Or we could kill both issues by hard-wiring the separator as ','.
>>
>> Using ';' for the delimiter isn't all that rare.
>>
>
> ; is the default for CSV produced by the Czech version of MS Excel - so it
> is common in some countries, like the Czech Republic.
>
>
I believe this is a common occurrence in countries that use a decimal comma
instead of a decimal point, which is a fairly large portion of the world.

/Magnus


Re: Undo logs

2018-11-26 Thread Amit Kapila
On Tue, Nov 20, 2018 at 7:37 PM Dilip Kumar  wrote:
> Along with that I have merged latest changes in zheap branch committed
> by Rafia Sabih for cleaning up the undo buffer information in abort
> path.
>

Thanks, a few more comments:

1.
@@ -2627,6 +2653,7 @@ AbortTransaction(void)
  AtEOXact_HashTables(false);
  AtEOXact_PgStat(false);
  AtEOXact_ApplyLauncher(false);
+ AtAbort_ResetUndoBuffers();

Don't we need similar handling in AbortSubTransaction?

2.
 Read undo record header in by calling UnpackUndoRecord, if the undo
+ * record header is splited across buffers then we need to read the complete
+ * header by invoking UnpackUndoRecord multiple times.

/splited/splitted.  You can just use split here.

3.
+/*
+ * Identifying information for a transaction to which this undo belongs.
+ * it will also store the total size of the undo for this transaction.
+ */
+typedef struct UndoRecordTransaction
+{
+ uint32 urec_progress;  /* undo applying progress. */
+ uint32 urec_xidepoch;  /* epoch of the current transaction */
+ Oid urec_dbid; /* database id */
+ uint64 urec_next; /* urec pointer of the next transaction */
+} UndoRecordTransaction;

/it will/It will.
BTW, which field(s) in the above structure stores the size of the undo?

4.
+ /*
+ * Set uur_info for an UnpackedUndoRecord appropriately based on which
+ * fields are set and calculate the size of the undo record based on the
+ * uur_info.
+ */

Can we rephrase it as "calculate the size of the undo record based on
the info required"?

5.
+/*
+ * Unlock and release undo buffers.  This step performed after exiting any
+ * critical section.
+ */
+void
+UnlockReleaseUndoBuffers(void)

Can we change the later sentence as "This step is performed after
exiting any critical section where we have performed undo action."?

6.
+InsertUndoRecord()
{
..
+ Assert (uur->uur_info != 0);

Add a comment above the Assert: "The undo record must contain valid information."

6.
+UndoRecordAllocateMulti(UnpackedUndoRecord *undorecords, int nrecords,
+ UndoPersistence upersistence, TransactionId txid)
{
..
+ first_rec_in_recovery = InRecovery && IsTransactionFirstRec(txid);
+
+ if ((!InRecovery && prev_txid[upersistence] != txid) ||
+ first_rec_in_recovery)
+ {
+ need_start_undo = true;
+ }

Here, I think we can avoid using two boolean variables
(first_rec_in_recovery and need_start_undo).  Also, this same check is
used in this function twice.  I have tried to simplify it in the
attached.  Can you check and let me know if that sounds okay to you?
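
Just to illustrate (sketch only; the attached delta may do it differently,
and this assumes the same surrounding context as the quoted code):

	need_start_undo = (InRecovery && IsTransactionFirstRec(txid)) ||
		(!InRecovery && prev_txid[upersistence] != txid);

i.e. the first_rec_in_recovery flag folds into a single expression that can
be evaluated once and reused where the check currently appears twice.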

7.
UndoRecordAllocateMulti
{
..
/*
+ * If this is the first undo record of the transaction then initialize
+ * the transaction header fields of the undorecord. Also, set the flag
+ * in the uur_info to indicate that this record contains the transaction
+ * header so allocate the space for the same.
+ */
+ if (need_start_undo && i == 0)
+ {
+ urec->uur_next = InvalidUndoRecPtr;
+ urec->uur_xidepoch = GetEpochForXid(txid);
+ urec->uur_progress = 0;
+
+ /* During recovery, Fetch database id from the undo log state. */
+ if (InRecovery)
+ urec->uur_dbid = UndoLogStateGetDatabaseId();
+ else
+ urec->uur_dbid = MyDatabaseId;
+
+ /* Set uur_info to include the transaction header. */
+ urec->uur_info |= UREC_INFO_TRANSACTION;
+ }
..
}

It seems here you have written the code in your comments.  I have
changed it in the attached delta patch.

8.
UndoSetPrepareSize(int max_prepare, UnpackedUndoRecord *undorecords,
+TransactionId xid, UndoPersistence upersistence)
+{
..
..
+ multi_prep_urp = UndoRecordAllocateMulti(undorecords, max_prepare,

Can we rename this variable as prepared_urec_ptr or prepared_urp?

9.
+void
+UndoSetPrepareSize(int max_prepare,

I think it will be better to use nrecords instead of 'max_prepare'
similar to how you have it in UndoRecordAllocateMulti()

10.
+ if (!UndoRecPtrIsValid(multi_prep_urp))
+ urecptr = UndoRecordAllocateMulti(urec, 1, upersistence, txid);
+ else
+ urecptr = multi_prep_urp;
+
+ size = UndoRecordExpectedSize(urec);
..
..
+ if (UndoRecPtrIsValid(multi_prep_urp))
+ {
+ UndoLogOffset insert = UndoRecPtrGetOffset(multi_prep_urp);
+ insert = UndoLogOffsetPlusUsableBytes(insert, size);
+ multi_prep_urp = MakeUndoRecPtr(UndoRecPtrGetLogNo(urecptr), insert);
+ }

Can't we use urecptr instead of multi_prep_urp in above code after
urecptr is initialized?

11.
+static int max_prepare_undo = MAX_PREPARED_UNDO;

Let's change the name of this variable to max_prepared_undo.  This is
already changed in the attached delta patch.

12.
PrepareUndoInsert()
{
..
+ /* Already reached maximum prepared limit. */
+ if (prepare_idx == max_prepare_undo)
+ return InvalidUndoRecPtr;
..
}

I think in the above condition we should have an elog; otherwise, callers
need to be prepared to handle it.
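
For instance, something like this (sketch only; the exact message wording
is just an example):

	/* Already reached maximum prepared limit. */
	if (prepare_idx == max_prepare_undo)
		elog(ERROR, "already reached the maximum prepared undo limit");

so that a failure here cannot be silently ignored by callers.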

13.
UndoRecordAllocateMulti()

How about naming it as UndoRecordAllocate as this is used to allocate
even a single undo record?

14.
If not already done, can you pgindent the new code added by this patch?

Attached is a delta patch on top of your 

Re: [HACKERS] Block level parallel vacuum

2018-11-26 Thread Masahiko Sawada
On Sun, Nov 25, 2018 at 2:35 PM Amit Kapila  wrote:
>
> On Sat, Nov 24, 2018 at 5:47 PM Amit Kapila  wrote:
> > On Tue, Oct 30, 2018 at 2:04 PM Masahiko Sawada  
> > wrote:
> > >
> >

Thank you for the comment.

> > I could see that you have put a lot of effort on this patch and still
> > we are not able to make much progress mainly I guess because of
> > relation extension lock problem.  I think we can park that problem for
> > some time (as already we have invested quite some time on it), discuss
> > a bit about actual parallel vacuum patch and then come back to it.
> >
>
> Today, I was reading this and previous related thread [1] and it seems
> to me multiple people Andres [2], Simon [3] have pointed out that
> parallelization for index portion is more valuable.  Also, some of the
> results [4] indicate the same.  Now, when there are no indexes,
> parallelizing heap scans also have benefit, but I think in practice we
> will see more cases where the user wants to vacuum tables with
> indexes.  So how about if we break this problem in the following way
> where each piece give the benefit of its own:
> (a) Parallelize index scans wherein the workers will be launched only
> to vacuum indexes.  Only one worker per index will be spawned.
> (b) Parallelize per-index vacuum.  Each index can be vacuumed by
> multiple workers.
> (c) Parallelize heap scans where multiple workers will scan the heap,
> collect dead TIDs and then launch multiple workers for indexes.
>
> I think if we break this problem into multiple patches, it will reduce
> the scope of each patch and help us in making progress.   Now, it's
> been more than 2 years that we are trying to solve this problem, but
> still didn't make much progress.  I understand there are various
> genuine reasons and all of that work will help us in solving all the
> problems in this area.  How about if we first target problem (a) and
> once we are done with that we can see which of (b) or (c) we want to
> do first?

Thank you for the suggestion. It seems good to me. We would get nice
performance scalability even with only (a), and vacuum will become more
powerful with (b) or (c). Also, (a) would not require resolving the
relation extension lock issue, IIUC. I'll change the patch and submit it
to the next CF.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: inconsistency and inefficiency in setup_conversion()

2018-11-26 Thread John Naylor
v7 is a rebase over recent catalog changes.

-John Naylor
From 52aff43dcc91733c1b941fed4582d9c0602004d0 Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Sat, 13 Oct 2018 19:28:08 +0700
Subject: [PATCH v7 1/2] Add pg_language lookup.

This didn't seem worth doing before, but an upcoming commit will add
88 entries to pg_proc whose language is 'c'. In addition, we can
now use 'internal' in Gen_fmgrtab.pl instead of looking up the OID.
This simplifies that script and its Makefile a bit.
---
 doc/src/sgml/bki.sgml|  6 +--
 src/backend/catalog/genbki.pl|  8 +++
 src/backend/utils/Gen_fmgrtab.pl |  9 ++--
 src/backend/utils/Makefile   |  8 +--
 src/include/catalog/pg_proc.dat  | 92 
 src/include/catalog/pg_proc.h|  2 +-
 src/tools/msvc/Solution.pm   |  4 +-
 7 files changed, 65 insertions(+), 64 deletions(-)

diff --git a/doc/src/sgml/bki.sgml b/doc/src/sgml/bki.sgml
index 786abb95d4..37bb7a2346 100644
--- a/doc/src/sgml/bki.sgml
+++ b/doc/src/sgml/bki.sgml
@@ -409,8 +409,8 @@
 that's error-prone and hard to understand, so for frequently-referenced
 catalogs, genbki.pl provides mechanisms to write
 symbolic references instead.  Currently this is possible for references
-to access methods, functions, operators, opclasses, opfamilies, and
-types.  The rules are as follows:
+to access methods, functions, languages, operators, opclasses, opfamilies,
+and types.  The rules are as follows:

 

@@ -420,7 +420,7 @@
   Use of symbolic references is enabled in a particular catalog column
   by attaching BKI_LOOKUP(lookuprule)
   to the column's definition, where lookuprule
-  is pg_am, pg_proc,
+  is pg_am, pg_proc, pg_language,
   pg_operator, pg_opclass,
   pg_opfamily, or pg_type.
   BKI_LOOKUP can be attached to columns of
diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index edc8ea9f53..3410816882 100644
--- a/src/backend/catalog/genbki.pl
+++ b/src/backend/catalog/genbki.pl
@@ -175,6 +175,13 @@ foreach my $row (@{ $catalog_data{pg_am} })
 	$amoids{ $row->{amname} } = $row->{oid};
 }
 
+# language OID lookup
+my %langoids;
+foreach my $row (@{ $catalog_data{pg_language} })
+{
+	$langoids{ $row->{lanname} } = $row->{oid};
+}
+
 # opclass OID lookup
 my %opcoids;
 foreach my $row (@{ $catalog_data{pg_opclass} })
@@ -249,6 +256,7 @@ foreach my $row (@{ $catalog_data{pg_type} })
 # Map catalog name to OID lookup.
 my %lookup_kind = (
 	pg_am   => \%amoids,
+	pg_language => \%langoids,
 	pg_opclass  => \%opcoids,
 	pg_operator => \%operoids,
 	pg_opfamily => \%opfoids,
diff --git a/src/backend/utils/Gen_fmgrtab.pl b/src/backend/utils/Gen_fmgrtab.pl
index ca28291355..87d250f318 100644
--- a/src/backend/utils/Gen_fmgrtab.pl
+++ b/src/backend/utils/Gen_fmgrtab.pl
@@ -59,6 +59,8 @@ die "No include path; you must specify -I.\n" if !$include_path;
 # Note: We pass data file names as arguments and then look for matching
 # headers to parse the schema from. This is backwards from genbki.pl,
 # but the Makefile dependencies look more sensible this way.
+# We currently only need pg_proc, but retain the possibility of reading
+# more than one data file.
 my %catalogs;
 my %catalog_data;
 foreach my $datfile (@input_files)
@@ -78,13 +80,10 @@ foreach my $datfile (@input_files)
 	$catalog_data{$catname} = Catalog::ParseData($datfile, $schema, 0);
 }
 
-# Fetch some values for later.
+# Fetch a value for later.
 my $FirstBootstrapObjectId =
   Catalog::FindDefinedSymbol('access/transam.h', $include_path,
 	'FirstBootstrapObjectId');
-my $INTERNALlanguageId =
-  Catalog::FindDefinedSymbolFromData($catalog_data{pg_language},
-	'INTERNALlanguageId');
 
 # Collect certain fields from pg_proc.dat.
 my @fmgr = ();
@@ -94,7 +93,7 @@ foreach my $row (@{ $catalog_data{pg_proc} })
 	my %bki_values = %$row;
 
 	# Select out just the rows for internal-language procedures.
-	next if $bki_values{prolang} ne $INTERNALlanguageId;
+	next if $bki_values{prolang} ne 'internal';
 
 	push @fmgr,
 	  {
diff --git a/src/backend/utils/Makefile b/src/backend/utils/Makefile
index e797539d09..da40f6b6c4 100644
--- a/src/backend/utils/Makefile
+++ b/src/backend/utils/Makefile
@@ -31,15 +31,11 @@ generated-header-symlinks: $(top_builddir)/src/include/utils/header-stamp $(top_
 
 $(SUBDIRS:%=%-recursive): fmgr-stamp errcodes.h
 
-FMGR_DATA := $(addprefix $(top_srcdir)/src/include/catalog/,\
-	pg_language.dat pg_proc.dat \
-	)
-
 # fmgr-stamp records the last time we ran Gen_fmgrtab.pl.  We don't rely on
 # the timestamps of the individual output files, because the Perl script
 # won't update them if they didn't change (to avoid unnecessary recompiles).
-fmgr-stamp: Gen_fmgrtab.pl $(catalogdir)/Catalog.pm $(FMGR_DATA) $(top_srcdir)/src/include/access/transam.h
-	$(PERL) -I $(catalogdir) $< -I $(top_srcdir)/src/include/ $(FMGR_DATA)
+fmgr-stamp: Gen_fmgrtab.pl $(catalogdir)/Catalog.pm 

Re: Add extension options to control TAP and isolation tests

2018-11-26 Thread Michael Paquier
On Mon, Nov 26, 2018 at 10:50:35AM +0300, Nikolay Shaplov wrote:
> I see PGXS mentioned there several times as "a build infrastructure
> for extensions", and as an environment variable it is mentioned only in
> the code sample

As a variable, PGXS stands for the location of the makefile which holds
all the basic rules an extension can use, as reported for example by
pg_config --pgxs when it comes to extensions compiled out of the tree.
--
Michael



