Re: WIP: Avoid creation of the free space map for small tables

2018-12-14 Thread Amit Kapila
On Thu, Dec 13, 2018 at 3:18 AM John Naylor  wrote:
>
> On 11/24/18, Amit Kapila  wrote:
> > 4. You have mentioned above that "system catalogs created during
> > bootstrap still have a FSM if they have any data." and I can also see
> > this behavior, have you investigated this point further?
>
> I found the cause of this. There is some special-case code in md.c to
> create any file if it's opened in bootstrap mode. I removed this and a
> similar special case (attached), and make check still passes. After
> digging through the history, I'm guessing this has been useless code
> since about 2001, when certain special catalogs were removed.
>

Good finding, but I think it is better to discuss this part
separately.  I have started a new thread for this issue [1].

[1] - 
https://www.postgresql.org/message-id/CAA4eK1KsET6sotf%2BrzOTQfb83pzVEzVhbQi1nxGFYVstVWXUGw%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



unnecessary creation of FSM files during bootstrap mode

2018-12-14 Thread Amit Kapila
As pointed out by John Naylor [1], it seems during bootstrap mode, we
are always creating FSM files which are not required.  In commits
b9d01fe288 and 3908473c80, we added some code that allows
the creation of files during mdopen even if they didn't exist during
bootstrap mode.  The comments in the code say: "During bootstrap,
there are cases where a system relation will be accessed (by internal
backend processes) before the bootstrap script nominally creates it."
I am sure that was the case when that code was added, but is it
required today?  While going through commit 3908473c80, I came across
the comment below:

- * During bootstrap processing, we skip that check, because pg_time,
- * pg_variable, and pg_log get created before their .bki file entries
- * are processed.
- */
+ fd = FileNameOpenFile(path, O_RDWR | O_CREAT | O_EXCL | PG_BINARY, 0600);

The system tables mentioned in the above commit are not present today,
so do we really need that code?  And even if it is required, shall we
do it only for the 'main' or 'init' forks?

Tom, as you are the committer of commits b9d01fe288 and 3908473c80,
do you remember anything in this regard?


[1] - 
https://www.postgresql.org/message-id/CAJVSVGVtf%2B-2sQGVyEZJQh15dpVicpFA6BBiPLugaD4oBEaiHg%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: removal of dangling temp tables

2018-12-14 Thread Michael Paquier
On Fri, Dec 14, 2018 at 11:06:32PM -0300, Alvaro Herrera wrote:
> I did propose in my OP the idea of a PGPROC boolean flag that indicates
> whether the temp namespace has been set up.  If not, have autovac remove
> those tables.  I like this option better, but I fear it adds more
> ProcArrayLock contention.  Maybe I can just use a new LWLock to
> coordinate that particular member of the ProcGlobal array ... (but then
> it can no longer be a boolean.)

Isn't that what tempNamespaceId can be used for in PGPROC now?  The flag
would be set only when a backend creates a new temporary schema, so that
the backend can be tracked as the owner of the schema.
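
A rough illustration of how autovacuum could consult that field when it
finds an orphan temp table (a sketch only; the exact use of
tempNamespaceId here is an assumption, not committed code — classForm is
the pg_class row being examined in do_autovacuum()):

    int         backendID = GetTempNamespaceBackendId(classForm->relnamespace);
    PGPROC     *owner = (backendID != InvalidBackendId) ?
        BackendIdGetProc(backendID) : NULL;

    if (owner == NULL || owner->tempNamespaceId != classForm->relnamespace)
    {
        /* owning backend is gone or never set up this schema: safe to drop */
    }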
--
Michael




Re: New function pg_stat_statements_reset_query() to reset statistics of a specific query

2018-12-14 Thread Amit Kapila
On Sat, Dec 15, 2018 at 12:14 AM Robert Haas  wrote:
>
> On Wed, Nov 28, 2018 at 1:43 PM Alvaro Herrera  
> wrote:
> > Right, I think option 4 is a clear improvement over option 1.  I can get
> > behind that one.  Since not many people care to vote, I think this tips
> > the scales enough to that side.
>
> I'm showing up very late to the party here,
>

Not a problem, people like you are always welcome.

> but I like option 1 best.
>

You are not alone in this camp.  So, IIUC, below is the voting by different people:

Option-1: Vik, Sergei, Robert
Option-2: Alvaro, Magnus
Option-3: Michael, Hari
Option-4: Amit, Hari, Magnus, Alvaro, Michael, Peter

Some people's names appear under two options as they are okay with
either of those.  I see that more people are now in favor of
option-1, but still, the tilt is towards option-4.

> I feel like the SQL standard has a pretty clear idea that NULL is how
> you represent a value is unknown, which shows up in a lot of places.
> Deciding that we're going to use a different sentinel value in this
> one case because NULL is a confusing concept in general seems pretty
> strange to me.
>

I agree that NULL seems to be the attractive choice for a default
value, but we also have to consider what it means.  In this case, NULL
will mean 'all', which doesn't match what NULL generally means (aka
unknown; NULL doesn't even compare equal to another NULL).  There are a
few people on this thread who feel that having NULL can lead to misuse
of this API, as explained in [1], and that we would probably need some
workaround for it to be usable in production [2].

[1] - 
https://www.postgresql.org/message-id/CAA4eK1LicqWY55XxmahQXti4RjQ28iuASAk1X8%2ByKX0J051_VQ%40mail.gmail.com
[2] - 
https://www.postgresql.org/message-id/20181117111653.cetidngkgol5e5xn%40alvherre.pgsql

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: removal of dangling temp tables

2018-12-14 Thread Alvaro Herrera
On 2018-Dec-14, Robert Haas wrote:

> On Fri, Dec 14, 2018 at 6:35 PM Tom Lane  wrote:
> > Hm.  It *could*, if we wanted it to run some transactions after
> > finishing recovery.
> 
> It'd have to launch a separate process per database.  That would be
> useful infrastructure for other things, too, like automatic catalog
> upgrades in minor releases, but I'm not volunteering to write that
> infrastructure right now.

This looks like a major project.

> > Alternatively, maybe we could have backends flag whether they've
> > taken ownership of their temp schemas or not, and let autovacuum
> > flush old temp tables if not?
> 
> Yes, that seems like a possibly promising approach.

I did propose in my OP the idea of a PGPROC boolean flag that indicates
whether the temp namespace has been set up.  If not, have autovac remove
those tables.  I like this option better, but I fear it adds more
ProcArrayLock contention.  Maybe I can just use a new LWLock to
coordinate that particular member of the ProcGlobal array ... (but then
it can no longer be a boolean.)

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: removal of dangling temp tables

2018-12-14 Thread Alvaro Herrera
On 2018-Dec-14, Robert Haas wrote:

> On Fri, Dec 14, 2018 at 12:27 PM Alvaro Herrera
>  wrote:

> > Maybe it'd be better to change temp table removal to always drop
> > max_locks_per_transaction objects at a time (ie. commit/start a new
> > transaction every so many objects).
> 
> We're basically just doing DROP SCHEMA ... CASCADE, so I'm not sure
> how we'd implement that, but I agree it would be significantly better.

(Minor nit: even currently, we don't drop the schema itself, only the
objects it contains.)

I was thinking we could scan pg_depend for objects depending on the
schema, add them to an ObjectAddresses array, and do
performMultipleDeletions once every max_locks_per_transaction objects.
But in order for this to have any useful effect we'd have to commit the
transaction and start another one; maybe that's too onerous.
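
Concretely, the loop could look something like this (a sketch only,
using the existing dependency.c entry points new_object_addresses(),
add_exact_object_address() and performMultipleDeletions(); the batching
and the commented-out transaction handling are assumptions, not a
worked-out patch):

static void
drop_orphan_temp_objects(Oid namespaceId)
{
    Relation    depRel;
    ScanKeyData key[2];
    SysScanDesc scan;
    HeapTuple   tup;
    List       *targets = NIL;
    ListCell   *lc;
    ObjectAddresses *batch;
    int         n = 0;

    /* Pass 1: collect addresses of everything depending on the schema. */
    depRel = heap_open(DependRelationId, AccessShareLock);
    ScanKeyInit(&key[0], Anum_pg_depend_refclassid,
                BTEqualStrategyNumber, F_OIDEQ,
                ObjectIdGetDatum(NamespaceRelationId));
    ScanKeyInit(&key[1], Anum_pg_depend_refobjid,
                BTEqualStrategyNumber, F_OIDEQ,
                ObjectIdGetDatum(namespaceId));
    scan = systable_beginscan(depRel, DependReferenceIndexId, true,
                              NULL, 2, key);
    while (HeapTupleIsValid(tup = systable_getnext(scan)))
    {
        Form_pg_depend dep = (Form_pg_depend) GETSTRUCT(tup);
        ObjectAddress *obj = palloc(sizeof(ObjectAddress));

        ObjectAddressSubSet(*obj, dep->classid, dep->objid, dep->objsubid);
        targets = lappend(targets, obj);
    }
    systable_endscan(scan);
    heap_close(depRel, AccessShareLock);

    /* Pass 2: drop them in lock-table-sized batches. */
    batch = new_object_addresses();
    foreach(lc, targets)
    {
        add_exact_object_address((ObjectAddress *) lfirst(lc), batch);
        if (++n >= max_locks_per_transaction)
        {
            performMultipleDeletions(batch, DROP_CASCADE,
                                     PERFORM_DELETION_INTERNAL);
            free_object_addresses(batch);
            batch = new_object_addresses();
            n = 0;
            /* The open question: commit here and start a new transaction
             * so that the lock table stays bounded. */
        }
    }
    if (n > 0)
        performMultipleDeletions(batch, DROP_CASCADE,
                                 PERFORM_DELETION_INTERNAL);
    free_object_addresses(batch);
}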

Maybe we could offer such a behavior as a special case to be used only
in case the regular mechanism fails.  So add a PG_TRY which, in case of
failure, sends a hint to do the cleanup.  Not sure this is worthwhile.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: automatically assigning catalog toast oids

2018-12-14 Thread Andres Freund
Hi,

On 2018-12-14 22:54:14 -0300, Alvaro Herrera wrote:
> On 2018-Dec-13, Tom Lane wrote:
> > Looking at the existing entries, it seems like we'd have to have
> > one special case: schema public has OID 2200 but is intentionally
> > not pinned.  I wouldn't have a hard time with teaching isObjectPinned
> > about that; though if it turns out that many places need to know
> > about it, maybe it's not workable.
> 
> Why not just move that OID outside the Genbki special range?  I have
> seen quite a few installs where schema public was removed and later
> re-added.  I've never seen a query hardcode OID 2200, and I'd call one
> which does buggy.

+1

Greetings,

Andres Freund



Re: automatically assigning catalog toast oids

2018-12-14 Thread Alvaro Herrera
On 2018-Dec-13, Tom Lane wrote:

> We could take it a bit further and not make the 'p' entries for such
> OIDs in the first place, greatly reducing the size of pg_depend in
> typical installations.  Perhaps that'd confuse clients that look at
> that catalog, though.

The number of 'p' entries in pg_depend is often annoying when manually
querying it.  I support the idea of special-casing such entries.

> Looking at the existing entries, it seems like we'd have to have
> one special case: schema public has OID 2200 but is intentionally
> not pinned.  I wouldn't have a hard time with teaching isObjectPinned
> about that; though if it turns out that many places need to know
> about it, maybe it's not workable.

Why not just move that OID outside the Genbki special range?  I have
seen quite a few installs where schema public was removed and later
re-added.  I've never seen a query hardcode OID 2200, and I'd call one
which does buggy.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Computing the conflict xid for index page-level-vacuum on primary

2018-12-14 Thread Peter Geoghegan
On Thu, Dec 13, 2018 at 5:42 PM Andres Freund  wrote:
> This leads me to think that we possibly should move computation of the
> last removed xid from recovery to the primary, during the generation of
> the xl_btree_delete WAL record.

For the record, I share your sense that this isn't a great design for
microvacuuming. It would be better if it happened on the primary,
where we generally have a lot more control/context. Of course, the
devil is in the details.

> Talking with Peter Geoghegan we were pondering a few ideas to address
> that:
> - we could stop doing page level deletion after the tuples on the first
>   heap page, if that frees sufficient space on the index page, to stave
>   off a page split for longer.  That has the downside that it's possible
>   that we'd end up WAL logging more, because we'd repeatedly emit
>   xl_btree_delete records.
> - We could use the visits to the heap pages to be *more* aggressive
>   about deleting the heap page. If there's some tuples on a heap page
>   that are deletable, and we know about that due to the kill_prior_tuple
>   logic, it's fairly likely that other pointers to the same page
>   actually are also dead: We could re-check that for all index entries
>   pointing to pages that we'd visit anyway.

Right. As we discussed, B-Tree is concerned solely with delaying and
probably even preventing a page split in its _bt_findsplitloc() path.
It's far from obvious that an eventual page split is inevitable, since
of course the range of values that belong on the leaf page are always
constrained. We can probably afford to be lazy about the things we
actually kill, since we're operating on the primary at a critical
moment: the point that we need some exact amount of free space to
delay a page split. We can free only as much space as needed,
balancing the need to access heap pages against the immediate need for
free space on the leaf page.

Now, I admit that I haven't thought through the implications for WAL
volume at all. However, it's possible that we might actually manage to
*decrease* the WAL volume in many cases by never freeing space that
isn't truly needed. It's not impossible that we'd regularly come out
ahead there, since a split is not inevitable just because we're
running low on free space this instant. And, I suspect that WAL volume
is the only major additional overhead that comes from being more
discerning about how many tuples are actually killed. Eagerly freeing
space to prevent a page split is a high priority for many B-Tree
implementations; the additional fragmentation of the leaf page caused
by a more incremental approach to deletion isn't likely to be a
problem.

Once you establish the principle that you can be lazy here, you also
establish the principle that you can be more eager, finding additional
things to kill that didn't already have their LP_DEAD bits set based
on heuristics (e.g. "I'm visiting the same heap page anyway, might as
well look for extra stuff"). It isn't necessary to actually have any
of that fancy stuff in your initial version, though.

-- 
Peter Geoghegan



Re: Add pg_partition_root to get top-most parent of a partition tree

2018-12-14 Thread Michael Paquier
On Fri, Dec 14, 2018 at 02:20:27PM +0900, Amit Langote wrote:
> Given that pg_partition_root will return a valid result for any relation
> that can be part of a partition tree, it seems strange that the above
> sentence says "for the given partitioned table or partitioned index".  It
> should perhaps say:
> 
> Return the top-most parent of the partition tree to which the given
> relation belongs

Check.

> +static bool
> +check_rel_for_partition_info(Oid relid)
> +{
> +    char        relkind;
> +
> +    /* Check if relation exists */
> +    if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(relid)))
> +        return false;
> 
> This should be checked in the caller imho.

On this one I disagree, both pg_partition_root and pg_partition_tree
share the same semantics on the matter.  If the set of functions gets
expanded again later on, I have the feeling that we could forget about it
again, and at least placing the check here has the merit of making our
future selves not forget about that pattern.

> I can't imagine this function growing more code to perform additional
> checks beside just checking the relkind, so the name of this function may
> be a bit too ambitious.  How about calling it
> check_rel_can_be_partition()?  The comment above the function could be a
> much simpler sentence too.  I know I may be just bikeshedding here
> though.

That's what the review is here for.  The routine name you are suggesting
looks good to me.

> +/*
> + * If the relation is not a partition, return itself as a result.
> + */
> +if (!get_rel_relispartition(relid))
> +PG_RETURN_OID(relid);
> 
> Maybe the comment here could say "The relation itself may be the root
> parent".

Check.  I tweaked the comment in this sense.

> "valid" is duplicated in the last sentence in the comment.  Anyway, what's
> being Asserted can be described simply as:
> 
> /*
>  * 'rootrelid' must contain a valid OID, given that the input relation is
>  * a valid partition tree member as checked above.
>  */

Changed in this sense.  Please find attached an updated patch.
--
Michael
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index b3336ea9be..b328c31637 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -20270,6 +20270,17 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 their partitions, and so on.

   
+  
+   
+pg_partition_root
+pg_partition_root(regclass)
+   
+   regclass
+   
+Return the top-most parent of a partition tree to which the given
+relation belongs.
+   
+  
  
 

diff --git a/src/backend/utils/adt/partitionfuncs.c b/src/backend/utils/adt/partitionfuncs.c
index 6fb4f6bc50..55588a8fd3 100644
--- a/src/backend/utils/adt/partitionfuncs.c
+++ b/src/backend/utils/adt/partitionfuncs.c
@@ -25,6 +25,34 @@
 #include "utils/lsyscache.h"
 #include "utils/syscache.h"
 
+/*
+ * Perform several checks on a relation from which some information
+ * related to its partition tree is extracted.  Returns false if the
+ * relation cannot be processed, in which case it is up to the caller
+ * to decide what to do, by either raising an error or doing something
+ * else.
+ */
+static bool
+check_rel_can_be_partition(Oid relid)
+{
+	char		relkind;
+
+	/* Check if relation exists */
+	if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(relid)))
+		return false;
+
+	relkind = get_rel_relkind(relid);
+
+	/* Only allow relation types that can appear in partition trees. */
+	if (relkind != RELKIND_RELATION &&
+		relkind != RELKIND_FOREIGN_TABLE &&
+		relkind != RELKIND_INDEX &&
+		relkind != RELKIND_PARTITIONED_TABLE &&
+		relkind != RELKIND_PARTITIONED_INDEX)
+		return false;
+
+	return true;
+}
 
 /*
  * pg_partition_tree
@@ -39,19 +67,10 @@ pg_partition_tree(PG_FUNCTION_ARGS)
 {
 #define PG_PARTITION_TREE_COLS	4
 	Oid			rootrelid = PG_GETARG_OID(0);
-	char		relkind = get_rel_relkind(rootrelid);
 	FuncCallContext *funcctx;
 	ListCell  **next;
 
-	if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(rootrelid)))
-		PG_RETURN_NULL();
-
-	/* Return NULL for relation types that cannot appear in partition trees */
-	if (relkind != RELKIND_RELATION &&
-		relkind != RELKIND_FOREIGN_TABLE &&
-		relkind != RELKIND_INDEX &&
-		relkind != RELKIND_PARTITIONED_TABLE &&
-		relkind != RELKIND_PARTITIONED_INDEX)
+	if (!check_rel_can_be_partition(rootrelid))
 		PG_RETURN_NULL();
 
 	/* stuff done only on the first call of the function */
@@ -153,3 +172,39 @@ pg_partition_tree(PG_FUNCTION_ARGS)
 	/* done when there are no more elements left */
 	SRF_RETURN_DONE(funcctx);
 }
+
+/*
+ * pg_partition_root
+ *
+ * For the given relation part of a partition tree, return its top-most
+ * root parent.
+ */
+Datum
+pg_partition_root(PG_FUNCTION_ARGS)
+{
+	Oid		relid = PG_GETARG_OID(0);
+	Oid		rootrelid;
+	List   *ancestors;
+
+	if (!check_rel_can_be_partition(relid))
+		PG_RETURN_NULL();
+
+	/*
+	 * If the relation is not a 

Re: automatically assigning catalog toast oids

2018-12-14 Thread Andres Freund
Hi,

On 2018-12-13 20:21:12 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > [ separation of FirstBootstrapObjectId and FirstGenbkiObjectId ]
> 
> It seems like we should also add a check to genbki.pl that the ending
> value of GenbkiNextOid is <= FirstBootstrapObjectId, so that we'll
> certainly notice when it's necessary to adjust those boundaries.

Hm, if we go there, it'd probably also be good to check for <=
FirstGenbkiObjectId?


> >> I did *not* change record_plan_function_dependency(), it seems correct
> >> that it doesn't track genbki assigned oids, they certainly can't change
> >> while a server is running.  But I'm not entirely clear to why that's not
> >> using FirstNormalObjectId as the cutoff, so perhaps I'm missing
> >> something.  Similar with logic in predicate.c.
> 
> It's not about whether they can change while the server is running,
> it's about whether they're pinned.  OIDs above 10000 might or might
> not be pinned; we shouldn't hard-wire assumptions about that.  So
> I suspect you made the wrong choice there.

Hm, but aren't pins set up by initdb's setup_depend()? IOW, genbki
assigned oids will be in there? Am I missing something?


> BTW, this ties into something that was bugging me this afternoon while
> looking at dependencies on the default collation: there's a bunch of
> places that special-case DEFAULT_COLLATION_OID to skip adding dependencies
> on it, for instance this in dependency.c:
> 
> /*
>  * We must also depend on the constant's collation: it could be
>  * different from the datatype's, if a CollateExpr was const-folded to
>  * a simple constant.  However we can save work in the most common
>  * case where the collation is "default", since we know that's pinned.
>  */
> if (OidIsValid(con->constcollid) &&
> con->constcollid != DEFAULT_COLLATION_OID)
> add_object_address(OCLASS_COLLATION, con->constcollid, 0,
>context->addrs);
> 
> I'm pretty sure that that special case is my fault, added in perhaps
> over-eagerness to minimize the cost added by the collations feature.
> Looking at it now, it seems like mostly a wart.  But perhaps there is
> a more general principle here: why don't we just hard-wire an assumption
> that all OIDs below FirstGenbkiObjectId are pinned?  I'm thinking of
> getting rid of the DEFAULT_COLLATION_OID special cases in dependency-
> recording logic and instead having someplace central like isObjectPinned
> just assume that such OIDs are pinned, so that we don't bother to
> consult or update pg_depend for them.

> We could take it a bit further and not make the 'p' entries for such
> OIDs in the first place, greatly reducing the size of pg_depend in
> typical installations.  Perhaps that'd confuse clients that look at
> that catalog, though.

I think that'd be good, it's the second largest table in a new cluster,
and it's not going to get smaller.  'p' already is somewhat of a special
case, so I'm not particularly concerned with clients having to understand
that.

Greetings,

Andres Freund



Re: Change pgarch_readyXlog() to return .history files first

2018-12-14 Thread Michael Paquier
On Fri, Dec 14, 2018 at 08:43:20AM -0500, David Steele wrote:
> On 12/13/18 7:15 PM, Michael Paquier wrote:
>> I would like to hear opinion from other though if we should consider
>> that as an improvement or an actual bug fix.  Changing the order of the
>> files to map with what the startup process does when promoting does not
>> sound like a bug fix to me, still this is not really invasive, so we
>> could really consider it worth back-patching to reduce common pain from
>> users when it comes to timeline handling.
> 
> I think an argument can be made that it is a bug (ish).  Postgres
> generates the files in one order, and they get archived in a different
> order.

I am not completely sure either.  In my experience, if there is any
doubt on such definitions the best answer is to not backpatch.

>> Or you could just use IsTLHistoryFileName here?
> 
> We'd have to truncate .ready off the string to make that work, which
> seems easy enough.  Is that what you were thinking?

Yes, that's the idea.  pgarch_readyXlog returns the segment name which
should be archived, so you could just compute it after detecting a
.ready file.

> One thing to consider is the check above is more efficient than
> IsTLHistoryFileName() and it potentially gets run a lot.

This check misses strspn(), so any file finishing with .history would
get eaten even if that's unlikely to happen.
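
For illustration, a check along those lines might look like this (a
sketch only; variable names loosely follow pgarch_readyXlog() and
IsTLHistoryFileName() is the macro from xlog_internal.h — this is not
the committed fix):

    char    basename[MAX_XFN_CHARS + 1];
    int     basenamelen = (int) strlen(rlde->d_name) - strlen(".ready");
    bool    history_file = false;

    if (basenamelen > 0 && basenamelen <= MAX_XFN_CHARS &&
        strcmp(rlde->d_name + basenamelen, ".ready") == 0)
    {
        /* strip ".ready" and test the remaining segment name */
        memcpy(basename, rlde->d_name, basenamelen);
        basename[basenamelen] = '\0';
        history_file = IsTLHistoryFileName(basename);
    }
    /* entries with history_file == true would be returned first */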

>> If one wants to simply check this code, you can just create dummy orphan
>> files in archive_status and see in which order they get removed.
> 
> Seems awfully racy.  Are there currently any tests like this for the
> archiver that I can look at extending?

Sorry for the confusion, I was referring to manual testing here.
Thinking about it, we could have an automatic test to check the file
order pattern by creating dummy files, starting the server with the
archiver enabled, and then parsing the logs: orphan .ready files would
get removed in the order their archiving is attempted, with one WARNING
entry generated for each.  I am not sure if that is worth a test though.
--
Michael




Re: Add timeline to partial WAL segments

2018-12-14 Thread Michael Paquier
On Fri, Dec 14, 2018 at 06:05:18PM -0500, David Steele wrote:
> On 12/14/18 3:26 PM, Robert Haas wrote:
>> The new TLI is the only thing that is guaranteed to be unique with
>> each new promotion, and I would guess that it is therefore the right
>> thing to use to disambiguate them.
> 
> This is the conclusion we came to after a few months of diagnosing and
> working on this problem.

Hm.  Okay.  From that point of view I think that we should also make
pg_receivewal able to generate the same kind of segment format when it
detects a timeline switch for the last partial segment of the previous
timeline.  Switching to a new timeline makes the streaming finish, so we
could use that to properly close the WAL segment with the new name.  At
the same time it would be nice to have a macro able to generate a
partial segment name and also check after it.

> The question in my mind: is it safe to back-patch?

That's unfortunately a visibility change, meaning that any existing
backup solution able to handle .partial segments would get broken in a
minor release.  From that point of view this should go only on HEAD.
The other patch to reorder the priority of the .ready files could go
down without any problem though.
--
Michael




Re: Variable-length FunctionCallInfoData

2018-12-14 Thread Andrew Gierth
> "Andres" == Andres Freund  writes:

 >> I think it'd probably be good to add accessors for value/nullness in
 >> arguments that hide the difference between [the old and new argument
 >> layouts] for the sake of extension authors. Would probably mostly make
 >> sense if we backpatched those for compatibility.

Speaking as an affected extension author: don't backpatch them.

The extension code has to cope with being compiled against a minor
release that doesn't have the backpatch, so the extension author has to
do their own workaround anyway. Having additional conditions based on
the minor release is just more pain.

-- 
Andrew (irc:RhodiumToad)



Re: valgrind issues on Fedora 28

2018-12-14 Thread Andrew Dunstan


On 12/14/18 4:35 PM, Tomas Vondra wrote:
> On 12/14/18 4:58 PM, Tom Lane wrote:
>> ...
>>
>> In general, I'm not particularly on board with our valgrind.supp
>> carrying suppressions for code outside our own code base: I think
>> that's assuming WAY too much about which version of what is installed
>> on a particular box.
>>
> Fair point.
>
>> Maybe we could do something to make it simpler to have custom
>> suppressions?  Not sure what, though.
>>
> I was thinking that perhaps we could allow specifying a path to extra
> suppressions and pass that to valgrind.
>
> But we don't actually invoke valgrind, that's something people do on
> their own anyway - so we don't have anywhere to pass the path to. And
> whoever invokes valgrind can simply stick it directly into the command
> they're using (as it allows specifying multiple --suppressions=
> options). Or perhaps just put it into ~/.valgrindrc.
>
> So perhaps we should simply revert that commit and be done with it.
>
> One place that will need to solve it is buildfarm client, but it could
> pick either of the options I mentioned.
>
>

The buildfarm client has a parameter in the config file for valgrind
options. All you would have to do is add an extra --suppressions setting
in there.


cheers


andrew


-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Add timeline to partial WAL segments

2018-12-14 Thread David Steele
On 12/14/18 3:26 PM, Robert Haas wrote:
> On Thu, Dec 13, 2018 at 12:17 AM Michael Paquier  wrote:
>> On Wed, Dec 12, 2018 at 07:54:05AM -0500, David Steele wrote:
>>> The LSN switch point is often the same even when servers are going to
>>> different timelines.  If the LSN is different enough then the problem
>>> solves itself since the .partial will be on an entirely different
>>> segment.
>>
>> That would mean that WAL forked exactly at the same record.  You have
>> likely seen more cases where than can happen in real life than I do.
> 
> Suppose that the original master fails during an idle period, and we
> promote a slave.  But we accidentally promote a slave that can't serve
> as the new master, like because it's in a datacenter with an
> unreliable network connection or one which is about to be engulfed in
> lava.  

Much more common than people think.

> So, we go to promote a different slave, and because we never
> got around to reconfiguring the standbys to follow the previous
> promotion, kaboom.

Exactly.

> Or, suppose we do PITR to recover from some user error, but then
> somebody screws up the contents of the recovered cluster and we have
> to do it over again. Naturally we'll recover to the same point.
> 
> The new TLI is the only thing that is guaranteed to be unique with
> each new promotion, and I would guess that it is therefore the right
> thing to use to disambiguate them.

This is the conclusion we came to after a few months of diagnosing and
working on this problem.

The question in my mind: is it safe to back-patch?

-- 
-David
da...@pgmasters.net



Re: Variable-length FunctionCallInfoData

2018-12-14 Thread Andres Freund
Hi,

On 2018-10-09 12:18:02 -0700, Andres Freund wrote:
> Here's an updated version of the patch. Besides a rebase the biggest
> change is that I've wrapped:
> 
> On 2018-06-05 10:29:52 -0700, Andres Freund wrote:
> > There's some added uglyness, which I hope we can polish a bit
> > further. Right now we allocate a good number of FunctionCallInfoData
> > struct on the stack - which doesn't quite work afterwards anymore.  So
> > the stack allocations, for the majoroity cases where the argument number
> > is known, currently looks like:
> > 
> > union {
> >     FunctionCallInfoData fcinfo;
> >     char *fcinfo_data[SizeForFunctionCallInfoData(0)];
> > } fcinfodata;
> > FunctionCallInfo fcinfo = &fcinfodata.fcinfo;
> > 
> > that's not pretty, but also not that bad.
> 
> into a macro STACK_FCINFO_FOR_ARGS(varname, numargs). That makes the
> code look much nicer.
> 
> I think we should go for this. If there's some agreement on that I'll
> perform a bit more polishing.
> 
> I think it'd probably be good to add accessors for value/nullness in
> arguments that hide the difference between [the old and new argument
> layouts] for the sake of extension authors. Would probably mostly make
> sense if we backpatched those for compatibility.
> 
> 
> Also attached is a second patch that avoids all the duplication in
> fmgr.[ch]. While the savings are nice, I'm a bit doubtful that the
> amount of magic here is reasonable.  Any opinions?
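
For readers who want to picture the macro mentioned above, a minimal
sketch (reusing the SizeForFunctionCallInfoData() helper from the quoted
code; the name and exact shape in the eventual patch may well differ):

#define STACK_FCINFO_FOR_ARGS(name, nargs) \
    union \
    { \
        FunctionCallInfoData fcinfo; \
        char        fcinfo_data[SizeForFunctionCallInfoData(nargs)]; \
    } name##_data; \
    FunctionCallInfo name = &name##_data.fcinfo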

Any comments?  Otherwise I plan to press forward with this soon-ish.

Greetings,

Andres Freund



log bind parameter values on error

2018-12-14 Thread Alexey Bashtanov

Hello,

I'd like to propose a patch to log bind parameter values not only when
logging duration, but also on error (timeout in particular) or in
whatever situation the statement normally gets logged.  This is mostly
useful when the request originator doesn't log them either, so it's
hard to reproduce the problem.

Unfortunately, when enabled, the feature comes with some memory and CPU
overhead, as we cannot convert certain values to text while in an
aborted transaction.  We could potentially do the trick with built-in
types, but it would need cautious work with composite types, and would
also require more computation at the logging stage, which is a risk of
cascading errors.  Custom types still wouldn't be loggable, even as
passed by the client, which would not be great.

So I decided to cache textual representations at bind stage, which is
especially easy if the client uses the text protocol.

Best,
  Alexey
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 4a7121a..7b38aed 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6265,6 +6265,23 @@ log_line_prefix = '%m [%p] %q%u@%d/%a '
   
  
 
+ 
+  log_parameters (boolean)
+  
+   log_parameters configuration parameter
+  
+  
+  
+   
+Controls whether the statement is logged with bind parameter values. 
+It adds some overhead, as postgres will cache textual
+representations of parameter values in memory for all statements,
+even if they eventually do not get logged. The default is
+off. Only superusers can change this setting.
+   
+  
+ 
+
  
   log_statement (enum)
   
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 5684997..ba431ce 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -94,7 +94,7 @@ PerformCursorOpen(DeclareCursorStmt *cstmt, ParamListInfo params,
 	/*
 	 * Create a portal and copy the plan and queryString into its memory.
 	 */
-	portal = CreatePortal(cstmt->portalname, false, false);
+	portal = CreatePortal(cstmt->portalname, false, false, false);
 
 	oldContext = MemoryContextSwitchTo(portal->portalContext);
 
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 6036b73..363c096 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -404,6 +404,7 @@ EvaluateParams(PreparedStatement *pstmt, List *params,
 	paramLI->parserSetup = NULL;
 	paramLI->parserSetupArg = NULL;
 	paramLI->numParams = num_params;
+	paramLI->hasTextValues = false;
 
 	i = 0;
 	foreach(l, exprstates)
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index fc7c605..6d047cd 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -921,6 +921,7 @@ postquel_sub_params(SQLFunctionCachePtr fcache,
 			paramLI->parserSetup = NULL;
 			paramLI->parserSetupArg = NULL;
 			paramLI->numParams = nargs;
+			paramLI->hasTextValues = false;
 			fcache->paramLI = paramLI;
 		}
 		else
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index ad72667..532a365 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1302,7 +1302,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
 	else
 	{
 		/* In this path, error if portal of same name already exists */
-		portal = CreatePortal(name, false, false);
+		portal = CreatePortal(name, false, false, false);
 	}
 
 	/* Copy the plan's query string into the portal */
@@ -2399,6 +2399,7 @@ _SPI_convert_params(int nargs, Oid *argtypes,
 		paramLI->parserSetup = NULL;
 		paramLI->parserSetupArg = NULL;
 		paramLI->numParams = nargs;
+		paramLI->hasTextValues = false;
 
 		for (i = 0; i < nargs; i++)
 		{
diff --git a/src/backend/nodes/params.c b/src/backend/nodes/params.c
index 79197b1..3c3b152 100644
--- a/src/backend/nodes/params.c
+++ b/src/backend/nodes/params.c
@@ -31,6 +31,8 @@
  * set of parameter values.  If dynamic parameter hooks are present, we
  * intentionally do not copy them into the result.  Rather, we forcibly
  * instantiate all available parameter values and copy the datum values.
+ *
+ * We don't copy textual representations here.
  */
 ParamListInfo
 copyParamList(ParamListInfo from)
@@ -53,6 +55,7 @@ copyParamList(ParamListInfo from)
 	retval->parserSetup = NULL;
 	retval->parserSetupArg = NULL;
 	retval->numParams = from->numParams;
+	retval->hasTextValues = false;
 
 	for (i = 0; i < from->numParams; i++)
 	{
@@ -229,6 +232,7 @@ RestoreParamList(char **start_address)
 	paramLI->parserSetup = NULL;
 	paramLI->parserSetupArg = NULL;
 	paramLI->numParams = nparams;
+	paramLI->hasTextValues = false;
 
 	for (i = 0; i < nparams; i++)
 	{
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 5ab7d3c..2ad56a2 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -183,7 +183,7 @@ static void 

Re: Catalog views failed to show partitioned table information.

2018-12-14 Thread Michael Paquier
On Fri, Dec 14, 2018 at 05:21:49PM +0530, Suraj Kharage wrote:
> There are some catalog views which do not show the partitioned table and
> its index entry.
> One of them is "pg_indexes", which fails to show the partitioned index.
> Attached is the patch which fixes the same.

I tend to agree with your comment here.  pg_tables lists partitioned
tables, but pg_indexes is forgetting about partitioned indexes.  So this
is a good thing to add.

> Other views such as pg_stat_*, pg_statio_* have the same problem for
> partitioned tables and indexes.
> Since partitioned tables and their indexes are considered dummies, they do
> not have any significance in the stat tables;
> can we still consider adding relkind=p in these pg_stat_* views? Thoughts?

I am less sure about that as partitioned relations do not have a
physical presence.
--
Michael




Re: 'infinity'::Interval should be added

2018-12-14 Thread Robert Haas
On Fri, Dec 14, 2018 at 9:11 PM Tom Lane  wrote:
> Isaac Morland  writes:
> > I would be interested if you have an example where the ability of
> > date/timestamp values to be infinite adds special case coding.
>
> I think Robert is talking about the implementation functions for
> timestamp-related operations, which typically do have to special-case
> infinite inputs.  I take your point that there might be fewer special
> cases in the calling SQL code, though.

Exactly.  And now that we have JIT compilation, the overhead of such
cases is more worth considering than previously.  To take a related
example, int4pl() can be reduced to a single instruction if you ignore
overflow -- but you can't ignore overflow, which means you need some
more instructions to handle the overflow case, which means that the
number of instructions is actually increasing quite a bit.  Without
JIT, maybe that's not too noticeable because you've got to pay the
cost of setting up a stack frame for the function and tearing it down,
but JIT can optimize those costs away.
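
For reference, the overflow-checked path looks roughly like this
(paraphrased from src/backend/utils/adt/int.c; treat the details as
approximate):

Datum
int4pl(PG_FUNCTION_ARGS)
{
    int32       arg1 = PG_GETARG_INT32(0);
    int32       arg2 = PG_GETARG_INT32(1);
    int32       result;

    /* the "extra instructions": branch on the overflow flag */
    if (unlikely(pg_add_s32_overflow(arg1, arg2, &result)))
        ereport(ERROR,
                (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
                 errmsg("integer out of range")));
    PG_RETURN_INT32(result);
}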

So essentially I think supporting special values like infinity boils
down to trading away some small amount of performance -- more likely
to be noticeable with JIT -- for some amount of possible programmer
convenience.  Some people may think that's a good trade, and other
people may not like it.  It just depends on whether the 'infinity'
value is useful to you. If it is, you'll probably like the trade,
because aside from the notational cost which Isaac mentioned, it's
probably vastly faster to handle the infinity values in C code than to
stick CASE..WHEN in to an SQL query.  If it's not, you may dislike it.
If your application code now has to know about the possibility
'infinity' value that it otherwise wouldn't have to worry about, you
may dislike it for that reason also.

I'm not sure there's one right answer here - my personal feeling is
that infinite values are a wart, but I grant Isaac's point that they
can sometimes simplify SQL coding.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Bogus EPQ plan construction in postgres_fdw

2018-12-14 Thread Robert Haas
On Wed, Dec 12, 2018 at 8:00 PM Tom Lane  wrote:
> By chance I noticed that postgres_fdw's postgresGetForeignPlan() assumes
> --- without any checking --- that the outer_plan it's given for a join
> relation must have a NestLoop, MergeJoin, or HashJoin node at the top.
> That's been wrong at least since commit 4bbf6edfb (which could cause
> insertion of a Sort node on top) and it seems like a pretty unsafe
> thing to Just Assume even without that.

Thanks for taking care of this.  I've looked at this code any number
of times and never quite noticed the assumption that outer_plan had to
be a join.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Speeding up text_position_next with multibyte encodings

2018-12-14 Thread John Naylor
On 12/14/18, John Naylor  wrote:
> I signed up to be a reviewer, and I will be busy next month, so I went
> ahead and fixed the typo in the patch that broke assert-enabled
> builds. While at it, I standardized on the spelling "start_ptr" in a
> few places to match the rest of the file. It's a bit concerning that
> it wouldn't compile with asserts, but the patch was written by a
> committer and seems to work.

I just noticed that the contrib/citext test fails. I've set the status
to waiting on author.

-John Naylor



Re: valgrind issues on Fedora 28

2018-12-14 Thread Tomas Vondra


On 12/14/18 4:58 PM, Tom Lane wrote:
> ...
> 
> In general, I'm not particularly on board with our valgrind.supp
> carrying suppressions for code outside our own code base: I think
> that's assuming WAY too much about which version of what is installed
> on a particular box.
> 

Fair point.

> Maybe we could do something to make it simpler to have custom
> suppressions?  Not sure what, though.
> 

I was thinking that perhaps we could allow specifying a path to extra
suppressions and pass that to valgrind.

But we don't actually invoke valgrind, that's something people do on
their own anyway - so we don't have anywhere to pass the path to. And
whoever invokes valgrind can simply stick it directly into the command
they're using (as it allows specifying multiple --suppressions=
options). Or perhaps just put it into ~/.valgrindrc.

So perhaps we should simply revert that commit and be done with it.

One place that will need to solve it is buildfarm client, but it could
pick either of the options I mentioned.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: inconsistency and inefficiency in setup_conversion()

2018-12-14 Thread John Naylor
On 12/1/18, Dmitry Dolgov <9erthali...@gmail.com> wrote:
> I see that the author keeps the patch updated, but I'm a bit worried
> because of the lack of a full review since probably May. I'm moving it
> to the next CF, let's see if there will be more feedback.
>
> P.S. adding Daniel, since he is assigned as a reviewer.

Having heard nothing in a while, I've removed Daniel as a reviewer to
make room for someone else. He is, of course, free to re-add himself.
v8 is attached.

Since it's been a few months since last discussion, I'd like to
summarize the purpose of this patch and advocate for its inclusion in
v12:

1. Correctness
In the intro thread [1], I showed that object comments on some
conversions are wrong, and hard to fix given the current setup. This
is a documentation bug of sorts.

2. Clean-up
Currently, utils/mb/conversion_procs/Makefile has an ad-hoc script to
generate the SQL file, which has to be duplicated in the MSVC tooling,
and executed by initdb.c. Storing the conversions in .dat files
removes the need for any of that.

3. Performance
This patch shaves 5-6% off of initdb. Not as much as hoped, but still
a nice bonus.

[1] 
https://www.postgresql.org/message-id/CAJVSVGWtUqxpfAaxS88vEGvi%2BjKzWZb2EStu5io-UPc4p9rSJg%40mail.gmail.com


--John Naylor
From bdbdb36129708d1d417f8db9009b377c3d4e3e90 Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Fri, 14 Dec 2018 15:19:54 -0500
Subject: [PATCH v8 1/2] Add pg_language lookup.

This didn't seem worth doing before, but an upcoming commit will add
88 entries to pg_proc whose language is 'c'. In addition, we can
now use 'internal' in Gen_fmgr.pl instead of looking up the OID.
This simplifies that script and its Makefile a bit.
---
 doc/src/sgml/bki.sgml|  6 +--
 src/backend/catalog/genbki.pl|  8 +++
 src/backend/utils/Gen_fmgrtab.pl |  7 ++-
 src/backend/utils/Makefile   |  8 +--
 src/include/catalog/pg_proc.dat  | 92 
 src/include/catalog/pg_proc.h|  2 +-
 src/tools/msvc/Solution.pm   |  4 +-
 7 files changed, 64 insertions(+), 63 deletions(-)

diff --git a/doc/src/sgml/bki.sgml b/doc/src/sgml/bki.sgml
index 786abb95d4..37bb7a2346 100644
--- a/doc/src/sgml/bki.sgml
+++ b/doc/src/sgml/bki.sgml
@@ -409,8 +409,8 @@
 that's error-prone and hard to understand, so for frequently-referenced
 catalogs, genbki.pl provides mechanisms to write
 symbolic references instead.  Currently this is possible for references
-to access methods, functions, operators, opclasses, opfamilies, and
-types.  The rules are as follows:
+to access methods, functions, languages, operators, opclasses, opfamilies,
+and types.  The rules are as follows:

 

@@ -420,7 +420,7 @@
   Use of symbolic references is enabled in a particular catalog column
   by attaching BKI_LOOKUP(lookuprule)
   to the column's definition, where lookuprule
-  is pg_am, pg_proc,
+  is pg_am, pg_proc, pg_language,
   pg_operator, pg_opclass,
   pg_opfamily, or pg_type.
   BKI_LOOKUP can be attached to columns of
diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index 8e2a2480be..8ccfcb7724 100644
--- a/src/backend/catalog/genbki.pl
+++ b/src/backend/catalog/genbki.pl
@@ -182,6 +182,13 @@ foreach my $row (@{ $catalog_data{pg_am} })
 	$amoids{ $row->{amname} } = $row->{oid};
 }
 
+# language OID lookup
+my %langoids;
+foreach my $row (@{ $catalog_data{pg_language} })
+{
+	$langoids{ $row->{lanname} } = $row->{oid};
+}
+
 # opclass OID lookup
 my %opcoids;
 foreach my $row (@{ $catalog_data{pg_opclass} })
@@ -256,6 +263,7 @@ foreach my $row (@{ $catalog_data{pg_type} })
 # Map catalog name to OID lookup.
 my %lookup_kind = (
 	pg_am   => \%amoids,
+	pg_language => \%langoids,
 	pg_opclass  => \%opcoids,
 	pg_operator => \%operoids,
 	pg_opfamily => \%opfoids,
diff --git a/src/backend/utils/Gen_fmgrtab.pl b/src/backend/utils/Gen_fmgrtab.pl
index ed16737a6a..b6809c7277 100644
--- a/src/backend/utils/Gen_fmgrtab.pl
+++ b/src/backend/utils/Gen_fmgrtab.pl
@@ -59,6 +59,8 @@ die "No include path; you must specify -I.\n" if !$include_path;
 # Note: We pass data file names as arguments and then look for matching
 # headers to parse the schema from. This is backwards from genbki.pl,
 # but the Makefile dependencies look more sensible this way.
+# We currently only need pg_proc, but retain the possibility of reading
+# more than one data file.
 my %catalogs;
 my %catalog_data;
 foreach my $datfile (@input_files)
@@ -82,9 +84,6 @@ foreach my $datfile (@input_files)
 my $FirstGenbkiObjectId =
   Catalog::FindDefinedSymbol('access/transam.h', $include_path,
 	'FirstGenbkiObjectId');
-my $INTERNALlanguageId =
-  Catalog::FindDefinedSymbolFromData($catalog_data{pg_language},
-	'INTERNALlanguageId');
 
 # Collect certain fields from pg_proc.dat.
 my @fmgr = ();
@@ -94,7 +93,7 @@ foreach my $row (@{ $catalog_data{pg_proc} })
 	my %bki_values = %$row;
 
 	# Select out just the rows for 

Re: 'infinity'::Interval should be added

2018-12-14 Thread Tom Lane
Isaac Morland  writes:
> I would be interested if you have an example where the ability of
> date/timestamp values to be infinite adds special case coding.

I think Robert is talking about the implementation functions for
timestamp-related operations, which typically do have to special-case
infinite inputs.  I take your point that there might be fewer special
cases in the calling SQL code, though.
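
(A typical special case of that kind, sketched loosely after the
pattern used in the timestamp arithmetic functions in timestamp.c; the
exact code varies by function:)

    if (TIMESTAMP_NOT_FINITE(timestamp))
        result = timestamp;     /* +/-infinity passes straight through */
    else
    {
        /* ... the ordinary finite-value arithmetic ... */
    }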

> The wart I'm worried about is subtraction of infinite dates. Right now
> dates subtract to give integers; and there are no infinite integers. All
> the clever solutions to this I have right now involve making highly
> backward-incompatible changes.

As far as that goes, I'm content to say that infinity is outside the
domain of type date.  If you need infinities, use timestamp.

regards, tom lane



Re: ExecBuildGroupingEqual versus collations

2018-12-14 Thread Tom Lane
Andres Freund  writes:
> On 2018-12-14 14:25:30 -0500, Tom Lane wrote:
>> Now, it's certainly true that nameeq() doesn't need a collation spec
>> today, any more than texteq() does, because both types legislate that
>> equality is bitwise.  But if we leave ExecBuildGroupingEqual like this,
>> we're mandating that no type anywhere, ever, can have a
>> collation-dependent notion of equality.  Is that really a restriction
>> we're comfortable with?  citext is sort of the poster child here,
>> because it already wishes it could have collation-dependent equality.

> Don't we rely on that in other places too?

Possibly; the regression tests probably don't stress type "name" too hard,
so my Assert might well not be finding some other code paths that do
likewise.  The question at hand is whether we're going to say those are
bugs to be fixed, or that it's OK as-is.

> I think one issue here is that it's not clear how it'd be sensible to
> have collation dependent equality for cases where we don't have a real
> way to override the collation for an operation.  It's not too hard to
> imagine adding something to GROUP BY, but set operations seem harder.

Yeah, fair point.  But we've got comparable issues with respect to
which equality operator to use in the first place, for a lot of these
constructs, cf
https://www.postgresql.org/message-id/10492.1531515...@sss.pgh.pa.us
(which I've not made any further progress on).  Unless you want to
throw up your hands and say that *all* equality is bitwise, we need
to think about ways to deal with that.

regards, tom lane



Re: 'infinity'::Interval should be added

2018-12-14 Thread Isaac Morland
On Fri, 14 Dec 2018 at 15:16, Robert Haas  wrote:


> Why?  I consider it somewhat of a wart that timestamps allow infinity
> - it adds special case coding all over the place.  Allowing intervals
> to be infinite as well seems like it'll just create more special cases
> without really solving anything.
>

Au contraire, it eliminates special case coding all over the place. For
example, if I have authorizations that don't expire, I can just give them
an expiration of 'infinity' and operations like "today's date less than
expiration" just work; I don't have to say "expiration null or greater than
today's date".

I would be interested if you have an example where the ability of
date/timestamp values to be infinite adds special case coding.

If we allow intervals to be infinity, then we will eliminate more special
cases. Right now, whenever timestamp values are subtracted, the programmer
has to consider what happens if an operand is infinity and handle that case
specially. With infinite intervals allowed, this problem is reduced. For
example, how long until expiry? Right now:

CASE WHEN expiry = 'infinity' THEN NULL WHEN expiry = '-infinity' THEN '0'
ELSE expiry - current_timestamp END

With the proposal:

expiry - current_timestamp

And another improvement: infinity is represented by 'infinity' rather than
the somewhat ambiguous NULL.

Just for definiteness, I believe the following are the correct semantics
for subtraction involving infinity:

∞ - t = ∞ (t ≠ ∞)
-∞ - t = -∞ (t ≠ -∞)
∞ - ∞ = err
-∞ - -∞ = err

I'm not sure whether "err" should be an error, as it is now ("cannot
subtract infinite dates") and like division by 0, or NULL.

The wart I'm worried about is subtraction of infinite dates. Right now
dates subtract to give integers; and there are no infinite integers. All
the clever solutions to this I have right now involve making highly
backward-incompatible changes.


Re: ExecBuildGroupingEqual versus collations

2018-12-14 Thread Andres Freund
Hi,

On 2018-12-14 14:25:30 -0500, Tom Lane wrote:
> In pursuit of places that might not be on board with non-default
> collations, I tried sticking
> 
> Assert(PG_GET_COLLATION() == C_COLLATION_OID);
> 
> into nameeq() and the other name comparison functions, in my build with
> type name's typcollation changed to "C".  I'm down to one place in the
> core regression tests that falls over because of that:
> ExecBuildGroupingEqual() figures it can get away with
> 
> InitFunctionCallInfoData(*fcinfo, finfo, 2,
>                          InvalidOid, NULL, NULL);
> 
> i.e. passing collation = InvalidOid to the type-specific equality
> functions.

Yea, I just cargo-culted that behaviour forward, the code previously
did:

bool
execTuplesMatch(TupleTableSlot *slot1,
                TupleTableSlot *slot2,
                int numCols,
                AttrNumber *matchColIdx,
                FmgrInfo *eqfunctions,
                MemoryContext evalContext)
...
        /* Apply the type-specific equality function */

        if (!DatumGetBool(FunctionCall2(&eqfunctions[i],
                                        attr1, attr2)))
        {
            result = false;     /* they aren't equal */
            break;
        }


> Now, it's certainly true that nameeq() doesn't need a collation spec
> today, any more than texteq() does, because both types legislate that
> equality is bitwise.  But if we leave ExecBuildGroupingEqual like this,
> we're mandating that no type anywhere, ever, can have a
> collation-dependent notion of equality.  Is that really a restriction
> we're comfortable with?  citext is sort of the poster child here,
> because it already wishes it could have collation-dependent equality.

Don't we rely on that in other places too?


> We'd have to do some work to pass a reasonable collation choice through
> to this code from wherever we have the info available, but I don't
> think it'd be a huge amount of work.

Yea, I think it shouldn't be too hard for most callers.

I think one issue here is that it's not clear how it'd be sensible to
have collation dependent equality for cases where we don't have a real
way to override the collation for an operation.  It's not too hard to
imagine adding something to GROUP BY, but set operations seem harder.


> BTW, this case is perhaps a bit of a counterexample to Peter's idea about
> doing collation lookups earlier: if we change this, we'd presumably have
> to do collation lookup right here, even though we know very well that
> common types' equality functions won't use the information.

Hm.

Greetings,

Andres Freund



Re: Ryu floating point output patch

2018-12-14 Thread Andrew Gierth
> "Andres" == Andres Freund  writes:

 >> Is this a path we really want to go down? I'm not convinced the
 >> cost/benefit ratio is attractive.

 Andres> float->text conversion is one of the major bottlenecks when
 Andres> backing up postgres, it's definitely a pain-point in practice.

Also an issue with queries. I got into this whole area after diagnosing
a problem for a user on IRC who was seeing a 4x slowdown for PG as
compared to MSSQL, from 30 seconds to 120 seconds. We determined that
the _entire_ difference was accounted for by float conversion.

Now that was an unusual case, downloading a large dataset of floats at
once into a statistics program, but it's very easy to show
proportionally large overheads from float output:

(using this table:
  create table flttst4 as
select i a, i b, i c, i d, i::float8 e, random() f
  from generate_series(1::bigint, 1000::bigint) i;
)

postgres=# copy flttst4(d) to '/dev/null';  -- bigint column
COPY 1000
Time: 2166.001 ms (00:02.166)

postgres=# copy flttst4(e) to '/dev/null';  -- float8 col, integer values
COPY 1000
Time: 4233.005 ms (00:04.233)

postgres=# copy flttst4(f) to '/dev/null';  -- float8 col, random()
COPY 1000
Time: 7261.421 ms (00:07.261)

-- vs. timings with Ryu:

postgres=# copy flttst4(e) to '/dev/null';  -- float8 col, integer values
COPY 1000
Time: 2391.725 ms (00:02.392)

postgres=# copy flttst4(f) to '/dev/null';  -- float8 col, random()
COPY 1000
Time: 2245.699 ms (00:02.246)

-- 
Andrew (irc:RhodiumToad)



Re: ExecBuildGroupingEqual versus collations

2018-12-14 Thread Tom Lane
Isaac Morland  writes:
> On Fri, 14 Dec 2018 at 14:25, Tom Lane  wrote:
>> Now, it's certainly true that nameeq() doesn't need a collation spec
>> today, any more than texteq() does, because both types legislate that
>> equality is bitwise.  But if we leave ExecBuildGroupingEqual like this,
>> we're mandating that no type anywhere, ever, can have a
>> collation-dependent notion of equality.  Is that really a restriction
>> we're comfortable with?  citext is sort of the poster child here,
>> because it already wishes it could have collation-dependent equality.

> There already seems to be a policy that individual types are not allowed to
> have their own concepts of equality:
> postgres=> select 'NaN'::float = 'NaN'::float;
>  ?column?
> --
>  t
> (1 row)

> According to the IEEE floating point specification, this should be f not t.

I don't think that's a particularly relevant counter-example.  The problem
with following the IEEE spec for that is that it breaks pretty much
everybody's notion of equality, in particular the reflexive axiom (A=A for
any A), which is one of the axioms needed for btrees to work.  That does
*not* preclude allowing non-bitwise-equal values to be considered equal,
which is the case that citext would like --- and, indeed, a case that IEEE
math also requires (0 = -0), and which we do support for floats.

regards, tom lane



Re: Add timeline to partial WAL segments

2018-12-14 Thread Robert Haas
On Thu, Dec 13, 2018 at 12:17 AM Michael Paquier  wrote:
> On Wed, Dec 12, 2018 at 07:54:05AM -0500, David Steele wrote:
> > The LSN switch point is often the same even when servers are going to
> > different timelines.  If the LSN is different enough then the problem
> > solves itself since the .partial will be on an entirely different
> > segment.
>
> That would mean that WAL forked exactly at the same record.  You have
> likely seen more cases where than can happen in real life than I do.

Suppose that the original master fails during an idle period, and we
promote a slave.  But we accidentally promote a slave that can't serve
as the new master, like because it's in a datacenter with an
unreliable network connection or one which is about to be engulfed in
lava.  So, we go to promote a different slave, and because we never
got around to reconfiguring the standbys to follow the previous
promotion, kaboom.

Or, suppose we do PITR to recover from some user error, but then
somebody screws up the contents of the recovered cluster and we have
to do it over again. Naturally we'll recover to the same point.

The new TLI is the only thing that is guaranteed to be unique with
each new promotion, and I would guess that it is therefore the right
thing to use to disambiguate them.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: ExecBuildGroupingEqual versus collations

2018-12-14 Thread Isaac Morland
On Fri, 14 Dec 2018 at 14:25, Tom Lane  wrote:

> Now, it's certainly true that nameeq() doesn't need a collation spec
> today, any more than texteq() does, because both types legislate that
> equality is bitwise.  But if we leave ExecBuildGroupingEqual like this,
> we're mandating that no type anywhere, ever, can have a
> collation-dependent notion of equality.  Is that really a restriction
> we're comfortable with?  citext is sort of the poster child here,
> because it already wishes it could have collation-dependent equality.
>

There already seems to be a policy that individual types are not allowed to
have their own concepts of equality:

postgres=> select 'NaN'::float = 'NaN'::float;
 ?column?
--
 t
(1 row)

postgres=>

According to the IEEE floating point specification, this should be f not t.

To me, this suggests that we have already made this decision. Any type that
needs an "almost but not quite equality" concept needs to define a custom
operator or function. I think this is a reasonable approach to take for
collation-related issues.

Interesting relevant Python observation:

>>> a = float ('NaN')
>>> a is a
True
>>> a == a
False
>>>

So Python works quite differently from Postgres in this respect (and many
others).


Re: 'infinity'::Interval should be added

2018-12-14 Thread Robert Haas
On Thu, Dec 13, 2018 at 9:42 AM Simon Riggs  wrote:
> At present we have a timestamp of 'infinity' and infinite ranges, but no 
> infinite interval
>
> SELECT 'infinity'::timestamp;
> works
>
> SELECT 'infinity'::interval;
> ERROR:  invalid input syntax for type interval: "infinity"
>
> Seems a strange anomaly that we should fix.

Why?  I consider it somewhat of a wart that timestamps allow infinity
- it adds special case coding all over the place.  Allowing intervals
to be infinite as well seems like it'll just create more special cases
without really solving anything.
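
For illustration, the existing special cases for infinite timestamps are
already visible at the SQL level, and every datetime operator has to check
for them explicitly:

    SELECT 'infinity'::timestamp > now()::timestamp;    -- t
    SELECT 'infinity'::timestamp + interval '1 day';    -- infinity

Infinite intervals would presumably need the same kind of checks in every
interval operator.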

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Ryu floating point output patch

2018-12-14 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> (Still, you might want to try to automate and document the coding
 Tom> format conversion steps, along the line of what I've done recently
 Tom> for our copy of tzcode.)

Working around our declaration-after-statement prohibition required
manually moving some lines of code, manually separating some
declarations from their assignments (removing const qualifiers), and as
a last resort introducing new code blocks. I doubt it could be automated
in any sane way. (This might be an argument to relax that rule.)

Mechanical search-and-replace accounted for the _t suffix on types, the
addition of the UINT64CONSTs, and changing assert to Assert; that much
could be automated.

pgindent did a pretty horrible job on the comments, a script could
probably do better.

I guess removal of the unwanted #ifs could be automated without too much
difficulty.

(But I'm not likely going to do any of this in the foreseeable future,
because I don't expect it to be useful.)

-- 
Andrew (irc:RhodiumToad)



Re: Ryu floating point output patch

2018-12-14 Thread Tom Lane
Andrew Gierth  writes:
> "Tom" == Tom Lane  writes:
>  Tom> The maintenance track record of this github repo appears to span
>  Tom> six months,

> The algorithm was only published six months ago.

Doesn't really affect my point: there's little reason to believe that
there will be an active upstream several years down the pike.

regards, tom lane



ExecBuildGroupingEqual versus collations

2018-12-14 Thread Tom Lane
In pursuit of places that might not be on board with non-default
collations, I tried sticking

Assert(PG_GET_COLLATION() == C_COLLATION_OID);

into nameeq() and the other name comparison functions, in my build with
type name's typcollation changed to "C".  I'm down to one place in the
core regression tests that falls over because of that:
ExecBuildGroupingEqual() figures it can get away with

InitFunctionCallInfoData(*fcinfo, finfo, 2,
 InvalidOid, NULL, NULL);

i.e. passing collation = InvalidOid to the type-specific equality
functions.

Now, it's certainly true that nameeq() doesn't need a collation spec
today, any more than texteq() does, because both types legislate that
equality is bitwise.  But if we leave ExecBuildGroupingEqual like this,
we're mandating that no type anywhere, ever, can have a
collation-dependent notion of equality.  Is that really a restriction
we're comfortable with?  citext is sort of the poster child here,
because it already wishes it could have collation-dependent equality.
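
For reference, citext already treats non-identical strings as equal today;
what it can't do is make that behavior depend on the collation in use:

    CREATE EXTENSION IF NOT EXISTS citext;
    SELECT 'Foo'::citext = 'foo'::citext;    -- t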

We'd have to do some work to pass a reasonable collation choice through
to this code from wherever we have the info available, but I don't
think it'd be a huge amount of work.

BTW, this case is perhaps a bit of a counterexample to Peter's idea about
doing collation lookups earlier: if we change this, we'd presumably have
to do collation lookup right here, even though we know very well that
common types' equality functions won't use the information.

In any case, I think we need a policy decision about whether it's our
intent to allow collation-dependent equality or not.

Comments?

regards, tom lane



Re: Ryu floating point output patch

2018-12-14 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> The maintenance track record of this github repo appears to span
 Tom> six months,

The algorithm was only published six months ago.

-- 
Andrew (irc:RhodiumToad)



Re: Ryu floating point output patch

2018-12-14 Thread Andres Freund
Hi,

On 2018-12-14 13:47:53 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > It's been absorbed into MSVC's standard library and a bunch of other
> > projects, so there's certainly some other prominent adoption.
> 
> If we think that's going on, maybe we should just sit tight till glibc
> absorbs it.

All the stuff it'll have to put above it will make it slower
anyway. It's not like this'll be a feature that's hard to rip out if all
libc's have fast roundtrip safe conversion routines, or if something
better comes along.

Greetings,

Andres Freund



Re: Ryu floating point output patch

2018-12-14 Thread Tom Lane
Andres Freund  writes:
> On 2018-12-14 13:25:29 -0500, Tom Lane wrote:
>> The maintenance track record of this github repo appears to span six
>> months, and it's now been about four months since the last commit.
>> It might be abandonware already.

> The last commit was a month ago, no? November 6th afaict.

Hmm, the commit history I was looking at ended on Aug 19, but I must've
done something wrong, because going in from a different angle shows
me commits up to November.

> It's been absorbed into MSVC's standard library and a bunch of other
> projects, so there's certainly some other prominent adoption.

If we think that's going on, maybe we should just sit tight till glibc
absorbs it.

regards, tom lane



Re: Computing the conflict xid for index page-level-vacuum on primary

2018-12-14 Thread Andres Freund
Hi,

On 2018-12-14 21:39:48 +0300, Alexander Korotkov wrote:
> On Fri, Dec 14, 2018 at 4:43 AM Andres Freund  wrote:
> > This leads me to think that we possibly should move computation of the
> > last removed xid from recovery to the primary, during the generation of
> > the xl_btree_delete WAL record.
> 
> Do I understand correctly that we need this xid computation, because
> we may delete some index tuples using kill_prior_tuple before we prune
> corresponding heap tuples (which would also be replicated and cause
> required conflict)?

Correct.


> If so, then can we just give up with that?  That is before setting
> kill_prior_tuple = true, prune corresponding heap tuples.

That'd require WAL logging such changes, which'd be pretty bad, because
we'd need to do it one-by-one.  What we could do, and what I suggested
earlier, is that we do a bit more pruning when emitting such WAL
records.


Greetings,

Andres Freund



Re: removal of dangling temp tables

2018-12-14 Thread Robert Haas
On Fri, Dec 14, 2018 at 6:35 PM Tom Lane  wrote:
> Hm.  It *could*, if we wanted it to run some transactions after
> finishing recovery.

It'd have to launch a separate process per database.  That would be
useful infrastructure for other things, too, like automatic catalog
upgrades in minor releases, but I'm not volunteering to write that
infrastructure right now.

> Alternatively, maybe we could have backends flag whether they've
> taken ownership of their temp schemas or not, and let autovacuum
> flush old temp tables if not?

Yes, that seems like a possibly promising approach.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: New function pg_stat_statements_reset_query() to reset statistics of a specific query

2018-12-14 Thread Robert Haas
On Wed, Nov 28, 2018 at 1:43 PM Alvaro Herrera  wrote:
> Right, I think option 4 is a clear improvement over option 1.  I can get
> behind that one.  Since not many people care to vote, I think this tips
> the scales enough to that side.

I'm showing up very late to the party here, but I like option 1 best.
I feel like the SQL standard has a pretty clear idea that NULL is how
you represent that a value is unknown, which shows up in a lot of places.
Deciding that we're going to use a different sentinel value in this
one case because NULL is a confusing concept in general seems pretty
strange to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: removal of dangling temp tables

2018-12-14 Thread Andres Freund
Hi,

On 2018-12-14 13:35:50 -0500, Tom Lane wrote:
> Hm.  It *could*, if we wanted it to run some transactions after
> finishing recovery.

How? We don't have infrastructure for changing databases yet.


> But I guess I wonder why bother; if the disk
> space is gone then there's not that much reason to be in a hurry
> to get rid of the catalog entries.  The particular problem Alvaro
> is complaining of might be better handled by ignoring temp tables
> while computing max freeze age etc.  I have some recollection that
> we'd intentionally included them, but I wonder why really; it's
> not like autovacuum is going to be able to do anything about an
> ancient temp table.

We can't truncate the clog, adapt the xid horizon, etc. if there are any
temp tables. Otherwise you'd get failures when reading from one, no?

Greetings,

Andres Freund



Re: Computing the conflict xid for index page-level-vacuum on primary

2018-12-14 Thread Alexander Korotkov
On Fri, Dec 14, 2018 at 4:43 AM Andres Freund  wrote:
> This leads me to think that we possibly should move computation of the
> last removed xid from recovery to the primary, during the generation of
> the xl_btree_delete WAL record.

Do I understand correctly that we need this xid computation, because
we may delete some index tuples using kill_prior_tuple before we prune
corresponding heap tuples (which would also be replicated and cause
required conflict)?  If so, then can we just give up with that?  That
is before setting kill_prior_tuple = true, prune corresponding heap
tuples.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: removal of dangling temp tables

2018-12-14 Thread Tom Lane
Robert Haas  writes:
> On Fri, Dec 14, 2018 at 12:57 PM Tom Lane  wrote:
>> I seem to recall discussions about having crash recovery go around
>> and clean out temp tables.  That seems like a better plan than
>> penalizing every session start with this.

> Well, crash recovery already removes the files, but it can't really
> remove the catalog entries, can it?

Hm.  It *could*, if we wanted it to run some transactions after
finishing recovery.  But I guess I wonder why bother; if the disk
space is gone then there's not that much reason to be in a hurry
to get rid of the catalog entries.  The particular problem Alvaro
is complaining of might be better handled by ignoring temp tables
while computing max freeze age etc.  I have some recollection that
we'd intentionally included them, but I wonder why really; it's
not like autovacuum is going to be able to do anything about an
ancient temp table.

Alternatively, maybe we could have backends flag whether they've
taken ownership of their temp schemas or not, and let autovacuum
flush old temp tables if not?

regards, tom lane



Re: Ryu floating point output patch

2018-12-14 Thread Andres Freund
Hi,

On 2018-12-14 13:25:29 -0500, Tom Lane wrote:
> Our track record in borrowing code from "upstream" projects is pretty
> miserable: almost without exception, we've found ourselves stuck with
> maintaining such code ourselves after a few years.  I don't see any
> reason to think that wouldn't be true here; in fact there's much more
> reason to worry here than we had for, eg, borrowing the regex code.
> The maintenance track record of this github repo appears to span six
> months, and it's now been about four months since the last commit.
> It might be abandonware already.

It's been absorbed into MSVC's standard library and a bunch of other
projects, so there's certainly some other prominent adoption.

The last commit was a month ago, no? November 6th afaict.


> Is this a path we really want to go down?  I'm not convinced the
> cost/benefit ratio is attractive.

float->text conversion is one of the major bottlenecks when backing up
postgres, it's definitely a pain-point in practice. I've not really seen
a nicer implementation anywhere, not even close.

Greetings,

Andres Freund



Re: Ryu floating point output patch

2018-12-14 Thread Tom Lane
Andrew Gierth  writes:
> "Thomas" == Thomas Munro  writes:
>  Thomas> -1 for making superficial changes to code that we are
>  Thomas> "vendoring", unless it is known to be dead/abandoned and we're
>  Thomas> definitively forking it. It just makes it harder to track
>  Thomas> upstream's bug fixes and improvements, surely?

> Well, the question there is how far to take that principle; do we avoid
> pgindent'ing the code, do we avoid changing uint64_t etc. to uint64
> etc., and so on.
> The Ryu code is perhaps an extreme example because it follows almost the
> exact reverse of our coding standard.

My heart kind of sinks when looking at this patch, because it's a
large addition of not-terribly-well-documented code that nobody here
really understands, never mind the problem that it's mostly not close
to our own coding standards.

Our track record in borrowing code from "upstream" projects is pretty
miserable: almost without exception, we've found ourselves stuck with
maintaining such code ourselves after a few years.  I don't see any
reason to think that wouldn't be true here; in fact there's much more
reason to worry here than we had for, eg, borrowing the regex code.
The maintenance track record of this github repo appears to span six
months, and it's now been about four months since the last commit.
It might be abandonware already.

Is this a path we really want to go down?  I'm not convinced the
cost/benefit ratio is attractive.

If we do go down this path, though, I'm not too worried about making
it simple to absorb upstream fixes; I bet there will be few to none.
(Still, you might want to try to automate and document the coding
format conversion steps, along the line of what I've done recently
for our copy of tzcode.)

regards, tom lane



Re: Speeding up text_position_next with multibyte encodings

2018-12-14 Thread John Naylor
On 11/30/18, Dmitry Dolgov <9erthali...@gmail.com> wrote:
> Unfortunately, patch doesn't compile anymore due:
>
> varlena.c: In function ‘text_position_next_internal’:
> varlena.c:1337:13: error: ‘start_ptr’ undeclared (first use in this
> function)
>   Assert(start_ptr >= haystack && start_ptr <= haystack_end);
>
> Could you please send an updated version? For now I'm moving it to the next
> CF.

I signed up to be a reviewer, and I will be busy next month, so I went
ahead and fixed the typo in the patch that broke assert-enabled
builds. While at it, I standardized on the spelling "start_ptr" in a
few places to match the rest of the file. It's a bit concerning that
it wouldn't compile with asserts, but the patch was written by a
committer and seems to work.

On 10/19/18, Heikki Linnakangas  wrote:
> This is a win in most cases. One case is slower: calling position() with
> a large haystack input, where the match is near the end of the string.

I wanted to test this case in detail, so I ran the attached script,
which runs position() in three scenarios:
short:   1 character needle at the end of a ~1000 character haystack,
repeated 1 million times
medium: 50 character needle at the end of a ~1 million character
haystack, repeated 1 million times
long:  250 character needle at the end of an ~80 million character
haystack (~230MB, comfortably below the 256MB limit Heikki reported),
done just one time

I took the average of 5 runs using Korean characters in both UTF-8
(3-byte) and EUC-KR (2-byte) encodings.

UTF-8
length  master  patch
short   2.26s   2.51s
med 2.28s   2.54s
long3.07s   3.11s

EUC-KR
length  master  patch
short   2.29s   2.37s
med 2.29s   2.36s
long1.75s   1.71s

With UTF-8, the patch is 11-12% slower on short and medium strings,
and about the same on long strings. With EUC-KR, the patch is about 3%
slower on short and medium strings, and 2-3% faster on long strings.
It seems the worst case is not that bad, and could be optimized, as
Heikki said.
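
For reference, the "short" case can be approximated with something along
these lines (a sketch only -- the attached script differs in the details):

    create temp table hay as
      select repeat('가', 999) || '나' as t
      from generate_series(1, 1000000);
    \timing on
    select sum(position('나' in t)) from hay;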

-John Naylor
From ca7a228174acb8b85b87642bb6df182139587c05 Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Sun, 18 Nov 2018 17:40:09 +0700
Subject: [PATCH v2] Use single-byte Boyer-Moore-Horspool search even with
 multibyte encodings.

The old implementation first converted the input strings to arrays of
wchars, and performed the search on those. However, the conversion is expensive,
and for a large input string, consumes a lot of memory.

Avoid the conversion, and instead use the single-byte algorithm even with
multibyte encodings. That has a couple of problems, though. Firstly, we
might get fooled if the needle string's byte sequence appears embedded
inside a different string. We might incorrectly return a match in the
middle of a multi-byte character. The second problem is that the
text_position function needs to return the position of the match, counted
in characters, rather than bytes. We can work around both of those problems
by an extra step after finding a match. Walk the string one character at a
time, starting from the beginning, until we reach the position of the match.
We can then check if the match was found at a valid character boundary,
which solves the first problem, and we can count the characters, so that we
can return the character offset. Walking the characters is faster than the
wchar conversion, especially in the case that there are no matches and we
can avoid it altogether. Also, avoiding the large allocation allows these
functions to work with inputs larger than 256 MB, even with multibyte
encodings.

For UTF-8, we can even skip the step to verify that the match lands on a
character boundary, because in UTF-8, the byte sequence of one character
cannot appear in the middle of a different character. I think many other
encodings have the same property, but I wasn't sure, so I only enabled
that optimization for UTF-8.

Most of the callers of the text_position_setup/next functions were actually
not interested in the position of the match, counted in characters. To
cater for them, refactor the text_position_next() interface into two parts:
searching for the next match (text_position_next()), and returning the
current match's position as a pointer (text_position_get_match_ptr()) or
as a character offset (text_position_get_match_pos()). Most callers of
text_position_setup/next are not interested in the character offsets, so
getting the pointer to the match within the original string is a more
convenient API for them. It also allows skipping the character counting
with UTF-8 encoding altogether.
---
 src/backend/utils/adt/varlena.c | 475 +---
 1 file changed, 253 insertions(+), 222 deletions(-)

diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 0fd3b15748..90f5a37f08 100644
--- a/src/backend/utils/adt/varlena.c
+++ 

Re: removal of dangling temp tables

2018-12-14 Thread Robert Haas
On Fri, Dec 14, 2018 at 12:57 PM Tom Lane  wrote:
> I seem to recall discussions about having crash recovery go around
> and clean out temp tables.  That seems like a better plan than
> penalizing every session start with this.

Well, crash recovery already removes the files, but it can't really
remove the catalog entries, can it?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Ryu floating point output patch

2018-12-14 Thread Andrew Gierth
> "Thomas" == Thomas Munro  writes:

 >>> Do we have any reason for the restriction beyond stylistic preference?

 >> No. Robert and Tom were against allowing mixing declarations and
 >> code, a few more against // comments. I don't quite agree with
 >> either, but getting to the rest of C99 seemed more important than
 >> fighting those out at that moment.

 Thomas> -1 for making superficial changes to code that we are
 Thomas> "vendoring", unless it is known to be dead/abandoned and we're
 Thomas> definitively forking it. It just makes it harder to track
 Thomas> upstream's bug fixes and improvements, surely?

Well, the question there is how far to take that principle; do we avoid
pgindent'ing the code, do we avoid changing uint64_t etc. to uint64
etc., and so on.

(I notice that a stray uint8_t that I left behind broke the Windows
build but not the linux/bsd one, so something would have to be fixed in
the windows build in order to avoid having to change that.)

The Ryu code is perhaps an extreme example because it follows almost the
exact reverse of our coding standard.

-- 
Andrew (irc:RhodiumToad)



Re: removal of dangling temp tables

2018-12-14 Thread Tom Lane
Robert Haas  writes:
> On Fri, Dec 14, 2018 at 11:29 AM Alvaro Herrera
>  wrote:
>> I think the best way to fix this is to call RemoveTempRelations()
>> unconditionally at session start (without doing the rest of the temp
>> table setup, just the removal.)

> That would certainly simplify things.  I think I thought about that as
> far back as a734fd5d1c309cc553b7c8c79fba96218af090f7 but it seemed
> like a significant behavior change and I wasn't sure that everyone
> would like it.  In particular, it adds overhead to backend startup
> that, in the case of a large temp schema, could be fairly long.

> Nevertheless, I tentatively think that a change like this is a good
> idea.  I wouldn't back-patch it, though.

I seem to recall discussions about having crash recovery go around
and clean out temp tables.  That seems like a better plan than
penalizing every session start with this.

regards, tom lane



Re: removal of dangling temp tables

2018-12-14 Thread Robert Haas
On Fri, Dec 14, 2018 at 12:27 PM Alvaro Herrera
 wrote:
> Hmm, I think in the case covered by your commit, that is a session that
> crashes with a few thousand temp tables, this new patch might cause
> a failure to open a new session altogether.

Oh, good point.  Or if the catalog is corrupted.

> Maybe it'd be better to change temp table removal to always drop
> max_locks_per_transaction objects at a time (ie. commit/start a new
> transaction every so many objects).

We're basically just doing DROP SCHEMA ... CASCADE, so I'm not sure
how we'd implement that, but I agree it would be significantly better.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: removal of dangling temp tables

2018-12-14 Thread Alvaro Herrera
On 2018-Dec-14, Robert Haas wrote:

> On Fri, Dec 14, 2018 at 11:29 AM Alvaro Herrera
>  wrote:
> > I think the best way to fix this is to call RemoveTempRelations()
> > unconditionally at session start (without doing the rest of the temp
> > table setup, just the removal.)
> 
> That would certainly simplify things.  I think I thought about that as
> far back as a734fd5d1c309cc553b7c8c79fba96218af090f7 but it seemed
> like a significant behavior change and I wasn't sure that everyone
> would like it.  In particular, it adds overhead to backend startup
> that, in the case of a large temp schema, could be fairly long.

Hmm, I think in the case covered by your commit, that is a session that
crashes with a few thousand temp tables, this new patch might cause
a failure to open a new session altogether.

Maybe it'd be better to change temp table removal to always drop
max_locks_per_transaction objects at a time (ie. commit/start a new
transaction every so many objects).

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index 13a24631fc..003bbabf1f 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -4118,6 +4118,26 @@ RemoveTempRelations(Oid tempNamespaceId)
 }
 
 /*
+ * Remove temp tables, without relying on myTempNamespace being set, and
+ * without setting it.  This is used at session start.
+ */
+void
+InitRemoveTempRelations(void)
+{
+	Oid			namespaceOid;
+	char		namespaceName[NAMEDATALEN];
+
+	Assert(MyBackendId != InvalidBackendId);
+
+	snprintf(namespaceName, sizeof(namespaceName), "pg_temp_%d", MyBackendId);
+
+	namespaceOid = get_namespace_oid(namespaceName, true);
+
+	if (OidIsValid(namespaceOid))
+		RemoveTempRelations(namespaceOid);
+}
+
+/*
  * Callback to remove temp relations at backend exit.
  */
 static void
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index b636b1e262..b41b47222c 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -1074,6 +1074,10 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
 	if (!bootstrap)
 		pgstat_bestart();
 
+	/* Remove any temp tables that might have leaked previously */
+	if (!bootstrap)
+		InitRemoveTempRelations();
+
 	/* close the transaction we started above */
 	if (!bootstrap)
 		CommitTransactionCommand();
diff --git a/src/include/catalog/namespace.h b/src/include/catalog/namespace.h
index 0e202372d5..f90eeff1b7 100644
--- a/src/include/catalog/namespace.h
+++ b/src/include/catalog/namespace.h
@@ -144,6 +144,7 @@ extern void GetTempNamespaceState(Oid *tempNamespaceId,
 	  Oid *tempToastNamespaceId);
 extern void SetTempNamespaceState(Oid tempNamespaceId,
 	  Oid tempToastNamespaceId);
+extern void InitRemoveTempRelations(void);
 extern void ResetTempTableNamespace(void);
 
 extern OverrideSearchPath *GetOverrideSearchPath(MemoryContext context);


Re: Remove Deprecated Exclusive Backup Mode

2018-12-14 Thread Robert Haas
On Fri, Dec 14, 2018 at 8:27 AM David Steele  wrote:
> If the server crashes during a backup, the server probably won't
> restart.  We say "may" here but if your backup runs more than a few
> checkpoints it's more like won't.  The remedy is to go manually delete a
> file the user may have never heard of.  They are often hesitant to do so
> -- for good reason.

Sure.  So it's broken in the sense that if you don't read and
carefully follow the directions, you will have problems.  But that's
also true of an automobile, a power saw, the DELETE command,
antibiotics, and the Internal Revenue Code.  Many things in life are
complicated and require due caution, and yet people regularly navigate
things that are VASTLY more complicated than "if the server crashes
during a hot backup, remove this file."  Especially because if we
subsequently fail for this reason, we give you a very precise hint
that tells you exactly what to do:

HINT: If you are not restoring from a backup, try removing the file
"/backup_label"

Now, I grant that emitting a HINT telling you exactly what to do is
not as good as not requiring manual user intervention in the first place,
but it's not like we're requiring you to carry out some byzantine
process that nobody can possibly understand.  The "if" condition here
is one that any DBA should be able to understand: am I, or am I not,
in the process of restoring a backup?  The system does not know, but
the DBA should.  And once you know that the answer is "no, I'm not
restoring a backup", then you just remove the exact pathname indicated
and try it again and everything works fine.

Peter's complaint -- that this is a pretty liberal reading of the word
"broken" -- is IMHO very well put.  Saying that something is broken
means it doesn't work.  But the exclusive backup mode works fine if
used according to the directions.  The fact that people often get
confused and don't follow the directions, and that the directions tend
to end up requiring manual intervention after a system crash, is
unfortunate, and it is why we have a new API.  But "that thing isn't
as well designed as it could've been" is not the same thing as "that
thing does not work when used as designed," no matter how much you
keep insisting the contrary.  It does work.  It has worked for a long
time.  Many people have used it successfully.  It was the ONLY way of
doing this for many years.  There is no necessary reason why it has to
be a huge problem, and I don't think it is.

In my experience with EnterpriseDB customers, this IS an occasional
source of confusion, but there are other things that are a problem a
lot more frequently, like work_mem, which you often can't set high
enough to get good query performance without causing OOM events when
the system gets busy.  I'd hesitate to call work_mem broken, but a GUC
for which no single value both adequately protects against OOM and
gives workable performance is closer to broken than an exclusive
backup mode, which actually does work - albeit with occasional manual
intervention - if you read and follow the directions.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: fix psql \conninfo & \connect when using hostaddr

2018-12-14 Thread Alvaro Herrera
On 2018-Nov-09, Michael Paquier wrote:

> On Thu, Nov 08, 2018 at 12:13:31PM -0300, Alvaro Herrera wrote:

> > Right, that too.  Fortunately I think compilers warn about
> > mismatching indentation nowadays, at least in some cases.
> 
> I don't recall seeing a compiler warning about that, but I do recall
> coverity complaining loudly about such things.

:-)

/pgsql/source/master/src/backend/catalog/namespace.c: In function 
'InitRemoveTempRelations':
/pgsql/source/master/src/backend/catalog/namespace.c:4132:2: warning: this 'if' 
clause does not guard... [-Wmisleading-indentation]
  if (OidIsValid(namespaceOid))
  ^~
/pgsql/source/master/src/backend/catalog/namespace.c:4134:3: note: ...this 
statement, but the latter is misleadingly indented as if it is guarded by the 
'if'
   get_namespace_oid(namespaceName, true);
   ^

$ gcc --version
gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: removal of dangling temp tables

2018-12-14 Thread Robert Haas
On Fri, Dec 14, 2018 at 11:29 AM Alvaro Herrera
 wrote:
> I think the best way to fix this is to call RemoveTempRelations()
> unconditionally at session start (without doing the rest of the temp
> table setup, just the removal.)

That would certainly simplify things.  I think I thought about that as
far back as a734fd5d1c309cc553b7c8c79fba96218af090f7 but it seemed
like a significant behavior change and I wasn't sure that everyone
would like it.  In particular, it adds overhead to backend startup
that, in the case of a large temp schema, could be fairly long.

Nevertheless, I tentatively think that a change like this is a good
idea.  I wouldn't back-patch it, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



A case for UPDATE DISTINCT attribute

2018-12-14 Thread Gajus Kuizinas
I have observed that the following pattern is repeating in our data
management programs:

UPDATE
  event
SET
  fuid = ${fuid},
  venue_id = ${venueId},
  url = ${url}
WHERE
  id = ${id} AND
  fuid != ${fuid} AND
  venue_id != ${venueId} AND
  url IS DISTINCT FROM ${url};

Note: "url" can be null. Therefore, using IS DISTINCT FROM.

The reasons we are using this pattern are multiple:

   - an empty update will trigger matching triggers.
   - an empty update will be WAL-logged
   - an empty update creates dead tuples that will need to be cleaned up by
   AUTOVACUUM

In cases where the data does not change, all of these are undesirable side
effects.

Meanwhile, a WHERE condition that excludes rows with matching values makes
this into a noop in case of matching target column values.

It appears that this pattern should be encouraged, but the verbosity
(and the accompanying risk of introducing a logical error, e.g. accidentally
using = comparison on a NULLable column) makes this a rarely used pattern.
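
For instance, the comparison behavior that makes plain = unsafe here:

    SELECT NULL = NULL;                   -- null, so the row never qualifies
    SELECT NULL IS DISTINCT FROM NULL;    -- false
    SELECT 1 IS DISTINCT FROM NULL;       -- true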

I suggest that introducing an attribute such as "UPDATE DISTINCT", e.g.

UPDATE DISTINCT
  event
SET
  fuid = ${fuid},
  venue_id = ${venueId},
  url = ${url}
WHERE
  id = ${id}

would encourage greater adoption of such a pattern.

Is there a technical reason this does not exist already?



removal of dangling temp tables

2018-12-14 Thread Alvaro Herrera
We recently ran into a funny situation, where autovacuum would not
remove very old temp tables.  The relfrozenxid of those tables was about
to reach the max freeze age, so monitoring started to complain.  It
turned out that autovacuum saw that the backendId was used by a live
backend ... but that session was in reality not using those temp tables,
and had not used any temp table at all.  They were left-overs from a
crash months ago, and since the session was not using temp tables, they
had not been removed.  DISCARD ALL may have run, but had no effect.
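
For anyone who wants to check their own cluster, something like this lists
temp tables and how old their relfrozenxid is (illustrative only):

    SELECT n.nspname, c.relname, age(c.relfrozenxid) AS xid_age
    FROM pg_class c
    JOIN pg_namespace n ON n.oid = c.relnamespace
    WHERE c.relpersistence = 't' AND c.relkind = 'r'
    ORDER BY 3 DESC;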

I think the best way to fix this is to call RemoveTempRelations()
unconditionally at session start (without doing the rest of the temp
table setup, just the removal.)

In versions earlier than pg11, related issues occur if you have a crash
with autovacuum off and/or workers disabled, and temp tables are leaked
in backendID 1 or 2, and you then start the server with normal settings.
In that case, those
backendIDs are used by the logical replication launcher and the
autovacuum launcher, so autovacuum does not remove them either.  This
was fixed in PG11 inadvertently by this commit:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=943576bddcb52971041d9f5f806789921fa107ee
The reason the commit fixes it is that now the databaseID of the PGPROC
entry is compared to the temp table's database; and for those worker
processes, the DatabaseId is InvalidOid so it all works out.

This isn't a terribly interesting bug, as restarting with changed
worker/autovacuum options after a crash that happens to leak temp tables
should be quite rare.  But anyway we can fix this second issue in prior
branches by adding a comparison to databaseId to the existing 'if' test
in autovacuum; easy enough, no compatibility concerns.  This should also
cover the case that one session crashes leaking temp tables, then the
same session connects to a different database for a long time.  (I
didn't verify this last point to be a real problem.)

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Connections hang indefinitely while taking a gin index's LWLock buffer_content lock

2018-12-14 Thread Bruce Momjian
On Thu, Dec 13, 2018 at 10:40:59PM +0300, Alexander Korotkov wrote:
> On Thu, Dec 13, 2018 at 8:06 PM Andrey Borodin  wrote:
> > That's the same variable, one place is definition while other is potential 
> > misuse.
> > Seems like these 2 lines [0]
> >
> > +   if (BufferIsValid(lbuffer))
> > +   UnlockReleaseBuffer(lbuffer);
> >
> > are superfluous: lbuffer is UnlockReleased earlier.
> 
> Thanks to everybody for noticing!  Speaking more generally backpatch
> to 9.4 was completely wrong: there were multiple errors.  It's my
> oversight, I forget how much better our xlog system became since 9.4
> :)
> 
> I've pushed bf0e5a73be fixing that.

I can confirm the compiler warning is now gone.  Thanks.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: valgrind issues on Fedora 28

2018-12-14 Thread Tom Lane
Tomas Vondra  writes:
> On 12/14/18 5:10 AM, Tom Lane wrote:
>> So I just got around to trying to do some valgrind testing for the first
>> time in a couple of months, and what I find is that this patch completely
>> breaks valgrind for me ...
>> This is valgrind 3.8.1 (RHEL6's version), so not bleeding edge exactly,
>> but we have very little evidence about how far back the committed patch
>> works.  I'm inclined to think we can't use this solution.

> Fedora 28 has 3.14.0, so this seems to be due to some improvements since
> 3.8.1. I'm not sure how to deal with this - surely we don't want to just
> revert the change because that would start generating noise again. For a
> moment I thought we might wildcard the error type somehow, so that it
> would accept Memcheck:* but AFAICs that is not supported.

Meh.  There are lots of situations where it's necessary to carry some
platform-specific valgrind suppressions; I have half a dozen on my
RHEL6 box because of various weirdness in its old version of perl, for
example.  I think this is one where people running code new enough to
have this problem need to carry private suppressions of their own.

In general, I'm not particularly on board with our valgrind.supp
carrying suppressions for code outside our own code base: I think
that's assuming WAY too much about which version of what is installed
on a particular box.

Maybe we could do something to make it simpler to have custom
suppressions?  Not sure what, though.

regards, tom lane



Re: row filtering for logical replication

2018-12-14 Thread Stephen Frost
Greetings,

* Tomas Vondra (tomas.von...@2ndquadrant.com) wrote:
> On 11/23/18 8:03 PM, Stephen Frost wrote:
> > * Fabrízio de Royes Mello (fabriziome...@gmail.com) wrote:
> >> On Fri, Nov 23, 2018 at 4:13 PM Petr Jelinek 
> >> wrote:
>  If carefully documented I see no problem with it... we already have an
>  analogous problem with functional indexes.
> >>>
> >>> The difference is that with functional indexes you can recreate the
> >>> missing object and everything is okay again. With logical replication
> >>> recreating the object will not help.
> >>>
> >>
> >> In this case with logical replication you should rsync the object. That is
> >> the price of misunderstanding / bad use of the new feature.
> >>
> >> As usual, there are no free beer ;-)
> > 
> > There's also certainly no shortage of other ways to break logical
> > replication, including ways that would also be hard to recover from
> > today other than doing a full resync.
> 
> Sure, but that seems more like an argument against creating additional
> ones (and for preventing those that already exist). I'm not sure this
> particular feature is where we should draw the line, though.

I was actually going in the other direction- we should allow it because
advanced users may know what they're doing better than we do and we
shouldn't prevent things just because they might be misused or
misunderstood by a user.

> > What that seems to indicate, to me at least, is that it'd be awful
> > nice to have a way to resync the data which doesn't necessairly
> > involve transferring all of it over again.
> > 
> > Of course, it'd be nice if we could track those dependencies too,
> > but that's yet another thing.
> 
> Yep, that seems like a good idea in general. Both here and for
> functional indexes (although I suppose there is a technical reason why it
> wasn't implemented right away for them).

We don't track function dependencies in general and I could certainly
see cases where you really wouldn't want to do so, at least not in the
same way that we track FKs or similar.  I do wonder if maybe we didn't
track function dependencies because we didn't (yet) have create or
replace function and that now we should.  We don't track dependencies
inside a function either though.

> > In short, I'm not sure that I agree with the idea that we shouldn't
> > allow this and instead I'd rather we realize it and put the logical
> > replication into some kind of an error state that requires a resync.
> 
> That would still mean a need to resync the data to recover, so I'm not
> sure it's really an improvement. And I suppose it'd require tracking the
> dependencies, because how else would you mark the subscription as
> requiring a resync? At which point we could decline the DROP without a
> CASCADE, just like we do elsewhere, no?

I was actually thinking more along the lines of just simply marking the
publication/subscription as being in a 'failed' state when a failure
actually happens, and maybe even at that point basically throwing away
everything except the shell of the publication/subscription (so the user
can see that it failed and come in and properly drop it); I'm thinking
about this as perhaps similar to a transaction being aborted.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: alternative to PG_CATCH

2018-12-14 Thread Tom Lane
Peter Eisentraut  writes:
> The fundamental problem, as I see it, is that the macro expansion needs
> to produce the "finally" code twice: Once in the else (error) branch of
> the setjmp, and once in the non-error code path, either after the
> if-setjmp, or in the try block.  But to be able to do that, we need to
> capture the block as a macro argument.

I don't especially like the MACRO({...}) proposal, because it looks way
too much like gcc's special syntax for "statement expressions".  If we
have to go this way, I'd rather see if MACRO((...)) can be made to work.
But I question your assumption that we have to have two physical copies
of the "finally" code.  There's nothing wrong with implementing this
sort of infrastructure with some goto's, or whatever else we have to
have to make it work.  Also, it'd look more natural as an extension
to the existing PG_TRY infrastructure if the source code were like

PG_TRY();
{
...
}
PG_FINALLY();
{
...
}
PG_END_TRY();

So even if Kyotaro-san's initial sketch isn't right in detail,
I think he has the right idea about where we want to go.

regards, tom lane



Re: row filtering for logical replication

2018-12-14 Thread Stephen Frost
Greetings,

* Petr Jelinek (petr.jeli...@2ndquadrant.com) wrote:
> On 23/11/2018 03:02, Stephen Frost wrote:
> > * Euler Taveira (eu...@timbira.com.br) wrote:
> >> 2018-02-28 21:54 GMT-03:00 Craig Ringer :
> >>> Good idea. I haven't read this yet, but one thing to make sure you've
> >>> handled is limiting the clause to referencing only the current tuple and 
> >>> the
> >>> catalogs. user-catalog tables are OK, too, anything that is
> >>> RelationIsAccessibleInLogicalDecoding().
> >>>
> >>> This means only immutable functions may be invoked, since a stable or
> >>> volatile function might attempt to access a table. And views must be
> >>> prohibited or recursively checked. (We have tree walkers that would help
> >>> with this).
> >>>
> >>> It might be worth looking at the current logic for CHECK expressions, 
> >>> since
> >>> the requirements are similar. In my opinion you could safely not bother 
> >>> with
> >>> allowing access to user catalog tables in the filter expressions and limit
> >>> them strictly to immutable functions and the tuple its self.
> >>
> >> IIRC implementation is similar to RLS expressions. I'll check all of
> >> these rules.
> > 
> > Given the similarity to RLS and the nearby discussion about allowing
> > non-superusers to create subscriptions, and probably publications later,
> > I wonder if we shouldn't be somehow associating this with RLS policies
> > instead of having the publication filtering be entirely independent..
>
> I do see the appeal here, if you consider logical replication to be a
> streaming select it probably applies well.
> 
> But given that this is happening inside output plugin which does not
> have full executor setup and has catalog-only snapshot I am not sure how
> feasible it is to try to merge these two things. As per my previous
> email it's possible that we'll have to be stricter about what we allow
> in expressions here.

I can certainly understand the concern about trying to combine the
implementation of this with that of RLS; perhaps that isn't a good fit
due to the additional constraints put on logical decoding.

That said, I still think it might make sense to consider these filters
for logical decoding to be policies and, ideally, to allow users to use
the same policy for both.

In the end, the idea of having to build a single large and complex
'create publication' command which has a bunch of tables, each with
their own filter clauses, just strikes me as pretty painful.

> The other issue with merging this is that the use-case for filtering out
> the data in logical replication is not necessarily about security, but
> often about sending only relevant data. So it makes sense to have filter
> on publication without RLS enabled on table and if we'd force that, we'd
> limit usefulness of this feature.

I definitely have a serious problem if we are going to say that you
can't use this filtering for security-sensitive cases.

> We definitely want to eventually create subscriptions as non-superuser
> but that has zero effect on this as everything here is happening on
> different server than where subscription lives (we already allow
> creation of publications with just CREATE privilege on database and
> ownership of the table).

What I wasn't clear about above was the idea that we might allow a user
other than the table owner to publish a given table, but that such a
publication should certainly only be allowed to include the rows which
that user has access to- as regulated by RLS.  If the RLS policy is too
complex to allow that then I would think we'd simply throw an error at
the create publication time and the would-be publisher would need to
figure that out with the table owner.

I'll admit that this might seem like a stretch, but what happens today?
Today, people write cronjobs to try to sync between tables with FDWs and
you don't need to own a table to use it as the target of a foreign
table.

I do think that we'll need to have some additional privileges around who
is allowed to create publications, I'm not entirely thrilled with that
being combined with the ability to create schemas; the two seem quite
different to me.

* Euler Taveira (eu...@timbira.com.br) wrote:
> Em sex, 23 de nov de 2018 às 11:40, Petr Jelinek
>  escreveu:
> > But given that this is happening inside output plugin which does not
> > have full executor setup and has catalog-only snapshot I am not sure how
> > feasible it is to try to merge these two things. As per my previous
> > email it's possible that we'll have to be stricter about what we allow
> > in expressions here.
>
> This feature should be as simple as possible. I don't want to
> introduce a huge overhead just for filtering some data. Data sharding
> generally uses simple expressions.

RLS often uses simple filters too.
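
e.g. something along these lines, with made-up table and column names:

    ALTER TABLE event ENABLE ROW LEVEL SECURITY;
    CREATE POLICY tenant_only ON event
        USING (tenant_id = current_setting('app.tenant_id')::int);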

> > The other issue with merging this is that the use-case for filtering out
> > the data in logical replication is not necessarily about security, but
> > often about sending only relevant data. So 

Re: explain plans with information about (modified) gucs

2018-12-14 Thread Tomas Vondra



On 12/14/18 4:21 PM, Tom Lane wrote:
> Tomas Vondra  writes:
>> ... I propose to extend EXPLAIN output with an additional option, which
>> would include information about modified GUCs in the execution plan
>> (disabled by default, of course):
> 
> I'm a bit suspicious about whether this'll have any actual value,
> if it's disabled by default (which I agree it needs to be, if only for
> compatibility reasons).  The problem you're trying to solve is basically
> "I forgot that this might have an effect", but stuff that isn't shown
> by default will not help you un-forget.  It certainly won't fix the
> form of the problem that I run into, which is people sending in EXPLAIN
> plans and not mentioning their weird local settings.
> 

Not quite.

I agree we'll still have to deal with plans from users without this
info, but it's easier to ask for explain with this extra option (just
like we regularly ask for explain analyze instead of just plain
explain). I'd expect the output to be more complete than trying to
figure out which of the GUCs might have effect / been modified here.

But more importantly - my personal primary use case here is explains
from application connections generated using auto_explain, with some
application-level GUC magic. And there I can easily tweak auto_explain
config to do (auto_explain.log_gucs = true) of course.
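
With the patch that's roughly just:

    LOAD 'auto_explain';
    SET auto_explain.log_min_duration = 0;
    SET auto_explain.log_gucs = on;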

>> We certainly don't want to include all GUCs, so the question is how to
>> decide which GUCs are interesting. The simplest approach would be to
>> look for GUCs that changed in the session (source == PGC_S_SESSION), but
>> that does not help with SET SESSION AUTHORIZATION / ROLE cases. So we
>> probably want (source > PGC_S_ARGV), but probably not PGC_S_OVERRIDE
>> because that includes irrelevant options like wal_buffers etc.
> 
> Don't you want to show anything that's not the built-in default?
> (I agree OVERRIDE could be excluded, but that's irrelevant for query
> tuning parameters.)  Just because somebody injected a damfool setting
> of, say, random_page_cost via the postmaster command line or
> environment settings doesn't make it not damfool :-(
> 

Probably. My assumption here was that I can do

   select * from pg_settings

and then combine it with whatever is included in the plan. But you're
right comparing it with the built-in default may be a better option.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: explain plans with information about (modified) gucs

2018-12-14 Thread Tom Lane
Tomas Vondra  writes:
> ... I propose to extend EXPLAIN output with an additional option, which
> would include information about modified GUCs in the execution plan
> (disabled by default, of course):

I'm a bit suspicious about whether this'll have any actual value,
if it's disabled by default (which I agree it needs to be, if only for
compatibility reasons).  The problem you're trying to solve is basically
"I forgot that this might have an effect", but stuff that isn't shown
by default will not help you un-forget.  It certainly won't fix the
form of the problem that I run into, which is people sending in EXPLAIN
plans and not mentioning their weird local settings.

> We certainly don't want to include all GUCs, so the question is how to
> decide which GUCs are interesting. The simplest approach would be to
> look for GUCs that changed in the session (source == PGC_S_SESSION), but
> that does not help with SET SESSION AUTHORIZATION / ROLE cases. So we
> probably want (source > PGC_S_ARGV), but probably not PGC_S_OVERRIDE
> because that includes irrelevant options like wal_buffers etc.

Don't you want to show anything that's not the built-in default?
(I agree OVERRIDE could be excluded, but that's irrelevant for query
tuning parameters.)  Just because somebody injected a damfool setting
of, say, random_page_cost via the postmaster command line or
environment settings doesn't make it not damfool :-(

regards, tom lane



Re: explain plans with information about (modified) gucs

2018-12-14 Thread Tomas Vondra


On 12/14/18 3:01 PM, Tomas Vondra wrote:
> On 12/14/18 2:05 PM, Jim Finnerty wrote:
>> You might want to only include the GUCs that are query tuning parameters,
>> i.e., those returned by:
>>
>> SELECT name, setting, category
>> FROM pg_settings
>> WHERE category LIKE 'Query Tuning%'
>> ORDER BY category, name;
>>
> 
> Good idea! Thanks.

V2 filtering the options to QUERY_TUNING group (and subgroups).
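
Intended usage would be along these lines (table name is just an example,
output not shown):

    SET random_page_cost = 1.1;
    SET enable_seqscan = off;
    EXPLAIN (GUCS) SELECT * FROM some_table WHERE id = 1;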

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index 646cd0d42c..ec44c59b7f 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -28,6 +28,7 @@ static bool auto_explain_log_verbose = false;
 static bool auto_explain_log_buffers = false;
 static bool auto_explain_log_triggers = false;
 static bool auto_explain_log_timing = true;
+static bool auto_explain_log_gucs = false;
 static int	auto_explain_log_format = EXPLAIN_FORMAT_TEXT;
 static int	auto_explain_log_level = LOG;
 static bool auto_explain_log_nested_statements = false;
@@ -112,6 +113,17 @@ _PG_init(void)
 			 NULL,
 			 NULL);
 
+	DefineCustomBoolVariable("auto_explain.log_gucs",
+			 "Print modified GUC values.",
+			 NULL,
+			 &auto_explain_log_gucs,
+			 false,
+			 PGC_SUSET,
+			 0,
+			 NULL,
+			 NULL,
+			 NULL);
+
 	DefineCustomBoolVariable("auto_explain.log_verbose",
 			 "Use EXPLAIN VERBOSE for plan logging.",
 			 NULL,
@@ -356,6 +368,7 @@ explain_ExecutorEnd(QueryDesc *queryDesc)
 			es->timing = (es->analyze && auto_explain_log_timing);
 			es->summary = es->analyze;
 			es->format = auto_explain_log_format;
+			es->gucs = auto_explain_log_gucs;
 
 			ExplainBeginOutput(es);
 			ExplainQueryText(es, queryDesc);
diff --git a/doc/src/sgml/auto-explain.sgml b/doc/src/sgml/auto-explain.sgml
index 120b168d45..852c69b7bb 100644
--- a/doc/src/sgml/auto-explain.sgml
+++ b/doc/src/sgml/auto-explain.sgml
@@ -169,6 +169,23 @@ LOAD 'auto_explain';
 

 
+   
+
+ auto_explain.log_gucs (boolean)
+ 
+  auto_explain.log_gucs configuration parameter
+ 
+
+
+ 
+  auto_explain.log_gucs controls whether information
+  about modified configuration options are logged with the execution
+  plan.  Only options modified at the database, user, client or session
+  level are considered modified.  This parameter is off by default.
+ 
+
+   
+

 
  auto_explain.log_format (enum)
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index de09ded65b..b8cab69f71 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -31,6 +31,7 @@
 #include "storage/bufmgr.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
+#include "utils/guc_tables.h"
 #include "utils/json.h"
 #include "utils/lsyscache.h"
 #include "utils/rel.h"
@@ -164,6 +165,8 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt, const char *queryString,
 			es->costs = defGetBoolean(opt);
 		else if (strcmp(opt->defname, "buffers") == 0)
 			es->buffers = defGetBoolean(opt);
+		else if (strcmp(opt->defname, "gucs") == 0)
+			es->gucs = defGetBoolean(opt);
 		else if (strcmp(opt->defname, "timing") == 0)
 		{
 			timing_set = true;
@@ -547,6 +550,37 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
 	/* Create textual dump of plan tree */
 	ExplainPrintPlan(es, queryDesc);
 
+	if (es->gucs)
+	{
+		int		i;
+		int		num;
+		StringInfoData	str;
+		struct config_generic **gucs;
+
+		gucs = get_modified_guc_options(&num);
+
+		for (i = 0; i < num; i++)
+		{
+			char *setting;
+			struct config_generic *conf = gucs[i];
+
+			if (i == 0)
+				initStringInfo(&str);
+			else
+				appendStringInfoString(&str, ", ");
+
+			setting = GetConfigOptionByName(conf->name, NULL, true);
+
+			if (setting)
+				appendStringInfo(&str, "%s = '%s'", conf->name, setting);
+			else
+				appendStringInfo(&str, "%s = NULL", conf->name);
+		}
+
+		if (num > 0)
+			ExplainPropertyText("GUCs", str.data, es);
+	}
+
 	if (es->summary && planduration)
 	{
 		double		plantime = INSTR_TIME_GET_DOUBLE(*planduration);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 6fe1939881..2d37760c7a 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -8556,6 +8556,45 @@ ShowAllGUCConfig(DestReceiver *dest)
 	end_tup_output(tstate);
 }
 
+struct config_generic **
+get_modified_guc_options(int *num)
+{
+	int		i;
+	struct config_generic **result;
+
+	*num = 0;
+	result = palloc(sizeof(struct config_generic *) * num_guc_variables);
+
+	for (i = 0; i < num_guc_variables; i++)
+	{
+		struct config_generic *conf = guc_variables[i];
+
+		/* return only options visible to the user */
+		if ((conf->flags & GUC_NO_SHOW_ALL) ||
+			((conf->flags & GUC_SUPERUSER_ONLY) &&
+			 

Re: Upgrading pg_statistic to handle collation honestly

2018-12-14 Thread Tom Lane
Peter Eisentraut  writes:
> On 12/12/2018 16:57, Tom Lane wrote:
>> +stats->attrcollid = exprCollation(index_expr);
>> +/* XXX should we consult indcollation instead? */

> After looking through this again, I think the answer here is "yes".

Yeah, I was leaning towards that as well, but hadn't written the extra
code needed to make it so.  Will fix.

regards, tom lane



Re: [HACKERS] REINDEX CONCURRENTLY 2.0

2018-12-14 Thread Stephen Frost
Greetings,

* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
> On 2018-Dec-14, Stephen Frost wrote:
> 
> > That wasn't what was asked and I don't think I see a problem with having
> > concurrently be allowed in the parentheses.  For comparison, it's not
> > like "explain analyze select ..." or "explain buffers select" is
> > terribly good grammatical form.
> 
> ... and we don't allow EXPLAIN BUFFERS at all, and if we had had a
> parenthesized option list in EXPLAIN when we invented EXPLAIN ANALYZE, I
> bet we would have *not* made the ANALYZE keyword appear unadorned in
> that command.

I'm not convinced of that; there is value in being able to write full
and useful commands without having to always use parentheses.

> > If you wanted to try to get to a better form for the spelled out
> > sentence, I would think:
> > 
> > concurrently reindex table test
> > 
> > would probably be the approach to use,
> 
> I think this is terrible from a command-completion perspective, and from
> a documentation perspective (Certainly we wouldn't have a manpage about
> the "concurrently" command, for starters).

Right, I agreed that this had other downsides in the email you're
replying to here.  Glad we agree that it's not a good option.

> My vote goes to put the keyword inside of and exclusively in the
> parenthesized option list.

I disagree with the idea of exclusively having concurrently be in the
parentheses.  'explain buffers' is a much less frequently used option
(though that might, in part, be because it's a bit annoying to write out
explain (analyze, buffers) select...; I wonder if we could have a way to
say "if I'm running analyze, I always want buffers"...), but
concurrently reindexing a table (or index..) is going to almost
certainly be extremely common, perhaps even more common than *not*
reindexing concurrently.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: [HACKERS] REINDEX CONCURRENTLY 2.0

2018-12-14 Thread Alvaro Herrera
On 2018-Dec-14, Stephen Frost wrote:

> That wasn't what was asked and I don't think I see a problem with having
> concurrently be allowed in the parentheses.  For comparison, it's not
> like "explain analyze select ..." or "explain buffers select" is
> terribly good grammatical form.

... and we don't allow EXPLAIN BUFFERS at all, and if we had had a
parenthesized option list in EXPLAIN when we invented EXPLAIN ANALYZE, I
bet we would have *not* made the ANALYZE keyword appear unadorned in
that command.

> If you wanted to try to get to a better form for the spelled out
> sentence, I would think:
> 
> concurrently reindex table test
> 
> would probably be the approach to use,

I think this is terrible from a command-completion perspective, and from
a documentation perspective (Certainly we wouldn't have a manpage about
the "concurrently" command, for starters).

My vote goes to put the keyword inside of and exclusively in the
parenthesized option list.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Row Visibility and Table Access Methods

2018-12-14 Thread Jonah H. Harris
On Fri, Dec 14, 2018 at 12:28 AM Pavel Stehule 
wrote:

>
>
> On Thu 13. 12. 2018 at 10:23, Simon Riggs 
> wrote:
>


>> Thoughts?
>>
>
> looks great
>

Agreed. This sounds well-thought-out and rather simple to implement.


-- 
Jonah H. Harris


Re: [HACKERS] REINDEX CONCURRENTLY 2.0

2018-12-14 Thread Stephen Frost
Greetings,

* Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
> On 14/12/2018 01:14, Stephen Frost wrote:
>  reindex table CONCURRENTLY test;
> >>
> >> By the way, does this syntax make sense?  I haven't seen a discussion on
> >> this anywhere in the various threads.  I keep thinking that
> >>
> >> reindex concurrently table test;
> >>
> >> would make more sense.  How about in combination with (verbose)?
> > 
> > I don't think it's a mistake that we have 'create index concurrently'
> > and it certainly would seem odd to me for 'create index' and 'reindex
> > table' to be different.
> > 
> > Certainly, from my recollection of english, you'd say "I am going to
> > reindex the table concurrently", you wouldn't say "I am going to
> > reindex concurrently the table."
> > 
> > Based on at least a quick looking around, the actual grammar rule seems
> > to match my recollection[1], adverbs should typically go AFTER the
> > verb + object, and the adverb shouldn't ever be placed between the verb
> > and the object.
> 
> So it would be grammatical to say
> 
> reindex table test concurrently

Yes, though I'm not really a fan of it.

> or in a pinch
> 
> reindex concurrently table test

No, you can't put concurrently between reindex and table.

> but I don't see anything grammatical about
> 
> reindex table concurrently test

I disagree; this does look reasonable to me, and it's certainly much
better than 'reindex concurrently table', which looks clearly incorrect.

> Where this gets really messy is stuff like this:
> 
> reindex (verbose) database concurrently postgres
> 
> Why would "concurrently" not be part of the options next to "verbose"?

That wasn't what was asked and I don't think I see a problem with having
concurrently be allowed in the parentheses.  For comparison, it's not
like "explain analyze select ..." or "explain buffers select" is
terribly good grammatical form.

If you wanted to try to get to a better form for the spelled out
sentence, I would think:

concurrently reindex table test

would probably be the approach to use, though that's not what we use for
'create index' and it'd be rather out of character for us to start a
command with an adverb, making it ultimately a poor choice overall.

Going back to what we already have done and have in released versions,
we have 'create unique index concurrently test ...' and that's at least
reasonable (the adverb isn't showing up between the verb and the object,
and the adjective is between the verb and the object) and is what I
vote to go with, with the caveat that if we want to also allow it inside
the parentheses, I'm fine with that.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: explain plans with information about (modified) gucs

2018-12-14 Thread Tomas Vondra
On 12/14/18 2:05 PM, Jim Finnerty wrote:
> You might want to only include the GUCs that are query tuning parameters,
> i.e., those returned by:
> 
> SELECT name, setting, category
> FROM pg_settings
> WHERE category LIKE 'Query Tuning%'
> ORDER BY category, name;
> 

Good idea! Thanks.
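
For instance, a sketch of how that filter could look (this assumes the group
field and the QUERY_TUNING_* constants from utils/guc_tables.h; the helper
name is made up, and it is not something the posted patch does):

#include "utils/guc_tables.h"

/*
 * Sketch: keep only options from the "Query Tuning" categories, using the
 * category (group) that guc_tables.h assigns to every option.
 */
static bool
guc_is_query_tuning(struct config_generic *conf)
{
	switch (conf->group)
	{
		case QUERY_TUNING_METHOD:
		case QUERY_TUNING_COST:
		case QUERY_TUNING_GEQO:
		case QUERY_TUNING_OTHER:
			return true;
		default:
			return false;
	}
}

The loop in get_modified_guc_options() would then simply continue for options
where this returns false, on top of the existing source checks.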

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Change pgarch_readyXlog() to return .history files first

2018-12-14 Thread David Steele
On 12/13/18 7:15 PM, Michael Paquier wrote:
> On Thu, Dec 13, 2018 at 11:53:53AM -0500, David Steele wrote:
>> I also think we should consider back-patching this change.  It's hard to
>> imagine that archive commands would have trouble with this reordering
>> and the current ordering causes real pain in HA clusters.
> 
> I would like to hear opinion from other though if we should consider
> that as an improvement or an actual bug fix.  Changing the order of the
> files to map with what the startup process does when promoting does not
> sound like a bug fix to me, still this is not really invasive, so we
> could really consider it worth back-patching to reduce common pain from
> users when it comes to timeline handling.

I think an argument can be made that it is a bug (ish).  Postgres
generates the files in one order, and they get archived in a different
order.

>> -if (!found)
>> +/* Is this a history file? */
>> +bool history = basenamelen >= sizeof(".history") &&
>> +strcmp(rlde->d_name + (basenamelen - sizeof(".history") + 1),
>> +   ".history.ready") == 0;
> 
> Or you could just use IsTLHistoryFileName here?

We'd have to truncate .ready off the string to make that work, which
seems easy enough.  Is that what you were thinking?

One thing to consider is that the check above is more efficient than
IsTLHistoryFileName(), and it potentially gets run a lot.
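
Not the attached patch, just a minimal sketch of what that truncation could
look like (the helper name is made up; it assumes xlog_internal.h is included
for IsTLHistoryFileName()):

/*
 * Sketch: decide whether a ".ready" entry in archive_status belongs to a
 * timeline history file, by stripping the ".ready" suffix into a scratch
 * buffer and reusing the existing IsTLHistoryFileName() check.
 */
static bool
readyFileIsHistory(const char *readyName)
{
	char		fname[MAXPGPATH];
	size_t		len = strlen(readyName);

	/* must end in ".ready" and fit in the scratch buffer */
	if (len <= 6 || len - 6 >= sizeof(fname) ||
		strcmp(readyName + len - 6, ".ready") != 0)
		return false;

	memcpy(fname, readyName, len - 6);
	fname[len - 6] = '\0';

	return IsTLHistoryFileName(fname);
}

pgarch_readyXlog() could then call this on rlde->d_name instead of the inline
strcmp(), at the efficiency cost noted above of an extra strlen()/strspn() per
directory entry.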

> If one wants to simply check this code, you can just create dummy orphan
> files in archive_status and see in which order they get removed.

Seems awfully racy.  Are there currently any tests like this for the
archiver that I can look at extending?

Regards,
-- 
-David
da...@pgmasters.net



signature.asc
Description: OpenPGP digital signature


Re: automatically assigning catalog toast oids

2018-12-14 Thread John Naylor



> On 12/13/18, Tom Lane  wrote:
> Looking at the existing entries, it seems like we'd have to have
> one special case: schema public has OID 2200 but is intentionally
> not pinned.

setup_depend() mentions other exceptions, but only this one currently has any 
effect as far as I can see:

"pg_database: it's a feature, not a bug, that template1 is not pinned."

-John Naylor



Re: Remove Deprecated Exclusive Backup Mode

2018-12-14 Thread David Steele
On 12/14/18 3:01 AM, Peter Eisentraut wrote:
> On 13/12/2018 20:17, David Steele wrote:
>> On 12/13/18 2:02 PM, Peter Eisentraut wrote:
>>> On 12/12/2018 05:09, David Steele wrote:
>>>> I think the patch stands on its own.  Exclusive backup mode is broken
>>>> and is known to cause problems in the field.
>>>
>>> Is this breakage documented anywhere for users?
>>
>> Yes:
>>
>> https://www.postgresql.org/docs/11/continuous-archiving.html
>>
>> "Note that if the server crashes during the backup it may not be
>> possible to restart until the backup_label file has been manually
>> deleted from the PGDATA directory."
>>
>> Seems like the wording could be stronger.
> 
> I think that is a pretty liberal interpretation of the word "broken".
> All this says is "if you do A, and B happens, then you need to do C".
> 
> Clearly, not having to do that at all is better, but if this is all
> there is to it, then I'm confused by the characterizations of how awful
> and terrible this feature is and how we must rush to remove it.

If the server crashes during a backup, the server probably won't
restart.  We say "may" here but if your backup runs more than a few
checkpoints it's more like won't.  The remedy is to go manually delete a
file the user may have never heard of.  They are often hesitant to do so
-- for good reason.

The downtime stretches as they call their support company or consult
Google to determine if they should really do this.

The main thing is that *manual* intervention is required to get the
cluster running again.  Sure, you could automate it, but now we have
users writing scripts to delete files out of PGDATA (may as well get
tablespace_map as well) -- but hopefully not in the case of a restore
where you actually need those files.  And hopefully not anything else
important.

That seems pretty broken to me.

-- 
-David
da...@pgmasters.net



Re: valgrind issues on Fedora 28

2018-12-14 Thread Tomas Vondra
On 12/14/18 5:10 AM, Tom Lane wrote:
> Tomas Vondra  writes:
>> On 11/8/18 1:07 PM, Tomas Vondra wrote:
>>> Not sure what to do about this - maybe we should wildcard this first
>>> frame, to accept all wcsnlen variants. Opinions?
> 
>> I've committed this with the first frame wildcarded, and backpatched it
>> all the way back to 9.4.
> 
> So I just got around to trying to do some valgrind testing for the first
> time in a couple of months, and what I find is that this patch completely
> breaks valgrind for me: it exits immediately with complaints like
> 
> ==00:00:00:00.222 14821== FATAL: in suppressions file 
> "/home/postgres/pgsql/src/tools/valgrind.supp" near line 227:
> ==00:00:00:00.222 14821==unknown tool suppression type
> ==00:00:00:00.222 14821== exiting now.
> 
> After a bit of hair-pulling I found that the line number report is
> a lie, and what it's really unhappy about is the "Memcheck:Addr32"
> specification on line 236.  This is not terribly surprising perhaps
> considering that "Addr32" isn't even documented on what should be
> the authoritative page:
> http://valgrind.org/docs/manual/mc-manual.html#mc-manual.suppfiles
> 

Hmm :-( FWIW the Addr32 comes directly from valgrind itself, which
explicitly suggested adding a suppression like this:

{
   <insert_a_suppression_name_here>
   Memcheck:Addr32
   fun:__wcsnlen_avx2
   fun:wcsrtombs
   fun:wcstombs
   fun:wchar2char
   fun:str_tolower
   fun:lower
   fun:DirectFunctionCall1Coll
   fun:Generic_Text_IC_like
   ...
}

My guess is that the documentation is either somewhat obsolete, or is meant
more as a general description than an exhaustive and authoritative
source of truth regarding suppressions.

> This is valgrind 3.8.1 (RHEL6's version), so not bleeding edge exactly,
> but we have very little evidence about how far back the committed patch
> works.  I'm inclined to think we can't use this solution.
> 

Fedora 28 has 3.14.0, so this seems to be due to some improvements since
3.8.1. I'm not sure how to deal with this - surely we don't want to just
revert the change because that would start generating noise again. For a
moment I thought we might wildcard the error type somehow, so that it
would accept Memcheck:* but AFAICs that is not supported.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: explain plans with information about (modified) gucs

2018-12-14 Thread Pavel Stehule
pá 14. 12. 2018 v 12:41 odesílatel Tomas Vondra <
tomas.von...@2ndquadrant.com> napsal:

> Hi,
>
> every now and then I have to investigate an execution plan that is
> strange in some way and I can't reproduce the same behavior. Usually
> it's simply due to data distribution changing since the problem was
> observed (say, after a nightly batch load/update).
>
> In many cases, however, it may be due to some local GUC tweaks, usually
> addressing some query-specific issues (say, disabling nested loops or
> lowering join_collapse_limit). I've repeatedly run into cases where the
> GUC was not properly reset to the "regular" value, and it's rather
> difficult to identify that this is what's happening. Or cases with different
> per-user settings and connection pooling (SET SESSION AUTHORIZATION /
> ROLE etc.).
>
> So I propose to extend EXPLAIN output with an additional option, which
> would include information about modified GUCs in the execution plan
> (disabled by default, of course):
>
> test=# explain (gucs) select * from t;
>
>  QUERY PLAN
>   
>Seq Scan on t  (cost=0.00..35.50 rows=2550 width=4)
>GUCs: application_name = 'x', client_encoding = 'UTF8',
>  cpu_tuple_cost = '0.01'
>(2 rows)
>
> Of course, this directly applies to auto_explain too, which gets a new
> option log_gucs.
>
> The patch is quite trivial, but there are about three open questions:
>
> 1) names of the options
>
> I'm not particularly happy with calling the option "gucs" - it's an
> acronym and many users have little idea what GUC stands for. So I think
> a better name would be desirable, but I'm not sure what would that be.
> Options? Parameters?
>
> 2) format of output
>
> At this point the names/values are simply formatted into a one-line
> string. That's not particularly readable, and it's not very useful for
> the YAML/JSON formats I guess. So adding each modified GUC as an extra
> text property would be better.
>
> 3) identifying modified (and interesting) GUCs
>
> We certainly don't want to include all GUCs, so the question is how to
> decide which GUCs are interesting. The simplest approach would be to
> look for GUCs that changed in the session (source == PGC_S_SESSION), but
> that does not help with SET SESSION AUTHORIZATION / ROLE cases. So we
> probably want (source > PGC_S_ARGV), but probably not PGC_S_OVERRIDE
> because that includes irrelevant options like wal_buffers etc.
>
> For now I've used
>
>   /* return only options that were modified (not as in config file) */
>   if ((conf->source <= PGC_S_ARGV) || (conf->source == PGC_S_OVERRIDE))
> continue;
>
> which generally does the right thing, although it also includes stuff
> like application_name or client_encoding. But perhaps it'd be better to
> whitelist the GUCs in some way, because some of the user-defined GUCs
> may be sensitive and should not be included in plans.
>
> Opinions?
>

makes sense

Pavel


>
> regards
>
> --
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name

2018-12-14 Thread Michael Paquier
On Fri, Dec 14, 2018 at 12:23:09PM +0300, Sergei Kornilov wrote:
> Not sure I can be the reviewer since this was initially my proposal.

No issues from me if you wish to review the code.  In my opinion, it is
not a problem for you to be a reviewer, since I wrote the code.

> I vote +1 for this patch. The code looks good and I think we have no
> reason to leave RequestXLogStreaming as-is.

Thanks for the input.  The gain on my side is being able to remove
ready_to_display from WalRcv, so let's see what others think.
--
Michael


signature.asc
Description: PGP signature


RE: [PROPOSAL]a new data type 'bytea' for ECPG

2018-12-14 Thread Matsumura, Ryo
Meskes-san

I think I now understand your comments about compatibility.
I will try to implement it for sqlda.

# I am not good at the library versioning mechanism.
# I will learn about it.

Regards
Ryo Matsumura


Re: [HACKERS] REINDEX CONCURRENTLY 2.0

2018-12-14 Thread Alvaro Herrera
On 2018-Dec-14, Peter Eisentraut wrote:

> On 14/12/2018 01:23, Michael Paquier wrote:
> > On Thu, Dec 13, 2018 at 07:14:57PM -0500, Stephen Frost wrote:
> >> Based on at least a quick looking around, the actual grammar rule seems
> >> to match my recollection[1], adverbs should typically go AFTER the
> >> verb + object, and the adverb shouldn't ever be placed between the verb
> >> and the object.
> > 
> > This part has been a long debate already in 2012-2013 when I sent the
> > first iterations of the patch, and my memories on the matter are that
> > the grammar you are showing here matches with the past agreement.
> 
> Do you happen to have a link for that?  I didn't find anything.

I think putting the CONCURRENTLY in the parenthesized list of options is
most sensible.

CREATE INDEX didn't have such an option list when we added this feature
there; see 
https://www.postgresql.org/message-id/flat/200608011143.k71Bh9c22067%40momjian.us#029e9a7ee8cb38beee494ef7891bec1d
for some discussion about that grammar.  Our options were not great ...

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Catalog views failed to show partitioned table information.

2018-12-14 Thread Suraj Kharage
Hi,

There are some catalog views which do not show partitioned tables and
their index entries.
One of them is "pg_indexes", which fails to show partitioned indexes.
Attached is a patch which fixes this.

Other views such as pg_stat_* and pg_statio_* have the same problem for
partitioned tables and indexes.
Since partitioned tables and their indexes are treated as dummy relations,
they do not have any significance in the stats tables,
so can we still consider adding relkind=p to these pg_stat_* views? Thoughts?

Regards,
Suraj


pg_indexes_fix_for_partition_index.patch
Description: Binary data


explain plans with information about (modified) gucs

2018-12-14 Thread Tomas Vondra
Hi,

every now and then I have to investigate an execution plan that is
strange in some way and I can't reproduce the same behavior. Usually
it's simply due to data distribution changing since the problem was
observed (say, after a nightly batch load/update).

In many cases, however, it may be due to some local GUC tweaks, usually
addressing some query-specific issues (say, disabling nested loops or
lowering join_collapse_limit). I've repeatedly run into cases where the
GUC was not properly reset to the "regular" value, and it's rather
difficult to identify that this is what's happening. Or cases with different
per-user settings and connection pooling (SET SESSION AUTHORIZATION /
ROLE etc.).

So I propose to extend EXPLAIN output with an additional option, which
would include information about modified GUCs in the execution plan
(disabled by default, of course):

test=# explain (gucs) select * from t;

 QUERY PLAN
  
   Seq Scan on t  (cost=0.00..35.50 rows=2550 width=4)
   GUCs: application_name = 'x', client_encoding = 'UTF8',
 cpu_tuple_cost = '0.01'
   (2 rows)

Of course, this directly applies to auto_explain too, which gets a new
option log_gucs.

The patch is quite trivial, but there are about three open questions:

1) names of the options

I'm not particularly happy with calling the option "gucs" - it's an
acronym and many users have little idea what GUC stands for. So I think
a better name would be desirable, but I'm not sure what would that be.
Options? Parameters?

2) format of output

At this point the names/values are simply formatted into a one-line
string. That's not particularly readable, and it's not very useful for
the YAML/JSON formats I guess. So adding each modified GUC as an extra
text property would be better.

3) identifying modified (and interesting) GUCs

We certainly don't want to include all GUCs, so the question is how to
decide which GUCs are interesting. The simplest approach would be to
look for GUCs that changed in the session (source == PGC_S_SESSION), but
that does not help with SET SESSION AUTHORIZATION / ROLE cases. So we
probably want (source > PGC_S_ARGV), but probably not PGC_S_OVERRIDE
because that includes irrelevant options like wal_buffers etc.

For now I've used

  /* return only options that were modified (not as in config file) */
  if ((conf->source <= PGC_S_ARGV) || (conf->source == PGC_S_OVERRIDE))
continue;

which generally does the right thing, although it also includes stuff
like application_name or client_encoding. But perhaps it'd be better to
whitelist the GUCs in some way, because some of the user-defined GUCs
may be sensitive and should not be included in plans.

Opinions?


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index 646cd0d42c..ec44c59b7f 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -28,6 +28,7 @@ static bool auto_explain_log_verbose = false;
 static bool auto_explain_log_buffers = false;
 static bool auto_explain_log_triggers = false;
 static bool auto_explain_log_timing = true;
+static bool auto_explain_log_gucs = false;
 static int	auto_explain_log_format = EXPLAIN_FORMAT_TEXT;
 static int	auto_explain_log_level = LOG;
 static bool auto_explain_log_nested_statements = false;
@@ -112,6 +113,17 @@ _PG_init(void)
 			 NULL,
 			 NULL);
 
+	DefineCustomBoolVariable("auto_explain.log_gucs",
+			 "Print modified GUC values.",
+			 NULL,
+			 &auto_explain_log_gucs,
+			 false,
+			 PGC_SUSET,
+			 0,
+			 NULL,
+			 NULL,
+			 NULL);
+
 	DefineCustomBoolVariable("auto_explain.log_verbose",
 			 "Use EXPLAIN VERBOSE for plan logging.",
 			 NULL,
@@ -356,6 +368,7 @@ explain_ExecutorEnd(QueryDesc *queryDesc)
 			es->timing = (es->analyze && auto_explain_log_timing);
 			es->summary = es->analyze;
 			es->format = auto_explain_log_format;
+			es->gucs = auto_explain_log_gucs;
 
 			ExplainBeginOutput(es);
 			ExplainQueryText(es, queryDesc);
diff --git a/doc/src/sgml/auto-explain.sgml b/doc/src/sgml/auto-explain.sgml
index 120b168d45..852c69b7bb 100644
--- a/doc/src/sgml/auto-explain.sgml
+++ b/doc/src/sgml/auto-explain.sgml
@@ -169,6 +169,23 @@ LOAD 'auto_explain';
 

 
+   
+
+ auto_explain.log_gucs (boolean)
+ 
+  auto_explain.log_gucs configuration parameter
+ 
+
+
+ 
+  auto_explain.log_gucs controls whether information
+  about modified configuration options is logged with the execution
+  plan.  Only options modified at the database, user, client or session
+  level are considered modified.  This parameter is off by default.
+ 
+
+   
+

 
  auto_explain.log_format (enum)
diff --git 

Re: Hitting CheckRelationLockedByMe() ASSERT with force_generic_plan

2018-12-14 Thread Amit Kapila
On Fri, Dec 14, 2018 at 11:32 AM Rushabh Lathia
 wrote:
>
> While looking further at the code around this, I realized that we need a
> similar kind of fix for bitmap scans as well as index-only scans.
>

I think if we want to move forward with this patch, we need to first
investigate the deadlock hazard mentioned by Tom in other related
thread [1].   I and others have also responded on that thread, see if
you can respond to it.

[1] - https://www.postgresql.org/message-id/3046.1542990312%40sss.pgh.pa.us

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name

2018-12-14 Thread Sergei Kornilov
Hi

Not sure I can be the reviewer since this was initially my proposal.
I vote +1 for this patch. The code looks good and I think we have no reason to
leave RequestXLogStreaming as-is.

regards, Sergei



Re: allow online change primary_conninfo

2018-12-14 Thread Sergei Kornilov
Hi

> I would recommend waiting for the conclusion of other thread before
> moving on with this one.
Agreed.
I marked this patch as Waiting on Author. That's not quite the right state for
waiting on another discussion, but it is the most suitable one.

> We could also consider using
> the show hook function of a parameter to print its value in such logs.
But a show hook also affects the "show primary_conninfo;" command and
pg_settings.value.
I have already marked primary_conninfo as GUC_SUPERUSER_ONLY. I doubt we need
to hide any connection info even from the superuser.
Also, this hook would be a bit complicated; I found suitable code only in
libpqrcv_get_conninfo, after the connection is established.

regards, Sergei



Re: alternative to PG_CATCH

2018-12-14 Thread Peter Eisentraut
On 13/12/2018 13:26, Kyotaro HORIGUCHI wrote:
> Though I didn't look into individual change, this kind of
> refactoring looks good to me. But the syntax looks
> somewhat.. uh..
> 
> I'm not sure it is actually workable, but I suspect that what we
> need here is just a shortcut of 'PG_CATCH();{PG_RE_THROW();}'.
> Something like this:
> 
> #define PG_FINALLY()\
> } \
> else \
> { \
> PG_exception_stack = save_exception_stack; \
> error_context_stack = save_context_stack; \
> PG_RE_THROW();\
> } \
> PG_exception_stack = save_exception_stack;\
> error_context_stack = save_context_stack; \
> {

I don't think this works, because the "finally" code needs to be run in
the else branch before the rethrow.

The fundamental problem, as I see it, is that the macro expansion needs
to produce the "finally" code twice: Once in the else (error) branch of
the setjmp, and once in the non-error code path, either after the
if-setjmp, or in the try block.  But to be able to do that, we need to
capture the block as a macro argument.
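
To illustrate that shape, here is a hypothetical sketch modelled on the
existing PG_TRY()/PG_CATCH() machinery in elog.h; the macro name is made up
and this is not the API proposed in the attached patch:

#define PG_TRY_FINALLY(try_body, finally_body) \
	do { \
		sigjmp_buf *save_exception_stack = PG_exception_stack; \
		ErrorContextCallback *save_context_stack = error_context_stack; \
		sigjmp_buf local_sigjmp_buf; \
		if (sigsetjmp(local_sigjmp_buf, 0) == 0) \
		{ \
			PG_exception_stack = &local_sigjmp_buf; \
			{ try_body; } \
			/* normal path: restore state, then run the cleanup once */ \
			PG_exception_stack = save_exception_stack; \
			error_context_stack = save_context_stack; \
			{ finally_body; } \
		} \
		else \
		{ \
			/* error path: restore state, run the same cleanup, re-throw */ \
			PG_exception_stack = save_exception_stack; \
			error_context_stack = save_context_stack; \
			{ finally_body; } \
			PG_RE_THROW(); \
		} \
	} while (0)

The obvious downsides are that any top-level comma in either block needs extra
parentheses (or a variadic macro), and that large blocks passed as macro
arguments are unpleasant to read and debug, so this only shows the shape of
the problem rather than a recommended interface.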

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Where to save data used by extension ?

2018-12-14 Thread Laurenz Albe
Hao Wu wrote:
> I have an extension for postgresql. The extension works across some 
> databases, and needs to save some data.
> The data might be modified dynamically by the extension, and all the changed 
> result must be saved. I have considered some methods.
> 1. Use GUC
> I find no easy way to write the GUC values back to postgres.conf. And the 
> data might be a bit complex
> 2. Create a table under a special database, like postgres
> The weak is the strong assumption that the database postgres does exist and 
> will not be dropped.
> 3. Create a table under a database created by the extension
> A little heavy, but without dependencies.
> 4. Write to a native file.
> The file can not sync to a standby
> 
> Currently, I use #2. Is there any better way to do this?

Only 2 and 3 are feasible.

Since extensions have a schema, and the database is the best place to persist
data for a database extension, I'd use a table in the extension schema, so I'd
go for 3.

Why is that heavier than 2?

Yours,
Laurenz Albe




Re: Upgrading pg_statistic to handle collation honestly

2018-12-14 Thread Peter Eisentraut
On 12/12/2018 16:57, Tom Lane wrote:
> diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
> index b8445dc..dcbd04c 100644
> *** a/src/backend/commands/analyze.c
> --- b/src/backend/commands/analyze.c
> *** examine_attribute(Relation onerel, int a
> *** 904,914 
> --- 904,917 
>   {
>   stats->attrtypid = exprType(index_expr);
>   stats->attrtypmod = exprTypmod(index_expr);
> + stats->attrcollid = exprCollation(index_expr);
> + /* XXX should we consult indcollation instead? */

After looking through this again, I think the answer here is "yes".  If
the index definition overrides the collation, then we clearly want to
use that.  If it's not overridden, then indcollation is still set, so
it's just as easy to use it.
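
Concretely, in the expression-index branch that would be something along the
lines of the sketch below (reusing examine_attribute()'s onerel and attnum
arguments; not necessarily the exact wording of the eventual commit):

	if (index_expr)
	{
		stats->attrtypid = exprType(index_expr);
		stats->attrtypmod = exprTypmod(index_expr);

		/*
		 * Sketch: take the collation the index column was declared with
		 * (pg_index.indcollation, exposed by the relcache as
		 * rd_indcollation) instead of deriving it from the expression.
		 */
		stats->attrcollid = onerel->rd_indcollation[attnum - 1];
	}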

>   }
>   else
>   {
>   stats->attrtypid = attr->atttypid;
>   stats->attrtypmod = attr->atttypmod;
> + stats->attrcollid = attr->attcollation;
>   }
>   
>   typtuple = SearchSysCacheCopy1(TYPEOID,


-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: ALTER INDEX ... ALTER COLUMN not present in dump

2018-12-14 Thread Amul Sul
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:not tested

dump-alter-index-stats-v2.patch looks pretty much reasonable to me; passing it
on to a committer.

The new status of this patch is: Ready for Committer


Re: Remove Deprecated Exclusive Backup Mode

2018-12-14 Thread Peter Eisentraut
On 13/12/2018 20:17, David Steele wrote:
> On 12/13/18 2:02 PM, Peter Eisentraut wrote:
>> On 12/12/2018 05:09, David Steele wrote:
>>> I think the patch stands on its own.  Exclusive backup mode is broken
>>> and is known to cause problems in the field.
>>
>> Is this breakage documented anywhere for users?
> 
> Yes:
> 
> https://www.postgresql.org/docs/11/continuous-archiving.html
> 
> "Note that if the server crashes during the backup it may not be
> possible to restart until the backup_label file has been manually
> deleted from the PGDATA directory."
> 
> Seems like the wording could be stronger.

I think that is a pretty liberal interpretation of the word "broken".
All this says is "if you do A, and B happens, then you need to do C".

Clearly, not having to do that at all is better, but if this is all
there is to it, then I'm confused by the characterizations of how awful
and terrible this feature is and how we must rush to remove it.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services