Re: Commit/abort WAL records with dropped rels missing XLR_SPECIAL_REL_UPDATE

2020-08-14 Thread Amit Kapila
On Fri, Aug 14, 2020 at 2:17 PM Heikki Linnakangas  wrote:
>
> While hacking on pg_rewind, I noticed that commit and abort WAL records
> are never marked with the XLR_SPECIAL_REL_UPDATE flag. But if the record
> contains "dropped relfilenodes", surely it should be?
>

Right.

> It's harmless as far as the backend and all the programs in PostgreSQL
> repository are concerned, but the point of XLR_SPECIAL_REL_UPDATE is to
> aid external tools that try to track which files are modified. Attached
> is a patch to fix it.
>
> It's always been like that, but I am not going to backport it, for fear of
> breaking existing applications. If a program reads the WAL and would
> actually need to do something with commit records dropping relations,
> that seems like such a common scenario that the author should've thought
> about it and handled it even without the flag reminding them about it.
> Fixing it in master ought to be enough.
>

+1 for doing it in master only. Even if someone comes up with such a
scenario for the back-branches, we can revisit our decision to backport
this, but like you, I also don't see any pressing need to do it now.


-- 
With Regards,
Amit Kapila.




Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-08-14 Thread Amit Kapila
On Sat, Aug 15, 2020 at 4:14 AM Thomas Munro  wrote:
>
> On Fri, Aug 14, 2020 at 6:14 PM Amit Kapila  wrote:
> > Yeah, that makes sense. I will take care of that later today or
> > tomorrow. We have not noticed that because currently none of the
> > extensions is using those functions. BTW, I noticed that after a
> > failure, the next run is green; why is that? Is the next run not on
> > Windows?
>
> The three cfbot results are for applying the patch, testing on Windows
> and testing on Ubuntu in that order.  It's not at all clear and I'll
> probably find a better way to display it when I get around to adding
> some more operating systems, maybe with some OS icons or something
> like that...
>

Good to know. Anyway, I have pushed a patch to mark those variables
with PGDLLIMPORT.

-- 
With Regards,
Amit Kapila.




Re: Asynchronous Append on postgres_fdw nodes.

2020-08-14 Thread Etsuro Fujita
On Fri, Aug 14, 2020 at 10:29 AM Thomas Munro  wrote:
> On Thu, Jul 2, 2020 at 3:20 PM Etsuro Fujita  wrote:
> > I'd like to join the party, but IIUC, we don't yet reach a consensus
> > on which one is the right way to go.  So I think we need to discuss
> > that first.
>
> Either way, we definitely need patch 0001.  One comment:
>
> -CreateWaitEventSet(MemoryContext context, int nevents)
> +CreateWaitEventSet(MemoryContext context, ResourceOwner res, int nevents)
>
> I wonder if it's better to have it receive ResourceOwner like that, or
> to have it capture CurrentResourceOwner.  I think the latter is more
> common in existing code.

Sorry for not having discussed anything, but actually, I’ve started
reviewing your patch first.  I’ll return to this after reviewing it
some more.

Thanks!

Best regards,
Etsuro Fujita




Re: run pgindent on a regular basis / scripted manner

2020-08-14 Thread Amit Kapila
On Sat, Aug 15, 2020 at 1:57 AM Alvaro Herrera  wrote:
>
> On 2020-Aug-13, Magnus Hagander wrote:
>
> > That is:
> > 1. Whenever a patch is pushed to master on the main repo, a process is
> > kicked off (or maybe waits 5 minutes to coalesce multiple patches if
> > there are several)
> > 2. This process checks out master, and runs pgindent on it
> > 3. When done, this gets committed to a new branch (or just overwrites an
> > existing branch of course, we don't need to maintain history here) like
> > "master-indented". This branch can be in a different repo, but one that
> > starts out as a clone of the main one
> > 4. A committer (any committer) can then on a regular basis examine the
> > differences by fetch + diff. If they're happy with it, cherry-pick it in.
> > If not, figure out what needs to be done to adjust it.
>
> Sounds good -- for branch master.
>
> Yesterday I tried to indent some patch across all branches, only to
> discover that I'm lacking the pg_bsd_indent necessary for the older
> ones.  I already have two, but apparently I'd need *four* different
> versions for the current branches (1.3, 2.0, 2.1, 2.1.1).
>

FWIW, for back-branches, I just do something similar to what Tom said
above [1] ("My own habit when back-patching has been to indent the HEAD
patch per-current-rules and then preserve that layout as much as possible
in the back branches"). If we want, we can maintain all the required
versions of pg_bsd_indent, but as of now I am not doing so, and I think
that following some approximation rule (do it for HEAD and try my best
to maintain the layout for the back-branches) is good enough.

[1] - https://www.postgresql.org/message-id/397020.1597291716%40sss.pgh.pa.us

-- 
With Regards,
Amit Kapila.




Re: display offset along with block number in vacuum errors

2020-08-14 Thread Amit Kapila
On Fri, Aug 14, 2020 at 4:06 PM Amit Kapila  wrote:
>
> On Mon, Aug 10, 2020 at 10:24 AM Masahiko Sawada
>  wrote:
> >
> > It's true that heap_page_is_all_visible() is called only from
> > lazy_vacuum_page(), but I'm concerned it would be misleading since
> > it's not actually removing tuples but just checking after vacuum. I
> > guess that the errcontext should show what the process is actually
> > doing and therefore help the investigation, so I thought VACUUM_HEAP
> > might not be appropriate for this case. But I also see your point.
> > The other vacuum error context phases match the vacuum progress
> > information phases. So in heap_page_is_all_visible(), I agree it's
> > better to update the offset number and keep the phase VACUUM_HEAP
> > rather than do nothing.
> >
>
> Okay, I have changed it accordingly, and this means that the offset will
> be displayed for the vacuum phase as well. Apart from this, I have
> addressed all the comments I raised in the attached patch. One thing
> we need to think about is whether we want to set the offset during
> heap_page_prune when called from lazy_scan_heap. I think the code path
> for heap_prune_chain is quite deep, so an error can occur in that path.
> What do you think?
>

The reason I have not included the heap_page_prune-related change in
the patch is that I don't want to sprinkle this into every possible
function (code path) called via vacuum, especially if the probability
of an error in that code path is low. But I am fine if you and/or
others think that it is a good idea to update the offset in
heap_page_prune as well.

-- 
With Regards,
Amit Kapila.




Re: Switch to multi-inserts for pg_depend

2020-08-14 Thread Michael Paquier
On Fri, Aug 14, 2020 at 02:23:16PM -0400, Alvaro Herrera wrote:
> It seems a bit silly to worry about allocating just the exact amount
> needed; the current approach looked fine to me.

Okay, thanks.

> The logic to keep track
> of the number of used slots is baroque, though -- that could use a lot
> of simplification.

What are you suggesting here?  A new API layer to manage a set of
slots?
--
Michael




Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-08-14 Thread Thomas Munro
On Fri, Aug 14, 2020 at 6:14 PM Amit Kapila  wrote:
> Yeah, that makes sense. I will take care of that later today or
> tomorrow. We have not noticed that because currently none of the
> extensions is using those functions. BTW, I noticed that after a
> failure, the next run is green; why is that? Is the next run not on
> Windows?

The three cfbot results are for applying the patch, testing on Windows
and testing on Ubuntu in that order.  It's not at all clear and I'll
probably find a better way to display it when I get around to adding
some more operating systems, maybe with some OS icons or something
like that...




Re: fixing old_snapshot_threshold's time->xid mapping

2020-08-14 Thread Thomas Munro
On Fri, Aug 14, 2020 at 1:04 PM Thomas Munro  wrote:
> On Fri, Aug 14, 2020 at 12:52 PM Thomas Munro  wrote:
> > Here's a rebase.
>
> And another, since I was too slow and v6 is already in conflict...
> sorry for the high-frequency patches.

And ... now that this has a commitfest entry, cfbot told me about a
small problem in a makefile.  Third time lucky?
From c28576db2eaf7d9c68d0bb3566599638dd79deb1 Mon Sep 17 00:00:00 2001
From: Robert Haas 
Date: Thu, 16 Apr 2020 09:37:31 -0400
Subject: [PATCH v8 1/6] Expose oldSnapshotControl.

---
 src/backend/utils/time/snapmgr.c | 55 +--
 src/include/utils/old_snapshot.h | 75 
 2 files changed, 77 insertions(+), 53 deletions(-)
 create mode 100644 src/include/utils/old_snapshot.h

diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 752af0c10d..6cfb07e82b 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -63,6 +63,7 @@
 #include "storage/spin.h"
 #include "utils/builtins.h"
 #include "utils/memutils.h"
+#include "utils/old_snapshot.h"
 #include "utils/rel.h"
 #include "utils/resowner_private.h"
 #include "utils/snapmgr.h"
@@ -74,59 +75,7 @@
  */
 int			old_snapshot_threshold; /* number of minutes, -1 disables */
 
-/*
- * Structure for dealing with old_snapshot_threshold implementation.
- */
-typedef struct OldSnapshotControlData
-{
-	/*
-	 * Variables for old snapshot handling are shared among processes and are
-	 * only allowed to move forward.
-	 */
-	slock_t		mutex_current;	/* protect current_timestamp */
-	TimestampTz current_timestamp;	/* latest snapshot timestamp */
-	slock_t		mutex_latest_xmin;	/* protect latest_xmin and next_map_update */
-	TransactionId latest_xmin;	/* latest snapshot xmin */
-	TimestampTz next_map_update;	/* latest snapshot valid up to */
-	slock_t		mutex_threshold;	/* protect threshold fields */
-	TimestampTz threshold_timestamp;	/* earlier snapshot is old */
-	TransactionId threshold_xid;	/* earlier xid may be gone */
-
-	/*
-	 * Keep one xid per minute for old snapshot error handling.
-	 *
-	 * Use a circular buffer with a head offset, a count of entries currently
-	 * used, and a timestamp corresponding to the xid at the head offset.  A
-	 * count_used value of zero means that there are no times stored; a
-	 * count_used value of OLD_SNAPSHOT_TIME_MAP_ENTRIES means that the buffer
-	 * is full and the head must be advanced to add new entries.  Use
-	 * timestamps aligned to minute boundaries, since that seems less
-	 * surprising than aligning based on the first usage timestamp.  The
-	 * latest bucket is effectively stored within latest_xmin.  The circular
-	 * buffer is updated when we get a new xmin value that doesn't fall into
-	 * the same interval.
-	 *
-	 * It is OK if the xid for a given time slot is from earlier than
-	 * calculated by adding the number of minutes corresponding to the
-	 * (possibly wrapped) distance from the head offset to the time of the
-	 * head entry, since that just results in the vacuuming of old tuples
-	 * being slightly less aggressive.  It would not be OK for it to be off in
-	 * the other direction, since it might result in vacuuming tuples that are
-	 * still expected to be there.
-	 *
-	 * Use of an SLRU was considered but not chosen because it is more
-	 * heavyweight than is needed for this, and would probably not be any less
-	 * code to implement.
-	 *
-	 * Persistence is not needed.
-	 */
-	int			head_offset;	/* subscript of oldest tracked time */
-	TimestampTz head_timestamp; /* time corresponding to head xid */
-	int			count_used;		/* how many slots are in use */
-	TransactionId xid_by_minute[FLEXIBLE_ARRAY_MEMBER];
-} OldSnapshotControlData;
-
-static volatile OldSnapshotControlData *oldSnapshotControl;
+volatile OldSnapshotControlData *oldSnapshotControl;
 
 
 /*
diff --git a/src/include/utils/old_snapshot.h b/src/include/utils/old_snapshot.h
new file mode 100644
index 00..284af7d508
--- /dev/null
+++ b/src/include/utils/old_snapshot.h
@@ -0,0 +1,75 @@
+/*-
+ *
+ * old_snapshot.h
+ *		Data structures for 'snapshot too old'
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/include/utils/old_snapshot.h
+ *
+ *-
+ */
+
+#ifndef OLD_SNAPSHOT_H
+#define OLD_SNAPSHOT_H
+
+#include "datatype/timestamp.h"
+#include "storage/s_lock.h"
+
+/*
+ * Structure for dealing with old_snapshot_threshold implementation.
+ */
+typedef struct OldSnapshotControlData
+{
+	/*
+	 * Variables for old snapshot handling are shared among processes and are
+	 * only allowed to move forward.
+	 */
+	slock_t		mutex_current;	/* protect current_timestamp */
+	TimestampTz current_timestamp;	/* latest snapshot timestamp 

Re: Dependencies for partitioned indexes are still a mess

2020-08-14 Thread Alvaro Herrera
On 2020-Aug-14, Alvaro Herrera wrote:

> On 2020-Aug-12, Alvaro Herrera wrote:
> 
> > Hmm, we do make the FK constraint depend on the ATTACH for the direct
> > children; what I think we're lacking is dependencies on descendants
> > twice-removed (?) or higher.  This mock patch seems to fix this problem
> > by adding dependencies recursively on all children of the index; I no
> > longer see this problem with it.
> 
> After going over this some more, this analysis seems correct.  Here's a
> better version of the patch which seems final to me.

Pushed.

-- 
Álvaro Herrera        https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: run pgindent on a regular basis / scripted manner

2020-08-14 Thread Alvaro Herrera
On 2020-Aug-13, Magnus Hagander wrote:

> That is:
> 1. Whenever a patch is pushed to master on the main repo, a process is
> kicked off (or maybe waits 5 minutes to coalesce multiple patches if
> there are several)
> 2. This process checks out master, and runs pgindent on it
> 3. When done, this gets committed to a new branch (or just overwrites an
> existing branch of course, we don't need to maintain history here) like
> "master-indented". This branch can be in a different repo, but one that
> starts out as a clone of the main one
> 4. A committer (any committer) can then on a regular basis examine the
> differences by fetch + diff. If they're happy with it, cherry-pick it in.
> If not, figure out what needs to be done to adjust it.

Sounds good -- for branch master.

Yesterday I tried to indent some patch across all branches, only to
discover that I'm lacking the pg_bsd_indent necessary for the older
ones.  I already have two, but apparently I'd need *four* different
versions for the current branches (1.3, 2.0, 2.1, 2.1.1).

-- 
Álvaro Herrera        https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: run pgindent on a regular basis / scripted manner

2020-08-14 Thread Alvaro Herrera
On 2020-Aug-13, Stephen Frost wrote:

> For my 2c, anyway, I like the idea of having folks update the typedefs
> themselves when they've got a patch that needs a new typedef to be
> indented correctly.

Well, let's for starters encourage committers to update typedefs.
Personally I've stayed away from it for each commit just because we've
historically not done it, but I can easily change that.  Plus, it cannot
harm.

-- 
Álvaro Herrera        https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Loose ends after CVE-2020-14350 (extension installation hazards)

2020-08-14 Thread Chapman Flack
On 08/14/20 15:38, Tom Lane wrote:

> (3) If the SQL syntax is really just "WITH variable value [, ...]"
> then I'm afraid we're going to have a lot of parse-ambiguity problems
> with wedging full SET syntax into that.  The ability for the righthand

There is precedent in the SET command for having one general syntax
usable for any GUC, and specialized ones for a few 'special' GUCs
(search_path, client_encoding, timezone).

Maybe WITH could be done the same way, inventing some less thorny syntax
for the general case

   WITH (foo = bar, baz), (quux = 42), XMLBINARY BASE64, a AS (SELECT...)

and treating just the few like XMLBINARY that appear in the standard
as having equivalent specialized productions?

The only examples of the syntax in the standard that are coming to mind
right now are those I've seen in the SQL/XML part, but I feel like I have
seen others, as if the committee kind of likes their WITH local-setting-
of-something syntax, and additional examples may continue to appear.

Regards,
-Chap




Re: [bug+patch] Inserting DEFAULT into generated columns from VALUES RTE

2020-08-14 Thread Mikhail Titov
The previously submitted patch somehow got its trailing spaces mangled on
the way out. This is an attempt to use the application/octet-stream MIME
type instead of text/x-patch to preserve them for the regression tests.

On Thu, Aug 13, 2020 at 12:11 AM, Pavel Stehule  wrote:
> please, assign your patch to commitfest application

Here is the backlink https://commitfest.postgresql.org/29/2681/

--
Mikhail


v2-0001-Test-DEFAULT-in-VALUES-RTE-for-generated-columns.patch
Description: tests


v2-0002-Allow-DEFAULT-in-VALUES-RTE-for-generated-columns.patch
Description: changeset


Re: Loose ends after CVE-2020-14350 (extension installation hazards)

2020-08-14 Thread Tom Lane
Chapman Flack  writes:
> On 08/14/20 14:50, Tom Lane wrote:
>> SAVEPOINT s1;
>> SET LOCAL search_path = pg_catalog, pg_temp;
>> ... protected code here ...
>> RELEASE SAVEPOINT s1;
>> but this does not work because SET LOCAL persists to the end of the
>> outer transaction.  Maybe we could invent a variant that only lasts
>> for the current subtransaction.

> This reminds me of the way the SQL standard overloads WITH to supply
> lexically-scoped settings of things, as well as CTEs, mentioned a while
> back. [1]
> Would this provide additional incentive to implement that syntax,
> generalized to support arbitrary GUCs and not just the handful of
> specific settings the standard uses it for?

Hmm.  I see a few things not to like about that:

(1) It's hard to see how the WITH approach could work for GUCs
that need to take effect during raw parsing, such as the much-detested
standard_conforming_strings.  Ideally we'd not have any such GUCs, for
the reasons explained at the top of gram.y, but I dunno that we'll ever
get there.

(2) We only have WITH for DML (SELECT/INSERT/UPDATE/DELETE), not utility
commands.  Maybe that's enough for the cases at hand.  Or maybe we'd be
willing to do whatever's needful to handle WITH attached to a utility
command, but that could be a pretty large addition of work.

(3) If the SQL syntax is really just "WITH variable value [, ...]"
then I'm afraid we're going to have a lot of parse-ambiguity problems
with wedging full SET syntax into that.  The ability for the righthand
side to be a comma-separated list is certainly going to go out the
window, and we have various other special cases like "SET TIME ZONE"
that aren't going to work.  Again, maybe we don't need a full solution,
but it seems like it's gonna be a kluge.

(4) You'd need to repeat the WITH for each SQL command, IIUC.  Could
get tedious.

So maybe this is worth doing just for more standards compliance, but
it doesn't really seem like a nicer solution than subtransaction-
scoped SET.

regards, tom lane




Re: Loose ends after CVE-2020-14350 (extension installation hazards)

2020-08-14 Thread Chapman Flack
On 08/14/20 14:50, Tom Lane wrote:
>   SAVEPOINT s1;
>   SET LOCAL search_path = pg_catalog, pg_temp;
>   ... protected code here ...
>   RELEASE SAVEPOINT s1;
> 
> but this does not work because SET LOCAL persists to the end of the
> outer transaction.  Maybe we could invent a variant that only lasts
> for the current subtransaction.

This reminds me of the way the SQL standard overloads WITH to supply
lexically-scoped settings of things, as well as CTEs, mentioned a while
back. [1]

Would this provide additional incentive to implement that syntax,
generalized to support arbitrary GUCs and not just the handful of
specific settings the standard uses it for?

Regards,
-Chap



[1] https://www.postgresql.org/message-id/5AAEAE0F.20006%40anastigmatix.net




Loose ends after CVE-2020-14350 (extension installation hazards)

2020-08-14 Thread Tom Lane
Yesterday's releases included some fixes meant to make it harder
for a malicious user to sabotage an extension installation/update
script.  There are some things remaining to be done in the area,
though:

1. We don't have a way to make things adequately secure for extensions
that depend on other extensions.  As an example, suppose that Alice
installs the "cube" extension into schema1 and then installs
"earthdistance" into schema2.  The earthdistance install script will
run with search_path "schema2, schema1".  If schema2 is writable by
Bob, then Bob can create a type "schema2.cube" ahead of time, and
that will capture all the references to cube in the earthdistance
script.  Bob can similarly capture the references to cube functions
in earthdistance's domain constraints.  A partial solution to this
might be to extend the @extschema@ notation to allow specification
of a depended-on extension's schema.  Inventing freely, we could
imagine writing earthdistance's domain creation command like

CREATE DOMAIN earth AS @extschema(cube)@.cube
  CONSTRAINT not_point check(@extschema(cube)@.cube_is_point(value))
  CONSTRAINT not_3d check(@extschema(cube)@.cube_dim(value) <= 3)
  CONSTRAINT on_surface check(abs(@extschema(cube)@.cube_distance(value,
  '(0)'::@extschema(cube)@.cube) /
  earth() - '1'::float8) < '10e-7'::float8);

This is pretty tedious and error-prone; but as long as one is careful
to write only exact matches of function and operator argument types,
it's safe against CVE-2018-1058-style attacks, even if both extensions
are in publicly writable schemas.

However, in itself this can only fix references that are resolved during
execution of the extension script.  I don't see a good way to use the
idea to make earthdistance's SQL functions fully secure.  It won't do
to write, say,

CREATE FUNCTION ll_to_earth(float8, float8)
...
AS 'SELECT @extschema(cube)@.cube(...)';

because this will not survive somebody doing "ALTER EXTENSION cube SET
SCHEMA schema3".  I don't have a proposal for what to do about that.
Admittedly, we already disclaim security if you run queries with a
search_path that contains any untrusted schemas ... but it would be
nice if extensions could be written that (in themselves) are safe
regardless.  Peter E's proposal for parsing SQL function bodies at
creation time could perhaps fix this for SQL functions, but we still
have the issue for other PLs.

2. For the most part, the sorts of DDL commands you might use in an
extension script are not very subject to CVE-2018-1058-style attacks
because they specify relevant functions and operators exactly.  As long
as the objects you want are at the front of the search_path, there's no
way for an attacker to inject another, better match.  I did find one
exception that is not fixed as of today: lookup_agg_function() allows
inexact argument type matches for an aggregate's support functions,
so it could be possible to capture a reference if the intended support
function doesn't exactly match the aggregate's declared input and
transition data types.  This doesn't occur in any contrib modules
fortunately.  I do not see a way to make this better without breaking
the ability to use polymorphic support functions in a non-polymorphic
aggregate, since those are inherently not exact matches.
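
For concreteness, here is a sketch of the legitimate pattern that rules
out a stricter check: a non-polymorphic aggregate whose support function
is polymorphic, so lookup_agg_function has to accept an inexact match
(the aggregate name here is made up):

CREATE AGGREGATE collect_ints (int4) (
    sfunc    = array_append,   -- declared as (anyarray, anyelement)
    stype    = int4[],
    initcond = '{}'
);

The match of array_append against (int4[], int4) is inexact, but it must
remain allowed.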

3. As Christoph Berg noted, the fixes in some extension update scripts
mean that plpgsql has to be installed while those scripts run.  How
much do we care, and if we do, what should we do about it?  I suggested
a band-aid fix of updating the base install scripts so that users don't
typically need to run the update scripts, but that's just a band-aid.
Maybe we could extend SQL enough so that plpgsql isn't needed to do
what those scripts have to do.  I'd initially thought of doing the
search path save-and-restore via

SAVEPOINT s1;
SET LOCAL search_path = pg_catalog, pg_temp;
... protected code here ...
RELEASE SAVEPOINT s1;

but this does not work because SET LOCAL persists to the end of the
outer transaction.  Maybe we could invent a variant that only lasts
for the current subtransaction.
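
A quick illustration of the problem, since RELEASE does not undo the
subtransaction's SET LOCAL:

BEGIN;
SHOW search_path;                     -- e.g. "$user", public
SAVEPOINT s1;
SET LOCAL search_path = pg_catalog, pg_temp;
RELEASE SAVEPOINT s1;
SHOW search_path;                     -- still pg_catalog, pg_temp
COMMIT;                               -- only now is the old value restored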

4. I noticed while testing that hstore--1.0--1.1.sql is completely
useless nowadays, so it might as well get dropped.  It fails with a
syntax error in every still-supported server version, since "=>" is
no longer a legal operator name.  There's no way to load hstore 1.0
into a modern server because of that, either.
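
For the record, the failure is immediate, since the grammar rejects the
operator name before any lookup happens; sketching the 1.0-style
definition:

CREATE OPERATOR => (leftarg = text, rightarg = text, procedure = hstore);
ERROR:  syntax error at or near "=>"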

Thoughts?

regards, tom lane




Re: Switch to multi-inserts for pg_depend

2020-08-14 Thread Alvaro Herrera
On 2020-Aug-14, Michael Paquier wrote:

> Regarding the maximum number of slots allocated.  Do people like the
> current approach taken by the patch to do a single loop of the
> dependency entries at the cost of more allocating perhaps too much for
> the array holding the set of TupleTableSlots (the actual slot
> initialization happens only if necessary)?  Or would it be preferred
> to scan twice the set of dependencies, discarding pinned dependencies
> in a first scan to build the list of dependencies that would be
> inserted?  This way, you can know the exact amount memory to allocated
> for TupleTableSlots, though that's just 64B for each one of them.

It seems a bit silly to worry about allocating just the exact amount
needed; the current approach looked fine to me.  The logic to keep track
of the number of used slots is baroque, though -- that could use a lot of
simplification.

-- 
Álvaro Herrera        https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Dependencies for partitioned indexes are still a mess

2020-08-14 Thread Alvaro Herrera
On 2020-Aug-12, Alvaro Herrera wrote:

> Hmm, we do make the FK constraint depend on the ATTACH for the direct
> children; what I think we're lacking is dependencies on descendants
> twice-removed (?) or higher.  This mock patch seems to fix this problem
> by adding dependencies recursively on all children of the index; I no
> longer see this problem with it.

After going over this some more, this analysis seems correct.  Here's a
better version of the patch which seems final to me.

I'm not yet clear on whether the noisy DROP INDEX is an actual bug that
needs to be fixed, or instead it needs to be left alone.

-- 
Álvaro Herrera        https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 07d8af1885edafbd670efc52faa15824787a1dc2 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Fri, 14 Aug 2020 13:26:20 -0400
Subject: [PATCH v2] pg_dump: fix dependencies on FKs to multi-level
 partitioned tables
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Parallel-restoring a foreign key that references a partitioned table
with several levels of partitions can fail:

pg_restore: while PROCESSING TOC:
pg_restore: from TOC entry 6684; 2606 29166 FK CONSTRAINT fk fk_a_fkey postgres
pg_restore: error: could not execute query: ERROR:  there is no unique constraint matching given keys for referenced table "pk"
Command was: ALTER TABLE fkpart3.fk
ADD CONSTRAINT fk_a_fkey FOREIGN KEY (a) REFERENCES fkpart3.pk(a);

This happens in parallel restore mode because some index partitions
aren't yet attached to the topmost partitioned index that the FK uses,
and so the index is still invalid.  The current code marks the FK as
dependent on the first level of index-attach dump objects; the bug is
fixed by recursively marking the FK on their children.

Reported-by: Tom Lane 
Author: Álvaro Herrera 
Discussion: https://postgr.es/m/3170626.1594842...@sss.pgh.pa.us
---
 src/bin/pg_dump/pg_dump.c | 39 ---
 1 file changed, 32 insertions(+), 7 deletions(-)

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 9c8436dde6..2cb3f9b083 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -235,6 +235,7 @@ static DumpableObject *createBoundaryObjects(void);
 static void addBoundaryDependencies(DumpableObject **dobjs, int numObjs,
 	DumpableObject *boundaryObjs);
 
+static void addConstrChildIdxDeps(DumpableObject *dobj, IndxInfo *refidx);
 static void getDomainConstraints(Archive *fout, TypeInfo *tyinfo);
 static void getTableData(DumpOptions *dopt, TableInfo *tblinfo, int numTables, char relkind);
 static void makeTableDataInfo(DumpOptions *dopt, TableInfo *tbinfo);
@@ -7517,25 +7518,20 @@ getConstraints(Archive *fout, TableInfo tblinfo[], int numTables)
 			reftable = findTableByOid(constrinfo[j].confrelid);
 			if (reftable && reftable->relkind == RELKIND_PARTITIONED_TABLE)
 			{
-IndxInfo   *refidx;
 Oid			indexOid = atooid(PQgetvalue(res, j, i_conindid));
 
 if (indexOid != InvalidOid)
 {
 	for (int k = 0; k < reftable->numIndexes; k++)
 	{
-		SimplePtrListCell *cell;
+		IndxInfo   *refidx;
 
 		/* not our index? */
 		if (reftable->indexes[k].dobj.catId.oid != indexOid)
 			continue;
 
 		refidx = &reftable->indexes[k];
-		for (cell = refidx->partattaches.head; cell;
-			 cell = cell->next)
-			addObjectDependency(&constrinfo[j].dobj,
-								((DumpableObject *)
-								 cell->ptr)->dumpId);
+		addConstrChildIdxDeps(&constrinfo[j].dobj, refidx);
 		break;
 	}
 }
@@ -7548,6 +7544,35 @@ getConstraints(Archive *fout, TableInfo tblinfo[], int numTables)
 	destroyPQExpBuffer(query);
 }
 
+/*
+ * addConstrChildIdxDeps
+ *
+ * Recursive subroutine for getConstraints
+ *
+ * Given an object representing a foreign key constraint and an index on the
+ * partitioned table it references, mark the constraint object as dependent
+ * on the DO_INDEX_ATTACH object of each index partition, recursively
+ * drilling down to their partitions if any.  This ensures that the FK is not
+ * restored until the index is fully marked valid.
+ */
+static void
+addConstrChildIdxDeps(DumpableObject *dobj, IndxInfo *refidx)
+{
+	SimplePtrListCell *cell;
+
+	Assert(dobj->objType == DO_FK_CONSTRAINT);
+
+	for (cell = refidx->partattaches.head; cell; cell = cell->next)
+	{
+		IndexAttachInfo *attach = (IndexAttachInfo *) cell->ptr;
+
+		addObjectDependency(dobj, attach->dobj.dumpId);
+
+		if (attach->partitionIdx->partattaches.head != NULL)
+			addConstrChildIdxDeps(dobj, attach->partitionIdx);
+	}
+}
+
 /*
  * getDomainConstraints
  *
-- 
2.20.1



Re: Parallel query hangs after a smart shutdown is issued

2020-08-14 Thread Tom Lane
Arseny Sher  writes:
> FWIW, I've also looked through the patch and it's fine. Moderate testing
> also found no issues, check-world works, bgws are started during smart
> shutdown as expected. And surely this is better than the initial
> short hack of allowing only parallel workers.

Thanks, appreciate the extra look.  It's pushed now.

regards, tom lane




Re: EDB builds Postgres 13 with an obsolete ICU version

2020-08-14 Thread Bruce Momjian
On Fri, Aug 14, 2020 at 10:23:27PM +0900, Michael Paquier wrote:
> We have nothing in core, yet, that helps with this kind of problem
> with binary upgrades.  In the last year, Julien and I worked on an
> upgrade case where a glibc upgrade was involved with pg_upgrade used
> for PG, and it could not afford the use of a new host to allow a
> logical dump/restore to rebuild the indexes from scratch.  You can
> always run a "reindex -a" after the upgrade to be sure that no indexes
> are broken because of the changes with collation versions, but once
> you have to give the guarantee that an upgrade does not take longer
> than a certain amount of time, the reindex easily becomes the
> bottleneck.  That's one motivation behind the recent work to add
> collation versions to pg_depend entries, which would lead to more
> filtering facilities for REINDEX on the backend to get for example the
> option to only reindex collation-sensitive indexes (imagine just a
> reindexdb --jobs with the collation filtering done at table-level,
> that would be fast, or a script doing this work generated by
> pg_upgrade).

Agreed --- only a small percentage of indexes are affected by
collations, and it would be great if we could tell users how to easily
identify them.
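
As a rough first cut, something along these lines could list them (a
sketch only; it also flags indexes that use the database default
collation, which may or may not be sort-order sensitive):

SELECT DISTINCT i.indexrelid::regclass AS index_name, c.collname
FROM pg_index i
JOIN LATERAL unnest(i.indcollation::oid[]) AS u(coll) ON true
JOIN pg_collation c ON c.oid = u.coll
WHERE c.collname NOT IN ('C', 'POSIX');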

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Parallel query hangs after a smart shutdown is issued

2020-08-14 Thread Arseny Sher


Tom Lane  writes:

> Thomas Munro  writes:
>> On Fri, Aug 14, 2020 at 4:45 AM Tom Lane  wrote:
>>> After some more rethinking and testing, here's a v5 that feels
>>> fairly final to me.  I realized that the logic in canAcceptConnections
>>> was kind of backwards: it's better to check the main pmState restrictions
>>> first and then the smart-shutdown restrictions afterwards.
>
>> LGTM.  I tested this a bit today and it did what I expected for
>> parallel queries and vacuum, on primary and standby.
>
> Thanks for reviewing!  I'll do the back-patching and push this today.

FWIW, I've also looked through the patch and it's fine. Moderate testing
also found no issues, check-world works, bgws are started during smart
shutdown as expected. And surely this is better than the initial
short hack of allowing only parallel workers.




Re: Implement a new data type

2020-08-14 Thread Tom Lane
mohand oubelkacem makhoukhene  writes:
> I would like to implement a new data type alongside char, number,
> varchar..., for example a special "Money" type, but I don't want to use
> extensions and the CREATE TYPE command.  I want to implement it directly
> inside the source code, because I want to implement my new type at a
> lower level, in order to add some more sophisticated functionality later.

Why, and exactly what do you think you'd accomplish?

Postgres is meant to be an extensible system, which in general means that
anything that can be done in core code could be done in an extension.
Admittedly there are lots of ways that we fall short of that goal, but
new data types tend not to be one of them.  The only big difference
between an extension datatype and a core one is that for a core type
you have to construct all the catalog entries "by hand" by making
additions to the include/catalog/*.dat files, which is tedious and
error-prone.
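
For comparison, the extension route needs no catalog hand-editing at all;
the usual shell-type dance is enough.  A minimal sketch, assuming the C
implementations of the I/O functions exist in your module:

CREATE TYPE my_money;   -- shell type, so the I/O functions can refer to it

CREATE FUNCTION my_money_in(cstring) RETURNS my_money
    AS 'MODULE_PATHNAME' LANGUAGE C IMMUTABLE STRICT;
CREATE FUNCTION my_money_out(my_money) RETURNS cstring
    AS 'MODULE_PATHNAME' LANGUAGE C IMMUTABLE STRICT;

CREATE TYPE my_money (
    input = my_money_in,
    output = my_money_out,
    internallength = 8,
    passedbyvalue
);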

> Just as an example,  help the optimizer in its decisions.

The core optimizer is pretty darn data-type-agnostic, and should
remain so IMO.  There are callbacks, such as selectivity estimators
and planner support functions, that contain specific knowledge of
particular functions and data types; and those facilities are available
to extensions too.

regards, tom lane




Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits

2020-08-14 Thread Ibrar Ahmed
On Mon, Aug 3, 2020 at 2:29 PM Anastasia Lubennikova <
a.lubennik...@postgrespro.ru> wrote:

> On 31.07.2020 23:28, Robert Haas wrote:
> > On Tue, Jul 14, 2020 at 1:51 PM Anastasia Lubennikova
> >  wrote:
> >> Questions from the first review pass:
> >>
> >> 1) Do we need XLH_INSERT_ALL_VISIBLE_SET ? IIUC, in the patch it is
> always
> >> implied by XLH_INSERT_ALL_FROZEN_SET.
> > I agree that it looks unnecessary to have two separate bits.
> >
> >> 2) What does this comment mean? Does XXX refers to the lsn comparison?
> >> Since it
> >> is definitely necessary to update the VM.
> >>
> >> + * XXX: This seems entirely unnecessary?
> >> + *
> >> + * FIXME: Theoretically we should only do this after we've
> >> + * modified the heap - but it's safe to do it here I think,
> >> + * because this means that the page previously was empty.
> >> + */
> >> +if (lsn > PageGetLSN(vmpage))
> >> +visibilitymap_set(reln, blkno, InvalidBuffer, lsn,
> >> vmbuffer,
> >> +  InvalidTransactionId, vmbits);
> > I wondered about that too. The comment which precedes it was, I
> > believe, originally written by me, and copied here from
> > heap_xlog_visible(). But it's not clear very good practice to just
> > copy the comment like this. If the same logic applies, the code should
> > say that we're doing the same thing here as in heap_xlog_visible() for
> > consistency, or some such thing; after all, that's the primary place
> > where that happens. But it looks like the XXX might have been added by
> > a second person who thought that we didn't need this logic at all, and
> > the FIXME by a third person who thought it was in the wrong place, so
> > the whole thing is really confusing at this point.
> >
> > I'm pretty worried about this, too:
> >
> > + * FIXME: setting recptr here is a dirty dirty hack, to
> prevent
> > + * visibilitymap_set() from WAL logging.
> >
> > That is indeed a dirty hack, and something needs to be done about it.
> >
> > I wonder if it was really all that smart to try to make the
> > HEAP2_MULTI_INSERT do this instead of just issuing separate WAL
> > records to mark it all-visible afterwards, but I don't see any reason
> > why this can't be made to work. It needs substantially more polishing
> > than it's had, though, I think.
> >
> New version of the patch is in the attachment.
>
> New design is more conservative and simpler:
> - pin the visibility map page in advance;
> - set PageAllVisible;
> - call visibilitymap_set() with its own XlogRecord, just like in other
> places.
>
> It allows removing most of the "hacks" and keeps the code clean.
> The patch passes tests added in previous versions.
>
> I haven't tested performance yet, though. Maybe after tests, I'll bring
> some optimizations back.
>
> --
> Anastasia Lubennikova
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>
Here are some performance results with a patched and unpatched master
branch.
The table used for the test contains three columns (integer, text,
varchar).
The total number of rows is 1000.

Unpatched (Master: 92c12e46d5f1e25fc85608a6d6a19b8f5ea02600)
COPY: 9069.432 ms vacuum; 2567.961ms
COPY: 9004.533 ms vacuum: 2553.075ms
COPY: 8832.422 ms vacuum: 2540.742ms

Patched (Master: 92c12e46d5f1e25fc85608a6d6a19b8f5ea02600)
COPY: 10031.723 ms vacuum: 127.524 ms
COPY: 9985.109  ms vacuum: 39.953 ms
COPY: 9283.373  ms vacuum: 37.137 ms

The time to perform the COPY increased slightly, but the vacuum time
decreased significantly.
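
For reference, the runs presumably had roughly this shape (a sketch only;
the table and data file names are assumed):

BEGIN;
CREATE TABLE t (a integer, b text, c varchar);
COPY t FROM '/path/to/data.csv' WITH (FORMAT csv, FREEZE);
COMMIT;
VACUUM t;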

-- 
Ibrar Ahmed


Implement a new data type

2020-08-14 Thread mohand oubelkacem makhoukhene
Hello

I would like to implement a new data type alongside char, number,
varchar..., for example a special "Money" type, but I don't want to use
extensions and the CREATE TYPE command.  I want to implement it directly
inside the source code, because I want to implement my new type at a
lower level, in order to add some more sophisticated functionality later.
Just as an example, helping the optimizer in its decisions.
How should I proceed? Is it an easy task?

Thanks
Mohand



Re: jsonb, collection & postgres_fdw

2020-08-14 Thread Tom Lane
Konstantin Knizhnik  writes:
> I still do not completely understand the current criteria for shippable
> functions.
> I understood Tom's explanation, but:

> postgres=# create table t1(t text collate "C");
> CREATE TABLE
> postgres=# create foreign table ft1(t text collate "ru_RU") server 
> pg_fdw options (table_name 't1');
> CREATE FOREIGN TABLE
> postgres=# explain select * from ft1 where lower(t)='some';
>   QUERY PLAN
> 
>   Foreign Scan on ft1  (cost=100.00..132.07 rows=7 width=32)
> (1 row)

> lower(t) is pushed to the remote server despite the fact that "t" has
> different collations on the local and remote servers.

Well, that's the case because you lied while creating the foreign
table.  We have no practical way to cross-check whether the foreign
table's declaration is an accurate representation of the remote table,
so we just take it on faith that it is.

The problem that the collation check is trying to solve is that we
can't safely push COLLATE clauses to the remote server, because it
may not have the same set of collation names as the local server.
So we can only push clauses whose collation is entirely derivable
from the table column(s) they use.  And then, per the above, we rely on
the user to make sure that the local and remote columns have equivalent
collations.  (Which conceivably would have different names.)
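
To make that concrete with the ft1 example above (a sketch):

-- the collation derives entirely from the foreign column, so the qual
-- can be shipped:
EXPLAIN (VERBOSE) SELECT * FROM ft1 WHERE lower(t) = 'some';

-- an explicit COLLATE introduces a collation the remote side may not
-- have, so this qual should be evaluated locally instead:
EXPLAIN (VERBOSE) SELECT * FROM ft1 WHERE lower(t COLLATE "POSIX") = 'some';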

>  From my point of view, it would be nice to have a flag on the
> postgres_fdw server indicating that the local and remote servers are
> identical, and to treat all functions as shippable in this case (not
> only built-in ones and those belonging to explicitly specified
> shippable extensions).

Perhaps, but not everyone has that use-case.  I'd even argue that it's
a minority use-case.

regards, tom lane




Re: Inconsistent behavior of smart shutdown handling for queries with and without parallel workers

2020-08-14 Thread Tom Lane
Bharath Rupireddy  writes:
> Is there any way for bgworkers (or, for that matter, any of the
> postmaster's child processes) to know that there's a smart shutdown
> pending? This would be useful if a bgworker (other than a parallel
> worker) wants to differentiate between the two modes, i.e. smart and
> fast shutdown, and finish off its work accordingly.

With the patch I'm working on, the approach is basically that smart
shutdown changes nothing except for not allowing new connections ...
until the last regular connection is gone, at which point it starts to
act exactly like fast shutdown.  So in those terms there is no need for
bgworkers to know the difference.  If a bgworker did act differently
during the initial phase of a smart shutdown, that would arguably be
a bug, just as it's a bug that parallel query isn't working.

regards, tom lane




Re: Parallel query hangs after a smart shutdown is issued

2020-08-14 Thread Tom Lane
Thomas Munro  writes:
> On Fri, Aug 14, 2020 at 4:45 AM Tom Lane  wrote:
>> After some more rethinking and testing, here's a v5 that feels
>> fairly final to me.  I realized that the logic in canAcceptConnections
>> was kind of backwards: it's better to check the main pmState restrictions
>> first and then the smart-shutdown restrictions afterwards.

> LGTM.  I tested this a bit today and it did what I expected for
> parallel queries and vacuum, on primary and standby.

Thanks for reviewing!  I'll do the back-patching and push this today.

>> I'm assuming we want to back-patch this as far as 9.6, where parallel
>> query began to be a thing.

> Yeah.  I mean, it's more radical than what I thought we'd be doing for
> this, but you could get into real trouble by running in smart shutdown
> mode without the autovac infrastructure alive.

Right.  99.99% of the time, that early shutdown doesn't really cause
any problems, which is how we've gotten away with it this long.  But if
someone did leave session(s) running for a long time after issuing the
SIGTERM, the results could be bad --- and there's no obvious benefit
to the early shutdowns anyway.

regards, tom lane




Re: Autonomous database is coming to Postgres?

2020-08-14 Thread Dmitry Dolgov
> On Fri, Aug 14, 2020 at 08:55:53AM -0400, Bruce Momjian wrote:
> > On Thu, Aug 13, 2020 at 03:26:33AM +, tsunakawa.ta...@fujitsu.com wrote:
> > Hello,
> >
> > I'm not sure if I should have posted this to pgsql-advocacy, but this
> > is being developed, so I posted here.
> > Does anyone know if this development will come to open source Postgres,
> > or only to the cloud services of Microsoft and Google?
> > (I wonder if this will become another reason that Postgres won't
> > incorporate an optimizer hint feature.)
> >
> > Data systems that learn to be better
> > http://news.mit.edu/2020/mit-data-systems-learn-be-better-tsunami-bao-0810
>
> It seems interesting, but I don't know anyone working on this.

Tim Kraska mentioned on Twitter plans to release BAO as an open source
project (a PostgreSQL extension, I guess?), but there seems to have been
no interaction with the community.




Re: Newline after --progress report

2020-08-14 Thread Tom Lane
Heikki Linnakangas  writes:
> While hacking on pg_rewind, this in pg_rewind's main() function caught 
> my eye:

Good catch.

> Attached is a patch to fix this, as well as a similar issue in 
> pg_checksums. pg_basebackup and pgbench also print progres reports like 
> this, but they seem correct to me.

I wonder whether it'd be better to push the responsibility for this
into progress_report(), by adding an additional parameter "bool last"
or the like.  Then the callers would not need such an unseemly amount
of knowledge about what progress_report() is doing.

regards, tom lane




Re: EDB builds Postgres 13 with an obsolete ICU version

2020-08-14 Thread Michael Paquier
On Fri, Aug 14, 2020 at 09:00:06AM -0400, Bruce Momjian wrote:
> On Tue, Aug 11, 2020 at 02:58:30PM +0200, Magnus Hagander wrote:
>> I assumed it had code for that stuff already. Mainly because I assumed it
>> supported doing pg_upgrade, which requires similar things no?
> 
> While pg_upgrade requires having the old and new cluster software in
> place, I don't think it helps allowing different ICU versions for each
> cluster.  I guess you can argue that if you know the user is _not_ going
> to be using pg_upgrade, then a new ICU version should be used for the
> new cluster.

We have nothing in core, yet, that helps with this kind of problem
with binary upgrades.  In the last year, Julien and I worked on an
upgrade case where a glibc upgrade was involved with pg_upgrade used
for PG, and it could not afford the use of a new host to allow a
logical dump/restore to rebuild the indexes from scratch.  You can
always run a "reindex -a" after the upgrade to be sure that no indexes
are broken because of the changes with collation versions, but once
you have to give the guarantee that an upgrade does not take longer
than a certain amount of time, the reindex easily becomes the
bottleneck.  That's one motivation behind the recent work to add
collation versions to pg_depend entries, which would lead to more
filtering facilities for REINDEX on the backend to get for example the
option to only reindex collation-sensitive indexes (imagine just a
reindexdb --jobs with the collation filtering done at table-level,
that would be fast, or a script doing this work generated by
pg_upgrade).
--
Michael




Re: EDB builds Postgres 13 with an obsolete ICU version

2020-08-14 Thread Bruce Momjian
On Tue, Aug 11, 2020 at 02:58:30PM +0200, Magnus Hagander wrote:
> On Tue, Aug 4, 2020 at 11:42 AM Dave Page  wrote:
> That would require fairly large changes to the installer to allow it to
> login to the database server (whether that would work would be dependent 
> on
> how pg_hba.conf is configured), and also assumes that the ICU ABI hasn't
> changed between releases. It would also require some hacky renaming of
> DLLs, as they have the version number in them.
> 
> I assumed it had code for that stuff already. Mainly because I assumed it
> supported doing pg_upgrade, which requires similar things no?

While pg_upgrade requires having the old and new cluster software in
place, I don't think it helps allowing different ICU versions for each
cluster.  I guess you can argue that if you know the user is _not_ going
to be using pg_upgrade, then a new ICU version should be used for the
new cluster.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Autonomous database is coming to Postgres?

2020-08-14 Thread Bruce Momjian
On Thu, Aug 13, 2020 at 03:26:33AM +, tsunakawa.ta...@fujitsu.com wrote:
> Hello,
> 
> I'm not sure if I should have posted this to pgsql-advocacy, but this is
> being developed, so I posted here.
> 
> Does anyone know if this development will come to open source Postgres,
> or only to the cloud services of Microsoft and Google?
> 
> (I wonder if this will become another reason that Postgres won't
> incorporate an optimizer hint feature.)
> 
> Data systems that learn to be better
> http://news.mit.edu/2020/mit-data-systems-learn-be-better-tsunami-bao-0810

It seems interesting, but I don't know anyone working on this.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: display offset along with block number in vacuum errors

2020-08-14 Thread Amit Kapila
On Thu, Aug 6, 2020 at 7:41 PM Amit Kapila  wrote:
>
> On Wed, Aug 5, 2020 at 12:47 AM Mahendra Singh Thalor
>  wrote:
> >
> > Apart from these, I fixed Justin's comment about extra brackets (that
> > was due to "patch -p 1 < file", as 002_fix was not applying directly).
> > I haven't updated the documentation for this (Sawada's comment). I will
> > try in the next patch.
> > Attaching v4 patch for review.
> >
>
> Few comments on the latest patch:
> 1.
> @@ -2640,6 +2659,7 @@ lazy_truncate_heap(Relation onerel, LVRelStats
> *vacrelstats)
>   */
>   new_rel_pages = count_nondeletable_pages(onerel, vacrelstats);
>   vacrelstats->blkno = new_rel_pages;
> + vacrelstats->offnum = InvalidOffsetNumber;
>
> Do we really need to update the 'vacrelstats->offnum' here when we
> have already set it to InvalidOffsetNumber in the caller?
>

I have removed this change.

> 2.
> @@ -3574,8 +3605,14 @@ vacuum_error_callback(void *arg)
>   {
>   case VACUUM_ERRCB_PHASE_SCAN_HEAP:
>   if (BlockNumberIsValid(errinfo->blkno))
> - errcontext("while scanning block %u of relation \"%s.%s\"",
> -errinfo->blkno, errinfo->relnamespace, errinfo->relname);
> + {
> + if (OffsetNumberIsValid(errinfo->offnum))
> + errcontext("while scanning block %u of relation \"%s.%s\", item offset %u",
> +
>
> I am not completely sure if this error message is an improvement over
> what you have in the initial version of the patch ("while scanning block
> %u and offset %u of relation \"%s.%s\"", ...). I see that Justin has
> raised a concern above about whether users will be aware of 'offset',
> but I also see that we have used it in a few other messages in the
> code.

I have changed the message to what you have in the original patch.

Apart from the above, I have also reset the offset number back to
InvalidOffsetNumber in lazy_scan_heap after processing all the tuples;
otherwise, it would erroneously display the wrong offset if any error
occurred afterward.

Let me know what you think of the changes done in the latest patch.

-- 
With Regards,
Amit Kapila.




Re: display offset along with block number in vacuum errors

2020-08-14 Thread Amit Kapila
On Fri, Aug 7, 2020 at 7:18 AM Amit Kapila  wrote:
>
> On Thu, Aug 6, 2020 at 7:51 PM Justin Pryzby  wrote:
> >
> > On Thu, Aug 06, 2020 at 07:39:21PM +0530, Amit Kapila wrote:
> > > On Wed, Jul 29, 2020 at 1:09 AM Justin Pryzby  
> > > wrote:
> > > >
> > > >
> > > > lazy_check_needs_freeze iterates over blocks and this patch changes it 
> > > > to
> > > > update vacrelstats.  I think it should do what
> > > > lazy_{vacuum/cleanup}_heap/page/index do and call 
> > > > update_vacuum_error_info() at
> > > > its beginning (even though only the offset is changed), and then
> > > > restore_vacuum_error_info() at its end (to "revert back" to the item 
> > > > number it
> > > > started with).
> > > >
> > >
> > > I see that Mahendra has changed patch as per this suggestion but I am
> > > not convinced that it is a good idea to sprinkle
> > > update_vacuum_error_info()/restore_vacuum_error_info() at places more
> > > than required. I see that it might look a bit clean from the
> > > perspective that if tomorrow we use the function
> > > lazy_check_needs_freeze() for a different purpose then we don't need
> > > to worry about the wrong phase information. If we are worried about
> > > that then we should have an assert in that function to ensure that the
> > > current phase is VACUUM_ERRCB_PHASE_SCAN_HEAP.
> >
> > The motivation was to restore the offnum, which is set to Invalid at the 
> > start
> > of lazy_scan_heap(), and then set valid within lazy_check_needs_freeze, but
> > should be restored or re-set to Invalid when returns to lazy_scan_heap().  
> > If
> > you think it's important, we could just set vacrelstats->offnum = Invalid
> > before returning,
> >
>
> Yeah, I would prefer that and probably a comment to indicate why we
> are doing that.
>

Changed accordingly in the updated patch.

-- 
With Regards,
Amit Kapila.




Re: display offset along with block number in vacuum errors

2020-08-14 Thread Amit Kapila
On Mon, Aug 10, 2020 at 10:24 AM Masahiko Sawada
 wrote:
>
> It's true that heap_page_is_all_visible() is called only from
> lazy_vacuum_page(), but I'm concerned it would be misleading since
> it's not actually removing tuples but just checking after vacuum. I
> guess that the errcontext should show what the process is actually
> doing and therefore help the investigation, so I thought VACUUM_HEAP
> might not be appropriate for this case. But I also see your point.
> The other vacuum error context phases match the vacuum progress
> information phases. So in heap_page_is_all_visible(), I agree it's
> better to update the offset number and keep the phase VACUUM_HEAP
> rather than do nothing.
>

Okay, I have changed it accordingly, and this means that the offset will
be displayed for the vacuum phase as well. Apart from this, I have
addressed all the comments I raised in the attached patch. One thing
we need to think about is whether we want to set the offset during
heap_page_prune when called from lazy_scan_heap. I think the code path
for heap_prune_chain is quite deep, so an error can occur in that path.
What do you think?

-- 
With Regards,
Amit Kapila.


v5-0001-Add-additional-information-in-vacuum-errcontext.patch
Description: Binary data


Re: Yet another fast GiST build (typo)

2020-08-14 Thread Pavel Borisov
This feature seems quite useful in concept, so I decided to test it.

On a real database of 7 million rows I observed a 4x speedup for a
single-column index on points, and a 2.5x speedup for an index on points
with several included columns, with a standard deviation of about 10%
across a series of measurements. The index size savings were 1.7x and
1.5x respectively. The points were spread across all four quadrants.

On random points, same as the query in the original message, I observed
a 3x speedup with the patch. Then I generated the same point set shifted
so that all points fall into one quadrant, and didn't observe a
difference from the previous case. So the anomaly in the Morton curve is
probably not big enough to induce a noticeable slowdown on a fully
random set. But since the ordering is done only for the index and not
used outside it, it seems possible to shift the floating-point
coordinates relative to the bottom-left corner point, making all of them
positive, to avoid the Morton curve anomaly near quadrant transitions.

Of course, speed measurements depend a lot on the machine and
configuration, but I am sure there is a noticeable difference in index
build time, and this is quite valuable for end users who build GiST
indexes on point-type data. Furthermore, the same speedup applies to
REINDEX CONCURRENTLY; a shorter rebuild time also means fewer
user-modified table rows during the rebuild that have to be integrated
into the new index afterwards.

This patch can also be seen as a step toward introducing other ordering
algorithms, e.g. the Gilbert curve, and I consider this feature useful
and worth committing.

Both patches 0001 and 0002, when applied to version 14dev, compile and
work cleanly. The regression tests pass.
The code seems clean and legible to me.

In the declarations I see slightly different styles for similar arguments
(pointers vs. arrays):
extern IndexTuple gistFormTuple(GISTSTATE *giststate, Relation r, Datum
*attdata, bool *isnull, bool isleaf);
extern IndexTuple gistCompressValusAndFormTuple(GISTSTATE *giststate,
Relation r, Datum attdata[], bool isnull[], bool isleaf, Datum compatt[]);
I suppose this is because gistFormTuple previously had different styles in
its declaration and definition. Maybe it would be nice to settle on one
style throughout the code; I propose pointers instead of [].

In a big comment
/*
+* In this function we need to compute Morton codes for non-integral
+* components p->x and p->y. But Morton codes are defined only for
+* integral values.
I don't quite catch the meaning of "non-integral" and "integral" and
propose replacing them with "float" and "integer".

Also, there are some extra spaces before the line
"prev_level_start = level_start;"
and after
"The argument is a pointer to a SortSupport
struct."

Overall I find the patch useful and almost ready for commit.

On Tue, Aug 4, 2020 at 21:28, Andrey M. Borodin wrote:

>
>
> > On Jul 30, 2020, at 06:26, Thomas Munro wrote:
> >
> > On Fri, Jul 10, 2020 at 6:55 PM Andrey M. Borodin 
> wrote:
> >> Thanks! Fixed.
> >
> > It's not a bug, but I think those 64 bit constants should be wrapped
> > in UINT64CONST(), following our convention.
> Thanks, fixed!
>
> > I'm confused about these two patches: 0001 introduces
> > gist_point_fastcmp(), but then 0002 changes it to gist_bbox_fastcmp().
> > Maybe you intended to keep both of them?  Also 0002 seems to have
> > fixups for 0001 squashed into it.
> Indeed, those were fixups: a point converted to GiST representation is
> already a bbox, and the function expects only bboxes.
>
> Also I've fixed some mismerges in documentation.
>
> Thanks!
>
> Best regards, Andrey Borodin.
>
>

-- 
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com 


Commit/abort WAL records with dropped rels missing XLR_SPECIAL_REL_UPDATE

2020-08-14 Thread Heikki Linnakangas
While hacking on pg_rewind, I noticed that commit and abort WAL records 
are never marked with the XLR_SPECIAL_REL_UPDATE flag. But if the record 
contains "dropped relfilenodes", surely it should be?


It's harmless as far as the backend and all the programs in PostgreSQL 
repository are concerned, but the point of XLR_SPECIAL_REL_UPDATE is to 
aid external tools that try to track which files are modified. Attached 
is a patch to fix it.
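
For illustration, an external tool scanning WAL with the xlogreader API
could then cheaply detect such records along these lines (a sketch,
assuming xlogreader is the tool's XLogReaderState;
handle_special_rel_update is a hypothetical callback, not an existing
function):

	/* after XLogReadRecord() has returned a valid record */
	if (XLogRecGetInfo(xlogreader) & XLR_SPECIAL_REL_UPDATE)
	{
		/*
		 * The record creates, drops or truncates relation files in a
		 * way that is not visible from the block references alone.
		 */
		handle_special_rel_update(xlogreader);
	}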


It's always been like that, but I am not going to backport, for fear of 
breaking existing applications. If a program reads the WAL, and would 
actually need to do something with commit records dropping relations, 
that seems like such a common scenario that the author should've thought 
about it and handled it even without the flag reminding about it. Fixing 
it in master ought to be enough.


Thoughts?

- Heikki
>From cafc42a1dfe6afc9d36c0440dd45a5b12d9a6063 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Wed, 12 Aug 2020 23:07:50 +0300
Subject: [PATCH 1/2] Make xact.h usable in frontend.

xact.h included utils/datetime.h, which cannot be used in the frontend
(it includes fmgr.h, which needs Datum). But xact.h only needs the
definition of TimestampTz from it, which is available directly in
datatypes/timestamp.h. Change xact.h to include that instead of
utils/datetime.h, so that it can be used in client programs.
---
 src/backend/nodes/params.c   | 1 +
 src/backend/utils/time/snapmgr.c | 2 ++
 src/include/access/xact.h| 2 +-
 3 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/backend/nodes/params.c b/src/backend/nodes/params.c
index 1719119fc28..bce0c7e72b2 100644
--- a/src/backend/nodes/params.c
+++ b/src/backend/nodes/params.c
@@ -16,6 +16,7 @@
 #include "postgres.h"
 
 #include "access/xact.h"
+#include "fmgr.h"
 #include "mb/stringinfo_mb.h"
 #include "nodes/params.h"
 #include "parser/parse_node.h"
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 6b6c8571e23..02b51bc4fe0 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -53,6 +53,7 @@
 #include "access/xact.h"
 #include "access/xlog.h"
 #include "catalog/catalog.h"
+#include "datatype/timestamp.h"
 #include "lib/pairingheap.h"
 #include "miscadmin.h"
 #include "storage/predicate.h"
@@ -67,6 +68,7 @@
 #include "utils/resowner_private.h"
 #include "utils/snapmgr.h"
 #include "utils/syscache.h"
+#include "utils/timestamp.h"
 
 
 /*
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index c18554bae2c..9f9dc7d3354 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -16,11 +16,11 @@
 
 #include "access/transam.h"
 #include "access/xlogreader.h"
+#include "datatype/timestamp.h"
 #include "lib/stringinfo.h"
 #include "nodes/pg_list.h"
 #include "storage/relfilenode.h"
 #include "storage/sinval.h"
-#include "utils/datetime.h"
 
 /*
  * Maximum size of Global Transaction ID (including '\0').
-- 
2.20.1

>From f04db8fee6508000d3fdbbcaf1577186959ac310 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Fri, 14 Aug 2020 11:36:18 +0300
Subject: [PATCH 2/2] Mark commit and abort WAL records with
 XLR_SPECIAL_REL_UPDATE.

If a commit or abort record includes "dropped relfilenodes", then replaying
the record will remove data files. That is surely a "special rel update",
but the records were not marked as such.

Teach pg_rewind to expect and ignore them, and add a test case to cover it.

It's always been like this, but no backporting for fear of breaking
existing applications. If an application parses the WAL but is not handling
commit/abort records, it would stop working. That might be a good thing if
it would actually need to handle the dropped rels, but it will be caught
when the application is updated to work with PostgreSQL v14 anyway.
---
 src/backend/access/transam/xact.c |  2 ++
 src/bin/pg_rewind/parsexlog.c | 13 +
 src/bin/pg_rewind/t/001_basic.pl  | 15 ++-
 3 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 7ccb7d68ed9..af6afcebb13 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -5565,6 +5565,7 @@ XactLogCommitRecord(TimestampTz commit_time,
 	{
 		xl_xinfo.xinfo |= XACT_XINFO_HAS_RELFILENODES;
 		xl_relfilenodes.nrels = nrels;
+		info |= XLR_SPECIAL_REL_UPDATE;
 	}
 
 	if (nmsgs > 0)
@@ -5697,6 +5698,7 @@ XactLogAbortRecord(TimestampTz abort_time,
 	{
 		xl_xinfo.xinfo |= XACT_XINFO_HAS_RELFILENODES;
 		xl_relfilenodes.nrels = nrels;
+		info |= XLR_SPECIAL_REL_UPDATE;
 	}
 
 	if (TransactionIdIsValid(twophase_xid))
diff --git a/src/bin/pg_rewind/parsexlog.c b/src/bin/pg_rewind/parsexlog.c
index 2325fb5d302..2229c86f9af 100644
--- a/src/bin/pg_rewind/parsexlog.c
+++ b/src/bin/pg_rewind/parsexlog.c
@@ -14,6 +14,7 @@
 #include 
 
 #include "access/rmgr.h"
+#include "access/xact.h"
 #include "access/xlog_internal.h"
 

Re: Fix an old description in high-availability.sgml

2020-08-14 Thread Michael Paquier
On Fri, Aug 14, 2020 at 04:53:54PM +0900, Masahiko Sawada wrote:
> The following sentence in high-availability.sgml is not true:
> 
> The background writer is active during recovery and will perform
> restartpoints (similar to checkpoints on the primary) and normal block
> cleaning activities.
> 
> I think this is an oversight of the commit 806a2ae in 2011; the
> checkpointer process started to be responsible for creating
> checkpoints.

Good catch it is.  Your phrasing looks good to me.
--
Michael


signature.asc
Description: PGP signature


Re: Switch to multi-inserts for pg_depend

2020-08-14 Thread Michael Paquier
On Thu, Aug 13, 2020 at 11:45:52AM -0400, Alvaro Herrera wrote:
> MAX_CATALOG_INSERT_BYTES sounds decent to me.  I mentioned dependency.h
> because I was uncaffeinatedly thinking that this was used with API
> defined there -- but in reality it's used with indexing.h functions, and
> it seems to me that that file would be the place for it.

OK, let's live with indexing.h then.

Regarding the maximum number of slots allocated: do people like the
current approach taken by the patch, doing a single loop over the
dependency entries at the cost of perhaps allocating too much for the
array holding the set of TupleTableSlots (the actual slot initialization
happens only if necessary)?  Or would it be preferred to scan the set of
dependencies twice, discarding pinned dependencies in a first scan to
build the list of dependencies that would be inserted?  This way, you
would know the exact amount of memory to allocate for the TupleTableSlots,
though that's just 64B for each one of them.
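
For reference, the single-pass pattern under discussion looks roughly like
this (a condensed sketch, not the exact patch code; skip_dependency() is a
stand-in for the pinned-object check, and filling the slot is elided):

	TupleTableSlot **slot;
	CatalogIndexState indstate = CatalogOpenIndexes(rel);
	int			slot_init_count = 0;
	int			slot_stored_count = 0;
	ListCell   *lc;

	slot = palloc(sizeof(TupleTableSlot *) * max_slots);
	foreach(lc, dependencies)
	{
		if (skip_dependency(lfirst(lc)))	/* e.g. a pinned dependency */
			continue;

		/* initialize a slot only once it is actually needed */
		if (slot_init_count < max_slots)
			slot[slot_init_count++] =
				MakeSingleTupleTableSlot(RelationGetDescr(rel),
										 &TTSOpsHeapTuple);

		/* ... fill slot[slot_stored_count], ExecStoreVirtualTuple() ... */

		if (++slot_stored_count == max_slots)
		{
			/* flush the full batch of slots in one multi-insert */
			CatalogTuplesMultiInsertWithInfo(rel, slot, slot_stored_count,
											 indstate);
			slot_stored_count = 0;
		}
	}
	if (slot_stored_count > 0)
		CatalogTuplesMultiInsertWithInfo(rel, slot, slot_stored_count,
										 indstate);
	/* ... drop the slots and CatalogCloseIndexes(indstate) ... */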

I have read through the patch set from Julien and Thomas today, which adds
a version string to pg_depend, and I get the impression that the current
approach taken by this patch fits better in the whole picture.
--
Michael


signature.asc
Description: PGP signature


Re: Terminate the idle sessions

2020-08-14 Thread Li Japin

On Aug 14, 2020, at 2:15 PM, Bharath Rupireddy
<bharath.rupireddyforpostg...@gmail.com> wrote:

I think, since idle_session_timeout is disabled by default, we have no
problem. My thought is: what if a user enables the feature (knowingly or
unknowingly) on the remote backend? If the user knows about the above
scenario, that may be fine. On the other hand, we could either always
disable the feature on the remote backend (at the beginning of the remote
txn, like we set some other configuration settings; see
configure_remote_session() in connection.c), or mention the above scenario
in this feature's documentation.

Though we can disable the idle_session_timeout when using postgres_fdw,
there still exist locally cached connection cache entries when the remote
sessions are terminated unexpectedly.  AFAIK, you have provided a patch to
solve this problem, and it is in the current CF [1].

[1] - https://commitfest.postgresql.org/29/2651/

Best Regards,
Japin Li.


Newline after --progress report

2020-08-14 Thread Heikki Linnakangas
While hacking on pg_rewind, this in pg_rewind's main() function caught 
my eye:


progress_report(true);
printf("\n");

It is peculiar, because progress_report() uses fprintf(stderr, ...) for 
all its printing, and in fact the only other use of printf() in 
pg_rewind is in printing the "pg_rewind --help" text.


I think the idea here was to move to the next line after
progress_report() has updated the progress line for the last time. It
probably also should not be printed when "--progress" is not used.


Attached is a patch to fix this, as well as a similar issue in 
pg_checksums. pg_basebackup and pgbench also print progress reports like 
this, but they seem correct to me.
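
For context, the usual pattern inside these progress_report() functions is
roughly the following (an illustrative sketch; the variable names are made
up):

	fprintf(stderr, "%s/%s kB (%d%%) copied", done_str, total_str, percent);

	/* stay on the same line if writing to a terminal */
	if (isatty(fileno(stderr)))
		fprintf(stderr, "\r");
	else
		fprintf(stderr, "\n");

So after the final report, a bare '\n' is only needed in the TTY case, and
it must go to stderr, where the rest of the line was written.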


- Heikki
>From b4a64c3aca7257336c3ca1bf4bee8675e148de10 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Wed, 12 Aug 2020 23:27:01 +0300
Subject: [PATCH 1/1] Fix printing last progress report line in client
 programs.

A number of client programs have a "--progress" option that, when printing
to a TTY, updates the current line by printing a '\r' and overwriting it.
After the last line, an extra '\n' needs to be printed to move the cursor
to the next line. pg_basebackup and pgbench got this right, but pg_rewind
and pg_checksums were slightly wrong. pg_rewind printed the newline to
stdout instead of stderr, and pg_checksums printed the newline even when
not printing to a TTY. Fix them.
---
 src/bin/pg_checksums/pg_checksums.c | 3 ++-
 src/bin/pg_rewind/pg_rewind.c   | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index 1daa5aed0e0..9ae884897ef 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -626,7 +626,8 @@ main(int argc, char *argv[])
 		if (showprogress)
 		{
 			progress_report(true);
-			fprintf(stderr, "\n");	/* Need to move to next line */
+			if (isatty(fileno(stderr)))
+fprintf(stderr, "\n");	/* Need to move to next line */
 		}
 
 		printf(_("Checksum operation completed\n"));
diff --git a/src/bin/pg_rewind/pg_rewind.c b/src/bin/pg_rewind/pg_rewind.c
index 0015d3b461a..c000c12fa58 100644
--- a/src/bin/pg_rewind/pg_rewind.c
+++ b/src/bin/pg_rewind/pg_rewind.c
@@ -422,7 +422,8 @@ main(int argc, char **argv)
 	executeFileMap();
 
 	progress_report(true);
-	printf("\n");
+	if (showprogress && isatty(fileno(stderr)))
+		fprintf(stderr, "\n");	/* Need to move to next line */
 
 	if (showprogress)
 		pg_log_info("creating backup label and updating control file");
-- 
2.20.1



Re: massive FPI_FOR_HINT load after promote

2020-08-14 Thread Simon Riggs
On Mon, 10 Aug 2020 at 23:56, Alvaro Herrera  wrote:
>
> The problem was simply that when a page is
> examined by a seqscan, we do HeapTupleSatisfiesVisibility of each tuple
> in isolation; and for each tuple we call SetHintBits().  And only the
> first time the FPI happens; by the time we get to the second tuple, the
> page is already dirty, so there's no need to emit an FPI.  But the FPI
> we sent only had the bit on the first tuple ... so the standby will not
> have the bit set for any subsequent tuple.  And on promotion, the
> standby will have to have the bits set for all those tuples, unless you
> happened to dirty the page again later for other reasons.

Which probably means that pg_rewind is broken because it won't be able
to rewind correctly.

> One simple idea to try to forestall this problem would be to modify the
> algorithm so that all tuples are scanned and hinted if the page is going
> to be dirtied -- then send a single FPI setting bits for all tuples,
> instead of just on the first tuple.

This would make latency much worse for non-seqscan cases.

Certainly for seqscans it would make sense to emit a message that sets
all tuples at once, or possibly emit an FPI and then follow that with
a second message that sets all other hints on the page.
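
A hypothetical sketch of the seqscan variant (none of this is existing
code; initialization details such as t_self/t_tableOid are omitted):

	/*
	 * Before dirtying the page for the first hint bit, visit every tuple,
	 * so that the single FPI emitted when the buffer is first dirtied
	 * carries all the hint bits, not just the first tuple's.
	 */
	static void
	page_set_all_hint_bits(Buffer buffer, Snapshot snapshot)
	{
		Page		page = BufferGetPage(buffer);
		OffsetNumber offnum;
		OffsetNumber maxoff = PageGetMaxOffsetNumber(page);

		for (offnum = FirstOffsetNumber; offnum <= maxoff;
			 offnum = OffsetNumberNext(offnum))
		{
			ItemId		itemid = PageGetItemId(page, offnum);
			HeapTupleData tuple;

			if (!ItemIdIsNormal(itemid))
				continue;
			tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
			tuple.t_len = ItemIdGetLength(itemid);

			/* sets hint bits on the page as a side effect */
			(void) HeapTupleSatisfiesVisibility(&tuple, snapshot, buffer);
		}
	}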

-- 
Simon Riggs    http://www.2ndQuadrant.com/
Mission Critical Databases




Fix an old description in high-availability.sgml

2020-08-14 Thread Masahiko Sawada
Hi,

The following sentence in high-availability.sgml is not true:

The background writer is active during recovery and will perform
restartpoints (similar to checkpoints on the primary) and normal block
cleaning activities.

I think this is an oversight of the commit 806a2ae in 2011; the
checkpointer process started to be responsible for creating
checkpoints.

I've attached the patch.

Regards,

-- 
Masahiko Sawada    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


fix_high_availability_doc.patch
Description: Binary data


Re: Collation versioning

2020-08-14 Thread Michael Paquier
On Fri, Aug 14, 2020 at 02:21:58PM +1200, Thomas Munro wrote:
> Thanks Julien.  I'm planning to do a bit more testing and review, and
> then hopefully commit this next week.  If anyone else has objections
> to this design, now would be a good time to speak up.

The design to use pg_depend for the version string and rely on an
unknown state for indexes whose collations are unknown has a clear
consensus, so nothing to say about that.  It looks like this will
benefit from using multi-INSERTs with pg_depend, actually.

I have read through the patch, and there are a couple of portions that
could be improved and/or simplified.

 /*
- * Adjust all dependency records to come from a different object of the same type
  * Swap all dependencies of and on the old index to the new one, and
+ * vice-versa, while preserving any referenced version for the original owners.
+ * Note that a call to CommandCounterIncrement() would cause duplicate entries
+ * in pg_depend, so this should not be done.
+ */
+void
+swapDependencies(Oid classId, Oid firstObjectId, Oid secondObjectId)
+{
+   changeDependenciesOf(classId, firstObjectId, secondObjectId, true);
+   changeDependenciesOn(classId, firstObjectId, secondObjectId);
+
+   changeDependenciesOf(classId, secondObjectId, firstObjectId, true);
+   changeDependenciesOn(classId, secondObjectId, firstObjectId);
+}

The comment on top of the routine is wrong, as it could apply to
something other than indexes.  Anyway, I don't think there is much
value in adding this API as the only part where this counts is
relation swapping for reindex concurrently.  It could also be possible
that this breaks some extension code by making those static to
pg_depend.c.

-long
+static long
 changeDependenciesOf(Oid classId, Oid oldObjectId,
-Oid newObjectId)
+Oid newObjectId, bool preserve_version)
All the callers of changeDependenciesOf() set the new argument to
true, making the false path dead, even if it just implies that the
argument is null.  I would suggest to keep the original function
signature.  If somebody needs a version where they don't want to
preserve the version, it could just be added later.

+* We don't want to record redundant depedencies that are used
+* to track versions to avoid redundant warnings in case of
s/depedencies/dependencies/

+   /*
+* XXX For deterministic collations, we should only track the version
+* if the AM relies on a stable ordering.
+*/
+   if (determ_colls)
+   {
+   /* XXX check if the AM relies on a stable ordering */
+   recordDependencyOnCollations(, determ_colls, true);
Some cleanup needed here?  Wouldn't it be better to address the issues
with stable ordering first?

+   /* recordDependencyOnSingleRelExpr get rid of duplicated
entries */
s/get/gets/, incorrect grammar.

+   /* XXX should we warn about "disappearing" versions? */
+   if (current_version)
+   {
Something to do here?

+   /*
+* We now support versioning for the underlying collation library on
+* this system, or previous version is unknown.
+*/
+   if (!version || (strcmp(version, "") == 0 && strcmp(current_version,
+   "") != 0))
Strange diff format here.

+static char *
+index_check_collation_version(const ObjectAddress *otherObject,
+ const char *version,
+ void *userdata)
All the new functions in index.c should have more documentation and
comments to explain what they do.

+   foreach(lc, collations)
+   {
+   ObjectAddress referenced;
+
+   ObjectAddressSet(referenced, CollationRelationId, lfirst_oid(lc));
+
+   recordMultipleDependencies(myself, , 1,
+  DEPENDENCY_NORMAL, record_version);
+   }
I think that you could just use an array of ObjectAddresses here, fill
in a set of ObjectAddress objects and just call
recordMultipleDependencies() for all of them?  Just create a set using
new_object_addresses(), register them with add_exact_object_address(),
and then finish the job with record_object_address_dependencies().
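
Something like this, for instance (an illustrative sketch; the
version-aware recording that this patch adds is left out, and myself is
assumed to be the depender's ObjectAddress pointer, as in the quoted
code):

	ObjectAddresses *addrs;
	ListCell   *lc;

	addrs = new_object_addresses();
	foreach(lc, collations)
	{
		ObjectAddress referenced;

		ObjectAddressSet(referenced, CollationRelationId, lfirst_oid(lc));
		add_exact_object_address(&referenced, addrs);
	}
	record_object_address_dependencies(myself, addrs, DEPENDENCY_NORMAL);
	free_object_addresses(addrs);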
--
Michael


signature.asc
Description: PGP signature


Re: jsonb, collection & postgres_fdw

2020-08-14 Thread Konstantin Knizhnik

On 14.08.2020 09:40, Bharath Rupireddy wrote:

On Thu, Aug 13, 2020 at 8:54 PM Konstantin Knizhnik
 wrote:

Right now jsonb functions are treated as non-shippable by postgres_fdw,
and so predicates with them are not pushed down to the foreign server:

I wonder if there is some way of making postgres_fdw push this
function to the foreign server?
Maybe this check should be changed to:

  if (fe->inputcollid == InvalidOid || inner_cxt.state ==
FDW_COLLATE_NONE)
   /* OK, inputs are all noncollatable */ ;


I think, in general, we may want to push some of the local functions that
may filter out tuples/rows to the remote backend to reduce the data
transfer (assuming collation and other settings are similar to those of
the local backend), but definitely not this way. One possible issue could
be: what if these functions are supported/installed on the local server,
but not on the remote? Maybe because the remote postgres server version is
different from that of the local one? Is there a version check between
local and remote servers in postgres_fdw?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


Right now postgres_fdw treats as shippable only built-in functions or
functions from extensions explicitly specified as shippable extensions in
the parameters of this FDW server, so I do not see a problem here. Yes, a
foreign server may have a different version of Postgres which doesn't have
this built-in function, or its profile may be different. That can happen
if postgres_fdw is used to connect two different servers which are
maintained independently. But in most cases, I think, postgres_fdw is used
to organize some kind of cluster. In this case all nodes are identical
(hardware, OS, postgres version) and performance is very critical (because
scalability is one of the goals of replacing a single node with a
cluster).

This is why push-down of predicates is very critical in this case.
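
For comparison, the existing escape hatch is the server-level "extensions"
option, which declares whole extensions as shippable (a sketch; the server
and extension names are made up):

CREATE SERVER pg_fdw FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'remote-host', dbname 'postgres', extensions 'myext');

A hypothetical "identical servers" flag would generalize this from
per-extension shippability to everything.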

I still do not completely understand the current criteria for shippable
functions.

I understood Tom's explanation, but:

postgres=# create table t1(t text collate "C");
CREATE TABLE
postgres=# create foreign table ft1(t text collate "ru_RU") server 
pg_fdw options (table_name 't1');

CREATE FOREIGN TABLE
postgres=# explain select * from ft1 where lower(t)='some';
                         QUERY PLAN
-------------------------------------------------------------
 Foreign Scan on ft1  (cost=100.00..132.07 rows=7 width=32)
(1 row)

lower(t) is pushed to the remote server despite the fact that "t" has
different collations on the local and remote servers.

Also, when initializing a postgres database, you can specify the default
collation. I have not found any place in postgres_fdw which tries to check
whether the default collations of the remote and local servers are the
same, or to specify the collation explicitly when they are different.

From my point of view, it would be nice to have a flag in the
postgres_fdw server definition indicating that the local and remote
servers are identical, and to treat all functions as shippable in this
case (not only built-in ones and those belonging to explicitly specified
shippable extensions).

It would simplify using postgres_fdw in clusters and make it more
efficient.






Re: jsonb, collection & postgres_fdw

2020-08-14 Thread Bharath Rupireddy
On Thu, Aug 13, 2020 at 8:54 PM Konstantin Knizhnik
 wrote:
>
> Right now jsonb functions are treated as non-shippable by postgres_fdw,
> and so predicates with them are not pushed down to the foreign server:
>
> I wonder if there is some way of making postgres_fdw push this
> function to the foreign server?
> Maybe this check should be changed to:
>
>  if (fe->inputcollid == InvalidOid || inner_cxt.state ==
> FDW_COLLATE_NONE)
>   /* OK, inputs are all noncollatable */ ;
>

I think, in general, we may want to push some of the local functions that
may filter out tuples/rows to the remote backend to reduce the data
transfer (assuming collation and other settings are similar to those of
the local backend), but definitely not this way. One possible issue could
be: what if these functions are supported/installed on the local server,
but not on the remote? Maybe because the remote postgres server version is
different from that of the local one? Is there a version check between
local and remote servers in postgres_fdw?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Terminate the idle sessions

2020-08-14 Thread Bharath Rupireddy
On Tue, Aug 11, 2020 at 8:45 AM Li Japin  wrote:
>
> I've attached a new version that adds "idle_session_timeout" to the
> default postgresql.conf.
>

Hi, I would like to just mention a use case I thought of while discussing [1]:

In postgres_fdw: assuming idle_session_timeout is in use on remote
backends, the remote sessions will be closed after the timeout, but the
locally cached connection cache entries still exist and become stale.
The subsequent queries that may use the cached connections will fail.
Of course these subsequent queries can retry the connections, but only at
the beginning of a remote txn, not in the middle of one, as is being
discussed in [2]. For instance, in a long-running local txn, say we use a
remote connection at the beginning of the local txn (note that this opens
a remote session whose entry is cached in the local connection cache), and
we next use the cached connection only later at some point in the local
txn; if by then the idle_session_timeout has expired on the remote backend
and the remote session has been closed, the long-running local txn will
fail instead of succeeding.

I think, since idle_session_timeout is disabled by default, we have no
problem. My thought is: what if a user enables the feature (knowingly or
unknowingly) on the remote backend? If the user knows about the above
scenario, that may be fine. On the other hand, we could either always
disable the feature on the remote backend (at the beginning of the remote
txn, like we set some other configuration settings; see
configure_remote_session() in connection.c), or mention the above scenario
in this feature's documentation.
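
For illustration, always disabling it remotely would amount to issuing one
more statement from configure_remote_session(), along the lines of (a
sketch, assuming the GUC keeps this name and that 0 means disabled, as for
the existing timeout GUCs):

    SET idle_session_timeout = 0;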

[1] - 
https://www.postgresql.org/message-id/CALj2ACU1NBQo9mihA15dFf6udkOi7m0u2_s5QJ6dzk%3DZQyVbwQ%40mail.gmail.com
[2] - 
https://www.postgresql.org/message-id/flat/CALj2ACUAi23vf1WiHNar_LksM9EDOWXcbHCo-fD4Mbr1d%3D78YQ%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-08-14 Thread Amit Kapila
On Fri, Aug 14, 2020 at 10:11 AM Thomas Munro  wrote:
>
> On Thu, Aug 13, 2020 at 6:38 PM Amit Kapila  wrote:
> > I have pushed that patch last week and attached are the remaining
> > patches. I have made a few changes in the next patch
> > 0001-Extend-the-BufFile-interface.patch and have some comments on it
> > which are as below:
>
> Hi Amit,
>
> I noticed that Konstantin Knizhnik's CF entry 2386 calls
> table_scan_XXX() functions from an extension, namely
> contrib/auto_explain, and started failing to build on Windows after
> commit 7259736a.  This seems to be due to the new global variables
> CheckXidAlive and bsysscan, which probably need PGDLLIMPORT if they
> are accessed from inline functions that are part of the API that we
> expect extensions to be allowed to call.
>

Yeah, that makes sense. I will take care of that later today or
tomorrow. We have not noticed that because currently none of the
extensions is using those functions. BTW, I noticed that after
failure, the next run is green, why so? Is the next run not on
windows?
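
For reference, the fix amounts to marking the declarations in the header
along these lines (a sketch of the relevant lines; the types are assumed
to match the definitions introduced by commit 7259736a):

	extern PGDLLIMPORT TransactionId CheckXidAlive;
	extern PGDLLIMPORT bool bsysscan;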

-- 
With Regards,
Amit Kapila.




Re: Parallel query hangs after a smart shutdown is issued

2020-08-14 Thread Thomas Munro
On Fri, Aug 14, 2020 at 4:45 AM Tom Lane  wrote:
> I wrote:
> > Hmmm ... maybe that should be more like
> > if (smartShutState != SMART_NORMAL_USAGE &&
> > backend_type == BACKEND_TYPE_NORMAL)
>
> After some more rethinking and testing, here's a v5 that feels
> fairly final to me.  I realized that the logic in canAcceptConnections
> was kind of backwards: it's better to check the main pmState restrictions
> first and then the smart-shutdown restrictions afterwards.

LGTM.  I tested this a bit today and it did what I expected for
parallel queries and vacuum, on primary and standby.

> I'm assuming we want to back-patch this as far as 9.6, where parallel
> query began to be a thing.

Yeah.  I mean, it's more radical than what I thought we'd be doing for
this, but you could get into real trouble by running in smart shutdown
mode without the autovac infrastructure alive.