Collation versioning

2018-09-04 Thread Thomas Munro
Hello,

While reviewing the ICU versioning work a while back, I mentioned the
idea of using a user-supplied command to get a collversion string for
the libc collation provider.  I was reminded about that by recent news
about an upcoming glibc/CLDR resync that is likely to affect
PostgreSQL users (though, I guess, probably only when they do a major
OS upgrade).  Here's an experimental patch to try that idea out.  For
example, you might set it like this:

libc_collation_version_command = 'md5
/usr/share/locale/@LC_COLLATE@/LC_COLLATE | sed "s/.* = //"'

... or, on a Debian system using the locales package, like this:

libc_collation_version_command = 'dpkg -s locales | grep Version: |
sed "s/Version: //"'

Using the checksum approach, it works like this:

postgres=# alter collation "xx_XX" refresh version;
NOTICE:  changing version from b88d621596b7e61337e832f7841066a9 to
7b008442fbaf5dfe7a10fb3d82a634ab
ALTER COLLATION
postgres=# select * from pg_collation where collname = 'xx_XX';
-[ RECORD 1 ]-+-
collname  | xx_XX
collnamespace | 2200
collowner | 10
collprovider  | c
collencoding  | 6
collcollate   | en_US.UTF-8
collctype | UTF-8
collversion   | 7b008442fbaf5dfe7a10fb3d82a634ab

When the collation definition changes you get the desired scary
warning on next attempt to use it in a fresh backend:

postgres=# select * from t order by v;
WARNING:  collation "xx_XX" has version mismatch
DETAIL:  The collation in the database was created using version
b88d621596b7e61337e832f7841066a9, but the operating system provides
version 7b008442fbaf5dfe7a10fb3d82a634ab.
HINT:  Rebuild all objects affected by this collation and run ALTER
COLLATION public."xx_XX" REFRESH VERSION, or build PostgreSQL with the
right library version.

The problem is that it isn't in effect at initdb time so if you add
that later it only affects new locales.  You'd need a way to do that
during init to capture the imported system locale versions, and that's
a really ugly string to have to pass into some initdb option.  Ugh.

Another approach would be to decide that we're willing to put
non-portable version-extracting magic in pg_locale.c.  On a long
flight I hacked my libc to store a version string (based on CLDR
version or whatever) in its binary locale definitions and provide a
proper interface to ask for it, modelled on querylocale(3):

const char *querylocaleversion(int mask, locale_t locale);

Then the patch for pg_locale.c is trivial, see attached.  While I
could conceivably try to convince my local friendly OS to take such a
patch, the real question is how to deal with glibc.  Does anyone know
of a way to extract a version string from glibc using existing
interfaces?  I heard there was an undocumented way but I haven't been
able to find it -- probably because I was, erm, looking in the
documentation.

Or maybe this isn't worth bothering with, and we should just build out
the ICU support and then make it the default and be done with it.

In passing, here's a patch to add tab completion for ALTER COLLATION
... REFRESH VERSION.

-- 
Thomas Munro
http://www.enterprisedb.com


0001-Auto-complete-ALTER-COLLATION-.-REFRESH-VERSION.patch
Description: Binary data


0001-Add-libc_collation_version_command-GUC.patch
Description: Binary data




Re: Bug fix for glibc broke freebsd build in REL_11_STABLE

2018-09-04 Thread Andrew Gierth
> "Andres" == Andres Freund  writes:

 >> However, this commit broke float8 test on 32-bit FreeBSD 11 with
 >> clang 3.8.0 compiler. Regressions.diff follows:

 Andres> Does this happen with a newer clang version too?

float8 test (and all other tests) passes for me on clang 3.9.1 on fbsd11
on 32-bit ARM, and on -m32 builds on amd64.

I also confirmed that without #define isinf(x) __builtin_isinf(x), on
both 32bit and 64bit fbsd isinf() compiles as a function call, so the
OP's proposed change would not be desirable.

-- 
Andrew (irc:RhodiumToad)



Re: pointless check in RelationBuildPartitionDesc

2018-09-04 Thread Alvaro Herrera
Proposed patch.  Checking isnull in an elog(ERROR) is important, because
the column is not marked NOT NULL.  This is not true for other columns
where we simply do Assert(!isnull).

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
commit dc5f9933595c95471898e506e004a1799ea4eae5
Author: Alvaro Herrera 
AuthorDate: Tue Sep 4 13:24:25 2018 -0300
CommitDate: Tue Sep 4 13:27:30 2018 -0300

fix relpartbound accesses

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index f46af41b56..ffdd14b819 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -773,9 +773,6 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,

                                 InvalidOid,
                                 typaddress);
 
-   /* Store inheritance information for new rel. */
-   StoreCatalogInheritance(relationId, inheritOids, stmt->partbound != NULL);
-
/*
 * We must bump the command counter to make the newly-created relation
 * tuple visible for opening.
@@ -869,6 +866,9 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,
heap_close(parent, NoLock);
}
 
+   /* Store inheritance information for new rel. */
+   StoreCatalogInheritance(relationId, inheritOids, stmt->partbound != NULL);
+
/*
 * Process the partitioning specification (if any) and store the partition
 * key information into the catalog.
@@ -14705,7 +14705,9 @@ ATExecDetachPartition(Relation rel, RangeVar *name)
 
(void) SysCacheGetAttr(RELOID, tuple, Anum_pg_class_relpartbound,
                       &isnull);
-   Assert(!isnull);
+   if (isnull)
+   elog(ERROR, "null relpartbound for relation %u",
+RelationGetRelid(partRel));
 
/* Clear relpartbound and reset relispartition */
memset(new_val, 0, sizeof(new_val));
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 9015a05d32..4ed9920618 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -1641,8 +1641,9 @@ get_qual_for_range(Relation parent, PartitionBoundSpec *spec,
datum = SysCacheGetAttr(RELOID, tuple,
                        Anum_pg_class_relpartbound,
                        &isnull);
+   if (isnull)
+   elog(ERROR, "null relpartbound for relation %u", inhrelid);
 
-   Assert(!isnull);
bspec = (PartitionBoundSpec *)
    stringToNode(TextDatumGetCString(datum));
if (!IsA(bspec, PartitionBoundSpec))
diff --git a/src/backend/utils/cache/partcache.c b/src/backend/utils/cache/partcache.c
index 115a9fe78f..2fc876b7d8 100644
--- a/src/backend/utils/cache/partcache.c
+++ b/src/backend/utils/cache/partcache.c
@@ -302,23 +302,11 @@ RelationBuildPartitionDesc(Relation rel)
if (!HeapTupleIsValid(tuple))
elog(ERROR, "cache lookup failed for relation %u", inhrelid);
 
-   /*
-    * It is possible that the pg_class tuple of a partition has not been
-    * updated yet to set its relpartbound field.  The only case where
-    * this happens is when we open the parent relation to check using its
-    * partition descriptor that a new partition's bound does not overlap
-    * some existing partition.
-    */
-   if (!((Form_pg_class) GETSTRUCT(tuple))->relispartition)
-   {
-   ReleaseSysCache(tuple);
-   continue;
-   }
-
datum = SysCacheGetAttr(RELOID, tuple,
                        Anum_pg_class_relpartbound,
                        &isnull);
-   Assert(!isnull);
+   if (isnull)
+   elog(ERROR, "null relpartbound for partition %u", inhrelid);
boundspec = (Node *) stringToNode(TextDatumGetCString(datum));
 
/*
@@ -883,9 +871,8 @@ generate_partition_qual(Relation rel)
boundDatum = SysCacheGetAttr(RELOID, tuple,
                             Anum_pg_class_relpartbound,
                             &isnull);
-   if (isnull) /* should not happen */
-   elog(ERROR, "relation \"%s\" has relpartbound = null",
-

Re: Collation versioning

2018-09-04 Thread Thomas Munro
On Tue, Sep 4, 2018 at 10:02 AM Thomas Munro
 wrote:
> const char *querylocaleversion(int mask, locale_t locale);
>
> Then the patch for pg_locale.c is trivial, see attached.

Oops, here's that one, FWIW.

-- 
Thomas Munro
http://www.enterprisedb.com


0001-Use-querylocaleversion-3-if-available.patch
Description: Binary data


Re: Add SKIP LOCKED to VACUUM and ANALYZE

2018-09-04 Thread Michael Paquier
On Tue, Sep 04, 2018 at 03:49:09PM +, Bossart, Nathan wrote:
> Yes.  I've started working on this again, but the new patch set is
> probably still a few days out.

Thanks, Nathan.
--
Michael


signature.asc
Description: PGP signature


Re: PostgreSQL logical decoder output plugin - unchanged toast data

2018-09-04 Thread Andres Freund
Hi,

On 2018-09-04 11:45:10 +0200, Georgy Buranov wrote:
> Hi Andres. Thank you very much for your help.
> 
> I tried the following solution and I have some problems.
> 
> * I have 9.6 postgres and I do not have separate field for rd_pkindex
> * All I have is rd_replidindex field. Usually (when REPLICA IDENTITY
> is NOT FULL), it still contains the primary key
> * But in the case of REPLICA IDENTITY FULL - rd_replidindex is NULL
> and rd_pkindex does not exist
> 
> So, is there a way to get the primary key on 9.6 postgres?

You have to query the syscache yourself, in that case.  I know
nothing about what you're doing in your output plugin.  In the ones I
wrote there's a side-cache for per-relation output plugin specific data;
do you have something like that?  You can probably just crib the code
for the lookup from dblink.c:get_pkey_attnames().  If you have a cache,
it might be worth caching the results.

Greetings,

Andres Freund



Re: Enable using IS NOT DISTINCT FROM in hash and merge joins

2018-09-04 Thread Alexander Kuzmenkov

Hi,

I heard from my colleagues that we have an extension that does a similar
thing. I'm attaching the source. It creates an operator "==" for some data
types that works like "IS NOT DISTINCT FROM". You can then perform hash
joins on this operator. Apparently the hash join machinery supports
non-strict operators, but I'm not sure about the hash indexes.


I think the question we have to answer is: what prevents us from having
hash and merge joins on non-strict operators? Hash joins seem to work
already, so we can just create a custom operator and use it. The merge join
executor can be adapted to work as well, but the planner would require
more complex changes. Just adding a check for DistinctExpr to
check_mergejoinable probably breaks equivalence classes. The problem
with merge join planning is that it has the notion of a "mergejoinable
operator", which is a strict btree equality operator, and it uses such
operators both to perform merge joins and to conclude that two
variables must be equal (that is, to create equivalence classes). If we are
going to perform merge joins on some other kinds of operators, we have
to disentangle these two uses. I had to do this to support merge joins
on inequality clauses; you can take a look at this thread if you wish:
https://www.postgresql.org/message-id/flat/b31e1a2d-5ed2-cbca-649e-136f1a7c4c31%40postgrespro.ru



--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



fulleq.7z
Description: application/7z-compressed


Re: Bug fix for glibc broke freebsd build in REL_11_STABLE

2018-09-04 Thread Thomas Munro
On Tue, Sep 4, 2018 at 9:39 AM Andrew Gierth
 wrote:
>
> > "Andres" == Andres Freund  writes:
>
>  >> However, this commit broke float8 test on 32-bit FreeBSD 11 with
>  >> clang 3.8.0 compiler. Regressions.diff follows:
>
>  Andres> Does this happen with a newer clang version too?
>
> float8 test (and all other tests) passes for me on clang 3.9.1 on fbsd11
> on 32-bit ARM, and on -m32 builds on amd64.
>
> I also confirmed that without #define isinf(x) __builtin_isinf(x), on
> both 32bit and 64bit fbsd isinf() compiles as a function call, so the
> OP's proposed change would not be desirable.

I installed FreeBSD 11.2 i386 on a virtual machine.  I couldn't
reproduce the problem with either the base cc (clang 6.0.0) or clang38
(clang 3.8.1) installed via pkg.

The OP reported clang 3.8.0, so a minor version behind what I tested.

I did learn that "make check" fails in rolenames if your Unix user is
called "user".

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Incorrect use of errcode_for_file_access in backend code

2018-09-04 Thread Michael Paquier
On Sun, Sep 02, 2018 at 01:07:47PM -0700, Michael Paquier wrote:
> Wouldn't something like the attached provide more adapted error
> handling?  That's mostly error handling beautification, so I would be
> incline to just fix HEAD.
> 
> (I have noticed some inconsistent error string format in autoprewarm.c
> on the way.)

Pushed as d6e98eb as that will reduce a bit more translator's work,
after noticing one error string in basebackup.c which could be unified
with the rest.
--
Michael


signature.asc
Description: PGP signature


Re: patch to allow disable of WAL recycling

2018-09-04 Thread Tomas Vondra
Hi,

So here is the last set of benchmark results, this time from ext4 on a
small SATA-based RAID (3 x 7.2k). As before, I'm only attaching PDFs
with the simple charts, full results are available in the git repository
[1]. Overall the numbers are rather boring, with almost no difference
between the two setups.

That being said, I'm not opposed to introducing the GUC. I'm not going
to pretend my tests represents all possible HW configs and workloads,
and I have no trouble believing that it may be quite beneficial in some
cases.

The one comment about the code is that we usually use the actual default
value in the config sample. But the patch does this:

+#wal_recycle = off # do not recycle WAL files

while the GUC is defined like this:

{
{"wal_recycle", PGC_SUSET, WAL_SETTINGS,
gettext_noop("WAL recycling enabled."),
NULL
},
&wal_recycle,
true,
NULL, NULL, NULL
},

So the default is actually "on" which makes the commented-out config
sample rather confusing.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


ext4-sata.pdf
Description: Adobe PDF document


Re: speeding up planning with partitions

2018-09-04 Thread David Rowley
On 30 August 2018 at 00:06, Amit Langote  wrote:
> With various overheads gone thanks to 0001 and 0002, locking of all
> partitions via find_all_inheritors can be seen as the single largest
> bottleneck, which 0003 tries to address.  I've kept it a separate patch,
> because I'll need to think a bit more to say that it's actually safe to
> defer locking to late planning, due mainly to the concern about the change
> in the order of locking from the current method.  I'm attaching it here,
> because I also want to show the performance improvement we can expect with it.

For now, find_all_inheritors() locks the tables in ascending Oid
order.  This makes sense with inheritance parent tables as it's much
cheaper to sort on this rather than on something like the table's
namespace and name.  I see no reason why what we sort on really
matters, as long as we always sort on the same thing and the key we
sort on is always unique so that the locking order is well defined.

For partitioned tables, there's really not much sense in sticking to
the same lock by Oid order. The order of the PartitionDesc is already
well defined so I don't see any reason why we can't just perform the
locking in PartitionDesc order. This would mean you could perform the
locking of the partitions once pruning is complete somewhere around
add_rel_partitions_to_query(). Also, doing this would remove the need
for scanning pg_inherits during find_all_inheritors() and would likely
further speed up the planning of queries to partitioned tables with
many partitions. I wrote a function named
get_partition_descendants_worker() to do this in patch 0002 in [1] (it
may have been a bad choice to have made this a separate function
rather than just part of find_all_inheritors() as it meant I had to
change a bit too much code in tablecmds.c). There might be something
you can salvage from that patch to help here.

[1] 
https://www.postgresql.org/message-id/CAKJS1f9QjUwQrio20Pi%3DyCHmnouf4z3SfN8sqXaAcwREG6k0zQ%40mail.gmail.com

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] git down

2018-09-04 Thread Tom Lane
Sandeep Thakkar  writes:
> I'm seeing this warning (git.postgresql.org[0: 2620:122:b000:7::243]:
> errno=Network is unreachable) on one of the buildfarm animals (anole) from
> last few weeks. The git version is 1.7.4.2. When I use https, it errors
> "fatal: cannot exec 'git-remote-https': Permission denied"

Well, the rest of the buildfarm seems fine, so the problem must be local.
Maybe the permissions on that file (git-remote-https) got messed up?

FWIW, 1.7.4.2 is pretty durn ancient.  Consider just installing a new git
revision instead of troubleshooting the one you've got.

regards, tom lane



Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2018-09-04 Thread Dean Rasheed
On 3 September 2018 at 00:17, Tomas Vondra  wrote:
> Hi,
>
> Attached is an updated version of the patch series, adopting a couple of
> improvements - both for MCV lists and histograms.
>
>
> MCV
> ---
>
> For the MCV list part, I've adopted the approach proposed by Dean, using
> base selectivity and using it to correct the non-MCV part. I agree the
> simplicity of the approach is a nice feature, and it seems to produce
> better estimates. I'm not sure I understand the approach perfectly, but
> I've tried to add comments explaining how it works etc.
>

Cool. Looking at this afresh after some time away, it still looks like
a reasonable approach, and the test results are encouraging.

In mcv_clauselist_selectivity(), you raise the following question:

if (matches[i] != STATS_MATCH_NONE)
{
/* XXX Shouldn't the basesel be outside the if condition? */
*basesel += mcv->items[i]->base_frequency;
s += mcv->items[i]->frequency;
}

The reason it needs to be inside the "if" block is that what it's
computing is the base (independent) selectivity of the clauses found
to match the MCV stats, so that then in
statext_clauselist_selectivity() it can be used in the following
computation:

/* Estimated selectivity of values not covered by MCV matches */
other_sel = simple_sel - mcv_basesel;

to give an estimate for the other clauses that aren't covered by the
MCV stats. So I think the code is correct as it stands, but if you
think it isn't clear enough, maybe a comment update is in order.

The assumption being made is that mcv_basesel will cancel out the part
of simple_sel that is due to clauses matching the MCV stats, so that
we can then just add on the MCV selectivity. Of course that's only an
approximation, and it won't be exact -- partly due to the fact that
simple_sel and mcv_basesel are potentially computed using different
sample rows, and so will differ in the MCV region, and partly because
of second-order effects arising from the way that selectivities are
combined in clauselist_selectivity_simple(). Maybe that's something
that can be improved upon, but it doesn't seem like a bad initial
approximation.


> I've also changed how we build the MCV lists, particularly how we decide
> how many / which items store in the MCV list. In the previous version
> I've adopted the same algorithm we use for per-column MCV lists, but in
> certain cases that turned out to be too restrictive.
>
> Consider for example a table with multiple perfectly correlated columns,
> with very few combinations. That is, something like this:
>
> CREATE TABLE t (a int, b int);
>
> INSERT INTO t SELECT mod(i,50), mod(i,50)
>   FROM generate_series(1,1e6) s(i);
>
> CREATE STATISTICS s (mcv) ON a,b FROM t;
>
> Now, the data distribution is very simple - uniform, with 50 distinct
> combinations, each representing 2% of data (and the random sample should
> be pretty close to that).
>
> In these cases, analyze_mcv_list decides it does not need any MCV list,
> because the frequency for each value is pretty much 1/ndistinct. For
> single column that's reasonable, but for multiple correlated columns
> it's rather problematic. We might use the same ndistinct approach
> (assuming we have the ndistinct coefficients), but that still does not
> allow us to decide which combinations are "valid" with respect to the
> data. For example we can't decide (1,10) does not appear in the data.
>
> So I'm not entirely sure about adopting the same analyze_mcv_list
> algorithm for both single-column and multi-column stats. It may make
> sense to keep more items in the multi-column case for reasons that are
> not really valid for a single column.
>
> For now I've added a trivial condition to simply keep all the groups
> when possible. This probably needs more thought.
>

Ah, this is a good point. I think I see the problem here.

analyze_mcv_list() works by keeping those MCV entries that are
statistically significantly more frequent than the selectivity that
would have otherwise been assigned to the values, which is based on
ndistinct and nullfrac. That's not really right for multivariate stats
though, because the selectivity that would be assigned to a
multi-column value if it weren't in the multivariate MCV list is
actually calculated using the product of individual column
selectivities. Fortunately we now calculate this (base_frequency), so
actually I think what's needed is a custom version of
analyze_mcv_list() that keeps MCV entries if the observed frequency is
statistically significantly larger than the base frequency, not the
ndistinct-based frequency.

It might also be worthwhile doing a little more work to make the
base_frequency values more consistent with the way individual column
selectivities are actually calculated -- currently the patch always
uses the observed single-column frequencies to calculate the base
frequencies, but actually the univariate stats would only do that for
a subset of the 

Re: Bug fix for glibc broke freebsd build in REL_11_STABLE

2018-09-04 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 >> I kinda wonder if we should add -mno-x87 or such in configure when
 >> we detect clang, obviously it doesn't deal correctly with this.

 Tom> Seems worth looking into, but what happens if someone tries to
 Tom> compile for x87 hardware? Or do we care anymore?

Already discussed this one on IRC with Andres, but to put this on record
for future reference: we can't use -mno-x87 on 32bit intel, even with an
-march= option with an SSE2 capable CPU, because the 32-bit ABI requires
floats to be returned in the x87 registers and breaking that either
results in silently wrong results or in clang dying with "fatal error:
error in backend: X87 register return with X87 disabled" or similar.

-- 
Andrew (irc:RhodiumToad)



Re: Prevent concurrent DROP SCHEMA when certain objects are being initially created in the namespace

2018-09-04 Thread Michael Paquier
On Tue, Sep 04, 2018 at 03:09:21PM -0700, Jimmy Yih wrote:
> When an empty namespace is being initially populated with certain objects,
> it is possible for a DROP SCHEMA operation to come in and delete the
> namespace without using CASCADE.  These objects would be created but are
> left unusable.  This is capable of leaving behind pg_class, pg_type, and
> other such entries with invalid namespace values that need to be manually
> removed by the user.  To prevent this from happening, we should take an
> AccessShareLock on the namespace to which the type, function, etc. is
> being added.
> 
> Attached is a patch that covers the following CREATE commands:
> CREATE TYPE, CREATE FUNCTION, CREATE AGGREGATE, CREATE OPERATOR FAMILY,
> CREATE OPERATOR CLASS, and CREATE OPERATOR
>
> [...]
>
> The patch itself is relatively trivial by just adding copy/paste lock
> acquires near other lock acquires.  I wasn't sure if this needed a decently
> sized refactor such as the referenced commit above.

It seems to me that we are missing some dependency tracking in some of
those cases.  For example if you use a CREATE TYPE AS (...), then the
DROP SCHEMA would fail without specifying CASCADE, and session 2 would
block without a CASCADE.  So instead of adding more code paths where
LockDatabaseObject() is used, I would expect session 2 to block in
get_object_address all the time, which looks a bit lossy now by the
way.

Something which would be good to have for all those queries is a set of
isolation tests.  No need for multiple specs, you could just use one
spec with one session defining all the object types you would like to
work on.  How did you find this object list?  Did you test all the
objects available manually?

Please let me play with what you have here a bit..  If you have an
isolation test spec at hand, that would be helpful.
--
Michael


signature.asc
Description: PGP signature


Re: pointless check in RelationBuildPartitionDesc

2018-09-04 Thread Amit Langote
On 2018/09/05 1:50, Alvaro Herrera wrote:
> Proposed patch.  Checking isnull in an elog(ERROR) is important, because
> the column is not marked NOT NULL.  This is not true for other columns
> where we simply do Assert(!isnull).

Looks good.  Thanks for taking care of other sites as well.

@@ -14705,7 +14705,9 @@ ATExecDetachPartition(Relation rel, RangeVar *name)

(void) SysCacheGetAttr(RELOID, tuple, Anum_pg_class_relpartbound,
                       &isnull);
-   Assert(!isnull);
+   if (isnull)
+   elog(ERROR, "null relpartbound for relation %u",
+RelationGetRelid(partRel));

In retrospect, I'm not sure why this piece of code is here at all; maybe
just remove the SysCacheGetAttr and Assert?

Regards,
Amit




Re: Bug fix for glibc broke freebsd build in REL_11_STABLE

2018-09-04 Thread Andres Freund
On 2018-09-05 01:47:52 +0100, Andrew Gierth wrote:
> > "Tom" == Tom Lane  writes:
> 
>  >> I kinda wonder if we should add -mno-x87 or such in configure when
>  >> we detect clang, obviously it doesn't deal correctly with this.
> 
>  Tom> Seems worth looking into, but what happens if someone tries to
>  Tom> compile for x87 hardware? Or do we care anymore?
> 
> Already discussed this one on IRC with Andres, but to put this on record
> for future reference: we can't use -mno-x87 on 32bit intel, even with an
> -march= option with an SSE2 capable CPU, because the 32-bit ABI requires
> floats to be returned in the x87 registers and breaking that either
> results in silently wrong results or in clang dying with "fatal error:
> error in backend: X87 register return with X87 disabled" or similar.

My current proposal is thus to add a check that does
#if defined(__clang__) && defined(__i386__) && !defined(__SSE2_MATH__)
something-that-fails
#endif
in an autoconf test, and have configure complain if that
fails. Something roughly along the lines of
"Compiling PostgreSQL with clang, on 32bit x86, requires SSE2 support. Use 
-msse2 or use gcc."

Greetings,

Andres Freund



Re: pg_verify_checksums failure with hash indexes

2018-09-04 Thread Amit Kapila
On Tue, Sep 4, 2018 at 1:42 PM Dilip Kumar  wrote:
>
> I have tested pg_upgrade with different block size (1K, 4K, 8K, 32K).
> The upgrade is working fine from v10 to v11 and I am able to fetch
> data with index scan on the hash index after an upgrade.
>

Thanks,  do you see any way to write a test for this patch?  AFAICS,
there is no existing test for a different block size and not sure if
there is an easy way to write one.  I feel it is not a bad idea if we
have some tests for different block sizes.  Recently, during zheap
development, we found that we have introduced a bug for a non-default
block size and we can't find that because we don't have any test for
it and the same happens here.

Does anybody else have any idea on how can we write a test for
non-default block size or if we already have anything similar?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: pointless check in RelationBuildPartitionDesc

2018-09-04 Thread Amit Langote
On 2018/09/04 21:51, Alvaro Herrera wrote:
> On 2018-Sep-04, Amit Langote wrote:
> 
>> On 2018/09/04 13:08, Alvaro Herrera wrote:
> 
>>> I think it'd be pointless noise.  If we really want to protect against
>>> that, I think we should promote the Assert for relpartbound's isnull
>>> flag into an if test.
>>
>> So that we can elog(ERROR, ...) if isnull is true?  If so, I agree,
>> because that's what should've been done here anyway (for a value read from
>> the disk that is).
> 
> Yes.
> 
> I wonder now what's the value of using an Assert for this, BTW: if those
> are enabled, then we'll crash saying "this column is null and shouldn't!".
> If they are disabled, we'll instead crash (possibly in production)
> trying to decode the field.  What was achieved?

I can see how that the Assert was pointless all along.

> With an elog(ERROR) the reaction is the expected one: the current
> transaction is aborted, but the server continues to run.
> 
>> I think we should check relispartition then too.
> 
> Well, we don't check every single catalog flag in every possible code
> path just to ensure it's sane.  I don't see any reason to do differently
> here.  We rely heavily on the catalog contents to be correct, and
> install defenses against user-induced corruption that would cause us to
> crash; but going much further than that would be excessive IMO.

Okay, sure anyway.

Thanks,
Amit




Re: test_pg_dump missing cleanup actions

2018-09-04 Thread Michael Paquier
On Tue, Sep 04, 2018 at 06:02:51PM -0400, Stephen Frost wrote:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> Michael Paquier  writes:
>>> While hacking another patch, I have noticed that triggerring multiple
>>> times in a row installcheck on test_pg_dump results in a failure because
>>> it is missing clean up actions on the role regress_dump_test_role.
>>> Roles are shared objects, so I think that we ought to not let traces of
>>> it when doing any regression tests on a running instance.
>> 
>>> Attached is a patch to clean up things.
>> 
>> I'm confused.  Isn't the point of that script exactly to create a modified
>> extension for testing pg_dump with?

Not really, the regression tests run for pg_regress and the TAP test
suite are two completely isolated things and share no dependencies.
e54f757 has actually changed test_pg_dump.sql so that it adds regression
tests for pg_init_privs for ALTER EXTENSION ADD/DROP, and visibly those
have been added to test_pg_dump because they were easier to add there,
and this has no interactions with pg_dump.  What I think should have
been done initially is to add those new tests in test_extensions
instead.

>> What I'd do is leave the final state as-is and add a "drop role if exists"
>> at the start, similar to what some of the core regression tests do.
> 
> I've not followed this thread but based on Tom's response, I agree with
> his suggestion of what to do here.

Not as far as I can see..  Please note that using installcheck on the
main regression test suite does not leave around any extra roles.  I can
understand the value of having a DROP ROLE IF EXISTS though: if you get a
crash while testing, then the tests are repeatable from the beginning, so
independently of the root issue Tom's suggestion makes sense to me.
--
Michael


signature.asc
Description: PGP signature


Re: Bug fix for glibc broke freebsd build in REL_11_STABLE

2018-09-04 Thread Andres Freund
On 2018-09-04 19:02:02 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > ... Note that we've previously encountered similar issues
> > on gcc, which is why we've tried to force gcc's hand with
> > -fexcess-precision=standard.
> 
> Right.  Annoying that clang doesn't have that.  We can't realistically
> program around an issue that might or might not show up depending on the
> whims of the compiler's register allocation.

Right.


> > I kinda wonder if we should add -mno-x87 or such in configure when we
> > detect clang, obviously it doesn't deal correctly with this.
> 
> Seems worth looking into, but what happens if someone tries to compile
> for x87 hardware?  Or do we care anymore?

It generates linker errors:

clang-8 -std=gnu99 -march=i386 -O2 -m32 -mno-x87 ~/tmp/flttst.c -o ~/tmp/flttst 
&& ~/tmp/flttst
/usr/bin/ld: /tmp/flttst-ba93f5.o: in function `main':
flttst.c:(.text+0x37): undefined reference to `__muldf3'
/usr/bin/ld: flttst.c:(.text+0x67): undefined reference to `__gedf2'
clang: error: linker command failed with exit code 1 (use -v to see invocation)

I think we should only add the flag when detecting clang, so a user with
a pre-Pentium 4 machine could resort to gcc.  Although honestly, I don't
think that's really something useful to cater for.

The one hangup I have with this is that clang on *bsd defaults to i486
(and thus x87) when compiling 32bit binaries. That's obviously
*terrible* for performance, but it'd still be annoying for some users.
While I don't immediately know how, that seems like an argument for
doing this in configure, and suggesting a better compiler flag.

Greetings,

Andres Freund



Re: Bug fix for glibc broke freebsd build in REL_11_STABLE

2018-09-04 Thread Victor Wagner
On Tue, 04 Sep 2018 15:48:24 -0400, Tom Lane wrote:


> I tried to reproduce the problem, without success, on a few different
> FreeBSD images I had laying about:
> 
> FreeBSD 11.0/x86_64, clang version 3.8.0
>   (this confirms OP's report that x86_64 is OK)
> 
> FreeBSD 10.3/ppc, gcc 4.2.1
> 
> FreeBSD 12.0-CURRENT (from around mid-May)/arm64, clang version 6.0.0
> 
> (I was testing PG HEAD, not the 11 branch, but I don't see a reason
> to think that'd make a difference.)

Alas, it does. The first thing I did after discovering this bug was to
check whether it exists in master. And master passes this test on the same
machine where the problem was discovered.


> I also looked for evidence of a bug of this sort in the clang
> bugzilla. I couldn't find anything, but it's possible that "isinf"
> isn't what I should have searched on.
> 
> Anyway, my estimation is that this is a compiler bug that's been
> repaired, and it probably isn't widespread enough to justify our
> inserting some klugy workaround.

It doesn't look that way, as the bug persists after I upgraded the system
to the current 11.2-RELEASE with clang 6.0.0.



-- 
   Victor Wagner 



Re: Bug fix for glibc broke freebsd build in REL_11_STABLE

2018-09-04 Thread Andrew Gierth
> "Andres" == Andres Freund  writes:

 Andres> Thomas and I are sitting in a cafe and are trying to figure out
 Andres> what's going on...

I have a standalone test case:

#include <math.h>
#include <stdio.h>

int main(int argc, char **argv)
{
double d1 = (argc ? 1e180 : 0);
double d2 = (argv ? 1e200 : 0);
int r2 = __builtin_isinf(d1 * d2);
int r1 = isinf(d1 * d2);
printf("r1 = %d, r2 = %d\n", r1, r2);
return 0;
}

Note that swapping the r1 and r2 lines makes the problem disappear (!).

on amd64, clang 3.9.1:

cc -O2 -m32 flttst.c && ./a.out
r1 = 1, r2 = 0

cc -O2 flttst.c && ./a.out
r1 = 1, r2 = 1

Can't reproduce on 32-bit arm.

-- 
Andrew (irc:RhodiumToad)



Re: Bug in ginRedoRecompress that causes opaque data on page to be overrun

2018-09-04 Thread Peter Geoghegan
On Tue, Sep 4, 2018 at 12:59 PM, R, Siva  wrote:
> We recently encountered an issue where the opaque data flags on a gin data
> leaf page were corrupted while replaying a gin insert WAL record. Upon
> further examination of the redo code, we found a bug in ginRedoRecompress
> code, which extracts the WAL information and updates the page.

I wonder how you managed to detect it in the first place. Were you
using something like wal_consistency_checking=all, with a custom
stress-test?

-- 
Peter Geoghegan



Re: test_pg_dump missing cleanup actions

2018-09-04 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Michael Paquier  writes:
> > While hacking on another patch, I have noticed that triggering
> > installcheck multiple times in a row on test_pg_dump results in a failure
> > because it is missing cleanup actions for the role regress_dump_test_role.
> > Roles are shared objects, so I think that we ought not to leave traces of
> > it when doing any regression tests on a running instance.
> 
> > Attached is a patch to clean up things.
> 
> I'm confused.  Isn't the point of that script exactly to create a modified
> extension for testing pg_dump with?
> 
> What I'd do is leave the final state as-is and add a "drop role if exists"
> at the start, similar to what some of the core regression tests do.

I've not followed this thread but based on Tom's response, I agree with
his suggestion of what to do here.

Thanks!

Stephen




Prevent concurrent DROP SCHEMA when certain objects are being initially created in the namespace

2018-09-04 Thread Jimmy Yih
Hello,

When an empty namespace is being initially populated with certain objects,
it is possible for a DROP SCHEMA operation to come in and delete the
namespace without using CASCADE.  These objects would be created but are
left unusable.  This is capable of leaving behind pg_class, pg_type, and
other such entries with invalid namespace values that need to be manually
removed by the user.  To prevent this from happening, we should take an
AccessShareLock on the namespace to which the type, function, etc. is
being added.

Attached is a patch that covers the following CREATE commands:
CREATE TYPE, CREATE FUNCTION, CREATE AGGREGATE, CREATE OPERATOR FAMILY,
CREATE OPERATOR CLASS, and CREATE OPERATOR

Example Reproduction
Session1: CREATE SCHEMA testschema;
Session1: BEGIN;
Session1: CREATE TYPE testschema.testtype;
Session2: DROP SCHEMA testschema;
Session1: COMMIT;
// The DROP SCHEMA succeeds in session 2, and session 1's COMMIT goes
through without error, but the pg_type entry will still be there for typname
testtype.  This example uses just a simple TYPE object, but the other
objects can be reproduced the same way in place of CREATE TYPE.

Related Postgres 9.2 commit (note in commit message that "We need similar
protection for all other object types"):
https://github.com/postgres/postgres/commit/1575fbcb795fc331f46588b4520c4bca7e854d5c

The patch itself is relatively trivial by just adding copy/paste lock
acquires near other lock acquires.  I wasn't sure if this needed a decently
sized refactor such as the referenced commit above.

Regards,
Jimmy


concurrent_namespace_drop.patch
Description: Binary data


Re: Bug fix for glibc broke freebsd build in REL_11_STABLE

2018-09-04 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 >> The reason it behaves oddly is this: on i387 FPU (and NOT on arm32
 >> or on 32-bit i386 with a modern architecture specified to the
 >> compiler), the result of 1e200 * 1e180 is not in fact infinite,
 >> because it fits in an 80-bit long double. So __builtin_isinf reports
 >> that it is finite; but if it gets stored to memory as a double (e.g.
 >> to pass as a parameter to a function), it then becomes infinite.

 Tom> Ah-hah. Can we fix it by explicitly casting the argument of isinf
 Tom> to double?

No; the generated code doesn't change. Presumably the compiler regards
the value as already being of type "double", just one that happens to be
stored in a longer register.

-- 
Andrew (irc:RhodiumToad)



Re: Bug fix for glibc broke freebsd build in REL_11_STABLE

2018-09-04 Thread Andres Freund
On 2018-09-04 17:46:29 -0400, Tom Lane wrote:
> Andrew Gierth  writes:
> > The reason it behaves oddly is this: on i387 FPU (and NOT on arm32 or on
> > 32-bit i386 with a modern architecture specified to the compiler), the
> > result of 1e200 * 1e180 is not in fact infinite, because it fits in an
> > 80-bit long double. So __builtin_isinf reports that it is finite; but if
> > it gets stored to memory as a double (e.g. to pass as a parameter to a
> > function), it then becomes infinite.
> 
> Ah-hah.  Can we fix it by explicitly casting the argument of isinf
> to double?

No, that doesn't work, afaict the cast has vanished long before the
optimizer stage.  Note that we've previously encountered similar issues
on gcc, which is why we've tried to force gcc's hand with
-fexcess-precision=standard.

The apparent reason this doesn't fail on master is that there,
check_float8_val() is an inline function, rather than a macro, but the
inline function doesn't get inlined (probably due to the number of call
sites). As the parameter-passing convention doesn't use 80-bit FPU
registers, the value is infinity by the time it gets to
__builtin_isinf().

I think this is pretty clearly a C99 violation by clang (note how gcc
automatically enables -fexcess-precision=standard in C99 mode).  I'll
report something, but I've little hope of this getting fixed quickly - and
it'll definitely not get fixed for such old clang versions as the OP used.

While we obviously could just neuter the isinf() macro hackery in this
edge case, my concern is that it seems likely that further such issues
are hidden in code that doesn't immediately cause regression test
failure.

I kinda wonder if we should add -mno-x87 or such in configure when we
detect clang, since it obviously doesn't deal correctly with this.
SSE2-based floating point math is far from new...  Or just error out in
configure, although that seems a bit harder.


I'm now looking at how it comes that clang on linux doesn't default to
x87 math, but freebsd does.


Greetings,

Andres Freund



Re: Bug fix for glibc broke freebsd build in REL_11_STABLE

2018-09-04 Thread Victor Wagner
On Tue, 4 Sep 2018 11:06:56 -0700, Thomas Munro wrote:

> On Tue, Sep 4, 2018 at 9:39 AM Andrew Gierth
>  wrote:
> >  
> > > "Andres" == Andres Freund  writes:  
> >  
> >  >> However, this commit broke float8 test on 32-bit FreeBSD 11 with
> >  >> clang 3.8.0 compiler. Regressions.diff follows:  
> >  
> >  Andres> Does this happen with a newer clang version too?  
> >
> > float8 test (and all other tests) passes for me on clang 3.9.1 on
> > fbsd11 on 32-bit ARM, and on -m32 builds on amd64.
> >
> > I also confirmed that without #define isinf(x) __builtin_isinf(x),
> > on both 32bit and 64bit fbsd isinf() compiles as a function call,
> > so the OP's proposed change would not be desirable.  
> 
> I installed FreeBSD 11.2 i386 on a virtual machine.  I couldn't
> reproduce the problem with either the base cc (clang 6.0.0) or clang38
> (clang 3.8.1) installed via pkg.

Did you try reproducing the problem in REL_11_STABLE? It doesn't exist in
master.  Also, it might depend on some configure options.  I usually
compile postgres with as many --with-something options enabled as possible.
I've put my configure/build/check logs at

https://wagner.pp.ru/~vitus/stuff/freebsd-i386-logs.tar.gz

Unfortunately there are a bit too many of them to attach to the list
message.

These logs were produced after I upgraded my outdated system
to the current 11.2 with clang 6.0.0. That hasn't eliminated the problem.

I can publish my KVM images of both the old system (11.0 with clang 3.8.0)
and the new one (current 11.2 with clang 6.0.0).

> 
> The OP reported clang 3.8.0, so a minor version behind what I tested.
> 
> I did learn that "make check" fails in rolenames if your Unix user is
> called "user".
> 



-- 
   Victor Wagner 



Re: Bug fix for glibc broke freebsd build in REL_11_STABLE

2018-09-04 Thread Tom Lane
Victor Wagner  writes:
> Tom Lane wrote:
>> (I was testing PG HEAD, not the 11 branch, but I don't see a reason
>> to think that'd make a difference.)

> Alas, it does. The first thing I did after discovering this bug was to
> check whether it exists in master. And master passes this test on the same
> machine where the problem was discovered.

Oh really ... I'll go try again.  Now I agree with Andres, we need
to understand *exactly* what is failing here rather than guess.

My first thought was that the C99 changeover might explain the
difference.  But clang doesn't require any special switch to select
C99 mode, so in principle that should've made no difference to clang
builds.

regards, tom lane



Re: Bug fix for glibc broke freebsd build in REL_11_STABLE

2018-09-04 Thread Andres Freund
On 2018-09-04 16:13:45 -0400, Tom Lane wrote:
> Victor Wagner  writes:
> > Tom Lane wrote:
> >> (I was testing PG HEAD, not the 11 branch, but I don't see a reason
> >> to think that'd make a difference.)
> 
> > Alas, it does. The first thing I did after discovering this bug was to
> > check whether it exists in master. And master passes this test on the same
> > machine where the problem was discovered.
> 
> Oh really ... I'll go try again.  Now I agree with Andres, we need
> to understand *exactly* what is failing here rather than guess.
> 
> My first thought was that the C99 changeover might explain the
> difference.  But clang doesn't require any special switch to select
> C99 mode, so in principle that should've made no difference to clang
> builds.

Thomas and I are sitting in a cafe and are trying to figure out what's
going on...

Greetings,

Andres Freund



Re: Bug in ginRedoRecompress that causes opaque data on page to be overrun

2018-09-04 Thread Alexander Korotkov
Hi, Siva!

On Tue, Sep 4, 2018 at 11:01 PM R, Siva  wrote:
> We recently encountered an issue where the opaque data flags on a gin data
> leaf page were corrupted while replaying a gin insert WAL record. Upon further
> examination of the redo code, we found a bug in ginRedoRecompress code, which 
> extracts the WAL information and updates the page.
>
> Specifically, when a new segment is inserted in the middle of a page, a 
> memmove operation is performed [1] at the current point in the page to make 
> room for the new segment. If this segment insertion is followed by delete 
> segment actions that are yet to be processed and the total data size is very 
> close to GinDataPageMaxDataSize, then we may move the data portion beyond the 
> boundary causing the opaque data to be corrupted.
>
> One way of solving this problem is to perform the replay work on a scratch
> space, and perform a sanity check on the total size of the data portion before
> copying it back to the actual page. While it involves additional memory
> allocation and memcpy operations, it is safer and similar to the 'do' code
> path, where we make a copy of all segments past the first modified
> segment before placing them back on the page [2].
>
> I have attached a patch for that approach here. Please let us know any 
> comments or feedback.

Do you have a test scenario for reproduction of this issue?  We need
it to ensure that the fix is correct.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Bug fix for glibc broke freebsd build in REL_11_STABLE

2018-09-04 Thread Tom Lane
Andrew Gierth  writes:
> The reason it behaves oddly is this: on i387 FPU (and NOT on arm32 or on
> 32-bit i386 with a modern architecture specified to the compiler), the
> result of 1e200 * 1e180 is not in fact infinite, because it fits in an
> 80-bit long double. So __builtin_isinf reports that it is finite; but if
> it gets stored to memory as a double (e.g. to pass as a parameter to a
> function), it then becomes infinite.

Ah-hah.  Can we fix it by explicitly casting the argument of isinf
to double?

regards, tom lane



Re: [HACKERS] Bug in to_timestamp().

2018-09-04 Thread David G. Johnston
On Mon, Sep 3, 2018 at 2:30 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> The current version of the patch doesn't really distinguish spaces and
> delimiters in the format string in non-FX mode.  So, spaces and delimiters
> form a single group.  For me, Oracle's behavior is ridiculous, at
> least because it doesn't allow cases where the input string exactly matches
> the format string.
>
> This one fails:
> SELECT to_timestamp('2018- -01 02', '- -MM DD') FROM dual
>

Related to the point below: since this works today, it should continue to
work.  I was under the wrong impression that we followed their behavior
today and were pondering deviating from it because of its ridiculousness.


> > The couple of regression tests that change do so for the better.  It
> would be illuminating to set this up as two patches though, one introducing
> all of the new regression tests against the current code and then a second
> patch with the changed behavior with only the affected tests.
>
> OK, here you go.  0001-to_timestamp-regression-test-v17.patch
> introduces changes in regression tests and their output for current
> master, while 0002-to_timestamp-format-checking-v17.patch contain
> changes to to_timestamp itself.
>
>
From those results the question is how important it is to force the
following breakage on our users (i.e., introduce FX exact symbol matching):

SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
- to_timestamp
---
- Sun Feb 16 00:00:00 1997 PST
-(1 row)
-
+ERROR:  unexpected character "/", expected character ":"
+HINT:  In FX mode, punctuation in the input string must exactly match the
format string.

There seemed to be some implicit approval of this breakage some 30 emails
and 10 months ago, but given that this is the only change from a correct
result to a failure, I'd like to officially put it out there for
opinion/vote gathering.  Mine is a -1; though keeping the distinction
between space and non-alphanumeric characters is expected.

David J.


Re: Bug fix for glibc broke freebsd build in REL_11_STABLE

2018-09-04 Thread Andres Freund
On 2018-09-04 15:16:27 -0700, Andres Freund wrote:
> I'm now looking at how it comes that clang on linux doesn't default to
> x87 math, but freebsd does.

Oh well. I present to you, without comment, the following:

  switch (Triple.getOS()) {
  case llvm::Triple::FreeBSD:
  case llvm::Triple::NetBSD:
  case llvm::Triple::OpenBSD:
return "i486";
  case llvm::Triple::Haiku:
return "i586";
  default:
// Fallback to p4.
return "pentium4";
  }

Greetings,

Andres Freund



Re: Bug fix for glibc broke freebsd build in REL_11_STABLE

2018-09-04 Thread Tom Lane
Thomas Munro  writes:
> On Tue, Sep 4, 2018 at 9:39 AM Andrew Gierth
>  wrote:
>> I also confirmed that without #define isinf(x) __builtin_isinf(x), on
>> both 32bit and 64bit fbsd isinf() compiles as a function call, so the
>> OP's proposed change would not be desirable.

Presumably what we are looking at here is a compiler codegen bug for
__builtin_isinf().  The proposed change dodges it precisely by
substituting the library function instead --- at a performance penalty.
I agree that this isn't a real desirable fix, and we definitely ought
not cause it to happen on platforms that haven't got the bug.

> I installed FreeBSD 11.2 i386 on a virtual machine.  I couldn't
> reproduce the problem with either the base cc (clang 6.0.0) or clang38
> (clang 3.8.1) installed via pkg.
> The OP reported clang 3.8.0, so a minor version behind what I tested.

I tried to reproduce the problem, without success, on a few different
FreeBSD images I had laying about:

FreeBSD 11.0/x86_64, clang version 3.8.0
(this confirms OP's report that x86_64 is OK)

FreeBSD 10.3/ppc, gcc 4.2.1

FreeBSD 12.0-CURRENT (from around mid-May)/arm64, clang version 6.0.0

(I was testing PG HEAD, not the 11 branch, but I don't see a reason
to think that'd make a difference.)

I also looked for evidence of a bug of this sort in the clang bugzilla.
I couldn't find anything, but it's possible that "isinf" isn't what
I should have searched on.

Anyway, my estimation is that this is a compiler bug that's been
repaired, and it probably isn't widespread enough to justify our
inserting some klugy workaround.

regards, tom lane



Re: Bug fix for glibc broke freebsd build in REL_11_STABLE

2018-09-04 Thread Andrew Gierth
> "Victor" == Victor Wagner  writes:

 Victor> Did you try reproducing the problem in REL_11_STABLE? It doesn't
 Victor> exist in master.

Oh, I missed that. It is reproducible on REL_11_STABLE with clang 3.9.1,
I'll dig into it more.

-- 
Andrew (irc:RhodiumToad)



Re: Bug fix for glibc broke freebsd build in REL_11_STABLE

2018-09-04 Thread Andrew Gierth
> "Andrew" == Andrew Gierth  writes:
> "Andres" == Andres Freund  writes:

 Andres> Thomas and I are sitting in a cafe and are trying to figure out
 Andres> what's going on...

 Andrew> I have a standalone test case:

 Andrew> #include <math.h>
 Andrew> #include <stdio.h>

 Andrew> int main(int argc, char **argv)
 Andrew> {
 Andrew> double d1 = (argc ? 1e180 : 0);
 Andrew> double d2 = (argv ? 1e200 : 0);
 Andrew> int r2 = __builtin_isinf(d1 * d2);
 Andrew> int r1 = isinf(d1 * d2);
 Andrew> printf("r1 = %d, r2 = %d\n", r1, r2);
 Andrew> return 0;
 Andrew> }

 Andrew> Note that swapping the r1 and r2 lines makes the problem
 Andrew> disappear (!).

And that's the clue to why it happens.

The reason it behaves oddly is this: on i387 FPU (and NOT on arm32 or on
32-bit i386 with a modern architecture specified to the compiler), the
result of 1e200 * 1e180 is not in fact infinite, because it fits in an
80-bit long double. So __builtin_isinf reports that it is finite; but if
it gets stored to memory as a double (e.g. to pass as a parameter to a
function), it then becomes infinite.

 Andrew> cc -O2 -m32 flttst.c && ./a.out
 Andrew> r1 = 1, r2 = 0

Specifying a recent microarch makes it use 64-bit FP registers rather
than 80-bit ones:

cc -O2 -m32 -march=skylake flttst.c && ./a.out
r1 = 1, r2 = 1

-- 
Andrew (irc:RhodiumToad)



Re: Bug fix for glibc broke freebsd build in REL_11_STABLE

2018-09-04 Thread Andres Freund
Hi,

On 2018-09-04 15:16:27 -0700, Andres Freund wrote:
> I think this is pretty clearly a C99 violation by clang (note how gcc
> automatically enables -fexcess-precision=standard in C99 mode).  I'll
> report something, but I've little hope of this getting fixed quickly - and
> it'll definitely not get fixed for such old clang versions as the OP used.

https://bugs.llvm.org/show_bug.cgi?id=38833

Greetings,

Andres Freund



Re: Bug fix for glibc broke freebsd build in REL_11_STABLE

2018-09-04 Thread Tom Lane
Andres Freund  writes:
> ... Note that we've previously encountered similar issues
> on gcc, which is why we've tried to force gcc's hand with
> -fexcess-precision=standard.

Right.  Annoying that clang doesn't have that.  We can't realistically
program around an issue that might or might not show up depending on the
whims of the compiler's register allocation.

> I kinda wonder if we should add -mno-x87 or such in configure when we
> detect clang, obviously it doesn't deal correctly with this.

Seems worth looking into, but what happens if someone tries to compile
for x87 hardware?  Or do we care anymore?

regards, tom lane



test_pg_dump missing cleanup actions

2018-09-04 Thread Michael Paquier
Hi Stephen,

While hacking on another patch, I have noticed that triggering
installcheck multiple times in a row on test_pg_dump results in a failure
because it is missing cleanup actions for the role regress_dump_test_role.
Roles are shared objects, so I think that we ought not to leave traces of
it when doing any regression tests on a running instance.

Attached is a patch to clean up things.

Thanks,
--
Michael
diff --git a/src/test/modules/test_pg_dump/expected/test_pg_dump.out b/src/test/modules/test_pg_dump/expected/test_pg_dump.out
index c9bc6f7625..8d7b0ec926 100644
--- a/src/test/modules/test_pg_dump/expected/test_pg_dump.out
+++ b/src/test/modules/test_pg_dump/expected/test_pg_dump.out
@@ -92,3 +92,12 @@ ALTER EXTENSION test_pg_dump DROP SERVER s0;
 ALTER EXTENSION test_pg_dump DROP TABLE test_pg_dump_t1;
 ALTER EXTENSION test_pg_dump DROP TYPE test_pg_dump_e1;
 ALTER EXTENSION test_pg_dump DROP VIEW test_pg_dump_v1;
+-- Clean up
+DROP EXTENSION test_pg_dump;
+DROP SCHEMA test_pg_dump_s1;
+DROP MATERIALIZED VIEW test_pg_dump_mv1;
+DROP FOREIGN TABLE ft1;
+DROP SERVER s0;
+DROP TYPE test_pg_dump_e1;
+DROP FUNCTION test_pg_dump(integer);
+DROP ROLE regress_dump_test_role;
diff --git a/src/test/modules/test_pg_dump/sql/test_pg_dump.sql b/src/test/modules/test_pg_dump/sql/test_pg_dump.sql
index e463dec404..2ceb84d223 100644
--- a/src/test/modules/test_pg_dump/sql/test_pg_dump.sql
+++ b/src/test/modules/test_pg_dump/sql/test_pg_dump.sql
@@ -105,3 +105,13 @@ ALTER EXTENSION test_pg_dump DROP SERVER s0;
 ALTER EXTENSION test_pg_dump DROP TABLE test_pg_dump_t1;
 ALTER EXTENSION test_pg_dump DROP TYPE test_pg_dump_e1;
 ALTER EXTENSION test_pg_dump DROP VIEW test_pg_dump_v1;
+
+-- Clean up
+DROP EXTENSION test_pg_dump;
+DROP SCHEMA test_pg_dump_s1;
+DROP MATERIALIZED VIEW test_pg_dump_mv1;
+DROP FOREIGN TABLE ft1;
+DROP SERVER s0;
+DROP TYPE test_pg_dump_e1;
+DROP FUNCTION test_pg_dump(integer);
+DROP ROLE regress_dump_test_role;




Re: Bug fix for glibc broke freebsd build in REL_11_STABLE

2018-09-04 Thread Thomas Munro
On Tue, Sep 4, 2018 at 12:59 PM Victor Wagner  wrote:
>
> On Tue, 04 Sep 2018 15:48:24 -0400, Tom Lane wrote:
>
>
> > I tried to reproduce the problem, without success, on a few different
> > FreeBSD images I had laying about:
> >
> > FreeBSD 11.0/x86_64, clang version 3.8.0
> >   (this confirms OP's report that x86_64 is OK)
> >
> > FreeBSD 10.3/ppc, gcc 4.2.1
> >
> > FreeBSD 12.0-CURRENT (from around mid-May)/arm64, clang version 6.0.0
> >
> > (I was testing PG HEAD, not the 11 branch, but I don't see a reason
> > to think that'd make a difference.)
>
> Alas, it does. The first thing I did after discovering this bug was to
> check whether it exists in master. And master passes this test on the same
> machine where the problem was discovered.
>
>
> > I also looked for evidence of a bug of this sort in the clang
> > bugzilla. I couldn't find anything, but it's possible that "isinf"
> > isn't what I should have searched on.
> >
> > Anyway, my estimation is that this is a compiler bug that's been
> > repaired, and it probably isn't widespread enough to justify our
> > inserting some klugy workaround.
>
> It doesn't look that way, as the bug persists after I upgraded the system
> to the current 11.2-RELEASE with clang 6.0.0.

Ah, right it fails for me too on 32 bit FreeBSD 11.2 with
REL_11_STABLE.  Investigating...

-- 
Thomas Munro
http://www.enterprisedb.com



Re: test_pg_dump missing cleanup actions

2018-09-04 Thread Tom Lane
Michael Paquier  writes:
> While hacking on another patch, I have noticed that triggering
> installcheck multiple times in a row on test_pg_dump results in a failure
> because it is missing cleanup actions for the role regress_dump_test_role.
> Roles are shared objects, so I think that we ought not to leave traces of
> it when doing any regression tests on a running instance.

> Attached is a patch to clean up things.

I'm confused.  Isn't the point of that script exactly to create a modified
extension for testing pg_dump with?

What I'd do is leave the final state as-is and add a "drop role if exists"
at the start, similar to what some of the core regression tests do.

regards, tom lane



Re: Bug fix for glibc broke freebsd build in REL_11_STABLE

2018-09-04 Thread Andrew Gierth
> "Andrew" == Andrew Gierth  writes:

 Andrew> Specifying a recent microarch makes it use 64-bit FP registers
 Andrew> rather than 80-bit ones:

 Andrew> cc -O2 -m32 -march=skylake flttst.c && ./a.out
 Andrew> r1 = 1, r2 = 1

where "recent" means "since about 2000 or so":

cc -O2 -m32 -march=pentium4 flttst.c && ./a.out
r1 = 1, r2 = 1

(the actual requirement is for SSE2 support)

-- 
Andrew (irc:RhodiumToad)



Re: Prevent concurrent DROP SCHEMA when certain objects are being initially created in the namespace

2018-09-04 Thread Andres Freund



On September 4, 2018 9:11:25 PM PDT, Tom Lane  wrote:
>Michael Paquier  writes:
>> On Tue, Sep 04, 2018 at 03:09:21PM -0700, Jimmy Yih wrote:
>>> When an empty namespace is being initially populated with certain objects,
>>> it is possible for a DROP SCHEMA operation to come in and delete the
>>> namespace without using CASCADE.
>
>> It seems to me that we are missing some dependency tracking in some of
>> those cases.
>
>No, I think Jimmy is right: it's a race condition.  The pg_depend entry
>would produce the right result, except that it's not committed yet so
>the DROP SCHEMA doesn't see it.
>
>The bigger question is whether we want to do anything about this.
>Historically we've not bothered with locking on database objects that
>don't represent storage (ie, relations and databases).  If we're going to
>take this seriously, then we should for example also acquire lock on any
>function that's referenced in a view definition, to ensure it doesn't go
>away before the view is committed and its dependencies become visible.
>Likewise for operators, opclasses, collations, text search objects, you
>name it.  And worse, we'd really need this sort of locking even in vanilla
>DML queries, since objects could easily go away before the query is done.
>
>I think that line of thought leads to an enormous increase in locking
>overhead, for which we'd get little if any gain in usability.  So my
>inclination is to make an engineering judgment that we won't fix this.

Haven't we already significantly started down this road, to avoid a lot of the
"tuple concurrently updated" type errors? Would expanding this a bit further
really be that noticeable?

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: speeding up planning with partitions

2018-09-04 Thread Amit Langote
On 2018/09/04 22:24, David Rowley wrote:
> On 30 August 2018 at 00:06, Amit Langote  
> wrote:
>> With various overheads gone thanks to 0001 and 0002, locking of all
>> partitions via find_all_inheritors can be seen as the single largest
>> bottleneck, which 0003 tries to address.  I've kept it a separate patch,
>> because I'll need to think a bit more to say that it's actually safe to
>> defer locking to late planning, due mainly to the concern about the change
>> in the order of locking from the current method.  I'm attaching it here,
>> because I also want to show the performance improvement we can expect with
>> it.
> 
> For now, find_all_inheritors() locks the tables in ascending Oid
> order.  This makes sense with inheritance parent tables as it's much
> cheaper to sort on this rather than on something like the table's
> namespace and name.  I see no reason why what we sort on really
> matters, as long as we always sort on the same thing and the key we
> sort on is always unique so that the locking order is well defined.
> 
> For partitioned tables, there's really not much sense in sticking to
> the same lock by Oid order. The order of the PartitionDesc is already
> well defined so I don't see any reason why we can't just perform the
> locking in PartitionDesc order.

The reason we do the locking with find_all_inheritors for regular queries
(planner (expand_inherited_rtentry) does it for select/update/delete and
executor (ExecSetupPartitionTupleRouting) for insert) is that that's the
order used by various DDL commands when locking partitions, such as when
adding a column.  So, we'd have one piece of code locking partitions by
Oid order and others by canonical partition bound order or PartitionDesc
order.  I'm no longer sure if that's problematic though, about which more
below.

> This would mean you could perform the
> locking of the partitions once pruning is complete somewhere around
> add_rel_partitions_to_query(). Also, doing this would remove the need
> for scanning pg_inherits during find_all_inheritors() and would likely
> further speed up the planning of queries to partitioned tables with
> many partitions.

That's what happens with 0003; note the following hunk in it:

@@ -1555,14 +1555,15 @@ expand_inherited_rtentry(PlannerInfo *root,
RangeTblEntry *rte, Index rti)
 lockmode = AccessShareLock;

 /* Scan for all members of inheritance set, acquire needed locks */
-inhOIDs = find_all_inheritors(parentOID, lockmode, NULL);
+if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+inhOIDs = find_all_inheritors(parentOID, lockmode, NULL);

> I wrote a function named
> get_partition_descendants_worker() to do this in patch 0002 in [1] (it
> may have been a bad choice to have made this a separate function
> rather than just part of find_all_inheritors() as it meant I had to
> change a bit too much code in tablecmds.c). There might be something
> you can salvage from that patch to help here.
> 
> [1] 
> https://www.postgresql.org/message-id/CAKJS1f9QjUwQrio20Pi%3DyCHmnouf4z3SfN8sqXaAcwREG6k0zQ%40mail.gmail.com

Actually, I had written a similar patch to replace the usages of
find_all_inheritors and find_inheritance_children with different
partitioning-specific functions that collect the partition OIDs from the
already-open parent table's PartitionDesc, more or less like the patch
you mention does.  But I think we don't need any new function(s) to do
that; that is, we don't need to change all the sites that call
find_all_inheritors or find_inheritance_children to use new functions
that return partition OIDs in PartitionDesc order just because we want
to change the planner to lock partitions in PartitionDesc order.  I'm
failing to see why the difference in locking order matters.  I
understood the concern to be that locking partitions in different orders
could lead to a deadlock if concurrent backends request mutually
conflicting locks, but only one of the conflicting backends, the one
that got the lock on the parent, would be allowed to lock the children.
Thoughts on that?
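
As an aside, the property being relied on here, that no deadlock can
arise as long as every backend acquires the locks in one well-defined
order over a unique key, is independent of which key is chosen (Oid or
PartitionDesc position).  A toy C sketch of the canonical-ordering step,
not PostgreSQL code:

```c
#include <assert.h>
#include <stdlib.h>

typedef unsigned int Oid;       /* stand-in for PostgreSQL's Oid */

/* qsort comparator: ascending Oid order */
static int
oid_cmp(const void *a, const void *b)
{
    Oid x = *(const Oid *) a;
    Oid y = *(const Oid *) b;

    return (x > y) - (x < y);
}

/*
 * Sort the to-be-locked OIDs into one canonical order before acquiring
 * any locks.  If every backend does this and the sort key is unique, no
 * two backends can wait on each other's locks in a cycle.
 */
static void
sort_oids_for_locking(Oid *oids, int n)
{
    qsort(oids, n, sizeof(Oid), oid_cmp);
}
```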

Thanks,
Amit




Re: Prevent concurrent DROP SCHEMA when certain objects are being initially created in the namespace

2018-09-04 Thread Tom Lane
Andres Freund  writes:
> On September 4, 2018 9:11:25 PM PDT, Tom Lane  wrote:
>> I think that line of thought leads to an enormous increase in locking
>> overhead, for which we'd get little if any gain in usability.  So my
>> inclination is to make an engineering judgment that we won't fix this.

> Haven't we already significantly started down this road, to avoid a lot of 
> the "tuple concurrently updated" type errors?

Not that I'm aware of.  We do not take locks on schemas, nor functions,
nor any other of the object types I mentioned.

> Would expanding this a bit further really be that noticeable?

Frankly, I think it would be not so much "noticeable" as "disastrous".

Making the overhead tolerable would require very large compromises
in coverage, perhaps like "we'll only lock during DDL not DML".
At which point I'd question why bother.  We've seen no field reports
(that I can recall offhand, anyway) that trace to not locking these
objects.

regards, tom lane



Re: Pluggable Storage - Andres's take

2018-09-04 Thread Haribabu Kommi
On Tue, Sep 4, 2018 at 10:33 AM Andres Freund  wrote:

> Hi,
>
> Thanks for the patches!
>
> On 2018-09-03 19:06:27 +1000, Haribabu Kommi wrote:
> > I found a couple of places where zheap uses some extra logic to
> > verify whether the AM is zheap or not, and takes some extra decisions
> > based on that.  I am analyzing all of that extra code to see whether
> > any callbacks can handle it, and how.  I will come back with more
> > details later.
>
> Yea, I think some of them will need to stay (particularly around
> integrating undo) and some other ones we'll need to abstract.
>

OK. I will list all the areas that I found, along with my observations
on whether to abstract each one or leave it as is, and then implement
accordingly.


> > >> And then:
> > >> - lotsa cleanups
> > >> - rebasing onto a newer version of the abstract slot patchset
> > >> - splitting out smaller patches
> > >>
> > >>
> > >> You'd moved the bulk insert into tableam callbacks - I don't quite get
> > >> why? There's not really anything AM specific in that code?
> > >>
> > >
> > > The main reason of adding them to AM is just to provide a control to
> > > the specific AM to decide whether they can support the bulk insert or
> > > not?
> > >
> > > Current framework doesn't support AM specific bulk insert state to be
> > > passed from one function to another and it's structure is fixed. This
> needs
> > > to be enhanced to add AM specific private members also.
> > >
> >
> > Do you want me to work on it to make it generic to AM methods to extend
> > the structure?
>
> I think the best thing here would be to *remove* all AM abstraction for
> bulk insert, until it's actually needed. The likelihood of us getting
> the interface right and useful without an actual user seems low. Also,
> this already is a huge patch...
>

OK. Will remove them and share the patch.


>
> > @@ -308,7 +308,7 @@ static void CopyFromInsertBatch(CopyState cstate, EState *estate,
> >   CommandId mycid, int hi_options,
> >   ResultRelInfo *resultRelInfo,
> >   BulkInsertState bistate,
> > - int nBufferedTuples, TupleTableSlot **bufferedSlots,
> > + int nBufferedSlots, TupleTableSlot **bufferedSlots,
> >   uint64 firstBufferedLineNo);
> >  static bool CopyReadLine(CopyState cstate);
> >  static bool CopyReadLineText(CopyState cstate);
> > @@ -2309,11 +2309,12 @@ CopyFrom(CopyState cstate)
> >   void   *bistate;
> >   uint64  processed = 0;
> >   bool useHeapMultiInsert;
> > - int nBufferedTuples = 0;
> > + int nBufferedSlots = 0;
> >   int prev_leaf_part_index = -1;
>
> > -#define MAX_BUFFERED_TUPLES 1000
> > +#define MAX_BUFFERED_SLOTS 1000
>
> What's the point of these renames? We're still dealing in tuples. Just
> seems to make the patch larger.
>

OK. I will correct it.


>
> >   if (useHeapMultiInsert)
> >   {
> > + int tup_size;
> > +
> >   /* Add this tuple to the tuple buffer */
> > - if (nBufferedTuples == 0)
> > + if (nBufferedSlots == 0)
> > + {
> >   firstBufferedLineNo = cstate->cur_lineno;
> > - Assert(bufferedSlots[nBufferedTuples] == myslot);
> > - nBufferedTuples++;
> > +
> > + /*
> > +  * Find out the Tuple size of the first tuple in a batch and
> > +  * use it for the rest tuples in a batch. There may be scenarios
> > +  * where the first tuple is very small and rest can be large, but
> > +  * that's rare and this should work for majority of the scenarios.
> > +  */
> > + tup_size = heap_compute_data_size(myslot->tts_tupleDescriptor,
> > +                                   myslot->tts_values,
> > +                                   myslot->tts_isnull);
> > + }
>
> This seems too expensive to me.  I think it'd be better if we instead
> used the amount of input data consumed for the tuple as a proxy. Does that
> sound reasonable?
>

Yes, the cstate structure contains the line_buf member, which holds the
length of the input line for the row; this can be used as the tuple
length to limit the buffer's size usage.
Comments?
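
A sketch of that idea, using the accumulated raw line length
(cstate->line_buf.len in the real code) as a cheap proxy when deciding
to flush the multi-insert buffer; the helper and both caps here are
hypothetical, chosen only for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical caps; copy.c would pick its own limits. */
#define MAX_BUFFERED_TUPLES 1000
#define MAX_BUFFERED_BYTES  65535

/*
 * Return nonzero when buffering one more tuple, whose raw input line is
 * line_len bytes long, should trigger a flush.  No per-tuple call to
 * heap_compute_data_size() is needed.
 */
static int
copy_buffer_should_flush(int ntuples, size_t buffered_bytes, size_t line_len)
{
    return ntuples + 1 >= MAX_BUFFERED_TUPLES ||
        buffered_bytes + line_len >= MAX_BUFFERED_BYTES;
}
```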

Regards,
Haribabu Kommi
Fujitsu Australia

Re: pg_verify_checksums failure with hash indexes

2018-09-04 Thread Amit Kapila
On Wed, Sep 5, 2018 at 9:46 AM Tom Lane  wrote:
>
> Amit Kapila  writes:
> > Does anybody else have any idea on how can we write a test for
> > non-default block size or if we already have anything similar?
>
> Build with a non-default BLCKSZ and see if the regression tests pass.
> There's no way that a build with BLCKSZ x can run any tests for
> BLCKSZ y.
>
> Note that you can expect some plan variations from a different BLCKSZ,
> so there'd be at least a few "failures" in the regression tests, which'd
> require manual inspection.  Otherwise this could be delegated to a
> buildfarm animal using a nonstandard BLCKSZ.
>

Thanks for the suggestion.  I will do the manual verification related
to this patch.  I am not too much worried about this particular patch
as we know how to manually test it, but my main worry was any future
change (like changing the meta page contents) shouldn't break it.  I
think your suggestion for having a separate buildfarm for nonstandard
BLCKSZ is good.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] proposal: schema variables

2018-09-04 Thread Pavel Stehule
Hi

2018-09-04 9:21 GMT+02:00 Dean Rasheed :

> AFAICS this patch does nothing to consider parallel safety -- that is,
> as things stand, a variable is allowed in a query that may be
> parallelised, but its value is not copied to workers, leading to
> incorrect results. For example:
>
> create table foo(a int);
> insert into foo select * from generate_series(1,1000000);
> create variable zero int;
> let zero = 0;
>
> explain (costs off) select count(*) from foo where a%10 = zero;
>
>   QUERY PLAN
> ---
>  Finalize Aggregate
>->  Gather
>  Workers Planned: 2
>  ->  Partial Aggregate
>->  Parallel Seq Scan on foo
>  Filter: ((a % 10) = zero)
> (6 rows)
>
> select count(*) from foo where a%10 = zero;
>
>  count
> ---
>  38037    -- Different random result each time; should be 100,000
> (1 row)
>
> Thoughts?
>

The query uses a copy of the variables' values now, but unfortunately
these values are not passed to workers.  This should be fixed.

Thank you for the test case.

Pavel


> Regards,
> Dean
>


Bug fix for glibc broke freebsd build in REL_11_STABLE

2018-09-04 Thread Victor Wagner


Colleagues,

Few days ago commit 

 1f349aa7d9a6633e87db071390c73a39ac279ba4

Fix 8a934d67 for libc++ and make more include order resistant. 

was introduced into branch REL_11_STABLE.
It seems that it deals with GLIBC-based systems (i.e Linux and Hurd)
only, and shouldn't affect systems with non-GNU libc (such as BSD
family).

It changes code which has a comment:

/*
 * Glibc doesn't use the builtin for clang due to a *gcc* bug in a version
 * newer than the gcc compatibility clang claims to have. This would cause a
 * *lot* of superfluous function calls, therefore revert when using clang. In
 * C++ there's issues with libc++ (not libstdc++), so disable as well.
 */


However, this commit broke float8 test on 32-bit FreeBSD 11 with clang
3.8.0 compiler. Regressions.diff follows:

== pgsql.build/src/test/regress/regression.diffs ==
*** /usr/home/buildfarm/build-farm/REL_11_STABLE/pgsql.build/src/test/regress/expected/float8.out  Tue Sep  4 11:57:47 2018
--- /usr/home/buildfarm/build-farm/REL_11_STABLE/pgsql.build/src/test/regress/results/float8.out  Tue Sep  4 12:03:28 2018
***************
*** 418,424 ****
   SET f1 = FLOAT8_TBL.f1 * '-1' WHERE FLOAT8_TBL.f1 > '0.0';
  SELECT '' AS bad, f.f1 * '1e200' from FLOAT8_TBL f;
! ERROR:  value out of range: overflow
  SELECT '' AS bad, f.f1 ^ '1e200' from FLOAT8_TBL f;
  ERROR:  value out of range: overflow
  SELECT 0 ^ 0 + 0 ^ 1 + 0 ^ 0.0 + 0 ^ 0.5;
--- 418,432 
 SET f1 = FLOAT8_TBL.f1 * '-1'
 WHERE FLOAT8_TBL.f1 > '0.0';
  SELECT '' AS bad, f.f1 * '1e200' from FLOAT8_TBL f;
!  bad | ?column? 
! -+--
!  |0
!  |  -3.484e+201
!  | -1.0043e+203
!  |-Infinity
!  | -1.2345678901234
! (5 rows)
! 
  SELECT '' AS bad, f.f1 ^ '1e200' from FLOAT8_TBL f;
  ERROR:  value out of range: overflow
  SELECT 0 ^ 0 + 0 ^ 1 + 0 ^ 0.0 + 0 ^ 0.5; 

Problem doesn't arise on 64-bit FreeBSD and MacOS. (we have no 32-bit
MacOs at hand).

Following patch fixes this problem:

diff --git a/src/include/port.h b/src/include/port.h
index 0ce72e50e5..7daaebf568 100644
--- a/src/include/port.h
+++ b/src/include/port.h
@@ -350,7 +350,7 @@ extern int  isinf(double x);
 * *lot* of superfluous function calls, therefore revert when using clang. In
  * C++ there's issues with libc++ (not libstdc++), so disable as well.
  */
-#if defined(__clang__) && !defined(__cplusplus)
+#if defined(__clang__) && !defined(__cplusplus) && defined(__linux__)
 /* needs to be separate to not confuse other compilers */
 #if __has_builtin(__builtin_isinf)
 /* need to include before, to avoid getting overwritten */

The idea of the patch is that, because the problem is glibc-related, we
should apply this logic only on Linux (more precisely, Linux or Hurd,
but adding Hurd would overcomplicate the condition and no one seems to
use it).
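
For reference, the intrinsic in question can be probed with a few lines;
on a working toolchain the checks below hold, while the regression diff
above suggests infinite inputs were not being reported as infinite.  The
my_isinf wrapper only illustrates the port.h-style dispatch and is not
the actual macro:

```c
#include <assert.h>
#include <math.h>

/*
 * Dispatch to the compiler builtin when one is available, mirroring the
 * spirit of the port.h logic; fall back to libc's isinf() otherwise.
 */
#if defined(__clang__) || defined(__GNUC__)
#define my_isinf(x) __builtin_isinf(x)
#else
#define my_isinf(x) isinf(x)
#endif

/* Nonzero iff x is +Infinity or -Infinity. */
static int
check_isinf(double x)
{
    return my_isinf(x) != 0;
}
```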
-- 



Re: pointless check in RelationBuildPartitionDesc

2018-09-04 Thread Alvaro Herrera
On 2018-Sep-04, Amit Langote wrote:

> On 2018/09/04 13:08, Alvaro Herrera wrote:

> > I think it'd be pointless noise.  If we really want to protect against
> > that, I think we should promote the Assert for relpartbound's isnull
> > flag into an if test.
> 
> So that we can elog(ERROR, ...) if isnull is true?  If so, I agree,
> because that's what should've been done here anyway (for a value read from
> the disk that is).

Yes.

I wonder now what's the value of using an Assert for this, BTW: if those
are enabled, then we'll crash saying "this column is null and shouldn't!".
If they are disabled, we'll instead crash (possibly in production)
trying to decode the field.  What was achieved?

With an elog(ERROR) the reaction is the expected one: the current
transaction is aborted, but the server continues to run.
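
The trade-off can be illustrated with a toy fragment (a plain C macro
standing in for Assert, a sentinel return standing in for elog(ERROR);
none of this is PostgreSQL code).  With assertions compiled out, only
the explicit check still protects the caller:

```c
#include <assert.h>
#include <stddef.h>

/* Simulate a production build where Assert() is compiled out. */
#define ASSERTS_ENABLED 0
#if ASSERTS_ENABLED
#define Assert(condition) assert(condition)
#else
#define Assert(condition) ((void) 0)
#endif

/*
 * Returns -1 when relpartbound is unexpectedly null (standing in for
 * elog(ERROR), which aborts just the transaction); 0 on success.
 */
static int
decode_partbound(const char *relpartbound)
{
    Assert(relpartbound != NULL);   /* a no-op in this "production" build */
    if (relpartbound == NULL)
        return -1;                  /* the explicit test always runs */
    return 0;
}
```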

> I think we should check relispartition then too.

Well, we don't check every single catalog flag in every possible code
path just to ensure it's sane.  I don't see any reason to do differently
here.  We rely heavily on the catalog contents to be correct, and
install defenses against user-induced corruption that would cause us to
crash; but going much further than that would be excessive IMO.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: TupleTableSlot abstraction

2018-09-04 Thread Amit Khandekar
On 28 August 2018 at 22:43, Ashutosh Bapat
 wrote:
> On Fri, Aug 24, 2018 at 6:46 AM, Andres Freund  wrote:
>
>>
>>> @@ -2883,7 +2885,7 @@ CopyFrom(CopyState cstate)
>>>   if (slot == NULL)   /* "do nothing" */
>>>   skip_tuple = true;
>>>   else/* trigger might have 
>>> changed tuple */
>>> - tuple = ExecMaterializeSlot(slot);
>>> + tuple = ExecFetchSlotTuple(slot, true);
>>>   }
>>
>> Could we do the Materialize vs Fetch vs Copy change separately?
>>
>
> Ok. I will do that.

Ashutosh offlist has shared with me
0006-Rethink-ExecMaterializeSlot-ExecFetchSlotTuple-in-th.patch which
contains the separated changes. The attached tar includes this
additional patch.


>>
>>> @@ -1590,7 +1590,8 @@ ExecHashTableInsert(HashJoinTable hashtable,
>>>   TupleTableSlot *slot,
>>>   uint32 hashvalue)
>>>  {
>>> - MinimalTuple tuple = ExecFetchSlotMinimalTuple(slot);
>>> + bool shouldFree;
>>> + MinimalTuple tuple = ExecFetchSlotMinimalTuple(slot, &shouldFree);
>>>   int bucketno;
>>>   int batchno;
>>>
>>> @@ -1664,6 +1665,9 @@ ExecHashTableInsert(HashJoinTable hashtable,
>>> hashvalue,
>>> &hashtable->innerBatchFile[batchno]);
>>>   }
>>> +
>>> + if (shouldFree)
>>> + heap_free_minimal_tuple(tuple);
>>>  }
>>
>> Hm, how about splitting these out?
>
> Split into a separate patch? It doesn't make sense to add this patch
> before 0006 since the slots in those patches can "own" a minimal
> tuple. Let's add a patch after 0006 i.e. tuple table abstraction
> patch. Will do.

Ashutosh offlist has shared with me
0009-Rethink-ExecFetchSlotMinimalTuple.patch which contains the above
separated changes. The attached tar includes this additional patch.


On 31 August 2018 at 20:50, Andres Freund  wrote:
>> On 31 August 2018 at 10:05, Amit Khandekar  wrote:
> Hi,
>
> On 2018-08-31 10:05:05 +0530, Amit Khandekar wrote:
>> On 28 August 2018 at 22:43, Ashutosh Bapat
>> >> I think I was wrong at saying that we should remove this. I think you
>> >> were right that it should become a callback...
>>
>> > We have replaced all slot_getsysattrs() with heap_getsysattr(). Do you
>> > want to reinstantiate those as well? If so, slot_getsysattr() becomes
>> > a wrapper around getsysattr() callback.
>
> Right.
>
>> One option is that the getsysattr() callback function returns false if
>> system attributes are not supported for that slot type. Other option
>> is that in the not-supported case, the function errors out, meaning
>> that the caller should be aware that the slot type is the one that
>> supports system attributes.
>
>> I had prepared changes for the first option, but Ashutosh Bapat
>> offlist made me realize that it's worth considering the 2nd option(
>> i.e. erroring out).
>
> I think we should error out.

I did these changes in the existing 0007-Restructure-TupleTableSlot...
patch. This patch now contains changes for the new getsysattr()
callback. I used the 2nd option : erroring out.

On 31 August 2018 at 10:05, Amit Khandekar  wrote:
> The only use case where slot_getsysattr() is called right now is
> execCurrentOf(), and it currently checks the bool return value of
> slot_getsysattr() and prints a user-friendly message if false is
> returned. Looking at the code (commit 8f5ac440430ab), it seems that we
> want to continue to have a user-friendly message for some unhandled
> cases like custom scans. So perhaps for now it's ok to go with the
> first option where getsysattr callback returns false for slot types
> that don't have system attributes.

I noticed that the issue for which commit 8f5ac440430ab was done was
that the tid was attempted to be fetched from a tuple that doesn't
support system attributes (a virtual tuple), although the report
complained about a non-user-friendly message being given. So similarly
for custom scans, which are not specially handled, for now we can
continue to give an "unfriendly" message. The getsysattr() callbacks
will raise an error like "virtual tuple table slot does not have
system attributes" if the slot does not support system attributes.
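
A toy model of the "error out" option chosen above, with setjmp/longjmp
standing in for elog(ERROR)'s non-local exit and a plain function
standing in for the slot callback; the error text matches the message
quoted above, everything else is invented for illustration:

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Toy stand-ins for elog(ERROR) and the per-slot-type callback table. */
static jmp_buf elog_ctx;
static const char *elog_message = NULL;

static void
toy_elog_error(const char *msg)
{
    elog_message = msg;
    longjmp(elog_ctx, 1);
}

/* getsysattr callback for a slot type with no system attributes */
static long
virtual_getsysattr(int attnum)
{
    (void) attnum;
    toy_elog_error("virtual tuple table slot does not have system attributes");
    return 0;                   /* not reached */
}

/* Caller: returns 1 if the error path was taken. */
static int
try_get_tid(void)
{
    if (setjmp(elog_ctx) == 0)
    {
        (void) virtual_getsysattr(-1);
        return 0;               /* not reached */
    }
    return 1;                   /* the "error" was raised */
}
```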


The attached v11 tar has the above set of changes.

Thanks
-Amit Khandekar


pg_abstract_tts_patches_v11.tar.gz
Description: GNU Zip compressed data


Re: Bug fix for glibc broke freebsd build in REL_11_STABLE

2018-09-04 Thread Andres Freund



On September 4, 2018 6:16:24 AM PDT, Victor Wagner  wrote:
>
>Colleagues,
>
>Few days ago commit 
>
> 1f349aa7d9a6633e87db071390c73a39ac279ba4
>
>Fix 8a934d67 for libc++ and make more include order resistant. 
>
>was introduced into branch REL_11_STABLE.
>It seems that it deals with GLIBC-based systems (i.e Linux and Hurd)
>only, and shouldn't affect systems with non-GNU libc (such as BSD
>family).

What libc is this using?


>It changes code which has a comment:
>
>/*
> * Glibc doesn't use the builtin for clang due to a *gcc* bug in a version
> * newer than the gcc compatibility clang claims to have. This would cause a
> * *lot* of superfluous function calls, therefore revert when using clang. In
> * C++ there's issues with libc++ (not libstdc++), so disable as well.
> */
>
>
>However, this commit broke float8 test on 32-bit FreeBSD 11 with clang
>3.8.0 compiler. Regressions.diff follows:

Does this happen with a newer clang version too?

>== pgsql.build/src/test/regress/regression.diffs ==
>*** /usr/home/buildfarm/build-farm/REL_11_STABLE/pgsql.build/src/test/regress/expected/float8.out  Tue Sep  4 11:57:47 2018
>--- /usr/home/buildfarm/build-farm/REL_11_STABLE/pgsql.build/src/test/regress/results/float8.out  Tue Sep  4 12:03:28 2018
>***************
>*** 418,424 ****
>   SET f1 = FLOAT8_TBL.f1 * '-1' WHERE FLOAT8_TBL.f1 > '0.0';
>  SELECT '' AS bad, f.f1 * '1e200' from FLOAT8_TBL f;
>! ERROR:  value out of range: overflow
>  SELECT '' AS bad, f.f1 ^ '1e200' from FLOAT8_TBL f;
>  ERROR:  value out of range: overflow
>  SELECT 0 ^ 0 + 0 ^ 1 + 0 ^ 0.0 + 0 ^ 0.5;
>--- 418,432 
> SET f1 = FLOAT8_TBL.f1 * '-1'
> WHERE FLOAT8_TBL.f1 > '0.0';
>  SELECT '' AS bad, f.f1 * '1e200' from FLOAT8_TBL f;
>!  bad | ?column? 
>! -+--
>!  |0
>!  |  -3.484e+201
>!  | -1.0043e+203
>!  |-Infinity
>!  | -1.2345678901234
>! (5 rows)
>! 
>  SELECT '' AS bad, f.f1 ^ '1e200' from FLOAT8_TBL f;
>  ERROR:  value out of range: overflow
>  SELECT 0 ^ 0 + 0 ^ 1 + 0 ^ 0.0 + 0 ^ 0.5; 
>
>Problem doesn't arise on 64-bit FreeBSD and MacOS. (we have no 32-bit
>MacOs at hand).

This seems like something that needs to be understood.  If a compiler intrinsic 
doesn't work, it's serious.


>Following patch fixes this problem:
>
>diff --git a/src/include/port.h b/src/include/port.h
>index 0ce72e50e5..7daaebf568 100644
>--- a/src/include/port.h
>+++ b/src/include/port.h
>@@ -350,7 +350,7 @@ extern int  isinf(double x);
>  * *lot* of superfluous function calls, therefore revert when using clang. In
>  * C++ there's issues with libc++ (not libstdc++), so disable as well.
>  */
>-#if defined(__clang__) && !defined(__cplusplus)
>+#if defined(__clang__) && !defined(__cplusplus) && defined(__linux__)
> /* needs to be separate to not confuse other compilers */
> #if __has_builtin(__builtin_isinf)
> /* need to include before, to avoid getting overwritten */
>
>The idea of the patch is that, because the problem is glibc-related, we
>should apply this logic only on Linux (more precisely, Linux or Hurd,
>but adding Hurd would overcomplicate the condition and no one seems to
>use it).

This seems too restrictive. There's nothing Linux-specific in the issue;
it causes slowdowns on other platforms too.

Thanks!

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: Add SKIP LOCKED to VACUUM and ANALYZE

2018-09-04 Thread Bossart, Nathan
On 9/3/18, 6:20 PM, "Michael Paquier"  wrote:
> Nathan, could you rebase your patch set?  From what I can see the last
> patch set applies with one conflict, and it would be nice for clarity to
> split the routines for analyze, vacuum and cluster into separate places.
> Similar to what is done with vacuum_is_relation_owner, having the same
> set of logs for vacuum and analyze may be cleaner.  The set of ownership
> checks should happen after the skip lock checks to be consistent between
> the ownership checks done when building the relation list (list
> expansion for partitions and such) as well as for vacuum_rel() and
> analyze_rel().

Yes.  I've started working on this again, but the new patch set is
probably still a few days out.

> With all the work which has been done already, I don't think that we are
> that far from getting something committable.

Great!

Nathan



Re: Prevent concurrent DROP SCHEMA when certain objects are being initially created in the namespace

2018-09-04 Thread Tom Lane
Michael Paquier  writes:
> On Tue, Sep 04, 2018 at 03:09:21PM -0700, Jimmy Yih wrote:
>> When an empty namespace is being initially populated with certain objects,
>> it is possible for a DROP SCHEMA operation to come in and delete the
>> namespace without using CASCADE.

> It seems to me that we are missing some dependency tracking in some of
> those cases.

No, I think Jimmy is right: it's a race condition.  The pg_depend entry
would produce the right result, except that it's not committed yet so
the DROP SCHEMA doesn't see it.

The bigger question is whether we want to do anything about this.
Historically we've not bothered with locking on database objects that
don't represent storage (ie, relations and databases).  If we're going to
take this seriously, then we should for example also acquire lock on any
function that's referenced in a view definition, to ensure it doesn't go
away before the view is committed and its dependencies become visible.
Likewise for operators, opclasses, collations, text search objects, you
name it.  And worse, we'd really need this sort of locking even in vanilla
DML queries, since objects could easily go away before the query is done.

I think that line of thought leads to an enormous increase in locking
overhead, for which we'd get little if any gain in usability.  So my
inclination is to make an engineering judgment that we won't fix this.

It'd be interesting to know whether any other DBMSes make an effort
to prevent this kind of problem.

regards, tom lane



Re: [HACKERS] Restricting maximum keep segments by repslots

2018-09-04 Thread Masahiko Sawada
On Tue, Sep 4, 2018 at 7:52 PM, Kyotaro HORIGUCHI
 wrote:
> At Mon, 3 Sep 2018 18:14:22 +0900, Masahiko Sawada  
> wrote in 
>> Thank you for updating the patch!
>>
>> On Tue, Jul 31, 2018 at 6:11 PM, Kyotaro HORIGUCHI
>>  wrote:
>> > Hello.
>> >
>> > At Tue, 24 Jul 2018 16:47:41 +0900, Masahiko Sawada 
>> >  wrote in 
>> > 
>> >> On Mon, Jul 23, 2018 at 4:16 PM, Kyotaro HORIGUCHI
>> >>  wrote:
>> >> > Hello.
>> >> >
>> >> > At Fri, 20 Jul 2018 10:13:58 +0900, Masahiko Sawada 
>> >> >  wrote in 
>> >> > 
>> >> This hunk should be updated.
>> >
>> > Perhaps you need a fresh database cluster.
>>
>> I meant this was a doc update in 0004 patch but it's fixed in v7 patch.
>
> Wow..
>
>> While testing the v7 patch, I got the following result with
>> max_slot_wal_keep_size = 5GB and without wal_keep_segments setting.
>>
>> =# select pg_current_wal_lsn(), slot_name, restart_lsn,
>> confirmed_flush_lsn, wal_status, remain, pg_size_pretty(remain) from
>> pg_replication_slots ;
>>  pg_current_wal_lsn | slot_name | restart_lsn | confirmed_flush_lsn |
>> wal_status |  remain  | pg_size_pretty
>> +---+-+-++--+
>>  2/A3D8 | l1| 1/AC000910  | 1/AC000948  |
>> streaming  | 16777000 | 16 MB
>> (1 row)
>>
>> The actual distance between the slot limit and the slot 'l1' is about
>> 1GB(5GB - (2/A3D8 - 1/AC000910)) but the system view says the
>> remain is only 16MB. For the calculation of restBytes in
>> GetOldestKeepSegment(), the current patch seems to calculate the
>> distance between the minSlotLSN and restartLSN when (curLSN -
>> max_slot_wal_keep_size) < minSlotLSN. However, I think that the actual
>> remaining bytes until the slot loses the required WAL is (restartLSN -
>> (currLSN - max_slot_wal_keep_size)) in that case.
>
> Oops! That's a silly thinko, or rather a typo. It was wrong that
> keepSegs, instead of limitSegs, was used in the calculation of
> restBytes. Just using limitSegs makes it sane. It's a pity that I had
> removed the remain column from the regression test.
>
> Fixed that and added an item for remain calculation in the TAP
> test. I expect that pg_size_pretty() adds some robustness to the
> test.
>
>> Also, 0004 patch needs to be rebased on the current HEAD.
>
> Done. Please find the v8 attached.
>

Thank you for updating! Here are my review comments for the v8 patch.

+/*
+ * This slot still has all required segments. Calculate how many
+ * LSN bytes the slot has until it loses restart_lsn.
+ */
+fragbytes = wal_segment_size - (currLSN % wal_segment_size);
+XLogSegNoOffsetToRecPtr(restartSeg + limitSegs - currSeg, fragbytes,
+wal_segment_size, *restBytes);

For the calculation of fragbytes, I think we should calculate the
fragment bytes of restartLSN instead. The formula "restartSeg +
limitSegs - currSeg" means the # of segments between restartLSN and the
limit set by the new parameter. I don't think that the sum of it and
the fragment bytes of currLSN is correct. As the following result
(max_slot_wal_keep_size is 128MB) shows, the remain column shows the
actual remain + 16MB (the get_bytes function returns the value of
max_slot_wal_keep_size in bytes).

postgres(1:29447)=# select pg_current_wal_lsn(), slot_name,
restart_lsn, wal_status, remain, pg_size_pretty(remain),
pg_size_pretty(get_bytes('max_slot_wal_keep_size') -
(pg_current_wal_lsn() - restart_lsn)) from pg_replication_slots ;
 pg_current_wal_lsn | slot_name | restart_lsn | wal_status |  remain
| pg_size_pretty | pg_size_pretty
+---+-++---++
 0/1D0001F0 | l1| 0/1D0001B8  | streaming  | 150994448
| 144 MB | 128 MB
(1 row)
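
Ignoring segment-boundary rounding, the expected relationship, remain =
restartLSN - (currLSN - max_slot_wal_keep_size) clamped at zero, can be
checked standalone; toy types below, not the patch's code:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;    /* stand-in for PostgreSQL's XLogRecPtr */

/*
 * Sketch of the "remain" computation: how many more bytes of WAL can be
 * written before the slot loses required segments, i.e.
 * restart_lsn - (current_lsn - max_slot_wal_keep_size), clamped at
 * zero.  Segment-boundary rounding done by the real patch is ignored.
 */
static uint64_t
slot_remain_bytes(XLogRecPtr curr_lsn, XLogRecPtr restart_lsn,
                  uint64_t max_keep_bytes)
{
    XLogRecPtr  limit;

    limit = (curr_lsn > max_keep_bytes) ? curr_lsn - max_keep_bytes : 0;
    return (restart_lsn > limit) ? restart_lsn - limit : 0;
}
```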

---
If wal_keep_segments is greater than max_slot_wal_keep_size,
wal_keep_segments doesn't affect the value of the remain column.

postgres(1:48422)=# show max_slot_wal_keep_size ;
 max_slot_wal_keep_size

 128MB
(1 row)

postgres(1:48422)=# show wal_keep_segments ;
 wal_keep_segments
---
 5000
(1 row)

postgres(1:48422)=# select slot_name, wal_status, remain,
pg_size_pretty(remain) as remain  from pg_replication_slots ;
 slot_name | wal_status |  remain   | remain
---++---+
 l1| streaming  | 150994728 | 144 MB
(1 row)

*** After consumed over 128MB WAL ***

postgres(1:48422)=# select slot_name, wal_status, remain,
pg_size_pretty(remain) as remain  from pg_replication_slots ;
 slot_name | wal_status | remain | remain
---+++-
 l1| streaming  |  0 | 0 bytes
(1 row)

---
For the cosmetic stuff there are code where need the line break.

 static void CheckPointGuts(XLogRecPtr checkPointRedo, int flags);
+static XLogSegNo 

Re: [HACKERS] Restricting maximum keep segments by repslots

2018-09-04 Thread Kyotaro HORIGUCHI
At Mon, 3 Sep 2018 18:14:22 +0900, Masahiko Sawada  
wrote in 
> Thank you for updating the patch!
> 
> On Tue, Jul 31, 2018 at 6:11 PM, Kyotaro HORIGUCHI
>  wrote:
> > Hello.
> >
> > At Tue, 24 Jul 2018 16:47:41 +0900, Masahiko Sawada  
> > wrote in 
> > 
> >> On Mon, Jul 23, 2018 at 4:16 PM, Kyotaro HORIGUCHI
> >>  wrote:
> >> > Hello.
> >> >
> >> > At Fri, 20 Jul 2018 10:13:58 +0900, Masahiko Sawada 
> >> >  wrote in 
> >> > 
> >> This hunk should be updated.
> >
> > Perhaps you need a fresh database cluster.
> 
> I meant this was a doc update in 0004 patch but it's fixed in v7 patch.

Wow..

> While testing the v7 patch, I got the following result with
> max_slot_wal_keep_size = 5GB and without wal_keep_segments setting.
> 
> =# select pg_current_wal_lsn(), slot_name, restart_lsn,
> confirmed_flush_lsn, wal_status, remain, pg_size_pretty(remain) from
> pg_replication_slots ;
>  pg_current_wal_lsn | slot_name | restart_lsn | confirmed_flush_lsn |
> wal_status |  remain  | pg_size_pretty
> +---+-+-++--+
>  2/A3D8 | l1| 1/AC000910  | 1/AC000948  |
> streaming  | 16777000 | 16 MB
> (1 row)
> 
> The actual distance between the slot limit and the slot 'l1' is about
> 1GB(5GB - (2/A3D8 - 1/AC000910)) but the system view says the
> remain is only 16MB. For the calculation of restBytes in
> GetOldestKeepSegment(), the current patch seems to calculate the
> distance between the minSlotLSN and restartLSN when (curLSN -
> max_slot_wal_keep_size) < minSlotLSN. However, I think that the actual
> remaining bytes until the slot loses the required WAL is (restartLSN -
> (currLSN - max_slot_wal_keep_size)) in that case.

Oops! That's a silly thinko, or rather a typo. It was wrong that
keepSegs, instead of limitSegs, was used in the calculation of
restBytes. Just using limitSegs makes it sane. It's a pity that I had
removed the remain column from the regression test.

Fixed that and added an item for remain calculation in the TAP
test. I expect that pg_size_pretty() adds some robustness to the
test.

> Also, 0004 patch needs to be rebased on the current HEAD.

Done. Please find the v8 attached.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From a380a7fffcf01eb869035113da451c670ec52772 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Thu, 21 Dec 2017 21:20:20 +0900
Subject: [PATCH 1/4] Add WAL relief vent for replication slots

Adds a capability to limit the number of segments kept by replication
slots by a GUC variable.
---
 src/backend/access/transam/xlog.c | 106 --
 src/backend/utils/misc/guc.c  |  12 +++
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/access/xlog.h |   1 +
 4 files changed, 95 insertions(+), 25 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 493f1db7b9..982eedad32 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -105,6 +105,7 @@ int			wal_level = WAL_LEVEL_MINIMAL;
 int			CommitDelay = 0;	/* precommit delay in microseconds */
 int			CommitSiblings = 5; /* # concurrent xacts needed to sleep */
 int			wal_retrieve_retry_interval = 5000;
+int			max_slot_wal_keep_size_mb = 0;
 
 #ifdef WAL_DEBUG
 bool		XLOG_DEBUG = false;
@@ -867,6 +868,7 @@ static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI,
 static void LocalSetXLogInsertAllowed(void);
 static void CreateEndOfRecoveryRecord(void);
 static void CheckPointGuts(XLogRecPtr checkPointRedo, int flags);
+static XLogSegNo GetOldestKeepSegment(XLogRecPtr currpos, XLogRecPtr minSlotPtr);
 static void KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo);
 static XLogRecPtr XLogGetReplicationSlotMinimumLSN(void);
 
@@ -9488,6 +9490,53 @@ CreateRestartPoint(int flags)
 	return true;
 }
 
+/*
+ * Returns minimum segment number the next checkpoint must leave considering
+ * wal_keep_segments, replication slots and max_slot_wal_keep_size.
+ *
+ * currLSN is the current insert location
+ * minSlotLSN is the minimum restart_lsn of all active slots
+ */
+static XLogSegNo
+GetOldestKeepSegment(XLogRecPtr currLSN, XLogRecPtr minSlotLSN)
+{
+	uint64		keepSegs = 0;
+	XLogSegNo	currSeg;
+	XLogSegNo	minSlotSeg;
+
+	XLByteToSeg(currLSN, currSeg, wal_segment_size);
+	XLByteToSeg(minSlotLSN, minSlotSeg, wal_segment_size);
+
+	/*
+	 * Calculate keep segments by slots first. The second term of the
+	 * condition is just a sanity check.
+	 */
+	if (minSlotLSN != InvalidXLogRecPtr && minSlotSeg <= currSeg)
+		keepSegs = currSeg - minSlotSeg;
+
+	/* Cap keepSegs by max_slot_wal_keep_size */
+	if (max_slot_wal_keep_size_mb > 0)
+	{
+		uint64 limitSegs;
+
+		limitSegs = ConvertToXSegs(max_slot_wal_keep_size_mb, wal_segment_size);
+
+		/* Apply max_slot_wal_keep_size to keeping segments */
+		if (limitSegs < keepSegs)
+			keepSegs = 

Re: [HACKERS] proposal: schema variables

2018-09-04 Thread Dean Rasheed
AFAICS this patch does nothing to consider parallel safety -- that is,
as things stand, a variable is allowed in a query that may be
parallelised, but its value is not copied to workers, leading to
incorrect results. For example:

create table foo(a int);
insert into foo select * from generate_series(1,1000000);
create variable zero int;
let zero = 0;

explain (costs off) select count(*) from foo where a%10 = zero;

                   QUERY PLAN
------------------------------------------------
 Finalize Aggregate
   ->  Gather
         Workers Planned: 2
         ->  Partial Aggregate
               ->  Parallel Seq Scan on foo
                     Filter: ((a % 10) = zero)
(6 rows)

select count(*) from foo where a%10 = zero;

 count
--------
  38037    -- Different random result each time, should be 100,000
(1 row)

Thoughts?

Regards,
Dean



Re: pg_verify_checksums failure with hash indexes

2018-09-04 Thread Dilip Kumar
On Tue, Sep 4, 2018 at 10:49 AM, Dilip Kumar  wrote:
> On Tue, Sep 4, 2018 at 10:14 AM, Amit Kapila  wrote:
>> On Mon, Sep 3, 2018 at 2:44 PM Dilip Kumar  wrote:
>>> On Mon, Sep 3, 2018 at 8:37 AM, Amit Kapila  wrote:
>>> > On Sat, Sep 1, 2018 at 10:28 AM Dilip Kumar  wrote:
>>> >>
>>> >> I think if we compute with below formula which I suggested upthread
>>> >>
>>> >> #define HASH_MAX_BITMAPS   Min(BLCKSZ / 8, 1024)
>>> >>
>>> >> then for BLCKSZ 8K and bigger, it will remain the same value where it
>>> >> does not overrun.  And, for the small BLCKSZ, I think it will give
>>> >> sufficient space for the hash map. If the BLCKSZ is 1K then the sizeof
>>> >> (HashMetaPageData) + sizeof (HashPageOpaque) = 968 which is very close
>>> >> to the BLCKSZ.
>>> >>
>>> >
>>> > Yeah, so at 1K, the value of HASH_MAX_BITMAPS will be 128 as per the
>>> > above formula, which is what its value was prior to commit 620b49a1.
>>> > I think it will be better if you add a comment in your patch
>>> > indicating the importance/advantage of such a formula.
>>> >
>>> I have added the comments.
>>>
>>
>
> In my previous patch I mistakenly put Max(BLCKSZ / 8, 1024) instead of
> Min(BLCKSZ / 8, 1024).  I have fixed that.
>
>> Thanks, I will look into it.  Can you please do some pg_upgrade tests
>> to ensure that this doesn't impact the upgrade?  You can create
>> hash-index and populate it with some data in version 10 and try
>> upgrading to 11 after applying this patch.  You can also try it with
>> different block-sizes.
>>
> Ok, I will do that.
>

I have tested pg_upgrade with different block size (1K, 4K, 8K, 32K).
The upgrade is working fine from v10 to v11 and I am able to fetch
data with index scan on the hash index after an upgrade.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: speeding up planning with partitions

2018-09-04 Thread Amit Langote
Hi Dilip,

Thanks for taking a look.

On 2018/09/03 20:57, Dilip Kumar wrote:
> The idea looks interesting while going through the patch I observed
> this comment.
> 
> /*
>  * inheritance_planner
>  *   Generate Paths in the case where the result relation is an
>  *   inheritance set.
>  *
>  * We have to handle this case differently from cases where a source relation
>  * is an inheritance set. Source inheritance is expanded at the bottom of the
>  * plan tree (see allpaths.c), but target inheritance has to be expanded at
>  * the top.
> 
> I think with your patch these comments needs to be change?

Yes, maybe a good idea to mention that partitioned table result relations
are not handled here.

Actually, I've been wondering if this patch (0001) shouldn't get rid of
inheritance_planner altogether and implement the new approach for *all*
inheritance sets, not just partitioned tables, but haven't spent much time
on that idea yet.

>   if (parse->resultRelation &&
> - rt_fetch(parse->resultRelation, parse->rtable)->inh)
> + rt_fetch(parse->resultRelation, parse->rtable)->inh &&
> + rt_fetch(parse->resultRelation, parse->rtable)->relkind !=
> + RELKIND_PARTITIONED_TABLE)
>   inheritance_planner(root);
>   else
>   grouping_planner(root, false, tuple_fraction);
> 
> I think we can add some comments to explain if the target rel itself
> is partitioned rel then why
> we can directly go to the grouping planner.

Okay, I will try to add more explanatory comments here and there in the
next version I will post later this week.

Thanks,
Amit




Re: Make executor's Range Table an array instead of a List

2018-09-04 Thread Amit Langote
On 2018/08/24 7:22, David Rowley wrote:
> On 24 August 2018 at 02:26, Amit Langote  wrote:
>> One of the patches I sent last week does the same thing, among a
>> couple of other things with regard to handling relations in the
>> executor.  On a cursory look at the patch, your take of it looks
>> better than mine.  Will test tomorrow.  Here is a link to my email:
>>
>> https://www.postgresql.org/message-id/468c85d9-540e-66a2-1dde-fec2b741e...@lab.ntt.co.jp
>>
>> 4th of my patches implements the array'fication of executor's range table.
> 
> Sorry, didn't realise. I'll withdraw this and review yours during the
> upcoming 'fest.

Since your patch implemented the idea in a better manner, I'm thinking I
should merge it with mine.  Is that okay with you?

Thanks,
Amit




Re: pg_verify_checksums -d option (was: Re: pg_verify_checksums -r option)

2018-09-04 Thread Masahiko Sawada
On Mon, Sep 3, 2018 at 10:06 PM, Yugo Nagata  wrote:
> Hi,
>
> On Sat, 1 Sep 2018 07:40:40 +0200 (CEST)
> Fabien COELHO  wrote:
>
>> > Attached is a patch to allow pg_verify_checksums to specify a database
>> > to scan.  This is useful for users who want to verify checksums of
>> > relations in a specific database. We can specify a database by OID
>> > using the -d or --dboid option.
>> > Also, when -g or --global-only is used, only shared relations are
>> > scanned.
>>
>> It seems that the patch does not apply anymore. Could you rebase it?
>
> I attached the rebased patch.

FWIW, I think it would be good if we can specify multiple database
oids to the -d option, like "pg_verify_checksums -d 12345 -d 23456".
Maybe it's the same for the -r option.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: Make executor's Range Table an array instead of a List

2018-09-04 Thread Amit Langote
On 2018/09/04 17:14, David Rowley wrote:
> On 4 September 2018 at 19:31, Amit Langote
>  wrote:
>> On 2018/08/24 7:22, David Rowley wrote:
>>> On 24 August 2018 at 02:26, Amit Langote  wrote:
 One of the patches I sent last week does the same thing, among a
 couple of other things with regard to handling relations in the
 executor.  On a cursory look at the patch, your take of it looks
 better than mine.  Will test tomorrow.  Here is a link to my email:

 https://www.postgresql.org/message-id/468c85d9-540e-66a2-1dde-fec2b741e...@lab.ntt.co.jp

 4th of my patches implements the array'fication of executor's range table.
>>>
>>> Sorry, didn't realise. I'll withdraw this and review yours during the
>>> upcoming 'fest.
>>
>> Since your patch implemented the idea in a better manner, I'm thinking I
>> should merge it with mine.  Is that okay with you?
> 
> Feel free to do that.

Okay, will do.

> I've not yet looked at your patch. I was
> planning on looking at your partition planning performance
> improvements patches first.

Great, thanks.

Regards,
Amit




Re: pointless check in RelationBuildPartitionDesc

2018-09-04 Thread Amit Langote
On 2018/09/04 13:08, Alvaro Herrera wrote:
> On 2018-Sep-04, Amit Langote wrote:
> 
>> On 2018/09/04 10:19, Michael Paquier wrote:
>>> On Tue, Sep 04, 2018 at 09:47:07AM +0900, Amit Langote wrote:
 On 2018/09/04 6:39, Alvaro Herrera wrote:
> After looking, it seems that this is just self-inflicted pain: for some
> reason, we store the pg_inherits row for a partition, and immediately
> afterwards compute and store its partition bound, which requires the
> above hack.  But if we do things in the opposite order, this is no
> longer needed.  I propose to remove it, as in the attached patch.

 +1.  I remember having facepalmed at this before and had also written a
 patch but never got around to submitting it.
>>>
>>> Ok, I see.  It seems to me that this could be replaced by an
>>> elog(ERROR), as relispartition ought to be set anyway.  This way any
>>> future callers would get things done in the correct order.
>>
>> Converting it to elog(ERROR, ...) might be a good idea.
> 
> I think it'd be pointless noise.  If we really want to protect against
> that, I think we should promote the Assert for relpartbound's isnull
> flag into an if test.

So that we can elog(ERROR, ...) if isnull is true?  If so, I agree,
because that's what should've been done here anyway (for a value read from
disk, that is).  I think we should check relispartition then, too.

Thanks,
Amit




Re: MERGE SQL statement for PG12

2018-09-04 Thread Jaime Casanova
On Tue, 4 Sep 2018 at 00:01, Pavan Deolasee  wrote:
>
> I've rebased the patch against the current master. The patch hasn't changed 
> much since the last time.
>

Hi Pavan,

I had this crash when running sqlsmith against the previous version of
this patch and just confirmed it still crashes with this version (which
makes sense, because you said the patch hasn't changed much).

To reproduce, run this query against the regression database:

"""
MERGE INTO public.pagg_tab_ml_p3 as target_0
USING public.prt2_l_p3_p2 as ref_0 ON target_0.a = ref_0.a
WHEN MATCHED AND cast(null as tid) <= cast(null as tid) THEN DELETE;
"""

attached is a backtrace

-- 
Jaime Casanova  www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x7f0b496c442a in __GI_abort () at abort.c:89
#2  0x55e572b2ede4 in ExceptionalCondition (conditionName=0x55e572d09070 "!(list != ((List *) ((void *)0)))", errorType=0x55e572d08df7 "FailedAssertion", fileName=0x55e572d08df0 "list.c", lineNumber=390) at assert.c:54
#3  0x55e572840746 in list_nth_cell (list=0x0, n=1) at list.c:390
#4  0x55e57284083b in list_nth (list=0x0, n=1) at list.c:413
#5  0x55e5727d48f1 in adjust_partition_tlist (tlist=0x0, map=0x55e5738912f8) at execPartition.c:1448
#6  0x55e5727d385e in ExecInitPartitionInfo (mtstate=0x55e573857808, resultRelInfo=0x55e573857268, proute=0x55e5738768c8, estate=0x55e573857018, partidx=0) at execPartition.c:765
#7  0x55e5727cefe3 in ExecMergeMatched (mtstate=0x55e573857808, estate=0x55e573857018, slot=0x55e573870ba8, junkfilter=0x55e573877d20, tupleid=0x7ffebd80430a) at execMerge.c:205
#8  0x55e5727cee5e in ExecMerge (mtstate=0x55e573857808, estate=0x55e573857018, slot=0x55e573870ba8, junkfilter=0x55e573877d20, resultRelInfo=0x55e573857268) at execMerge.c:127
#9  0x55e572802f57 in ExecModifyTable (pstate=0x55e573857808) at nodeModifyTable.c:2229
#10 0x55e5727d5d56 in ExecProcNodeFirst (node=0x55e573857808) at execProcnode.c:445
#11 0x55e5727c93b7 in ExecProcNode (node=0x55e573857808) at ../../../src/include/executor/executor.h:237
#12 0x55e5727cbdd2 in ExecutePlan (estate=0x55e573857018, planstate=0x55e573857808, use_parallel_mode=false, operation=CMD_MERGE, sendTuples=false, numberTuples=0, direction=ForwardScanDirection, dest=0x55e573860720, execute_once=true) at execMain.c:1725
#13 0x55e5727c99bf in standard_ExecutorRun (queryDesc=0x55e5737a6258, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:363
#14 0x55e5727c97d5 in ExecutorRun (queryDesc=0x55e5737a6258, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:306
#15 0x55e5729c1239 in ProcessQuery (plan=0x55e5738605a8, sourceText=0x55e57377c4c8 "MERGE INTO public.pagg_tab_ml_p3 as target_0 \nUSING public.prt2_l_p3_p2 as ref_0 ON target_0.a = ref_0.a \nWHEN MATCHED AND cast(null as tid) <= cast(null as tid) THEN DELETE;", params=0x0, queryEnv=0x0, dest=0x55e573860720, completionTag=0x7ffebd804780 "") at pquery.c:161
#16 0x55e5729c2c0b in PortalRunMulti (portal=0x55e5737e5b78, isTopLevel=true, setHoldSnapshot=false, dest=0x55e573860720, altdest=0x55e573860720, completionTag=0x7ffebd804780 "") at pquery.c:1291
#17 0x55e5729c21c8 in PortalRun (portal=0x55e5737e5b78, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x55e573860720, altdest=0x55e573860720, completionTag=0x7ffebd804780 "") at pquery.c:804
#18 0x55e5729bbd20 in exec_simple_query (query_string=0x55e57377c4c8 "MERGE INTO public.pagg_tab_ml_p3 as target_0 \nUSING public.prt2_l_p3_p2 as ref_0 ON target_0.a = ref_0.a \nWHEN MATCHED AND cast(null as tid) <= cast(null as tid) THEN DELETE;") at postgres.c:1122
#19 0x55e5729c02ed in PostgresMain (argc=1, argv=0x55e5737ae238, dbname=0x55e5737ae098 "regression", username=0x55e5737ae070 "jcasanov") at postgres.c:4150
#20 0x55e572918861 in BackendRun (port=0x55e57379e070) at postmaster.c:4364
#21 0x55e572917f5a in BackendStartup (port=0x55e57379e070) at postmaster.c:4036
#22 0x55e572914299 in ServerLoop () at postmaster.c:1709
#23 0x55e572913ac7 in PostmasterMain (argc=3, argv=0x55e573776210) at postmaster.c:1382
#24 0x55e572837605 in main (argc=3, argv=0x55e573776210) at main.c:228


Re: Make executor's Range Table an array instead of a List

2018-09-04 Thread David Rowley
On 4 September 2018 at 19:31, Amit Langote
 wrote:
> On 2018/08/24 7:22, David Rowley wrote:
>> On 24 August 2018 at 02:26, Amit Langote  wrote:
>>> One of the patches I sent last week does the same thing, among a
>>> couple of other things with regard to handling relations in the
>>> executor.  On a cursory look at the patch, your take of it looks
>>> better than mine.  Will test tomorrow.  Here is a link to my email:
>>>
>>> https://www.postgresql.org/message-id/468c85d9-540e-66a2-1dde-fec2b741e...@lab.ntt.co.jp
>>>
>>> 4th of my patches implements the array'fication of executor's range table.
>>
>> Sorry, didn't realise. I'll withdraw this and review yours during the
>> upcoming 'fest.
>
> Since your patch implemented the idea in a better manner, I'm thinking I
> should merge it with mine.  Is that okay with you?

Feel free to do that. I've not yet looked at your patch. I was
planning on looking at your partition planning performance
improvements patches first.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Enable using IS NOT DISTINCT FROM in hash and merge joins

2018-09-04 Thread Chi Gao
Hello,

We are using PostgreSQL to execute some SQL scripts "auto translated" from HIVE
QL, where the join operator "<=>" is heavily used. The semantically equivalent
operator in PostgreSQL is "IS NOT DISTINCT FROM".

However, we found when "IS NOT DISTINCT FROM" is used in joins, only nested 
loop plan can be generated, which is confirmed here 
https://www.postgresql.org/message-id/13950.1511879733%40sss.pgh.pa.us and here 
https://postgrespro.com/list/thread-id/2059856 .

In another discussion someone suggested using coalesce(...) to replace NULLs
with some special value, but as in that thread, we have no reliable way to
choose a special value that works for arbitrary expressions.

So I hacked the PG10 code to support using "IS NOT DISTINCT FROM" in hash and
merge joins (not touching the indexes). It works in our environment, but I want
to know whether my approach makes sense, or is going to cause damage.

There are 6 kinds of changes, and to be honest, I am not confident that any of
them is done in the correct way, so please advise:
- I do this by first reversing the meaning of DistinctExpr, from "IS 
DISTINCT FROM" to "IS NOT DISTINCT FROM", which will be simpler to process in 
joins, because "IS NOT DISTINCT FROM" is closer to "=". 
(backend/parser/parse_expr.c, backend/utils/adt/ruleutils.c)
- The execution logic of DistinctExpr internally inverts the result; because
the above change cancels that out, I invert it back.
(backend/executor/execExprInterp.c, backend/optimizer/path/clausesel.c)
- In hash joins, I need to tell the executor that "NULL matches NULL" when the
operator is "IS NOT DISTINCT FROM". I cannot figure out the best way to pass
such information down, so I just ORed 0x800 into the operator Oid
list. As no code in other parts does this, please advise a better approach:
should I add a Bitmapset to pass the flags, or should I define a new Node type
to include both the Oid and a bool flag?  (backend/executor/nodeHashjoin.c,
backend/executor/nodeHash.c)
- To support merge join, I added a nulleqnull bool flag in SortSupportData to
bypass the "stop merging early when a NULL is reached" logic when the join
operator is DistinctExpr. I think there is a padding gap after "bool
abbreviate;", so I added the new flag there, to keep binary
compatibility in case something depends on it.
(backend/executor/nodeMergejoin.c, include/utils/sortsupport.h)
- In the create_join_clause, reconsider_outer_join_clause, and
reconsider_full_join_clause functions, the derived expression generated by the
call to build_implied_join_equality outputs OpExpr for DistinctExpr; because
they are the same in definition, I just patch the resulting node back to
DistinctExpr if the input is DistinctExpr. (backend/optimizer/path/equivclass.c)
- All other changes are for code paths that only allow OpExpr; I added
logic to allow DistinctExpr too.

The patch in attachment is based on commit 
821200405cc3f25fda28c5f58d17d640e25559b8.


Thanks!


Gao, Chi
Beijing Microfun Co. Ltd.



enable_is_not_distinct_from_in_hash_and_merge_joins.patch
Description: enable_is_not_distinct_from_in_hash_and_merge_joins.patch


Re: A strange GiST error message or fillfactor of GiST build

2018-09-04 Thread Kyotaro HORIGUCHI
At Mon, 3 Sep 2018 23:05:23 +0300, Alexander Korotkov 
 wrote in 

> On Mon, Sep 3, 2018 at 9:09 PM Andrey Borodin  wrote:
> > > 3 сент. 2018 г., в 20:16, Alexander Korotkov  
> > > написал(а):
> > > That's a good idea.  Especially if we take into account that
> > > fillfactor is not a forever bad idea for GIST, it just doesn't look
> > > reasonable for current building algorithm.  So, we can decide to
> > > ignore it, but if we would switch to different GiST building
> > > algorithm, which can pack pages tightly, we can take fillfactor into
> > > account again.
> > BTW, I think that build algorithm may be provided by opclass.
> 
> I don't think that providing the whole building algorithm by opclass
> is a good idea.  But we could rather make opclass provide some
> essential routines for build algorithm.  For instance, we may
> implement sorting-based build algorithm for GiST (like one we have for
> B-tree).  It wouldn't work for all the opclass'es, but definitely
> should work for some of them.  Geometrical opclass'es may sort values
> by some kind of space-filling curve.  In this case, opclass can define
> a comparison function.
> 
> > > I think I need to prove my position about GiST fillfactor with some
> > > experiments first, and then propose a patch.
> > FWIW fillfactor can be a cheap emulator for intrapage indexing, when you 
> > have like a lot of RAM. Though I didn't tested that.
> > Also I believe proper intrapage indexing is better :)
> 
> It might be somewhat useful side effect for developers.  But it's
> definitely doesn't look like a feature we should encourage users to
> use.

I agree that fillfactor should be ignored in certain cases
(inserting the first tuple into an empty page or something like
that). Even though GiST doesn't need fillfactor at all, it is
defined independently of the AM implementations, and it provides
some (undocumented) convenience for testing, even on GiST.

So, does the following makes sense?

- Preserve fillfactor on GiST but set its default to 100%.

- Ignore fillfactor iff the first tuple for the new page fails
  with it but succeeds without it.

- Modify GiST internal routines to bring around GISTInsertState
  so that they can see fillfactor or any other parameters without
  further modification.

  
https://www.postgresql.org/message-id/20180830.144209.208080135.horiguchi.kyot...@lab.ntt.co.jp

# A storm (literally) is coming behind...

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

  




Re: [HACKERS] git down

2018-09-04 Thread Sandeep Thakkar
Hi,

I'm seeing this warning (git.postgresql.org[0: 2620:122:b000:7::243]:
errno=Network is unreachable) on one of the buildfarm animals (anole) for
the last few weeks. The git version is 1.7.4.2. When I use https, it fails
with "fatal: cannot exec 'git-remote-https': Permission denied".

On Mon, Oct 30, 2017 at 2:32 PM, Magnus Hagander 
wrote:

>
>
> On Mon, Oct 30, 2017 at 2:35 AM, Andreas Karlsson 
> wrote:
>
>> On 10/27/2017 10:51 PM, Erik Rijkers wrote:
>>
>>> git.postgresql.org is down/unreachable
>>>
>>> ( git://git.postgresql.org/git/postgresql.git )
>>>
>>
>> Yes, I noticed this too, but https://git.postgresql.org/git/postgresql.git
>> still works fine.
>>
>> I guess it makes sense to remove unencrypted access, but in that case
>> https://git.postgresql.org/gitweb/?p=postgresql.git;a=summary should not
>> advertise supporting the git protocol. I have not seen any announcement
>> either, but that could just be me not paying enough attention.
>
>
> We definitely still support the unencrypted git protocol. I do recommend
> using https, but that doesn't mean git shouldn't work. There seems to have
> been some network issues and the git daemon doesn't do a very good job of
> handling hung sessions. I've cleaned up for now and it seems to be working
> again, and we'll do some more digging into what actually was the root
> cause.
>
>
> --
>  Magnus Hagander
>  Me: https://www.hagander.net/ 
>  Work: https://www.redpill-linpro.com/ 
>



-- 
Sandeep Thakkar


Re: PostgreSQL logical decoder output plugin - unchanged toast data

2018-09-04 Thread Georgy Buranov
Hi Andres. Thank you very much for your help.

I tried the following solution and I have some problems.

* I have 9.6 postgres and I do not have separate field for rd_pkindex
* All I have is rd_replidindex field. Usually (when REPLICA IDENTITY
is NOT FULL), it still contains the primary key
* But in the case of REPLICA IDENTITY FULL - rd_replidindex is NULL
and rd_pkindex does not exist

So, is there a way to get the primary key on 9.6 postgres?

> If you want the pkey, that's in rd_pkindex. rd_replidindex will only
> differ if the identity is manually set to another candidate key
> (possibly because there's no pkey).
>



Re: A strange GiST error message or fillfactor of GiST build

2018-09-04 Thread Alexander Korotkov
On Tue, Sep 4, 2018 at 12:16 PM Kyotaro HORIGUCHI
 wrote:
> I agree that fillfactor should be ignored in certain cases
> (inserting the first tuple into empty pages or something like
> that). Even though GiST doesn't need fillfactor at all, it is
> defined independently from AM instances

What exactly do you mean?  Yes, fillfactor is defined in the StdRdOptions
struct, which is shared across many access methods.  But some of them
use fillfactor, while others don't.  For instance, GIN and BRIN don't
support fillfactor at all.

> and it gives some
> (undocumented) convenient on testing even on GiST.

Do you have some particular use cases in mind?  If so, could you
please share them?  It would let me and others understand what
behavior we need to preserve and why.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: executor relation handling

2018-09-04 Thread Amit Langote
On 2018/08/16 17:22, Amit Langote wrote:
> 0004-Revise-executor-range-table-relation-opening-closing.patch
> 
> This adds two arrays to EState indexed by RT indexes, one for
> RangeTblEntry's and another for Relation pointers.  The former reduces the
> cost of fetching RangeTblEntry by RT index.  The latter is used by a newly
> introduced function ExecRangeTableRelation(), which replaces heap_open for
> relations that are contained in the range table.  If a given RTE's
> relation is opened multiple times, only the first call of
> ExecRangeTableRelation will go to the relcache.

David Rowley recently, and independently [1], proposed one of the ideas
mentioned above (store RangeTblEntry's in an array in EState).  As I
mentioned in my reply to his email, I think his implementation of the idea
is better than mine, so I've merged his patch into the above patch, except
for one part: instead of removing the es_range_table list altogether, I
decided to keep it around so that ExecSerializePlan doesn't have to build
one all over again from the array like his patch did.

Updated patches attached; 0001-0003 are same as v1.

Thanks,
Amit

[1] Make executor's Range Table an array instead of a List
https://postgr.es/m/CAKJS1f9EypD_=xG6ACFdF=1cbjz+z9hihhsd-rqljor+qya...@mail.gmail.com
From e46911bd7bb621b66c8d693132d4aeb3202d73df Mon Sep 17 00:00:00 2001
From: amit 
Date: Thu, 19 Jul 2018 16:14:51 +0900
Subject: [PATCH v2 1/4] Don't lock range table relations in the executor

As noted in https://postgr.es/m/4114.1531674142%40sss.pgh.pa.us,
taking relation locks inside the executor proper is redundant, because
appropriate locks should already have been taken by the upstream code,
that is, during the parse/rewrite/plan pipeline when adding the relations
to the range table, or by AcquireExecutorLocks when using a cached plan.

Also, InitPlan would perform certain initialization steps, such as
creating result rels, ExecRowMarks, etc., only because of concerns
about locking order, which it no longer needs to do now that executor
locking has been eliminated altogether.  Instead, create result rels,
ExecRowMarks, etc. in the ExecInit* routines of the respective nodes.

To indicate which lock was taken on a relation before creating a
RTE_RELATION range table entry for it, add a 'lockmode' field to
RangeTblEntry.  It's set when the relation is opened for the first
time in the parse/rewrite/plan pipeline and its range table entry
created.
---
 src/backend/commands/view.c|   2 +
 src/backend/executor/execMain.c| 269 -
 src/backend/executor/execPartition.c   |   4 +-
 src/backend/executor/execUtils.c   |  24 +--
 src/backend/executor/nodeLockRows.c|   2 +-
 src/backend/executor/nodeModifyTable.c |  39 -
 src/backend/nodes/copyfuncs.c  |   1 +
 src/backend/nodes/equalfuncs.c |   1 +
 src/backend/nodes/outfuncs.c   |   1 +
 src/backend/nodes/readfuncs.c  |   1 +
 src/backend/parser/analyze.c   |   1 +
 src/backend/parser/parse_clause.c  |   1 +
 src/backend/parser/parse_relation.c|   1 +
 src/backend/parser/parse_utilcmd.c |   4 +
 src/backend/rewrite/rewriteHandler.c   |  18 +--
 src/backend/storage/lmgr/lmgr.c|   7 +-
 src/backend/utils/cache/plancache.c|  26 +---
 src/include/executor/executor.h|   2 +-
 src/include/nodes/parsenodes.h |   1 +
 src/include/storage/lmgr.h |   2 +-
 20 files changed, 166 insertions(+), 241 deletions(-)

diff --git a/src/backend/commands/view.c b/src/backend/commands/view.c
index ffb71c0ea7..818557d796 100644
--- a/src/backend/commands/view.c
+++ b/src/backend/commands/view.c
@@ -388,9 +388,11 @@ UpdateRangeTableOfViewParse(Oid viewOid, Query *viewParse)
 	rt_entry1 = addRangeTableEntryForRelation(pstate, viewRel,
 											  makeAlias("old", NIL),
 											  false, false);
+	rt_entry1->lockmode = AccessShareLock;
 	rt_entry2 = addRangeTableEntryForRelation(pstate, viewRel,
 											  makeAlias("new", NIL),
 											  false, false);
+	rt_entry2->lockmode = AccessShareLock;
 	/* Must override addRangeTableEntry's default access-check flags */
 	rt_entry1->requiredPerms = 0;
 	rt_entry2->requiredPerms = 0;
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index c583e020a0..298bf43fce 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -50,10 +50,12 @@
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "optimizer/clauses.h"
+#include "optimizer/prep.h"
 #include "parser/parsetree.h"
 #include "rewrite/rewriteManip.h"
 #include "storage/bufmgr.h"
 #include