Re: TupleTableSlot abstraction

2018-08-05 Thread Andres Freund
Hi,

On 2018-08-06 10:08:24 +0530, Ashutosh Bapat wrote:
> I think I explained why getattr() needs to be a separate callback.
> There's a reason why slot_getattr() has more code than just calling
> slot_getsomeattrs() and returning the required one - the cases when the
> requested attribute is NULL or is missing from a tuple.  Depending upon
> the tuple layout, access to a simple attribute can be optimized without
> spending cycles to extract the attributes prior to that one.  A columnar
> store (I haven't seen Haribabu's patch), for example, stores columns
> from multiple rows together and thus doesn't have a compact version of a
> tuple. In such cases extracting an individual attribute directly is far
> cheaper than extracting all the previous attributes.

OTOH, it means we continually access the null bitmap instead of just
tts_isnull[i].
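
For context, a minimal sketch of the deform-once pattern, using the
existing in-core functions (this is not code from the patchset): once
slot_getsomeattrs() has filled the slot's arrays, both the null check and
the value fetch are plain array lookups, with no trip to the on-disk null
bitmap.

static Datum
fetch_attr_deformed(TupleTableSlot *slot, int attnum, bool *isnull)
{
    /* Deform attributes 1..attnum into tts_values/tts_isnull
     * (a no-op if they have already been extracted). */
    slot_getsomeattrs(slot, attnum);

    *isnull = slot->tts_isnull[attnum - 1];
    return slot->tts_values[attnum - 1];
}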

Your logic about not deforming columns in this case would hold for *any*
deforming of previous columns as well. That's an optimization that we
probably want to implement at some point (e.g. by building a bitmap of
needed columns in the planner), but I don't think we should do it
together with this already large patchset.


> Why should we force the storage API to extract all the attributes in
> such a case?

Because there's no need yet, and it complicates the API without
corresponding benefit.

Greetings,

Andres Freund



Re: TupleTableSlot abstraction

2018-08-05 Thread Ashutosh Bapat
On Sun, Aug 5, 2018 at 3:49 PM, Andres Freund  wrote:
> Hi,
>
> Working on rebasing the pluggable storage patch on the current version
> of this.

Thanks. Please let me know if you see any issues.

>
> On 2018-07-26 17:09:08 +0530, Ashutosh Bapat wrote:
>> Done. I also noticed that slot_getattr() optimizes the cases when the
>> requested attribute is NULL or is missing from a tuple. Given that a
>> custom TupleTableSlot type can have its own optimizations for the
>> same, I added a new callback getattr() to obtain the value of a given
>> attribute from the slot. The callback is called from slot_getattr().
>
> I'm quite against this. This is just proliferation of callbacks without
> much use.  Why do you think this is helpful?  I think it's much better
> to go back to a single callback to deform here.
>

I think I explained why getattr() needs to be a separate callback.
There's a reason why slot_getattr() has more code than just calling
slot_getsomeattrs() and returning the required one - the cases when the
requested attribute is NULL or is missing from a tuple.  Depending upon
the tuple layout, access to a simple attribute can be optimized without
spending cycles to extract the attributes prior to that one.  A columnar
store (I haven't seen Haribabu's patch), for example, stores columns
from multiple rows together and thus doesn't have a compact version of a
tuple. In such cases extracting an individual attribute directly is far
cheaper than extracting all the previous attributes.
Why should we force the storage API to extract all the attributes in
such a case?
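
For illustration, a per-attribute callback for a columnar slot might look
roughly like the sketch below; every name in it is invented here, it is
not from Haribabu's patch.

/* Hypothetical columnar slot: one values/nulls array per column. */
typedef struct ColumnarSlot
{
    TupleTableSlot base;
    int         row;            /* current row within the chunks */
    Datum     **col_values;     /* per-column value arrays */
    bool      **col_nulls;      /* per-column null flags */
} ColumnarSlot;

static Datum
columnar_getattr(TupleTableSlot *slot, int attnum, bool *isnull)
{
    ColumnarSlot *cslot = (ColumnarSlot *) slot;

    /* Jump straight to the requested column; attributes 1..attnum-1
     * are never touched. */
    *isnull = cslot->col_nulls[attnum - 1][cslot->row];
    return cslot->col_values[attnum - 1][cslot->row];
}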

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Allow postgres_fdw passwordless non-superuser conns with prior superuser permission

2018-08-05 Thread Craig Ringer
Hi all

Currently postgres_fdw cannot be used with 'cert' authentication, i.e.
client-certificate validation and cert CN => postgres username mapping. You
also can't use things like Kerberos, SSPI, etc. with a superuser-created FDW
and username map.

To permit this, I'd like to allow postgres_fdw user mappings to be created
with a new 'permit_passwordless' option. Only the superuser is allowed to
create such a mapping. If it's set to true, we bypass the connection-string
password check in check_conn_params(...) and the check in
connect_pg_server(...) that the connection actually used a password, when a
non-superuser establishes a connection.

This doesn't re-open CVE-2007-6601 because the superuser has to explicitly
grant the access.

To make SSL client certs work properly with FDWs, I'd also like to add a
new libpq parameter, 'sslpassword', which corresponds to the PgJDBC
parameter of the same name. This lets the superuser create user mappings
for SSL client cert auth that use a user-specific 'sslcert', 'sslkey', and
'sslpassword'. Users can't use each other's keys because they don't know
the key password, and they can't create passwordless user mappings anyway.
Without 'sslpassword', a non-superuser could make a connection to an
'md5 clientcert=1' server using another user's client cert. It's a trivial
change.
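
For illustration, connecting with a passphrase-protected client key would
look like this from libpq's side; 'sslpassword' is the proposed parameter,
and the paths and names are made up:

#include <stdio.h>
#include "libpq-fe.h"

int
main(void)
{
    PGconn *conn = PQconnectdb(
        "host=remote.example.com dbname=app user=alice "
        "sslcert=/home/alice/.postgresql/alice.crt "
        "sslkey=/home/alice/.postgresql/alice.key "
        "sslpassword=alices-key-passphrase");  /* proposed option */

    if (PQstatus(conn) != CONNECTION_OK)
        fprintf(stderr, "%s", PQerrorMessage(conn));
    PQfinish(conn);
    return 0;
}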

Patches to follow shortly.



Detailed explanation.

postgres_fdw assumes that connections that do not specify a password in the
connstr and connections that lack a password on the user mapping are
insecure and rejects them for non-superusers with:

ERROR: password is required
DETAIL: Non-superuser cannot connect if the server does not request a
password.
HINT: Target server's authentication method must be changed.

or

ERROR: password is required
DETAIL: Non-superusers must provide a password in the user mapping.

See check_conn_params and connect_pg_server
in contrib/postgres_fdw/connection.c .
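
The connect_pg_server() side of that check looks roughly like this
(paraphrased, not an exact quote of connection.c):

/* Once connected on behalf of a non-superuser, require that the
 * server actually asked for a password. */
if (!superuser() && !PQconnectionUsedPassword(conn))
    ereport(ERROR,
            (errcode(ERRCODE_S_R_E_PROHIBITED_SQL_STATEMENT_ATTEMPTED),
             errmsg("password is required"),
             errdetail("Non-superuser cannot connect if the server does not request a password."),
             errhint("Target server's authentication method must be changed.")));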

This is because of CVE-2007-6601 for dblink.

The assumption is that a connection without a password has nothing to prove
that the local postgres user is really authorized to access the remote end
as that remote postgres user. It's trying to stop use of 'peer' or 'ident',
which we shouldn't permit because we'd be allowing the non-superuser to
potentially authenticate as the (usually) 'postgres' system user and gain
access to a 'postgres' superuser db account. Or the connection might be
using a service file or .pgpass in the 'postgres' user's home directory, in
which case again we don't want a non-superuser to be able to use it.

For 'cert' authentication you don't want to assume that the non-superuser
should be allowed to use the client certificate's private key file from the
file system. We don't provide a way to supply the SSL key as a literal in
the postgres_fdw user mapping. Nor is there a way to store the SSL key
encrypted and put the key's passphrase in the user mapping, so that only
the authorized user can access it.

The problem is that you can't then use 'cert' auth for postgres_fdw at all.

Nor can you authorize a non-superuser to use passwordless authentication if
you know your local server is configured securely (no 'trust', 'peer' or
'ident') and you *want* to authorize the user to use a service file,
pgpass, etc.

We should allow this if the user mapping creator is the superuser and they
explicitly mark the mapping as permit_passwordless. That won't let normal
users escalate.

If the superuser is the one creating the user mapping between local and
remote user IDs, then they're the ones delegating the access. They can
already GRANT mysuperuser TO public if they're feeling stupid; similarly,
if they CREATE USER MAPPING FOR public SERVER localhost OPTIONS
(permit_passwordless 'true', ...), then it's on them if they have 'peer'
or 'ident' enabled on the server.
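
A sketch of how the mapping option could be consulted; the helper name and
shape are invented here, not taken from any actual patch:

/* Did the superuser mark this mapping as allowed to go passwordless? */
static bool
UserMappingPermitsPasswordless(UserMapping *user)
{
    ListCell   *lc;

    foreach(lc, user->options)
    {
        DefElem    *def = (DefElem *) lfirst(lc);

        if (strcmp(def->defname, "permit_passwordless") == 0)
            return defGetBoolean(def);
    }
    return false;               /* default: keep today's behavior */
}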

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: REINDEX and shared catalogs

2018-08-05 Thread Bossart, Nathan
On 8/5/18, 4:12 PM, "Michael Paquier"  wrote:
> Attached is a set of patches I proposed on the original thread, which
> skips shared catalogs if the user running REINDEX is not an owner of
> it.  This is a behavior change, and as I have a hard time believing that
> anybody can take advantage of the current situation, I would like also
> to see this stuff back-patched, as anybody doing shared hosting of
> Postgres is most likely fixing the hole one way or another.  However, I
> am sure as well that many folks here would not like that.
>
> This thread is here to gather opinions and to help reaching a consensus,
> as I would like to do something at least on HEAD for future releases.

+1 for fixing this on master.  Upon reading the versioning policy,
which states that "minor releases fix only frequently-encountered,
security, and data corruption bugs..." [0], I gather that back-
patching such permissions changes might not be possible unless it is
treated as a security issue.  While I would love to see this fixed for
older versions, I don't anticipate much support for back-patching.
Kyotaro Horiguchi and Robert Haas have already voted against it in the
previous thread, anyway.

Nathan

[0] https://www.postgresql.org/support/versioning/



Re: [report] memory leaks in COPY FROM on partitioned table

2018-08-05 Thread Kohei KaiGai
2018-08-06 1:50 GMT+09:00 Alvaro Herrera :
>> Now, it consumed about 60MB rss at the beginning of COPY FROM, grew
>> very slowly during the COPY FROM execution, and reached about 250MB
>> before completion.
>> We may have other memory blocks which are not released during
>> execution; however, I could not identify whether it is really a memory
>> leak or not, and whose job it is to release them.
>
> Most likely, this is a different memory leak.
>
> I suggest that one way to track this down is to first figure out *which*
> context is the one growing, which you can see by running
> MemoryContextStats a few times and noting any meaningful differences.
> Then we can try to narrow down what is allocating stuff in that context.
>
Yes, but the hardest part is identifying which memory context is growing
over time. Once we identify it, MemoryContextStats() tells us the name of
the problematic context and the details.

Of course, my observation above is based only on the rss usage of the
postgresql process. Page faults can also increase physical page
allocation on virtual address space that was legitimately allocated.

>> It may be an idea to put debug code that raises a notice when a
>> MemoryContext allocates more than some threshold.
>
> I don't think this is really practical, because whatever threshold we
> set, there would be some corner-case scenario where the threshold is
> legitimately crossed.  And some memory leak scenarios wouldn't cross
> any threshold at all.
>
I assume this threshold would be configurable as a GUC, and disabled by
default. Once a user finds a suspicious increase in memory usage, they
can set a threshold value. In the above case, we might see something
with a threshold around 120MB.

Thanks,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 



Re: [report] memory leaks in COPY FROM on partitioned table

2018-08-05 Thread Alvaro Herrera
On 2018-Aug-04, Kohei KaiGai wrote:

> I could load the same data (544GB csv, 789GB heap) using COPY FROM
> successfully.
> When I reported the problem, the rss usage of the postgresql process
> increased at a ratio of about 10MB/s, and the OOM killer eliminated it
> after a few hours.

OK, I think we can consider this particular bug closed, then.

> Now, it consumed about 60MB rss at the beginning of COPY FROM, grew
> very slowly during the COPY FROM execution, and reached about 250MB
> before completion.
> We may have other memory blocks which are not released during
> execution; however, I could not identify whether it is really a memory
> leak or not, and whose job it is to release them.

Most likely, this is a different memory leak.

I suggest that one way to track this down is to first figure out *which*
context is the one growing, which you can see by running
MemoryContextStats a few times and noting any meaningful differences.
Then we can try to narrow down what is allocating stuff in that context.
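
For example, one crude way to sample the stats is a temporary debugging
hack in CopyFrom(), sketched below (the function can equally be called
from gdb against the running backend):

/* Dump memory context stats to stderr every million rows; diffing
 * successive dumps shows which context keeps growing.  "processed"
 * is CopyFrom's existing row counter. */
if (processed % 1000000 == 0)
    MemoryContextStats(TopMemoryContext);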

> It may be an idea to put debug code that raises a notice when a
> MemoryContext allocates more than some threshold.

I don't think this is really practical, because whatever threshold we
set, there would be some corner-case scenario where the threshold is
legitimately crossed.  And some memory leak scenarios wouldn't cross
any threshold at all.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Should contrib modules install .h files?

2018-08-05 Thread Andrew Gierth
> "Andrew" == Andrew Gierth  writes:

 Andrew> Here's a patch that fixes (not necessarily in the best way) the
 Andrew> PGXS builds of all the contrib/*_pl{perl,python} modules.

Oh, obviously this patch doesn't fix the Windows Install.pm yet, but
that'd be easier to do after finalizing the list of include files to
install.

-- 
Andrew (irc:RhodiumToad)



Re: Should contrib modules install .h files?

2018-08-05 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 >> This logic could perhaps be best moved into the pgxs makefile
 >> itself, either unconditionally adding -I options to CPPFLAGS, or
 >> conditionally adding them based on a WANT_EXTENSION_HEADERS flag of
 >> some sort set by the module makefile.

 Tom> I think we'd want to press forward on making that happen, so that
 Tom> hstore_plperl and friends can serve as copy-and-pasteable
 Tom> prototype code for out-of-tree transform modules. Do you have an
 Tom> idea how to fix the other problem you mentioned with the plpython
 Tom> makefiles?

Here's a patch that fixes (not necessarily in the best way) the PGXS
builds of all the contrib/*_pl{perl,python} modules.

Open questions:

 - is there a better way of doing the conditional setting of
   PG_CPPFLAGS?

 - the choice of which .h files to install from plperl and plpython is
   not principled - I just installed the ones needed for the contrib
   modules to work. Particularly for plpython this list needs to be
   reviewed - but I'm not a pythonist and that should be done by someone
   who is.

-- 
Andrew (irc:RhodiumToad)

diff --git a/contrib/hstore_plperl/Makefile b/contrib/hstore_plperl/Makefile
index f63cba2745..32ecaa43cb 100644
--- a/contrib/hstore_plperl/Makefile
+++ b/contrib/hstore_plperl/Makefile
@@ -4,7 +4,6 @@ MODULE_big = hstore_plperl
 OBJS = hstore_plperl.o $(WIN32RES)
 PGFILEDESC = "hstore_plperl - hstore transform for plperl"
 
-PG_CPPFLAGS = -I$(top_srcdir)/src/pl/plperl -I$(top_srcdir)/contrib/hstore
 
 EXTENSION = hstore_plperl hstore_plperlu
 DATA = hstore_plperl--1.0.sql hstore_plperlu--1.0.sql
@@ -13,10 +12,12 @@ REGRESS = hstore_plperl hstore_plperlu create_transform
 EXTRA_INSTALL = contrib/hstore
 
 ifdef USE_PGXS
+PG_CPPFLAGS = -I$(includedir_server)/extension
 PG_CONFIG = pg_config
 PGXS := $(shell $(PG_CONFIG) --pgxs)
 include $(PGXS)
 else
+PG_CPPFLAGS = -I$(top_srcdir)/src/pl/plperl -I$(top_srcdir)/contrib
 subdir = contrib/hstore_plperl
 top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
diff --git a/contrib/hstore_plperl/hstore_plperl.c b/contrib/hstore_plperl/hstore_plperl.c
index c09bd38d09..61b5557421 100644
--- a/contrib/hstore_plperl/hstore_plperl.c
+++ b/contrib/hstore_plperl/hstore_plperl.c
@@ -5,7 +5,7 @@
 #include "fmgr.h"
 #include "plperl.h"
 #include "plperl_helpers.h"
-#include "hstore.h"
+#include "hstore/hstore.h"
 
 PG_MODULE_MAGIC;
 
diff --git a/contrib/hstore_plpython/Makefile b/contrib/hstore_plpython/Makefile
index b81735ab91..93f3507130 100644
--- a/contrib/hstore_plpython/Makefile
+++ b/contrib/hstore_plpython/Makefile
@@ -4,7 +4,6 @@ MODULE_big = hstore_plpython$(python_majorversion)
 OBJS = hstore_plpython.o $(WIN32RES)
 PGFILEDESC = "hstore_plpython - hstore transform for plpython"
 
-PG_CPPFLAGS = -I$(top_srcdir)/src/pl/plpython $(python_includespec) -I$(top_srcdir)/contrib/hstore -DPLPYTHON_LIBNAME='"plpython$(python_majorversion)"'
 
 EXTENSION = hstore_plpythonu hstore_plpython2u hstore_plpython3u
 DATA = hstore_plpythonu--1.0.sql hstore_plpython2u--1.0.sql hstore_plpython3u--1.0.sql
@@ -13,10 +12,12 @@ REGRESS = hstore_plpython
 REGRESS_PLPYTHON3_MANGLE := $(REGRESS)
 
 ifdef USE_PGXS
+PG_CPPFLAGS = $(python_includespec) -I$(includedir_server)/extension -DPLPYTHON_LIBNAME='"plpython$(python_majorversion)"'
 PG_CONFIG = pg_config
 PGXS := $(shell $(PG_CONFIG) --pgxs)
 include $(PGXS)
 else
+PG_CPPFLAGS = -I$(top_srcdir)/src/pl/plpython $(python_includespec) -I$(top_srcdir)/contrib -DPLPYTHON_LIBNAME='"plpython$(python_majorversion)"'
 subdir = contrib/hstore_plpython
 top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
diff --git a/contrib/hstore_plpython/hstore_plpython.c b/contrib/hstore_plpython/hstore_plpython.c
index 218e6612b1..2f24090ff3 100644
--- a/contrib/hstore_plpython/hstore_plpython.c
+++ b/contrib/hstore_plpython/hstore_plpython.c
@@ -3,7 +3,7 @@
 #include "fmgr.h"
 #include "plpython.h"
 #include "plpy_typeio.h"
-#include "hstore.h"
+#include "hstore/hstore.h"
 
 PG_MODULE_MAGIC;
 
diff --git a/contrib/ltree_plpython/Makefile b/contrib/ltree_plpython/Makefile
index 7e988c7993..f9e9e8d734 100644
--- a/contrib/ltree_plpython/Makefile
+++ b/contrib/ltree_plpython/Makefile
@@ -4,8 +4,6 @@ MODULE_big = ltree_plpython$(python_majorversion)
 OBJS = ltree_plpython.o $(WIN32RES)
 PGFILEDESC = "ltree_plpython - ltree transform for plpython"
 
-PG_CPPFLAGS = -I$(top_srcdir)/src/pl/plpython $(python_includespec) -I$(top_srcdir)/contrib/ltree -DPLPYTHON_LIBNAME='"plpython$(python_majorversion)"'
-
 EXTENSION = ltree_plpythonu ltree_plpython2u ltree_plpython3u
 DATA = ltree_plpythonu--1.0.sql ltree_plpython2u--1.0.sql ltree_plpython3u--1.0.sql
 
@@ -13,10 +11,12 @@ REGRESS = ltree_plpython
 REGRESS_PLPYTHON3_MANGLE := $(REGRESS)
 
 ifdef USE_PGXS
+PG_CPPFLAGS = $(python_includespec) -I$(includedir_server)/extension -DPLPYTHON_LIBNAME='"plpython$(python_majorversion)"'
 PG_CONFIG = pg_config
 PGXS := $(shell $(PG_CONFIG) --pgxs)
 

REINDEX and shared catalogs

2018-08-05 Thread Michael Paquier
Hi all,

This is a continuation of the thread "Canceling authentication due to
timeout aka Denial of Service Attack", which is here to focus on the
case of REINDEX:
https://www.postgresql.org/message-id/20180730003422.GA2878%40paquier.xyz

As the set of patches I proposed on that thread is visibly not
attracting the proper attention, I have preferred to begin a new thread
so that this can get a proper review and agreement.

Per the original thread, it is not difficult to block the loading of
critical indexes (those used for authentication being one example) with
a couple of SQL commands: TRUNCATE, VACUUM and REINDEX, as reported.

VACUUM and TRUNCATE will have their own threads later depending on my
available time; those actually refer to the problem where a relation
lock is taken before checking if the user running the command has the
privileges to do so, so a user with no privilege on the relation would
wait on the lock but fail later.  However, REINDEX is different.

In the case of REINDEX, we *allow* shared catalogs to be reindexed.
Hence, if a user is a database owner, he would also be able to reindex
critical indexes on shared catalogs, where blocking authentication is
possible just with sessions connected to the database being reindexed.
For a schema, the situation is basically worse since 9.5, as a schema
owner can do the same with lighter permissions.  One can just run
"SELECT * FROM pg_stat_activity" in a transaction block in session 1,
run REINDEX in session 2, and cause the system to refuse new
connections.  This is documented as well.

Attached is a set of patches I proposed on the original thread, which
skips shared catalogs if the user running REINDEX is not an owner of
it.  This is a behavior change, and as I have a hard time believing that
anybody can take advantage of the current situation, I would like also
to see this stuff back-patched, as anybody doing shared hosting of
Postgres is most likely fixing the hole one way or another.  However, I
am sure as well that many folks here would not like that.

This thread is here to gather opinions and to help reaching a consensus,
as I would like to do something at least on HEAD for future releases.

reindex-priv-93.patch is for REL9_3_STABLE, reindex-priv-94.patch for
REL9_4_STABLE and reindex-priv-95-master.patch for 9.5~master.

Thanks for reading!
--
Michael
From d82e9df826b29bf9030dc97551bdd0bef894677d Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Mon, 30 Jul 2018 09:29:32 +0900
Subject: [PATCH] Restrict access to reindex of shared catalogs for
 non-privileged users

A database owner running a database-level REINDEX has the possibility to
also do the operation on shared system catalogs without being an owner
of them, which allows him to block resources he should not have access
to.  For example, PostgreSQL would go unresponsive and even block
authentication if a lock on pg_authid is waited for.  This commit makes
sure that REINDEX SYSTEM or DATABASE only processes a relation when one
of the following holds:
- The user is a superuser
- The user is the table owner
- The user is the database owner, and the relation worked on is not
shared.

Reported-by: Lloyd Albin, Jeremy Schneider
Author: Michael Paquier
Reviewed by: Nathan Bossart, Kyotaro Horiguchi
Discussion: https://postgr.es/m/152512087100.19803.12733865831237526...@wrigleys.postgresql.org
---
 doc/src/sgml/ref/reindex.sgml    |  3 ++-
 src/backend/commands/indexcmds.c | 11 +++++++++++
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index a795dfa325..00c024 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -214,7 +214,8 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } nam
Reindexing a single index or table requires being the owner of that
index or table.  Reindexing a database requires being the owner of
the database (note that the owner can therefore rebuild indexes of
-   tables owned by other users).  Of course, superusers can always
+   tables owned by other users).  Reindexing a shared catalog requires
+   being the owner of that shared catalog.  Of course, superusers can always
reindex anything.
   
 
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index fb51a79364..414c9cd66f 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -1873,6 +1873,17 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
 continue;
 		}
 
+		/*
+		 * The table can be reindexed if the user is superuser, the table
+		 * owner, or the database owner (but in the latter case, only if it's
+		 * not a shared relation).  pg_class_ownercheck includes the superuser
+		 * case, and we already know that the user has permission to run
+		 * REINDEX on this database.
+		 */
+		if (!pg_class_ownercheck(HeapTupleGetOid(tuple), GetUserId()) &&
+			classtuple->relisshared)
+			continue;
+
 		if 

Re: Should contrib modules install .h files?

2018-08-05 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 >> The module would reference its own headers using #include "foo.h",
 >> which would not find extension/module/foo.h, so no problem here.

 Tom> Check, although we also need to document that you should do it
 Tom> that way. Also, at least with gcc, the rule about "look in the
 Tom> calling file's directory first" would prevent problems (except in
 Tom> VPATH builds ... does PGXS support that? Should we recommend "-I."
 Tom> before the "-I$(includedir_server)/extension" switch?)

PGXS is supposed to support VPATH builds. I do not believe -I. is
needed in the normal case, but we should probably document that if the
module uses any -I flags of its own they should normally go first.

 Tom> Hm. Do we know of any extensions big enough that they need
 Tom> subdirectories of headers? I don't mind leaving that for later as
 Tom> long as it's not a present need somewhere. On the other hand,
 Tom> couldn't it "just work" to write "x/y.h" in the list of headers to
 Tom> install?

It doesn't "just work" because (a) all the existing makefile variables
that give files to install assume that any path given is the source path
only, not to be preserved in the copy; (b) we don't want to constrain
the source file layout in a way that would force .h files to be in a
specific place.

Compare the DATA variable in pgxs: DATA = foo/bar.sql means to install
$(srcdir)/foo/bar.sql as $(DESTDIR)$(datadir)/$(datamoduledir)/bar.sql,
_not_ as $(DESTDIR)$(datadir)/$(datamoduledir)/foo/bar.sql. Making
HEADERS behave otherwise would be inconsistent and inflexible.

For example, suppose my extension source dir looks like this:

./Makefile
./foo.control
./src
./src/foo.h
./src/foo1.c
./src/foo2.c
./scripts/foo--1.0.sql

I would have these in the makefile:

HEADERS = src/foo.h
DATA = scripts/foo--1.0.sql

and it seems clear to me that that should install foo.h as
$(includedir_server)/extension/foo/foo.h and not as foo/src/foo.h.

 >> Making an out-of-tree build for hstore_plperl etc. work [...]

 Tom> I think we'd want to press forward on making that happen, so that
 Tom> hstore_plperl and friends can serve as copy-and-pasteable
 Tom> prototype code for out-of-tree transform modules. Do you have an
 Tom> idea how to fix the other problem you mentioned with the plpython
 Tom> makefiles?

The plpython makefiles are including a makefile to munge regression
tests for python3 vs python2. The most obvious fix would be to install
this makefile as lib/pgxs/src/pl/plpython/regress-python3-mangle.mk
(I don't think any other changes would be needed).

I'll do some tests on this.

-- 
Andrew (irc:RhodiumToad)



Re: [GSoC]The project summary

2018-08-05 Thread Aleksander Alekseev
Hello Charles,

Thanks for keeping us informed. As you probably already discovered, the
email address I used previously doesn't work any longer. Please add
afis...@gmail.com to CC instead.

I will take a look tomorrow (it's pretty late in my timezone currently).

On Sun, Aug 5, 2018 at 9:05 PM, Charles Cui wrote:

> Hi mentors and hackers,
>
>    The final review is coming. Here is the project summary for the thrift
> plugin work for the Postgres database. Please let me know if there is
> anything missing for the final review.
> 1. Implement the thrift binary protocol for both simple data structures
> (e.g., int, double) and complex data structures (e.g., list, map and
> struct) in the pg_thrift plugin. The interface is byte-based, which means
> users need to pass in bytes and can use rich APIs to parse out required
> fields.
> 2. Implement the thrift compact protocol for both simple data structures
> and complex data structures. The interface is also byte-based and users
> can use rich APIs to parse out fields.
> 3. A set of APIs for both the binary protocol and the compact protocol to
> parse out fields of various types.
> 4. A customized thrift type (thrift_binary) where the user specifies JSON,
> but it is stored as bytes. This type makes the plugin more user friendly;
> currently we support simple getters on top of this type. There are some
> improvements that can be done in the future to make the type support more
> operations.
> 5. Set up CI to continuously compile each commit. Currently the plugin
> works in 9.4, 10, and 11.
> 6. A set of unit tests to cover the most important use cases
> (https://github.com/charles-cui/pg_thrift/blob/master/sql/pg_thrift.sql).
> 7. A detailed document to showcase how to use this plugin
> (https://github.com/charles-cui/pg_thrift/blob/master/README.md).
> From this document, users learn how to install pg_thrift, how to parse
> out required fields from the bytes using the provided API, how to build
> an index on the thrift bytes using the API, and how to use the customized
> thrift type.
>



-- 
Best regards,
Aleksander Alekseev


[GSoC]The project summary

2018-08-05 Thread Charles Cui
Hi mentors and hackers,

   The final review is coming. Here is the project summary for the thrift
plugin work for the Postgres database. Please let me know if there is
anything missing for the final review.
1. Implement the thrift binary protocol for both simple data structures
(e.g., int, double) and complex data structures (e.g., list, map and
struct) in the pg_thrift plugin. The interface is byte-based, which means
users need to pass in bytes and can use rich APIs to parse out required
fields.
2. Implement the thrift compact protocol for both simple data structures
and complex data structures. The interface is also byte-based and users
can use rich APIs to parse out fields.
3. A set of APIs for both the binary protocol and the compact protocol to
parse out fields of various types.
4. A customized thrift type (thrift_binary) where the user specifies JSON,
but it is stored as bytes. This type makes the plugin more user friendly;
currently we support simple getters on top of this type. There are some
improvements that can be done in the future to make the type support more
operations.
5. Set up CI to continuously compile each commit. Currently the plugin
works in 9.4, 10, and 11.
6. A set of unit tests to cover the most important use cases
(https://github.com/charles-cui/pg_thrift/blob/master/sql/pg_thrift.sql).
7. A detailed document to showcase how to use this plugin
(https://github.com/charles-cui/pg_thrift/blob/master/README.md).
From this document, users learn how to install pg_thrift, how to parse
out required fields from the bytes using the provided API, how to build
an index on the thrift bytes using the API, and how to use the customized
thrift type.


Re: pg_upgrade from 9.4 to 10.4

2018-08-05 Thread Bruce Momjian
On Sat, Aug  4, 2018 at 11:26:06PM -0400, Tom Lane wrote:
> Bruce Momjian  writes:
> > On Fri, Aug  3, 2018 at 04:56:32PM -0400, Bruce Momjian wrote:
> >> On Fri, Aug  3, 2018 at 01:55:04PM -0400, Tom Lane wrote:
> >>> Right now is probably not a good time to fix this, but it seems like
> >>> something that could be improved.  I'd be kind of inclined to remove
> >>> the pidfile checking business altogether in favor of inspecting the
> >>> state in pg_control; or at least do them both in the same place with
> >>> the same recovery attempt if we don't like what we see.
> 
> >> Yes, I realize this was inconsistent.  It was done this way purely based
> >> on how easy it would be to check each item in each place.  I am fine
> >> with removing the #3 cleanup so we are consistent.  We can also add docs
> >> to say it should be a "clean" shutdown.
> 
> > How do you want me to handle this, considering it has to be backpatched?
> 
> Well, removing the check entirely is certainly not a good idea.
> 
> I looked at the code briefly and concur that making it work "nicely" isn't
> as simple as one could wish; the code you added has some dependencies on
> the context it's in, so that moving it elsewhere would require code
> duplication or refactoring.  Maybe it's not something to try to shoehorn
> into v11.  We can always improve it later.
> 
> (I'd also say that the weekend before a release wrap is no time
> to be committing anything noncritical, so even if you're feeling
> motivated to fix it in v11, best to hold off till after the wrap.)

Yeah, I felt that what we have is probably as good as we reasonably
could get in the short term.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Should contrib modules install .h files?

2018-08-05 Thread Tom Lane
[ back to this after a detour to ON CONFLICT land ]

Andrew Gierth  writes:
> OK, after considerable experiment I think I can answer these points: the
> most "practical" way is to do this (or an equivalent), as you originally
> suggested:
> PG_CPPFLAGS = -I$(includedir_server)/extension

Yeah, that's where I thought we'd end up.

> We could add a mention of this to the PGXS docs and the header comment
> of pgxs.mk as being the recommended practice; would that be enough?
> Or should the logic go into the pgxs makefile as suggested below?

My thought is just to document how to do this.  It'll still be true that
most extensions have no dependencies on other extensions, so they won't
need any such headers; sticking extra -I switches into their builds by
default can only cause trouble.

>  Tom> Something that copes with selecting the right headers if you're
>  Tom> rebuilding a module whose headers are already installed (as you
>  Tom> mentioned upthread).

> The module would reference its own headers using #include "foo.h",
> which would not find extension/module/foo.h, so no problem here.

Check, although we also need to document that you should do it that
way.  Also, at least with gcc, the rule about "look in the calling
file's directory first" would prevent problems (except in VPATH builds
... does PGXS support that?  Should we recommend "-I." before the
"-I$(includedir_server)/extension" switch?)

> One case that doesn't "just work" would be what PostGIS currently does.
> Like other extensions it doesn't (afaik) currently try and install any
> PG-related header files, but if it was modified to do so, some changes
> in those header files would be needed because a lot of them have things
> like #include "../postgis_config.h" which would fail.

Yeah, I don't doubt that extensions will have to make minor mods to
adapt to this scheme.  As long as they're minor, I don't think it's
a problem.

> Another case that doesn't "just work" would be if some extension has a
> file foo.h that does #include "x/y.h" to include another file that's
> part of the same extension, expecting to get extension/foo/x/y.h. Right
> now, the install rule I added to the pgxs makefile puts all header files
> for a module in the same dir; if we wanted to support making a whole
> subdirectory tree under extension/modulename, then that'd require more
> work in the makefiles.

Hm.  Do we know of any extensions big enough that they need subdirectories
of headers?  I don't mind leaving that for later as long as it's not a
present need somewhere.  On the other hand, couldn't it "just work" to
write "x/y.h" in the list of headers to install?

> Making an out-of-tree build for hstore_plperl etc. work under this
> scheme would require changes. The in-tree build has this:
> PG_CPPFLAGS = -I$(top_srcdir)/src/pl/plperl -I$(top_srcdir)/contrib/hstore
> #include "hstore.h"
> This would have to be:
> PG_CPPFLAGS = -I$(top_srcdir)/src/pl/plperl -I$(top_srcdir)/contrib
> #include "hstore/hstore.h"
> for the in-tree build, and
> PG_CPPFLAGS = -I$(includedir_server)/extension
> #include "hstore/hstore.h"
> for the out-of-tree build.

> This logic could perhaps be best moved into the pgxs makefile itself,
> either unconditionally adding -I options to CPPFLAGS, or conditionally
> adding them based on a WANT_EXTENSION_HEADERS flag of some sort set by
> the module makefile.

I think we'd want to press forward on making that happen, so that
hstore_plperl and friends can serve as copy-and-pasteable prototype
code for out-of-tree transform modules.  Do you have an idea how to
fix the other problem you mentioned with the plpython makefiles?

regards, tom lane



Re: GiST VACUUM

2018-08-05 Thread Andrey Borodin
Hi!

> On 5 Aug 2018, at 16:18, Heikki Linnakangas wrote:
> 
> On 31/07/18 23:06, Andrey Borodin wrote:
>>> On a typical GiST index, what's the ratio of leaf vs. internal
>>> pages? Perhaps an array would indeed be better.
>>
>> Typical GiST has around 200 tuples per internal page. I've switched
>> to List since it's more efficient than bitmap.
>
> Hmm. A ListCell is 16 bytes, plus the AllocChunk header, 16 bytes. 32
> bytes per internal page in total, while a bitmap consumes one bit per
> page, leaf or internal. If I'm doing my math right, assuming the ratio
> of internal pages to leaf pages is 1:200, a List actually consumes more
> memory than a bitmap; 256 bits per internal page, vs. 200 bits. An
> array, with 4 bytes per block number, would be the winner here.
We do not know the size of this array beforehand. I can implement a
normal ArrayList though (with a repallocing array), or a linked list of
chunks, or anything from the data-structures zoo.
Or just stick with the bitmap (my preferred way).
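
If we go the array route, a repallocing array could be as simple as this
sketch (names invented for illustration):

typedef struct BlockArray
{
    BlockNumber *blocks;
    int          nblocks;       /* entries in use */
    int          capacity;      /* entries allocated */
} BlockArray;

static void
blockarray_add(BlockArray *a, BlockNumber blkno)
{
    if (a->nblocks >= a->capacity)
    {
        /* Grow by doubling, so the number of repallocs is logarithmic
         * in the number of entries, not linear. */
        a->capacity = (a->capacity > 0) ? a->capacity * 2 : 32;
        a->blocks = (a->blocks == NULL)
            ? palloc(a->capacity * sizeof(BlockNumber))
            : repalloc(a->blocks, a->capacity * sizeof(BlockNumber));
    }
    a->blocks[a->nblocks++] = blkno;
}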
>> But I have to note that the default growth strategy of the bitmap is
>> not good: we will be repallocing byte by byte.
>
> True, that repallocing seems bad. You could force it to be pre-allocated
> by setting the last bit. Or add a function to explicitly enlarge the
> bitmap.
OK, I'll think about a proper resize function (ensure-capacity, to be precise).

Best regards, Andrey Borodin.




Re: Negotiating the SCRAM channel binding type

2018-08-05 Thread Heikki Linnakangas
On 5 August 2018 19:01:04 EEST, Tom Lane  wrote:
>Heikki Linnakangas  writes:
>> Sorry, I see now that there was indeed a test for 
>> scram_channel_binding='', which meant "no channel binding". That was 
>> confusing, I assumed '' was the default.
>
>Ugh, it isn't?  There's a general principle in libpq that setting a
>parameter to an empty string is the same as leaving it unset.  I think
>violating that pattern is a bad idea.

Yeah. In any case, the whole option is gone now, so we're good.

- Heikki



Re: Negotiating the SCRAM channel binding type

2018-08-05 Thread Tom Lane
Heikki Linnakangas  writes:
> Sorry, I see now that there was indeed a test for 
> scram_channel_binding='', which meant "no channel binding". That was 
> confusing, I assumed '' was the default.

Ugh, it isn't?  There's a general principle in libpq that setting a
parameter to an empty string is the same as leaving it unset.  I think
violating that pattern is a bad idea.

regards, tom lane



Re: Negotiating the SCRAM channel binding type

2018-08-05 Thread Heikki Linnakangas

On 05/08/18 17:11, I wrote:

The only negative test there was, was to check for bogus
scram_channel_binding option, "scram_channel_binding=not-exists". Yeah,
it would be nice to have some, but this commit didn't really change that
situation.


Sorry, I see now that there was indeed a test for 
scram_channel_binding='', which meant "no channel binding". That was 
confusing, I assumed '' was the default.



I'm hoping that we add a libpq option to actually force channel binding
soon. That'll make channel binding actually useful to users, but it will
also make it easier to write tests to check that channel binding is
actually used or not used, in the right situations.


Nevertheless, this we should do.

- Heikki



Re: Negotiating the SCRAM channel binding type

2018-08-05 Thread Heikki Linnakangas

On 05/08/18 15:45, Michael Paquier wrote:

On Sun, Aug 05, 2018 at 03:30:43PM +0300, Heikki Linnakangas wrote:

That test just tested that the scram_channel_binding libpq option works, but
I removed the option. I know you wanted to keep it as a feature flag, but as
discussed earlier, I don't think that'd be useful.


Sorry for the noise, I missed that there is still the test "Basic SCRAM
authentication with SSL", so that would be fine.  I would have preferred
keeping the negative test around so that we don't break SSL connections
when the client forces cbind_flag to 'n'; that would be useful when
adding new SSL implementations, as it would avoid manual tests which
people will most likely forget, but well...


The only negative test there was, was to check for bogus 
scram_channel_binding option, "scram_channel_binding=not-exists". Yeah, 
it would be nice to have some, but this commit didn't really change that 
situation.


I'm hoping that we add a libpq option to actually force channel binding 
soon. That'll make channel binding actually useful to users, but it will 
also make it easier to write tests to check that channel binding is 
actually used or not used, in the right situations.



You can remove $supports_tls_server_end_point in 002_scram.pl by the
way.  Should I remove it or perhaps you would prefer to do it?


Ah, good catch. I'll go and remove it, thanks!

- Heikki



Re: [PATCH v18] GSSAPI encryption support

2018-08-05 Thread Heikki Linnakangas

Sorry if this sounds facetious, but:

What is the point of this patch? What's the advantage of GSSAPI 
encryption over SSL? I was hoping to find the answer by reading the 
documentation changes, but all I can see is "how" to set it up, and 
nothing about "why".


- Heikki



Re: Negotiating the SCRAM channel binding type

2018-08-05 Thread Michael Paquier
On Sun, Aug 05, 2018 at 03:30:43PM +0300, Heikki Linnakangas wrote:
> That test just tested that the scram_channel_binding libpq option works, but
> I removed the option. I know you wanted to keep it as a feature flag, but as
> discussed earlier, I don't think that'd be useful.

Sorry for the noise, I missed that there is still the test "Basic SCRAM
authentication with SSL", so that would be fine.  I would have preferred
keeping the negative test around so that we don't break SSL connections
when the client forces cbind_flag to 'n'; that would be useful when
adding new SSL implementations, as it would avoid manual tests which
people will most likely forget, but well...

You can remove $supports_tls_server_end_point in 002_scram.pl by the
way.  Should I remove it or perhaps you would prefer to do it?
--
Michael




Re: Negotiating the SCRAM channel binding type

2018-08-05 Thread Heikki Linnakangas

On 05/08/18 15:08, Michael Paquier wrote:

On Sun, Aug 05, 2018 at 02:00:04PM +0300, Heikki Linnakangas wrote:

I did some further testing with this, compiling with and without
HAVE_BE_TLS_GET_CERTIFICATE_HASH and HAVE_PGTLS_GET_PEER_CERTIFICATE_HASH,
and fixed a few combinations that did not work. And I fixed the other
comment typos etc. that you pointed out.


Two things that I am really unhappy about: first, you completely wiped
out the test suite for channel binding.  We know that channel binding
will be used once HAVE_X509_GET_SIGNATURE_NID is set, so why didn't you
keep the check on supports_tls_server_end_point to determine whether the
connection should be a failure or a success?


That test just tested that the scram_channel_binding libpq option works, 
but I removed the option. I know you wanted to keep it as a feature 
flag, but as discussed earlier, I don't think that'd be useful.



Then, I also find the meddling around HAVE_X509_GET_SIGNATURE_NID and
the other flags over-complicated, but I won't fight hard on that point
if you want to go your way.


I don't feel too strongly about this either, so if you want to write a 
patch to refactor that, I'm all ears. Note that I had to do something, 
so that the server code knows whether to advertise SCRAM-SHA-256-PLUS or 
not, and likewise that the client knows whether to choose channel 
binding or not.


- Heikki



Re: Explain buffers wrong counter with parallel plans

2018-08-05 Thread Amit Kapila
On Fri, Aug 3, 2018 at 2:09 PM, Amit Kapila  wrote:
> On Thu, Aug 2, 2018 at 11:14 PM, Robert Haas  wrote:
>> On Thu, Aug 2, 2018 at 5:41 AM, Amit Kapila  wrote:
>>> I have created three patches: (a) move InstrStartParallelQuery from
>>> its original location so that we perform it just before ExecutorRun,
>>> (b) a patch to fix the gather stats by calling shutdown at the
>>> appropriate place and allowing stats collection in ExecShutdownNode,
>>> and (c) prohibit calling ExecShutdownNode if there is a possibility of
>>> backward scans (I have done some basic tests with this patch; if we
>>> decide to proceed with it, then some more verification and testing
>>> would be required).
>>>
>>> I think we should commit first two patches as that fixes the problem
>>> being discussed in this thread and then do some additional
>>> verification for the third patch (mentioned in option c).  I can
>>> understand if people want to commit the third patch before the second
>>> patch, so let me know what you guys think.
>>
>> I'm happy with the first two patches.
>>
>
> Thanks.  I have pushed those two patches.
>
>>  In the third one, I don't think
>> "See ExecLimit" is a good thing to put a comment like this, because
>> it's too hard to find the comment to which it refers, and because
>> future commits are likely to edit or remove that comment without
>> noticing the references to it from elsewhere.  Instead I would just
>> write, in all three places, /* If we know we won't need to back up, we
>> can release resources at this point. */ or something like that.
>>
>
> Okay, I have changed the comment as per your suggestion in the
> attached patch.  I will do some more testing/verification of this
> patch and will commit over the weekend or on Monday if everything is
> fine.
>

I have verified that the patch works whenever we use scrollable
cursors.  Please find attached the patch with the modified commit
message.  I think it is now a bit late for this minor release, and since
this doesn't appear to be a blocker issue, it is better to push it after
the release.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


prohibit_shutdown_backward_scans_v3.patch
Description: Binary data


Re: GiST VACUUM

2018-08-05 Thread Heikki Linnakangas

On 31/07/18 23:06, Andrey Borodin wrote:

On a typical GiST index, what's the ratio of leaf vs. internal
pages? Perhaps an array would indeed be better.

Typical GiST has around 200 tuples per internal page. I've switched
to List since it's more efficient than bitmap.


Hmm. A ListCell is 16 bytes, plus the AllocChunk header, 16 bytes. 32
bytes per internal page in total, while a bitmap consumes one bit per
page, leaf or internal. If I'm doing my math right, assuming the ratio
of internal pages to leaf pages is 1:200, a List actually consumes more
memory than a bitmap; 256 bits per internal page, vs. 200 bits. An
array, with 4 bytes per block number, would be the winner here.

But I have to note that the default growth strategy of the bitmap is
not good: we will be repallocing byte by byte.


True, that repallocing seems bad. You could force it to be pre-allocated
by setting the last bit. Or add a function to explicitly enlarge the
bitmap.
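
For instance, if the bitmap were a plain Bitmapset and the total page
count were known (or estimated) up front, touching the highest bit first
would do the whole allocation in one go; a sketch:

/* Pre-size the set in one palloc; bms_del_member() won't shrink it. */
internal_pages = bms_add_member(NULL, (int) npages - 1);
internal_pages = bms_del_member(internal_pages, (int) npages - 1);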

- Heikki



Re: Negotiating the SCRAM channel binding type

2018-08-05 Thread Heikki Linnakangas

Thanks for the review!

On 22/07/18 16:54, Michael Paquier wrote:

On Fri, Jul 20, 2018 at 08:05:04PM +0300, Heikki Linnakangas wrote:

This removes the scram_channel_binding libpq option altogether, since there
is now only one supported channel binding type.


This can also be used to switch off channel binding on the client-side,
which is the behavior you could get only by using a version of libpq
compiled with v10 in the context of an SSL connection.  Do we really
want to lose this switch as well?  I like feature switches.


Well, it'd be useless for users; there is no reason to switch off
channel binding if both the client and server support it. It might not
add any security you care about, but it won't do any harm either. The
non-channel-binding codepath is still exercised with non-SSL connections.



+PostgreSQL is tls-server-end-point.  If other channel
+binding types are supported in the future, the server will advertise
+them as separate SASL mechanisms.
I don't think that this is actually true, per my remark of upthread we
could also use an option-based approach with each SASL mechanism sent by
the server.  I would recommend to remove this last sentence.


Ok. That's how I'm envisioning we'd add future binding types, but since 
we're not sure, I'll leave it out.



+   man-in-the-middle attacks by mixing the signature of the server's
+   certificate into the transmitted password hash. While a fake server can
+   retransmit the real server's certificate, it doesn't have access to the
+   private key matching that certificate, and therefore cannot prove it is
+   the owner, causing SSL connection failure
Nit: insisting on the fact that tls-server-end-point is used in this
context.


Yeah, that's the assumption. Now that we only do tls-server-end-point, I 
think we can assume that without further explanation.



+void
+pg_be_scram_get_mechanisms(Port *port, StringInfo buf)
+{
+   /*
+* Advertise the mechanisms in decreasing order of importance.  So the
+* channel-binding variants go first, if they are supported. Channel
+* binding is only supported with SSL, and only if the SSL implementation
+* has a function to get the certificate's hash
[...]
+#ifdef HAVE_BE_TLS_GET_CERTIFICATE_HASH
+   if (strcmp(selected_mech, SCRAM_SHA_256_PLUS_NAME) == 0 && port->ssl_in_use)
+   state->channel_binding_in_use = true;
+   else
+#endif
Hm.  I think that this should be part of the set of APIs that each SSL
implementation has to provide.  It is not clear to me yet if using the
flavor of SSL in the Windows or macOS universes will allow end-point to work,
and this could make this proposal more complicated.  And
HAVE_BE_TLS_GET_CERTIFICATE_HASH is something OpenSSL-ish, so I would
recommend to remove all SSL-implementation-specific code from auth*.c
and fe-auth*.c, keeping those in their own file.  One simple way to
address this problem would be to make each SSL implementation return a
boolean to tell if it supports SCRAM channel binding or not, with Port*
of PGconn* in input to be able to filter connections using SSL or not.


The idea here is that if the SSL implementation implements the required 
functions, it #defines HAVE_PGTLS_GET_PEER_CERTIFICATE_HASH (in the 
client) and/or HAVE_BE_TLS_GET_CERTIFICATE_HASH (in the server). So the 
code above is not implementation-specific; it doesn't know the details 
of OpenSSL, it only refers to the compile-time flag which the SSL 
implementation-specific code defines. The flag is part of the API that 
the SSL implementation provides, it's just a compile-time flag rather 
than run-time.
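
Roughly, the server-side contract in libpq-be.h looks like this
(paraphrased; the client side mirrors it in libpq-int.h with
HAVE_PGTLS_GET_PEER_CERTIFICATE_HASH):

/* Only OpenSSL builds new enough to have X509_get_signature_nid()
 * can compute the tls-server-end-point hash, so only they define
 * the flag and the function. */
#if defined(USE_OPENSSL) && defined(HAVE_X509_GET_SIGNATURE_NID)
#define HAVE_BE_TLS_GET_CERTIFICATE_HASH
extern char *be_tls_get_certificate_hash(Port *port, size_t *len);
#endif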


I'll try to clarify the comments on this.


+if (strcmp(channel_binding_type, "tls-server-end-point") != 0)
+ereport(ERROR,
+(errcode(ERRCODE_PROTOCOL_VIOLATION),
+(errmsg("unsupported SCRAM channel-binding type
\"%s\"",
+sanitize_str(channel_binding_type);
Let's give up on sanitize_str.  I am not sure that it is a good idea to
print in error messages server-side something that the client has sent.


Well, the point of sanitize_str is to make it safe. You're right that 
it's not strictly necessary, but I think it would be helpful to print 
out the channel binding type that the client attempted to use.



And a couple of lines down the call to be_tls_get_certificate_hash in
auth-scram.c is only protected by USE_SSL...  So compilation would
likely break once a new SSL implementation is added, and libpq-be.h gets
uglier.


Fixed by changing "#ifdef USE_SSL" to "#ifdef 
HAVE_BE_TLS_GET_CERTIFICATE_HASH".


It's true that there is some risk for libpq-be.h (and libpq-int.h) to 
become ugly, if we add more SSL implementations, and if those 
implementations have complicated conditions on whether they can get the 
certificate hashes. In practice, I think it's going to be OK. All the 
SSL implementations we've talked about - GnuTLS, macOS, Windows - do 
support the functionality, so we don't need complicated 

Re: TupleTableSlot abstraction

2018-08-05 Thread Andres Freund
Hi,

Working on rebasing the pluggable storage patch on the current version
of this.

On 2018-07-26 17:09:08 +0530, Ashutosh Bapat wrote:
> Done. I also noticed that slot_getattr() optimizes the cases when the
> requested attribute is NULL or is missing from a tuple. Given that a
> custom TupleTableSlot type can have its own optimizations for the
> same, I added a new callback getattr() to obtain the value of a given
> attribute from the slot. The callback is called from slot_getattr().

I'm quite against this. This is just proliferation of callbacks without
much use.  Why do you think this is helpful?  I think it's much better
to go back to a single callback to deform here.

Greetings,

Andres Freund



Re: Pluggable Storage - Andres's take

2018-08-05 Thread Andres Freund
Hi,

I'm currently in the process of rebasing zheap onto the pluggable
storage work. The goal, which seems to work surprisingly well, is to
find issues that the current pluggable storage patch doesn't yet deal
with.  I plan to push a tree including a lot of fixes and improvements
soon.

On 2018-08-03 12:35:50 +1000, Haribabu Kommi wrote:
> While investigating the crash, I observed that it is due to the many
> FIXMEs in the code. So I just made minimal changes and am looking into
> correcting the FIXMEs first.
> 
> One thing I observed is that a missing relation pointer leads to a crash
> in the flow of the EvalPlan* functions, because not all ROW_MARK types
> contain a relation pointer.
> 
> I will continue to check all the FIXME fixes.

Thanks.


> > - COPY's multi_insert path should probably deal with a bunch of slots,
> >   rather than forming HeapTuples
> >
> 
> Implemented supporting of slots in the copy multi insert path.

Cool. I've not yet looked at it, but I plan to do so soon.  Will have to
rebase over the other copy changes first :(


- Andres



Facility for detecting insecure object naming

2018-08-05 Thread Noah Misch
Commit e09144e documented the rules for writing safe qualified names, but
those rules are tedious to apply in practice.  Spotting the defects in this
function definition (from an unpublished draft intended for
https://postgr.es/m/20180710014308.ga805...@rfd.leadboat.com) is, I think, too
hard:

CREATE FUNCTION latitude(earth)
RETURNS float8
LANGUAGE SQL
IMMUTABLE STRICT
PARALLEL SAFE
AS 'SELECT CASE
  WHEN @cube_schema@.cube_ll_coord($1, 3) OPERATOR(pg_catalog./)
@extschema@.earth() OPERATOR(pg_catalog.<) -1 THEN -90::pg_catalog.float8
  WHEN @cube_schema@.cube_ll_coord($1, 3) OPERATOR(pg_catalog./)
@extschema@.earth() OPERATOR(pg_catalog.>) 1 THEN 90::pg_catalog.float8
  ELSE pg_catalog.degrees(pg_catalog.asin(@cube_schema@.cube_ll_coord($1, 3)
OPERATOR(pg_catalog./) @extschema@.earth()))
END';

If hackers and non-core extension authors are to write such code, let's make
it easier to check the work.  Different classes of code need different checks.
In each class, qualified function and operator names referring to untrusted
schemas need an exact match of function parameters, including any VARIADIC.
Class-specific rules:

a. SQL intended to run under secure search_path.  No class-specific rules.
   src/bin code is an example of this class, and this is the only secure class
   for end-user applications.

b. SQL intended for a particular search_path, possibly untrusted.  Unqualified
   names need an exact match.  Use a qualified name for any object whose
   schema appears in search_path later than some untrusted schema.  Examples
   of this class include extension scripts, pre-CVE-2018-1058 pg_dump, some
   functions with "SET search_path", and many casual end-user applications.

c. SQL intended to work the same under every search_path setting.  Do not use
   any unqualified name.  Most pg_catalog and contrib functions, but not those
   having a "SET search_path" annotation, are examples of this class.

I believe PostgreSQL can apply each class's rules given a list of trusted
schemas and a flag to enable the checks.  Class (b) naturally degenerates to
class (a) if every schema of search_path appears in the whitelist.  To check
class-(c) code, issue "SET search_path = not_in_whitelist, pg_temp,
pg_catalog, ..."  before the test queries.  That's something of a hack, but I
didn't think of a non-hack that I liked better.

Should this feature warn about "SELECT 'earth()'::pg_catalog.regprocedure"
under the conditions that would make it warn about "SELECT earth()"?  Should
it offer the option to warn or not warn?  Some uses of reg*,
e.g. appendQualifiedRelation(), would consider those to be false positives.

Then there's the question of exact UI naming.  Some possibilities:

  SET lint_trusted_schemas = pg_catalog, admin
  SET lint = reg*, exact_match, qualified_name
  SET lint = all
  SET lint = ''

  SET lint_trusted_schemas = pg_catalog, admin
  SET lint_name_security = on

  SET name_security_warning_trusted_schemas = pg_catalog, admin
  SET name_security_warning = on

  SET name_security_warnings_trusted_schemas = pg_catalog, admin
  SET warnings = reg*, exact_match, qualified_name

Preferences, other ideas?