Re: BUG #15182: Canceling authentication due to timeout aka Denial of Service Attack

2018-08-28 Thread Michael Paquier
Hi all,

Here is a summary of what has happened since this thread has been
created.  Three problems reported on this thread have been solved and
resulted in different commits for early lock lookups:
- VACUUM FULL, patched on 12~:
https://www.postgresql.org/message-id/2018081142.ga6...@paquier.xyz
Commit a556549: Improve VACUUM and ANALYZE by avoiding early lock queue
- TRUNCATE, patched on 12~:
https://www.postgresql.org/message-id/20180806165816.ga19...@paquier.xyz
Commit f841ceb: Improve TRUNCATE by avoiding early lock queue
- REINDEX, patched on 11~:
https://www.postgresql.org/message-id/20180805211059.ga2...@paquier.xyz
Commit 661dd23: Restrict access to reindex of shared catalogs for
non-privileged users

Please note that I have been very conservative with the different fixes
as v11 is getting very close to release.  The patch for REINDEX is a
behavior change which will not get further down anyway.  It would still
be nice to get a second lookup at the code and look if there are other
suspicious calls of relation_open or such which could allow
non-privileged users to pile up locks and cause more DOS problems.

Thanks,
--
Michael


signature.asc
Description: PGP signature


Re: BUG #15182: Canceling authentication due to timeout aka Denial of Service Attack

2018-08-11 Thread Michael Paquier
Hi Nathan,

On Fri, Jul 27, 2018 at 02:40:42PM +, Bossart, Nathan wrote:
> I think I'm essentially suggesting what you have in 0002 but without
> the new RangeVarGetRelidExtended() callback.  I've attached a modified
> version of 0002 that seems to fix the originally reported issue.  (I
> haven't looked into any extra handling needed for ANALYZE or
> partitioned tables.)  Running the same checks for all VACUUMs would
> keep things simple and provide a more uniform user experience.

I have been looking at this patch for VACUUM (perhaps we ought to spawn
a new thread as the cases of REINDEX and TRUNCATE have been addressed),
and I can see a problem with this approach.  If multiple transactions
are used with a list of relations vacuum'ed then simply moving around
the ACL checks would cause a new problem: if the relation vacuum'ed
disappears when processing it with vacuum_rel() after the list of
relations is built then we would get an ERROR instead of a skip because
of pg_class_ownercheck() which is not fail-safe.

Wouldn't it be enough to check just the ACLs in expand_vacuum_rel() and
get_all_vacuum_rels()?  The first is rather similar to what happens for
TRUNCATE, and the second is similar to what has been fixed for VACUUM.
We should also remove the relkind checks out of vacuum_skip_rel() as
those checks are not completely consistent for all those code paths...
--
Michael


signature.asc
Description: PGP signature


Re: BUG #15182: Canceling authentication due to timeout aka Denial of Service Attack

2018-08-02 Thread Michael Paquier
On Wed, Aug 01, 2018 at 10:55:11AM +, Ahsan Hadi wrote:
> Can you rebase the reindex-priv-93.patch, it is getting some hunk failures.

Patch reindex-priv-93.patch is for REL9_3_STABLE, where it applies
cleanly for me.  reindex-priv-94.patch is for REL9_4_STABLE and
reindex-priv-95-master.patch is for 9.5~master.
--
Michael


signature.asc
Description: PGP signature


Re: BUG #15182: Canceling authentication due to timeout aka Denial of Service Attack

2018-08-01 Thread ahsan hadi
Hi Michael,

Can you rebase the reindex-priv-93.patch, it is getting some hunk failures.

patching file doc/src/sgml/ref/reindex.sgml
Hunk #1 succeeded at 227 (offset 13 lines).
patching file src/backend/commands/indexcmds.c
Hunk #1 FAILED at 1873.
1 out of 1 hunk FAILED -- saving rejects to file 
src/backend/commands/indexcmds.c.rej

Thanks.

Re: BUG #15182: Canceling authentication due to timeout aka Denial of Service Attack

2018-07-31 Thread Michael Paquier
On Tue, Jul 31, 2018 at 10:34:57AM -0400, Robert Haas wrote:
> Just as a general statement, I don't think we should, as part of a
> patch for the issue discussed on this thread, make any changes AT ALL
> to who has permission to perform which operations.  We *certainly*
> should not back-patch such changes, but we really also should not make
> them on master unless they are discussed on a separate thread with a
> clear subject line and agreed by a clear consensus.

Well, perhaps spawning one thread for each command would make the most
sense then?  I am feeling a bit of confusion here.  There have been
three cases discussed up to now:
1) TRUNCATE, which results in a patch that has no behavioral changes.
2) VACUUM [FULL], where we are also heading toward a patch that has no
behavioral change per the last arguments exchanged, where we make sure
that permission checks are done before acquiring any locks.
3) REINDEX, where a database or a schema owner is able to take an
exclusive lock on a shared catalog with limited permissions.  That
sucks as this block calls to load_critical_index, but I would be ready
to buy to not back-patch such a change if the consensus reached is to
not skip shared catalogs if a database/schema owner has no ownership on
the shared catalog directly.

> This patch should only be about detecting cases where we lack
> permission earlier, before we try to acquire a lock.

Yeah, the REINDEX case is the only one among the three where the set of
relations worked on changes.  Please note that ownership checks happen
in this case for indexes, tables, database and schemas before taking a
lock on them.  Databases/systems and schemas just check for ownership of
respectively the database and the schema.

I'll be happy to create one thread per patch if that helps.  Thinking
about it this would attract more attention to each individual problem
reported.
--
Michael


signature.asc
Description: PGP signature


Re: BUG #15182: Canceling authentication due to timeout aka Denial of Service Attack

2018-07-30 Thread Bossart, Nathan
On 7/29/18, 7:35 PM, "Michael Paquier"  wrote:
> Yeah, I was testing that yesterday night and bumped on this case when
> trying do a REINDEX SCHEMA pg_class.  The point is that you can simplify
> the check and remove pg_database_ownercheck as there is already an ACL
> check on the database/system/schema at the top of the routine, so you
> are already sure that pg_database_ownercheck() or
> pg_namespace_ownercheck would return true.  This shaves a few cycles as
> well for each relation checked.

Makes sense.

>> I also noticed that this patch causes shared relations to be skipped
>> silently.  Perhaps we should emit a WARNING or DEBUG message when this
>> happens, at least for REINDEXOPT_VERBOSE.
>
> That's intentional.  I thought about that as well, but I am hesitant to
> do so as we don't bother mentioning the other relations skipped.
> REINDEX VERBOSE also shows up what are the tables processed, so it is
> easy to guess what are the tables skipped, still more painful.  And the
> documentation changes added cover the gap.

Okay, that seems reasonable to me, too.

> +1, I have included your suggestions.  The patch attached applies easily
> down to 9.5 where REINDEX SCHEMA was added.  For 9.4 and 9.3, there is
> no schema case, still the new check is similar.  The commit message is
> slightly changed so as there is no mention of REINDEX SCHEMA.  9.3 needs
> a slight change compared to 9.4 as well.

For 9.3 and 9.4, it might be nice to add the "even if the user is the
owner of the specified database" part, too.

> So attached are patches for 9.5~master, 9.4 and 9.3 with commit
> messages.  Does that look fine to folks of this thread?

Looks good to me.  Since REINDEX can be used to block calls to
load_critical_index() from new connections, back-patching seems appropriate.

Nathan



Re: BUG #15182: Canceling authentication due to timeout aka Denial of Service Attack

2018-07-30 Thread Kyotaro HORIGUCHI
At Mon, 30 Jul 2018 09:34:22 +0900, Michael Paquier  wrote 
in <20180730003422.ga2...@paquier.xyz>
> On Sun, Jul 29, 2018 at 04:11:38PM +, Bossart, Nathan wrote:
> > On 7/27/18, 7:10 PM, "Michael Paquier"  wrote:
> > This is added to ReindexMultipleTables(), which is used for REINDEX
> > SCHEMA, REINDEX SYSTEM, and REINDEX DATABASE.  Presently, REINDEX
> > SCHEMA requires being the owner of the schema, and REINDEX SYSTEM and
> > REINDEX DATABASE require being the owner of the database.  So, if a
> > role is an owner of a database or the pg_catalog schema, they are able
> > to reindex shared catalogs like pg_authid.
> 
> Yeah, I was testing that yesterday night and bumped on this case when
> trying do a REINDEX SCHEMA pg_class.  The point is that you can simplify
> the check and remove pg_database_ownercheck as there is already an ACL
> check on the database/system/schema at the top of the routine, so you
> are already sure that pg_database_ownercheck() or
> pg_namespace_ownercheck would return true.  This shaves a few cycles as
> well for each relation checked.

There's a case where the database owner differs from catalogs'
owner. ALTER DATABASE OWNER TO can make such configuration.
In this case, the previous code allows REINDEX of a shared
catalog for the database owner who is not the catalog's owner.

Superuser  : alice
Database owner : joe
Catalog owner  : andy
Schema owner   : joe

"REINDEX SCHERMA " by joe allows shared catalog to be
reindexed, even he is *not* the owner of the catalog.

The revised patch behaves differently to this case. "REINDEX
SCHERMA " by joe *skips* shared catalog since he is
not the owner of the catalog.

# Uh? Alice doesn't involved anywhere..

I feel that just being a database owner doesn't justify to cause
this problem innocently. Catalog owner is also doubious but we
can carefully configure the ownerships to avoid the problem since
only superuser can change it. So I vote +1 for the revised patch.

> > I also noticed that this patch causes shared relations to be skipped
> > silently.  Perhaps we should emit a WARNING or DEBUG message when this
> > happens, at least for REINDEXOPT_VERBOSE.
> 
> That's intentional.  I thought about that as well, but I am hesitant to
> do so as we don't bother mentioning the other relations skipped.
> REINDEX VERBOSE also shows up what are the tables processed, so it is
> easy to guess what are the tables skipped, still more painful.  And the
> documentation changes added cover the gap.

Even if it is written in the "Notes" section, doesn't the
following need some fix? It is the same for the DATBASE item.

| Parameters
...
| SYSTEM
|   Recreate all indexes on system catalogs within the current
|   database. *Indexes on shared system catalogs are included*.
|   Indexes on user tables are not processed. This form
|   of REINDEX cannot be executed inside a transaction block.


> > I noticed that there is no mention that the owner of a schema can do
> > REINDEX SCHEMA, which seems important to note.  Also, the proposed
> > wording might seem slightly ambiguous for the REINDEX DATABASE case.
> > It might be clearer to say something like the following:
> > 
> > Reindexing a single index or table requires being the owner of
> > that index of table.  REINDEX DATABASE and REINDEX SYSTEM
> > require being the owner of the database, and REINDEX SCHEMA
> > requires being the owner of the schema (note that the user can
> > therefore rebuild indexes of tables owned by other users).
> > Reindexing a shared catalog requires being the owner of the
> > shared catalog, even if the user is the owner of the specified
> > database or schema.  Of course, superusers can always reindex
> > anything.
> 
> +1, I have included your suggestions.  The patch attached applies easily
> down to 9.5 where REINDEX SCHEMA was added.  For 9.4 and 9.3, there is
> no schema case, still the new check is similar.  The commit message is
> slightly changed so as there is no mention of REINDEX SCHEMA.  9.3 needs
> a slight change compared to 9.4 as well.
> 
> So attached are patches for 9.5~master, 9.4 and 9.3 with commit
> messages.  Does that look fine to folks of this thread?

This apparently changes the documented behavior and the problem
seems to be a result of a rather malicious/intentional
combination of operatoins (especially named vacuum on a shared
catalog). I vote -0.5 to backpatch unless we categorize this as a
security issue.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: BUG #15182: Canceling authentication due to timeout aka Denial of Service Attack

2018-07-29 Thread Michael Paquier
On Sun, Jul 29, 2018 at 04:11:38PM +, Bossart, Nathan wrote:
> On 7/27/18, 7:10 PM, "Michael Paquier"  wrote:
> > No problem.  If there are no objections, I am going to fix the REINDEX
> > issue first and back-patch.  Its patch is the least invasive of the
> > set.
> 
> This seems like a reasonable place to start.  I took a closer look at
> 0003.

Thanks for looking at it!

> This is added to ReindexMultipleTables(), which is used for REINDEX
> SCHEMA, REINDEX SYSTEM, and REINDEX DATABASE.  Presently, REINDEX
> SCHEMA requires being the owner of the schema, and REINDEX SYSTEM and
> REINDEX DATABASE require being the owner of the database.  So, if a
> role is an owner of a database or the pg_catalog schema, they are able
> to reindex shared catalogs like pg_authid.

Yeah, I was testing that yesterday night and bumped on this case when
trying do a REINDEX SCHEMA pg_class.  The point is that you can simplify
the check and remove pg_database_ownercheck as there is already an ACL
check on the database/system/schema at the top of the routine, so you
are already sure that pg_database_ownercheck() or
pg_namespace_ownercheck would return true.  This shaves a few cycles as
well for each relation checked.

> I also noticed that this patch causes shared relations to be skipped
> silently.  Perhaps we should emit a WARNING or DEBUG message when this
> happens, at least for REINDEXOPT_VERBOSE.

That's intentional.  I thought about that as well, but I am hesitant to
do so as we don't bother mentioning the other relations skipped.
REINDEX VERBOSE also shows up what are the tables processed, so it is
easy to guess what are the tables skipped, still more painful.  And the
documentation changes added cover the gap.

> I noticed that there is no mention that the owner of a schema can do
> REINDEX SCHEMA, which seems important to note.  Also, the proposed
> wording might seem slightly ambiguous for the REINDEX DATABASE case.
> It might be clearer to say something like the following:
> 
> Reindexing a single index or table requires being the owner of
> that index of table.  REINDEX DATABASE and REINDEX SYSTEM
> require being the owner of the database, and REINDEX SCHEMA
> requires being the owner of the schema (note that the user can
> therefore rebuild indexes of tables owned by other users).
> Reindexing a shared catalog requires being the owner of the
> shared catalog, even if the user is the owner of the specified
> database or schema.  Of course, superusers can always reindex
> anything.

+1, I have included your suggestions.  The patch attached applies easily
down to 9.5 where REINDEX SCHEMA was added.  For 9.4 and 9.3, there is
no schema case, still the new check is similar.  The commit message is
slightly changed so as there is no mention of REINDEX SCHEMA.  9.3 needs
a slight change compared to 9.4 as well.

So attached are patches for 9.5~master, 9.4 and 9.3 with commit
messages.  Does that look fine to folks of this thread?
--
Michael
From d82e9df826b29bf9030dc97551bdd0bef894677d Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Mon, 30 Jul 2018 09:29:32 +0900
Subject: [PATCH] Restrict access to reindex of shared catalogs for
 non-privileged users

A database owner running a database-level REINDEX has the possibility to
also do the operation on shared system catalogs without being an owner
of them, which allows him to block resources it should not have access
to.  For example, PostgreSQL would go unresponsive and even block
authentication if a lock is waited for pg_authid.  This commit makes
sure that a user running a REINDEX SYSTEM or DATABASE only works on the
following relations:
- The user is a superuser
- The user is the table owner
- The user is the database owner, only if the relation worked on is not
shared.

Reported-by: Lloyd Albin, Jeremy Schneider
Author: Michael Paquier
Reviewed by: Nathan Bossart, Kyotaro Horiguchi
Discussion: https://postgr.es/m/152512087100.19803.12733865831237526...@wrigleys.postgresql.org
---
 doc/src/sgml/ref/reindex.sgml|  3 ++-
 src/backend/commands/indexcmds.c | 11 +++
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index a795dfa325..00c024 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -214,7 +214,8 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } nam
Reindexing a single index or table requires being the owner of that
index or table.  Reindexing a database requires being the owner of
the database (note that the owner can therefore rebuild indexes of
-   tables owned by other users).  Of course, superusers can always
+   tables owned by other users).  Reindexing a shared catalog requires
+   being the owner of that shared catalog.  Of course, superusers can always
reindex anything.
   
 
diff --git a/src/backend/commands/indexcmds.c 

Re: BUG #15182: Canceling authentication due to timeout aka Denial of Service Attack

2018-07-29 Thread Bossart, Nathan
On 7/27/18, 7:10 PM, "Michael Paquier"  wrote:
> No problem.  If there are no objections, I am going to fix the REINDEX
> issue first and back-patch.  Its patch is the least invasive of the
> set.

This seems like a reasonable place to start.  I took a closer look at
0003.

+   /*
+* We allow the user to reindex a table if he is superuser, the 
table
+* owner, or the database owner (but in the latter case, only 
if it's not
+* a shared relation).
+*/
+   if (!(pg_class_ownercheck(relid, GetUserId()) ||
+ (pg_database_ownercheck(MyDatabaseId, GetUserId()) &&
+  !classtuple->relisshared)))
+   continue;

This is added to ReindexMultipleTables(), which is used for REINDEX
SCHEMA, REINDEX SYSTEM, and REINDEX DATABASE.  Presently, REINDEX
SCHEMA requires being the owner of the schema, and REINDEX SYSTEM and
REINDEX DATABASE require being the owner of the database.  So, if a
role is an owner of a database or the pg_catalog schema, they are able
to reindex shared catalogs like pg_authid.

This patch changes things so that only superusers or the owner of the
table can reindex shared relations.  This works as expected for
REINDEX SYSTEM and REINDEX DATABASE, but I think there is a problem
for REINDEX SCHEMA.  If the user is not the owner of the database but
is the owner of the schema, they will not be able to use REINDEX
SCHEMA to reindex any tables that they do not own.  To fix this, we
will likely need to use pg_namespace_ownercheck() instead of
pg_database_ownercheck() for the REINDEX_OBJECT_SCHEMA case.

I also noticed that this patch causes shared relations to be skipped
silently.  Perhaps we should emit a WARNING or DEBUG message when this
happens, at least for REINDEXOPT_VERBOSE.

Reindexing a single index or table requires being the owner of that
index or table.  Reindexing a database requires being the owner of
the database (note that the owner can therefore rebuild indexes of
-   tables owned by other users).  Of course, superusers can always
+   tables owned by other users).  Reindexing a shared catalog requires
+   being the owner of that shared catalog.  Of course, superusers can always
reindex anything.

I noticed that there is no mention that the owner of a schema can do
REINDEX SCHEMA, which seems important to note.  Also, the proposed
wording might seem slightly ambiguous for the REINDEX DATABASE case.
It might be clearer to say something like the following:

Reindexing a single index or table requires being the owner of
that index of table.  REINDEX DATABASE and REINDEX SYSTEM
require being the owner of the database, and REINDEX SCHEMA
requires being the owner of the schema (note that the user can
therefore rebuild indexes of tables owned by other users).
Reindexing a shared catalog requires being the owner of the
shared catalog, even if the user is the owner of the specified
database or schema.  Of course, superusers can always reindex
anything.

Nathan



Re: BUG #15182: Canceling authentication due to timeout aka Denial of Service Attack

2018-07-27 Thread Michael Paquier
On Fri, Jul 27, 2018 at 02:40:42PM +, Bossart, Nathan wrote:
> On 7/26/18, 11:16 PM, "Michael Paquier"  wrote:
> I think I'm essentially suggesting what you have in 0002 but without
> the new RangeVarGetRelidExtended() callback.  I've attached a modified
> version of 0002 that seems to fix the originally reported issue.  (I
> haven't looked into any extra handling needed for ANALYZE or
> partitioned tables.)  Running the same checks for all VACUUMs would
> keep things simple and provide a more uniform user experience.

Okay, let me check that.  Your patch has at least an error in
get_all_vacuum_rels() where toast relations cannot be skipped.

>> The docs mentioned that shared catalogs are processed, so I did not
>> bother, but visibly your comment is that we could be more precise about
>> the ownership in this case?  An attempt is attached.
> 
> Sorry, I should have been clearer.  But yes, your update is what I was
> thinking.

No problem.  If there are no objections, I am going to fix the REINDEX
issue first and back-patch.  Its patch is the least invasive of the
set.
--
Michael


signature.asc
Description: PGP signature


Re: BUG #15182: Canceling authentication due to timeout aka Denial of Service Attack

2018-07-26 Thread Michael Paquier
On Fri, Jul 27, 2018 at 12:37:02PM +0900, Kyotaro HORIGUCHI wrote:
> I intended the same as Bossart said. It is not reasonable that
> VACUUM and VACUUM FULL behaves differently on rejection for the
> same reason. The objective of the first (or early) check is
> rejecting (non-superuser's) unprivileged VACUUM FULL. Any
> possible modifications on a syscache tuple before taking a lock
> doesn't seem to harm in the point of view. Anyway the lock
> acquired in expand_vacuum_rel is not held until vacuum_rel.

Thinking about it harder, it is possible to pass down a status pointer
that the callback of RangeVarGetRelidExtended could fill if we pass it
down using the argument list.  This would need special handling for the
case where expand_vacuum_rel() is NIL but that would be workable.  If
vacuum_check_rel() does not need to work with elevel >= ERROR, then the
set of log messages remains the same.

> Apart from the discussion above, it is uneasy for me that the
> messages seem heavily affected by the callers context.

Sure, the patch for vacuum is still heavily WIP.  I am not much happy
about that either.

>> Regarding those patches, I am pretty happy how things turn out for
>> TRUNCATE and REINDEX, way less for VACUUM, so getting 0001 and 0003
>> committed first makes the most sense to me as their logic is rather
>> straight-forward (well way of speaking ;p ).
> 
> About 0001, I'm not sure what the "session-level" means but it
> looks fine and works as expected. 0003 looks fine overall.

What I mean here is checking the interactions with other backends,
truncate_check_session is the best name I could come up with.
truncate_check_activity was another.  I am not attached to a single
name, if you have an ideaof course please feel free.

> I think I heard that gender-free wording is appreciated but it
> might be only about documentation.

Sure.
--
Michael


signature.asc
Description: PGP signature


Re: BUG #15182: Canceling authentication due to timeout aka Denial of Service Attack

2018-07-26 Thread Kyotaro HORIGUCHI
Hello.

At Fri, 27 Jul 2018 11:31:23 +0900, Michael Paquier  wrote 
in <20180727023123.ge1...@paquier.xyz>
> On Thu, Jul 26, 2018 at 11:14:31PM +, Bossart, Nathan wrote:
> > On 7/26/18, 3:42 PM, "Michael Paquier"  wrote:
> > Minus a possible documentation update, 0003 seems almost ready, too.
> 
> The docs mentioned that shared catalogs are processed, so I did not
> bother, but visibly your comment is that we could be more precise about
> the ownership in this case?  An attempt is attached.
> 
> > For 0002, would adding vacuum_skip_rel() before and after
> > try_relation_open() in vacuum_rel() be enough?  That way, we could
> > avoid emitting an ERROR for only the VACUUM FULL case (and skip it
> > with a WARNING like everything else).
> 
> Er, we need a lock on it while looking at its data in vacuum_rel(), no?
> So the call to vacuum_skip_rel() needs to happen after
> try_relation_open(), once we are sure that the relation is opened.
> Having two set of checks is actually better as the operation can involve
> multiple operations.

I intended the same as Bossart said. It is not reasonable that
VACUUM and VACUUM FULL behaves differently on rejection for the
same reason. The objective of the first (or early) check is
rejecting (non-superuser's) unprivileged VACUUM FULL. Any
possible modifications on a syscache tuple before taking a lock
doesn't seem to harm in the point of view. Anyway the lock
acquired in expand_vacuum_rel is not held until vacuum_rel.

> The error messages generated by vacuum_skip_rel are not especially smart
> when elevel >= ERROR.  As we need a proper errcode for that case as well
> just using a separate error message is less confusing.  I have

Apart from the discussion above, it is uneasy for me that the
messages seem heavily affected by the callers context.

> implemented my idea in the updated set attached.  Another issue I have
> found is that when doing for example a system-wide analyze, we would
> finish with spurious warnings, as toast relations need to be discarded
> from the first set of relations built.

It seems reasonable.

> Anyway, I have done more work on the patches, mainly I have fixed the
> calls to RangeVarGetRelidExtended using booleans.  I have added

I think that it is useful to let callback reutrn bool value that
indicates whether stop or go. This allows WARNING in the
callback.

> isolation tests for cases which are cheap, aka those not involving a
> system-wide operation.  Running those tests on HEAD, it is easy to see
> that TRUNCATE or VACUUM complete after a session doing a catalog lookup
> commits its transaction.  VACUUM skips a relation and VACUUM FULL issues
> an error.
> 
> Regarding those patches, I am pretty happy how things turn out for
> TRUNCATE and REINDEX, way less for VACUUM, so getting 0001 and 0003
> committed first makes the most sense to me as their logic is rather
> straight-forward (well way of speaking ;p ).

About 0001, I'm not sure what the "session-level" means but it
looks fine and works as expected. 0003 looks fine overall. I
think I heard that gender-free wording is appreciated but it
might be only about documentation.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: BUG #15182: Canceling authentication due to timeout aka Denial of Service Attack

2018-07-26 Thread Michael Paquier
On Thu, Jul 26, 2018 at 11:14:31PM +, Bossart, Nathan wrote:
> On 7/26/18, 3:42 PM, "Michael Paquier"  wrote:
> Minus a possible documentation update, 0003 seems almost ready, too.

The docs mentioned that shared catalogs are processed, so I did not
bother, but visibly your comment is that we could be more precise about
the ownership in this case?  An attempt is attached.

> For 0002, would adding vacuum_skip_rel() before and after
> try_relation_open() in vacuum_rel() be enough?  That way, we could
> avoid emitting an ERROR for only the VACUUM FULL case (and skip it
> with a WARNING like everything else).

Er, we need a lock on it while looking at its data in vacuum_rel(), no?
So the call to vacuum_skip_rel() needs to happen after
try_relation_open(), once we are sure that the relation is opened.
Having two set of checks is actually better as the operation can involve
multiple operations.

The error messages generated by vacuum_skip_rel are not especially smart
when elevel >= ERROR.  As we need a proper errcode for that case as well
just using a separate error message is less confusing.  I have
implemented my idea in the updated set attached.  Another issue I have
found is that when doing for example a system-wide analyze, we would
finish with spurious warnings, as toast relations need to be discarded
from the first set of relations built.

Anyway, I have done more work on the patches, mainly I have fixed the
calls to RangeVarGetRelidExtended using booleans.  I have added
isolation tests for cases which are cheap, aka those not involving a
system-wide operation.  Running those tests on HEAD, it is easy to see
that TRUNCATE or VACUUM complete after a session doing a catalog lookup
commits its transaction.  VACUUM skips a relation and VACUUM FULL issues
an error.

Regarding those patches, I am pretty happy how things turn out for
TRUNCATE and REINDEX, way less for VACUUM, so getting 0001 and 0003
committed first makes the most sense to me as their logic is rather
straight-forward (well way of speaking ;p ).
--
Michael
From 4bb42ed30aae37c7a230ebe64b27a965b275f375 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Fri, 27 Jul 2018 09:50:05 +0900
Subject: [PATCH 1/3] Refactor TRUNCATE execution to avoid early lock lookups

A caller of TRUNCATE can perform early lookup obtention which can cause
other sessions to block on the request done, causing potentially DOS
attacks as even a non-privileged user can attempt a truncation of a
critical catalog table to block even all incoming connection attempts.

This commit fixes the case of TRUNCATE so as RangeVarGetRelidExtended is
used with a callback doing the necessary checks at an earlier stage,
avoiding locking issues at early stages.

Author: Michael Paquier
Discussion: https://postgr.es/m/152512087100.19803.12733865831237526...@wrigleys.postgresql.org
---
 src/backend/commands/tablecmds.c  | 83 +++
 .../isolation/expected/truncate-conflict.out  | 45 ++
 src/test/isolation/isolation_schedule |  1 +
 .../isolation/specs/truncate-conflict.spec| 27 ++
 4 files changed, 138 insertions(+), 18 deletions(-)
 create mode 100644 src/test/isolation/expected/truncate-conflict.out
 create mode 100644 src/test/isolation/specs/truncate-conflict.spec

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index eb2d33dd86..82bb07049d 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -300,7 +300,10 @@ struct DropRelationCallbackState
 #define child_dependency_type(child_is_partition)	\
 	((child_is_partition) ? DEPENDENCY_AUTO : DEPENDENCY_NORMAL)
 
-static void truncate_check_rel(Relation rel);
+static void truncate_check_rel(Oid relid, Form_pg_class reltuple);
+static void truncate_check_session(Relation rel);
+static void RangeVarCallbackForTruncate(const RangeVar *relation,
+			Oid relId, Oid oldRelId, void *arg);
 static List *MergeAttributes(List *schema, List *supers, char relpersistence,
 bool is_partition, List **supOids, List **supconstr,
 int *supOidCount);
@@ -1336,15 +1339,25 @@ ExecuteTruncate(TruncateStmt *stmt)
 		bool		recurse = rv->inh;
 		Oid			myrelid;
 
-		rel = heap_openrv(rv, AccessExclusiveLock);
-		myrelid = RelationGetRelid(rel);
+		myrelid = RangeVarGetRelidExtended(rv, AccessExclusiveLock,
+		   0, RangeVarCallbackForTruncate,
+		   NULL);
+
+		/* open the relation, we already hold a lock on it */
+		rel = heap_open(myrelid, NoLock);
+
 		/* don't throw error for "TRUNCATE foo, foo" */
 		if (list_member_oid(relids, myrelid))
 		{
 			heap_close(rel, AccessExclusiveLock);
 			continue;
 		}
-		truncate_check_rel(rel);
+
+		/*
+		 * RangeVarGetRelidExtended has done some basic checks with its
+		 * callback, and only remain session-level checks.
+		 */
+		truncate_check_session(rel);
 		rels = lappend(rels, rel);
 		relids = lappend_oid(relids, myrelid);
 		/* Log this relation only if needed for logical 

Re: BUG #15182: Canceling authentication due to timeout aka Denial of Service Attack

2018-07-26 Thread Bossart, Nathan
On 7/26/18, 3:42 PM, "Michael Paquier"  wrote:
> Thanks for the lookup.  0003 is the most simple in the set by the way.

Minus a possible documentation update, 0003 seems almost ready, too.

For 0002, would adding vacuum_skip_rel() before and after
try_relation_open() in vacuum_rel() be enough?  That way, we could
avoid emitting an ERROR for only the VACUUM FULL case (and skip it
with a WARNING like everything else).

Nathan



Re: BUG #15182: Canceling authentication due to timeout aka Denial of Service Attack

2018-07-26 Thread Michael Paquier
On Thu, Jul 26, 2018 at 03:06:59PM +, Bossart, Nathan wrote:
> I took a look at 0001.

Thanks for the lookup.  0003 is the most simple in the set by the way.

> On 7/26/18, 12:24 AM, "Michael Paquier"  wrote:
> - myrelid = RelationGetRelid(rel);
> + myrelid = RangeVarGetRelidExtended(rv, AccessExclusiveLock,
> + false, RangeVarCallbackForTruncate, NULL);
> 
> Should the flags argument be 0 instead of false?

Yes, those should be 0.  All patches are missing that.  It does not have
a bad consequence on the patch, still that's incorrect.
--
Michael


signature.asc
Description: PGP signature


Re: BUG #15182: Canceling authentication due to timeout aka Denial of Service Attack

2018-07-26 Thread Bossart, Nathan
On 7/26/18, 10:07 AM, "Bossart, Nathan"  wrote:
> The first time we use this callback, the relation won't be locked, so
> isn't it possible that we won't get a valid tuple here?  I did notice
> that callbacks like RangeVarCallbackForRenameRule,
> RangeVarCallbackForPolicy, and RangeVarCallbackForRenameTrigger assume
> that the relation can be concurrently dropped, but
> RangeVarCallbackOwnsRelation does not.  Instead, we assume that the
> syscache search will succeed if the given OID is valid.  Is this a
> bug, or am I missing something?

Please pardon the noise.  I see that we don't accept invalidation
messages until later on in RangeVarGetRelidExtended(), at which point
we'll retry and get InvalidOid for concurrently dropped relations.

Nathan



Re: BUG #15182: Canceling authentication due to timeout aka Denial of Service Attack

2018-07-26 Thread Bossart, Nathan
I took a look at 0001.

On 7/26/18, 12:24 AM, "Michael Paquier"  wrote:
> 1) 0001 does the work for TRUNCATE, but using RangeVarGetRelidExtended
> with a custom callback based on the existing truncate_check_rel().  I
> had to split the session-level checks into a separate routine as no
> Relation is available, but that finishes by being a neat result without
> changing any user-facing behavior.

Splitting the checks like this seems reasonable.  As you pointed out,
it doesn't change the behavior of the session checks, which AFAICT
aren't necessary for the kind of permissions checks we want to add to
the RangeVarGetRelidExtended() call.

-   myrelid = RelationGetRelid(rel);
+   myrelid = RangeVarGetRelidExtended(rv, AccessExclusiveLock,
+   
   false, RangeVarCallbackForTruncate,
+   
   NULL);

Should the flags argument be 0 instead of false?

+   /* Nothing to do if the relation was not found. */
+   if (!OidIsValid(relId))
+   return;
+
+   tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relId));
+   if (!HeapTupleIsValid(tuple))   /* should not happen */
+   elog(ERROR, "cache lookup failed for relation %u", relId);

The first time we use this callback, the relation won't be locked, so
isn't it possible that we won't get a valid tuple here?  I did notice
that callbacks like RangeVarCallbackForRenameRule,
RangeVarCallbackForPolicy, and RangeVarCallbackForRenameTrigger assume
that the relation can be concurrently dropped, but
RangeVarCallbackOwnsRelation does not.  Instead, we assume that the
syscache search will succeed if the given OID is valid.  Is this a
bug, or am I missing something?

Nathan



Re: BUG #15182: Canceling authentication due to timeout aka Denial of Service Attack

2018-07-25 Thread Michael Paquier
Hi,

So, I have spent the last couple of days trying to figure out a nice
solution for VACUUM, TRUNCATE and REINDEX, and attached is a set of
three patches to address the issues of this thread:
1) 0001 does the work for TRUNCATE, but using RangeVarGetRelidExtended
with a custom callback based on the existing truncate_check_rel().  I
had to split the session-level checks into a separate routine as no
Relation is available, but that finishes by being a neat result without
changing any user-facing behavior.
2) 0002 does the work for VACUUM.  It happens that vacuum_rel() already
does all the skip-related checks we need to know about to decide if a
relation needs to be vacuum'ed or not, so I refactored things as
follows:
2-1) For a VACUUM manual listing relations, issue an ERROR if it cannot
be vacuum'ed. Previously vacuum_rel() would just log a WARNING and call
it a day *after* locking the relation.  But as we need to rely on
RangeVarGetRelidExtended() an ERROR is necessary.  The ERROR happens
only if VACUUM FULL is used.
2-2) When a relation list is not specified in a manual VACUUM command,
then the decision to skip the relation is done in get_all_vacuum_rels()
when building the relation list with the pg_class lookup.  This logs a
DEBUG message when the relation is skipped, which is more information
that what we have now.  The checks need to happen again in vacuum_rel as
the VACUUM work could be spawned across multiple transactions, where a
WARNING is logged.
3) REINDEX is already smart enough to check for ownership of relations
if one is manually listed and reports an ERROR.  However it can cause
the instance to be stuck when doing a database-wide REINDEX on a
database using just the owner of this database.  In this case it seems
to me that we need to make ReindexMultipleTables in terms of ACL
checks, as per 0003.

I quite like the shape of the patches proposed here, and the refactoring
is I think pretty clear.  Each patch can be treated independently as
well.  Comments are welcome.  (Those patches are not indented yet, which
does not matter much at this stage anyway.)

Thanks,
--
Michael
From 4dc84bb5fb0d33a5f6fedd31828adda9db8fd2d4 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Thu, 26 Jul 2018 11:51:45 +0900
Subject: [PATCH 1/3] Refactor TRUNCATE execution to avoid early lock lookups

A caller of TRUNCATE can perform early lookup obtention which can cause
other sessions to block on the request done, causing potentially DOS
attacks as even a non-privileged user can attempt a truncation of a
critical catalog table to block even all incoming connection attempts.

This commit fixes the case of TRUNCATE so as RangeVarGetRelidExtended is
used with a callback doing the necessary checks at an earlier stage,
avoiding locking issues at early stages.

Author: Michael Paquier
Discussion: https://postgr.es/m/152512087100.19803.12733865831237526...@wrigleys.postgresql.org
---
 src/backend/commands/tablecmds.c | 83 +---
 1 file changed, 65 insertions(+), 18 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index eb2d33dd86..b06c9dbf63 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -300,7 +300,10 @@ struct DropRelationCallbackState
 #define child_dependency_type(child_is_partition)	\
 	((child_is_partition) ? DEPENDENCY_AUTO : DEPENDENCY_NORMAL)
 
-static void truncate_check_rel(Relation rel);
+static void truncate_check_rel(Oid relid, Form_pg_class reltuple);
+static void truncate_check_session(Relation rel);
+static void RangeVarCallbackForTruncate(const RangeVar *relation,
+			Oid relId, Oid oldRelId, void *arg);
 static List *MergeAttributes(List *schema, List *supers, char relpersistence,
 bool is_partition, List **supOids, List **supconstr,
 int *supOidCount);
@@ -1336,15 +1339,25 @@ ExecuteTruncate(TruncateStmt *stmt)
 		bool		recurse = rv->inh;
 		Oid			myrelid;
 
-		rel = heap_openrv(rv, AccessExclusiveLock);
-		myrelid = RelationGetRelid(rel);
+		myrelid = RangeVarGetRelidExtended(rv, AccessExclusiveLock,
+		   false, RangeVarCallbackForTruncate,
+		   NULL);
+
+		/* open the relation, we already hold a lock on it */
+		rel = heap_open(myrelid, NoLock);
+
 		/* don't throw error for "TRUNCATE foo, foo" */
 		if (list_member_oid(relids, myrelid))
 		{
 			heap_close(rel, AccessExclusiveLock);
 			continue;
 		}
-		truncate_check_rel(rel);
+
+		/*
+		 * RangeVarGetRelidExtended has done some basic checks with its
+		 * callback, and only remain session-level checks.
+		 */
+		truncate_check_session(rel);
 		rels = lappend(rels, rel);
 		relids = lappend_oid(relids, myrelid);
 		/* Log this relation only if needed for logical decoding */
@@ -1367,7 +1380,9 @@ ExecuteTruncate(TruncateStmt *stmt)
 
 /* find_all_inheritors already got lock */
 rel = heap_open(childrelid, NoLock);
-truncate_check_rel(rel);
+truncate_check_rel(RelationGetRelid(rel), 

Re: BUG #15182: Canceling authentication due to timeout aka Denial of Service Attack

2018-07-25 Thread Michael Paquier
On Tue, Jul 24, 2018 at 06:27:16PM +0900, Kyotaro HORIGUCHI wrote:
> I may be misunderstanding something because (really ?) it's still
> extremely hot in Japan, today.

It is better since yesterday, in exchange of a typhoon heading straight
to Tokyo :) 
--
Michael


signature.asc
Description: PGP signature


Re: BUG #15182: Canceling authentication due to timeout aka Denial of Service Attack

2018-07-24 Thread Kyotaro HORIGUCHI
At Tue, 24 Jul 2018 14:23:02 +0900, Michael Paquier  wrote 
in <20180724052302.gb4...@paquier.xyz>
> On Mon, Jul 23, 2018 at 09:51:54PM -0700, Andres Freund wrote:
> > On July 23, 2018 9:50:10 PM PDT, Michael Paquier  
> > wrote:
> >> Oh, yes, that would be bad.  My mind has slipped here.  I have seen
> >> manual VACUUMs on system catalogs for applications using many temp
> >> tables...  So we would want to have only VACUUM FULL being
> >> conditionally
> >> happening?  The question comes then about what to do when a VACUUM FULL
> >> is run without a list of relations because expand_vacuum_rel() is not
> >> actually the only problem.  Would we want to ignore system tables as
> >> well except if allow_system_table_mods is on?  When no relation list is
> >> specified, get_all_vacuum_rels() builds the list of relations which
> >> causes vacuum_rel() to complain on try_relation_open(), so patching
> >> just expand_vacuum_rel() solves only half of the problem for manual
> >> VACUUMs.
> > 
> > I think any such restriction is entirely unacceptable. FULL or not.
> 
> Well, letting any users attempt to take an exclusive lock which
> makes others to be stuck as well is not acceptable either, so
> two possible answers would be to fail
> or skip such relations.  The first concept applies if a relation list is
> given by the user, and the second if no list is given.
> 
> Do you have any thoughts on the matter?

I'm not sure what is the exact problem here. If it is that a
non-privilege user can stuck on VACUUM FULL on the scenario, we
should just give up VACUUM_FULLing on unprivileged relations
*before* trying to take a lock on it. There's no reason for
allowing VACUUM FULL on a relation on that the same user is not
allowed to run VACUUM. I don't think it's a prbolem that
superuser can be involed in the blocking scenario running VACUUM
FULL.

I may be misunderstanding something because (really ?) it's still
extremely hot in Japan, today.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: BUG #15182: Canceling authentication due to timeout aka Denial of Service Attack

2018-07-23 Thread Michael Paquier
On Tue, Jul 24, 2018 at 01:34:13AM -0400, Alvaro Herrera wrote:
> But I don't see why RangeVarCallbackOwnsTable isn't sufficient.

The set of relkinds checked by truncate_check_rel and
RangeVarCallbackOwnsTable is different (toast and matviews).  And in the
case of VACUUM, partitioned tables can call RangeVarGetRelidExtended
when one is listed as part of a manual VACUUM command.
--
Michael


signature.asc
Description: PGP signature


Re: BUG #15182: Canceling authentication due to timeout aka Denial of Service Attack

2018-07-23 Thread Michael Paquier
On Tue, Jul 24, 2018 at 02:23:02PM +0900, Michael Paquier wrote:
> Well, letting any users take an exclusive lock on system catalogs at
> will is not acceptable either, so two possible answers would be to fail
> or skip such relations.  The first concept applies if a relation list is
> given by the user, and the second if no list is given.

The first sentence is incorrect.  That's actually "letting any users
attempt to take an exclusive lock which makes others to be stuck as
well".
--
Michael


signature.asc
Description: PGP signature


Re: BUG #15182: Canceling authentication due to timeout aka Denial of Service Attack

2018-07-23 Thread Michael Paquier
On Mon, Jul 23, 2018 at 09:51:54PM -0700, Andres Freund wrote:
> On July 23, 2018 9:50:10 PM PDT, Michael Paquier  wrote:
>> Oh, yes, that would be bad.  My mind has slipped here.  I have seen
>> manual VACUUMs on system catalogs for applications using many temp
>> tables...  So we would want to have only VACUUM FULL being
>> conditionally
>> happening?  The question comes then about what to do when a VACUUM FULL
>> is run without a list of relations because expand_vacuum_rel() is not
>> actually the only problem.  Would we want to ignore system tables as
>> well except if allow_system_table_mods is on?  When no relation list is
>> specified, get_all_vacuum_rels() builds the list of relations which
>> causes vacuum_rel() to complain on try_relation_open(), so patching
>> just expand_vacuum_rel() solves only half of the problem for manual
>> VACUUMs.
> 
> I think any such restriction is entirely unacceptable. FULL or not.

Well, letting any users take an exclusive lock on system catalogs at
will is not acceptable either, so two possible answers would be to fail
or skip such relations.  The first concept applies if a relation list is
given by the user, and the second if no list is given.

Do you have any thoughts on the matter?
--
Michael


signature.asc
Description: PGP signature


Re: BUG #15182: Canceling authentication due to timeout aka Denial of Service Attack

2018-07-23 Thread Andres Freund



On July 23, 2018 9:50:10 PM PDT, Michael Paquier  wrote:
>On Mon, Jul 23, 2018 at 09:17:53PM -0700, Andres Freund wrote:
>> I might be mis-parsing this due to typos. Are you actually suggesting
>> vacuum on system tables should depend on that GUC? If so, why? That's
>> seems like a terrible idea.  It's pretty normal to occasionally have
>> to vacuum them?
>
>Oh, yes, that would be bad.  My mind has slipped here.  I have seen
>manual VACUUMs on system catalogs for applications using many temp
>tables...  So we would want to have only VACUUM FULL being
>conditionally
>happening?  The question comes then about what to do when a VACUUM FULL
>is run without a list of relations because expand_vacuum_rel() is not
>actually the only problem.  Would we want to ignore system tables as
>well except if allow_system_table_mods is on?  When no relation list is
>specified, get_all_vacuum_rels() builds the list of relations which
>causes vacuum_rel() to complain on try_relation_open(), so patching
>just expand_vacuum_rel() solves only half of the problem for manual
>VACUUMs.

I think any such restriction is entirely unacceptable. FULL or not.

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: BUG #15182: Canceling authentication due to timeout aka Denial of Service Attack

2018-07-23 Thread Michael Paquier
On Mon, Jul 23, 2018 at 09:17:53PM -0700, Andres Freund wrote:
> I might be mis-parsing this due to typos. Are you actually suggesting
> vacuum on system tables should depend on that GUC? If so, why? That's
> seems like a terrible idea.  It's pretty normal to occasionally have
> to vacuum them?

Oh, yes, that would be bad.  My mind has slipped here.  I have seen
manual VACUUMs on system catalogs for applications using many temp
tables...  So we would want to have only VACUUM FULL being conditionally
happening?  The question comes then about what to do when a VACUUM FULL
is run without a list of relations because expand_vacuum_rel() is not
actually the only problem.  Would we want to ignore system tables as
well except if allow_system_table_mods is on?  When no relation list is
specified, get_all_vacuum_rels() builds the list of relations which
causes vacuum_rel() to complain on try_relation_open(), so patching
just expand_vacuum_rel() solves only half of the problem for manual
VACUUMs.
--
Michael


signature.asc
Description: PGP signature


Re: BUG #15182: Canceling authentication due to timeout aka Denial of Service Attack

2018-07-23 Thread Andres Freund



On July 23, 2018 9:14:03 PM PDT, Michael Paquier  wrote:
>- While it would make sense, at least to me, to make VACUUM fall into
>if
>allow_system_table_mods is allowed, 

I might be mis-parsing this due to typos. Are you actually suggesting vacuum on 
system tables should depend on that GUC? If so, why? That's seems like a 
terrible idea. It's pretty normal to occasionally have to vacuum them?


Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: BUG #15182: Canceling authentication due to timeout aka Denial of Service Attack

2018-07-23 Thread Michael Paquier
On Mon, Jul 23, 2018 at 11:29:33AM -0400, Robert Haas wrote:
> ExecuteTruncate needs to be refactored to use RangeVarGetRelidExtended
> with a non-NULL callback rather than heap_openrv, and
> expand_vacuum_rel needs to use RangeVarGetRelidExtended with a
> callback instead of RangeVarGetRelid.  See
> cbe24a6dd8fb224b9585f25b882d5ffdb55a0ba5 as an example of what to do.
> I fixed a large number of cases of this problem back around that time,
> but then ran out of steam and had to move onto other things before I
> got them all.  Patches welcome.

Thanks for pointing those out, I looked at both code paths recently for
some other work...  The amount of work does not consist just in using
for example RangeVarCallbackOwnsRelation for VACUUM and TRUNCATE.  There
are a couple of reasons behind that:
- While it would make sense, at least to me, to make VACUUM fall into if
allow_system_table_mods is allowed, that's not the case of ANALYZE as I
think that we should be able to call ANALYZE on a system catalog as
well.  So we would basically a new flavor of
RangeVarCallbackOwnsRelation for VACUUM which makes this difference
between vacuum and analyze with an argument in the callback, the options
of VacuumStmt would be nice.  This would not be used by autovacuum
anyway, but adding an assertion and mentioning that in the comments
would not hurt.  There is an argument for just restricting VACUUM FULL
as well and not plain VACUUM, as that's the one hurting here.
- TRUNCATE is closer to a solution, as it has its own flavor of relation
checks with truncate_check_rel.  So the callback would replace
truncate_check_rel but CheckTableNotInUse should be moved out of it.
TRUNCATE already uses allow_system_table_mods for its checks.

Thoughts?
--
Michael


signature.asc
Description: PGP signature


Re: BUG #15182: Canceling authentication due to timeout aka Denial of Service Attack

2018-07-23 Thread Jeff Janes
On Fri, Jul 20, 2018 at 5:56 PM, Marko Tiikkaja  wrote:

> On Fri, Jul 20, 2018 at 2:17 AM, Jeremy Schneider 
> wrote:
>
>> I'd like to bump this old bug that Lloyd filed for more discussion. It
>> seems serious enough to me that we should at least talk about it.
>>
>> Anyone with simply the login privilege and the ability to run SQL can
>> instantly block all new incoming connections to a DB including new
>> superuser connections.
>>
>
> So..  don't VACUUM FULL pg_authid without lock_timeout?
>

That's like saying the solution to a security hole is for no one to attempt
to exploit it.

Note that you do not need to have permissions to do the vacuum full.  This
works merely from the attempt to do so, before the permissions are checked.

Cheers,

Jeff


Re: BUG #15182: Canceling authentication due to timeout aka Denial of Service Attack

2018-07-23 Thread Robert Haas
On Thu, Jul 19, 2018 at 7:17 PM, Jeremy Schneider  wrote:
> I'd like to bump this old bug that Lloyd filed for more discussion. It
> seems serious enough to me that we should at least talk about it.
>
> Anyone with simply the login privilege and the ability to run SQL can
> instantly block all new incoming connections to a DB including new
> superuser connections.
>
> session 1:
> select pg_sleep(99) from pg_stat_activity;
>
> session 2:
> vacuum full pg_authid; -or- truncate table pg_authid;
>
> (there are likely other SQL you could run in session 2 as well.)

ExecuteTruncate needs to be refactored to use RangeVarGetRelidExtended
with a non-NULL callback rather than heap_openrv, and
expand_vacuum_rel needs to use RangeVarGetRelidExtended with a
callback instead of RangeVarGetRelid.  See
cbe24a6dd8fb224b9585f25b882d5ffdb55a0ba5 as an example of what to do.
I fixed a large number of cases of this problem back around that time,
but then ran out of steam and had to move onto other things before I
got them all.  Patches welcome.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: BUG #15182: Canceling authentication due to timeout aka Denial of Service Attack

2018-07-20 Thread Marko Tiikkaja
On Fri, Jul 20, 2018 at 2:17 AM, Jeremy Schneider 
wrote:

> I'd like to bump this old bug that Lloyd filed for more discussion. It
> seems serious enough to me that we should at least talk about it.
>
> Anyone with simply the login privilege and the ability to run SQL can
> instantly block all new incoming connections to a DB including new
> superuser connections.
>

So..  don't VACUUM FULL pg_authid without lock_timeout?

I can come up with dozens of ways to achieve the same effect, all of them
silly.


.m