Hi, many thanks for the review!
I went through your comments (a lot of which pertained to the original larger patch I took code from), attached is a reworked version 2. Other changes: 1. Ignore STATUS_EOF, this led to auth_delay being applied twice (maybe due to the gss/kerberos auth psql is trying by default? Is that legit and should this change be reverted?) - i.e. handle STATUS_OK and STATUS_ERROR explicitly. 2. Guard ShmemInitStruct() with LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE) / LWLockRelease(AddinShmemInitLock), as is done in pg_prewarm and pg_stat_statements as well. 3. Added an additional paragraph discussing the value of auth_delay.milliseconds when auth_delay.exponential_backoff is on, see below. I wonder whether maybe auth_delay.max_seconds should either be renamed to auth_delay.exponential_backoff_max_seconds (but then it is rather long) in order to make it clearer it only applies in that context or alternatively just apply to auth_delay.milliseconds as well (though that would be somewhat weird). Further comments to your comments: On Tue, Jan 16, 2024 at 12:00:49PM +0530, Abhijit Menon-Sen wrote: > At 2024-01-04 08:30:36 +0100, mba...@gmx.net wrote: > > > > +typedef struct AuthConnRecord > > +{ > > + char remote_host[NI_MAXHOST]; > > + bool used; > > + double sleep_time; /* in milliseconds */ > > +} AuthConnRecord; > > Do we really need a "used" field here? You could avoid it by setting > remote_host[0] = '\0' in cleanup_conn_record. Ok, removed. > > static void > > auth_delay_checks(Port *port, int status) > > { > > + double delay; > > I would just initialise this to auth_delay_milliseconds here, instead of > the harder-to-notice "else" below. Done. > > @@ -43,8 +69,150 @@ auth_delay_checks(Port *port, int status) > > */ > > if (status != STATUS_OK) > > { > > - pg_usleep(1000L * auth_delay_milliseconds); > > + if (auth_delay_exp_backoff) > > + { > > + /* > > + * Exponential backoff per remote host. > > + */ > > + delay = record_failed_conn_auth(port); > > + if (auth_delay_max_seconds > 0) > > + delay = Min(delay, 1000L * > > auth_delay_max_seconds); > > + } > > I would make this comment more specific, maybe something like "Delay by > 2^n seconds after each authentication failure from a particular host, > where n is the number of consecutive authentication failures". Done. > It's slightly discordant to see the new delay being returned by a > function named "record_<something>" (rather than "calculate_delay" or > similar). Maybe a name like "backoff_after_failed_auth" would be better? > Or "increase_delay_on_auth_fail"? I called it increase_delay_after_failed_conn_auth() now. > > +static double > > +record_failed_conn_auth(Port *port) > > +{ > > + AuthConnRecord *acr = NULL; > > + int j = -1; > > + > > + acr = find_conn_record(port->remote_host, &j); > > + > > + if (!acr) > > + { > > + if (j == -1) > > + > > + /* > > + * No free space, MAX_CONN_RECORDS reached. Wait as > > long as the > > + * largest delay for any remote host. > > + */ > > + return find_conn_max_delay(); > > In this extraordinary situation (where there are lots of hosts with lots > of authentication failures), why not delay by auth_delay_max_seconds > straightaway, especially when the default is only 10s? I don't see much > point in coordinating the delay between fifty known hosts and an unknown > number of others. I was a bit worried about legitimate users suffering here if (for some reason) a lot of different hosts try to guess passwords, but only once or twice or something. But I have changed it now as you suggested as that makes it simpler and I guess the problem I mentioned above is rather contrived. > > + elog(DEBUG1, "new connection: %s, index: %d", acr->remote_host, > > j); > > I think this should be removed, but if you want to leave it in, the > message should be more specific about what this is actually about, and > where the message is from, so as to not confuse debug-log readers. I left it in but mentioned auth_delay in it now. I wonder though whether this might be a useful message to have at some more standard level like INFO? > (The same applies to the other debug messages.) Those are all gone. > > +static AuthConnRecord * > > +find_conn_record(char *remote_host, int *free_index) > > +{ > > + int i; > > + > > + *free_index = -1; > > + for (i = 0; i < MAX_CONN_RECORDS; i++) > > + { > > + if (!acr_array[i].used) > > + { > > + if (*free_index == -1) > > + /* record unused element */ > > + *free_index = i; > > + continue; > > + } > > + if (strcmp(acr_array[i].remote_host, remote_host) == 0) > > + return &acr_array[i]; > > + } > > + > > + return NULL; > > +} > > It's not a big deal, but to be honest, I would much prefer to (get rid > of used, as suggested earlier, and) have separate find_acr_for_host() > and find_free_acr() functions. Done. > > +static void > > +record_conn_failure(AuthConnRecord *acr) > > +{ > > + if (acr->sleep_time == 0) > > + acr->sleep_time = (double) auth_delay_milliseconds; > > + else > > + acr->sleep_time *= 2; > > +} > > I would just roll this function into record_failed_conn_auth (or > whatever it's named), Done. > In terms of the logic, it would have been slightly clearer to store the > number of failures and calculate the delay, but it's easier to double > the sleep time that way you've written it. I think it's fine. I kept it as-is for now. > It's worth noting that there is no time-based reset of the delay with > this feature, which I think is something that people might expect to go > hand-in-hand with exponential backoff. I think that's probably OK too. You mean something like "after 5 minutes, reset the delay to 0 again"? I agree that this would probably be useful, but would also make the change more complex. > > +static void > > +auth_delay_shmem_startup(void) > > +{ > > + Size required; > > + bool found; > > + > > + if (shmem_startup_next) > > + shmem_startup_next(); > > + > > + required = sizeof(AuthConnRecord) * MAX_CONN_RECORDS; > > + acr_array = ShmemInitStruct("Array of AuthConnRecord", required, > > &found); > > + if (found) > > + /* this should not happen ? */ > > + elog(DEBUG1, "variable acr_array already exists"); > > + /* all fileds are set to 0 */ > > + memset(acr_array, 0, required); > > } > > I think you can remove the elog and just do the memset if (!found). Also > if you're changing it anyway, I'd suggest something like "total_size" > instead of "required". Done. > > + DefineCustomBoolVariable("auth_delay.exp_backoff", > > + "Exponential backoff > > for failed connections, per remote host", > > + NULL, > > + > > &auth_delay_exp_backoff, > > + false, > > + PGC_SIGHUP, > > + 0, > > + NULL, > > + NULL, > > + NULL); > > Maybe "Double the delay after each authentication failure from a > particular host". (Note: authentication failed, not connection.) Done. > I would also mildly prefer to spell out "exponential_backoff" (but leave > auth_delay_exp_backoff as-is). Done. > > + DefineCustomIntVariable("auth_delay.max_seconds", > > + "Maximum seconds to > > wait when login fails during exponential backoff", > > + NULL, > > + &auth_delay_max_seconds, > > + 10, > > + 0, INT_MAX, > > + PGC_SIGHUP, > > + GUC_UNIT_S, > > + NULL, NULL, NULL); > > + > > Maybe just "Maximum delay when exponential backoff is enabled". Done. > (Parameter indentation doesn't match the earlier block.) I noticed that as well, but pgindent keeps changing it back to this, not sure why... > I'm not able to make up my mind if I think 10s is a good default or not. > In practice, it means that after the first three consecutive failures, > we'll delay by 10s for every subsequent failure. That sounds OK. But is > is much more useful than, for example, delaying the first three failures > by auth_delay_milliseconds and then jumping straight to max_seconds? What I had in mind is that admins would lower auth_delay.milliseconds to something like 100 or 125 when exponential_backoff is on, so that the first few (possibley honest) auth failures do not get an annoying 1 seconds penalty, but later ones then wil. In that case, 10 seconds is probably ok cause you'd need more than a handful of auth failures. I added a paragraph to the documentation to this end. > I can't really imagine wanting to increase max_seconds to, say, 128 and > keep a bunch of backends sleeping while someone's trying to brute-force > a password. And with a reasonably short max_seconds, I'm not sure if > having the backoff be _exponential_ is particularly important. > > Or maybe because this is a contrib module, we don't have to think about > it to that extent? Well, not sure. I think something like 10 seconds should be fine for most brute-force attacks in practise, and it is configurable (and turned off by default). > > diff --git a/doc/src/sgml/auth-delay.sgml b/doc/src/sgml/auth-delay.sgml > > index 0571f2a99d..2ca9528011 100644 > > --- a/doc/src/sgml/auth-delay.sgml > > +++ b/doc/src/sgml/auth-delay.sgml > > @@ -16,6 +16,17 @@ > > connection slots. > > </para> > > > > + <para> > > + It is optionally possible to let <filename>auth_delay</filename> wait > > longer > > + for each successive authentication failure from a particular remote > > host, if > > + the configuration parameter <varname>auth_delay.exp_backoff</varname> is > > + active. Once an authentication succeeded from a remote host, the > > + authentication delay is reset to the value of > > + <varname>auth_delay.milliseconds</varname> for this host. The parameter > > + <varname>auth_delay.max_seconds</varname> sets an upper bound for the > > delay > > + in this case. > > + </para> > > How about something like this… > > If you enable exponential_backoff, auth_delay will double the delay > after each consecutive authentication failure from a particular > host, up to the given max_seconds (default: 10s). If the host > authenticates successfully, the delay is reset. Done, mostly. > > + <varlistentry> > > + <term> > > + <varname>auth_delay.max_seconds</varname> (<type>integer</type>) > > + <indexterm> > > + <primary><varname>auth_delay.max_seconds</varname> configuration > > parameter</primary> > > + </indexterm> > > + </term> > > + <listitem> > > + <para> > > + How many seconds to wait at most if exponential backoff is active. > > + Setting this parameter to 0 disables it. The default is 10 seconds. > > + </para> > > + </listitem> > > + </varlistentry> > > I suggest "The maximum delay, in seconds, when exponential backoff is > enabled." Done. Michael
>From e60cdca73d384fa467f8e6becdd85d9a0c530b60 Mon Sep 17 00:00:00 2001 From: Michael Banck <michael.ba...@credativ.de> Date: Wed, 27 Dec 2023 15:55:39 +0100 Subject: [PATCH v2] Add optional exponential backoff to auth_delay contrib module. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This adds two new GUCs for auth_delay, exponential_backoff and max_seconds. The former controls whether exponential backoff should be used or not, the latter sets an maximum delay (default is 10s) in case exponential backoff is active. The exponential backoff is tracked per remote host and doubled for every failed login attempt (i.e., wrong password, not just missing pg_hba line or database) and reset to auth_delay.milliseconds after a successful authentication from that host. This patch is partly based on a larger (but ultimately rejected) patch by 成之焕. Authors: Michael Banck, 成之焕 Reviewed-by: Abhijit Menon-Sen Discussion: https://postgr.es/m/ahwaxacqiwivoehs5yejpqog.1.1668569845751.hmail.zhch...@ceresdata.com --- contrib/auth_delay/auth_delay.c | 197 ++++++++++++++++++++++++++++++- doc/src/sgml/auth-delay.sgml | 48 +++++++- src/tools/pgindent/typedefs.list | 1 + 3 files changed, 243 insertions(+), 3 deletions(-) diff --git a/contrib/auth_delay/auth_delay.c b/contrib/auth_delay/auth_delay.c index ff0e1fd461..06d2f5f280 100644 --- a/contrib/auth_delay/auth_delay.c +++ b/contrib/auth_delay/auth_delay.c @@ -14,24 +14,49 @@ #include <limits.h> #include "libpq/auth.h" +#include "miscadmin.h" #include "port.h" +#include "storage/ipc.h" +#include "storage/lwlock.h" +#include "storage/shmem.h" #include "utils/guc.h" #include "utils/timestamp.h" PG_MODULE_MAGIC; +#define MAX_CONN_RECORDS 50 + /* GUC Variables */ static int auth_delay_milliseconds = 0; +static bool auth_delay_exp_backoff = false; +static int auth_delay_max_seconds = 0; /* Original Hook */ static ClientAuthentication_hook_type original_client_auth_hook = NULL; +typedef struct AuthConnRecord +{ + char remote_host[NI_MAXHOST]; + double sleep_time; /* in milliseconds */ +} AuthConnRecord; + +static shmem_startup_hook_type shmem_startup_next = NULL; +static shmem_request_hook_type shmem_request_next = NULL; +static AuthConnRecord *acr_array = NULL; + +static AuthConnRecord *find_acr_for_host(char *remote_host); +static AuthConnRecord *find_free_acr(void); +static double increase_delay_after_failed_conn_auth(Port *port); +static void cleanup_conn_record(Port *port); + /* * Check authentication */ static void auth_delay_checks(Port *port, int status) { + double delay = auth_delay_milliseconds; + /* * Any other plugins which use ClientAuthentication_hook. */ @@ -41,10 +66,146 @@ auth_delay_checks(Port *port, int status) /* * Inject a short delay if authentication failed. */ - if (status != STATUS_OK) + if (status == STATUS_ERROR) { - pg_usleep(1000L * auth_delay_milliseconds); + if (auth_delay_exp_backoff) + { + /* + * Delay by 2^n seconds after each authentication failure from a + * particular host, where n is the number of consecutive + * authentication failures. + */ + delay = increase_delay_after_failed_conn_auth(port); + + /* + * Clamp delay to a maximum of auth_delay_max_seconds. + */ + if (auth_delay_max_seconds > 0) { + delay = Min(delay, 1000L * auth_delay_max_seconds); + } + } + + if (delay > 0) + { + elog(DEBUG1, "Authentication delayed for %g seconds due to auth_delay", delay / 1000.0); + pg_usleep(1000L * (long) delay); + } } + + /* + * Remove host-specific delay if authentication succeeded. + */ + if (status == STATUS_OK) + cleanup_conn_record(port); +} + +static double +increase_delay_after_failed_conn_auth(Port *port) +{ + AuthConnRecord *acr = NULL; + + acr = find_acr_for_host(port->remote_host); + + if (!acr) + { + acr = find_free_acr(); + + if (!acr) + { + /* + * No free space, MAX_CONN_RECORDS reached. Wait for the + * configured maximum amount. + */ + return 1000L * auth_delay_max_seconds; + } + strcpy(acr->remote_host, port->remote_host); + } + if (acr->sleep_time == 0) + acr->sleep_time = (double) auth_delay_milliseconds; + else + acr->sleep_time *= 2; + + return acr->sleep_time; +} + +static AuthConnRecord * +find_acr_for_host(char *remote_host) +{ + int i; + + for (i = 0; i < MAX_CONN_RECORDS; i++) + { + if (strcmp(acr_array[i].remote_host, remote_host) == 0) + return &acr_array[i]; + } + + return NULL; +} + +static AuthConnRecord * +find_free_acr(void) +{ + int i; + + for (i = 0; i < MAX_CONN_RECORDS; i++) + { + if (!acr_array[i].remote_host[0]) + return &acr_array[i]; + } + + return 0; +} + +static void +cleanup_conn_record(Port *port) +{ + AuthConnRecord *acr = NULL; + + acr = find_acr_for_host(port->remote_host); + if (acr == NULL) + return; + + port->remote_host[0] = '\0'; + + acr->sleep_time = 0.0; +} + +/* + * Set up shared memory + */ + +static void +auth_delay_shmem_request(void) +{ + Size shm_size; + + if (shmem_request_next) + shmem_request_next(); + + shm_size = sizeof(AuthConnRecord) * MAX_CONN_RECORDS; + shm_size += sizeof(int); + RequestAddinShmemSpace(shm_size); +} + +static void +auth_delay_shmem_startup(void) +{ + bool found; + Size shm_size; + + if (shmem_startup_next) + shmem_startup_next(); + + shm_size = sizeof(AuthConnRecord) * MAX_CONN_RECORDS; + + LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE); + acr_array = ShmemInitStruct("Array of AuthConnRecord", shm_size, &found); + if (!found) + { + /* First time through ... */ + memset(acr_array, 0, shm_size); + } + LWLockRelease(AddinShmemInitLock); } /* @@ -53,6 +214,11 @@ auth_delay_checks(Port *port, int status) void _PG_init(void) { + if (!process_shared_preload_libraries_in_progress) + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("auth_delay must be loaded via shared_preload_libraries"))); + /* Define custom GUC variables */ DefineCustomIntVariable("auth_delay.milliseconds", "Milliseconds to delay before reporting authentication failure", @@ -66,9 +232,36 @@ _PG_init(void) NULL, NULL); + DefineCustomBoolVariable("auth_delay.exponential_backoff", + "Double the delay after each authentication failure from a particular host", + NULL, + &auth_delay_exp_backoff, + false, + PGC_SIGHUP, + 0, + NULL, + NULL, + NULL); + + DefineCustomIntVariable("auth_delay.max_seconds", + "Maximum delay when exponential backoff is enabled", + NULL, + &auth_delay_max_seconds, + 10, + 0, INT_MAX, + PGC_SIGHUP, + GUC_UNIT_S, + NULL, NULL, NULL); + MarkGUCPrefixReserved("auth_delay"); /* Install Hooks */ original_client_auth_hook = ClientAuthentication_hook; ClientAuthentication_hook = auth_delay_checks; + + /* Set up shared memory */ + shmem_request_next = shmem_request_hook; + shmem_request_hook = auth_delay_shmem_request; + shmem_startup_next = shmem_startup_hook; + shmem_startup_hook = auth_delay_shmem_startup; } diff --git a/doc/src/sgml/auth-delay.sgml b/doc/src/sgml/auth-delay.sgml index 0571f2a99d..89585ea3c8 100644 --- a/doc/src/sgml/auth-delay.sgml +++ b/doc/src/sgml/auth-delay.sgml @@ -16,6 +16,22 @@ connection slots. </para> + <para> + It is optionally possible to let <filename>auth_delay</filename> wait longer + on each successive authentication failure if the configuration parameter + <varname>auth_delay.exponential_backoff</varname> is active. If enabled, + <filename>auth_delay</filename> will double the delay after each consecutive + authentication failure from a particular host, up to the given + <varname>auth_delay.max_seconds</varname> (default: 10s). If the host + authenticates successfully, the delay is reset. + </para> + + <para> + In this case, it might be desirable to decrease the configuration parameter + <varname>auth_delay.milliseconds</varname> from the default of 1 second in + order to not delay as long for single accidental authentication failures. + </para> + <para> In order to function, this module must be loaded via <xref linkend="guc-shared-preload-libraries"/> in <filename>postgresql.conf</filename>. @@ -39,6 +55,34 @@ </para> </listitem> </varlistentry> + <varlistentry> + <term> + <varname>auth_delay.exp_backoff</varname> (<type>bool</type>) + <indexterm> + <primary><varname>auth_delay.exp_backoff</varname> configuration parameter</primary> + </indexterm> + </term> + <listitem> + <para> + Whether to use exponential backoff per remote host on authentication + failure. The default is off. + </para> + </listitem> + </varlistentry> + <varlistentry> + <term> + <varname>auth_delay.max_seconds</varname> (<type>integer</type>) + <indexterm> + <primary><varname>auth_delay.max_seconds</varname> configuration parameter</primary> + </indexterm> + </term> + <listitem> + <para> + The maximum delay, in seconds, when exponential backoff is + enabled. The default is 10 seconds. + </para> + </listitem> + </varlistentry> </variablelist> <para> @@ -50,7 +94,9 @@ # postgresql.conf shared_preload_libraries = 'auth_delay' -auth_delay.milliseconds = '500' +auth_delay.milliseconds = '125' +auth_delay.exp_backoff = 'on' +auth_delay.max_seconds = '20' </programlisting> </sect2> diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list index 29fd1cae64..ee30969f0c 100644 --- a/src/tools/pgindent/typedefs.list +++ b/src/tools/pgindent/typedefs.list @@ -165,6 +165,7 @@ AttrMap AttrMissing AttrNumber AttributeOpts +AuthConnRecord AuthRequest AuthToken AutoPrewarmSharedState -- 2.39.2