Hi,
On 10/21/22 2:58 AM, Michael Paquier wrote:
On Wed, Oct 19, 2022 at 10:45:44AM +0200, Drouvot, Bertrand wrote:
Please find attached v1-0001-regex-handling-for-db-and-roles-in-hba.patch to
implement regexes for databases and roles in hba.
It does also contain new regexes related TAP tests and doc updates.
Thanks for the updated version. This is really easy to look at now.
It relies on the refactoring made in fc579e11c6 (but changes the
regcomp_auth_token() parameters so that it is now responsible for emitting
the compilation error message (if any), to avoid code duplication in
parse_hba_line() and parse_ident_line() for roles, databases and user name
mapping).
@@ -652,13 +670,18 @@ check_role(const char *role, Oid roleid, List *tokens)
[...]
- if (!tok->quoted && tok->string[0] == '+')
+ if (!token_has_regexp(tok))
{
Hmm. Do we need token_has_regexp() here for all the cases? We know
that the string can begin with a '+', hence it is no regex. The same
applies for the case of "all". The remaining case is the one where
the user name matches exactly the AuthToken string, which should be
last as we want to treat anything beginning with a '/' as a regex. It
seems like we could do an order like that? Say:
if (!tok->quoted && tok->string[0] == '+')
//do
else if (token_is_keyword(tok, "all"))
//do
else if (token_has_regexp(tok))
//do regex compilation, handling failures
else if (token_matches(tok, role))
//do exact match
The same approach with keywords first, regex, and exact match could be
applied as well for the databases? Perhaps it is just mainly a matter
of taste,
Yeah, I think it is.
and it depends on how much you want to prioritize the place
of the regex over the rest but that could make the code easier to
understand in the long-run and this is a very sensitive code area,
And agree that your proposal tastes better ;-): it is easier to
understand, v2 attached has been done that way.
Compiling the expressions for the user and database lists one-by-one
in parse_hba_line() as you do is correct. However there is a gotcha
that you are forgetting here: the memory allocations done for the
regexp compilations are not linked to the memory context where each
line is processed (named hbacxt in load_hba()) and need a separate
cleanup.
Oops, right, thanks for the call out!
In the same fashion as load_ident(), it seems to me that we
need two extra things for this patch:
- if !ok (see where we do MemoryContextDelete(hbacxt)), we need to go
through new_parsed_lines and free for each line the AuthTokens for the
database and user lists.
- if ok and new_parsed_lines != NIL, the same cleanup needs to
happen.
Right, but I think that should be "parsed_hba_lines != NIL".
My guess is that you could do both the same way as load_ident() does,
keeping some symmetry between the two code paths.
Right. To avoid code duplication in the !ok/ok cases, the function
free_hba_line() has been added in v2: it goes through the list of
databases and roles tokens and call free_auth_token() for each of them.
Unifying both into
a common routine would be sort of annoying as HbaLines uses lists
within the lists of parsed lines, and IdentLine can have one at most
in each line.
I agree, and v2 is not attempting to unify them.
For now, I have made your last patch a bit shorter by applying the
refactoring of regcomp_auth_token() separately with a few tweaks to
the comments.
Thanks! v2 attached does apply on top of that.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
diff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml
index c6f1b70fd3..30753003ba 100644
--- a/doc/src/sgml/client-auth.sgml
+++ b/doc/src/sgml/client-auth.sgml
@@ -234,10 +234,13 @@ hostnogssenc <replaceable>database</replaceable>
<replaceable>user</replaceabl
replication connections do not specify any particular database whereas
logical replication connections do specify it.
Otherwise, this is the name of
- a specific <productname>PostgreSQL</productname> database.
- Multiple database names can be supplied by separating them with
- commas. A separate file containing database names can be specified by
- preceding the file name with <literal>@</literal>.
+ a specific <productname>PostgreSQL</productname> database or a regular
+ expression preceded by <literal>/</literal>.
+ Multiple database names and/or regular expressions preceded by
<literal>/</literal>
+ can be supplied by separating them with commas.
+ A separate file containing database names and/or regular expressions
preceded
+ by <literal>/</literal> can be specified by preceding the file name
+ with <literal>@</literal>.
</para>
</listitem>
</varlistentry>
@@ -249,18 +252,20 @@ hostnogssenc <replaceable>database</replaceable>
<replaceable>user</replaceabl
Specifies which database user name(s) this record
matches. The value <literal>all</literal> specifies that it
matches all users. Otherwise, this is either the name of a specific
- database user, or a group name preceded by <literal>+</literal>.
+ database user, a regular expression preceded by <literal>/</literal>,
+ or a group name preceded by <literal>+</literal>.
(Recall that there is no real distinction between users and groups
in <productname>PostgreSQL</productname>; a <literal>+</literal> mark
really means
<quote>match any of the roles that are directly or indirectly members
of this role</quote>, while a name without a <literal>+</literal> mark
matches
- only that specific role.) For this purpose, a superuser is only
+ only that specific role or regular expression.) For this purpose, a
superuser is only
considered to be a member of a role if they are explicitly a member
of the role, directly or indirectly, and not just by virtue of
being a superuser.
- Multiple user names can be supplied by separating them with commas.
- A separate file containing user names can be specified by preceding the
- file name with <literal>@</literal>.
+ Multiple user names and/or regular expressions preceded by
<literal>/</literal>
+ can be supplied by separating them with commas. A separate file
containing
+ user names and/or regular expressions preceded by <literal>/</literal>
+ can be specified by preceding the file name with <literal>@</literal>.
</para>
</listitem>
</varlistentry>
@@ -739,6 +744,18 @@ host all all ::1/128
trust
# TYPE DATABASE USER ADDRESS METHOD
host all all localhost trust
+# The same using regular expression for DATABASE, which allows connection to
the
+# db1 and testdb databases and any database with a name ending with "test".
+#
+# TYPE DATABASE USER ADDRESS METHOD
+local db1,/^.*test$,testdb all localhost trust
+
+# The same using regular expression for USER, which allows connection to the
+# user1 and testuser users and any user with a name ending with "test".
+#
+# TYPE DATABASE USER ADDRESS
METHOD
+local db1,/^.*test$,testdb user1,/^.*test$,testuser localhost
trust
+
# Allow any user from any host with IP address 192.168.93.x to connect
# to database "postgres" as the same user name that ident reports for
# the connection (typically the operating system user name).
@@ -785,15 +802,17 @@ host all all 192.168.12.10/32
gss
# TYPE DATABASE USER ADDRESS METHOD
host all all 192.168.0.0/16 ident
map=omicron
-# If these are the only three lines for local connections, they will
+# If these are the only four lines for local connections, they will
# allow local users to connect only to their own databases (databases
-# with the same name as their database user name) except for administrators
-# and members of role "support", who can connect to all databases. The file
-# $PGDATA/admins contains a list of names of administrators. Passwords
+# with the same name as their database user name) except for users ending
+# with "helpdesk", administrators and members of role "support",
+# who can connect to all databases.
+# The file $PGDATA/admins contains a list of names of administrators.
Passwords
# are required in all cases.
#
# TYPE DATABASE USER ADDRESS METHOD
local sameuser all md5
+local all /^.*helpdesk$ md5
local all @admins md5
local all +support md5
diff --git a/src/backend/libpq/hba.c b/src/backend/libpq/hba.c
index f3539a7929..0e2185d35e 100644
--- a/src/backend/libpq/hba.c
+++ b/src/backend/libpq/hba.c
@@ -293,6 +293,31 @@ free_auth_token(AuthToken *token)
pg_regfree(token->regex);
}
+/*
+ * Free a HbaLine, that may include databases and/or roles AuthToken with a
+ * regular expression that needs to be cleaned up explicitly.
+ */
+static void
+free_hba_line(HbaLine *line)
+{
+ ListCell *cell;
+ AuthToken *tok;
+
+ /* Clean up roles regexes */
+ foreach(cell, line->roles)
+ {
+ tok = lfirst(cell);
+ free_auth_token(tok);
+ }
+
+ /* Clean up databases regexes */
+ foreach(cell, line->databases)
+ {
+ tok = lfirst(cell);
+ free_auth_token(tok);
+ }
+}
+
/*
* Copy a AuthToken struct into freshly palloc'd memory.
*/
@@ -676,8 +701,14 @@ check_role(const char *role, Oid roleid, List *tokens)
if (is_member(roleid, tok->string + 1))
return true;
}
- else if (token_matches(tok, role) ||
- token_is_keyword(tok, "all"))
+ else if (token_is_keyword(tok, "all"))
+ return true;
+ else if (token_has_regexp(tok))
+ {
+ if (regexec_auth_token(role, tok, 0, NULL) == REG_OKAY)
+ return true;
+ }
+ else if (token_matches(tok, role))
return true;
}
return false;
@@ -719,6 +750,11 @@ check_db(const char *dbname, const char *role, Oid roleid,
List *tokens)
}
else if (token_is_keyword(tok, "replication"))
continue; /* never match this if
not walsender */
+ else if (token_has_regexp(tok))
+ {
+ if (regexec_auth_token(dbname, tok, 0, NULL) ==
REG_OKAY)
+ return true;
+ }
else if (token_matches(tok, dbname))
return true;
}
@@ -1138,8 +1174,15 @@ parse_hba_line(TokenizedAuthLine *tok_line, int elevel)
tokens = lfirst(field);
foreach(tokencell, tokens)
{
- parsedline->databases = lappend(parsedline->databases,
-
copy_auth_token(lfirst(tokencell)));
+ AuthToken *tok = copy_auth_token(lfirst(tokencell));
+
+ /*
+ * Compile a regex from the database token, if necessary.
+ */
+ if (regcomp_auth_token(tok, HbaFileName, line_num, err_msg,
elevel))
+ return NULL;
+
+ parsedline->databases = lappend(parsedline->databases, tok);
}
/* Get the roles. */
@@ -1158,8 +1201,15 @@ parse_hba_line(TokenizedAuthLine *tok_line, int elevel)
tokens = lfirst(field);
foreach(tokencell, tokens)
{
- parsedline->roles = lappend(parsedline->roles,
-
copy_auth_token(lfirst(tokencell)));
+ AuthToken *tok = copy_auth_token(lfirst(tokencell));
+
+ /*
+ * Compile a regex from the role token, if necessary.
+ */
+ if (regcomp_auth_token(tok, HbaFileName, line_num, err_msg,
elevel))
+ return NULL;
+
+ parsedline->roles = lappend(parsedline->roles, tok);
}
if (parsedline->conntype != ctLocal)
@@ -2355,12 +2405,31 @@ load_hba(void)
if (!ok)
{
- /* File contained one or more errors, so bail out */
+ /*
+ * File contained one or more errors, so bail out, first being
careful
+ * to clean up whatever we allocated. Most stuff will go away
via
+ * MemoryContextDelete, but we have to clean up regexes
explicitly.
+ */
+ foreach(line, new_parsed_lines)
+ {
+ HbaLine *newline = (HbaLine *) lfirst(line);
+
+ free_hba_line(newline);
+ }
MemoryContextDelete(hbacxt);
return false;
}
/* Loaded new file successfully, replace the one we use */
+ if (parsed_hba_lines != NIL)
+ {
+ foreach(line, parsed_hba_lines)
+ {
+ HbaLine *newline = (HbaLine *) lfirst(line);
+
+ free_hba_line(newline);
+ }
+ }
if (parsed_hba_context != NULL)
MemoryContextDelete(parsed_hba_context);
parsed_hba_context = hbacxt;
diff --git a/src/test/authentication/t/001_password.pl
b/src/test/authentication/t/001_password.pl
index ea664d18f5..acb8bfaac8 100644
--- a/src/test/authentication/t/001_password.pl
+++ b/src/test/authentication/t/001_password.pl
@@ -81,6 +81,14 @@ $node->safe_psql(
GRANT ALL ON sysuser_data TO md5_role;");
$ENV{"PGPASSWORD"} = 'pass';
+# Create a role that contains a comma to stress the parsing.
+$node->safe_psql('postgres',
+ q{SET password_encryption='md5'; CREATE ROLE "md5,role" LOGIN PASSWORD
'pass';}
+);
+
+# Create a database to test regular expression.
+$node->safe_psql('postgres', "CREATE database regex_testdb;");
+
# For "trust" method, all users should be able to connect. These users are not
# considered to be authenticated.
reset_pg_hba($node, 'all', 'all', 'trust');
@@ -200,6 +208,39 @@ append_to_file(
test_conn($node, 'user=md5_role', 'password from pgpass', 0);
+# Testing with regular expression for username. Note that the third regex
+# matches in this case.
+reset_pg_hba($node, 'all', '/^.*nomatch.*$, baduser, /^md.*$', 'password');
+test_conn($node, 'user=md5_role', 'password, matching regexp for username',
+ 0);
+
+# The third regex does not match anymore.
+reset_pg_hba($node, 'all', '/^.*nomatch.*$, baduser, /^m_d.*$', 'password');
+test_conn($node, 'user=md5_role',
+ 'password, non matching regexp for username',
+ 2, log_unlike => [qr/connection authenticated:/]);
+
+# Test with a comma in the regular expression
+reset_pg_hba($node, 'all', '"/^.*5,.*e$"', 'password');
+test_conn($node, 'user=md5,role', 'password', 'matching regexp for username',
+ 0);
+
+# Testing with regular expression for dbname. The third regex matches.
+reset_pg_hba($node, '/^.*nomatch.*$, baddb, /^regex_t.*b$', 'all',
+ 'password');
+test_conn(
+ $node, 'user=md5_role dbname=regex_testdb', 'password,
+ matching regexp for dbname', 0);
+
+# The third regex does not match anymore.
+reset_pg_hba($node, '/^.*nomatch.*$, baddb, /^regex_t.*ba$',
+ 'all', 'password');
+test_conn(
+ $node,
+ 'user=md5_role dbname=regex_testdb',
+ 'password, non matching regexp for dbname',
+ 2, log_unlike => [qr/connection authenticated:/]);
+
unlink($pgpassfile);
delete $ENV{"PGPASSFILE"};