Hi all,

Here's a v4, with a separate wait event for each location. (I could
use some eyes on the specific phrasing I've chosen for each one.)

On Sun, Sep 1, 2024 at 5:10 PM Michael Paquier <mich...@paquier.xyz> wrote:
> Could it be more transparent to use a "startup" or
> "starting"" state instead that gets also used by pgstat_bestart() in
> the case of the patch where !pre_auth?

Done. (I've used "starting".)

On Mon, Sep 9, 2024 at 10:30 PM Michael Paquier <mich...@paquier.xyz> wrote:
> A new category would be more adapted.  IPC is not adapted because are
> not waiting for another server process.  Perhaps just use a new
> "Authentication" class, as in "The server is waiting for an
> authentication operation to complete"?

Added a new "Auth" class (to cover both authn and authz during
startup), plus documentation.

On Wed, Sep 11, 2024 at 4:42 PM Michael Paquier <mich...@paquier.xyz> wrote:
> Setting up twice the structure as the patch proposes is kind of
> a weird concept, but it feels to me that we should split that and set
> the fields in the pre-auth step and ignore the irrelevant ones, then
> complete the rest in a second step.

The more I look at this, the more uneasy I feel about the goal. Best I
can tell, the pre-auth step can't ignore irrelevant fields, because
they may contain junk from the previous owner of the shared memory. So
if we want to optimize, we can only change the second step to skip
fields that were already filled in by the pre-auth step.

That has its own problems: not every backend type uses the pre-auth
step in the current patch. Which means a bunch of backends that don't
benefit from the two-step initialization nevertheless have to either
do two PGSTAT_BEGIN_WRITE_ACTIVITY() dances in a row, or else we
duplicate a bunch of the logic to make sure they maintain the same
efficient code path as before.

Finally, if we're okay with all of that, future maintainers need to be
careful about which fields get copied in the first (preauth) step, the
second step, or both. GSS, for example, can be set up during transport
negotiation (first step) or authentication (second step), so we have
to duplicate the logic there. SSL is currently first-step-only, I
think -- but are we sure we want to hardcode the assumption that cert
auth can't change any of those parameters after the transport has been
established? (I've been brainstorming ways we might use TLS 1.3's
post-handshake CertificateRequest, for example.)

So before I commit to this path, I just want to double-check that all
of the above sounds good and non-controversial. :)

--

In the meantime, is anyone willing and able to commit 0001 and/or 0002?

Thanks!
--Jacob
1:  8f9315949e = 1:  64289b97e5 BackgroundPsql: handle empty query results
2:  7f404f5ee8 ! 2:  18a9531a25 Test::Cluster: let background_psql() work 
asynchronously
    @@ src/test/perl/PostgreSQL/Test/Cluster.pm: connection.
      
      =cut
     @@ src/test/perl/PostgreSQL/Test/Cluster.pm: sub background_psql
    -           '-');
    +           '-XAtq', '-d', $psql_connstr, '-f', '-');
      
        $params{on_error_stop} = 1 unless defined $params{on_error_stop};
     +  $params{wait} = 1 unless defined $params{wait};
3:  a7a7551d09 ! 3:  c8071f91d8 pgstat: report in earlier with 
STATE_AUTHENTICATING
    @@ Metadata
     Author: Jacob Champion <jacob.champ...@enterprisedb.com>
     
      ## Commit message ##
    -    pgstat: report in earlier with STATE_AUTHENTICATING
    +    pgstat: report in earlier with STATE_STARTING
     
    -    Add pgstat_bestart_pre_auth(), which reports an 'authenticating' state
    -    while waiting for backend initialization and client authentication to
    +    Add pgstat_bestart_pre_auth(), which reports a 'starting' state while
    +    waiting for backend initialization and client authentication to
         complete. Since we hold a transaction open for a good amount of that,
         and some authentication methods call out to external systems, having a
         pg_stat_activity entry helps DBAs debug when things go badly wrong.
     
    + ## doc/src/sgml/monitoring.sgml ##
    +@@ doc/src/sgml/monitoring.sgml: postgres   27093  0.0  0.0  30096  2752 ? 
       Ss   11:34   0:00 postgres: ser
    +        Current overall state of this backend.
    +        Possible values are:
    +        <itemizedlist>
    ++        <listitem>
    ++         <para>
    ++          <literal>starting</literal>: The backend is in initial startup. 
Client
    ++          authentication is performed during this phase.
    ++         </para>
    ++        </listitem>
    +         <listitem>
    +         <para>
    +           <literal>active</literal>: The backend is executing a query.
    +
      ## src/backend/utils/activity/backend_status.c ##
     @@ src/backend/utils/activity/backend_status.c: static int 
localNumBackends = 0;
      static MemoryContext backendStatusSnapContext;
    @@ src/backend/utils/activity/backend_status.c: pgstat_bestart(void)
      #endif
      
     -  lbeentry.st_state = STATE_UNDEFINED;
    -+  lbeentry.st_state = pre_auth ? STATE_AUTHENTICATING : STATE_UNDEFINED;
    ++  lbeentry.st_state = pre_auth ? STATE_STARTING : STATE_UNDEFINED;
        lbeentry.st_progress_command = PROGRESS_COMMAND_INVALID;
        lbeentry.st_progress_command_target = InvalidOid;
        lbeentry.st_query_id = UINT64CONST(0);
    @@ src/backend/utils/adt/pgstatfuncs.c: 
pg_stat_get_activity(PG_FUNCTION_ARGS)
      
                        switch (beentry->st_state)
                        {
    -+                          case STATE_AUTHENTICATING:
    -+                                  values[4] = 
CStringGetTextDatum("authenticating");
    ++                          case STATE_STARTING:
    ++                                  values[4] = 
CStringGetTextDatum("starting");
     +                                  break;
                                case STATE_IDLE:
                                        values[4] = CStringGetTextDatum("idle");
    @@ src/backend/utils/init/postinit.c: InitPostgres(const char *in_dbname, 
Oid dboid
     +  pgstat_beinit();
     +
     +  /*
    -+   * This is a convenient time to sketch in a partial pgstat entry. That 
way,
    -+   * if LWLocks or third-party authentication should happen to hang, the 
DBA
    -+   * will still be able to see what's going on. (A later call to
    ++   * This is a convenient time to sketch in a partial pgstat entry. That
    ++   * way, if LWLocks or third-party authentication should happen to hang,
    ++   * the DBA will still be able to see what's going on. (A later call to
     +   * pgstat_bestart() will fill in the rest of the status.)
     +   */
     +  if (!bootstrap)
    @@ src/include/utils/backend_status.h
      typedef enum BackendState
      {
        STATE_UNDEFINED,
    -+  STATE_AUTHENTICATING,
    ++  STATE_STARTING,
        STATE_IDLE,
        STATE_RUNNING,
        STATE_IDLEINTRANSACTION,
    @@ src/test/authentication/t/007_injection_points.pl (new)
     +my $psql = $node->background_psql('postgres');
     +$psql->query_safe("SELECT injection_points_attach('init-pre-auth', 
'wait')");
     +
    -+# From this point on, all new connections will hang in authentication. 
Use the
    -+# $psql connection handle for server interaction.
    ++# From this point on, all new connections will hang during startup, just 
before
    ++# authentication. Use the $psql connection handle for server interaction.
     +my $conn = $node->background_psql('postgres', wait => 0);
     +
     +# Wait for the connection to show up.
    @@ src/test/authentication/t/007_injection_points.pl (new)
     +while (1)
     +{
     +  $pid = $psql->query(
    -+          "SELECT pid FROM pg_stat_activity WHERE state = 
'authenticating';");
    ++          "SELECT pid FROM pg_stat_activity WHERE state = 'starting';");
     +  last if $pid ne "";
     +
     +  usleep(500_000);
    @@ src/test/authentication/t/007_injection_points.pl (new)
     +# Make sure the pgstat entry is updated eventually.
     +while (1)
     +{
    -+  my $state = $psql->query(
    -+          "SELECT state FROM pg_stat_activity WHERE pid = $pid;");
    ++  my $state =
    ++    $psql->query("SELECT state FROM pg_stat_activity WHERE pid = $pid;");
     +  last if $state eq "idle";
     +
     +  note "state for backend $pid is '$state'; waiting for 'idle'...";
4:  5c85c3e0e9 ! 4:  d14b97cb77 WIP: report external auth calls as wait events
    @@ Metadata
     Author: Jacob Champion <jacob.champ...@enterprisedb.com>
     
      ## Commit message ##
    -    WIP: report external auth calls as wait events
    +    Report external auth calls as wait events
     
    -    Introduce new WAIT_EVENT_AUTHN_* types for various external
    -    authentication systems, to make it obvious what's going wrong if one of
    -    those systems hangs.
    +    Introduce a new "Auth" wait class for various external authentication
    +    systems, to make it obvious what's going wrong if one of those systems
    +    hangs. Each new wait event is unique in order to more easily pinpoint
    +    problematic locations in the code.
     
    -    TODO:
    -    - don't abuse the IPC wait event group like this
    -    - test
    +    Discussion: 
https://postgr.es/m/CAOYmi%2B%3D60deN20WDyCoHCiecgivJxr%3D98s7s7-C8SkXwrCfHXg%40mail.gmail.com
    +
    + ## doc/src/sgml/monitoring.sgml ##
    +@@ doc/src/sgml/monitoring.sgml: postgres   27093  0.0  0.0  30096  2752 ? 
       Ss   11:34   0:00 postgres: ser
    +        see <xref linkend="wait-event-activity-table"/>.
    +       </entry>
    +      </row>
    ++     <row>
    ++      <entry><literal>Auth</literal></entry>
    ++      <entry>The server process is waiting for an external system to
    ++       authenticate and/or authorize the client connection.
    ++       <literal>wait_event</literal> will identify the specific wait 
point;
    ++       see <xref linkend="wait-event-auth-table"/>.
    ++      </entry>
    ++     </row>
    +      <row>
    +       <entry><literal>BufferPin</literal></entry>
    +       <entry>The server process is waiting for exclusive access to
     
      ## src/backend/libpq/auth.c ##
     @@
    @@ src/backend/libpq/auth.c: pg_GSS_recvauth(Port *port)
                elog(DEBUG4, "processing received GSS token of length %u",
                         (unsigned int) gbuf.length);
      
    -+          pgstat_report_wait_start(WAIT_EVENT_AUTHN_GSSAPI);
    ++          pgstat_report_wait_start(WAIT_EVENT_GSSAPI_ACCEPT_SEC_CONTEXT);
                maj_stat = gss_accept_sec_context(&min_stat,
                                                                                
  &port->gss->ctx,
                                                                                
  port->gss->cred,
    @@ src/backend/libpq/auth.c: pg_SSPI_recvauth(Port *port)
        /*
         * Acquire a handle to the server credentials.
         */
    -+  pgstat_report_wait_start(WAIT_EVENT_AUTHN_SSPI);
    ++  pgstat_report_wait_start(WAIT_EVENT_SSPI_ACQUIRE_CREDENTIALS_HANDLE);
        r = AcquireCredentialsHandle(NULL,
                                                                 "negotiate",
                                                                 
SECPKG_CRED_INBOUND,
    @@ src/backend/libpq/auth.c: pg_SSPI_recvauth(Port *port)
                elog(DEBUG4, "processing received SSPI token of length %u",
                         (unsigned int) buf.len);
      
    -+          pgstat_report_wait_start(WAIT_EVENT_AUTHN_SSPI);
    ++          
pgstat_report_wait_start(WAIT_EVENT_SSPI_ACCEPT_SECURITY_CONTEXT);
                r = AcceptSecurityContext(&sspicred,
                                                                  sspictx,
                                                                  &inbuf,
    @@ src/backend/libpq/auth.c: pg_SSPI_recvauth(Port *port)
      
        CloseHandle(token);
      
    -+  pgstat_report_wait_start(WAIT_EVENT_AUTHN_SSPI);
    ++  pgstat_report_wait_start(WAIT_EVENT_SSPI_LOOKUP_ACCOUNT_SID);
        if (!LookupAccountSid(NULL, tokenuser->User.Sid, accountname, 
&accountnamesize,
                                                  domainname, &domainnamesize, 
&accountnameuse))
                ereport(ERROR,
    @@ src/backend/libpq/auth.c: pg_SSPI_recvauth(Port *port)
     -                                                                          
          port->hba->upn_username);
     +          int                     status;
     +
    -+          pgstat_report_wait_start(WAIT_EVENT_AUTHN_SSPI);
    ++          pgstat_report_wait_start(WAIT_EVENT_SSPI_MAKE_UPN);
     +          status = pg_SSPI_make_upn(accountname, sizeof(accountname),
     +                                                            domainname, 
sizeof(domainname),
     +                                                            
port->hba->upn_username);
    @@ src/backend/libpq/auth.c: CheckPAMAuth(Port *port, const char *user, 
const char
                return STATUS_ERROR;
        }
      
    -+  pgstat_report_wait_start(WAIT_EVENT_AUTHN_PAM);
    ++  pgstat_report_wait_start(WAIT_EVENT_PAM_AUTHENTICATE);
        retval = pam_authenticate(pamh, 0);
     +  pgstat_report_wait_end();
      
    @@ src/backend/libpq/auth.c: CheckPAMAuth(Port *port, const char *user, 
const char
                return pam_no_password ? STATUS_EOF : STATUS_ERROR;
        }
      
    -+  pgstat_report_wait_start(WAIT_EVENT_AUTHN_PAM);
    ++  pgstat_report_wait_start(WAIT_EVENT_PAM_ACCT_MGMT);
        retval = pam_acct_mgmt(pamh, 0);
     +  pgstat_report_wait_end();
      
    @@ src/backend/libpq/auth.c: CheckLDAPAuth(Port *port)
                return STATUS_EOF;              /* client wouldn't send 
password */
      
     -  if (InitializeLDAPConnection(port, &ldap) == STATUS_ERROR)
    -+  pgstat_report_wait_start(WAIT_EVENT_AUTHN_LDAP);
    ++  pgstat_report_wait_start(WAIT_EVENT_LDAP_INITIALIZE);
     +  r = InitializeLDAPConnection(port, &ldap);
     +  pgstat_report_wait_end();
     +
    @@ src/backend/libpq/auth.c: CheckLDAPAuth(Port *port)
                 * Bind with a pre-defined username/password (if available) for
                 * searching. If none is specified, this turns into an 
anonymous bind.
                 */
    -+          pgstat_report_wait_start(WAIT_EVENT_AUTHN_LDAP);
    ++          pgstat_report_wait_start(WAIT_EVENT_LDAP_BIND_FOR_SEARCH);
                r = ldap_simple_bind_s(ldap,
                                                           
port->hba->ldapbinddn ? port->hba->ldapbinddn : "",
                                                           
port->hba->ldapbindpasswd ? ldap_password_hook(port->hba->ldapbindpasswd) : "");
    @@ src/backend/libpq/auth.c: CheckLDAPAuth(Port *port)
      
                search_message = NULL;
     +
    -+          pgstat_report_wait_start(WAIT_EVENT_AUTHN_LDAP);
    ++          pgstat_report_wait_start(WAIT_EVENT_LDAP_SEARCH);
                r = ldap_search_s(ldap,
                                                  port->hba->ldapbasedn,
                                                  port->hba->ldapscope,
    @@ src/backend/libpq/auth.c: CheckLDAPAuth(Port *port)
                                                        port->user_name,
                                                        port->hba->ldapsuffix ? 
port->hba->ldapsuffix : "");
      
    -+  pgstat_report_wait_start(WAIT_EVENT_AUTHN_LDAP);
    ++  pgstat_report_wait_start(WAIT_EVENT_LDAP_BIND);
        r = ldap_simple_bind_s(ldap, fulluser, passwd);
     +  pgstat_report_wait_end();
      
    @@ src/backend/libpq/auth.c: CheckRADIUSAuth(Port *port)
     -                                                                          
                   passwd);
     +          int                     ret;
     +
    -+          pgstat_report_wait_start(WAIT_EVENT_AUTHN_RADIUS);
    ++          pgstat_report_wait_start(WAIT_EVENT_RADIUS_TRANSACTION);
     +          ret = PerformRadiusTransaction(lfirst(server),
     +                                                                     
lfirst(secrets),
     +                                                                     
radiusports ? lfirst(radiusports) : NULL,
    @@ src/backend/libpq/auth.c: CheckRADIUSAuth(Port *port)
                /*------
                 * STATUS_OK = Login OK
     
    + ## src/backend/utils/activity/wait_event.c ##
    +@@ src/backend/utils/activity/wait_event.c: static const char 
*pgstat_get_wait_client(WaitEventClient w);
    + static const char *pgstat_get_wait_ipc(WaitEventIPC w);
    + static const char *pgstat_get_wait_timeout(WaitEventTimeout w);
    + static const char *pgstat_get_wait_io(WaitEventIO w);
    ++static const char *pgstat_get_wait_auth(WaitEventAuth w);
    + 
    + 
    + static uint32 local_my_wait_event_info;
    +@@ src/backend/utils/activity/wait_event.c: 
pgstat_get_wait_event_type(uint32 wait_event_info)
    +           case PG_WAIT_INJECTIONPOINT:
    +                   event_type = "InjectionPoint";
    +                   break;
    ++          case PG_WAIT_AUTH:
    ++                  event_type = "Auth";
    ++                  break;
    +           default:
    +                   event_type = "???";
    +                   break;
    +@@ src/backend/utils/activity/wait_event.c: pgstat_get_wait_event(uint32 
wait_event_info)
    +                           event_name = pgstat_get_wait_io(w);
    +                           break;
    +                   }
    ++          case PG_WAIT_AUTH:
    ++                  {
    ++                          WaitEventAuth w = (WaitEventAuth) 
wait_event_info;
    ++
    ++                          event_name = pgstat_get_wait_auth(w);
    ++                          break;
    ++                  }
    +           default:
    +                   event_name = "unknown wait event";
    +                   break;
    +
      ## src/backend/utils/activity/wait_event_names.txt ##
    -@@ src/backend/utils/activity/wait_event_names.txt: Section: ClassName - 
WaitEventIPC
    - APPEND_READY      "Waiting for subplan nodes of an 
<literal>Append</literal> plan node to be ready."
    - ARCHIVE_CLEANUP_COMMAND   "Waiting for <xref 
linkend="guc-archive-cleanup-command"/> to complete."
    - ARCHIVE_COMMAND   "Waiting for <xref linkend="guc-archive-command"/> to 
complete."
    -+AUTHN_GSSAPI      "Waiting for a response from a Kerberos server via 
GSSAPI."
    -+AUTHN_LDAP        "Waiting for a response from an LDAP server."
    -+AUTHN_PAM "Waiting for a response from the local PAM service."
    -+AUTHN_RADIUS      "Waiting for a response from a RADIUS server."
    -+AUTHN_SSPI        "Waiting for a response from a Windows security 
provider via SSPI."
    - BACKEND_TERMINATION       "Waiting for the termination of another 
backend."
    - BACKUP_WAIT_WAL_ARCHIVE   "Waiting for WAL files required for a backup to 
be successfully archived."
    - BGWORKER_SHUTDOWN "Waiting for background worker to shut down."
    +@@ src/backend/utils/activity/wait_event_names.txt: XACT_GROUP_UPDATE      
"Waiting for the group leader to update transaction status at
    + 
    + ABI_compatibility:
    + 
    ++#
    ++# Wait Events - Auth
    ++#
    ++# Use this category when a process is waiting for a third party to
    ++# authenticate/authorize the user.
    ++#
    ++
    ++Section: ClassName - WaitEventAuth
    ++
    ++GSSAPI_ACCEPT_SEC_CONTEXT "Waiting for a response from a Kerberos server 
via GSSAPI."
    ++LDAP_BIND "Waiting for an LDAP bind operation to authenticate the user."
    ++LDAP_BIND_FOR_SEARCH      "Waiting for an LDAP bind operation to search 
the directory."
    ++LDAP_INITIALIZE   "Waiting to initialize an LDAP connection."
    ++LDAP_SEARCH       "Waiting for an LDAP search operation to complete."
    ++PAM_ACCT_MGMT     "Waiting for the local PAM service to validate the user 
account."
    ++PAM_AUTHENTICATE  "Waiting for the local PAM service to authenticate the 
user."
    ++RADIUS_TRANSACTION        "Waiting for a RADIUS transaction to complete."
    ++SSPI_ACCEPT_SECURITY_CONTEXT      "Waiting for a Windows security 
provider to accept the client's SSPI token."
    ++SSPI_ACQUIRE_CREDENTIALS_HANDLE   "Waiting for a Windows security 
provider to acquire server credentials for SSPI."
    ++SSPI_LOOKUP_ACCOUNT_SID   "Waiting for Windows to find the user's account 
SID."
    ++SSPI_MAKE_UPN     "Waiting for Windows to translate a Kerberos UPN."
    ++
    ++ABI_compatibility:
    ++
    + #
    + # Wait Events - Timeout
    + #
    +
    + ## src/include/utils/wait_event.h ##
    +@@
    + #define PG_WAIT_TIMEOUT                           0x09000000U
    + #define PG_WAIT_IO                                        0x0A000000U
    + #define PG_WAIT_INJECTIONPOINT            0x0B000000U
    ++#define PG_WAIT_AUTH                              0x0C000000U
    + 
    + /* enums for wait events */
    + #include "utils/wait_event_types.h"
    +
    + ## src/test/regress/expected/sysviews.out ##
    +@@ src/test/regress/expected/sysviews.out: select type, count(*) > 0 as ok 
FROM pg_wait_events
    +    type    | ok 
    + -----------+----
    +  Activity  | t
    ++ Auth      | t
    +  BufferPin | t
    +  Client    | t
    +  Extension | t
    +@@ src/test/regress/expected/sysviews.out: select type, count(*) > 0 as ok 
FROM pg_wait_events
    +  LWLock    | t
    +  Lock      | t
    +  Timeout   | t
    +-(9 rows)
    ++(10 rows)
    + 
    + -- Test that the pg_timezone_names and pg_timezone_abbrevs views are
    + -- more-or-less working.  We can't test their contents in any great detail

Attachment: v4-0001-BackgroundPsql-handle-empty-query-results.patch
Description: Binary data

Attachment: v4-0002-Test-Cluster-let-background_psql-work-asynchronou.patch
Description: Binary data

Attachment: v4-0003-pgstat-report-in-earlier-with-STATE_STARTING.patch
Description: Binary data

Attachment: v4-0004-Report-external-auth-calls-as-wait-events.patch
Description: Binary data

Reply via email to