Hi, here's version 4 of the patch.
On Wed, Oct 19, 2011 at 19:34, Robert Haas wrote:
> I think it would be safer to write this so that
> pgstat_get_crashed_backend_activity writes its answer into a
> statically allocated buffer and returns a pointer to that buffer,
> rather than using palloc. I think the current coding might lead to a
> memory leak in the postmaster
Good catch about the memory leak; I always assumed that the caller
takes care of cleaning the memory context. But looking at the code,
that doesn't seem to happen in postmaster.
Using a global buffer would waste memory in every backend, but this is
needed rarely only in postmaster. So instead I'm allocating the buffer
on stack in LogChildExit(), and pass that to
pgstat_get_crashed_backend_activity() in arguments.
I use a character array of 1024 bytes in LogChildExit() since
'track_activity_query_size' is unknown at compile time (1024 is the
default). I could have used alloca(), but doesn't seem portable or
robust with arbitrary inputs coming from GUC.
> Also, how about having CreateSharedBackendStatus store the length of
> the backend activity buffer in a global somewhere, instead of
> repeating the calculation here?
Sure, I added a BackendActivityBufferSize global to pgstat.c
> return "";
> I'd suggest that we instead return , and
> avoid making judgements about how things got that way.
Originally I wanted to use exact same messages as
pg_stat_get_backend_activity; but you're right, we should be as
accurate as possible. I think "" is better,
since it means the PID was found, but it had a zero-length activity
string.
> It's almost making me cry
> thinking about how much time this would have saved me
Thanks for your review and the generous words. :)
Regards,
Marti
From bce8e81ea4811a823ec7c3a0ad15ff63b5cd1be4 Mon Sep 17 00:00:00 2001
From: Marti Raudsepp
Date: Fri, 21 Oct 2011 18:36:50 +0300
Subject: [PATCH] Log crashed backend's query (activity string)
The crashing query is often a good starting point in debugging the
cause, and much more easily accessible than core dumps.
We're extra-paranoid in reading the activity buffer since it might be
corrupt. All non-ASCII characters are replaced with '?'
Example output:
LOG: server process (PID 31451) was terminated by signal 9: Killed
DETAIL: Running query: DO LANGUAGE plpythonu 'import os;os.kill(os.getpid(),9)'
---
src/backend/postmaster/pgstat.c | 73 ++-
src/backend/postmaster/postmaster.c | 22 --
src/backend/utils/adt/ascii.c | 34
src/include/pgstat.h|2 +
src/include/utils/ascii.h |1 +
5 files changed, 125 insertions(+), 7 deletions(-)
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 44956c1..ba64f23 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -58,6 +58,7 @@
#include "storage/pg_shmem.h"
#include "storage/pmsignal.h"
#include "storage/procsignal.h"
+#include "utils/ascii.h"
#include "utils/guc.h"
#include "utils/memutils.h"
#include "utils/ps_status.h"
@@ -2228,6 +2229,7 @@ static PgBackendStatus *MyBEEntry = NULL;
static char *BackendClientHostnameBuffer = NULL;
static char *BackendAppnameBuffer = NULL;
static char *BackendActivityBuffer = NULL;
+static Size BackendActivityBufferSize = 0;
/*
@@ -2310,9 +2312,12 @@ CreateSharedBackendStatus(void)
}
/* Create or attach to the shared activity buffer */
- size = mul_size(pgstat_track_activity_query_size, MaxBackends);
+ BackendActivityBufferSize = mul_size(pgstat_track_activity_query_size,
+ MaxBackends);
BackendActivityBuffer = (char *)
- ShmemInitStruct("Backend Activity Buffer", size, &found);
+ ShmemInitStruct("Backend Activity Buffer",
+ BackendActivityBufferSize,
+ &found);
if (!found)
{
@@ -2751,6 +2756,70 @@ pgstat_get_backend_current_activity(int pid, bool checkUser)
return "";
}
+/* --
+ * pgstat_get_crashed_backend_activity() -
+ *
+ * Return a string representing the current activity of the backend with
+ * the specified PID. Like the function above, but reads shared memory with
+ * the expectation that it may be corrupt. Returns either a pointer to a
+ * constant string, or writes into the 'buffer' argument and returns it.
+ *
+ * This function is only intended to be used by postmaster to report the
+ * query that crashed the backend. In particular, no attempt is made to
+ * follow the correct concurrency protocol when accessing the
+ * BackendStatusArray. But that's OK, in the worst case we'll return a
+ * corrupted message. We also must take care not to trip on ereport(ERROR).
+ *
+ * Note: return strings for special cases match pg_stat_get_backend_activity.
+ * --
+ */
+const char *
+pgstat_get_crashed_backend_activity(int pid, char *buffer,
+ int len)
+{
+ volatile PgBackendStatus *beentry;
+ int i;
+
+ beentry = Back