Re: POC: GUC option for skipping shared buffers in core dumps

2023-09-25 Thread Andrey Lepikhov

Hi,

The current approach could be better because we want to use it on 
Windows/MacOS and other systems. So, I've tried to develop another 
strategy - detaching shmem and DSM blocks before executing the abort() 
routine.

As I can see, it helps and reduces the size of the core file.

--
regards,
Andrey Lepikhov
Postgres Professional
diff --git a/src/backend/bootstrap/bootstrap.c 
b/src/backend/bootstrap/bootstrap.c
index 5810f8825e..4d7bf2c0e4 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -325,7 +325,7 @@ BootstrapModeMain(int argc, char *argv[], bool check_only)
{
SetProcessingMode(NormalProcessing);
CheckerModeMain();
-   abort();
+   pg_abort();
}
 
/*
diff --git a/src/backend/main/main.c b/src/backend/main/main.c
index ed11e8be7f..34ac874ad0 100644
--- a/src/backend/main/main.c
+++ b/src/backend/main/main.c
@@ -197,7 +197,7 @@ main(int argc, char *argv[])
else
PostmasterMain(argc, argv);
/* the functions above should not return */
-   abort();
+   pg_abort();
 }
 
 
diff --git a/src/backend/postmaster/postmaster.c 
b/src/backend/postmaster/postmaster.c
index 54e9bfb8c4..fc32a6bb1b 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1469,7 +1469,7 @@ PostmasterMain(int argc, char *argv[])
 */
ExitPostmaster(status != STATUS_OK);
 
-   abort();/* not reached */
+   pg_abort(); /* not reached */
 }
 
 
diff --git a/src/backend/replication/walsender.c 
b/src/backend/replication/walsender.c
index e250b0567e..3123b388ab 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -385,7 +385,7 @@ WalSndShutdown(void)
whereToSendOutput = DestNone;
 
proc_exit(0);
-   abort();/* keep the compiler 
quiet */
+   pg_abort(); /* keep the compiler 
quiet */
 }
 
 /*
diff --git a/src/backend/utils/error/assert.c b/src/backend/utils/error/assert.c
index 719dd7b309..f422d42440 100644
--- a/src/backend/utils/error/assert.c
+++ b/src/backend/utils/error/assert.c
@@ -19,6 +19,32 @@
 #include 
 #endif
 
+#include 
+#include 
+
+int core_dump_no_shared_buffers = COREDUMP_INCLUDE_ALL;
+
+/*
+ * Remember, at the same time someone can work with shared memory, write them 
to
+ * disk and so on.
+ */
+void
+pg_abort(void)
+{
+   if (core_dump_no_shared_buffers != COREDUMP_INCLUDE_ALL)
+   {
+   if (core_dump_no_shared_buffers == COREDUMP_EXCLUDE_ALL ||
+   core_dump_no_shared_buffers == COREDUMP_EXCLUDE_DSM)
+   dsm_detach_all();
+
+   if (core_dump_no_shared_buffers == COREDUMP_EXCLUDE_ALL ||
+   core_dump_no_shared_buffers == COREDUMP_EXCLUDE_SHMEM)
+   PGSharedMemoryDetach();
+   }
+
+   abort();
+}
+
 /*
  * ExceptionalCondition - Handles the failure of an Assert()
  *
@@ -63,5 +89,5 @@ ExceptionalCondition(const char *conditionName,
sleep(100);
 #endif
 
-   abort();
+   pg_abort();
 }
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 8e1f3e8521..f6c863ca68 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -601,7 +601,7 @@ errfinish(const char *filename, int lineno, const char 
*funcname)
 * children...
 */
fflush(NULL);
-   abort();
+   pg_abort();
}
 
/*
diff --git a/src/backend/utils/misc/guc_tables.c 
b/src/backend/utils/misc/guc_tables.c
index bdb26e2b77..95e205e8d1 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -427,6 +427,14 @@ static const struct config_enum_entry 
debug_logical_replication_streaming_option
{NULL, 0, false}
 };
 
+static const struct config_enum_entry core_dump_no_shared_buffers_options[] = {
+   {"none", COREDUMP_INCLUDE_ALL, false},
+   {"shmem", COREDUMP_EXCLUDE_SHMEM, false},
+   {"dsm", COREDUMP_EXCLUDE_DSM, false},
+   {"all", COREDUMP_EXCLUDE_ALL, false},
+   {NULL, 0, false}
+};
+
 StaticAssertDecl(lengthof(ssl_protocol_versions_info) == (PG_TLS1_3_VERSION + 
2),
 "array length mismatch");
 
@@ -4971,6 +4979,17 @@ struct config_enum ConfigureNamesEnum[] =
NULL, NULL, NULL
},
 
+   {
+   {"core_dump_no_shared_buffers", PGC_POSTMASTER, 
DEVELOPER_OPTIONS,
+   NULL,
+   NULL,
+   GUC_NOT_IN_SAMPLE
+   },
+   _dump_no_shared_buffers,
+   COREDUMP_INCLUDE_ALL, core_dump_no_shared_buffers_options,
+   NULL, NULL, 

Re: POC: GUC option for skipping shared buffers in core dumps

2023-09-13 Thread Lepikhov Andrei



On Wed, Feb 12, 2020, at 7:55 AM, tsunakawa.ta...@fujitsu.com wrote:
> From: Craig Ringer 
>> Currently my options are "dump all shmem including shared_buffers" or
>> "dump no shmem". But I usually want "dump all shmem except
>> shared_buffers". It's tolerable to just dump s_b on a test system with
>> a small s_b, but if enabling coredumps to track down some
>> hard-to-repro crash on a production system I really don't want 20GB
>> coredumps...
>
> We have a simple implementation that allows to exclude shared memory.  
> That's been working for years.
>
> [postgresql.conf]
> core_directory = 'location of core dumps'
> core_contents = '{none | minimum | full}'
> # none = doesn't dump core, minimum excludes shared memory, and full dumps all
>
> I can provide it.  But it simply changes the current directory and 
> detaches shared memory when postgres receives signals that dump core.
>
> I made this GUC because Windows also had to be dealt with.

If it's still possible, share your patch here. I don't know what about the 
core, but during development, especially the bug-fixing process, it is really 
dull to wait for the core generation process every time, even if you debug a 
planner issue and are not interested in shared memory blocks ...

-- 
Regards,
Andrei Lepikhov




RE: POC: GUC option for skipping shared buffers in core dumps

2020-02-11 Thread tsunakawa.ta...@fujitsu.com
From: Craig Ringer 
> Currently my options are "dump all shmem including shared_buffers" or
> "dump no shmem". But I usually want "dump all shmem except
> shared_buffers". It's tolerable to just dump s_b on a test system with
> a small s_b, but if enabling coredumps to track down some
> hard-to-repro crash on a production system I really don't want 20GB
> coredumps...

We have a simple implementation that allows to exclude shared memory.  That's 
been working for years.

[postgresql.conf]
core_directory = 'location of core dumps'
core_contents = '{none | minimum | full}'
# none = doesn't dump core, minimum excludes shared memory, and full dumps all

I can provide it.  But it simply changes the current directory and detaches 
shared memory when postgres receives signals that dump core.

I made this GUC because Windows also had to be dealt with.


From: Andres Freund 
> > Hah.  This argument boils down to saying our packagers suck :-)
> 
> Hm? I'd say it's a sign of respect to not have each of them do the same
> work. Especially when they can't address it to the same degree core PG
> can. So maybe I'm saying we shouldn't be lazy ;)

Maybe we should add options to pg_ctl just like -c which is available now, so 
that OS packagers can easily use in their start scripts.  Or, can they just use 
pg_ctl's -o to specify new GUC parameters?


Regards
Takayuki Tsunakawa



Re: POC: GUC option for skipping shared buffers in core dumps

2020-02-10 Thread Craig Ringer
On Tue, 11 Feb 2020 at 03:07, Alexander Korotkov
 wrote:
>
> Hi!
>
> In Postgres Pro we have complaints about too large core dumps.  The
> possible way to reduce code dump size is to skip some information.
> Frequently shared buffers is most long memory segment in core dump.
> For sure, contents of shared buffers is required for discovering many
> of bugs.  But short core dump without shared buffers might be still
> useful.  If system appears to be not capable to capture full core
> dump, short core dump appears to be valuable option.
>
> Attached POC patch implements core_dump_no_shared_buffers GUC, which
> does madvise(MADV_DONTDUMP) for shared buffers.  Any thoughts?

I'd like this a lot. In fact I'd like it so much I kinda hope it'd be
considered backpatchable because coredump_filter is much too crude and
coarse grained.

Now that Pg has parallel query we all rely on shm_mq, DSM/DSA, etc.
It's increasingly problematic to have these areas left out of core
dumps in order to avoid bloating them with shared_buffers contents.
Doubly so if, like me, you work with extensions that make very heavy
use of shared memory areas for their own IPC.

Currently my options are "dump all shmem including shared_buffers" or
"dump no shmem". But I usually want "dump all shmem except
shared_buffers". It's tolerable to just dump s_b on a test system with
a small s_b, but if enabling coredumps to track down some
hard-to-repro crash on a production system I really don't want 20GB
coredumps...

Please, please apply.

Please backpatch, if you can possibly stand to do so.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise




Re: POC: GUC option for skipping shared buffers in core dumps

2020-02-10 Thread Andres Freund
Hi,

On 2020-02-10 18:21:30 -0300, Alvaro Herrera wrote:
> On 2020-Feb-10, Andres Freund wrote:
> 
> > Hi,
> > 
> > On 2020-02-10 17:31:47 -0300, Alvaro Herrera wrote:
> > > Yeah.  Maybe we should file bug reports against downstream packages to
> > > include a corefilter tweak.
> > 
> > Hm, I'm not sure that's a reasonable way to scale things. Nor am I
> > really sure that's the right granularity.
> 
> Hah.  This argument boils down to saying our packagers suck :-)

Hm? I'd say it's a sign of respect to not have each of them do the same
work. Especially when they can't address it to the same degree core PG
can. So maybe I'm saying we shouldn't be lazy ;)


> > > I don't know how easy is it to teach systemd to do this on its service
> > > files.
> > 
> > Well, you could just make it part of the command that starts the
> > server. Not aware of anything else.
> 
> I tried to do that, but couldn't figure out a clean way, because you
> have to do it after the fact (not in the process itself).  Maybe it's
> possible to have pg_ctl do it once postmaster is running.

Shouldn't it be sufficient to just do it to /proc/self/coredump_filter?
It's inherited IIUC?

Yep:
   A child process created via fork(2) inherits its parent's 
coredump_filter value; the coredump_filter value is preserved across an 
execve(2).


> > > But maybe we should have a way to disable the corefiltering.
> > 
> > There should, imo. That's why I was wondering about making this a GUC
> > (presumably suset).
> 
> Not really sure about suset ... AFAIR that means superuser can SET it;
> but what you really care about is more like ALTER SYSTEM, which is
> SIGHUP unless I misremember.

I really want both. Sometimes it's annoying to get followup coredumps by
other processes, even if I just want to get a corefile from one process
doing something more specific. It seems nice to alter that session /
user to have large coredumps, but not the rest?

Greetings,

Andres Freund




Re: POC: GUC option for skipping shared buffers in core dumps

2020-02-10 Thread Alvaro Herrera
On 2020-Feb-10, Andres Freund wrote:

> Hi,
> 
> On 2020-02-10 17:31:47 -0300, Alvaro Herrera wrote:
> > Yeah.  Maybe we should file bug reports against downstream packages to
> > include a corefilter tweak.
> 
> Hm, I'm not sure that's a reasonable way to scale things. Nor am I
> really sure that's the right granularity.

Hah.  This argument boils down to saying our packagers suck :-)

> > I don't know how easy is it to teach systemd to do this on its service
> > files.
> 
> Well, you could just make it part of the command that starts the
> server. Not aware of anything else.

I tried to do that, but couldn't figure out a clean way, because you
have to do it after the fact (not in the process itself).  Maybe it's
possible to have pg_ctl do it once postmaster is running.

> > FWIW I've heard that some people like to have shmem in core files to
> > improve debuggability, but it's *very* infrequent.
> 
> Oh, I pretty regularly want that. If you're debugging anthying that
> includes locks, page accesses, etc, it's pretty hard to succeed without?

eah kinda, I guess -- I don't remember cases when I've wanted to do
that in production systems.

> > But maybe we should have a way to disable the corefiltering.
> 
> There should, imo. That's why I was wondering about making this a GUC
> (presumably suset).

Not really sure about suset ... AFAIR that means superuser can SET it;
but what you really care about is more like ALTER SYSTEM, which is
SIGHUP unless I misremember.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: POC: GUC option for skipping shared buffers in core dumps

2020-02-10 Thread Andres Freund
Hi,

On 2020-02-10 17:31:47 -0300, Alvaro Herrera wrote:
> Yeah.  Maybe we should file bug reports against downstream packages to
> include a corefilter tweak.

Hm, I'm not sure that's a reasonable way to scale things. Nor am I
really sure that's the right granularity.


> My development helper script uses this
> 
> runpg_corefilter() {
> pid=$(head -1 $PGDATADIR/postmaster.pid)
> if [ ! -z "$pid" ]; then
> echo 0x01 > /proc/$pid/coredump_filter 
> fi
> }
> 
> I don't know how easy is it to teach systemd to do this on its service
> files.

Well, you could just make it part of the command that starts the
server. Not aware of anything else.


> FWIW I've heard that some people like to have shmem in core files to
> improve debuggability, but it's *very* infrequent.

Oh, I pretty regularly want that. If you're debugging anthying that
includes locks, page accesses, etc, it's pretty hard to succeed without?


> But maybe we should have a way to disable the corefiltering.

There should, imo. That's why I was wondering about making this a GUC
(presumably suset).

Greetings,

Andres Freund




Re: POC: GUC option for skipping shared buffers in core dumps

2020-02-10 Thread Alvaro Herrera
On 2020-Feb-10, Andres Freund wrote:

> Have you considered postmaster (or even just the GUC processing in each
> process) adjusting /proc/self/coredump_filter instead?
> 
> From the man page:
> 
>The  value  in the file is a bit mask of memory mapping types (see 
> mmap(2)).  If a bit is set in the mask, then memory mappings of the 
> corresponding
>type are dumped; otherwise they are not dumped.  The bits in this file 
> have the following meanings:
> 
>bit 0  Dump anonymous private mappings.
>bit 1  Dump anonymous shared mappings.
>bit 2  Dump file-backed private mappings.
>bit 3  Dump file-backed shared mappings.
>bit 4 (since Linux 2.6.24)
>   Dump ELF headers.
>bit 5 (since Linux 2.6.28)
>   Dump private huge pages.
>bit 6 (since Linux 2.6.28)
>   Dump shared huge pages.
>bit 7 (since Linux 4.4)
>   Dump private DAX pages.
>bit 8 (since Linux 4.4)
>   Dump shared DAX pages.
> 
> You can also incorporate this into the start script for postgres today.

Yeah.  Maybe we should file bug reports against downstream packages to
include a corefilter tweak.

My development helper script uses this

runpg_corefilter() {
pid=$(head -1 $PGDATADIR/postmaster.pid)
if [ ! -z "$pid" ]; then
echo 0x01 > /proc/$pid/coredump_filter 
fi
}

I don't know how easy is it to teach systemd to do this on its service
files.

FWIW I've heard that some people like to have shmem in core files to
improve debuggability, but it's *very* infrequent.  But maybe we should have
a way to disable the corefiltering.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: POC: GUC option for skipping shared buffers in core dumps

2020-02-10 Thread Andres Freund
Hi,

On 2020-02-10 22:07:13 +0300, Alexander Korotkov wrote:
> In Postgres Pro we have complaints about too large core dumps.

I've seen those too, and I've had them myself. It's pretty frustrating
if a core dump makes the machine unusable for half an hour while said
coredump is being written out...


> The possible way to reduce code dump size is to skip some information.
> Frequently shared buffers is most long memory segment in core dump.
> For sure, contents of shared buffers is required for discovering many
> of bugs.  But short core dump without shared buffers might be still
> useful.  If system appears to be not capable to capture full core
> dump, short core dump appears to be valuable option.

It's possibly interesting, in the interim at least, that enabling huge
pages on linux has the effect that pages aren't included in core dumps
by default.


> Attached POC patch implements core_dump_no_shared_buffers GUC, which
> does madvise(MADV_DONTDUMP) for shared buffers.  Any thoughts?

Hm. Not really convinced by this. The rest of shared memory is still
pretty large, and this can't be tuned at runtime.

Have you considered postmaster (or even just the GUC processing in each
process) adjusting /proc/self/coredump_filter instead?

>From the man page:

   The  value  in the file is a bit mask of memory mapping types (see 
mmap(2)).  If a bit is set in the mask, then memory mappings of the 
corresponding
   type are dumped; otherwise they are not dumped.  The bits in this file 
have the following meanings:

   bit 0  Dump anonymous private mappings.
   bit 1  Dump anonymous shared mappings.
   bit 2  Dump file-backed private mappings.
   bit 3  Dump file-backed shared mappings.
   bit 4 (since Linux 2.6.24)
  Dump ELF headers.
   bit 5 (since Linux 2.6.28)
  Dump private huge pages.
   bit 6 (since Linux 2.6.28)
  Dump shared huge pages.
   bit 7 (since Linux 4.4)
  Dump private DAX pages.
   bit 8 (since Linux 4.4)
  Dump shared DAX pages.

You can also incorporate this into the start script for postgres today.


> +static Size ShmemPageSize = FPM_PAGE_SIZE;

I am somewhat confused by the use of FPM_PAGE_SIZE? What does this have
to do with any of this? Is it just because it's set to 4kb by default?


>  /*
> diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
> index 8228e1f3903..c578528b0bb 100644
> --- a/src/backend/utils/misc/guc.c
> +++ b/src/backend/utils/misc/guc.c
> @@ -2037,6 +2037,19 @@ static struct config_bool ConfigureNamesBool[] =
>   NULL, NULL, NULL
>   },
>  
> +#if HAVE_DECL_MADV_DONTDUMP
> + {
> + {"core_dump_no_shared_buffers", PGC_POSTMASTER, 
> DEVELOPER_OPTIONS,
> + gettext_noop("Exclude shared buffers from core dumps."),
> + NULL,
> + GUC_NOT_IN_SAMPLE
> + },
> + _dump_no_shared_buffers,
> + false,
> + NULL, NULL, NULL
> + },
> +#endif

IMO it's better to have GUCs always present, but don't allow them to be
enabled if prerequisites aren't fulfilled.

Greetings,

Andres Freund