Re: margay fails assertion in stats/dsa/dsm code

2024-02-21 Thread Thomas Munro
On Sat, Jul 2, 2022 at 11:10 AM Thomas Munro  wrote:
> On Sat, Jul 2, 2022 at 1:15 AM Robert Haas  wrote:
> > Changing the default on certain platforms to 'posix' or 'sysv'
> > according to what works best on that platform seems reasonable to me.
>
> Ok, I'm going to make that change in 15 + master.

For the record, I asked a Solaris kernel engineer about that
shm_open() problem and learned that a fix shipped about a month after
we had this discussion (though I haven't tested it myself):

https://twitter.com/casperdik/status/1730288613722562986

I also reported the issue to illumos, since I'd like to be able to
revert 94ebf811 eventually...:

https://www.illumos.org/issues/16093




Re: margay fails assertion in stats/dsa/dsm code

2022-07-01 Thread Andres Freund
Hi,

On 2022-07-02 11:10:07 +1200, Thomas Munro wrote:
> 2022-07-01 18:25:25.848 CEST [27738:21] pg_regress/prepared_xacts
> ERROR:  could not open shared memory segment "/PostgreSQL.499018794":
> File exists
> 2022-07-01 18:25:25.848 CEST [27738:22] pg_regress/prepared_xacts
> STATEMENT:  SELECT * FROM pxtest1;
> TRAP: FailedAssertion("!hash_table->find_locked", File: "dshash.c",
> Line: 312, PID: 27784)
> /home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'ExceptionalCondition+0x64
> [0x1008bb8b0]
> /home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'dshash_detach+0x48
> [0x10058674c]
> /home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'pgstat_detach_shmem+0x68
> [0x10075e630]
> /home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'pgstat_shutdown_hook+0x94
> [0x10075989c]
> /home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'shmem_exit+0x84
> [0x100701198]
> /home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'proc_exit_prepare+0x88
> [0x100701394]
> /home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'proc_exit+0x4
> [0x10070148c]
> /home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'StartBackgroundWorker+0x150
> [0x10066957c]
> /home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'maybe_start_bgworkers+0x604
> [0x1006717ec]
> /home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'sigusr1_handler+0x190
> [0x100672510]
> 
> So that's an exception safety problem in dshash or pgstat's new usage
> thereof, which is arguably independent of Solaris and probably
> deserves a new thread.  You don't need Solaris to see it, you can just
> add in some random fault injection.

FWIW potentially relevant thread for that aspect: 
https://postgr.es/m/20220311012712.botrpsikaufzteyt%40alap3.anarazel.de

What do you think about the proposal at the end of that email?

Greetings,

Andres Freund




Re: margay fails assertion in stats/dsa/dsm code

2022-07-01 Thread Thomas Munro
On Sat, Jul 2, 2022 at 1:15 AM Robert Haas  wrote:
> Changing the default on certain platforms to 'posix' or 'sysv'
> according to what works best on that platform seems reasonable to me.

Ok, I'm going to make that change in 15 + master.

> I agree that defaulting to 'mmap' doesn't seem like a lot of fun,
> although I think it could be a reasonable choice on a platform where
> everything else is broken. You could alternatively try to fix 'posix'
> by adding some kind of code to work around that platform's
> deficiencies. Insert handwaving here.

I don't think that 'posix' mode is salvageable on Solaris, but a new
GUC to control where 'mmap' mode puts its files would be nice.  Then
you could set it to '/tmp' (or some other RAM disk), and you'd have
the same end result as shm_open() on that platform, without the lock
problem.  Perhaps someone could propose a patch for 16.

As for the commit I already made, we can now see the new error:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=margay&dt=2022-07-01%2016%3A00%3A07

2022-07-01 18:25:25.848 CEST [27784:1] ERROR:  could not open shared
memory segment "/PostgreSQL.499018794": File exists

Unfortunately this particular run crashed anyway, for a new reason:
one backend didn't like the state the new error left the dshash in,
during shmem_exit:

2022-07-01 18:25:25.848 CEST [27738:21] pg_regress/prepared_xacts
ERROR:  could not open shared memory segment "/PostgreSQL.499018794":
File exists
2022-07-01 18:25:25.848 CEST [27738:22] pg_regress/prepared_xacts
STATEMENT:  SELECT * FROM pxtest1;
TRAP: FailedAssertion("!hash_table->find_locked", File: "dshash.c",
Line: 312, PID: 27784)
/home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'ExceptionalCondition+0x64
[0x1008bb8b0]
/home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'dshash_detach+0x48
[0x10058674c]
/home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'pgstat_detach_shmem+0x68
[0x10075e630]
/home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'pgstat_shutdown_hook+0x94
[0x10075989c]
/home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'shmem_exit+0x84
[0x100701198]
/home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'proc_exit_prepare+0x88
[0x100701394]
/home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'proc_exit+0x4
[0x10070148c]
/home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'StartBackgroundWorker+0x150
[0x10066957c]
/home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'maybe_start_bgworkers+0x604
[0x1006717ec]
/home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'sigusr1_handler+0x190
[0x100672510]

So that's an exception safety problem in dshash or pgstat's new usage
thereof, which is arguably independent of Solaris and probably
deserves a new thread.  You don't need Solaris to see it, you can just
add in some random fault injection.
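
For anyone who wants to reproduce this without Solaris, something as
crude as the following, dropped into dsm_impl_posix()'s attach path,
should do it.  This is purely an illustrative sketch, not a proposed
patch; it assumes the variables that function already has in scope
(op, elevel, name) and reuses the error report quoted above:

    /*
     * Illustrative fault injection: occasionally pretend shm_open()
     * failed with EEXIST during DSM_OP_ATTACH, so the error-cleanup
     * paths in dsm/dshash/pgstat get exercised on any OS.
     */
    if (op == DSM_OP_ATTACH && random() % 100 == 0)
    {
        errno = EEXIST;
        ereport(elevel,
                (errcode_for_dynamic_shared_memory(),
                 errmsg("could not open shared memory segment \"%s\": %m",
                        name)));
        return false;
    }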




Re: margay fails assertion in stats/dsa/dsm code

2022-07-01 Thread Robert Haas
On Thu, Jun 30, 2022 at 10:34 PM Thomas Munro  wrote:
> So... I think we should select a different default
> dynamic_shared_memory_type in initdb.c if defined(__sun__).  Which is
> the least terrible?  For sysv, it looks like all the relevant sysctls
> that used to be required to use sysv memory became obsolete/automatic
> in Sol 10 (note: Sol 9 is long EOL'd), so it should just work AFAICT,
> whereas for mmap mode your shared memory data is likely to cause file
> I/O because we put the temporary files in your data directory.  I'm
> thinking perhaps we should default to dynamic_shared_memory_type=sysv
> for 15+.  I don't really want to change it in the back branches, since
> nobody has actually complained about "posix" performance and it might
> upset someone if we change it for newly initdb'd DBs in a major
> release series.  But I'm not an expert or even user of this OS, I'm
> just trying to fix the build farm; better ideas welcome.

Boy, relying on DSM for critical stuff sure is a lot of fun! This is
exactly why I hate adding new facilities that have to be implemented
in OS-dependent ways.

Changing the default on certain platforms to 'posix' or 'sysv'
according to what works best on that platform seems reasonable to me.
I agree that defaulting to 'mmap' doesn't seem like a lot of fun,
although I think it could be a reasonable choice on a platform where
everything else is broken. You could alternatively try to fix 'posix'
by adding some kind of code to work around that platform's
deficiencies. Insert handwaving here.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: margay fails assertion in stats/dsa/dsm code

2022-06-30 Thread Thomas Munro
On Fri, Jul 1, 2022 at 4:02 AM Robert Haas  wrote:
> On Wed, Jun 29, 2022 at 12:01 AM Thomas Munro  wrote:
> > -   if (errno != EEXIST)
> > +   if (op == DSM_OP_ATTACH || errno != EEXIST)
> >                 ereport(elevel,
> >                         (errcode_for_dynamic_shared_memory(),
> >                          errmsg("could not open shared memory segment \"%s\": %m",
> >
> > margay would probably still fail until that underlying problem is
> > addressed, but less mysteriously on our side at least.
>
> That seems like a correct fix, but maybe we should also be checking
> the return value of dsm_impl_op() e.g. define dsm_impl_op_error() as
> an inline function that does if (!dsm_impl_op(..., ERROR)) elog(ERROR,
> "the author of dsm.c is not as clever as he thinks he is").

Thanks.  Also the mmap and sysv paths do something similar, so I also
made the same change there just on principle.  I didn't make the extra
belt-and-braces check you suggested for now, preferring minimalism.  I
think the author of dsm.c was pretty clever, it's just that the world
turned out to be more hostile than expected, in one very specific way.

Pushed.

So that should get us to a state where margay still fails
occasionally, but now with an ERROR rather than a crash.

Next up, I confirmed my theory about what's happening on (closed-source)
Solaris by tracing syscalls.  It is indeed that clunky sleep(1) code
that gives up after 64 tries.  Even in pre-shmem-stats releases that
don't contend enough to reach the bogus EEXIST error, I'm pretty sure
people must be getting random sleeps injected into their parallel
queries in the wild by this code.

I have concluded that that implementation of shm_open() is not really
usable for our purposes.  We'll have to change *something* to turn
margay reliably green, not to mention bogus error reports we can
expect from 15 in the wild, and performance woes that I cannot now
unsee.

So... I think we should select a different default
dynamic_shared_memory_type in initdb.c if defined(__sun__).  Which is
the least terrible?  For sysv, it looks like all the relevant sysctls
that used to be required to use sysv memory became obsolete/automatic
in Sol 10 (note: Sol 9 is long EOL'd), so it should just work AFAICT,
whereas for mmap mode your shared memory data is likely to cause file
I/O because we put the temporary files in your data directory.  I'm
thinking perhaps we should default to dynamic_shared_memory_type=sysv
for 15+.  I don't really want to change it in the back branches, since
nobody has actually complained about "posix" performance and it might
upset someone if we change it for newly initdb'd DBs in a major
release series.  But I'm not an expert or even user of this OS, I'm
just trying to fix the build farm; better ideas welcome.
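
To make the idea concrete, here's a sketch of the kind of default
selection I have in mind -- illustrative only, not the actual initdb.c
code; the real change would go into initdb's existing DSM-probing
logic:

static const char *
choose_default_dsm_type(void)
{
#if defined(__sun__)
    return "sysv";      /* Solaris shm_open() misbehaves under contention */
#elif defined(WIN32)
    return "windows";
#else
    return "posix";     /* shm_open() works fine elsewhere */
#endif
}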

Thoughts?




Re: margay fails assertion in stats/dsa/dsm code

2022-06-30 Thread Robert Haas
On Wed, Jun 29, 2022 at 12:01 AM Thomas Munro  wrote:
> As for whether PostgreSQL needs to do anything, perhaps we should
> ereport for this unexpected error as a matter of self-preservation, to
> avoid the NULL dereference you'd presumably get on a non-cassert build
> with the current coding?  Maybe just:
>
> -   if (errno != EEXIST)
> +   if (op == DSM_OP_ATTACH || errno != EEXIST)
>                 ereport(elevel,
>                         (errcode_for_dynamic_shared_memory(),
>                          errmsg("could not open shared memory segment \"%s\": %m",
>
> margay would probably still fail until that underlying problem is
> addressed, but less mysteriously on our side at least.

That seems like a correct fix, but maybe we should also be checking
the return value of dsm_impl_op() e.g. define dsm_impl_op_error() as
an inline function that does if (!dsm_impl_op(..., ERROR)) elog(ERROR,
"the author of dsm.c is not as clever as he thinks he is").

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: margay fails assertion in stats/dsa/dsm code

2022-06-29 Thread Thomas Munro
On Wed, Jun 29, 2022 at 4:00 PM Thomas Munro  wrote:
> I suppose this could indicate that the machine and/or RAM disk is
> overloaded/swapping and one of those open() or unlink() calls is
> taking a really long time, and that could be fixed with some system
> tuning.

Hmm, I take that bit back.  Every backend that starts up is trying to
attach to the same segment, the one with the new pgstats stuff in it
(once the small space in the main shmem segment is used up and we
create a DSM segment).  There's no fairness/queue, random back-off or
guarantee of progress in that librt lock code, so you can get into
lock-step with other backends retrying, and although some waiter
always gets to make progress, any given backend can lose every round
and run out of retries.  Even when you're lucky and don't fail with an
undocumented, incomprehensible error, it's very slow, and I'd consider
filing a bug report about that.  A work-around on the PostgreSQL side
would be to set dynamic_shared_memory_type to mmap (= we just open our
own files and map them directly) and make pg_dynshmem a symlink to
something under /tmp (or some other RAM disk), to avoid touching
regular disk file systems.




Re: margay fails assertion in stats/dsa/dsm code

2022-06-28 Thread Thomas Munro
On Wed, Jun 29, 2022 at 6:04 AM Robert Haas  wrote:
> My first thought was that the return value of the call to
> dsm_impl_op() at the end of dsm_attach() is not checked and that maybe
> it was returning NULL, but it seems like whoever wrote
> dsm_impl_posix() was pretty careful to ereport(elevel, ...) in every
> failure path, and elevel is ERROR here, so I don't see any issue. My
> second thought was that maybe control had escaped from dsm_attach()
> due to an error before we got to the step where we actually map the
> segment, but then the dsm_segment * would be returned to the caller.
> Maybe they could retrieve it later using dsm_find_mapping(), but that
> function has no callers in core.

Thanks for looking.  Yeah.  I also read through that code many times
and drew the same conclusion.

> So I'm kind of stumped too, but did you by any chance check whether
> there are any DSM-related messages in the logs before the assertion
> failure?

Marcel kindly granted me access to his test machine, where the failure
can be reproduced by running make check lots of times.  I eventually
figured out that the problem control flow is ... of course ... the one
path that doesn't ereport(), and that's when errno == EEXIST.  That is
a path that is intended to handle DSM_OP_CREATE.  Here we are handling
DSM_OP_ATTACH, and I have verified that we're passing in just O_RDWR.
EEXIST is a nonsensical error for shm_open() without flags containing
O_CREAT | O_EXCL (according to POSIX and Solaris's man page).
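
In other words, the attach path boils down to roughly this (simplified
sketch; the segment name shown is just the one from margay's log):

    /*
     * DSM_OP_ATTACH: open an existing segment read-write, without
     * O_CREAT or O_EXCL.  Per POSIX, EEXIST is only specified for
     * O_CREAT | O_EXCL, so seeing it here is bogus.
     */
    int         fd = shm_open("/PostgreSQL.499018794", O_RDWR, 0);

    /* fd == -1 && errno == EEXIST should be impossible per the man pages */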

On this OS, shm_open() opens plain files in /tmp (normally a RAM disk,
so kinda like /dev/shm on Linux), that much I can tell with a plain
old "ls" command.  We can also read its long lost open source cousin
(which may be completely different for all I know, but I'd doubt it):

https://github.com/illumos/illumos-gate/blob/master/usr/src/lib/libc/port/rt/shm.c
https://github.com/illumos/illumos-gate/blob/master/usr/src/lib/libc/port/rt/pos4obj.c

Erm.  It looks like __pos4obj_lock() could possibly return -1 and
leave errno == EEXIST, if it runs out of retries?  Then shm_open()
would return -1, and we'd blow up.  However, for that to happen, one
of those "SHM_LOCK_TYPE" files would have to linger for 64 sleep
loops, and I'm not sure why that'd happen, or what to do about it.  (I
don't immediately grok what that lock file is even for.)
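
For concreteness, here's a simplified sketch of the locking pattern as
I read it -- an approximation of pos4obj.c, not the real library code;
it assumes the usual <fcntl.h>, <unistd.h> and <errno.h>:

static int
lock_object_sketch(const char *lockpath)
{
    int         retries = 64;

    while (retries-- > 0)
    {
        int         fd = open(lockpath, O_RDWR | O_CREAT | O_EXCL, 0666);

        if (fd >= 0)
        {
            close(fd);
            return 0;           /* we "hold" the lock file */
        }
        if (errno != EEXIST)
            return -1;          /* unexpected failure */
        sleep(1);               /* fixed sleep, no backoff, no fairness */
    }
    return -1;                  /* gave up; errno is still EEXIST */
}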

I suppose this could indicate that the machine and/or RAM disk is
overloaded/swapping and one of those open() or unlink() calls is
taking a really long time, and that could be fixed with some system
tuning.  I suppose it's also remotely possible that the process is
getting peppered with signals so that funky shell script-style locking
scheme is interrupted and doesn't really wait very long.  Or maybe I
guessed wrong and some other closed source path is to blame *shrug*.

As for whether PostgreSQL needs to do anything, perhaps we should
ereport for this unexpected error as a matter of self-preservation, to
avoid the NULL dereference you'd presumably get on a non-cassert build
with the current coding?  Maybe just:

-   if (errno != EEXIST)
+   if (op == DSM_OP_ATTACH || errno != EEXIST)
                ereport(elevel,
                        (errcode_for_dynamic_shared_memory(),
                         errmsg("could not open shared memory segment \"%s\": %m",

margay would probably still fail until that underlying problem is
addressed, but less mysteriously on our side at least.




Re: margay fails assertion in stats/dsa/dsm code

2022-06-28 Thread Robert Haas
On Thu, Jun 2, 2022 at 8:06 PM Thomas Munro  wrote:
> I know that on Solaris we use dynamic_shared_memory_type=posix.  The other
> Solaris/Sparc system is wrasse, and it's not doing this.  I don't see
> it yet, but figured I'd report this much to the list in case someone
> else does.

My first thought was that the return value of the call to
dsm_impl_op() at the end of dsm_attach() is not checked and that maybe
it was returning NULL, but it seems like whoever wrote
dsm_impl_posix() was pretty careful to ereport(elevel, ...) in every
failure path, and elevel is ERROR here, so I don't see any issue. My
second thought was that maybe control had escaped from dsm_attach()
due to an error before we got to the step where we actually map the
segment, but then the dsm_segment * would be returned to the caller.
Maybe they could retrieve it later using dsm_find_mapping(), but that
function has no callers in core.

So I'm kind of stumped too, but did you by any chance check whether
there are any DSM-related messages in the logs before the assertion
failure?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: margay fails assertion in stats/dsa/dsm code

2022-06-28 Thread Marcel Hofstetter

On 28.06.2022 at 08:27, Thomas Munro wrote:

> On Fri, Jun 3, 2022 at 12:05 PM Thomas Munro  wrote:
> > BF animal margay (a newly started Solaris 11.4/Sparc/GCC 11.2 box) is
> > sometimes failing with:
> >
> > TRAP: FailedAssertion("seg->mapped_address != NULL", File: "dsm.c",
> > Line: 1069, PID: 9038)
>
> I spent some time on the GCC farm machine gcc211 (Sol 11.3, GCC 5.5),
> but could not repro this.  It's also not happening on wrasse (Sol
> 11.3, Sun Studio compiler).  I don't have access to a Sol 11.4
> CBE/Sparc system like margay, but I have learned that CBE is the name
> of a very recently announced rolling release intended for open source
> developers[1].  I still have no idea if the active thing here is
> Sparc, Sol 11.4, "CBE", GCC 11.2 or just timing conditions that reveal
> bugs in our dsm/dsa/dshash/pgstat code that show up here in about 1/4
> of make check runs on this stack, but miraculously nowhere else.
> Perhaps margay's owner could shed some light, or has a way to provide
> ssh access to a similar zone with a debugger etc installed?
>
> [1] 
> https://blogs.oracle.com/solaris/post/announcing-the-first-oracle-solaris-114-cbe



Looks like a timing issue to me, because it happens only sometimes.
No problems with versions 14 and 13.

I can provide ssh access to this system.





Re: margay fails assertion in stats/dsa/dsm code

2022-06-28 Thread Thomas Munro
On Fri, Jun 3, 2022 at 12:05 PM Thomas Munro  wrote:
> BF animal margay (a newly started Solaris 11.4/Sparc/GCC 11.2 box) is
> sometimes failing with:
>
> TRAP: FailedAssertion("seg->mapped_address != NULL", File: "dsm.c",
> Line: 1069, PID: 9038)

I spent some time on the GCC farm machine gcc211 (Sol 11.3, GCC 5.5),
but could not repro this.  It's also not happening on wrasse (Sol
11.3, Sun Studio compiler).  I don't have access to a Sol 11.4
CBE/Sparc system like margay, but I have learned that CBE is the name
of a very recently announced rolling release intended for open source
developers[1].  I still have no idea if the active thing here is
Sparc, Sol 11.4, "CBE", GCC 11.2 or just timing conditions that reveal
bugs in our dsm/dsa/dshash/pgstat code that show up here in about 1/4
of make check runs on this stack, but miraculously nowhere else.
Perhaps margay's owner could shed some light, or has a way to provide
ssh access to a similar zone with a debugger etc installed?

[1] 
https://blogs.oracle.com/solaris/post/announcing-the-first-oracle-solaris-114-cbe




margay fails assertion in stats/dsa/dsm code

2022-06-02 Thread Thomas Munro
Hi,

BF animal margay (a newly started Solaris 11.4/Sparc/GCC 11.2 box) is
sometimes failing with:

TRAP: FailedAssertion("seg->mapped_address != NULL", File: "dsm.c",
Line: 1069, PID: 9038)

I can't immediately see why it's doing this, but my tool that looks
for assertion failures hasn't seen that on any other system.  Example
stack (trimmed from the log); in this case it's a regular backend,
other times it was a parallel worker:

TRAP: FailedAssertion("seg->mapped_address != NULL", File: "dsm.c",
Line: 1069, PID: 3944)
ExceptionalCondition+0x64 [0x1008bb348]
dsm_segment_address+0x44 [0x1006ff7d0]
get_segment_by_index+0x7c [0x1008ee960]
dsa_get_address+0x9c [0x1008ef754]
pgstat_get_entry_ref+0x1068 [0x10075f348]
pgstat_prep_pending_entry+0x58 [0x100758424]
pgstat_assoc_relation+0x44 [0x10075b314]
_bt_first+0x9ac [0x10036cd78]
btgettuple+0x10c [0x1003653a8]
index_getnext_tid+0x4c [0x1003531c4]
index_getnext_slot+0x78 [0x100353564]
systable_getnext+0x18 [0x1003519b4]
SearchCatCacheMiss+0x74 [0x10089ce18]
SearchCatCacheInternal+0x1c0 [0x10089d0a4]
GetSysCacheOid+0x34 [0x1008b5ca4]
get_role_oid+0x18 [0x100767444]
hba_getauthmethod+0x8 [0x100599da4]
ClientAuthentication+0x1c [0x10058cb68]
InitPostgres+0xacc [0x1008d26b8]
PostgresMain+0x94 [0x1007397f8]
ServerLoop+0x1184 [0x1006739e8]
PostmasterMain+0x1400 [0x10067520c]
main+0x2e0 [0x1005a28c0]
_start+0x64 [0x1002c5c44]

I know that on Solaris we use dynamic_shared_memory_type=posix.  The other
Solaris/Sparc system is wrasse, and it's not doing this.  I don't see
it yet, but figured I'd report this much to the list in case someone
else does.