On Fri, Jul 1, 2022 at 4:02 AM Robert Haas <robertmh...@gmail.com> wrote:
> On Wed, Jun 29, 2022 at 12:01 AM Thomas Munro <thomas.mu...@gmail.com> wrote:
> > -               if (errno != EEXIST)
> > +               if (op == DSM_OP_ATTACH || errno != EEXIST)
> >                         ereport(elevel,
> >                                         (errcode_for_dynamic_shared_memory(),
> >                                          errmsg("could not open shared memory segment \"%s\": %m",
> >
> > margay would probably still fail until that underlying problem is
> > addressed, but less mysteriously on our side at least.
>
> That seems like a correct fix, but maybe we should also be checking
> the return value of dsm_impl_op() e.g. define dsm_impl_op_error() as
> an inline function that does if (!dsm_impl_op(..., ERROR)) elog(ERROR,
> "the author of dsm.c is not as clever as he thinks he is").
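
For readers following the quoted diff, here is a minimal standalone sketch of the decision it encodes. It is not the dsm_impl.c code: the enum and function names are hypothetical, and it assumes (as the diff suggests) that EEXIST on a create is an expected name collision that the caller handles quietly, while any failure on attach should be reported.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the operation kinds; not PostgreSQL's enum. */
typedef enum
{
	SKETCH_OP_CREATE,
	SKETCH_OP_ATTACH
} sketch_dsm_op;

/*
 * Decide whether a failed open of a shared memory segment should be
 * reported as an error.
 *
 * Assumption: when creating, EEXIST only means the chosen name is taken,
 * so the failure is returned quietly.  When attaching, no failure is
 * expected to be benign, so even EEXIST is reported.
 */
static bool
sketch_should_report_open_failure(sketch_dsm_op op, int saved_errno)
{
	return op == SKETCH_OP_ATTACH || saved_errno != EEXIST;
}
```

With the old condition (errno != EEXIST alone), an attach that failed with EEXIST would have returned failure without any error being raised.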

Thanks.  Also the mmap and sysv paths do something similar, so I also
made the same change there just on principle.  I didn't make the extra
belt-and-braces check you suggested for now, preferring minimalism.  I
think the author of dsm.c was pretty clever, it's just that the world
turned out to be more hostile than expected, in one very specific way.

Pushed.

So that should get us to a state where margay still fails
occasionally, but now with an ERROR rather than a crash.

Next up, I confirmed my theory about what's happening on closed
Solaris by tracing syscalls.  It is indeed that clunky sleep(1) code
that gives up after 64 tries.  Even in pre-shmem-stats releases that
don't contend enough to reach the bogus EEXIST error, I'm pretty sure
people must be getting random sleeps injected into their parallel
queries in the wild by this code.  I have concluded that that
implementation of shm_open() is not really usable for our purposes.
We'll have to change *something* to turn margay reliably green, not to
mention bogus error reports we can expect from 15 in the wild, and
performance woes that I cannot now unsee.

So... I think we should select a different default
dynamic_shared_memory_type in initdb.c if defined(__sun__).  Which is
the least terrible?  For sysv, it looks like all the relevant sysctls
that used to be required to use sysv memory became obsolete/automatic
in Sol 10 (note: Sol 9 is long EOL'd), so it should just work AFAICT,
whereas for mmap mode your shared memory data is likely to cause file
I/O because we put the temporary files in your data directory.  I'm
thinking perhaps we should default to dynamic_shared_memory_type=sysv
for 15+.  I don't really want to change it in the back branches, since
nobody has actually complained about "posix" performance and it might
upset someone if we change it for newly initdb'd DBs in a major
release series.  But I'm not an expert or even user of this OS, I'm
just trying to fix the build farm; better ideas welcome.

Thoughts?
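
To make the proposal concrete, a compile-time default keyed on __sun__ could look roughly like the sketch below. This is only an illustration, not the initdb.c code: the macro and function names are hypothetical, and treating "posix" as the default everywhere else is a simplifying assumption.

```c
#include <string.h>

/*
 * Hypothetical sketch: pick the default dynamic_shared_memory_type at
 * compile time.  On Solaris (where the compiler defines __sun__) prefer
 * "sysv"; elsewhere this sketch simply assumes "posix".
 */
#if defined(__sun__)
#define SKETCH_DEFAULT_DSM_TYPE "sysv"
#else
#define SKETCH_DEFAULT_DSM_TYPE "posix"
#endif

/* Return the name that would be written as the default setting. */
static const char *
sketch_default_dsm_type(void)
{
	return SKETCH_DEFAULT_DSM_TYPE;
}
```

A change of this shape would only affect newly initdb'd clusters; an existing cluster keeps whatever dynamic_shared_memory_type is already in its postgresql.conf.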