On 2013-09-24 12:19:51 -0400, Robert Haas wrote: > On Fri, Sep 20, 2013 at 7:44 AM, Andres Freund <and...@2ndquadrant.com> wrote: > >> > Hm, I guess you dont't want to add it to global/ or so because of the > >> > mmap implementation where you presumably scan the directory? > >> > >> Yes, and also because I thought this way would make it easier to teach > >> things like pg_basebackup (or anybody's home-brew scripts) to just > >> skip that directory completely. Actually, I was wondering if we ought > >> to have a directory under pgdata whose explicit charter it was to > >> contain files that shouldn't be copied as part of a base backup. > >> pg_do_not_back_this_up. > > > > Wondered exactly about that as soon as you've mentioned > > pg_basebackup. pg_local/? > > That seems reasonable. It's not totally transparent what that's > supposed to mean, but it's fairly mnemonic once you know. Other > suggestions welcome, if anyone has ideas.
pg_node_local/ was the only reasonable thing I could think of otherwise, and I disliked it because it seems we shouldn't introduce "node" as a term just for this. > Are there any other likely candidates for inclusion in that directory > other than this stuff? You could argue that pg_stat_tmp/ is one. > >> > Why are you using open() and not > >> > BasicOpenFile or even OpenTransientFile? > >> > >> Because those don't work in the postmaster. > > > > Oh, that's news to me. Seems strange, especially for BasicOpenFile. > Per its header comment, InitFileAccess is not called in the > postmaster, so there's no VFD cache. Thus, any attempt by > BasicOpenFile to call ReleaseLruFile would be pointless at best. Well, but it makes code running in both backends and postmaster easier to write. Good enough for me anyway. > >> > Imo that's a PANIC or at the very least a FATAL. > >> > >> Sure, that's a tempting option, but it doesn't seem to serve any very > >> necessary point. There's no data corruption problem if we proceed > >> here. Most likely either (1) there's a bug in the code, which > >> panicking won't fix or (2) the DBA hand-edited the state file, in > >> which case maybe he shouldn't have done that, but if he thinks the > >> best way to recover from that is a cluster-wide restart, he can do > >> that himself. > > > > "There's no data corruption problem if we proceed" - but there likely > > has been one leading to the current state. > > I doubt it. It's more likely that the file permissions got changed or > something. We panic in that case during a shutdown, don't we? ... Yep: PANIC: could not open control file "global/pg_control": Permission denied > > I am not talking about lwlocks itself being setup but an environment > > that has resource owners defined and catches errors. I am specifically > > asking because you're a) ereport()ing without releasing an LWLock b) > > unconditionally relying on the fact that there's a current resource > > owner. > > In shared_preload_libraries neither is the case afair? > I don't really feel like solving all of those problems and, TBH, I > don't see why it's particularly important. If somebody wants a > loadable module that can be loaded either from > shared_preload_libraries or on the fly, and they use dynamic shared > memory in the latter case, then they can use it in the former case as > well. If they've already got logic to create the DSM when it's first > needed, it doesn't cost extra to do it that way in both cases. Fair enough. > >> They'll continue to see the portion they have mapped, but must do > >> dsm_remap() if they want to see the whole thing. > > > > But resizing can shrink, can it not? And we do an ftruncate() in at > > least the posix shmem case. Which means the other backend will get a > > SIGSEGV accessing that memory IIRC. > Yep. Shrinking the shared memory segment will require special > caution. Caveat emptor. Then a comment to that effect would be nice. > > We're not talking about a missed munmap() but about one that failed. If > > we unpin the leaked pins and notice that we haven't actually pinned it > > anymore we do error (well, Assert) out. Same for TupleDescs. > > > > If there were valid scenarios in which you could get into that > > situation, maybe. But which would that be? ISTM we can only get there if > > our internal state is messed up. > I don't know. I think that's part of why it's hard to decide what we > want to happen. But personally I think it's paranoid to say, well, > something happened that we weren't expecting, so that must mean > something totally horrible has happened and we'd better die in a fire. Well, by that argument we wouldn't need to PANIC on a whole host of issues. Like segfaults. Anyway, I guess we need other opinions here. > >> > Why isn't the port number part of the posix shmem identifiers? Sure, we > >> > retry, but using a logic similar to sysv_shmem.c seems like a good idea. > >> > >> According to the man page for shm_open on Solaris, "For maximum > >> portability, name should include no more than 14 characters, but this > >> limit is not enforced." > > > > What about "/pgsql.%u" or something similar? That should still fit. > > Well, if you want both the port and the identifier in there, that > doesn't get you there. Port seems enough to start with - most machines are configured to only have one cluster starting on one port. That way we wouldn't always get conflicts but just if somebody does something crazy. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers