[HACKERS] Re: [GENERAL] Warning: Don't delete those /tmp/.PGSQL.* files
"Joel Burton" [EMAIL PROTECTED] writes:
> On 25 Nov 2000, at 17:35, Tom Lane wrote:
>> Ugh. The reason that removing the socket file allowed a second postmaster to start up is that we use an advisory lock on the socket file as the interlock that prevents two PMs on the same port number. Remove the socket file, poof, no interlock. *However*, there is a second line of defense to prevent two postmasters in the same directory, and I don't understand why that didn't trigger. Unless you are running a version old enough to not have it. What PG version is this, anyway?
> 7.1devel, from about 1 week ago.

Ah, I see why the data-directory interlock file wasn't helping: it wasn't checked until *after* shared memory was set up (read: clobbered :-(). This was not a very bright choice. I'm still surprised that the shared-memory reset should've trashed your database so thoroughly, though.

Over the past two days I've committed changes that should make the data directory, socket file, and shared memory interlocks considerably more robust. In particular, mechanically doing "rm -f /tmp/.s.PGSQL.5432" should never be necessary anymore.

Sorry about your trouble...

BTW, your original message mentioned something about a recursive view definition that wasn't being recognized as such. Could you provide details on that?

regards, tom lane
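The reason deleting the socket file defeats the port interlock is that an advisory lock is attached to the open file (its inode), not to the path name: once the path is unlinked, a new process can create a fresh file with the same name and lock it without conflict. The following is a minimal C sketch of that failure, assuming POSIX fcntl() record locks on an ordinary file as a stand-in for the socket file; the helper names are hypothetical and this is not PostgreSQL's actual code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Take an exclusive, non-blocking fcntl() write lock on the whole of
 * `path`, creating the file if needed. Returns an fd that holds the lock,
 * or -1 if another process already holds it (or open() failed). */
int try_lock_file(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    struct flock fl = {0};
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;               /* 0 means "to end of file": the whole file */
    if (fcntl(fd, F_SETLK, &fl) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* fcntl() locks belong to a process, so a second "postmaster" has to be a
 * separate process. Fork one, let it try the lock, and report the result:
 * 1 = it got the lock (no interlock), 0 = refused, -1 = fork/wait error. */
int other_process_can_lock(const char *path)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(try_lock_file(path) >= 0 ? 1 : 0);  /* exiting drops any lock */

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

In use: lock the file in one process and a second process is refused; after unlink() of the path, the second process creates a new file under the same name and locks it successfully, which is the situation that let a second postmaster start.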
[HACKERS] Re: [GENERAL] Warning: Don't delete those /tmp/.PGSQL.* files
> Ah, I see why the data-directory interlock file wasn't helping: it wasn't checked until *after* shared memory was set up (read: clobbered :-(). This was not a very bright choice. I'm still surprised that the shared-memory reset should've trashed your database so thoroughly, though. Over the past two days I've committed changes that should make the data directory, socket file, and shared memory interlocks considerably more robust. In particular, mechanically doing "rm -f /tmp/.s.PGSQL.5432" should never be necessary anymore.

That's fantastic. Thanks for the quick fix.

> BTW, your original message mentioned something about a recursive view definition that wasn't being recognized as such. Could you provide details on that?

I can't. It was a few weeks ago, the database has been in furious development, and, of course, I didn't bother to save all those views that crashed my server. I keep trying to re-create it, but can't figure it out. I'm sorry.

I think it wasn't just two views pointing at each other (it would, of course, be next to impossible to even create those, unless you hand-tweaked the system tables), but I think it was a view-relies-on-a-function-relies-on-a-view kind of problem. If I ever see it again, I'll save it.

Thanks!

--
Joel Burton, Director of Information Systems -*- [EMAIL PROTECTED]
Support Center of Washington (www.scw.org)
[HACKERS] Re: [GENERAL] Warning: Don't delete those /tmp/.PGSQL.* files
"Joel Burton" [EMAIL PROTECTED] writes:
> I think it wasn't just two views pointing at each other (it would, of course, be next to impossible to even create those, unless you hand-tweaked the system tables), but I think it was a view-relies-on-a-function-relies-on-a-view kind of problem.

Oh, OK. I wouldn't expect the rewriter to realize that that sort of situation is recursive. Depending on what your function is doing, it might or might not be an infinite recursion, so I don't think I'd want the system arbitrarily preventing you from doing this sort of thing.

Perhaps there should be an upper bound on function-call recursion depth enforced someplace? Not sure.

regards, tom lane
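An upper bound on recursion depth sidesteps the question of whether a view/function cycle is "really" infinite: finite recursion runs normally, and only runaway recursion is cut off. A minimal C sketch of such a depth counter, under the assumption that every view expansion or function call passes through one entry/exit pair; `MAX_CALL_DEPTH`, `enter_call`, `expand_view` and `eval_function` are hypothetical names, and the view and function are simulated rather than taken from PostgreSQL.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical limit; a real system would presumably make it configurable. */
#define MAX_CALL_DEPTH 100

static int call_depth = 0;

/* Called on entry to every view expansion / function call. Returns false
 * instead of descending further once the limit has been reached. */
static bool enter_call(void)
{
    if (call_depth >= MAX_CALL_DEPTH)
        return false;
    call_depth++;
    return true;
}

static void leave_call(void)
{
    assert(call_depth > 0);
    call_depth--;
}

static bool eval_function(int remaining);

/* Stand-in for "a view that calls a function". Returns false if the depth
 * limit was hit anywhere below it (where a real system would raise an error). */
bool expand_view(int remaining)
{
    if (!enter_call())
        return false;
    bool ok = eval_function(remaining);
    leave_call();
    return ok;
}

/* Stand-in for "a function that queries the view again". It stops on its
 * own when `remaining` reaches 0; a negative value never reaches 0, which
 * models the infinite case. */
static bool eval_function(int remaining)
{
    if (remaining == 0)
        return true;
    return expand_view(remaining - 1);
}
```

expand_view(n) nests n + 1 expansions, so any cycle that terminates within the limit succeeds, while one that never terminates fails after MAX_CALL_DEPTH levels instead of exhausting the stack. The trade-off Tom hints at remains: a legitimately deep but finite recursion past the limit is also rejected.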
Re: [HACKERS] Re: [GENERAL] Warning: Don't delete those /tmp/.PGSQL.* files
On Sat, Nov 25, 2000 at 07:41:52PM -0500, Tom Lane wrote:
> Peter Eisentraut [EMAIL PROTECTED] writes:
>> Actually, this turns out to be similar to what you wrote in http://www.postgresql.org/mhonarc/pgsql-hackers/1998-08/msg00835.html
> Well, we've talked before about moving the socket files to someplace safer than /tmp. The problem is to find another place that's not platform-dependent --- else you've got a major configuration headache.

Could this be described in, e.g., /etc/postgresql/pg_client.conf, a la the dbname idea? I can't remember the exact terminology, but the idea is a configuration file for clients, whose location is set at compile time, in which the connection parameters for clients are set:

    [db_foo]
    type=inet
    host=srv3.devel.net
    port=1234
    # there should be a way of specifying dbname later too
    database=asdf

    [db_baz]
    type=unix
    socket=/var/lib/postgres/comm/db_baz

It should also be possible to specify another configuration file via environment variables or command-line parameters. Well, just an idea.

--
marko
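Marko's proposal amounts to a client resolving a name such as db_foo to connection parameters by looking it up in an INI-style file. A minimal C sketch of that lookup, assuming the `[section]` / `key=value` / `#comment` layout from his example; `conf_lookup` is a hypothetical name, and the file contents are passed in as a string to keep the example self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Look up `key` in section `[section]` of INI-style `text`. On success,
 * copies the value into `out` (size `outlen`) and returns true. Lines whose
 * first character is '#' are comments; lines longer than 255 characters are
 * truncated. No whitespace trimming is done in this sketch. */
bool conf_lookup(const char *text, const char *section, const char *key,
                 char *out, size_t outlen)
{
    assert(text && section && key && out);
    bool in_section = false;
    const char *line = text;

    while (*line) {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t) (end - line) : strlen(line);
        char buf[256];
        if (len >= sizeof buf)
            len = sizeof buf - 1;
        memcpy(buf, line, len);
        buf[len] = '\0';

        if (buf[0] == '[') {
            char *close = strchr(buf, ']');
            if (close) {
                *close = '\0';
                in_section = strcmp(buf + 1, section) == 0;
            }
        } else if (in_section && buf[0] != '#') {
            char *eq = strchr(buf, '=');
            if (eq) {
                *eq = '\0';
                if (strcmp(buf, key) == 0) {
                    snprintf(out, outlen, "%s", eq + 1);
                    return true;
                }
            }
        }
        if (!end)
            break;
        line = end + 1;
    }
    return false;
}
```

A client would first read `type`, then fetch `host`/`port` for `inet` entries or `socket` for `unix` entries, so the socket location is no longer hard-wired to /tmp.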
[HACKERS] Re: [GENERAL] Warning: Don't delete those /tmp/.PGSQL.* files
* Tom Lane [EMAIL PROTECTED] [001125 16:37]:
> "Joel Burton" [EMAIL PROTECTED] writes:
> This story does indicate that we need a less fragile interlock against starting two postmasters on one database. I have to admit that it hadn't occurred to me that you could break the port-number interlock so easily as that :-(. But obviously you can, so we need a different way of representing the interlock. Hackers, any thoughts?

How about a .pid/.port/.??? file in the /data directory, and a lock on that?

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 972-414-9812 E-Mail: [EMAIL PROTECTED]
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749
Re: [HACKERS] Re: [GENERAL] Warning: Don't delete those /tmp/.PGSQL.* files
Tom Lane writes:
> There is a related issue on my todo list, though --- didn't we find out awhile back that some older Linux kernels crash and burn if one attempts to get an advisory lock on a socket file? (See thread 7/6/00) Were we going to fix that, and if so how? Or will we just tell people that they have to update their kernel to run Postgres? The current configure script "works around" this by disabling the advisory lock on *all* versions of Linux, which I regard as a completely unacceptable solution...

Firstly, AFAIK there's no official production kernel that fixes this. When and if it gets fixed, we can change that logic. I have a simple test program that exhibits the problem (taken from the kernel mailing list), but:

a) You shouldn't run test programs in configure.
b) You really shouldn't run test programs in configure that set up networking connections.
c) You definitely shouldn't run test programs in configure that provoke kernel exceptions.

We could use flock() on Linux, though.

Maybe we could name the socket file .s.PGSQL.port.pid and make .s.PGSQL.port a symlink. Then you can find out whether the postmaster that created the file is still running. (You could even put the actual socket file into the data directory, although that would require re-thinking the file permissions on the latter.)

Actually, this turns out to be similar to what you wrote in http://www.postgresql.org/mhonarc/pgsql-hackers/1998-08/msg00835.html

We really should be fixing the IPC interlock with IPC_EXCL, but the code changes look to be non-trivial.

--
Peter Eisentraut [EMAIL PROTECTED] http://yi.org/peter-e/
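Peter's symlink scheme works because the PID travels in the link target's name: reading the link recovers the PID, and kill() with signal 0 tests whether that process still exists without actually sending a signal. A minimal C sketch, assuming the naming `.s.PGSQL.<port>` pointing at `.s.PGSQL.<port>.<pid>` as proposed; `pid_from_socket_link` and `process_is_running` are hypothetical helper names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Read the symlink `link_path` (e.g. ".s.PGSQL.5432") and extract the PID
 * from the trailing ".<pid>" of its target (e.g. ".s.PGSQL.5432.1234").
 * Returns the PID, or -1 if the link or its target name is unusable. */
pid_t pid_from_socket_link(const char *link_path)
{
    assert(link_path != NULL);
    char target[1024];
    ssize_t n = readlink(link_path, target, sizeof target - 1);
    if (n < 0)
        return -1;
    target[n] = '\0';               /* readlink() does not NUL-terminate */

    char *dot = strrchr(target, '.');
    if (dot == NULL || dot[1] == '\0')
        return -1;
    char *end;
    long pid = strtol(dot + 1, &end, 10);
    if (*end != '\0' || pid <= 0)
        return -1;
    return (pid_t) pid;
}

/* kill(pid, 0) performs the existence/permission check but sends nothing.
 * EPERM means the process exists but belongs to another user. */
bool process_is_running(pid_t pid)
{
    if (kill(pid, 0) == 0)
        return true;
    return errno == EPERM;
}
```

A starting postmaster that finds the link would refuse to start if `process_is_running()` reports the recorded PID as alive, and could otherwise treat the socket as stale. PID reuse by an unrelated process can still give a false "running" answer, so this check errs on the side of refusing to start.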
Re: [HACKERS] Re: [GENERAL] Warning: Don't delete those /tmp/.PGSQL.* files
Peter Eisentraut [EMAIL PROTECTED] writes:
> Maybe we could name the socket file .s.PGSQL.port.pid and make .s.PGSQL.port a symlink. Then you can find out whether the postmaster that created the file is still running.

Or just create a lockfile /tmp/.s.PGSQL.port#.lock, i.e., the same name as the socket file with ".lock" added (containing the postmaster's PID). Then we could share code with the data-directory-lockfile case.

> Actually, this turns out to be similar to what you wrote in http://www.postgresql.org/mhonarc/pgsql-hackers/1998-08/msg00835.html

Well, we've talked before about moving the socket files to someplace safer than /tmp. The problem is to find another place that's not platform-dependent --- else you've got a major configuration headache.

> We really should be fixing the IPC interlock with IPC_EXCL, but the code changes look to be non-trivial.

AFAIR from the previous thread, it wasn't that bad; it was just a matter of someone taking the time to do it. Maybe I'll have a go at it...

regards, tom lane
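Tom's lockfile variant needs no advisory locking at all: creating the file with O_CREAT | O_EXCL is atomic, so only one process can succeed, and the PID stored inside lets a later postmaster recognize a file left behind by a crashed one. A minimal C sketch under those assumptions, in which one routine serves both the socket lockfile and a data-directory lockfile as Tom suggests; `create_lockfile` and `socket_lockfile_name` are hypothetical names, not PostgreSQL's.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* "/tmp/.s.PGSQL.5432" -> "/tmp/.s.PGSQL.5432.lock". Returns 0, or -1 if
 * the result does not fit in `out`. */
int socket_lockfile_name(char *out, size_t outlen, const char *socket_path)
{
    int n = snprintf(out, outlen, "%s.lock", socket_path);
    return (n >= 0 && (size_t) n < outlen) ? 0 : -1;
}

/* Atomically create `path` and write our PID into it. Returns 0 on success,
 * -1 if a live process owns the file (or on error). A file whose recorded
 * owner no longer exists is treated as stale, removed, and retried once. */
int create_lockfile(const char *path)
{
    assert(path != NULL);
    for (int attempt = 0; attempt < 2; attempt++) {
        int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
        if (fd >= 0) {
            char buf[32];
            int len = snprintf(buf, sizeof buf, "%ld\n", (long) getpid());
            ssize_t w = write(fd, buf, (size_t) len);
            close(fd);
            if (w != len) {
                unlink(path);
                return -1;
            }
            return 0;
        }
        if (errno != EEXIST)
            return -1;

        /* Someone created it first: find out who. */
        FILE *f = fopen(path, "r");
        if (f == NULL)
            return -1;
        long owner = 0;
        int got = fscanf(f, "%ld", &owner);
        fclose(f);
        if (got != 1 || owner <= 0)
            return -1;                      /* unreadable: be conservative */
        if (kill((pid_t) owner, 0) == 0 || errno == EPERM)
            return -1;                      /* owner is still alive */
        unlink(path);                       /* stale: owner is gone, retry */
    }
    return -1;
}
```

The same `create_lockfile()` call would guard both the socket name and a lockfile inside the data directory, so deleting the socket no longer leaves the data directory unprotected. The stat-then-unlink step for stale files has a small race between two simultaneously starting postmasters; the sketch does not address it.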