Other things I perhaps ought to mention: Trying to stop the postmaster using pg_ctl fails (unsurprisingly, since pg_ctl relies on /var/pgsql/data/postmaster.pid, which contains a nonexistent PID); I haven't tried to start a new postmaster yet, because the old backends are hanging around.

Nor have I attempted to restart the web server, which might let the lingering backends exit by closing the old connections it's holding to them. I'm tempted to go ahead and do this, though I'm not sure whether I ought to until I've diagnosed what's going on right now.

In case it's relevant, I've gone back through the logs and discovered that for the past week or so I've been occasionally running out of connections (I was running w/ the default of 16) and getting 'FATAL: Non-superuser connection limit exceeded' errors (about a dozen a day), but I can't find any other related messages in the logs.

If anyone has any suggestions, I'd really appreciate your input; I'm getting a bit antsy since my production database server is basically halfway down and users are wondering why their web pages don't work ...

-Charlie


Charles Hornberger wrote:
I've got what looks like a really strange situation on my hands (or else I've got a normal situation that I'm looking at strangely): It appears that the main postmaster process is dead & gone, but I have a bunch of backends still running.

I can't connect to the database server any more, but a bunch of old persistent connections (which are about four days old and which I think are being kept alive by my web server) are still up & running; at least some of them are serving data to web pages.

To wit:

[rhodes] data/$ /usr/ucb/ps axuw | grep post
postgres 9238 0.2 1.4 8664 5104 ? S Jun 13 3:13 /its/software/bin/postmaster
postgres 9268 0.1 1.4 8672 5144 ? S Jun 13 3:26 /its/software/bin/postmaster
postgres 8920 0.1 0.6 2480 2024 pts/0 R 11:08:26 0:00 bash
postgres 9237 0.1 1.4 8664 5104 ? S Jun 13 3:01 /its/software/bin/postmaster
root 5411 0.0 0.4 1904 1448 ? S Jun 09 0:00 /software/stow/postfix-2.0.10/libexec/postfix/master
postfix 5413 0.0 0.4 1992 1528 ? S Jun 09 0:00 qmgr -l -t fifo -u
postfix 8857 0.0 0.4 1960 1552 ? S 11:03:14 0:00 pickup -l -t fifo -u
postgres 9236 0.0 1.4 8664 5120 ? S Jun 13 3:12 /its/software/bin/postmaster
postgres 9243 0.0 1.5 8720 5584 ? S Jun 13 3:06 /its/software/bin/postmaster
postgres 9254 0.0 1.4 8656 5128 ? S Jun 13 3:22 /its/software/bin/postmaster
postgres 9278 0.0 1.4 8664 5192 ? S Jun 13 3:08 /its/software/bin/postmaster
postgres 9333 0.0 1.5 8672 5312 ? S Jun 13 3:33 /its/software/bin/postmaster
postgres 9379 0.0 1.4 8720 5176 ? S Jun 13 3:08 /its/software/bin/postmaster
postgres 9431 0.0 1.4 8672 5112 ? S Jun 13 3:18 /its/software/bin/postmaster
postgres 9877 0.0 0.0 2480 ? pts/0 R 11:47:15 0:00 bash


The file /var/pgsql/data/postmaster.pid claims that the postmaster's PID is 27215; there's no process with that PID running on my system.
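For anyone wanting to repeat the stale-PID check described above, here is a minimal sketch. It assumes a POSIX-style shell such as bash, and that the first line of postmaster.pid holds the postmaster's PID. The function name pid_file_status is made up for illustration and is not part of PostgreSQL. It should be run as the postgres user or root, since kill -0 also fails when the caller lacks permission to signal the process. An "alive" result only means some process has that PID, which could be an unrelated one if the PID was reused.

```shell
# Sketch only: report whether the PID recorded in a PID file is still running.
# pid_file_status is a hypothetical helper name, not a PostgreSQL tool.
pid_file_status() {
    pidfile=$1
    pid=""

    if [ ! -r "$pidfile" ]; then
        echo "missing"
        return 0
    fi

    # Assumption: the first line of the file is the PID.
    read -r pid < "$pidfile" || :

    # Reject anything that is not a plain positive integer.
    case $pid in
        ''|*[!0-9]*)
            echo "unreadable"
            return 0
            ;;
    esac

    # kill -0 delivers no signal; it only tests whether the process
    # exists and can be signalled by the current user.
    if kill -0 "$pid" 2>/dev/null; then
        echo "alive"
    else
        echo "stale"
    fi
}
```

Calling pid_file_status /var/pgsql/data/postmaster.pid prints one of missing, unreadable, alive, or stale. In the situation described here it would be expected to print stale.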

Whenever I try to create a new connection, it fails:

[rhodes] data/$ psql template1
psql: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/tmp/.s.PGSQL.5432"?
[rhodes] data/$ psql -h localhost template1
psql: could not connect to server: Connection refused
        Is the server running on host localhost and accepting
        TCP/IP connections on port 5432?

Any ideas on what I should do now? I'm running 7.3.2 on Solaris 7.

-Charlie


-- Charles Hornberger Caltech Division of the Humanities and Social Sciences M/C 228-77 Tel (626) 395-3474


