Re: [HACKERS] FATAL: bogus data in lock file postmaster.pid:

2012-08-29 Thread Bruce Momjian
On Wed, Aug 29, 2012 at 12:56:26AM -0400, Tom Lane wrote:
 Bruce Momjian br...@momjian.us writes:
  On Wed, Aug 29, 2012 at 12:24:26AM -0400, Alvaro Herrera wrote:
  It's a pretty strange line wrap you got in this version of the patch.
  Normally we just let the string run past the 78 char limit, without
  cutting it in any way.  And moving the start of the string to the line
  following errhint( looks very odd to me.
 
  OK, updated patch attached.
 
 I agree with Alvaro's complaint that moving the whole string literal to
 the next line isn't conformant to our usual coding style.  Definitely
 nitpicky, but why would you do it like that?

I remember now why I added \n.  I am used to writing pg_upgrade output
strings, which are obviously not sent to log files.  Seems I forgot that
distinction.  As far as moving the string to the next line, I was trying
to keep the line from getting too long.  

Updated patch has everyone on the same line.  I am fine with nitpicky. 
Frankly, I have applied so many patches in the past few weeks, I am glad
someone is watching.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
new file mode 100644
index 775d71f..9a0f92c
*** a/src/backend/utils/init/miscinit.c
--- b/src/backend/utils/init/miscinit.c
*** CreateLockFile(const char *filename, boo
*** 766,771 
--- 766,779 
  			filename)));
  		close(fd);
  
+ 		if (len == 0)
+ 		{
+ 			ereport(FATAL,
+ 	(errcode(ERRCODE_LOCK_FILE_EXISTS),
+ 	 errmsg(lock file \%s\ is empty, filename),
+ 	 errhint(Either another server is starting, or the lock file is the remnant of a previous server startup crash.)));
+ 		}
+ 
  		buffer[len] = '\0';
  		encoded_pid = atoi(buffer);
  
diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
new file mode 100644
index af8d8b2..81ba39e
*** a/src/bin/pg_ctl/pg_ctl.c
--- b/src/bin/pg_ctl/pg_ctl.c
*** get_pgpid(void)
*** 292,299 
  	}
  	if (fscanf(pidf, %ld, pid) != 1)
  	{
! 		write_stderr(_(%s: invalid data in PID file \%s\\n),
! 	 progname, pid_file);
  		exit(1);
  	}
  	fclose(pidf);
--- 292,304 
  	}
  	if (fscanf(pidf, %ld, pid) != 1)
  	{
! 		/* Is the file empty? */
! 		if (ftell(pidf) == 0  feof(pidf))
! 			write_stderr(_(%s: the PID file \%s\ is empty\n),
! 		 progname, pid_file);
! 		else
! 			write_stderr(_(%s: invalid data in PID file \%s\\n),
! 		 progname, pid_file);
  		exit(1);
  	}
  	fclose(pidf);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FATAL: bogus data in lock file postmaster.pid:

2012-08-29 Thread Bruce Momjian

Applied.

---

On Wed, Aug 29, 2012 at 08:51:40AM -0400, Bruce Momjian wrote:
 On Wed, Aug 29, 2012 at 12:56:26AM -0400, Tom Lane wrote:
  Bruce Momjian br...@momjian.us writes:
   On Wed, Aug 29, 2012 at 12:24:26AM -0400, Alvaro Herrera wrote:
   It's a pretty strange line wrap you got in this version of the patch.
   Normally we just let the string run past the 78 char limit, without
   cutting it in any way.  And moving the start of the string to the line
   following errhint( looks very odd to me.
  
   OK, updated patch attached.
  
  I agree with Alvaro's complaint that moving the whole string literal to
  the next line isn't conformant to our usual coding style.  Definitely
  nitpicky, but why would you do it like that?
 
 I remember now why I added \n.  I am used to writing pg_upgrade output
 strings, which are obviously not sent to log files.  Seems I forgot that
 distinction.  As far as moving the string to the next line, I was trying
 to keep the line from getting too long.  
 
 Updated patch has everyone on the same line.  I am fine with nitpicky. 
 Frankly, I have applied so many patches in the past few weeks, I am glad
 someone is watching.
 
 -- 
   Bruce Momjian  br...@momjian.ushttp://momjian.us
   EnterpriseDB http://enterprisedb.com
 
   + It's impossible for everything to be true. +

 diff --git a/src/backend/utils/init/miscinit.c 
 b/src/backend/utils/init/miscinit.c
 new file mode 100644
 index 775d71f..9a0f92c
 *** a/src/backend/utils/init/miscinit.c
 --- b/src/backend/utils/init/miscinit.c
 *** CreateLockFile(const char *filename, boo
 *** 766,771 
 --- 766,779 
   filename)));
   close(fd);
   
 + if (len == 0)
 + {
 + ereport(FATAL,
 + (errcode(ERRCODE_LOCK_FILE_EXISTS),
 +  errmsg(lock file \%s\ is empty, 
 filename),
 +  errhint(Either another server is 
 starting, or the lock file is the remnant of a previous server startup 
 crash.)));
 + }
 + 
   buffer[len] = '\0';
   encoded_pid = atoi(buffer);
   
 diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
 new file mode 100644
 index af8d8b2..81ba39e
 *** a/src/bin/pg_ctl/pg_ctl.c
 --- b/src/bin/pg_ctl/pg_ctl.c
 *** get_pgpid(void)
 *** 292,299 
   }
   if (fscanf(pidf, %ld, pid) != 1)
   {
 ! write_stderr(_(%s: invalid data in PID file \%s\\n),
 !  progname, pid_file);
   exit(1);
   }
   fclose(pidf);
 --- 292,304 
   }
   if (fscanf(pidf, %ld, pid) != 1)
   {
 ! /* Is the file empty? */
 ! if (ftell(pidf) == 0  feof(pidf))
 ! write_stderr(_(%s: the PID file \%s\ is empty\n),
 !  progname, pid_file);
 ! else
 ! write_stderr(_(%s: invalid data in PID file \%s\\n),
 !  progname, pid_file);
   exit(1);
   }
   fclose(pidf);

 
 -- 
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FATAL: bogus data in lock file postmaster.pid:

2012-08-28 Thread Bruce Momjian
On Mon, Aug 27, 2012 at 10:17:43PM -0400, Bruce Momjian wrote:
 Seems pg_ctl would also need some cleanup if we change the error
 message and/or timing.
 
 I am thinking we should just change the error message in the postmaster
 and pg_ctl to say the file is empty, and call it done (no hint message).
 If we do want a hint, say that either the file is stale from a crash or
 another postmaster is starting up, and let the user diagnose it.

Updated patch attached which just reports the file as empty.  I assume
we don't want the extra text output for pg_ctl like we do for the
backend.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
new file mode 100644
index 775d71f..9f0a58a
*** a/src/backend/utils/init/miscinit.c
--- b/src/backend/utils/init/miscinit.c
*** CreateLockFile(const char *filename, boo
*** 766,771 
--- 766,781 
  			filename)));
  		close(fd);
  
+ 		if (len == 0)
+ 		{
+ 			ereport(FATAL,
+ 	(errcode(ERRCODE_LOCK_FILE_EXISTS),
+ 	 errmsg(lock file \%s\ is empty, filename),
+ 	 errhint(
+ 	 Either another server is starting, or the lock file is the remnant\n
+ 	 of a previous server startup crash.)));
+ 		}
+ 
  		buffer[len] = '\0';
  		encoded_pid = atoi(buffer);
  
diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
new file mode 100644
index af8d8b2..0be0f2d
*** a/src/bin/pg_ctl/pg_ctl.c
--- b/src/bin/pg_ctl/pg_ctl.c
*** get_pgpid(void)
*** 292,299 
  	}
  	if (fscanf(pidf, %ld, pid) != 1)
  	{
! 		write_stderr(_(%s: invalid data in PID file \%s\\n),
  	 progname, pid_file);
  		exit(1);
  	}
  	fclose(pidf);
--- 292,304 
  	}
  	if (fscanf(pidf, %ld, pid) != 1)
  	{
! 		/* Is the file empty? */
! 		if (feof(pidf))
! 			write_stderr(_(%s: the PID file \%s\ is empty\n),
  	 progname, pid_file);
+ 		else
+ 			write_stderr(_(%s: invalid data in PID file \%s\\n),
+ 		 progname, pid_file);
  		exit(1);
  	}
  	fclose(pidf);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FATAL: bogus data in lock file postmaster.pid:

2012-08-28 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 Updated patch attached which just reports the file as empty.  I assume
 we don't want the extra text output for pg_ctl like we do for the
 backend.

The backend side of this looks mostly sane to me (but drop the \n,
messages are not supposed to contain those).  But the feof test proposed
for pg_ctl is no good: consider a file containing just, say, -.
fscanf would eat the -, then hit eof, and this would complain the file
is empty.  Possibly checking for ftell(pidf) == 0 would do, though I'm
not sure whether it's portable to assume fscanf would eat a non-numeric
character before complaining.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FATAL: bogus data in lock file postmaster.pid:

2012-08-28 Thread Bruce Momjian
On Tue, Aug 28, 2012 at 04:25:36PM -0400, Tom Lane wrote:
 Bruce Momjian br...@momjian.us writes:
  Updated patch attached which just reports the file as empty.  I assume
  we don't want the extra text output for pg_ctl like we do for the
  backend.
 
 The backend side of this looks mostly sane to me (but drop the \n,
 messages are not supposed to contain those).  But the feof test proposed

Removed.  I thought we needed to add \n so that strings 80 would wrap
properly.  How do we handle this?

 for pg_ctl is no good: consider a file containing just, say, -.
 fscanf would eat the -, then hit eof, and this would complain the file
 is empty.  Possibly checking for ftell(pidf) == 0 would do, though I'm
 not sure whether it's portable to assume fscanf would eat a non-numeric
 character before complaining.

ftell() seems to work fine when combined with feof(), so I used that in
the attached patch.  ftell() alone remains at zero if the file contains
A, so feof() is also needed.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
new file mode 100644
index 775d71f..eadfcbf
*** a/src/backend/utils/init/miscinit.c
--- b/src/backend/utils/init/miscinit.c
*** CreateLockFile(const char *filename, boo
*** 766,771 
--- 766,781 
  			filename)));
  		close(fd);
  
+ 		if (len == 0)
+ 		{
+ 			ereport(FATAL,
+ 	(errcode(ERRCODE_LOCK_FILE_EXISTS),
+ 	 errmsg(lock file \%s\ is empty, filename),
+ 	 errhint(
+ 	 Either another server is starting, or the lock file is the remnant 
+ 	 of a previous server startup crash.)));
+ 		}
+ 
  		buffer[len] = '\0';
  		encoded_pid = atoi(buffer);
  
diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
new file mode 100644
index af8d8b2..81ba39e
*** a/src/bin/pg_ctl/pg_ctl.c
--- b/src/bin/pg_ctl/pg_ctl.c
*** get_pgpid(void)
*** 292,299 
  	}
  	if (fscanf(pidf, %ld, pid) != 1)
  	{
! 		write_stderr(_(%s: invalid data in PID file \%s\\n),
! 	 progname, pid_file);
  		exit(1);
  	}
  	fclose(pidf);
--- 292,304 
  	}
  	if (fscanf(pidf, %ld, pid) != 1)
  	{
! 		/* Is the file empty? */
! 		if (ftell(pidf) == 0  feof(pidf))
! 			write_stderr(_(%s: the PID file \%s\ is empty\n),
! 		 progname, pid_file);
! 		else
! 			write_stderr(_(%s: invalid data in PID file \%s\\n),
! 		 progname, pid_file);
  		exit(1);
  	}
  	fclose(pidf);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FATAL: bogus data in lock file postmaster.pid:

2012-08-28 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 On Tue, Aug 28, 2012 at 04:25:36PM -0400, Tom Lane wrote:
 The backend side of this looks mostly sane to me (but drop the \n,
 messages are not supposed to contain those).  But the feof test proposed

 Removed.  I thought we needed to add \n so that strings 80 would wrap
 properly.  How do we handle this?

We don't.  Per the message style guidelines, it's the responsibility of
a client frontend to line-wrap such text if it feels the need to.  The
backend has no business assuming that 80 characters (or any other
number) is where to wrap.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FATAL: bogus data in lock file postmaster.pid:

2012-08-28 Thread Alvaro Herrera
Excerpts from Bruce Momjian's message of mar ago 28 22:21:27 -0400 2012:
 On Tue, Aug 28, 2012 at 04:25:36PM -0400, Tom Lane wrote:
  Bruce Momjian br...@momjian.us writes:
   Updated patch attached which just reports the file as empty.  I assume
   we don't want the extra text output for pg_ctl like we do for the
   backend.
  
  The backend side of this looks mostly sane to me (but drop the \n,
  messages are not supposed to contain those).  But the feof test proposed
 
 Removed.

It's a pretty strange line wrap you got in this version of the patch.
Normally we just let the string run past the 78 char limit, without
cutting it in any way.  And moving the start of the string to the line
following errhint( looks very odd to me.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FATAL: bogus data in lock file postmaster.pid:

2012-08-28 Thread Bruce Momjian
On Wed, Aug 29, 2012 at 12:24:26AM -0400, Alvaro Herrera wrote:
 Excerpts from Bruce Momjian's message of mar ago 28 22:21:27 -0400 2012:
  On Tue, Aug 28, 2012 at 04:25:36PM -0400, Tom Lane wrote:
   Bruce Momjian br...@momjian.us writes:
Updated patch attached which just reports the file as empty.  I assume
we don't want the extra text output for pg_ctl like we do for the
backend.
   
   The backend side of this looks mostly sane to me (but drop the \n,
   messages are not supposed to contain those).  But the feof test proposed
  
  Removed.
 
 It's a pretty strange line wrap you got in this version of the patch.
 Normally we just let the string run past the 78 char limit, without
 cutting it in any way.  And moving the start of the string to the line
 following errhint( looks very odd to me.

OK, updated patch attached.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
new file mode 100644
index 775d71f..efd5152
*** a/src/backend/utils/init/miscinit.c
--- b/src/backend/utils/init/miscinit.c
*** CreateLockFile(const char *filename, boo
*** 766,771 
--- 766,780 
  			filename)));
  		close(fd);
  
+ 		if (len == 0)
+ 		{
+ 			ereport(FATAL,
+ 	(errcode(ERRCODE_LOCK_FILE_EXISTS),
+ 	 errmsg(lock file \%s\ is empty, filename),
+ 	 errhint(
+ 	 Either another server is starting, or the lock file is the remnant of a previous server startup crash.)));
+ 		}
+ 
  		buffer[len] = '\0';
  		encoded_pid = atoi(buffer);
  
diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
new file mode 100644
index af8d8b2..81ba39e
*** a/src/bin/pg_ctl/pg_ctl.c
--- b/src/bin/pg_ctl/pg_ctl.c
*** get_pgpid(void)
*** 292,299 
  	}
  	if (fscanf(pidf, %ld, pid) != 1)
  	{
! 		write_stderr(_(%s: invalid data in PID file \%s\\n),
! 	 progname, pid_file);
  		exit(1);
  	}
  	fclose(pidf);
--- 292,304 
  	}
  	if (fscanf(pidf, %ld, pid) != 1)
  	{
! 		/* Is the file empty? */
! 		if (ftell(pidf) == 0  feof(pidf))
! 			write_stderr(_(%s: the PID file \%s\ is empty\n),
! 		 progname, pid_file);
! 		else
! 			write_stderr(_(%s: invalid data in PID file \%s\\n),
! 		 progname, pid_file);
  		exit(1);
  	}
  	fclose(pidf);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FATAL: bogus data in lock file postmaster.pid:

2012-08-28 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 On Wed, Aug 29, 2012 at 12:24:26AM -0400, Alvaro Herrera wrote:
 It's a pretty strange line wrap you got in this version of the patch.
 Normally we just let the string run past the 78 char limit, without
 cutting it in any way.  And moving the start of the string to the line
 following errhint( looks very odd to me.

 OK, updated patch attached.

I agree with Alvaro's complaint that moving the whole string literal to
the next line isn't conformant to our usual coding style.  Definitely
nitpicky, but why would you do it like that?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FATAL: bogus data in lock file postmaster.pid:

2012-08-27 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 I have developed the attached patch to report a zero-length file, as you
 suggested.

DIRECTORY_LOCK_FILE is entirely incorrect there.

Taking a step back, I don't think this message is much better than the
existing behavior of reporting bogus data.  Either way, it's not
obvious to typical users what the problem is or what to do about it.
If we're going to emit a special message I think it should be more user
friendly than this.

Perhaps something like:

FATAL: lock file foo is empty
HINT: This may mean that another postmaster was starting at the
same time.  If not, remove the lock file and try again.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FATAL: bogus data in lock file postmaster.pid:

2012-08-27 Thread Robert Haas
On Mon, Aug 27, 2012 at 4:29 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Bruce Momjian br...@momjian.us writes:
 I have developed the attached patch to report a zero-length file, as you
 suggested.

 DIRECTORY_LOCK_FILE is entirely incorrect there.

 Taking a step back, I don't think this message is much better than the
 existing behavior of reporting bogus data.  Either way, it's not
 obvious to typical users what the problem is or what to do about it.
 If we're going to emit a special message I think it should be more user
 friendly than this.

 Perhaps something like:

 FATAL: lock file foo is empty
 HINT: This may mean that another postmaster was starting at the
 same time.  If not, remove the lock file and try again.

The problem with this is that it gives the customer only one remedy,
which they will (if experience is any guide) try whether it is
actually correct to do so or not.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FATAL: bogus data in lock file postmaster.pid:

2012-08-27 Thread Alvaro Herrera
Excerpts from Robert Haas's message of lun ago 27 18:02:06 -0400 2012:
 On Mon, Aug 27, 2012 at 4:29 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Bruce Momjian br...@momjian.us writes:
  I have developed the attached patch to report a zero-length file, as you
  suggested.
 
  DIRECTORY_LOCK_FILE is entirely incorrect there.
 
  Taking a step back, I don't think this message is much better than the
  existing behavior of reporting bogus data.  Either way, it's not
  obvious to typical users what the problem is or what to do about it.
  If we're going to emit a special message I think it should be more user
  friendly than this.
 
  Perhaps something like:
 
  FATAL: lock file foo is empty
  HINT: This may mean that another postmaster was starting at the
  same time.  If not, remove the lock file and try again.
 
 The problem with this is that it gives the customer only one remedy,
 which they will (if experience is any guide) try whether it is
 actually correct to do so or not.

How about having it sleep for a short while, then try again?  I would
expect that it would cause the second postmaster to fail during the
second try, which is okay because the first one is then operational.
The problem, of course, is how long to sleep so that this doesn't fail
when load is high enough that the first postmaster still hasn't written
the file after the sleep.

Maybe

LOG:  lock file foo is empty, sleeping to retry
-- sleep 100ms and recheck
LOG:  lock file foo is empty, sleeping to retry
-- sleep, dunno, 1s, recheck
LOG:  lock file foo is empty, sleeping to retry
-- sleep maybe 5s? recheck
FATAL:  lock file foo is empty
HINT:  Is another postmaster running on data directory bar?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FATAL: bogus data in lock file postmaster.pid:

2012-08-27 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Mon, Aug 27, 2012 at 4:29 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Perhaps something like:
 
 FATAL: lock file foo is empty
 HINT: This may mean that another postmaster was starting at the
 same time.  If not, remove the lock file and try again.

 The problem with this is that it gives the customer only one remedy,
 which they will (if experience is any guide) try whether it is
 actually correct to do so or not.

Which other remedy(s) do you think the hint should suggest?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FATAL: bogus data in lock file postmaster.pid:

2012-08-27 Thread Tom Lane
Alvaro Herrera alvhe...@2ndquadrant.com writes:
 How about having it sleep for a short while, then try again?

I could get behind that, but I don't think the delay should be more than
100ms or so.  It's important for the postmaster to acquire the lock (or
not) pretty quickly, or pg_ctl is going to get confused.  If we keep it
short, we can also dispense with the log spam you were suggesting.

(Actually, I wonder if this type of scenario isn't going to confuse
pg_ctl already --- it might think the lockfile belongs to the postmaster
*it* started, not some pre-existing one.  Does that matter?)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FATAL: bogus data in lock file postmaster.pid:

2012-08-27 Thread Bruce Momjian
On Mon, Aug 27, 2012 at 07:39:35PM -0400, Tom Lane wrote:
 Alvaro Herrera alvhe...@2ndquadrant.com writes:
  How about having it sleep for a short while, then try again?
 
 I could get behind that, but I don't think the delay should be more than
 100ms or so.  It's important for the postmaster to acquire the lock (or
 not) pretty quickly, or pg_ctl is going to get confused.  If we keep it
 short, we can also dispense with the log spam you were suggesting.
 
 (Actually, I wonder if this type of scenario isn't going to confuse
 pg_ctl already --- it might think the lockfile belongs to the postmaster
 *it* started, not some pre-existing one.  Does that matter?)

I took Alvaro's approach of a sleep.  The file test was already in a
loop that went 100 times.  Basically, if the lock file exists, this
postmaster isn't going to succeed, so I figured there is no reason to
rush in the testing.  I gave it 5 tries with one second between
attempts.  Either the file is being populated, or it is stale and empty.

I checked pg_ctl and that has a default wait of 60 second, so 5 seconds
to exit out of the postmaster should be fine.

Patch attached.

FYI, I noticed we have a similar 5-second creation time requirement in
pg_ctl:

/*
 * The postmaster should create postmaster.pid very soon after being
 * started.  If it's not there after we've waited 5 or more seconds,
 * assume startup failed and give up waiting.  (Note this covers both
 * cases where the pidfile was never created, and where it was created
 * and then removed during postmaster exit.)  Also, if there *is* a
 * file there but it appears stale, issue a suitable warning and give
 * up waiting.
 */
if (i = 5)

This is for the case where the file has an old pid, rather than it is
empty.

FYI, I fixed the filename problem Tom found.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
new file mode 100644
index 775d71f..0309494
*** a/src/backend/utils/init/miscinit.c
--- b/src/backend/utils/init/miscinit.c
*** CreateLockFile(const char *filename, boo
*** 766,771 
--- 766,793 
  			filename)));
  		close(fd);
  
+ 		if (len == 0)
+ 		{
+ 			/*
+ 			 *	An empty lock file exits;  either is it from another postmaster
+ 			 *	that is still starting up, or left from a crash.  Check for
+ 			 *	five seconds, then if it still empty, it must be from a crash,
+ 			 *	so fail and recommend lock file removal.
+ 			 */
+ 			if (ntries  5)
+ 			{
+ sleep(1);
+ continue;
+ 			}
+ 			else
+ ereport(FATAL,
+ 		(errcode(ERRCODE_LOCK_FILE_EXISTS),
+ 		 errmsg(lock file \%s\ is empty, filename),
+ 		 errhint(
+ 		Empty lock file probably left from operating system crash during\n
+ 		database startup;  file deletion suggested.)));
+ 		}
+ 
  		buffer[len] = '\0';
  		encoded_pid = atoi(buffer);
  

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FATAL: bogus data in lock file postmaster.pid:

2012-08-27 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 On Mon, Aug 27, 2012 at 07:39:35PM -0400, Tom Lane wrote:
 I could get behind that, but I don't think the delay should be more than
 100ms or so.

 I took Alvaro's approach of a sleep.  The file test was already in a
 loop that went 100 times.  Basically, if the lock file exists, this
 postmaster isn't going to succeed, so I figured there is no reason to
 rush in the testing.  I gave it 5 tries with one second between
 attempts.  Either the file is being populated, or it is stale and empty.

How did 100ms translate to 5 seconds?

 I checked pg_ctl and that has a default wait of 60 second, so 5 seconds
 to exit out of the postmaster should be fine.

pg_ctl is not the only consideration here.  In particular, there are a
lot of initscripts out there (all of Red Hat's, for instance) that don't
use pg_ctl and expect the postmaster to come up (or not) in a couple of
seconds.

I don't see a need for more than about one retry with 100ms delay.
There is no evidence that the case we're worried about has ever occurred
in the real world anyway, so slowing down error failures to make really
really really sure there's not a competing postmaster doesn't seem like
a good tradeoff.

I'm not terribly impressed with that errhint, either.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FATAL: bogus data in lock file postmaster.pid:

2012-08-27 Thread Bruce Momjian
On Mon, Aug 27, 2012 at 09:59:10PM -0400, Tom Lane wrote:
 Bruce Momjian br...@momjian.us writes:
  On Mon, Aug 27, 2012 at 07:39:35PM -0400, Tom Lane wrote:
  I could get behind that, but I don't think the delay should be more than
  100ms or so.
 
  I took Alvaro's approach of a sleep.  The file test was already in a
  loop that went 100 times.  Basically, if the lock file exists, this
  postmaster isn't going to succeed, so I figured there is no reason to
  rush in the testing.  I gave it 5 tries with one second between
  attempts.  Either the file is being populated, or it is stale and empty.
 
 How did 100ms translate to 5 seconds?

That was the no need to rush, let's just be sure of what we report.

  I checked pg_ctl and that has a default wait of 60 second, so 5 seconds
  to exit out of the postmaster should be fine.
 
 pg_ctl is not the only consideration here.  In particular, there are a
 lot of initscripts out there (all of Red Hat's, for instance) that don't
 use pg_ctl and expect the postmaster to come up (or not) in a couple of
 seconds.
 
 I don't see a need for more than about one retry with 100ms delay.
 There is no evidence that the case we're worried about has ever occurred
 in the real world anyway, so slowing down error failures to make really
 really really sure there's not a competing postmaster doesn't seem like
 a good tradeoff.
 
 I'm not terribly impressed with that errhint, either.

I am concerned at 100ms that we can't be sure if it is still being
created, and if we can't be sure, I am not sure there is much point in
trying to clarify the odd error message we omit.

FYI, here is what the code does now with a zero-length pid file, with my
patch:

$ postmaster
[ wait 5 seconds ]
FATAL:  lock file postmaster.pid is empty
HINT:  Empty lock file probably left from operating system crash during
database startup;  file deletion suggested.
$ pg_ctl start
pg_ctl: invalid data in PID file /u/pgsql/data/postmaster.pid
$ pg_ctl -w start
pg_ctl: invalid data in PID file /u/pgsql/data/postmaster.pid

Seems pg_ctl would also need some cleanup if we change the error
message and/or timing.

I am thinking we should just change the error message in the postmaster
and pg_ctl to say the file is empty, and call it done (no hint message).
If we do want a hint, say that either the file is stale from a crash or
another postmaster is starting up, and let the user diagnose it.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FATAL: bogus data in lock file postmaster.pid:

2012-08-26 Thread Bruce Momjian
On Fri, Jan  6, 2012 at 08:28:48AM -0500, Michael Beattie wrote:
 On Fri, Jan 6, 2012 at 6:13 AM, Magnus Hagander mag...@hagander.net wrote:
 
 On Thu, Jan 5, 2012 at 23:19, Tom Lane t...@sss.pgh.pa.us wrote:
  Magnus Hagander mag...@hagander.net writes:
  On Thu, Jan 5, 2012 at 17:13, Tom Lane t...@sss.pgh.pa.us wrote:
  I think link(2) would create race conditions of its own.  I'd be
  inclined to suggest that maybe we should just special-case a zero
 length
  postmaster.pid file as meaning okay to proceed.  In general, garbage
 
  That's pretty much what I meant - but with a warning message.
 
  Actually ... wait a minute.  If we allow that, don't we create a race
  condition between two postmasters starting at almost the same instant?
  The second one could see the lock file when the first has created but
  not yet filled it.
 
 Good point, yeah, it should do that. But I still think it's rare
 enough that just special-casing the error message should be enough...
 
 
 
 so just something that does like
 
 stat(filename, st);
 size = st.st_size;
 if (size == 0)
elog(ERROR, lock file \%s\ has 0 length., filename);
 
 somewhere in CreateLockFile in miscinit.c? 

I have developed the attached patch to report a zero-length file, as you
suggested.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
new file mode 100644
index 775d71f..7b7c141
*** a/src/backend/utils/init/miscinit.c
--- b/src/backend/utils/init/miscinit.c
*** CreateLockFile(const char *filename, boo
*** 765,770 
--- 765,772 
  	 errmsg(could not read lock file \%s\: %m,
  			filename)));
  		close(fd);
+ 		if (len == 0)
+ 			elog(FATAL, lock file \%s\ is empty, DIRECTORY_LOCK_FILE);
  
  		buffer[len] = '\0';
  		encoded_pid = atoi(buffer);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FATAL: bogus data in lock file postmaster.pid:

2012-01-06 Thread Magnus Hagander
On Thu, Jan 5, 2012 at 23:19, Tom Lane t...@sss.pgh.pa.us wrote:
 Magnus Hagander mag...@hagander.net writes:
 On Thu, Jan 5, 2012 at 17:13, Tom Lane t...@sss.pgh.pa.us wrote:
 I think link(2) would create race conditions of its own.  I'd be
 inclined to suggest that maybe we should just special-case a zero length
 postmaster.pid file as meaning okay to proceed.  In general, garbage

 That's pretty much what I meant - but with a warning message.

 Actually ... wait a minute.  If we allow that, don't we create a race
 condition between two postmasters starting at almost the same instant?
 The second one could see the lock file when the first has created but
 not yet filled it.

Good point, yeah, it should do that. But I still think it's rare
enough that just special-casing the error message should be enough...

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FATAL: bogus data in lock file postmaster.pid:

2012-01-06 Thread Michael Beattie
On Fri, Jan 6, 2012 at 6:13 AM, Magnus Hagander mag...@hagander.net wrote:

 On Thu, Jan 5, 2012 at 23:19, Tom Lane t...@sss.pgh.pa.us wrote:
  Magnus Hagander mag...@hagander.net writes:
  On Thu, Jan 5, 2012 at 17:13, Tom Lane t...@sss.pgh.pa.us wrote:
  I think link(2) would create race conditions of its own.  I'd be
  inclined to suggest that maybe we should just special-case a zero
 length
  postmaster.pid file as meaning okay to proceed.  In general, garbage
 
  That's pretty much what I meant - but with a warning message.
 
  Actually ... wait a minute.  If we allow that, don't we create a race
  condition between two postmasters starting at almost the same instant?
  The second one could see the lock file when the first has created but
  not yet filled it.

 Good point, yeah, it should do that. But I still think it's rare
 enough that just special-casing the error message should be enough...


so just something that does like

stat(filename, st);
size = st.st_size;
if (size == 0)
   elog(ERROR, lock file \%s\ has 0 length., filename);

somewhere in CreateLockFile in miscinit.c?


[HACKERS] FATAL: bogus data in lock file postmaster.pid:

2012-01-05 Thread Heikki Linnakangas
My laptop ran out of battery and turned itself off while I was just 
starting up postmaster. After plugging in the charger and rebooting, I 
got the following error when I tried to restart PostgreSQL:


FATAL:  bogus data in lock file postmaster.pid: 

postmaster.pid file was present in the data directory, but had zero 
length. Looking at the way the file is created and written, that can 
happen if you crash after the file is created, but before it's 
written/fsync'd (my laptop might have write-cache enabled, which would 
make the window larger).


I was a bit surprised by that. That's probably not a big deal in 
practice, but I wonder if there was some easy way to avoid that. First I 
thought we could create the new postmaster.pid file with a temporary 
name and rename it in place, but rename(2) will merrily overwrite any 
existing file which is not what we want. We could use link(2), I guess.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FATAL: bogus data in lock file postmaster.pid:

2012-01-05 Thread Magnus Hagander
On Thu, Jan 5, 2012 at 14:18, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 My laptop ran out of battery and turned itself off while I was just starting
 up postmaster. After plugging in the charger and rebooting, I got the
 following error when I tried to restart PostgreSQL:

 FATAL:  bogus data in lock file postmaster.pid: 

 postmaster.pid file was present in the data directory, but had zero length.
 Looking at the way the file is created and written, that can happen if you
 crash after the file is created, but before it's written/fsync'd (my laptop
 might have write-cache enabled, which would make the window larger).

 I was a bit surprised by that. That's probably not a big deal in practice,
 but I wonder if there was some easy way to avoid that. First I thought we
 could create the new postmaster.pid file with a temporary name and rename it
 in place, but rename(2) will merrily overwrite any existing file which is
 not what we want. We could use link(2), I guess.

Is this really a problem big enough to spend even that much effort on?

Perhaps a special-case in the error message instead is enough?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FATAL: bogus data in lock file postmaster.pid:

2012-01-05 Thread Robert Haas
On Thu, Jan 5, 2012 at 8:23 AM, Magnus Hagander mag...@hagander.net wrote:
 On Thu, Jan 5, 2012 at 14:18, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 My laptop ran out of battery and turned itself off while I was just starting
 up postmaster. After plugging in the charger and rebooting, I got the
 following error when I tried to restart PostgreSQL:

 FATAL:  bogus data in lock file postmaster.pid: 

 postmaster.pid file was present in the data directory, but had zero length.
 Looking at the way the file is created and written, that can happen if you
 crash after the file is created, but before it's written/fsync'd (my laptop
 might have write-cache enabled, which would make the window larger).

 I was a bit surprised by that. That's probably not a big deal in practice,
 but I wonder if there was some easy way to avoid that. First I thought we
 could create the new postmaster.pid file with a temporary name and rename it
 in place, but rename(2) will merrily overwrite any existing file which is
 not what we want. We could use link(2), I guess.

 Is this really a problem big enough to spend even that much effort on?

 Perhaps a special-case in the error message instead is enough?

On a laptop it's not a big deal, but it sort of sucks if your
production database fails to start automatically after an unexpected
power outage.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FATAL: bogus data in lock file postmaster.pid:

2012-01-05 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 My laptop ran out of battery and turned itself off while I was just 
 starting up postmaster. After plugging in the charger and rebooting, I 
 got the following error when I tried to restart PostgreSQL:

 FATAL:  bogus data in lock file postmaster.pid: 

 postmaster.pid file was present in the data directory, but had zero 
 length. Looking at the way the file is created and written, that can 
 happen if you crash after the file is created, but before it's 
 written/fsync'd (my laptop might have write-cache enabled, which would 
 make the window larger).

 I was a bit surprised by that. That's probably not a big deal in 
 practice, but I wonder if there was some easy way to avoid that. First I 
 thought we could create the new postmaster.pid file with a temporary 
 name and rename it in place, but rename(2) will merrily overwrite any 
 existing file which is not what we want. We could use link(2), I guess.

I think link(2) would create race conditions of its own.  I'd be
inclined to suggest that maybe we should just special-case a zero length
postmaster.pid file as meaning okay to proceed.  In general, garbage
data in postmaster.pid is something I'm happy to insist on manual
recovery from, but maybe we could safely make an exception for empty
files.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FATAL: bogus data in lock file postmaster.pid:

2012-01-05 Thread Magnus Hagander
On Thu, Jan 5, 2012 at 17:13, Tom Lane t...@sss.pgh.pa.us wrote:
 Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 My laptop ran out of battery and turned itself off while I was just
 starting up postmaster. After plugging in the charger and rebooting, I
 got the following error when I tried to restart PostgreSQL:

 FATAL:  bogus data in lock file postmaster.pid: 

 postmaster.pid file was present in the data directory, but had zero
 length. Looking at the way the file is created and written, that can
 happen if you crash after the file is created, but before it's
 written/fsync'd (my laptop might have write-cache enabled, which would
 make the window larger).

 I was a bit surprised by that. That's probably not a big deal in
 practice, but I wonder if there was some easy way to avoid that. First I
 thought we could create the new postmaster.pid file with a temporary
 name and rename it in place, but rename(2) will merrily overwrite any
 existing file which is not what we want. We could use link(2), I guess.

 I think link(2) would create race conditions of its own.  I'd be
 inclined to suggest that maybe we should just special-case a zero length
 postmaster.pid file as meaning okay to proceed.  In general, garbage

That's pretty much what I meant - but with a warning message.

 data in postmaster.pid is something I'm happy to insist on manual
 recovery from, but maybe we could safely make an exception for empty
 files.

+1.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FATAL: bogus data in lock file postmaster.pid:

2012-01-05 Thread Tom Lane
Magnus Hagander mag...@hagander.net writes:
 On Thu, Jan 5, 2012 at 17:13, Tom Lane t...@sss.pgh.pa.us wrote:
 I think link(2) would create race conditions of its own.  I'd be
 inclined to suggest that maybe we should just special-case a zero length
 postmaster.pid file as meaning okay to proceed.  In general, garbage

 That's pretty much what I meant - but with a warning message.

Actually ... wait a minute.  If we allow that, don't we create a race
condition between two postmasters starting at almost the same instant?
The second one could see the lock file when the first has created but
not yet filled it.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers