Re: [HACKERS] file-locking and postmaster.pid

On Tuesday 23 May 2006 19:36, Tom Lane wrote:
 Adis Nezirovic [EMAIL PROTECTED] writes:
  Well, maybe you could tweak postgres startup script, add check for post
  master (either 'pgrep postmaster' or 'ps -axu | grep [p]ostmaster'), and
  delete pid file on negative results.

 This is exactly what you should NOT do.

 A start script that thinks it is smarter than the postmaster is almost
 certainly wrong.  It is certainly dangerous, too, because auto-deleting
 that pidfile destroys the interlock against having two postmasters
 running in the same data directory (which WILL corrupt your data,
 quickly and irretrievably).  All it takes to cause a problem is to
 use the start script to start a postmaster, forgetting that you already
 have one running ...

My PG is not started with startup-scripts, but with this command:

pg_ctl -D $PGDATA -l $PGDIR/log/logfile-`date +%Y-%m-%d`.log start

-- 
Andreas Joseph Krogh [EMAIL PROTECTED]
Senior Software Developer / Manager
gpg public_key: http://dev.officenet.no/~andreak/public_key.asc
+-+
OfficeNet AS| The most difficult thing in the world is to |
Hoffsveien 17   | know how to do a thing and to watch |
PO. Box 425 Skøyen  | somebody else doing it wrong, without   |
0213 Oslo   | comment.|
NORWAY  | |
Phone : +47 22 13 01 00 | |
Direct: +47 22 13 10 03 | |
Mobile: +47 909  56 963 | |
+-+

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] file-locking and postmaster.pid

On Wednesday 24 May 2006 11:36, Andreas Joseph Krogh wrote:
 On Tuesday 23 May 2006 19:36, Tom Lane wrote:
  Adis Nezirovic [EMAIL PROTECTED] writes:
   Well, maybe you could tweak postgres startup script, add check for post
   master (either 'pgrep postmaster' or 'ps -axu | grep [p]ostmaster'),
   and delete pid file on negative results.
 
  This is exactly what you should NOT do.
 
  A start script that thinks it is smarter than the postmaster is almost
  certainly wrong.  It is certainly dangerous, too, because auto-deleting
  that pidfile destroys the interlock against having two postmasters
  running in the same data directory (which WILL corrupt your data,
  quickly and irretrievably).  All it takes to cause a problem is to
  use the start script to start a postmaster, forgetting that you already
  have one running ...

 My PG is not started with startup-scripts, but with this command:

 pg_ctl -D $PGDATA -l $PGDIR/log/logfile-`date +%Y-%m-%d`.log start

... and manually after login, i.e. not at boot-time.

-- 
Andreas Joseph Krogh [EMAIL PROTECTED]

Re: [HACKERS] file-locking and postmaster.pid

2006-05-24 Thread Andrej Ricnik-Bay


On 5/24/06, Andreas Joseph Krogh [EMAIL PROTECTED] wrote:
 My PG is not started with startup-scripts, but with this command:

 pg_ctl -D $PGDATA -l $PGDIR/log/logfile-`date +%Y-%m-%d`.log start

 ... and manually after login, i.e. not at boot-time.

I'd suggest trying to fix your Linux-install instead of mucking
about with Postgres, and this is really a pgsql-novice question,
not a -hackers thing.


Cheers,
Andrej


--
Please don't top post, and don't use HTML e-Mail :}  Make your quotes concise.

http://www.american.edu/econ/notes/htmlmail.htm


Re: [HACKERS] file-locking and postmaster.pid

On Wednesday 24 May 2006 21:03, korry wrote:
  I'm sure there's a good reason for having it the way it is, having so
  many smart knowledgeable people working on this project. Could someone
  please explain the rationale of the current solution to me?

 We've ignored Andreas' original question.  Why not use a lock to
 indicate that the postmaster is still running?  At first blush, that
 seems more reliable than checking for a (possibly recycled) process ID.

As Tom replied: Portability.

-- 
Andreas Joseph Krogh [EMAIL PROTECTED]

Re: [HACKERS] file-locking and postmaster.pid

On Wednesday 24 May 2006 20:52, Andrej Ricnik-Bay wrote:
 On 5/24/06, Andreas Joseph Krogh [EMAIL PROTECTED] wrote:
   My PG is not started with startup-scripts, but with this command:
  
   pg_ctl -D $PGDATA -l $PGDIR/log/logfile-`date +%Y-%m-%d`.log start
 
  ... and manually after login, i.e. not at boot-time.

 I'd suggest trying to fix your Linux-install instead of mucking
 about with Postgres, and this is really a pgsql-novice question,
 not a -hackers thing.

I'm sorry, can't resist, but this has to be *the* dumbest reply to this sort
of question. What makes you think it *only* happens when Linux freezes? (BTW,
I suspect my NVIDIA driver to be the problem on my laptop, not Linux itself.)
Still - PG *should* handle that situation too; it's like a power outage. I've
been using Linux exclusively since '96 and PG since 6.5, so I don't consider
myself a novice in either. Why PG doesn't use locking *is* definitely
a -hackers thing.

-- 
Andreas Joseph Krogh [EMAIL PROTECTED]

Re: [HACKERS] file-locking and postmaster.pid

 I'm sure there's a good reason for having it the way it is, having so many
 smart knowledgeable people working on this project. Could someone please
 explain the rationale of the current solution to me?

We've ignored Andreas' original question. Why not use a lock to indicate
that the postmaster is still running? At first blush, that seems more
reliable than checking for a (possibly recycled) process ID.
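
For reference, the kind of PID-based check being questioned here can be sketched roughly as follows. This is a minimal Python illustration, not PostgreSQL's actual code, and the helper name `pid_is_alive` is hypothetical. It probes the PID with signal 0, which delivers nothing but reports whether a process with that PID exists.

```python
import os

def pid_is_alive(pid):
    """Return True if some process currently has this PID (hypothetical helper)."""
    try:
        os.kill(pid, 0)          # signal 0: existence/permission check only
    except ProcessLookupError:   # ESRCH: no such process
        return False
    except PermissionError:      # EPERM: process exists but belongs to another user
        return True
    return True

# Caveat: True only means that *a* process has that PID right now. After a
# crash and reboot, the PID stored in a stale pid file may have been reused
# by an unrelated process, which is the "possibly recycled" case.
```

The check cannot tell a live postmaster from any other process that happens to own the same PID, so a stale pid file can produce a false "still running" answer.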


 -- Korry

Re: [HACKERS] file-locking and postmaster.pid

 On Wednesday 24 May 2006 21:03, korry wrote:
   I'm sure there's a good reason for having it the way it is, having so
   many smart knowledgeable people working on this project. Could someone
   please explain the rationale of the current solution to me?

  We've ignored Andreas' original question.  Why not use a lock to
  indicate that the postmaster is still running?  At first blush, that
  seems more reliable than checking for a (possibly recycled) process ID.

 As Tom replied: Portability.

Thanks - I missed that part of Tom's message.

The only platform (although certainly not a minor issue) that I can think of
that would have a portability issue would be Win32. You can't even read a
locked byte in Win32. I usually solve that problem by locking a byte past the
end of the file (which is portable).

Is there some other portability issue that I'm missing?


 -- Korry

Re: [HACKERS] file-locking and postmaster.pid

 Certainly on all platforms there must be *some* locking primitive.  We
 just need to figure out the appropriate parameters to fcntl() or flock()
 or lockf() on each.

Right.

 The Win32 API for locking seems mighty strange to me.

Linux/Unix byte locking is advisory (meaning that one lock can block another
lock, but it can't block a read). Win32 locking is mandatory (at least in the
most portable form) so a lock blocks a reader. To avoid that problem, you lock
a byte that you never intend to read (that is, you lock a byte past the end of
the file). Locking past the end-of-file is portable to all Unix/Linux systems
that I've seen (that way, you can lock a region of a file before you grow the
file).
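
The sentinel-byte idea described above can be sketched as follows. This is a minimal Python illustration for Unix-like systems only (it does not cover Win32), and the function name `try_lock_sentinel` is hypothetical. It assumes every cooperating process computes the same offset, so the file size must not change between attempts (or a fixed offset should be agreed on instead).

```python
import errno
import fcntl
import os

def try_lock_sentinel(fd):
    """Try to take an exclusive lock on the byte just past end-of-file.

    Returns True if the lock was acquired, False if another process holds it.
    """
    size = os.fstat(fd).st_size
    try:
        # lockf(fd, cmd, len, start, whence): lock 1 byte starting at offset `size`.
        # LOCK_NB makes the call return immediately instead of waiting.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, size, os.SEEK_SET)
    except OSError as exc:
        if exc.errno in (errno.EAGAIN, errno.EACCES):
            return False         # another process holds the sentinel byte
        raise                    # anything else: locking itself is not working
    return True
```

Because the lock is advisory and covers no byte that is ever read, other processes can still read the file contents while the lock is held. The kernel drops the lock when the holding process exits.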

 -- Korry

Re: [HACKERS] file-locking and postmaster.pid

2006-05-24 Thread Andrew Dunstan


Alvaro Herrera wrote:
 korry wrote:
  The only platform (although certainly not a minor issue) that I can
  think of that would have a portability issue would be Win32. You can't
  even read a locked byte in Win32.  I usually solve that problem by
  locking a byte past the end of the file (which is portable).

 Certainly on all platforms there must be *some* locking primitive.  We
 just need to figure out the appropriate parameters to fcntl() or flock()
 or lockf() on each.

 The Win32 API for locking seems mighty strange to me.

We use file locking on Win32 (and on all other platforms) in the
buildfarm ... it's done from perl so maybe perl does some magic under
the hood. The call looks just the same, and works fine on W32, I
believe. It is roughly:

use Fcntl qw(:flock);
open($lockfile, ">builder.LCK") || die "opening lockfile";
exit(0) unless flock($lockfile, LOCK_EX|LOCK_NB);


cheers

andrew


Re: [HACKERS] file-locking and postmaster.pid

korry wrote:

  The Win32 API for locking seems mighty strange to me.
 
 Linux/Unix byte locking is advisory (meaning that one lock can block
 another lock, but it can't block a read).

No -- it is advisory meaning that a process that does not try to acquire
the lock is not locked out.  You can certainly block a file in exclusive
mode, using the LOCK_EX flag.  (And at least on my Linux system, there
is mandatory locking too, using the fcntl() interface).

I think the next question is -- how would the lock interface be used?
We could acquire an exclusive lock on postmaster start (to make sure no
backend is running), then reduce it to a shared lock.  Every backend
would inherit the shared lock.  But the lock exchange is not guaranteed
to be atomic so a new postmaster could start just after we acquire the
lock and acquire the shared lock.  It'd need to be complemented with
another lock.

 Win32 locking is mandatory (at least in the most portable form) so a
 lock blocks a reader.

There is also shared/exclusive locking of a file on Win32.  My comment
was more directed at the fact that you have to create some sort of
lock handle from a file handle and then lock the lock handle, or
something like that.  I don't recall the exact details but it was
strange (as opposed to just open and then flock).

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: [HACKERS] file-locking and postmaster.pid

Alvaro Herrera [EMAIL PROTECTED] writes:
 Certainly on all platforms there must be *some* locking primitive.  We
 just need to figure out the appropriate parameters to fcntl() or flock()
 or lockf() on each.

Quite aside from the hassle factor of needing to deal with N variants of
the syscalls, I'm not convinced that it's guaranteed to work.  ISTR that
for instance NFS file locking is pretty much Alice-in-Wonderland :-(

Since the entire point here is to have a guaranteed bulletproof check,
locks that work most of the time on most platforms/filesystems aren't
gonna be an improvement.

regards, tom lane


Re: [HACKERS] file-locking and postmaster.pid

Andrew Dunstan wrote:

 We use file locking on Win32 (and on all other platforms)  in the 
 buildfarm ... it's done from perl so maybe perl does some magic under 
 the hood. The call looks just the same, and works fine on W32, I 
 believe. It is roughly:
 
 use Fcntl qw(:flock);
 open($lockfile, ">builder.LCK") || die "opening lockfile";
 exit(0) unless flock($lockfile, LOCK_EX|LOCK_NB);

flock on Perl is implemented using platform-dependent system calls.  Per
the docs,

   flock FILEHANDLE,OPERATION
   Calls flock(2), or an emulation of it, on FILEHANDLE.  Returns
   true for success, false on failure.  Produces a fatal error if
   used on a machine that doesn't implement flock(2), fcntl(2)
   locking, or lockf(3).  flock is Perl's portable file locking
   interface, although it locks only entire files, not records.

Note that it may fail!  This seems to indicate that some platforms do
not provide either locking mechanism.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: [HACKERS] file-locking and postmaster.pid

Alvaro Herrera wrote:

 Note that it may fail!  This seems to indicate that some platforms do
 not provide either locking mechanism.

(Which means the whole discussion is a waste of time)

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: [HACKERS] file-locking and postmaster.pid

2006-05-24 Thread Andrew Dunstan


Alvaro Herrera wrote:
 Alvaro Herrera wrote:
  Note that it may fail!  This seems to indicate that some platforms do
  not provide either locking mechanism.

 (Which means the whole discussion is a waste of time)

Umm, no, I don't think so. It will block instead of failing unless you
request a non-blocking call. Failure means someone else holds the lock.

But what Tom says about NFS is probably true, and a good enough reason
not to trust locking in general for this purpose, I think.
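
The blocking versus non-blocking distinction above can be sketched in Python, mirroring the Perl snippet from earlier in the thread. This is only an illustration under the assumption of a Unix-like system with a working flock(); the function name `acquire_or_none` is hypothetical. It separates "someone else holds the lock" from "the locking call itself failed", which are the two cases being argued about.

```python
import fcntl
import os

def acquire_or_none(path):
    """Open path and take an exclusive flock() without waiting.

    Returns the open file descriptor on success, or None if the lock is
    held through some other open of the file.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Without LOCK_NB this call would wait until the lock is free.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:      # EWOULDBLOCK: someone else holds the lock
        os.close(fd)
        return None
    except OSError:              # any other error: locking itself did not work
        os.close(fd)
        raise
    return fd                    # keep fd open for as long as the lock is needed
```

Whether the second kind of failure can occur, and whether a successful call means anything on a given filesystem such as NFS, depends on the platform and is not something this sketch can settle.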


cheers

andrew


Re: [HACKERS] file-locking and postmaster.pid

On Wed, 2006-05-24 at 16:34 -0400, Alvaro Herrera wrote:
 korry wrote:
   The Win32 API for locking seems mighty strange to me.

  Linux/Unix byte locking is advisory (meaning that one lock can block
  another lock, but it can't block a read).

 No -- it is advisory meaning that a process that does not try to acquire
 the lock is not locked out.

Right, that's why I said "can block" instead of "will block". An advisory
lock will only block another locker, not another reader (except in Win32).

 You can certainly block a file in exclusive
 mode, using the LOCK_EX flag. (And at least on my Linux system, there
 is mandatory locking too, using the fcntl() interface).

My fault - I'm not really talking about file locking, I'm talking about
byte-range locking (via lockf() and family).

I don't believe that you can use byte-range locking to block read-access to a
file, you can only use byte-range locking to block other locks.

A simple exclusive lock on the first byte past the end of the file will do.

 I think the next question is -- how would the lock interface be used?
 We could acquire an exclusive lock on postmaster start (to make sure no
 backend is running), then reduce it to a shared lock. Every backend
 would inherit the shared lock. But the lock exchange is not guaranteed
 to be atomic so a new postmaster could start just after we acquire the
 lock and acquire the shared lock. It'd need to be complemented with
 another lock.

You never need to reduce it to a shared lock. On postmaster startup, try to
lock the sentinel byte (one byte past the end-of-file). If you can lock it,
you know that no other postmaster has that byte locked. If you can't lock it,
another postmaster is running. It is an atomic operation.

However, Tom may be correct about NFS locking, but I guess I'm surprised that
anyone would care :-)

  Win32 locking is mandatory (at least in the most portable form) so a
  lock blocks a reader.

 There is also shared/exclusive locking of a file on Win32.

Yes, but Win32 shared locking only works on NTFS-type file systems. And you
don't need shared locking anyway.

-- Korry

Re: [HACKERS] file-locking and postmaster.pid

 Alvaro Herrera [EMAIL PROTECTED] writes:
  Certainly on all platforms there must be *some* locking primitive.  We
  just need to figure out the appropriate parameters to fcntl() or flock()
  or lockf() on each.

I use lockf() (not fcntl() or flock()) on every platform other than Win32. Of
course, I may not run on every system that PostgreSQL supports.

 Quite aside from the hassle factor of needing to deal with N variants of
 the syscalls, I'm not convinced that it's guaranteed to work.  ISTR that
 for instance NFS file locking is pretty much Alice-in-Wonderland :-(

 Since the entire point here is to have a guaranteed bulletproof check,
 locks that work most of the time on most platforms/filesystems aren't
 gonna be an improvement.

NFS file locking may certainly be problematic. I don't know about NFS
byte-range locking.

What we currently have in place is not bulletproof. I think holding a
byte-range lock in addition to the "is there some process with the right pid?"
check might be a little more bullet resistant :-)


 -- Korry

Re: [HACKERS] file-locking and postmaster.pid

Andrew Dunstan wrote:
 Alvaro Herrera wrote:
  Alvaro Herrera wrote:
   Note that it may fail!  This seems to indicate that some platforms do
   not provide either locking mechanism.

  (Which means the whole discussion is a waste of time)

 Umm, no, I don't think so. It will block instead of failing unless you
 request a non blocking call. Failure means someone else holds the lock.

I removed the part of the manual I had written which said that it will
raise an error if the platform it's running on doesn't have any locking
primitive.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: [HACKERS] file-locking and postmaster.pid

korry [EMAIL PROTECTED] writes:
 However, Tom may be correct about NFS locking, but I guess I'm surprised
 that anyone would care :-)

Whether we think it's a real good idea or not, *plenty* of people run
databases across NFS.  We can't blow off that set of users.

regards, tom lane


Re: [HACKERS] file-locking and postmaster.pid

korry wrote:

  I think the next question is -- how would the lock interface be used?
  We could acquire an exclusive lock on postmaster start (to make sure no
  backend is running), then reduce it to a shared lock.  Every backend
  would inherit the shared lock.  But the lock exchange is not guaranteed
  to be atomic so a new postmaster could start just after we acquire the
  lock and acquire the shared lock.  It'd need to be complemented with
  another lock.
 
 You never need to reduce it to a shared lock.  On postmaster startup,
 try to lock the sentinel byte (one byte past the end-of-file).  If you
 can lock it, you know that no other postmaster has that byte locked.  If
 you can't lock it, another postmaster is running. It is an atomic
 operation. 

This doesn't work if the postmaster dies but a backend continues to run,
which is arguably the most important case we need to protect against.

 However, Tom may be correct about NFS locking, but I guess I'm surprised
 that anyone would care :-)

Quite a lot of people run NFS-mounted data directories ...

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: [HACKERS] file-locking and postmaster.pid

  You never need to reduce it to a shared lock.  On postmaster startup,
  try to lock the sentinel byte (one byte past the end-of-file).  If you
  can lock it, you know that no other postmaster has that byte locked.  If
  you can't lock it, another postmaster is running. It is an atomic
  operation.

 This doesn't work if the postmaster dies but a backend continues to run,
 which is arguably the most important case we need to protect against.

I may be confused here, but I don't see the problem - byte-range locks are
not inherited across a fork. A backend would never hold the lock, a backend
would never even look for the lock.

  However, Tom may be correct about NFS locking, but I guess I'm surprised
  that anyone would care :-)

 Quite a lot of people run NFS-mounted data directories ...

I'm happy to take your word for that, and I agree that if NFS is important and
locking is brain-dead on NFS, then relying solely on a lock is unacceptable.


 -- Korry

Re: [HACKERS] file-locking and postmaster.pid

korry [EMAIL PROTECTED] writes:
  Well, it fails in the safe direction: the postmaster may occasionally
  refuse to start when it should, but it won't ever start when it should
  not.  It appears to me that anything relying on file locking will tend
  to fail in the other direction, and that's not acceptable IMHO.

 I was suggesting that we keep the current check in place too - if the
 lock exists, another postmaster must be running, if the lock doesn't
 exist, check the pid.

But then you've not accomplished anything.  The complaints about the
pid-based mechanism are about false positives, not false negatives.
Adding an independent check won't eliminate the false positives.

 How about a semaphore with a SEM_UNDO?  That's guaranteed atomic (or it
 better be :-), the kernel automatically cleans up after a failure, if
 the mechanism fails, it fails in the safe direction (the kernel may not
 have cleaned up the semaphore before a new postmaster starts).  And, I
 think it would be reasonably portable - I haven't carefully eyeballed
 the Win32 semaphore code so I don't know if it supports SEM_UNDO.

We already have two platforms that don't use the SysV semaphore
interface, and even on ones that have it, I wouldn't want to assume they
all support SEM_UNDO.

But aside from any portability issues, ISTM this would have its own
failure modes.  In particular you still have to rely on a pid-file
(only now it's holding a semaphore ID not a PID), and there's still
a bit of a leap of faith required to get from the observation that
somebody is holding a lock on semaphore X to the conclusion that that
somebody is a conflicting postmaster.  It doesn't look to me like this
is any better than the PID solution, really, as far as false positives
go.  As for false negatives: ipcrm.

regards, tom lane


Re: [HACKERS] file-locking and postmaster.pid

korry wrote:
   You never need to reduce it to a shared lock.  On postmaster startup,
   try to lock the sentinel byte (one byte past the end-of-file).  If you
   can lock it, you know that no other postmaster has that byte locked.  If
   you can't lock it, another postmaster is running. It is an atomic
   operation. 
  
  This doesn't work if the postmaster dies but a backend continues to run,
  which is arguably the most important case we need to protect against.
 
 I may be confused here, but I don't see the problem - byte-range locks
 are not inherited across a fork.  A backend would never hold the lock, a
 backend would never even look for the lock.

Well, you are wrong here.  We _want_ every backend to hold a shared
lock.  We need to stop a postmaster from starting if there is a backend
running that was started by a no-longer-running postmaster.
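
The orphaned-backend case can be sketched with flock(), whose locks belong to the open file description and are therefore shared with children across fork() (unlike fcntl()/lockf() byte-range locks, which a forked child does not inherit). This is a Python illustration under the assumption of Linux/BSD flock() semantics on a local filesystem. The names `spawn_backend` and `directory_in_use` are hypothetical, and the child process is only a stand-in for a backend.

```python
import fcntl
import os

def spawn_backend(lock_fd):
    """Fork a stand-in 'backend' that inherits lock_fd and its flock() lock.

    Returns (pid, release_fd); closing release_fd lets the child exit.
    """
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(w)
        os.read(r, 1)    # blocks until the parent closes its write end
        os._exit(0)      # exiting closes the inherited lock_fd
    os.close(r)
    return pid, w

def directory_in_use(lock_path):
    """Return True if any process still holds a flock() on lock_path."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        return True      # a shared (or exclusive) lock is still held
    else:
        return False
    finally:
        os.close(fd)     # closing also drops the lock if we obtained it
```

In this sketch a "postmaster" would take `LOCK_SH` on the descriptor before calling `spawn_backend`. The lock then stays in force after the postmaster closes its descriptor or exits, for as long as any child keeps the inherited descriptor open. The sketch does not address the non-atomic exclusive-to-shared conversion raised earlier in the thread, nor NFS.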

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: [HACKERS] file-locking and postmaster.pid

 We already have two platforms that don't use the SysV semaphore
 interface, and even on ones that have it, I wouldn't want to assume they
 all support SEM_UNDO.

Which platforms, just out of curiosity? I assume that Win32 is one of them.

 But aside from any portability issues, ISTM this would have its own
 failure modes.  In particular you still have to rely on a pid-file
 (only now it's holding a semaphore ID not a PID)

You've lost me... why would you store the semid and not the pid? I was
thinking that the semid might be a postgresql.conf thingie.

 and there's still
 a bit of a leap of faith required to get from the observation that
 somebody is holding a lock on semaphore X to the conclusion that that
 somebody is a conflicting postmaster.

Isn't that sort of like saying that if a postmaster.pid file exists, it must
have been written by a postmaster? Pick a semaphore id and dedicate it to
postmaster exclusion.

 It doesn't look to me like this
 is any better than the PID solution, really, as far as false positives
 go.

As long as the kernel cleans up SEM_UNDO semaphores, I guess I don't see how
you would have a false positive. Oh, I guess I should say that if you use a
SEM_UNDO semaphore, you don't need the pid check anymore. And, no worry about
NFS.

 As for false negatives: ipcrm.

Yes, that's a problem, but I think it's the same as rm postmaster.pid, isn't it?

Re: [HACKERS] file-locking and postmaster.pid