Re: [HACKERS] file-locking and postmaster.pid

2006-05-25 Thread korry







 That's not workable, unless you want to assume that nothing on the
 system except Postgres uses SysV semaphores.  Otherwise something else
 could randomly gobble up the semid you want to use.  I don't care very
 much for requiring a distinct semid to be hand-specified for each
 postmaster on a machine, either.



Yeah, that does suck. Ok, naming problems seem to make semaphores useless.

I'm back to byte-range locking, but if NFS is important and is truly unreliable, then that's out too.

I've never had locking problems on NFS (probably because we tell our users not to use NFS), but now that I think about it, SMB locking is very unreliable so Win32 would be an issue too.

 -- Korry





Re: [HACKERS] file-locking and postmaster.pid

2006-05-25 Thread Andreas Joseph Krogh
On Thursday 25 May 2006 14:35, korry wrote:
  That's not workable, unless you want to assume that nothing on the
  system except Postgres uses SysV semaphores.  Otherwise something else
  could randomly gobble up the semid you want to use.  I don't care very
  much for requiring a distinct semid to be hand-specified for each
  postmaster on a machine, either.

 Yeah, that does suck.  Ok, naming problems seem to make semaphores
 useless.

 I'm back to byte-range locking, but if NFS is important and is truly
 unreliable, then that's out too.

 I've never had locking problems on NFS (probably because we tell our
 users not to use NFS), but now that I think about it, SMB locking is
 very unreliable so Win32 would be an issue too.

What I don't get is why everybody thinks that because one solution doesn't fit
all needs on all platforms (or NFS), it shouldn't be implemented on those
platforms it *does* work on. Why can't those platforms (like Linux) benefit
from a better solution, if one exists? There are plenty of examples of
software providing better solutions on platforms supporting more features.

-- 
Andreas Joseph Krogh [EMAIL PROTECTED]
Senior Software Developer / Manager
gpg public_key: http://dev.officenet.no/~andreak/public_key.asc
+-+
OfficeNet AS| The most difficult thing in the world is to |
Hoffsveien 17   | know how to do a thing and to watch |
PO. Box 425 Skøyen  | somebody else doing it wrong, without   |
0213 Oslo   | comment.|
NORWAY  | |
Phone : +47 22 13 01 00 | |
Direct: +47 22 13 10 03 | |
Mobile: +47 909  56 963 | |
+-+

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] file-locking and postmaster.pid

2006-05-25 Thread Tom Lane
Andreas Joseph Krogh [EMAIL PROTECTED] writes:
 What I don't get is why everybody thinks that because one solution doesn't fit
 all needs on all platforms (or NFS), it shouldn't be implemented on those
 platforms it *does* work on.

(1) Because we're not really interested in supporting multiple fundamentally
different approaches to postmaster interlocking.  The system is
complicated enough already.

(2) Because according to discussion so far, we can't rely on this solution
anywhere.  Postgres can't easily tell whether its data directory is
mounted over NFS, for example.

regards, tom lane



Re: [HACKERS] file-locking and postmaster.pid

2006-05-24 Thread Andreas Joseph Krogh
On Tuesday 23 May 2006 19:36, Tom Lane wrote:
 Adis Nezirovic [EMAIL PROTECTED] writes:
  Well, maybe you could tweak postgres startup script, add check for post
  master (either 'pgrep postmaster' or 'ps -axu | grep [p]ostmaster'), and
  delete pid file on negative results.

 This is exactly what you should NOT do.

 A start script that thinks it is smarter than the postmaster is almost
 certainly wrong.  It is certainly dangerous, too, because auto-deleting
 that pidfile destroys the interlock against having two postmasters
 running in the same data directory (which WILL corrupt your data,
 quickly and irretrievably).  All it takes to cause a problem is to
 use the start script to start a postmaster, forgetting that you already
 have one running ...

My PG is not started with startup-scripts, but with this command:

pg_ctl -D $PGDATA -l $PGDIR/log/logfile-`date +%Y-%m-%d`.log start



Re: [HACKERS] file-locking and postmaster.pid

2006-05-24 Thread Andreas Joseph Krogh
On Wednesday 24 May 2006 11:36, Andreas Joseph Krogh wrote:
 On Tuesday 23 May 2006 19:36, Tom Lane wrote:
  Adis Nezirovic [EMAIL PROTECTED] writes:
   Well, maybe you could tweak postgres startup script, add check for post
   master (either 'pgrep postmaster' or 'ps -axu | grep [p]ostmaster'),
   and delete pid file on negative results.
 
  This is exactly what you should NOT do.
 
  A start script that thinks it is smarter than the postmaster is almost
  certainly wrong.  It is certainly dangerous, too, because auto-deleting
  that pidfile destroys the interlock against having two postmasters
  running in the same data directory (which WILL corrupt your data,
  quickly and irretrievably).  All it takes to cause a problem is to
  use the start script to start a postmaster, forgetting that you already
  have one running ...

 My PG is not started with startup-scripts, but with this command:

 pg_ctl -D $PGDATA -l $PGDIR/log/logfile-`date +%Y-%m-%d`.log start

... and manually after login, ie. not at boot-time.



Re: [HACKERS] file-locking and postmaster.pid

2006-05-24 Thread Andrej Ricnik-Bay

On 5/24/06, Andreas Joseph Krogh [EMAIL PROTECTED] wrote:


 My PG is not started with startup-scripts, but with this command:

 pg_ctl -D $PGDATA -l $PGDIR/log/logfile-`date +%Y-%m-%d`.log start

... and manually after login, ie. not at boot-time.

I'd suggest trying to fix your Linux-install instead of mucking
about with Postgres, and this is really a pgsql-novice question,
not a -hackers thing.


Cheers,
Andrej


--
Please don't top post, and don't use HTML e-Mail :}  Make your quotes concise.

http://www.american.edu/econ/notes/htmlmail.htm



Re: [HACKERS] file-locking and postmaster.pid

2006-05-24 Thread Andreas Joseph Krogh
On Wednesday 24 May 2006 21:03, korry wrote:
  I'm sure there's a good reason for having it the way it is, having so
  many smart knowledgeable people working on this project. Could someone
  please explain the rationale of the current solution to me?

 We've ignored Andreas' original question.  Why not use a lock to
 indicate that the postmaster is still running?  At first blush, that
 seems more reliable than checking for a (possibly recycled) process ID.

As Tom replied: Portability.



Re: [HACKERS] file-locking and postmaster.pid

2006-05-24 Thread Andreas Joseph Krogh
On Wednesday 24 May 2006 20:52, Andrej Ricnik-Bay wrote:
 On 5/24/06, Andreas Joseph Krogh [EMAIL PROTECTED] wrote:
   My PG is not started with startup-scripts, but with this command:
  
   pg_ctl -D $PGDATA -l $PGDIR/log/logfile-`date +%Y-%m-%d`.log start
 
  ... and manually after login, ie. not at boot-time.

 I'd suggest trying to fix your Linux-install instead of mucking
 about with Postgres, and this is really a pgsql-novice question,
 not a -hackers thing.

I'm sorry, can't resist, but this has to be *the* dumbest reply to this sort
of question. What makes you think it *only* happens when Linux freezes? (BTW,
I suspect my NVIDIA driver is the problem on my laptop, not Linux itself.)
Still, PG *should* handle that situation too; it's like a power outage. I've
been using Linux exclusively since '96 and PG since 6.5, so I don't consider
myself a novice in either. Why PG doesn't use locking *is* definitely
a -hackers thing.



Re: [HACKERS] file-locking and postmaster.pid

2006-05-24 Thread korry






 I'm sure there's a good reason for having it the way it is, having so many
 smart knowledgeable people working on this project. Could someone please
 explain the rationale of the current solution to me?





We've ignored Andreas' original question. Why not use a lock to indicate that the postmaster is still running? At first blush, that seems more reliable than checking for a (possibly recycled) process ID.


 -- Korry





Re: [HACKERS] file-locking and postmaster.pid

2006-05-24 Thread korry






 On Wednesday 24 May 2006 21:03, korry wrote:
   I'm sure there's a good reason for having it the way it is, having so
   many smart knowledgeable people working on this project. Could someone
   please explain the rationale of the current solution to me?

  We've ignored Andreas' original question.  Why not use a lock to
  indicate that the postmaster is still running?  At first blush, that
  seems more reliable than checking for a (possibly recycled) process ID.

 As Tom replied: Portability.



Thanks - I missed that part of Tom's message. 


The only platform (although certainly not a minor issue) that I can think of that would have a portability issue would be Win32. You can't even read a locked byte in Win32. I usually solve that problem by locking a byte past the end of the file (which is portable).

Is there some other portability issue that I'm missing?


 -- Korry






Re: [HACKERS] file-locking and postmaster.pid

2006-05-24 Thread korry







 Certainly on all platforms there must be *some* locking primitive.  We
 just need to figure out the appropriate parameters to fcntl() or flock()
 or lockf() on each.

Right.

 The Win32 API for locking seems mighty strange to me.

Linux/Unix byte locking is advisory (meaning that one lock can block another lock, but it can't block a read). Win32 locking is mandatory (at least in the most portable form), so a lock blocks a reader. To avoid that problem, you lock a byte that you never intend to read (that is, you lock a byte past the end of the file). Locking past the end-of-file is portable to all Unix/Linux systems that I've seen (that way, you can lock a region of a file before you grow the file).
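
A minimal sketch of the advisory, past-end-of-file lock described above, assuming a Unix system where Python's `fcntl` module is available. The helper names (`lock_sentinel_byte`, `child_can_read_and_lock`, `demo`) are made up for illustration and are not part of PostgreSQL. It shows that a second process is refused the lock but can still read the file:

```python
import fcntl
import os
import tempfile


def lock_sentinel_byte(fd):
    """Take an exclusive, non-blocking lock on the one byte just past EOF."""
    size = os.fstat(fd).st_size
    # lockf(fd, cmd, len, start, whence): lock 1 byte starting at offset `size`.
    fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, size, os.SEEK_SET)


def child_can_read_and_lock(path):
    """Fork a child and report (child_could_read, child_could_lock)."""
    pid = os.fork()
    if pid == 0:  # child: a different process, so the parent's lock applies
        status = 0
        with open(path, "r+b") as f:
            if f.read():  # an advisory lock does not stop a plain read
                status |= 1
            try:
                lock_sentinel_byte(f.fileno())
                status |= 2
            except OSError:
                pass  # the sentinel byte is locked by the parent
        os._exit(status)
    _, wait_status = os.waitpid(pid, 0)
    code = os.WEXITSTATUS(wait_status)
    return bool(code & 1), bool(code & 2)


def demo():
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"12345\n")
        tmp.flush()
        lock_sentinel_byte(tmp.fileno())  # parent holds the sentinel lock
        return child_can_read_and_lock(tmp.name)
```

The Win32 side of the argument (mandatory locks blocking readers) is not exercised here; the sketch only covers the Unix behaviour.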

 -- Korry





Re: [HACKERS] file-locking and postmaster.pid

2006-05-24 Thread Andrew Dunstan

Alvaro Herrera wrote:

 korry wrote:

  The only platform (although certainly not a minor issue) that I can
  think of that would have a portability issue would be Win32. You can't
  even read a locked byte in Win32.  I usually solve that problem by
  locking a byte past the end of the file (which is portable).

 Certainly on all platforms there must be *some* locking primitive.  We
 just need to figure out the appropriate parameters to fcntl() or flock()
 or lockf() on each.

 The Win32 API for locking seems mighty strange to me.

  


We use file locking on Win32 (and on all other platforms)  in the 
buildfarm ... it's done from perl so maybe perl does some magic under 
the hood. The call looks just the same, and works fine on W32, I 
believe. It is roughly:


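use Fcntl qw(:flock);   # imports LOCK_EX, LOCK_NB
# Quotes were lost in the archive; the '>' open mode is an assumption.
open(my $lockfile, '>', 'builder.LCK') || die "opening lockfile: $!";
# Exit quietly if another run already holds the exclusive lock.
exit(0) unless flock($lockfile, LOCK_EX | LOCK_NB);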


cheers

andrew



Re: [HACKERS] file-locking and postmaster.pid

2006-05-24 Thread Alvaro Herrera
korry wrote:

  The Win32 API for locking seems mighty strange to me.
 
 Linux/Unix byte locking is advisory (meaning that one lock can block
 another lock, but it can't block a read).

No -- it is advisory meaning that a process that does not try to acquire
the lock is not locked out.  You can certainly block a file in exclusive
mode, using the LOCK_EX flag.  (And at least on my Linux system, there
is mandatory locking too, using the fcntl() interface).

I think the next question is -- how would the lock interface be used?
We could acquire an exclusive lock on postmaster start (to make sure no
backend is running), then reduce it to a shared lock.  Every backend
would inherit the shared lock.  But the lock exchange is not guaranteed
to be atomic so a new postmaster could start just after we acquire the
lock and acquire the shared lock.  It'd need to be complemented with
another lock.
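
A rough sketch of the exclusive-then-shared sequence described above, assuming Linux `flock()` semantics and Python's `fcntl` module; the function names and result keys are illustrative only. Three separate `open()` calls stand in for three would-be lock holders. It shows that once the first holder has downgraded, a newcomer can take a shared lock as well, which is why a second lock would be needed to tell a new postmaster from a backend:

```python
import fcntl
import os
import tempfile


def try_flock(fd, mode):
    """Attempt a non-blocking flock(); return True if the lock was granted."""
    try:
        fcntl.flock(fd, mode | fcntl.LOCK_NB)
        return True
    except OSError:
        return False


def demo():
    with tempfile.NamedTemporaryFile() as tmp:
        # Separate open() calls create separate open file descriptions, which
        # flock() treats as independent lock owners, even within one process.
        first, second, third = (os.open(tmp.name, os.O_RDWR) for _ in range(3))
        try:
            return {
                "first_gets_exclusive": try_flock(first, fcntl.LOCK_EX),
                "second_shared_during_exclusive": try_flock(second, fcntl.LOCK_SH),
                # Downgrade: the first holder now has only a shared lock.
                "first_downgrades": try_flock(first, fcntl.LOCK_SH),
                # A newcomer can now join with a shared lock of its own.
                "second_shared_after_downgrade": try_flock(second, fcntl.LOCK_SH),
                "third_exclusive_after_downgrade": try_flock(third, fcntl.LOCK_EX),
            }
        finally:
            for fd in (first, second, third):
                os.close(fd)
```

The sketch does not reproduce the race itself (a newcomer slipping in during the conversion); it only shows the lock states on either side of the downgrade.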

 Win32 locking is mandatory (at least in the most portable form) so a
 lock blocks a reader.

There is also shared/exclusive locking of a file on Win32.  My comment
was more directed at the fact that you have to create some sort of
lock handle from a file handle and then lock the lock handle, or
something like that.  I don't recall the exact details but it was
strange (as opposed to just open and then flock).

-- 
Alvaro Herrera                        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] file-locking and postmaster.pid

2006-05-24 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Certainly on all platforms there must be *some* locking primitive.  We
 just need to figure out the appropriate parameters to fcntl() or flock()
 or lockf() on each.

Quite aside from the hassle factor of needing to deal with N variants of
the syscalls, I'm not convinced that it's guaranteed to work.  ISTR that
for instance NFS file locking is pretty much Alice-in-Wonderland :-(

Since the entire point here is to have a guaranteed bulletproof check,
locks that work most of the time on most platforms/filesystems aren't
gonna be an improvement.

regards, tom lane



Re: [HACKERS] file-locking and postmaster.pid

2006-05-24 Thread Alvaro Herrera
Andrew Dunstan wrote:

 We use file locking on Win32 (and on all other platforms)  in the 
 buildfarm ... it's done from perl so maybe perl does some magic under 
 the hood. The call looks just the same, and works fine on W32, I 
 believe. It is roughly:
 
 use Fcntl qw(:flock);
 # (quotes were lost in the archive; the '>' open mode is an assumption)
 open(my $lockfile, '>', 'builder.LCK') || die "opening lockfile: $!";
 exit(0) unless flock($lockfile, LOCK_EX | LOCK_NB);

flock on Perl is implemented using platform-dependent system calls.  Per
the docs,

   flock FILEHANDLE,OPERATION
   Calls flock(2), or an emulation of it, on FILEHANDLE.  Returns
   true for success, false on failure.  Produces a fatal error if
   used on a machine that doesn't implement flock(2), fcntl(2)
   locking, or lockf(3).  flock is Perl's portable file locking
   interface, although it locks only entire files, not records.

Note that it may fail!  This seems to indicate that some platforms do
not provide either locking mechanism.



Re: [HACKERS] file-locking and postmaster.pid

2006-05-24 Thread Alvaro Herrera
Alvaro Herrera wrote:

 Note that it may fail!  This seems to indicate that some platforms do
 not provide either locking mechanism.

(Which means the whole discussion is a waste of time)



Re: [HACKERS] file-locking and postmaster.pid

2006-05-24 Thread Andrew Dunstan

Alvaro Herrera wrote:

Alvaro Herrera wrote:

  

Note that it may fail!  This seems to indicate that some platforms do
not provide either locking mechanism.



(Which means the whole discussion is a waste of time)

  


Umm, no, I don't think so. It will block instead of failing unless you 
request a non blocking call. Failure means someone else holds the lock.


But what Tom says about NFS is probably true, and a good enough reason 
not to trust locking in general for this purpose, I think


cheers

andrew



Re: [HACKERS] file-locking and postmaster.pid

2006-05-24 Thread korry




On Wed, 2006-05-24 at 16:34 -0400, Alvaro Herrera wrote:


 korry wrote:

   The Win32 API for locking seems mighty strange to me.

  Linux/Unix byte locking is advisory (meaning that one lock can block
  another lock, but it can't block a read).

 No -- it is advisory meaning that a process that does not try to acquire
 the lock is not locked out.

Right, that's why I said "can block" instead of "will block". An advisory lock will only block another locker, not another reader (except in Win32).

 You can certainly block a file in exclusive
 mode, using the LOCK_EX flag.  (And at least on my Linux system, there
 is mandatory locking too, using the fcntl() interface).



My fault - I'm not really talking about file locking, I'm talking about byte-range locking (via lockf() and family). 

I don't believe that you can use byte-range locking to block read-access to a file, you can only use byte-range locking to block other locks.

A simple exclusive lock on the first byte past the end of the file will do. 



 I think the next question is -- how would the lock interface be used?
 We could acquire an exclusive lock on postmaster start (to make sure no
 backend is running), then reduce it to a shared lock.  Every backend
 would inherit the shared lock.  But the lock exchange is not guaranteed
 to be atomic so a new postmaster could start just after we acquire the
 lock and acquire the shared lock.  It'd need to be complemented with
 another lock.



You never need to reduce it to a shared lock. On postmaster startup, try to lock the sentinel byte (one byte past the end-of-file). If you can lock it, you know that no other postmaster has that byte locked. If you can't lock it, another postmaster is running. It is an atomic operation. 
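
A minimal sketch of that startup check, assuming a Unix system and Python's `fcntl.lockf()`. The function names and the `postmaster.lock` file name are hypothetical, and the sketch assumes the file size does not change between competing starters, since the sentinel offset is derived from it:

```python
import fcntl
import os
import tempfile


def try_lock_sentinel(path):
    """Return an open fd holding the sentinel lock, or None if it is taken."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        size = os.fstat(fd).st_size
        # One byte just past EOF; the try-lock succeeds or fails in one call.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, size, os.SEEK_SET)
    except OSError:
        # Safe here because this process holds no lock on the file; closing
        # any descriptor would drop a lock this process already held.
        os.close(fd)
        return None
    return fd


def second_starter_is_refused(path):
    """Fork a child that plays a second postmaster; True if it was refused."""
    pid = os.fork()
    if pid == 0:
        os._exit(0 if try_lock_sentinel(path) is None else 1)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 0


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "postmaster.lock")
        fd = try_lock_sentinel(path)
        refused_while_held = second_starter_is_refused(path)
        os.close(fd)  # simulates the first postmaster exiting
        refused_after_release = second_starter_is_refused(path)
        return fd is not None, refused_while_held, refused_after_release
```

The kernel releases the lock when the holding process exits, so no stale state is left behind after a crash; whether that holds over NFS is exactly the open question in this thread.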

However, Tom may be correct about NFS locking, but I guess I'm surprised that anyone would care :-)



  Win32 locking is mandatory (at least in the most portable form) so a
  lock blocks a reader.

 There is also shared/exclusive locking of a file on Win32.



Yes, but Win32 shared locking only works on NTFS-type file systems. And you don't need shared locking anyway.

 -- Korry






Re: [HACKERS] file-locking and postmaster.pid

2006-05-24 Thread korry






 Alvaro Herrera [EMAIL PROTECTED] writes:
  Certainly on all platforms there must be *some* locking primitive.  We
  just need to figure out the appropriate parameters to fcntl() or flock()
  or lockf() on each.

I use lockf() (not fcntl() or flock()) on every platform other than Win32. Of course, I may not run on every system that PostgreSQL supports.

 Quite aside from the hassle factor of needing to deal with N variants of
 the syscalls, I'm not convinced that it's guaranteed to work.  ISTR that
 for instance NFS file locking is pretty much Alice-in-Wonderland :-(

 Since the entire point here is to have a guaranteed bulletproof check,
 locks that work most of the time on most platforms/filesystems aren't
 gonna be an improvement.

NFS file locking may certainly be problematic. I don't know about NFS byte-range locking.

What we currently have in place is not bulletproof. I think holding a byte-range lock in addition to the "is there some process with the right pid?" check might be a little more bullet resistant :-)


 -- Korry





Re: [HACKERS] file-locking and postmaster.pid

2006-05-24 Thread Alvaro Herrera
Andrew Dunstan wrote:
 Alvaro Herrera wrote:
 Alvaro Herrera wrote:
 
 Note that it may fail!  This seems to indicate that some platforms do
 not provide either locking mechanism.
 
 (Which means the whole discussion is a waste of time)
 
 Umm, no, I don't think so. It will block instead of failing unless you 
 request a non blocking call. Failure means someone else holds the lock.

I removed the part of the manual I had written which said that it will
raise an error if the platform it's running doesn't have any locking
primitive.



Re: [HACKERS] file-locking and postmaster.pid

2006-05-24 Thread Tom Lane
korry [EMAIL PROTECTED] writes:
 However, Tom may be correct about NFS locking, but I guess I'm surprised
 that anyone would care :-)

Whether we think it's a real good idea or not, *plenty* of people run
databases across NFS.  We can't blow off that set of users.

regards, tom lane



Re: [HACKERS] file-locking and postmaster.pid

2006-05-24 Thread Alvaro Herrera
korry wrote:

  I think the next question is -- how would the lock interface be used?
  We could acquire an exclusive lock on postmaster start (to make sure no
  backend is running), then reduce it to a shared lock.  Every backend
  would inherit the shared lock.  But the lock exchange is not guaranteed
  to be atomic so a new postmaster could start just after we acquire the
  lock and acquire the shared lock.  It'd need to be complemented with
  another lock.
 
 You never need to reduce it to a shared lock.  On postmaster startup,
 try to lock the sentinel byte (one byte past the end-of-file).  If you
 can lock it, you know that no other postmaster has that byte locked.  If
 you can't lock it, another postmaster is running. It is an atomic
 operation. 

This doesn't work if the postmaster dies but a backend continues to run,
which is arguably the most important case we need to protect against.

 However, Tom may be correct about NFS locking, but I guess I'm surprised
 that anyone would care :-)

Quite a lot of people run NFS-mounted data directories ...



Re: [HACKERS] file-locking and postmaster.pid

2006-05-24 Thread korry






  You never need to reduce it to a shared lock.  On postmaster startup,
  try to lock the sentinel byte (one byte past the end-of-file).  If you
  can lock it, you know that no other postmaster has that byte locked.  If
  you can't lock it, another postmaster is running. It is an atomic
  operation.

 This doesn't work if the postmaster dies but a backend continues to run,
 which is arguably the most important case we need to protect against.



I may be confused here, but I don't see the problem - byte-range locks are not inherited across a fork. A backend would never hold the lock, a backend would never even look for the lock.
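
A small sketch of the inheritance difference at issue here, assuming Linux semantics and Python's `fcntl` module; the function names are illustrative only. A `lockf()`/`fcntl()` byte-range lock belongs to the process, so a forked child does not hold it, whereas a `flock()` lock belongs to the open file description, which the child shares through the inherited descriptor:

```python
import fcntl
import os
import tempfile


def child_shares_lock(fd, use_flock):
    """Fork; in the child, try to take the parent's lock via the inherited fd.

    Returns True if the child's non-blocking attempt succeeded, which means
    the child is treated as an owner of the lock rather than a competitor.
    """
    pid = os.fork()
    if pid == 0:
        try:
            if use_flock:
                fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            else:
                fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, 0, os.SEEK_SET)
            os._exit(0)
        except OSError:
            os._exit(1)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 0


def demo():
    # Two files, so the two lock families cannot interfere with each other.
    with tempfile.NamedTemporaryFile() as a, tempfile.NamedTemporaryFile() as b:
        fcntl.lockf(a.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB, 1, 0, os.SEEK_SET)
        fcntl.flock(b.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
        return (
            child_shares_lock(a.fileno(), use_flock=False),
            child_shares_lock(b.fileno(), use_flock=True),
        )
```

On systems where `flock()` is emulated with `fcntl()` locks the second result may differ, so treat the output as Linux-specific.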




  However, Tom may be correct about NFS locking, but I guess I'm surprised
  that anyone would care :-)

 Quite a lot of people run NFS-mounted data directories ...



I'm happy to take your word for that, and I agree that if NFS is important and locking is brain-dead on NFS, then relying solely on a lock is unacceptable.


 -- Korry





Re: [HACKERS] file-locking and postmaster.pid

2006-05-24 Thread Tom Lane
korry [EMAIL PROTECTED] writes:
 Well, it fails in the safe direction: the postmaster may occasionally
 refuse to start when it should, but it won't ever start when it should
 not.  It appears to me that anything relying on file locking will tend
 to fail in the other direction, and that's not acceptable IMHO.

 I was suggesting that we keep the current check in place too - if the
 lock exists, another postmaster must be running, if the lock doesn't
 exist, check the pid.

But then you've not accomplished anything.  The complaints about the
pid-based mechanism are about false positives, not false negatives.
Adding an independent check won't eliminate the false positives.

 How about a semaphore with a SEM_UNDO?  That's guaranteed atomic (or it
 better be :-), the kernel automatically cleans up after a failure, if
 the mechanism fails, it fails in the safe direction (the kernel may not
 have cleaned up the semaphore before a new postmaster starts).  And, I
 think it would be reasonably portable - I haven't carefully eyeballed
 the Win32 semaphore code so I don't know if it supports SEM_UNDO.

We already have two platforms that don't use the SysV semaphore
interface, and even on ones that have it, I wouldn't want to assume they
all support SEM_UNDO.

But aside from any portability issues, ISTM this would have its own
failure modes.  In particular you still have to rely on a pid-file
(only now it's holding a semaphore ID not a PID), and there's still
a bit of a leap of faith required to get from the observation that
somebody is holding a lock on semaphore X to the conclusion that that
somebody is a conflicting postmaster.  It doesn't look to me like this
is any better than the PID solution, really, as far as false positives
go.  As for false negatives: ipcrm.

regards, tom lane



Re: [HACKERS] file-locking and postmaster.pid

2006-05-24 Thread Alvaro Herrera
korry wrote:
   You never need to reduce it to a shared lock.  On postmaster startup,
   try to lock the sentinel byte (one byte past the end-of-file).  If you
   can lock it, you know that no other postmaster has that byte locked.  If
   you can't lock it, another postmaster is running. It is an atomic
   operation. 
  
  This doesn't work if the postmaster dies but a backend continues to run,
  which is arguably the most important case we need to protect against.
 
 I may be confused here, but I don't see the problem - byte-range locks
 are not inherited across a fork.  A backend would never hold the lock, a
 backend would never even look for the lock.

Well, you are wrong here.  We _want_ every backend to hold a shared
lock.  We need to stop a postmaster from starting if there is a backend
running that was started by a no-longer-running postmaster.
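
A sketch of the orphaned-backend case, assuming Linux `flock()` semantics and Python's `fcntl` module; `can_start` and `demo` are hypothetical names, and this is not how PostgreSQL implements its interlock. A forked child inherits the descriptor carrying a shared lock, so the lock outlives the parent's copy of the descriptor and a newcomer's exclusive try-lock fails until the child is gone:

```python
import fcntl
import os
import tempfile


def can_start(path):
    """A would-be postmaster may start only if it can lock the file exclusively."""
    fd = os.open(path, os.O_RDWR)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except OSError:
        return False
    finally:
        os.close(fd)  # this probe does not keep the lock


def demo():
    with tempfile.NamedTemporaryFile() as tmp:
        # "Postmaster": open the lock file and take a shared lock.
        postmaster_fd = os.open(tmp.name, os.O_RDWR)
        fcntl.flock(postmaster_fd, fcntl.LOCK_SH)

        # "Backend": a forked child that inherits the descriptor and waits.
        r, w = os.pipe()
        backend = os.fork()
        if backend == 0:
            os.close(w)
            os.read(r, 1)  # returns at EOF, once the parent closes its end
            os._exit(0)
        os.close(r)

        # The postmaster "dies": its copy of the descriptor goes away.
        os.close(postmaster_fd)
        blocked_by_orphan = not can_start(tmp.name)

        # The backend exits; only now is the shared lock released.
        os.close(w)
        os.waitpid(backend, 0)
        free_after_exit = can_start(tmp.name)
        return blocked_by_orphan, free_after_exit
```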



Re: [HACKERS] file-locking and postmaster.pid

2006-05-24 Thread korry







 We already have two platforms that don't use the SysV semaphore
 interface, and even on ones that have it, I wouldn't want to assume they
 all support SEM_UNDO.

Which platforms, just out of curiosity? I assume that Win32 is one of them.

 But aside from any portability issues, ISTM this would have its own
 failure modes.  In particular you still have to rely on a pid-file
 (only now it's holding a semaphore ID not a PID)

You've lost me... why would you store the semid and not the pid? I was thinking that the semid might be a postgresql.conf thingie.

 and there's still
 a bit of a leap of faith required to get from the observation that
 somebody is holding a lock on semaphore X to the conclusion that that
 somebody is a conflicting postmaster.

Isn't that sort of like saying that if a postmaster.pid file exists, it must have been written by a postmaster? Pick a semaphore id and dedicate it to postmaster exclusion.

 It doesn't look to me like this
 is any better than the PID solution, really, as far as false positives
 go.

As long as the kernel cleans up SEM_UNDO semaphores, I guess I don't see how you would have a false positive. Oh, I guess I should say that if you use a SEM_UNDO semaphore, you don't need the pid check anymore. And, no worry about NFS.

 As for false negatives: ipcrm.

Yes, that's a problem, but I think it's the same as "rm postmaster.pid", isn't it?





Re: [HACKERS] file-locking and postmaster.pid

2006-05-24 Thread Tom Lane
korry [EMAIL PROTECTED] writes:
 Isn't that sort of like saying that if a postmaster.pid file exists, it
 must have been written by a postmaster?  Pick a semaphore id and
 dedicate it to postmaster exclusion.  

That's not workable, unless you want to assume that nothing on the
system except Postgres uses SysV semaphores.  Otherwise something else
could randomly gobble up the semid you want to use.  I don't care very
much for requiring a distinct semid to be hand-specified for each
postmaster on a machine, either.  At least for my use, that would be a
grade-A PITA: I normally have several postmasters of different vintages
running on the same development machine, and having to configure each
one with its own semid is an extra step I'd rather not deal with.

 As long as the kernel cleans up SEM_UNDO semaphores, I guess I don't see
 how you would have a false positive.

My point was that you couldn't reliably tell a postmaster interested in
a different data directory from a postmaster interested in your own data
directory.  Even with a configured semid, I don't see that that's real
reliable.  I know the first thing I'd do is fix my postmaster start
scripts to specify semid on the command line rather than requiring it
to be in the conf file, and as soon as I do that, the connection to
the data directory is gone :-( --- now my security is utterly dependent
on not screwing up by launching a postmaster with the wrong semid for
the data directory it's pointed at.

The only scenario where the PID-based solution is at serious risk of
false positives is where there are multiple postmasters on the same
machine, so unless you've got a bulletproof answer for this case, you
haven't made an improvement over what we've got.

Anyway the real problem here is that neither PIDs nor semids are
strongly wired to a particular data directory, which is the thing you're
really trying to protect.  File locks would really be much nicer all
around, if we could trust them, because they *would* be directly
connected to a data directory.

regards, tom lane
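
A minimal sketch of the file-lock idea discussed above, assuming a Linux system where flock(1) from util-linux is available. The helper name acquire_datadir_lock, the variable LOCK_FD, and the lock file name datadir.lock are hypothetical and not part of PostgreSQL. The sketch does not address NFS or SMB, where the thread notes that locking may be unreliable. It only illustrates why a file lock is tied to one data directory: the lock sits on a file inside that directory, and the kernel drops it once the last descriptor for it is closed, for example when the holding process (and any children that inherited the descriptor) exits.

```shell
#!/bin/bash
# Hypothetical helper (not PostgreSQL code): try to take an exclusive,
# non-blocking advisory lock on a lock file inside the data directory.
# Returns 0 on success, non-zero if another open descriptor holds the lock.
# On success the descriptor number is left in LOCK_FD; the lock is held
# until that descriptor is closed or the process exits.
acquire_datadir_lock() {
    local datadir=$1 fd
    # Open (or create) the lock file on a descriptor chosen by bash.
    exec {fd}>>"$datadir/datadir.lock" || return 1
    if flock -n "$fd"; then
        LOCK_FD=$fd
        return 0
    fi
    exec {fd}>&-    # could not lock: close our descriptor again
    return 1
}
```

A start script could call it as: acquire_datadir_lock "$PGDATA" || exit 1. Two postmasters pointed at different data directories never contend, because each locks a different file.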

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] file-locking and postmaster.pid

2006-05-23 Thread Tom Lane
Andreas Joseph Krogh [EMAIL PROTECTED] writes:
 I've experienced several times that PG has died somehow and the
 postmaster.pid file still exists 'cause PG hasn't had the ability to
 delete it upon proper shutdown. Upon start-up, after such an incident,
 PG tells me another PG is running and that I either have to shut down the
 other instance, or delete the postmaster.pid file if there really isn't
 an instance running. This seems totally unnecessary to me.

The postmaster does check to see whether the PID mentioned in the file
is still alive, so it's not that easy for the above to happen.  If you
can provide details of a scenario where a failure is likely, we'd like
to know about it.  Also, what PG version are you talking about?

 Why doesn't PG use file-locking to tell if another 
 PG is running or not?

Portability.

regards, tom lane


Re: [HACKERS] file-locking and postmaster.pid

2006-05-23 Thread Andreas Joseph Krogh
On Tuesday 23 May 2006 17:54, Tom Lane wrote:
 Andreas Joseph Krogh [EMAIL PROTECTED] writes:
  I've experienced several times that PG has died somehow and the
  postmaster.pid file still exists 'cause PG hasn't had the ability to
  delete it upon proper shutdown. Upon start-up, after such an incident,
  PG tells me another PG is running and that I either have to shut down the
  other instance, or delete the postmaster.pid file if there really isn't
  an instance running. This seems totally unnecessary to me.

 The postmaster does check to see whether the PID mentioned in the file
 is still alive, so it's not that easy for the above to happen.  If you
 can provide details of a scenario where a failure is likely, we'd like
 to know about it.  Also, what PG version are you talking about?

I have experienced this with PG-8.1.3 and will provide details if I can make 
it happen. Basically it has happened when I have had to hard-reset my 
laptop due to some strange bugs in Linux which have made it hang.

  Why doesn't PG use file-locking to tell if another
  PG is running or not?

 Portability.

Ok.

-- 
Andreas Joseph Krogh [EMAIL PROTECTED]


Re: [HACKERS] file-locking and postmaster.pid

2006-05-23 Thread Tom Lane
Andreas Joseph Krogh [EMAIL PROTECTED] writes:
 On Tuesday 23 May 2006 17:54, Tom Lane wrote:
 The postmaster does check to see whether the PID mentioned in the file
 is still alive, so it's not that easy for the above to happen.  If you
 can provide details of a scenario where a failure is likely, we'd like
 to know about it.  Also, what PG version are you talking about?

 I have experienced this with PG-8.1.3 and will provide details if I can make 
 it happen. Basically it has happened when I have had to hard-reset my 
 laptop due to some strange bugs in Linux which have made it hang.

If you're talking about a postmaster that's auto-started during the boot
sequence, then there is a risk depending on what start script you use.
The problem is that depending on what else runs during the system
startup, the PID assigned to the postmaster might be the same as in the
last boot cycle, or it might be different by one or two counts.  The
postmaster disregards a pidfile containing its own PID, or its parent
process' PID, or a PID not belonging to a postgres-owned process.
That covers most cases, but if your start script does something like

    su -l postgres -c "pg_ctl start ..."

then you have a situation where not only the parent process (pg_ctl)
but also the grandparent (a shell) is postgres-owned, and if the pidfile
PID happens to match the grandparent then you lose.  The solution is to
either not use pg_ctl here, or write "exec pg_ctl start ...", so that
there's only one postgres-owned process besides the postmaster itself.

Initscripts published by PGDG itself and by Red Hat have gotten this
right for a while, but I suspect the word has not propagated to all
distros.

regards, tom lane
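
The pidfile rules described above can be sketched in shell as follows. This is an illustration only, not the postmaster's actual code: the function name pidfile_pid_conflicts is hypothetical, and it assumes the script runs as the unprivileged postgres user, so that "kill -0" succeeds only for processes that user is allowed to signal. A PID equal to our own or our parent's is disregarded, as is a PID we cannot signal (dead, or owned by someone else); anything else counts as a possibly live postmaster.

```shell
#!/bin/bash
# Hypothetical sketch of the stale-pidfile test described in the thread.
# Arguments: PID read from the pidfile, our own PID, our parent's PID.
# Returns 0 if the pidfile PID looks like a live, conflicting process,
# non-zero if the pidfile can be treated as stale.
pidfile_pid_conflicts() {
    local pid=$1 self=${2:-$$} parent=${3:-$PPID}
    # Anything that is not a positive decimal number is treated as stale.
    [[ $pid =~ ^[1-9][0-9]*$ ]] || return 1
    # Our own PID or our parent's PID: left over from a previous boot cycle.
    [[ $pid == "$self" || $pid == "$parent" ]] && return 1
    # Signal 0 only tests whether we could signal the process. It fails
    # when the PID is unused or belongs to a different (non-root) user.
    kill -0 "$pid" 2>/dev/null
}
```

With a start script of the "su -l postgres -c ..." form, the grandparent shell is also postgres-owned, so a pidfile PID that matches it passes the final "kill -0" test and is reported as a conflict. That is the false positive described in the message above.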


Re: [HACKERS] file-locking and postmaster.pid

2006-05-23 Thread Adis Nezirovic
On Tue, May 23, 2006 at 05:23:16PM +0200, Andreas Joseph Krogh wrote:
 Hi all.
 
 I've experienced several times that PG has died somehow and the
 postmaster.pid file still exists 'cause PG hasn't had the ability to
 delete it upon proper shutdown. Upon start-up, after such an incident,
 PG tells me another PG is running and that I either have to shut down the
 other instance, or delete the postmaster.pid file if there really isn't
 an instance running. This seems totally unnecessary to me. Why doesn't PG
 use file-locking to tell if another PG is running or not? If PG holds an
 exclusive lock on the pid-file and the process crashes, or shuts down,
 then the lock (which is process-based and controlled by the kernel) will
 be removed and another PG which tries to start up can detect that. Using
 the existence of the pid-file as the only evidence gives too many false
 positives IMO.

Well, maybe you could tweak the postgres startup script: add a check for
a running postmaster (either 'pgrep postmaster' or
'ps -axu | grep [p]ostmaster'), and delete the pid file on a negative result.

i.e.

#!/bin/bash
# Look for a postmaster started from this exact path.
PID=$(pgrep -f /usr/bin/postmaster)

if [[ -n "$PID" ]]; then
    # postgres is already running
    echo "$PID"
else
    echo "Postmaster is not running"
    # delete stale PID file
fi


Re: [HACKERS] file-locking and postmaster.pid

2006-05-23 Thread Tom Lane
Adis Nezirovic [EMAIL PROTECTED] writes:
 Well, maybe you could tweak the postgres startup script: add a check for
 a running postmaster (either 'pgrep postmaster' or
 'ps -axu | grep [p]ostmaster'), and delete the pid file on a negative result.

This is exactly what you should NOT do.

A start script that thinks it is smarter than the postmaster is almost
certainly wrong.  It is certainly dangerous, too, because auto-deleting
that pidfile destroys the interlock against having two postmasters
running in the same data directory (which WILL corrupt your data,
quickly and irretrievably).  All it takes to cause a problem is to
use the start script to start a postmaster, forgetting that you already
have one running ...

regards, tom lane


Re: [HACKERS] file-locking and postmaster.pid

2006-05-23 Thread Adis Nezirovic
On Tue, May 23, 2006 at 01:36:41PM -0400, Tom Lane wrote:
 This is exactly what you should NOT do.
 
 A start script that thinks it is smarter than the postmaster is almost
 certainly wrong.  It is certainly dangerous, too, because auto-deleting
 that pidfile destroys the interlock against having two postmasters
 running in the same data directory (which WILL corrupt your data,
 quickly and irretrievably).  All it takes to cause a problem is to
 use the start script to start a postmaster, forgetting that you already
 have one running ...

I do agree with you that we should not play games with the postmaster.
Better safe than sorry. (So manually deleting the pid file is the
only safe option.) I was just suggesting a (possibly dangerous)
workaround.

Btw, I do check for a running postmaster using the full path (I don't want
to kill every postmaster on the system). Is this safe, or could there be
a race condition?

