Re: [PATCHES] POSIX shared memory support

2007-03-24 Thread Bruce Momjian

Newest version added:

Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---


Chris Marcellino wrote:
 In case you haven't had enough, here is another version of the code  
 to make Postgres use POSIX shared memory. Along with the issues that  
 have already been addressed, this version ensures that orphaned  
 backends are not in the database when restarting Postgres by using a  
 single 1 byte SysV segment to see who is attached to the segment  
 using shmctl/IPC_STAT/nattach.
 
 This effectively frees Postgres from the SHMMAX and SHMALL limits.  
 Since this still takes one SysV segment, SHMMNI can still be reached  
 on most platforms if a ton of databases are opened simultaneously  
 (i.e. 32 on Mac OS X, 256 on Linux and Solaris).
 
 If you have the need to ship a product with Postgres embedded in it  
 and are unable to change kernel settings (like myself), this might be  
 of use to you. I have tested all of the failure situations I could  
 think of by various combinations of deleting lockfiles while in use,  
 changing the PID inside the lockfile and trying to restart and run  
 more than one postmaster simultaneously.
 
 Of course, this since this requires both POSIX and SysV shared  
 memory, this doesn't increase the portability of Postgres which might  
 make it less appropriate for mass distribution; I thought I would put  
 it out there for any feedback either way.
 
 Thanks again,
 Chris Marcellino
 

[ Attachment, skipping... ]

 
 
 
 ---(end of broadcast)---
 TIP 2: Don't 'kill -9' the postmaster

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PATCHES] POSIX shared memory support

2007-03-04 Thread Chris Marcellino
AFAIK, the only systems supported by Postgres that this patch won't  
work on are NetBSD and OpenBSD.


The POSIX calls free the user from the SHMMAX and SHMALL limitations  
of the SysV shared memory calls on platforms that support it. Since  
this still takes one SysV segment, SHMMNI can still be reached on  
some platforms if a ton of databases are opened simultaneously (i.e.  
256 on Linux and Solaris, 100 on SCO Unix, 512 on HP-UX, 32 on Mac OS  
X, unlimited on FreeBSD). This is the case without the patch anyhow.


Chris Marcellino



On Mar 3, 2007, at 9:09 AM, Joshua D. Drake wrote:



If you have the need to ship a product with Postgres embedded in  
it and
are unable to change kernel settings (like myself), this might be  
of use
to you. I have tested all of the failure situations I could think  
of by

various combinations of deleting lockfiles while in use, changing the
PID inside the lockfile and trying to restart and run more than one
postmaster simultaneously.

Of course, this since this requires both POSIX and SysV shared  
memory,

this doesn't increase the portability of Postgres which might make it
less appropriate for mass distribution; I thought I would put it out
there for any feedback either way.


Well that depends, what systems don't use (or have) POSIX shared  
memory?

This sounds very interesting to me. Oddly enough I went to do some
digging on what various differences and I came up with:

http://www.nabble.com/POSIX-shared-memory-support-t3298386.html

Which happens to be you ;)

Sincerely,

Joshua D. Drake





Thanks again,
Chris Marcellino




- 
---



---(end of  
broadcast)---

TIP 2: Don't 'kill -9' the postmaster



--

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/ 
donate

PostgreSQL Replication: http://www.commandprompt.com/products/


---(end of  
broadcast)---

TIP 6: explain analyze is your friend



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PATCHES] POSIX shared memory support

2007-03-03 Thread Joshua D. Drake

 If you have the need to ship a product with Postgres embedded in it and
 are unable to change kernel settings (like myself), this might be of use
 to you. I have tested all of the failure situations I could think of by
 various combinations of deleting lockfiles while in use, changing the
 PID inside the lockfile and trying to restart and run more than one
 postmaster simultaneously.
 
 Of course, this since this requires both POSIX and SysV shared memory,
 this doesn't increase the portability of Postgres which might make it
 less appropriate for mass distribution; I thought I would put it out
 there for any feedback either way.

Well that depends, what systems don't use (or have) POSIX shared memory?
This sounds very interesting to me. Oddly enough I went to do some
digging on what various differences and I came up with:

http://www.nabble.com/POSIX-shared-memory-support-t3298386.html

Which happens to be you ;)

Sincerely,

Joshua D. Drake



 
 Thanks again,
 Chris Marcellino
 
 
 
 
 
 
 
 ---(end of broadcast)---
 TIP 2: Don't 'kill -9' the postmaster


-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PATCHES] POSIX shared memory support

2007-02-27 Thread Magnus Hagander
On Mon, Feb 26, 2007 at 09:00:09PM -0800, Chris Marcellino wrote:

 
 There is also a Windows version of this patch included, which can  
 replace the current SysV-to-Win32 shared memory
 layer as it currently does not check for orphaned backends in the  
 database. If this is used,
 src/backend/port/win32/shmem.c and its makefile reference can be  
 removed.

This code is pretty close to the one I've been working on. Didn't
reliase you would be working on a win32 specific version, thought you
were doing POSIX only :-O

Anyway. My version does not use hashing of the name, it just puts the
name all the way in there. What are peoples feeling about that - win32
supports 32768 characters (though those are UTF16, so maybe the real
value is 16384 - still, if someone has his data directory that deep down
in a path, he deserves not to have it work). Should we be using hashing
or just plaintext there? I can see the argument for plaintext being that
you can then see in process explorer what data directory a backend is
connected to. But I'm open to arguments as to why a hash would be better
:-)


While looking at this, I noticed that Windows 2003 SP1 adds a parameter
SEC_LARGE_PAGES that enables large pages to be used when mapping images
or backing from the pagefile. I would assume this is for performance
;-) Does anybody know anything more about this? Worth testing out to see
if it increases performance?


 The patches are available here (the postings were being silently  
 dropped when I attached this large of a file):

Actually, I don't think it's becauseo f the size of the file. Last time
I checked we still had a configuration which had the lists silently drop
certain file extensions, including .tar, .zip and .tar.gz. I've had
several patches dropped that way. I've had no confirmation it has been
fixed, so I'm assuming it's still a problem.

//Magnus

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PATCHES] POSIX shared memory support

2007-02-27 Thread Chris Marcellino
The Win32 version didn't materialize until very recently. The Win32  
calls are similar semantically to the POSIX ones, so it was somewhat  
straightforward.


Plaintext is nice if you can fit it, since Windows permits you to  
have slashes and all sorts of other non-filename characters in them,  
unlike POSIX shmem. Also for POSIX, certain platforms (ahem Darwin  
has 30 chars) have very small segment name limits. As Tom said, the  
renaming-while-running issue might be a deal breaker, but I'm not  
sure that is a problem on Windows. It sounds like you are on the  
right track.


I just mentioned this to Tom, but a solution for POSIX might be to  
use the device and inode combination for the data directory, since  
that is constant across renames and is more certain to be unique.  
Like replacing the GenerateIPCName() function with something this:


static void
GenerateIPCName(char destIPCName[IPCSegNameLength])
{
struct  stat statbuf;

/* Get the data directory's device and inode */
if (stat(DataDir, statbuf)  0)
ereport(FATAL,
(errcode_for_file_access(),
 errmsg(could not stat data directory \%s\: 
%m,
DataDir)));

/*
 * POSIX requires that shared memory names begin with a single slash.
 * They should not have any others slashes or any non-alphanumerics to
 * maintain the broadest assumption of what is permitted in a filename.
 * Also, case sensitivity should not be presumed.
 */
snprintf(destIPCName, IPCSegNameLength, /PostgreSQL.%jx.%jx,
 (intmax_t) statbuf.st_dev, (intmax_t) statbuf.st_ino);
}


Does Windows have a method to get a unique ID number for a given data  
directory, or a token file in that directory? It would need to be  
constant while the database is open. Perhaps  
GetFileInformationByHandle? It returns a struct with a nFileIndex  
value that seems to be that, although I'm not certain.
This might make it easier to avoid the complexity of fitting the  
filename in the segment name, and avoid the rename problem,


Thanks,
Chris Marcellino




On Feb 27, 2007, at 12:56 AM, Magnus Hagander wrote:


On Mon, Feb 26, 2007 at 09:00:09PM -0800, Chris Marcellino wrote:



There is also a Windows version of this patch included, which can
replace the current SysV-to-Win32 shared memory
layer as it currently does not check for orphaned backends in the
database. If this is used,
src/backend/port/win32/shmem.c and its makefile reference can be
removed.


This code is pretty close to the one I've been working on. Didn't
reliase you would be working on a win32 specific version, thought you
were doing POSIX only :-O

Anyway. My version does not use hashing of the name, it just puts the
name all the way in there. What are peoples feeling about that - win32
supports 32768 characters (though those are UTF16, so maybe the real
value is 16384 - still, if someone has his data directory that deep  
down
in a path, he deserves not to have it work). Should we be using  
hashing
or just plaintext there? I can see the argument for plaintext being  
that

you can then see in process explorer what data directory a backend is
connected to. But I'm open to arguments as to why a hash would be  
better

:-)


While looking at this, I noticed that Windows 2003 SP1 adds a  
parameter
SEC_LARGE_PAGES that enables large pages to be used when mapping  
images

or backing from the pagefile. I would assume this is for performance
;-) Does anybody know anything more about this? Worth testing out  
to see

if it increases performance?



The patches are available here (the postings were being silently
dropped when I attached this large of a file):


Actually, I don't think it's becauseo f the size of the file. Last  
time
I checked we still had a configuration which had the lists silently  
drop

certain file extensions, including .tar, .zip and .tar.gz. I've had
several patches dropped that way. I've had no confirmation it has been
fixed, so I'm assuming it's still a problem.

//Magnus



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PATCHES] POSIX shared memory support

2007-02-27 Thread Magnus Hagander
On Tue, Feb 27, 2007 at 01:09:46AM -0800, Chris Marcellino wrote:
 The Win32 version didn't materialize until very recently. The Win32  
 calls are similar semantically to the POSIX ones, so it was somewhat  
 straightforward.
 
 Plaintext is nice if you can fit it, since Windows permits you to  
 have slashes and all sorts of other non-filename characters in them,  
 unlike POSIX shmem. Also for POSIX, certain platforms (ahem Darwin  
 has 30 chars) have very small segment name limits. As Tom said, the  
 renaming-while-running issue might be a deal breaker, but I'm not  
 sure that is a problem on Windows. It sounds like you are on the  
 right track.

This cannot happen on windows - you cannot rename a directory which
someone has open files in. The infamous sharing violation error.

 Does Windows have a method to get a unique ID number for a given data  
 directory, or a token file in that directory? It would need to be  
 constant while the database is open. Perhaps  
 GetFileInformationByHandle? It returns a struct with a nFileIndex  
 value that seems to be that, although I'm not certain.
 This might make it easier to avoid the complexity of fitting the  
 filename in the segment name, and avoid the rename problem,

Yes, you could use the fileindex value. You need that one and the volume
serial number, total of 64+32 bits of data.

So yeah, we cuold use that instead of the full path name if we want to.
The advantage of this one is that it's shorter, the advantage of the
full path name is that you can see where the backend is from.

However, in most cases you will be able to see where the backend is from
anyway, because it is likely to have some other file open in the data
directory, so maybe that isn't such a big point after all?

//Magnus

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PATCHES] POSIX shared memory support

2007-02-27 Thread Magnus Hagander
On Tue, Feb 27, 2007 at 10:30:15AM +0100, Magnus Hagander wrote:
  Does Windows have a method to get a unique ID number for a given data  
  directory, or a token file in that directory? It would need to be  
  constant while the database is open. Perhaps  
  GetFileInformationByHandle? It returns a struct with a nFileIndex  
  value that seems to be that, although I'm not certain.
  This might make it easier to avoid the complexity of fitting the  
  filename in the segment name, and avoid the rename problem,
 
 Yes, you could use the fileindex value. You need that one and the volume
 serial number, total of 64+32 bits of data.
 
 So yeah, we cuold use that instead of the full path name if we want to.
 The advantage of this one is that it's shorter, the advantage of the
 full path name is that you can see where the backend is from.
 
 However, in most cases you will be able to see where the backend is from
 anyway, because it is likely to have some other file open in the data
 directory, so maybe that isn't such a big point after all?

Actually, I'm not sure we can. It's only stable as long as someone has
the file open. It will change if it's closed and re-opened later.

Given that we don't actually open the directory, and only files inside
it, I don't know how that works.

//Magnus

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PATCHES] POSIX shared memory support

2007-02-27 Thread Chris Marcellino
I believe that all we need is the ID to be constant and unique while  
the postmaster or its associated backends are running. If anything  
from a given generation has the database open, it will remain  
constant before any new process can connect to it successfully. Would  
it be feasible to lookup the ID of an important file in the DataDir?


As far as the POSIX version goes, here is an updated patch. I changed  
it to use the inode/device ID instead of the filename to avoid the  
renaming cases. It also no longer needs the more clunky than  
necessary hash stuff.





configure.in.patch
Description: Binary data


netbsd.patch
Description: Binary data


openbsd.patch
Description: Binary data


posix_shmem.c
Description: Binary data


(The NetBSD/OpenBSD patches are to force those to build with the SysV  
shmem calls, since they are notably without POSIX shmem support.)


Thanks,
Chris Marcellino



On Feb 27, 2007, at 1:40 AM, Magnus Hagander wrote:


On Tue, Feb 27, 2007 at 10:30:15AM +0100, Magnus Hagander wrote:
Does Windows have a method to get a unique ID number for a given  
data

directory, or a token file in that directory? It would need to be
constant while the database is open. Perhaps
GetFileInformationByHandle? It returns a struct with a nFileIndex
value that seems to be that, although I'm not certain.
This might make it easier to avoid the complexity of fitting the
filename in the segment name, and avoid the rename problem,


Yes, you could use the fileindex value. You need that one and the  
volume

serial number, total of 64+32 bits of data.

So yeah, we cuold use that instead of the full path name if we  
want to.

The advantage of this one is that it's shorter, the advantage of the
full path name is that you can see where the backend is from.

However, in most cases you will be able to see where the backend  
is from

anyway, because it is likely to have some other file open in the data
directory, so maybe that isn't such a big point after all?


Actually, I'm not sure we can. It's only stable as long as someone has
the file open. It will change if it's closed and re-opened later.

Given that we don't actually open the directory, and only files inside
it, I don't know how that works.

//Magnus

---(end of  
broadcast)---

TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PATCHES] POSIX shared memory support

2007-02-26 Thread Tom Lane
Chris Marcellino [EMAIL PROTECTED] writes:
 The System V shared memory facilities provide a method to determine  
 who is attached to a shared memory segment.
 This is used to prevent backends that were orphaned by crashed or  
 killed database processes from corrupting the data-
 base as it is restarted. The same effect can be achieved with using  
 the POSIX APIs,

... except that it can't ...

 but since the POSIX library does not
 have a way to check who is attached to a segment, atomic segment  
 creation must be used to ensure exclusive access to
 the database.

How does that fix the problem?  If you can't actually tell whether
someone is attached to an existing segment, then you're still up against
the basic rock-and-a-hard-place issue: either you assume there is no one
there (and corrupt your database if you're wrong) or you assume there is
someone there (and force manual intervention by the DBA to recover after
postmaster crashes).  Neither of these alternatives is really acceptable.

 In order for this to work, the key name used to open and create the  
 shared memory segment must be unique for each
 data directory. This is done by using a strong hash of the canonical  
 form of the data directory’s pathname.

Strong hash is not a guarantee, even if you could promise that you
could get a unique canonical path, which I doubt you can.  In any case
this fails if the DBA decides to rename the directory on the fly (don't
laugh; not only are there instances of that in our archives, there are
people opining that we need to allow it --- even with the postmaster
still running).

 This also re-
 moves any risk of other applications, or other databases’ memory  
 segments colliding with the current shared memory
 segment, which conveniently simplifies the logic.

How exactly does it remove that risk?  I think you're wishfully-thinking
that if you are creating an unreadable hash value then there will never
be any collisions against someone else with the same touching faith that
*his* unreadable hash values will never collide with anyone else's.
Doesn't give me a lot of comfort.  Not that it matters, since the
approach is broken even if this specific assumption were sustainable.

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PATCHES] POSIX shared memory support

2007-02-26 Thread Chris Marcellino


On Feb 26, 2007, at 10:43 PM, Tom Lane wrote:


Chris Marcellino [EMAIL PROTECTED] writes:

The System V shared memory facilities provide a method to determine
who is attached to a shared memory segment.
This is used to prevent backends that were orphaned by crashed or
killed database processes from corrupting the data-
base as it is restarted. The same effect can be achieved with using
the POSIX APIs,


... except that it can't ...


but since the POSIX library does not
have a way to check who is attached to a segment, atomic segment
creation must be used to ensure exclusive access to
the database.


How does that fix the problem?  If you can't actually tell whether
someone is attached to an existing segment, then you're still up  
against
the basic rock-and-a-hard-place issue: either you assume there is  
no one
there (and corrupt your database if you're wrong) or you assume  
there is
someone there (and force manual intervention by the DBA to recover  
after
postmaster crashes).  Neither of these alternatives is really  
acceptable.


Ignoring the case where backends are still alive in the database,  
since they would require intervention or patience either way, there  
are two options:
1) There is a postmaster/backend still running and you try to start  
another postmaster: the unique segment cannot be closed and  
atomically recreated and will fail as it does in the current  
implementation.
2) There are no errant processes still in the database: the segment  
can be closed and atomically recreated.


Try making a build with the patch, then start a postmaster for a  
given folder, delete the lock file and start another postmaster (on a  
different port) in that folder. Please let me know if I am  
overlooking something.





In order for this to work, the key name used to open and create the
shared memory segment must be unique for each
data directory. This is done by using a strong hash of the canonical
form of the data directory’s pathname.


Strong hash is not a guarantee, even if you could promise that you
could get a unique canonical path, which I doubt you can.  In any case
this fails if the DBA decides to rename the directory on the fly  
(don't

laugh; not only are there instances of that in our archives, there are
people opining that we need to allow it --- even with the postmaster
still running).


Strong hash is an effective guarantee that many computing paradigms  
are based upon. The collision rate is astronomically small, and can  
be made astronomically smaller with longer hashes.
(For MD5 there would need to be 10^15 postmasters on a server before  
a collision is likely, and they all would need to have crashed and  
left backends in the database, etc. )


True, renaming is a problem that I had had not anticipated at all.  
Now that you mention it, hard links might be an issue on some  
machines that don't canonicalize them to a unique path, since that  
isn't required by the POSIX docs. Oh, the horrible degenerate cases.  
Good point though.


Perhaps there is some other unique identifying feature of a given  
database. A per-database persistent UUID would fit nicely here. It  
could just be the shmem key.





This also re-
moves any risk of other applications, or other databases’ memory
segments colliding with the current shared memory
segment, which conveniently simplifies the logic.


How exactly does it remove that risk?


This is fruitless due to the renaming issue, but the hash isn't an  
issue. I'm not sure that a hex string beginning with \pg_x is any  
less readable than the shmem id integers that are generated ad-hoc by  
the current implementation.



I think you're wishfully-thinking
that if you are creating an unreadable hash value then there will  
never
be any collisions against someone else with the same touching faith  
that

*his* unreadable hash values will never collide with anyone else's.


I'm flattered that you hold my coding abilities with such devout  
conviction, but I assure you that cryptography, even in this limited  
use, is based in rational thought :).
In addition, the astronomically unlikely collision isn't a risk as  
the database can't be damaged. The admin would then need to clear the  
lockfile, after he won the lottery twice and was stuck by lightning  
in his overturned car.



Doesn't give me a lot of comfort.
Not that it matters, since the
approach is broken even if this specific assumption were sustainable.


Postmasters failing to load don't give me much comfort either, and  
that isn't a pipe dream.


I suppose that the renaming issue relegates this patch to situations  
where the database cannot be renamed or hard linked to and started  
more than once, yet require this to start up databases without  
restarting and needing to control how many other databases are  
consuming shmem on the same box.


Thanks for the reply,
Chris Marcellino



regards, tom lane



---(end of