subject:"\[HACKERS\] dynamic shared memory"

Re: [HACKERS] Dynamic shared memory areas

2017-01-04 Thread Peter Geoghegan

On Tue, Nov 15, 2016 at 5:31 PM, Robert Haas  wrote:
> I think we should develop versions of this that (1) allocate from the
> main shared memory segment and (2) allocate from backend-private
> memory.  Per my previous benchmarking results, allocating from
> backend-private memory would be a substantial win for tuplesort.c
> because this allocator is substantially more memory-efficient for
> large memory contexts than aset.c, and Tomas Vondra tested it out and
> found that it is also faster for logical decoding than the approach he
> proposed.

The approach that I'd prefer to take with tuplesort.c is to have a
buffer for caller tuples that is written to sequentially, and
repalloc()'d as needed, much like the memtuples array. It would be
slightly tricky to make this work when memtuples needs to be a heap
(I'm mostly thinking of top-N heapsorts here). That has perhaps
unbeatable efficiency, while also helping cases with significant
physical/logical correlation in their input, which is pretty common.
Creating an index on a serial PK within pg_restore would probably get
notably faster if we went this way.

-- 
Peter Geoghegan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Dynamic shared memory areas

2016-12-05 Thread Robert Haas

On Fri, Dec 2, 2016 at 3:46 PM, Thomas Munro
 wrote:
> Here's a patch to provide the right format string for dsa_pointer to
> printf-like functions, which clears a warning coming from dsa_dump (a
> debugging function) on 32 bit systems.

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Dynamic shared memory areas

2016-12-02 Thread Thomas Munro

On Sat, Dec 3, 2016 at 9:02 AM, Robert Haas  wrote:
> On Fri, Dec 2, 2016 at 2:56 PM, Robert Haas  wrote:
>> On Fri, Dec 2, 2016 at 1:21 PM, Robert Haas  wrote:
>>> On Thu, Dec 1, 2016 at 6:33 AM, Thomas Munro
>>>  wrote:
 Please find attached dsa-v8.patch, and also a small test module for
 running random allocate/free exercises and dumping the internal
 allocator state.
>>>
>>> OK, I've committed the main patch.
>>
>> ...but the buildfarm isn't very happy about it.
>>
>> tern complains:
>>
>> In file included from dsa.c:58:0:
>> ../../../../src/include/utils/dsa.h:59:1: error: unknown type name
>> 'pg_atomic_uint64'
>>  typedef pg_atomic_uint64 dsa_pointer_atomic;
>>
>> ...but that code is only compiled if #if DSA_POINTER_SIZEOF == 4 fails
>> to be true.  And that should always be true unless
>> PG_HAVE_ATOMIC_U64_SUPPORT is defined.  So apparently tern claims to
>> PG_HAVE_ATOMIC_U64_SUPPORT but doesn't actually define
>> pg_atomic_uint64?  That doesn't seem right.
>
> No, that's not the problem.  Just a garden variety thinko in dsa.h.
> Will push a fix presently.

Here's a patch to provide the right format string for dsa_pointer to
printf-like functions, which clears a warning coming from dsa_dump (a
debugging function) on 32 bit systems.

-- 
Thomas Munro
http://www.enterprisedb.com


fix-dsa-pointer-format.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Dynamic shared memory areas

2016-12-02 Thread Robert Haas

On Fri, Dec 2, 2016 at 2:56 PM, Robert Haas  wrote:
> On Fri, Dec 2, 2016 at 1:21 PM, Robert Haas  wrote:
>> On Thu, Dec 1, 2016 at 6:33 AM, Thomas Munro
>>  wrote:
>>> Please find attached dsa-v8.patch, and also a small test module for
>>> running random allocate/free exercises and dumping the internal
>>> allocator state.
>>
>> OK, I've committed the main patch.
>
> ...but the buildfarm isn't very happy about it.
>
> tern complains:
>
> In file included from dsa.c:58:0:
> ../../../../src/include/utils/dsa.h:59:1: error: unknown type name
> 'pg_atomic_uint64'
>  typedef pg_atomic_uint64 dsa_pointer_atomic;
>
> ...but that code is only compiled if #if DSA_POINTER_SIZEOF == 4 fails
> to be true.  And that should always be true unless
> PG_HAVE_ATOMIC_U64_SUPPORT is defined.  So apparently tern claims to
> PG_HAVE_ATOMIC_U64_SUPPORT but doesn't actually define
> pg_atomic_uint64?  That doesn't seem right.

No, that's not the problem.  Just a garden variety thinko in dsa.h.
Will push a fix presently.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Dynamic shared memory areas

2016-12-02 Thread Robert Haas

On Fri, Dec 2, 2016 at 1:21 PM, Robert Haas  wrote:
> On Thu, Dec 1, 2016 at 6:33 AM, Thomas Munro
>  wrote:
>> Please find attached dsa-v8.patch, and also a small test module for
>> running random allocate/free exercises and dumping the internal
>> allocator state.
>
> OK, I've committed the main patch.

...but the buildfarm isn't very happy about it.

tern complains:

In file included from dsa.c:58:0:
../../../../src/include/utils/dsa.h:59:1: error: unknown type name
'pg_atomic_uint64'
 typedef pg_atomic_uint64 dsa_pointer_atomic;

...but that code is only compiled if #if DSA_POINTER_SIZEOF == 4 fails
to be true.  And that should always be true unless
PG_HAVE_ATOMIC_U64_SUPPORT is defined.  So apparently tern claims to
PG_HAVE_ATOMIC_U64_SUPPORT but doesn't actually define
pg_atomic_uint64?  That doesn't seem right.

The failures on several other BF members appear to be similar.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Dynamic shared memory areas

2016-12-02 Thread Robert Haas

On Thu, Dec 1, 2016 at 6:33 AM, Thomas Munro
 wrote:
> Please find attached dsa-v8.patch, and also a small test module for
> running random allocate/free exercises and dumping the internal
> allocator state.

OK, I've committed the main patch.  As far as test-dsa.patch, can we
tie that into make check-world so that committing it delivers some
buildfarm coverage for this code?  Of course, the test settings would
have to be fairly conservative given that some buildfarm machines have
very limited resources, but it still seems worth doing.  test_shm_mq
might provide some useful precedent.

Note that you don't need the prototype if you've already used
PG_FUNCTION_INFO_V1.

I'm not sure that using the same random seed every time is a good
idea.  Maybe you should provide a way to set the seed as part of
starting the test, or to not do that (pass NULL?) and then elog(LOG,
...) the seed that's chosen.  Then if the BF crashes, we can see what
seed was in use for that particular test.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Dynamic shared memory areas

2016-12-02 Thread Haribabu Kommi

On Thu, Dec 1, 2016 at 10:33 PM, Thomas Munro  wrote:

>
> Please find attached dsa-v8.patch, and also a small test module for
> running random allocate/free exercises and dumping the internal
> allocator state.


Moved to next CF with "needs review" status.

Regards,
Hari Babu
Fujitsu Australia

Re: [HACKERS] Dynamic shared memory areas

2016-11-29 Thread Robert Haas

More review:

+ * For large objects, we just stick all of the allocations in fullness class
+ * 0. Since we can just return the space directly to the free page manager,
+ * we don't really need them on a list at all, except that if someone wants
+ * to bulk release everything allocated using this BlockAreaContext, we
+ * have no other way of finding them.

This comment is out-of-date.

+   /*
+* If this is the only span, and there is no active
span, then maybe
+* we should probably move this span to fullness class
1.  (Otherwise
+* if you allocate exactly all the objects in the only
span, it moves
+* to class 3, then you free them all, it moves to 2,
and then is
+* given back, leaving no active span).
+*/

"maybe we should probably" seems to have one more doubt-expressing
word than it needs.

+   if (size_class == DSA_SCLASS_SPAN_LARGE)
+   /* Large object frees give back segments
aggressively already. */
+   continue;

We generally use braces in this kind of case.

+* Search the fullness class 1 only.  That is where we
expect to find

extra "the"

+   /* Call for effect (we don't need the result). */
+   get_segment_by_index(area, index);
...
+   return area->segment_maps[index].mapped_address + offset;

It isn't guaranteed that area->segment_maps[index].mapped_address will
be non-NULL on return from get_segment_by_index, and then this
function will return a completely bogus pointer to the caller.  I
think you should probably elog() instead.

+   elog(ERROR, "dsa: can't attach to area handle %u", handle);

Avoid ":" in elog messages.   You don't really need to - and it isn't
project style to - tag these with "dsa:"; that's what \errverbose or
\set VERBOSITY verbose is for.  In this particular case, I might just
adopt the formulation from parallel.c:

ereport(ERROR,

(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 errmsg("could not map dynamic shared
memory segment")));

+   elog(FATAL,
+"dsa couldn't find run of pages:
fpm_largest out of sync");

Here I'd go with "dsa could not find %u free pages".

+   elog(ERROR, "dsa_pin: area already pinned");

"dsa_area already pinned"

+   elog(ERROR, "dsa_unpin: area not pinned");

"dsa_area not pinned"

+   if (segment == NULL)
+   elog(ERROR, "dsa: can't attach to segment");

As above.

+static dsa_segment_map *
+get_segment_by_index(dsa_area *area, dsa_segment_index index)
+{
+   if (unlikely(area->segment_maps[index].mapped_address == NULL))
+   {
+   dsm_handle  handle;
+   dsm_segment *segment;
+   dsa_segment_map *segment_map;
+
+   handle = area->control->segment_handles[index];

Don't you need to acquire the lock for this?

+   /* Check all currently mapped segments to find what's
been freed. */
+   for (i = 0; i <= area->high_segment_index; ++i)
+   {
+   if (area->segment_maps[i].header != NULL &&
+   area->segment_maps[i].header->freed)
+   {
+   dsm_detach(area->segment_maps[i].segment);
+   area->segment_maps[i].segment = NULL;
+   area->segment_maps[i].header = NULL;
+   area->segment_maps[i].mapped_address = NULL;
+   }
+   }
+   area->freed_segment_counter = freed_segment_counter;

And this?

+/*
+ * Release a DSA area that was produced by dsa_create_in_place or
+ * dsa_attach_in_place.  It is preferable to use one of the 'dsa_on_XXX'
+ * callbacks so that this is managed automatically, because failure to release
+ * an area created in-place leaks its segments permanently.
+ */
+void
+dsa_release_in_place(void *place)
+{
+   decrement_reference_count((dsa_area_control *) place);
+}

Since this seems to be the only caller of decrement_reference_count,
you could just put the logic here.  The contract for this function is
also a bit unclear from the header comment.  I initially thought that
it was your intention that this should be called from every process
that has either created or attached the segment.  But that doesn't
seem like it will work, because decrement_reference_count calls
dsm_unpin_segment on every segment, and a segment can only be pinned
once, so everybody else would fail.  So maybe the idea is that ANY ONE
process has to call dsa_release_in_place.  But then that could lead to
failures in other backends inside get_segment_by_index(), because
segments they don't have mapped might already be gone.  OK, third try:
maybe the idea is that the LAST

Re: [HACKERS] Dynamic shared memory areas

2016-11-28 Thread Robert Haas

On Wed, Nov 23, 2016 at 7:07 AM, Thomas Munro
 wrote:
> Those let you create an area in existing memory (in a DSM segment,
> traditional inherited shmem).  The in-place versions will stlll create
> DSM segments on demand as required, though I suppose if you wanted to
> prevent that you could with dsa_set_size_limit(area, size).  One
> complication is that of course the automatic detach feature doesn't
> work if you're in some random piece of memory.  I have exposed
> dsa_on_dsm_detach, so that there is a way to hook it up to the detach
> hook for a pre-existing DSM segment, but that's the caller's
> responibility.

shm_mq_attach() made the opposite decision about how to solve this
problem, and frankly I think that API is a lot more convenient: if the
first argument to shm_mq_attach() happens to be located inside of a
DSM, you can pass the DSM as the second argument and it registers the
on_dsm_detach() hook for you.  If not, you can pass NULL and deal with
it in some other way.  But this makes the common case very simple.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Dynamic shared memory areas

2016-11-15 Thread Robert Haas

On Thu, Nov 10, 2016 at 12:37 AM, Thomas Munro
 wrote:
> On Tue, Nov 1, 2016 at 5:06 PM, Thomas Munro
>  wrote:
>> On Wed, Oct 5, 2016 at 11:28 PM, Thomas Munro
>>  wrote:
>>> [dsa-v3.patch]
>>
>> Here is a new version which just adds CLOBBER_FREED_MEMORY support to 
>> dsa_free.
>
> Here is a new version that fixes a bug I discovered in freepage.c today.
>
> Details:  When dsa_free decides to give back a whole superblock back
> to the free page manager for a segment with FreePageManagerPut, and
> there was already exactly one span of exactly one free page in that
> segment, and the span being 'put' is not adjacent to that existing
> free page, then the singleton format must be converted to a btree with
> the existing page as root and the newly put span as the sole leaf.
> But in that special case we forgot to add the newly put span to the
> appropriate free list.  Not only did we lose track of it, but a future
> call to FreePageManagerPut might try to merge it with another adjacent
> span, which will try to manipulate the freelist that it expects it to
> be in and blow up.  The fix is just to add a call to
> FreePagePushSpanLeader in this corner case before the early return.

Since a lot of the design of this patch is mine - from my earlier work
on sb_alloc - I don't expect to have a ton of objections to it. And
I'd like to get it committed, because other parallelism work depends
on it (bitmap heap scan and parallel hash join in particular), and
because it's got other uses as well.  However, I don't want to be
perceived as slamming my code (or that of my colleagues) into the tree
without due opportunity for other people to provide feedback, so if
anyone has questions, comments, concerns, or review to offer, please
do.

I think we should develop versions of this that (1) allocate from the
main shared memory segment and (2) allocate from backend-private
memory.  Per my previous benchmarking results, allocating from
backend-private memory would be a substantial win for tuplesort.c
because this allocator is substantially more memory-efficient for
large memory contexts than aset.c, and Tomas Vondra tested it out and
found that it is also faster for logical decoding than the approach he
proposed.  Perhaps that's not an argument for holding up his proposed
patches for that problem, but I think it IS a good argument for
pressing forward with a backend-private version of this allocator.
I'm not saying that should be part of the initial commit of this code,
but I think it's a good direction to pursue.

One question that we need to resolve is where the code should live in
the source tree.  When I wrote the original patches upon which this
work was based, I think that I put all the code in
src/backend/utils/mmgr, since it's all memory-management code.  In
this patch, Thomas left the free page manager code there, but put the
allocator itself in src/backend/storage/ipc.  There's a certain logic
to that because dynamic shared areas (dsa.c) sit right next to dynamic
shared memory (dsm.c) but it feels a bit schizophrenic to have half of
the code in storage/ipc and the other half in utils/mmgr.  I guess my
view is that utils/mmgr is a better fit, because I think that this is
basically memory management code that happens to use shared memory,
rather than basically IPC that happens to be an allocator.  If we
decide that this stuff goes in storage/ipc then that's basically
saying that everything that uses dynamic shared memory is going to end
up in that directory, which seems like a mess.  The fact that the
free-page manager, at least, could be used for allocations not based
on DSM strengthens that argument in my view. Other opinions?

The #ifdef HAVE__BUILTIN_TYPES_COMPATIBLE_P hack in relptr.h, for
which I believe I'm responsible, is ugly.  There is probably a
compiler out there that has __typeof__ but not
__builtin_types_compatible_p, and we could cater to that by adding a
separate configure test for __typeof__.  A little browsing of the
documentation on at https://gcc.gnu.org/onlinedocs/ seems to suggest
that __builtin_types_compatible_p didn't exist before GCC 3.1, but
__typeof__ seems to be there even in 2.95.3.  That's not very
interesting, honestly, because 3.1 came out in 2002, but there might
be non-GCC compilers that implement __typeof__ but not
__builtin_types_compatible_p.  I am inclined not to worry about this
unless somebody feels otherwise, but it's not beautiful.

I wonder if it wouldn't be a good idea to allow the dsa_area_control
to be stored wherever instead of insisting that it's got to be located
inside the first DSM segment backing the pool itself.  For example,
you could make dsa_create_dynamic() take a third argument which is a
pointer to enough space for a dsa_area_control, and it would
initialize it in place.  Then you could make dsa_attach_dynamic() take
a pointer to that same structure instead of taking a

Re: [HACKERS] Dynamic shared memory areas

2016-10-05 Thread Dilip Kumar

On Wed, Oct 5, 2016 at 3:00 AM, Thomas Munro
 wrote:
> Here's a new version that does that.

While testing this patch I found some issue,

+ total_size = DSA_INITIAL_SEGMENT_SIZE;
+ total_pages = total_size / FPM_PAGE_SIZE;
+ metadata_bytes =
+ MAXALIGN(sizeof(dsa_area_control)) +
+ MAXALIGN(sizeof(FreePageManager)) +
+ total_pages * sizeof(dsa_pointer);
+ /* Add padding up to next page boundary. */
+ if (metadata_bytes % FPM_PAGE_SIZE != 0)
+ metadata_bytes += FPM_PAGE_SIZE - (metadata_bytes % FPM_PAGE_SIZE);
+ usable_pages =
+ (total_size - metadata_bytes) / FPM_PAGE_SIZE;

+ segment = dsm_create(total_size, 0);
+ dsm_pin_segment(segment);

Actually problem is that size of dsa_area_control is bigger than
DSA_INITIAL_SEGMENT_SIZE.
but we are allocating segment of DSA_INITIAL_SEGMENT_SIZE size.

(gdb) p sizeof(dsa_area_control)
$8 = 67111000
(gdb) p DSA_INITIAL_SEGMENT_SIZE
$9 = 1048576

In dsa-v1 problem was not exist because  DSA_MAX_SEGMENTS was 1024,
but in dsa-v2 I think it's calculated wrongly.

(gdb) p DSA_MAX_SEGMENTS
$10 = 16777216

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Dynamic Shared Memory stuff

2014-04-16 Thread Robert Haas

On Tue, Apr 15, 2014 at 10:46 PM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Wed, Apr 16, 2014 at 3:01 AM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Apr 15, 2014 at 12:33 AM, Amit Kapila amit.kapil...@gmail.com 
 wrote:
 On Mon, Apr 14, 2014 at 10:03 PM, Robert Haas robertmh...@gmail.com wrote:
 For the create case, I'm wondering if we should put the block that
 tests for !hmap *before* the _dosmaperr() and check for EEXIST.  What
 is your opinion?

 Either way is okay, but I think the way you are suggesting is better as it
 will make code consistent with other place (PGSharedMemoryCreate()).

 OK, can you prepare a patch?

 Please find attached patch to address this issue.
 One minor point to note is that now we have to call GetLastError() twice,
 once inside error path and once to check EEXIST, but I think that is okay
 as existing code in PGSharedMemoryCreate() does it that way.

OK.  I committed this blindly, but I don't have a Windows dev
environment, so please keep an eye on the Windows buildfarm members
and provide follow-on patches if any of them get unhappy about this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Dynamic Shared Memory stuff

2014-04-15 Thread Robert Haas

On Tue, Apr 15, 2014 at 12:33 AM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Mon, Apr 14, 2014 at 10:03 PM, Robert Haas robertmh...@gmail.com wrote:
 On Sat, Apr 12, 2014 at 1:32 AM, Amit Kapila amit.kapil...@gmail.com wrote:
 I have checked that other place in code also check handle to
 decide if API has failed.  Refer function PGSharedMemoryIsInUse().
 So I think fix to call GetLastError() after checking handle is okay.
 Attached patch fixes this issue.  After patch, the server shows below
 log which is exactly what is expected from test_shm_mq

 In PostgreSQL code, hmap == NULL, rather than !hmap, is the preferred
 way to test for a NULL pointer.  I notice that the !hmap style is used
 throughout this code, so I guess cleaning that up is a matter for a
 separate commit.

 I think in that case we might want to cleanup some other similar usage
 (PGSharedMemoryCreate) of !hmap.

Ah.  Well, in that case maybe we should just leave it alone, since
it's been like that forever and nobody's cared until now.

 For the create case, I'm wondering if we should put the block that
 tests for !hmap *before* the _dosmaperr() and check for EEXIST.  What
 is your opinion?

 Either way is okay, but I think the way you are suggesting is better as it
 will make code consistent with other place (PGSharedMemoryCreate()).

OK, can you prepare a patch?


-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Dynamic Shared Memory stuff

2014-04-15 Thread Amit Kapila

On Wed, Apr 16, 2014 at 3:01 AM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Apr 15, 2014 at 12:33 AM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Mon, Apr 14, 2014 at 10:03 PM, Robert Haas robertmh...@gmail.com wrote:
 For the create case, I'm wondering if we should put the block that
 tests for !hmap *before* the _dosmaperr() and check for EEXIST.  What
 is your opinion?

 Either way is okay, but I think the way you are suggesting is better as it
 will make code consistent with other place (PGSharedMemoryCreate()).

 OK, can you prepare a patch?

Please find attached patch to address this issue.
One minor point to note is that now we have to call GetLastError() twice,
once inside error path and once to check EEXIST, but I think that is okay
as existing code in PGSharedMemoryCreate() does it that way.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


fix_dsm_invalid_errcode_issue-v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Dynamic Shared Memory stuff

2014-04-14 Thread Robert Haas

On Sat, Apr 12, 2014 at 1:32 AM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Wed, Apr 9, 2014 at 9:20 PM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Apr 9, 2014 at 7:41 AM, Amit Kapila amit.kapil...@gmail.com wrote:
 I am just not sure whether it is okay to rearrange the code and call
 GetLastError() only if returned handle is Invalid (NULL) or try to look
 for more info.

 Well, I don't know either.  You wrote the Windows portion of this
 code, so you'll have to decide what's best.  If the best practice in
 this area is to only call GetLastError() if the handle isn't valid,
 then that's probably what we should do, too.  But I can't speak to
 what the best practice is.

 I have checked that other place in code also check handle to
 decide if API has failed.  Refer function PGSharedMemoryIsInUse().
 So I think fix to call GetLastError() after checking handle is okay.
 Attached patch fixes this issue.  After patch, the server shows below
 log which is exactly what is expected from test_shm_mq

In PostgreSQL code, hmap == NULL, rather than !hmap, is the preferred
way to test for a NULL pointer.  I notice that the !hmap style is used
throughout this code, so I guess cleaning that up is a matter for a
separate commit.

For the create case, I'm wondering if we should put the block that
tests for !hmap *before* the _dosmaperr() and check for EEXIST.  What
is your opinion?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Dynamic Shared Memory stuff

2014-04-14 Thread Amit Kapila

On Mon, Apr 14, 2014 at 10:03 PM, Robert Haas robertmh...@gmail.com wrote:
 On Sat, Apr 12, 2014 at 1:32 AM, Amit Kapila amit.kapil...@gmail.com wrote:
 I have checked that other place in code also check handle to
 decide if API has failed.  Refer function PGSharedMemoryIsInUse().
 So I think fix to call GetLastError() after checking handle is okay.
 Attached patch fixes this issue.  After patch, the server shows below
 log which is exactly what is expected from test_shm_mq

 In PostgreSQL code, hmap == NULL, rather than !hmap, is the preferred
 way to test for a NULL pointer.  I notice that the !hmap style is used
 throughout this code, so I guess cleaning that up is a matter for a
 separate commit.

I think in that case we might want to cleanup some other similar usage
(PGSharedMemoryCreate) of !hmap.

 For the create case, I'm wondering if we should put the block that
 tests for !hmap *before* the _dosmaperr() and check for EEXIST.  What
 is your opinion?

Either way is okay, but I think the way you are suggesting is better as it
will make code consistent with other place (PGSharedMemoryCreate()).

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Dynamic Shared Memory stuff

2014-04-11 Thread Amit Kapila

On Wed, Apr 9, 2014 at 9:20 PM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Apr 9, 2014 at 7:41 AM, Amit Kapila amit.kapil...@gmail.com wrote:
 I am just not sure whether it is okay to rearrange the code and call
 GetLastError() only if returned handle is Invalid (NULL) or try to look
 for more info.

 Well, I don't know either.  You wrote the Windows portion of this
 code, so you'll have to decide what's best.  If the best practice in
 this area is to only call GetLastError() if the handle isn't valid,
 then that's probably what we should do, too.  But I can't speak to
 what the best practice is.

I have checked that other place in code also check handle to
decide if API has failed.  Refer function PGSharedMemoryIsInUse().
So I think fix to call GetLastError() after checking handle is okay.
Attached patch fixes this issue.  After patch, the server shows below
log which is exactly what is expected from test_shm_mq

LOG:  registering background worker test_shm_mq
LOG:  starting background worker process test_shm_mq
LOG:  worker process: test_shm_mq (PID 4888) exited with exit code 1
LOG:  unregistering background worker test_shm_mq
LOG:  registering background worker test_shm_mq
LOG:  starting background worker process test_shm_mq
LOG:  worker process: test_shm_mq (PID 3128) exited with exit code 1
LOG:  unregistering background worker test_shm_mq

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


fix_dsm_invalid_errcode_issue.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Dynamic Shared Memory stuff

2014-04-09 Thread Amit Kapila

On Tue, Apr 8, 2014 at 9:15 PM, Robert Haas robertmh...@gmail.com wrote:
 Apparently not.  However, I'm fairly sure this is a step toward
 addressing the complaints previously raised, even if there may be some
 details people still want changed, so I've gone ahead and committed
 it.

Few Observations:

1. One new warning has been introduced in code.
1src\backend\port\win32_shmem.c(295): warning C4013:
'dsm_set_control_handle' undefined; assuming extern returning int
Attached patch fixes this warning.

2. On running test_shm_mq manually, the client side didn't get
any problem (the results are as expected). However I am seeing
below log on server:
LOG:  registering background worker test_shm_mq
LOG:  starting background worker process test_shm_mq
LOG:  unrecognized win32 error code: 127
LOG:  worker process: test_shm_mq (PID 2528) exited with exit code 1
LOG:  unregistering background worker test_shm_mq

I think below message in log is not expected:
LOG:  unrecognized win32 error code: 127

This has been started appearing after your commit related to DSM.
I have debugged this issue and found that it comes from below part
of code:
dsm_impl_windows
{
..
else
{
hmap = OpenFileMapping(FILE_MAP_WRITE | FILE_MAP_READ,
  FALSE, /* do not inherit the name */
  name); /* name of mapping object */
_dosmaperr(GetLastError());
}
}

Here even though handle returned by OpenFileMapping() is a valid handle,
but still GetLastError() return 127 (The specified procedure could not be
found.).  Now the specs[1] of API says that if handle is non-NULL, then
consider it success, so I am not sure whether we should bother about this
error or not.  I have tried many ways (trying different parameters, search on
net) to change this and related API's, but the same problem exists.  One
strange thing is if I just call the API twice (I know this is not
right, but just
to experiment for finding some info), second time this error doesn't
occur.
The only difference in latest changes is that now the OpenFileMapping()
is getting called by Child process rather than peer process (w.r.t process
that has called CreateFileMapping), don't know why this should make
such a difference.

On net whatever examples I have seen for such usage, they call
GetLastError() only if handle is invalid, we have called in above fashion
just to keep code little bit simple.

I am just not sure whether it is okay to rearrange the code and call
GetLastError() only if returned handle is Invalid (NULL) or try to look
for more info.

One question:
1. I have seen that initdb still creates pg_dynshmem, is it required
after your latest changes?


Let me know your opinion?

[1]
http://msdn.microsoft.com/en-us/library/windows/desktop/aa366791(v=vs.85).aspx



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


fix_dsm_win_warning.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Dynamic Shared Memory stuff

2014-04-09 Thread Robert Haas

On Wed, Apr 9, 2014 at 7:41 AM, Amit Kapila amit.kapil...@gmail.com wrote:
 Few Observations:

 1. One new warning has been introduced in code.
 1src\backend\port\win32_shmem.c(295): warning C4013:
 'dsm_set_control_handle' undefined; assuming extern returning int
 Attached patch fixes this warning.

OK, committed, after moving the header to the correct position in
alphabetical order.

 2. On running test_shm_mq manually, the client side didn't get
 any problem (the results are as expected). However I am seeing
 below log on server:
 LOG:  registering background worker test_shm_mq
 LOG:  starting background worker process test_shm_mq
 LOG:  unrecognized win32 error code: 127
 LOG:  worker process: test_shm_mq (PID 2528) exited with exit code 1
 LOG:  unregistering background worker test_shm_mq

 I think below message in log is not expected:
 LOG:  unrecognized win32 error code: 127

 This has been started appearing after your commit related to DSM.
 I have debugged this issue and found that it comes from below part
 of code:
 dsm_impl_windows
 {
 ..
 else
 {
 hmap = OpenFileMapping(FILE_MAP_WRITE | FILE_MAP_READ,
   FALSE, /* do not inherit the name */
   name); /* name of mapping object */
 _dosmaperr(GetLastError());
 }
 }

 Here even though handle returned by OpenFileMapping() is a valid handle,
 but still GetLastError() return 127 (The specified procedure could not be
 found.).  Now the specs[1] of API says that if handle is non-NULL, then
 consider it success, so I am not sure whether we should bother about this
 error or not.  I have tried many ways (trying different parameters, search on
 net) to change this and related API's, but the same problem exists.  One
 strange thing is if I just call the API twice (I know this is not
 right, but just
 to experiment for finding some info), second time this error doesn't
 occur.
 The only difference in latest changes is that now the OpenFileMapping()
 is getting called by Child process rather than peer process (w.r.t process
 that has called CreateFileMapping), don't know why this should make
 such a difference.

 On net whatever examples I have seen for such usage, they call
 GetLastError() only if handle is invalid, we have called in above fashion
 just to keep code little bit simple.

 I am just not sure whether it is okay to rearrange the code and call
 GetLastError() only if returned handle is Invalid (NULL) or try to look
 for more info.

Well, I don't know either.  You wrote the Windows portion of this
code, so you'll have to decide what's best.  If the best practice in
this area is to only call GetLastError() if the handle isn't valid,
then that's probably what we should do, too.  But I can't speak to
what the best practice is.

 One question:
 1. I have seen that initdb still creates pg_dynshmem, is it required
 after your latest changes?

It's only used now if dynamic_shared_memory_type = mmap.  I know
Andres was never a huge fan of the mmap implementation, so we could
rip that out and get rid of the directory, too, but I kind of liked
having it, and broke the tie in favor of myself.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Dynamic Shared Memory stuff

2014-04-09 Thread Andres Freund

On 2014-04-09 11:50:33 -0400, Robert Haas wrote:
  One question:
  1. I have seen that initdb still creates pg_dynshmem, is it required
  after your latest changes?
 
 It's only used now if dynamic_shared_memory_type = mmap.  I know
 Andres was never a huge fan of the mmap implementation, so we could
 rip that out and get rid of the directory, too, but I kind of liked
 having it, and broke the tie in favor of myself.

It's purely a toy thing. I think pretty much every dynshm user that
actually transfers data through it will be better off falling back to
single process style work, rather than parellizing it.

Anyway, it's there, it doesn't presently cause problems, and I don't
have to maintain it, so ...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Dynamic Shared Memory stuff

2014-04-08 Thread Robert Haas

On Fri, Apr 4, 2014 at 10:01 AM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Jan 22, 2014 at 10:17 AM, Noah Misch n...@leadboat.com wrote:
 Yeah, abandoning the state file is looking attractive.

 Here's a draft patch getting rid of the state file.  This should
 address concerns raised by Heikki and Fujii Masao and echoed by Tom
 that dynamic shared memory behaves differently than the main shared
 memory segment.  The control segment ID is stored in the System V
 shared memory block, so we're guaranteed that when using either System
 V or POSIX shared memory we'll always latch onto the control segment
 that matches up with the main shared memory segment we latched on to.
 Cleanup for the file-mmap and Windows methods doesn't need to change,
 because the former always just means clearing out $PGDATA/pg_dynshmem,
 and the latter is done automatically by the operating system.

 Comments?

Apparently not.  However, I'm fairly sure this is a step toward
addressing the complaints previously raised, even if there may be some
details people still want changed, so I've gone ahead and committed
it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Dynamic Shared Memory stuff

2014-04-04 Thread Robert Haas

On Wed, Jan 22, 2014 at 10:17 AM, Noah Misch n...@leadboat.com wrote:
 Yeah, abandoning the state file is looking attractive.

Here's a draft patch getting rid of the state file.  This should
address concerns raised by Heikki and Fujii Masao and echoed by Tom
that dynamic shared memory behaves differently than the main shared
memory segment.  The control segment ID is stored in the System V
shared memory block, so we're guaranteed that when using either System
V or POSIX shared memory we'll always latch onto the control segment
that matches up with the main shared memory segment we latched on to.
Cleanup for the file-mmap and Windows methods doesn't need to change,
because the former always just means clearing out $PGDATA/pg_dynshmem,
and the latter is done automatically by the operating system.

Comments?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
commit 6095f077a0d9a45e422087be1c7e8791c247a4e9
Author: Robert Haas rh...@postgresql.org
Date:   Fri Apr 4 09:06:23 2014 -0400

Get rid of the dynamic shared memory state file.

Instead, store the control segment ID in the header for the main
shared memory segment, and recover it from there when we restart after
a crash.  This ensures that we never latch onto the main shared memory
segment from one previous run and the dynamic shared memory control
segment from a different one.  It also avoids problems if you take a
base backup and start it up as hot standby on the same machine while
the previous postmaster is still running.

diff --git a/src/backend/port/ipc_test.c b/src/backend/port/ipc_test.c
index 6259919..936c1a4 100644
--- a/src/backend/port/ipc_test.c
+++ b/src/backend/port/ipc_test.c
@@ -30,6 +30,7 @@
 #include unistd.h
 
 #include miscadmin.h
+#include storage/dsm.h
 #include storage/ipc.h
 #include storage/pg_sema.h
 #include storage/pg_shmem.h
@@ -214,12 +215,13 @@ int
 main(int argc, char **argv)
 {
 	MyStorage  *storage;
+	PGShmemHeader *shim;
 	int			cpid;
 
 	printf(Creating shared memory ... );
 	fflush(stdout);
 
-	storage = (MyStorage *) PGSharedMemoryCreate(8192, false, 5433);
+	storage = (MyStorage *) PGSharedMemoryCreate(8192, false, 5433, shim);
 
 	storage-flag = 1234;
 
diff --git a/src/backend/port/sysv_shmem.c b/src/backend/port/sysv_shmem.c
index 51c1a2b..5e3850b 100644
--- a/src/backend/port/sysv_shmem.c
+++ b/src/backend/port/sysv_shmem.c
@@ -30,6 +30,7 @@
 
 #include miscadmin.h
 #include portability/mem.h
+#include storage/dsm.h
 #include storage/ipc.h
 #include storage/pg_shmem.h
 #include utils/guc.h
@@ -421,7 +422,8 @@ CreateAnonymousSegment(Size *size)
  * zero will be passed.
  */
 PGShmemHeader *
-PGSharedMemoryCreate(Size size, bool makePrivate, int port)
+PGSharedMemoryCreate(Size size, bool makePrivate, int port,
+	 PGShmemHeader **shim)
 {
 	IpcMemoryKey NextShmemSegID;
 	void	   *memAddress;
@@ -509,10 +511,13 @@ PGSharedMemoryCreate(Size size, bool makePrivate, int port)
 
 		/*
 		 * The segment appears to be from a dead Postgres process, or from a
-		 * previous cycle of life in this same process.  Zap it, if possible.
+		 * previous cycle of life in this same process.  Zap it, if possible,
+		 * and any associated dynamic shared memory segments, as well.
 		 * This probably shouldn't fail, but if it does, assume the segment
 		 * belongs to someone else after all, and continue quietly.
 		 */
+		if (hdr-dsm_control != 0)
+			dsm_cleanup_using_control_segment(hdr-dsm_control);
 		shmdt(memAddress);
 		if (shmctl(shmid, IPC_RMID, NULL)  0)
 			continue;
@@ -539,6 +544,7 @@ PGSharedMemoryCreate(Size size, bool makePrivate, int port)
 	hdr = (PGShmemHeader *) memAddress;
 	hdr-creatorPID = getpid();
 	hdr-magic = PGShmemMagic;
+	hdr-dsm_control = 0;
 
 	/* Fill in the data directory ID info, too */
 	if (stat(DataDir, statbuf)  0)
@@ -554,6 +560,7 @@ PGSharedMemoryCreate(Size size, bool makePrivate, int port)
 	 */
 	hdr-totalsize = size;
 	hdr-freeoffset = MAXALIGN(sizeof(PGShmemHeader));
+	*shim = hdr;
 
 	/* Save info for possible future use */
 	UsedShmemSegAddr = memAddress;
@@ -608,6 +615,7 @@ PGSharedMemoryReAttach(void)
 	if (hdr != origUsedShmemSegAddr)
 		elog(FATAL, reattaching to shared memory returned unexpected address (got %p, expected %p),
 			 hdr, origUsedShmemSegAddr);
+	dsm_set_control_handle(((PGShmemHeader *) hdr)-dsm_control);
 
 	UsedShmemSegAddr = hdr;		/* probably redundant */
 }
diff --git a/src/backend/port/win32_shmem.c b/src/backend/port/win32_shmem.c
index dca371c..3a0ded4 100644
--- a/src/backend/port/win32_shmem.c
+++ b/src/backend/port/win32_shmem.c
@@ -117,7 +117,8 @@ PGSharedMemoryIsInUse(unsigned long id1, unsigned long id2)
  *
  */
 PGShmemHeader *
-PGSharedMemoryCreate(Size size, bool makePrivate, int port)
+PGSharedMemoryCreate(Size size, bool makePrivate, int port,
+	 PGShmemHeader **shim)
 {
 	void	   *memAddress;
 	PGShmemHeader *hdr;
@@ -245,12 +246,14 @@ PGSharedMemoryCreate(Size

Re: [HACKERS] dynamic shared memory and locks

2014-02-12 Thread Robert Haas

On Mon, Feb 10, 2014 at 7:17 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 Does it make another problem if dsm_detach() also releases lwlocks
 being allocated on the dsm segment to be released?
 Lwlocks being held is tracked in the held_lwlocks[] array; its length is
 usually 100. In case when dsm_detach() is called towards the segment
 between addr ~ (addr + length), it seems to me an easy job to walk on
 this array to find out lwlocks on the range.

Oh, sure, it could be done.  I just don't see the point.  If you're
explicitly detaching a shared memory segment while still holding
lwlocks within that segment, your code is seriously buggy.  Making
dsm_detach() silently release those locks would probably just be
hiding a bug that you'd really rather find out about.

 Just my rough idea. Doesn't it make sense to have an offset from the
 head of DSM segment that contains the lwlock, instead of the identifier
 form, to indicate a cstring datum? It does not prevent to allocate it
 later; after the postmaster starting up, and here is no problem if number
 of dynamic lwlocks grows millions or more.
 If lwlock has a tranche_offset, not tranche_id, all it needs to do is:
 1. find out either of DSM segment or regular SHM segment that contains
 the supplied lwlock.
 2. Calculate mapped_address + tranche_offset; that is the local location
 where cstring form is put.

Well, if that offset is 8 bytes, it's going to make the LWLock
structure grow past 32 bytes on common platforms, which I do not want
to do.  If it's 4 bytes, sure, that'll work, but now you've gotta make
sure that string gets into the 4GB of the relevant shared memory
segment, which sounds like an annoying requirement.  How would you
satisfy it with respect to the main shared memory segment, for
example?

I mean, one way to handle this problem is to put a hash table in the
main shared memory segment with tranche ID - name mappings.  Backends
can suck mappings out of there and cache them locally as needed.  But
that's a lot of mechanism for a facility with no known users.

 Probably, it needs a common utility routine on dsm.c to find out
 a particular DSM segment that contains the supplied address.
 (Inverse function towards dsm_segment_address)

Yeah, I've thought of that.  It may be useful enough to be worth
doing, whether we use it for this or not.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic shared memory and locks

2014-02-10 Thread Kohei KaiGai

2014-02-08 4:52 GMT+09:00 Robert Haas robertmh...@gmail.com:
 On Tue, Jan 21, 2014 at 11:37 AM, Robert Haas robertmh...@gmail.com wrote:
 One idea I just had is to improve the dsm_toc module so that it can
 optionally set up a tranche of lwlocks for you, and provide some
 analogues of RequestAddinLWLocks and LWLockAssign for that case.  That
 would probably make this quite a bit simpler to use, at least for
 people using it with dynamic shared memory.  But I think that's a
 separate patch.

 I played with this a bit today and it doesn't actually seem to
 simplify things very much.  The backend that creates the DSM needs to
 do this:

 lwlocks = shm_toc_allocate(toc, sizeof(LWLockPadded) * nlwlocks);
 for (i = 0; i  nlwlocks; ++i)
 LWLockInitialize(lwlocks[i].lock, tranche_id);

 Since that's all of three lines, encapsulating it doesn't look all
 that helpful.  Then each backend needs to do something like this:

 static LWLockTranche mytranche;
 mytranche.name = some descriptive module name;
 mytranche.array_base = lwlocks;
 mytranche.array_stride = sizeof(LWLockPadded);
 LWLockRegisterTranche(tranche_id, mytranche);

 That's not a lot of code either, and there's no obvious way to reduce
 it much by hooking into shm_toc.

 I thought maybe we needed some cleanup when the dynamic shared memory
 segment is unmapped, but looks like we really don't.  One can do
 something like LWLockRegisterTranche(tranche_id, NULL) for debugging
 clarity, but it isn't strictly needed; one can release all lwlocks
 from the tranche or assert that none are held, but that should really
 only be a problem if the user does something like
 LWLockAcquire(lock_in_the_segment); dsm_detach(seg), because error
 recovery starts by releasing *all* lwlocks.  And if the user does it
 explicitly, I'm kinda OK with that just seg faulting.  After all, the
 user could equally well have done dsm_detach(seg);
 LWLockAcquire(lock_in_the_segment) and there's no way at all to
 cross-check for that sort of mistake.

Does it make another problem if dsm_detach() also releases lwlocks
being allocated on the dsm segment to be released?
Lwlocks being held is tracked in the held_lwlocks[] array; its length is
usually 100. In case when dsm_detach() is called towards the segment
between addr ~ (addr + length), it seems to me an easy job to walk on
this array to find out lwlocks on the range.


 I do see one thing about the status quo that does look reasonably
 annoying: the code expects that tranche IDs are going to stay
 relatively small.  For example, consider a module foobar that uses DSM
 + LWLocks.  It won't do at all for each backend, on first use of the
 foobar module, to do LWLockNewTrancheId() and then reuse that
 tranche_id repeatedly for each new DSM - because over a system
 lifetime of months, tranche IDs could grow into the millions, causing
 LWLockTrancheArray to get really big (and eventually repalloc will
 fail).  Rather, the programmer needs to ensure that
 LWLockNewTrancheId() gets called *once per postmaster lifetime*,
 perhaps by allocating a chunk of permanent shared memory and using
 that to store the tranche_id that should be used each time an
 individual backend fires up a DSM.  Considering that one of the goals
 of dynamic shared memory is to allow modules to be loaded after
 postmaster startup and still be able to do useful stuff, that's kind
 of obnoxious.  I don't have a good idea what to do about it, though;
 making LWLockTrancheArray anything other than a flat array looks
 likely to slow down --enable-dtrace builds unacceptably.

Just my rough idea. Doesn't it make sense to have an offset from the
head of DSM segment that contains the lwlock, instead of the identifier
form, to indicate a cstring datum? It does not prevent to allocate it
later; after the postmaster starting up, and here is no problem if number
of dynamic lwlocks grows millions or more.
If lwlock has a tranche_offset, not tranche_id, all it needs to do is:
1. find out either of DSM segment or regular SHM segment that contains
the supplied lwlock.
2. Calculate mapped_address + tranche_offset; that is the local location
where cstring form is put.

Probably, it needs a common utility routine on dsm.c to find out
a particular DSM segment that contains the supplied address.
(Inverse function towards dsm_segment_address)

How about my ideas?

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic shared memory and locks

2014-02-07 Thread Robert Haas

On Tue, Jan 21, 2014 at 11:37 AM, Robert Haas robertmh...@gmail.com wrote:
 One idea I just had is to improve the dsm_toc module so that it can
 optionally set up a tranche of lwlocks for you, and provide some
 analogues of RequestAddinLWLocks and LWLockAssign for that case.  That
 would probably make this quite a bit simpler to use, at least for
 people using it with dynamic shared memory.  But I think that's a
 separate patch.

I played with this a bit today and it doesn't actually seem to
simplify things very much.  The backend that creates the DSM needs to
do this:

lwlocks = shm_toc_allocate(toc, sizeof(LWLockPadded) * nlwlocks);
for (i = 0; i  nlwlocks; ++i)
LWLockInitialize(lwlocks[i].lock, tranche_id);

Since that's all of three lines, encapsulating it doesn't look all
that helpful.  Then each backend needs to do something like this:

static LWLockTranche mytranche;
mytranche.name = some descriptive module name;
mytranche.array_base = lwlocks;
mytranche.array_stride = sizeof(LWLockPadded);
LWLockRegisterTranche(tranche_id, mytranche);

That's not a lot of code either, and there's no obvious way to reduce
it much by hooking into shm_toc.

I thought maybe we needed some cleanup when the dynamic shared memory
segment is unmapped, but looks like we really don't.  One can do
something like LWLockRegisterTranche(tranche_id, NULL) for debugging
clarity, but it isn't strictly needed; one can release all lwlocks
from the tranche or assert that none are held, but that should really
only be a problem if the user does something like
LWLockAcquire(lock_in_the_segment); dsm_detach(seg), because error
recovery starts by releasing *all* lwlocks.  And if the user does it
explicitly, I'm kinda OK with that just seg faulting.  After all, the
user could equally well have done dsm_detach(seg);
LWLockAcquire(lock_in_the_segment) and there's no way at all to
cross-check for that sort of mistake.

I do see one thing about the status quo that does look reasonably
annoying: the code expects that tranche IDs are going to stay
relatively small.  For example, consider a module foobar that uses DSM
+ LWLocks.  It won't do at all for each backend, on first use of the
foobar module, to do LWLockNewTrancheId() and then reuse that
tranche_id repeatedly for each new DSM - because over a system
lifetime of months, tranche IDs could grow into the millions, causing
LWLockTrancheArray to get really big (and eventually repalloc will
fail).  Rather, the programmer needs to ensure that
LWLockNewTrancheId() gets called *once per postmaster lifetime*,
perhaps by allocating a chunk of permanent shared memory and using
that to store the tranche_id that should be used each time an
individual backend fires up a DSM.  Considering that one of the goals
of dynamic shared memory is to allow modules to be loaded after
postmaster startup and still be able to do useful stuff, that's kind
of obnoxious.  I don't have a good idea what to do about it, though;
making LWLockTrancheArray anything other than a flat array looks
likely to slow down --enable-dtrace builds unacceptably.

Bottom line, I guess, is that I don't currently have any real idea how
to make this any better than it already is.  Maybe that will become
more clear as this facility (hopefully) acquires some users.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic shared memory and locks

2014-01-27 Thread Robert Haas

On Thu, Jan 23, 2014 at 11:10 AM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Jan 22, 2014 at 12:42 PM, Andres Freund and...@2ndquadrant.com 
 wrote:
 On 2014-01-22 12:40:34 -0500, Robert Haas wrote:
 On Wed, Jan 22, 2014 at 12:11 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Andres Freund and...@2ndquadrant.com writes:
  Shouldn't we introduce a typedef LWLock* LWLockid; or something to avoid
  breaking external code using lwlocks?
 
  +1, in fact there's probably no reason to touch most *internal* code using
  that type name either.

 I thought about this but figured it was too much of a misnomer to
 refer to a pointer as an ID.  But, if we're sure we want to go that
 route, I can go revise the patch along those lines.

 I personally don't care either way for internal code as long as external
 code continues to work. There's the argument of making the commit better
 readable by having less noise and less divergence in the branches and
 there's your argument of that being less clear.

 OK, well then, if no one objects violently, I'll stick my current
 approach of getting rid of all core mentions of LWLockId in favor of
 LWLock *, but also add typedef LWLock *LWLockId with a comment that
 this is to minimize breakage of third-party code.

Hearing no objections, violent or otherwise, I've done that, made the
other adjustments suggested by Andres and KaiGai, and committed this.
Let's see what the buildfarm thinks...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic shared memory and locks

2014-01-23 Thread Kohei KaiGai

Isn't it necessary to have an interface to initialize LWLock structure being
allocated on a dynamic shared memory segment?
Even though LWLock structure is exposed at lwlock.h, we have no common
way to initialize it.

How about to have a following function?

void
InitLWLock(LWLock *lock)
{
SpinLockInit(lock-lock.mutex);
lock-lock.releaseOK = true;
lock-lock.exclusive = 0;
lock-lock.shared = 0;
lock-lock.head = NULL;
lock-lock.tail = NULL;
}

Thanks,

2014/1/22 KaiGai Kohei kai...@ak.jp.nec.com:
 (2014/01/22 1:37), Robert Haas wrote:

 On Mon, Jan 20, 2014 at 11:23 PM, KaiGai Kohei kai...@ak.jp.nec.com
 wrote:

 I briefly checked the patch. Most of lines are mechanical replacement
 from LWLockId to LWLock *, and compiler didn't claim anything with
 -Wall -Werror option.

 My concern is around LWLockTranche mechanism. Isn't it too complicated
 towards the purpose?
 For example, it may make sense to add char lockname[NAMEDATALEN]; at
 the tail of LWLock structure if LOCK_DEBUG is enabled. Probably, it
 also adds an argument of LWLockAssign() to gives the human readable
 name. Is the additional 64bytes (= NAMEDATALEN) per lock too large
 for recent hardware?


 Well, we'd need it when either LOCK_DEBUG was defined or when
 LWLOCK_STATS was defined or when --enable-dtrace was used, and while
 the first two are probably rarely enough used that that would be OK, I
 think the third case is probably fairly common, and I don't think we
 want to have such a potentially performance-critical difference
 between builds with and without dtrace.

 Also... yeah, it's a lot of memory.  If we add an additional 64 bytes
 to the structure, then we're looking at 96 bytes per lwlock instead of
 32, after padding out to a 32-byte boundary to avoid crossing cache
 lines.  We need 2 lwlocks per buffer, so that's an additional 128
 bytes per 8kB buffer.  For shared_buffers = 8GB, that's an extra 128MB
 for storing lwlocks.  I'm not willing to assume nobody cares about
 that.  And while I agree that this is a bit complex, I don't think
 it's really as bad as all that.  We've gotten by for a long time
 without people being able to put lwlocks in other parts of memory, and
 most extension authors have gotten by with LWLockAssign() just fine
 and can continue doing things that way; only users with particularly
 sophisticated needs should bother to worry about the tranche stuff.

 Hmm... 1/64 of main memory (if large buffer system) might not be
 an ignorable memory consumption.


 One idea I just had is to improve the dsm_toc module so that it can
 optionally set up a tranche of lwlocks for you, and provide some
 analogues of RequestAddinLWLocks and LWLockAssign for that case.  That
 would probably make this quite a bit simpler to use, at least for
 people using it with dynamic shared memory.  But I think that's a
 separate patch.

 I agree with this idea. It seems to me quite natural to keep properties
 of objects held on shared memory (LWLock) on shared memory.
 Also, a LWLock once assigned shall not be never released. So, I think
 dsm_toc can provide a well suitable storage for them.


 Thanks,
 --
 OSS Promotion Center / The PG-Strom Project
 KaiGai Kohei kai...@ak.jp.nec.com


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers



-- 
KaiGai Kohei kai...@kaigai.gr.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic shared memory and locks

2014-01-23 Thread Andres Freund

On 2014-01-23 23:03:40 +0900, Kohei KaiGai wrote:
 Isn't it necessary to have an interface to initialize LWLock structure being
 allocated on a dynamic shared memory segment?
 Even though LWLock structure is exposed at lwlock.h, we have no common
 way to initialize it.

There's LWLockInitialize() or similar in the patch afair.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic shared memory and locks

2014-01-23 Thread Kohei KaiGai

2014/1/23 Andres Freund and...@2ndquadrant.com:
 On 2014-01-23 23:03:40 +0900, Kohei KaiGai wrote:
 Isn't it necessary to have an interface to initialize LWLock structure being
 allocated on a dynamic shared memory segment?
 Even though LWLock structure is exposed at lwlock.h, we have no common
 way to initialize it.

 There's LWLockInitialize() or similar in the patch afair.

Ahh, I oversight the code around tranche-id. Sorry for this noise.

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic shared memory and locks

2014-01-23 Thread Robert Haas

On Wed, Jan 22, 2014 at 12:42 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-01-22 12:40:34 -0500, Robert Haas wrote:
 On Wed, Jan 22, 2014 at 12:11 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Andres Freund and...@2ndquadrant.com writes:
  Shouldn't we introduce a typedef LWLock* LWLockid; or something to avoid
  breaking external code using lwlocks?
 
  +1, in fact there's probably no reason to touch most *internal* code using
  that type name either.

 I thought about this but figured it was too much of a misnomer to
 refer to a pointer as an ID.  But, if we're sure we want to go that
 route, I can go revise the patch along those lines.

 I personally don't care either way for internal code as long as external
 code continues to work. There's the argument of making the commit better
 readable by having less noise and less divergence in the branches and
 there's your argument of that being less clear.

OK, well then, if no one objects violently, I'll stick my current
approach of getting rid of all core mentions of LWLockId in favor of
LWLock *, but also add typedef LWLock *LWLockId with a comment that
this is to minimize breakage of third-party code.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Dynamic Shared Memory stuff

2014-01-22 Thread Robert Haas

On Tue, Jan 21, 2014 at 2:58 PM, Noah Misch n...@leadboat.com wrote:
 What do people prefer?

 I recommend performing cleanup on the control segment named in PGShmemHeader
 just before shmdt() in PGSharedMemoryCreate().  No new ERROR or WARNING sites
 are necessary.  Have dsm_postmaster_startup() continue to perform a cleanup on
 the control segment named in the state file.

I think I'm on board with the first two sentences of this, but after
Fujii Masao's email yesterday, I can't help thinking that what you
propose the third sentence is a bad idea.  He cloned a master to
create a standby server on the same machine, and the standby startup
ate the master's dynamic shared memory.  We could teach pg_basebackup
not to copy the state file, but that wouldn't help people who take
base backups using the file system copy method, which is a lot of
people.

 5. Give up on this approach.  We could keep what we have now, or make
 the DSM control segment land at a well-known address as we do for the
 main segment.

 How would having the DSM control segment at a well-known address affect the
 problem at hand?  Did you mean a well-known dsm_handle?

Yeah.  So the idea is that we'd always use a dsm_handle of 100 +
(100 * port) or something like that, and then search forward until we
find a dsm_handle that works.  This is basically the same algorithm
we're using today for the main shared memory segment, but with a large
additive constant so that they don't collide with each other.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Dynamic Shared Memory stuff

2014-01-22 Thread Noah Misch

On Wed, Jan 22, 2014 at 09:32:09AM -0500, Robert Haas wrote:
 On Tue, Jan 21, 2014 at 2:58 PM, Noah Misch n...@leadboat.com wrote:
  What do people prefer?
 
  I recommend performing cleanup on the control segment named in PGShmemHeader
  just before shmdt() in PGSharedMemoryCreate().  No new ERROR or WARNING 
  sites
  are necessary.  Have dsm_postmaster_startup() continue to perform a cleanup 
  on
  the control segment named in the state file.
 
 I think I'm on board with the first two sentences of this, but after
 Fujii Masao's email yesterday, I can't help thinking that what you
 propose the third sentence is a bad idea.  He cloned a master to
 create a standby server on the same machine, and the standby startup
 ate the master's dynamic shared memory.  We could teach pg_basebackup
 not to copy the state file, but that wouldn't help people who take
 base backups using the file system copy method, which is a lot of
 people.

I agree we should not rely on folks learning to omit the state file from base
backups.  Abandoning the state file is one way to resolve that, and the
reasons I outlined for preferring to keep it were not overriding concerns.  We
could instead store a postmaster PID in dsm_control_header and only clean if
that PID is dead.  We could make DSM startup aware of whether we're using a
backup label, but that would be awkward thanks to StartupXLOG() happening a
good bit later.  Yeah, abandoning the state file is looking attractive.

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic shared memory and locks

2014-01-22 Thread Andres Freund

On 2014-01-10 13:11:32 -0500, Robert Haas wrote:
 OK, I've implemented this: here's what I believe to be a complete
 patch, based on the previous lwlock-pointers.patch but now handling
 LOCK_DEBUG and TRACE_LWLOCKS and dtrace and a bunch of other loose
 ends.  I think this should be adequate for allowing lwlocks to be
 stored elsewhere in the main shared memory segment as well as in
 dynamic shared memory segments.
 
 Thoughts?

Shouldn't we introduce a typedef LWLock* LWLockid; or something to avoid
breaking external code using lwlocks? Requiring code that has worked for
several versions to sprinkle version specific ifdefs seems unneccessary
here.

I see the comments still claim
+ * LWLock is between 16 and 32 bytes on all known platforms, so these two
+ * cases are sufficient.
+ */
+#define LWLOCK_PADDED_SIZE (sizeof(LWLock) = 16 ? 16 : 32)

I don't think that's true anymore with the addition of the tranche
id. The new minimum size seems to be 20 on 32bit platforms. That could
be fixed by making the tranche id a uint8, but I don't think that's
necessary, so just removing the support for  32 seems sufficient.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic shared memory and locks

2014-01-22 Thread Tom Lane

Andres Freund and...@2ndquadrant.com writes:
 Shouldn't we introduce a typedef LWLock* LWLockid; or something to avoid
 breaking external code using lwlocks?

+1, in fact there's probably no reason to touch most *internal* code using
that type name either.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic shared memory and locks

2014-01-22 Thread Robert Haas

On Wed, Jan 22, 2014 at 12:11 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Andres Freund and...@2ndquadrant.com writes:
 Shouldn't we introduce a typedef LWLock* LWLockid; or something to avoid
 breaking external code using lwlocks?

 +1, in fact there's probably no reason to touch most *internal* code using
 that type name either.

I thought about this but figured it was too much of a misnomer to
refer to a pointer as an ID.  But, if we're sure we want to go that
route, I can go revise the patch along those lines.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic shared memory and locks

2014-01-22 Thread Andres Freund

On 2014-01-22 12:40:34 -0500, Robert Haas wrote:
 On Wed, Jan 22, 2014 at 12:11 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Andres Freund and...@2ndquadrant.com writes:
  Shouldn't we introduce a typedef LWLock* LWLockid; or something to avoid
  breaking external code using lwlocks?
 
  +1, in fact there's probably no reason to touch most *internal* code using
  that type name either.
 
 I thought about this but figured it was too much of a misnomer to
 refer to a pointer as an ID.  But, if we're sure we want to go that
 route, I can go revise the patch along those lines.

I personally don't care either way for internal code as long as external
code continues to work. There's the argument of making the commit better
readable by having less noise and less divergence in the branches and
there's your argument of that being less clear.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic shared memory and locks

2014-01-21 Thread Robert Haas

On Mon, Jan 20, 2014 at 11:23 PM, KaiGai Kohei kai...@ak.jp.nec.com wrote:
 I briefly checked the patch. Most of lines are mechanical replacement
 from LWLockId to LWLock *, and compiler didn't claim anything with
 -Wall -Werror option.

 My concern is around LWLockTranche mechanism. Isn't it too complicated
 towards the purpose?
 For example, it may make sense to add char lockname[NAMEDATALEN]; at
 the tail of LWLock structure if LOCK_DEBUG is enabled. Probably, it
 also adds an argument of LWLockAssign() to gives the human readable
 name. Is the additional 64bytes (= NAMEDATALEN) per lock too large
 for recent hardware?

Well, we'd need it when either LOCK_DEBUG was defined or when
LWLOCK_STATS was defined or when --enable-dtrace was used, and while
the first two are probably rarely enough used that that would be OK, I
think the third case is probably fairly common, and I don't think we
want to have such a potentially performance-critical difference
between builds with and without dtrace.

Also... yeah, it's a lot of memory.  If we add an additional 64 bytes
to the structure, then we're looking at 96 bytes per lwlock instead of
32, after padding out to a 32-byte boundary to avoid crossing cache
lines.  We need 2 lwlocks per buffer, so that's an additional 128
bytes per 8kB buffer.  For shared_buffers = 8GB, that's an extra 128MB
for storing lwlocks.  I'm not willing to assume nobody cares about
that.  And while I agree that this is a bit complex, I don't think
it's really as bad as all that.  We've gotten by for a long time
without people being able to put lwlocks in other parts of memory, and
most extension authors have gotten by with LWLockAssign() just fine
and can continue doing things that way; only users with particularly
sophisticated needs should bother to worry about the tranche stuff.

One idea I just had is to improve the dsm_toc module so that it can
optionally set up a tranche of lwlocks for you, and provide some
analogues of RequestAddinLWLocks and LWLockAssign for that case.  That
would probably make this quite a bit simpler to use, at least for
people using it with dynamic shared memory.  But I think that's a
separate patch.

 Below is minor comments.

 It seems to me this newer form increased direct reference to
 the MainLWLockArray. Is it really graceful?
 My recommendation is to define a macro that wraps the reference to
 MainLWLockArray.

 For example:
   #define PartitionLock(i) \
 (MainLWLockArray[LOCK_MANAGER_LWLOCK_OFFSET + (i)].lock)

Yeah, that's probably a good idea.

 This chunk is enclosed by #ifdef EXEC_BACKEND, the legacy LWLockArray
 was not removed.

Oops, will fix.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Dynamic Shared Memory stuff

2014-01-21 Thread Noah Misch

On Wed, Dec 18, 2013 at 12:21:08PM -0500, Robert Haas wrote:
 On Tue, Dec 10, 2013 at 6:26 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  The larger point is that such a shutdown process has never in the history
  of Postgres been successful at removing shared-memory (or semaphore)
  resources.  I do not feel a need to put a higher recoverability standard
  onto the DSM code than what we've had for fifteen years with SysV shmem.
 
  But, by the same token, it shouldn't be any *less* recoverable.  In this
  context that means that starting a new postmaster should recover the
  resources, at least 90% of the time (100% if we still have the old
  postmaster lockfile).  I think the idea of keeping enough info in the
  SysV segment to permit recovery of DSM resources is a good one.  Then,
  any case where the existing logic is able to free the old SysV segment
  is guaranteed to also work for DSM.
 
 So I'm taking a look at this.  There doesn't appear to be anything
 intrinsically intractable here, but it seems important to time the
 order of operations so as to minimize the chances that things fail in
 awkward places.  The point where we remove the old shared memory
 segment from the previous postmaster invocation is here:
 
 /*
  * The segment appears to be from a dead Postgres process, or from a
  * previous cycle of life in this same process.  Zap it, if possible.
  * This probably shouldn't fail, but if it does, assume the segment
  * belongs to someone else after all, and continue quietly.
  */
 shmdt(memAddress);
 if (shmctl(shmid, IPC_RMID, NULL)  0)
 continue;
 
 My first thought was to remember the control segment ID from the
 header just before the shmdt() there, and then later when the DSM
 module is starting, do cleanup.  But that doesn't seem like a good
 idea, because then if startup fails after we remove the old shared
 memory segment and before we get to the DSM initialization code, we'll
 have lost the information on what control segment needs to be cleaned
 up.  A subsequent startup attempt won't see the old shm again, because
 it's already gone.  I'm fairly sure that this would be a net reduction
 in reliability vs. the status quo.
 
 So now what I'm thinking is that we ought to actually perform the DSM
 cleanup before detaching the old segment and trying to remove it.
 That shouldn't fail, but if it does, or if we get killed before
 completing it, the next run will hopefully find the same old shm and
 finish the cleanup.  But that kind of flies in the face of the comment
 above: if we perform DSM cleanup and then discover that the segment
 wasn't ours after all, that means we just stomped all over somebody
 else's stuff.  That's not too good.  But trying to remove the segment
 first and then perform the cleanup creates a window where, if we get
 killed, the next restart will have lost information about how to
 finish cleaning up.  So it seems that some kind of logic tweak is
 needed here, but I'm not sure exactly what.  As I see it, the options
 are:
 
 1. Make failure to remove the shared memory segment we thought was
 ours an error.  This will definitely show up any problems, but only
 after we've burned down some other processes's dynamic shared memory
 segments.  The most likely outcome is creating failure-to-start
 problems that don't exist today.
 
 2. Make it a warning.  We'll still burn down somebody else's DSMs, but
 at least we'll still start ourselves.  Sadly, WARNING: You have just
 done a bad thing.  It's too late to fix it.  Sorry! is not very
 appealing.

It has long been the responsibility of PGSharedMemoryCreate() to determine
that a segment is unimportant before calling IPC_RMID.  The success or failure
of IPC_RMID is an unreliable guide to the correctness of that determination.
IPC_RMID will succeed on an important segment owned by the same UID, and it
will fail if some other process removed the segment after our shmat().  Such a
failure does not impugn our having requested DSM cleanup on the basis of the
PGShmemHeader we did read, so an apologetic WARNING would be wrong.

 3. Remove the old shared memory segment first, then perform the
 cleanup immediately afterwards.  If we get killed before completing
 the cleanup, we'll leak the un-cleaned-up stuff.  Decide that's OK and
 move on.

The period in which process exit would leak segments is narrow enough that
this seems fine.  I see no net advantage over performing the cleanup before
shmdt(), though.

 4. Adopt some sort of belt-and-suspenders approach, keeping the state
 file we have now and backstopping it with this mechanism, so that we
 really only need this to work when $PGDATA has been blown away and
 recreated.  This seems pretty inelegant, and I'm not sure who it'll
 benefit other than those (few?) people who kill -9 the postmaster (or
 it segfaults or otherwise dies without running the code to remove
 shared memory segments) and then remove

Re: [HACKERS] dynamic shared memory and locks

2014-01-21 Thread KaiGai Kohei


(2014/01/22 1:37), Robert Haas wrote:

On Mon, Jan 20, 2014 at 11:23 PM, KaiGai Kohei kai...@ak.jp.nec.com wrote:

I briefly checked the patch. Most of lines are mechanical replacement
from LWLockId to LWLock *, and compiler didn't claim anything with
-Wall -Werror option.

My concern is around LWLockTranche mechanism. Isn't it too complicated
towards the purpose?
For example, it may make sense to add char lockname[NAMEDATALEN]; at
the tail of LWLock structure if LOCK_DEBUG is enabled. Probably, it
also adds an argument of LWLockAssign() to gives the human readable
name. Is the additional 64bytes (= NAMEDATALEN) per lock too large
for recent hardware?


Well, we'd need it when either LOCK_DEBUG was defined or when
LWLOCK_STATS was defined or when --enable-dtrace was used, and while
the first two are probably rarely enough used that that would be OK, I
think the third case is probably fairly common, and I don't think we
want to have such a potentially performance-critical difference
between builds with and without dtrace.

Also... yeah, it's a lot of memory.  If we add an additional 64 bytes
to the structure, then we're looking at 96 bytes per lwlock instead of
32, after padding out to a 32-byte boundary to avoid crossing cache
lines.  We need 2 lwlocks per buffer, so that's an additional 128
bytes per 8kB buffer.  For shared_buffers = 8GB, that's an extra 128MB
for storing lwlocks.  I'm not willing to assume nobody cares about
that.  And while I agree that this is a bit complex, I don't think
it's really as bad as all that.  We've gotten by for a long time
without people being able to put lwlocks in other parts of memory, and
most extension authors have gotten by with LWLockAssign() just fine
and can continue doing things that way; only users with particularly
sophisticated needs should bother to worry about the tranche stuff.


Hmm... 1/64 of main memory (if large buffer system) might not be
an ignorable memory consumption.


One idea I just had is to improve the dsm_toc module so that it can
optionally set up a tranche of lwlocks for you, and provide some
analogues of RequestAddinLWLocks and LWLockAssign for that case.  That
would probably make this quite a bit simpler to use, at least for
people using it with dynamic shared memory.  But I think that's a
separate patch.


I agree with this idea. It seems to me quite natural to keep properties
of objects held on shared memory (LWLock) on shared memory.
Also, a LWLock once assigned shall not be never released. So, I think
dsm_toc can provide a well suitable storage for them.

Thanks,
--
OSS Promotion Center / The PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic shared memory and locks

2014-01-20 Thread KaiGai Kohei


(2014/01/11 3:11), Robert Haas wrote:

On Mon, Jan 6, 2014 at 5:50 PM, Robert Haas robertmh...@gmail.com wrote:

This is only part of the solution, of course: a complete solution will
involve making the hash table key something other than the lock ID.
What I'm thinking we can do is making the lock ID consist of two
unsigned 32-bit integers.  One of these will be stored in the lwlock
itself, which if my calculations are correct won't increase the size
of LWLockPadded on any common platforms (a 64-bit integer would).
Let's call this the tranch id.  The other will be derived from the
LWLock address.  Let's call this the instance ID.  We'll keep a
table of tranch IDs, which will be assigned consecutively starting
with 0.  We'll keep an array of metadata for tranches, indexed by
tranch ID, and each entry will have three associated pieces of
information: an array base, a stride length, and a printable name.
When we need to identify an lwlock in the log or to dtrace, we'll
fetch the tranch ID from the lwlock itself and use that to index into
the tranch metadata array.  We'll then take the address of the lwlock,
subtract the array base address for the tranch, and divide by the
stride length; the result is the instance ID.  When reporting the
user, we can report either the tranch ID directly or the associated
name for that tranch; in either case, we'll also report the instance
ID.

So initially we'll probably just have tranch 0: the main LWLock array.
  If we move buffer content and I/O locks to the buffer headers, we'll
define tranch 1 and tranch 2 with the same base address: the start of
the buffer descriptor array, and the same stride length, the size of a
buffer descriptor.  One will have the associated name buffer content
lock and the other buffer I/O lock.  If we want, we can define
split the main LWLock array into several tranches so that we can more
easily identify lock manager locks, predicate lock manager locks, and
buffer mapping locks.


OK, I've implemented this: here's what I believe to be a complete
patch, based on the previous lwlock-pointers.patch but now handling
LOCK_DEBUG and TRACE_LWLOCKS and dtrace and a bunch of other loose
ends.  I think this should be adequate for allowing lwlocks to be
stored elsewhere in the main shared memory segment as well as in
dynamic shared memory segments.

Thoughts?


I briefly checked the patch. Most of lines are mechanical replacement
from LWLockId to LWLock *, and compiler didn't claim anything with
-Wall -Werror option.

My concern is around LWLockTranche mechanism. Isn't it too complicated
towards the purpose?
For example, it may make sense to add char lockname[NAMEDATALEN]; at
the tail of LWLock structure if LOCK_DEBUG is enabled. Probably, it
also adds an argument of LWLockAssign() to gives the human readable
name. Is the additional 64bytes (= NAMEDATALEN) per lock too large
for recent hardware?

Below is minor comments.

It seems to me this newer form increased direct reference to
the MainLWLockArray. Is it really graceful?
My recommendation is to define a macro that wraps the reference to
MainLWLockArray.

For example:
  #define PartitionLock(i) \
(MainLWLockArray[LOCK_MANAGER_LWLOCK_OFFSET + (i)].lock)

instead of the following manner.

@@ -3400,7 +3406,12 @@ GetLockStatusData(void)
 * Must grab LWLocks in partition-number order to avoid LWLock deadlock.
 */
for (i = 0; i  NUM_LOCK_PARTITIONS; i++)
-   LWLockAcquire(FirstLockMgrLock + i, LW_SHARED);
+   {
+   LWLock *partitionLock;
+
+   partitionLock = MainLWLockArray[LOCK_MANAGER_LWLOCK_OFFSET + i].lock;
+   LWLockAcquire(partitionLock, LW_SHARED);
+   }


This chunk is enclosed by #ifdef EXEC_BACKEND, the legacy LWLockArray
was not removed.

@@ -5633,6 +5630,7 @@ save_backend_variables(BackendParameters *param, Port 
*port,
param-SpinlockSemaArray = SpinlockSemaArray;
 #endif
param-LWLockArray = LWLockArray;
+   param-MainLWLockArray = MainLWLockArray;
param-ProcStructLock = ProcStructLock;
param-ProcGlobal = ProcGlobal;
param-AuxiliaryProcs = AuxiliaryProcs;

Thanks,
--
OSS Promotion Center / The PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic shared memory and locks

2014-01-07 Thread Andres Freund

On 2014-01-06 21:35:22 -0300, Alvaro Herrera wrote:
 Jim Nasby escribió:
  On 1/6/14, 2:59 PM, Robert Haas wrote:
  On Mon, Jan 6, 2014 at 3:57 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 
  The point I'm making is that no such code should get past review,
  whether it's got an obvious performance problem or not.
  
  Sure, I agree, but we all make mistakes.  It's just a judgement call
  as to how likely you think it is that someone might make this
  particular mistake, a topic upon which opinions may vary.

I don't think it's that unlikely as the previous implementation's rules
when viewed while squinting allowed nesting spinlocks. And it's a pretty
simple check.

 Maybe it makes sense to have such a check #ifdef'ed out on most builds
 to avoid extra overhead, but not having any check at all just because we
 trust the review process too much doesn't strike me as the best of
 ideas.

I don't think that check would have relevantly high performance impact
in comparison to the rest of --enable-cassert - it's a single process
local variable which is regularly accessed. It will just stay in
L1 or even registers.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic shared memory and locks

2014-01-07 Thread Robert Haas

On Tue, Jan 7, 2014 at 6:54 AM, Andres Freund and...@2ndquadrant.com wrote:
 Maybe it makes sense to have such a check #ifdef'ed out on most builds
 to avoid extra overhead, but not having any check at all just because we
 trust the review process too much doesn't strike me as the best of
 ideas.

 I don't think that check would have relevantly high performance impact
 in comparison to the rest of --enable-cassert - it's a single process
 local variable which is regularly accessed. It will just stay in
 L1 or even registers.

I tried it out on a 16-core, 64-hwthread community box, PPC.  pgbench
-S, 5-minute rules at scale factor 300 with shared_buffers=8GB:

resultsr.cassert.32.300.300:tps = 11341.627815 (including connections
establishing)
resultsr.cassert.32.300.300:tps = 11339.407976 (including connections
establishing)
resultsr.cassert.32.300.300:tps = 11321.339971 (including connections
establishing)
resultsr.cassert-spin-multiplex.32.300.300:tps = 11492.927372
(including connections establishing)
resultsr.cassert-spin-multiplex.32.300.300:tps = 11471.810937
(including connections establishing)
resultsr.cassert-spin-multiplex.32.300.300:tps = 11516.471053
(including connections establishing)

By comparison:

resultsr.master.32.300.300:tps = 197939.111998 (including connections
establishing)
resultsr.master.32.300.300:tps = 198641.337720 (including connections
establishing)
resultsr.master.32.300.300:tps = 198675.404349 (including connections
establishing)

So yeah, the overhead is negligible, if not negative.  None of the
other changes in that patch were even compiled in this case, since all
that code is protected by --disable-spinlocks.  Maybe somebody can
find another workload where the overhead is visible, but here it's
not.  On the other hand, I did discover another bit of ugliness - the
macros actually have to be defined this way:

+#define SpinLockAcquire(lock) \
+   (AssertMacro(!any_spinlock_held), any_spinlock_held = true, \
+   S_LOCK(lock))
+#define SpinLockRelease(lock) \
+   do { Assert(any_spinlock_held); any_spinlock_held = false; \
+   S_UNLOCK(lock); } while (0)

SpinLockAcquire has to be a comma-separated expression rather than a
do {} while (0) block because S_LOCK returns a value (which is used
when compiling with LWLOCK_STATS); SpinLockRelease, however, has to be
done the other way because S_UNLOCK() is defined as a do {} while (0)
block already on PPC among other architectures.

Overall I don't really care enough about this to argue strenuously for
it.  I'm not nearly so confident as Tom that no errors of this type
will creep into future code, but on the other hand, keeping
--disable-spinlocks working reliably isn't significant enough for me
to want to spend any political capital on it.  It's probably got a dim
future regardless of what we do here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic shared memory and locks

2014-01-06 Thread Heikki Linnakangas


On 01/05/2014 07:56 PM, Robert Haas wrote:

Right now, storing spinlocks in dynamic shared memory *almost* works,
but there are problems with --disable-spinlocks.  In that
configuration, we use semaphores to simulate spinlocks.  Every time
someone calls SpinLockInit(), it's going to allocate a new semaphore
which will never be returned to the operating system, so you're pretty
quickly going to run out.  There are a couple of things we could do
about this:


5. Allocate a fixed number of semaphores, and multiplex them to emulate 
any number of spinlocks. For example, allocate 100 semaphores, and use 
the address of the spinlock modulo 100 to calculate which semaphore to 
use to emulate that spinlock.


That assumes that you never hold more than one spinlock at a time, 
otherwise you can get deadlocks. I think that assumptions holds 
currently, because acquiring two spinlocks at a time would be bad on 
performance grounds anyway.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic shared memory and locks

2014-01-06 Thread Andres Freund

On 2014-01-06 10:35:59 +0200, Heikki Linnakangas wrote:
 On 01/05/2014 07:56 PM, Robert Haas wrote:
 Right now, storing spinlocks in dynamic shared memory *almost* works,
 but there are problems with --disable-spinlocks.  In that
 configuration, we use semaphores to simulate spinlocks.  Every time
 someone calls SpinLockInit(), it's going to allocate a new semaphore
 which will never be returned to the operating system, so you're pretty
 quickly going to run out.  There are a couple of things we could do
 about this:
 
 5. Allocate a fixed number of semaphores, and multiplex them to emulate any
 number of spinlocks. For example, allocate 100 semaphores, and use the
 address of the spinlock modulo 100 to calculate which semaphore to use to
 emulate that spinlock.
 
 That assumes that you never hold more than one spinlock at a time, otherwise
 you can get deadlocks. I think that assumptions holds currently, because
 acquiring two spinlocks at a time would be bad on performance grounds
 anyway.

I think that's a pretty dangerous assumption - and one which would only
break without just about anybody noticing because nobody tests
--disable-spinlock builds regularly. We could improve on that by adding
cassert checks against nested spinlocks, but that'd be a far too high
price for little benefit imo.

I am not convinced by the performance argument - there's lots of
spinlock that are rarely if ever contended. If you lock two such locks
in a consistent nested fashion there won't be a performance problem.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic shared memory and locks

2014-01-06 Thread Stephen Frost

* Robert Haas (robertmh...@gmail.com) wrote:
 Another idea is to include some identifying information in the lwlock.

That was my immediate reaction to this issue...

  For example, each lwlock could have a char *name in it, and we could
 print the name.  In theory, this could be a big step forward in terms
 of usability, because it'd spare us all needing to remember that 4 ==
 ProcArrayLock.  But it's awkward for buffer locks, of which there
 might be a great many, and we surely don't want to allocate a
 dynamically-generated string in shared memory for each one.  You could
 do a bit better by making the identifying information a string plus an
 integer, because then all the buffer locks could set the string to a
 static constant like buffer content lock and the integer to the
 buffer number, and similarly for lock manager partition locks and so
 on.  This is appealing, but would increase the size of LWLockPadded
 from 16 bytes to 32 on 64-bit platforms where slock_t is four bytes or
 less, which I'm not that excited about even though it would reduce
 cache line contention in some cases.

I'm not thrilled with this either but at the same time, it may well be
worth it.

 Yet a third idea is to try to reverse-engineer a name for a given
 lwlock from the pointer address.  If it's an offset into the main
 array, this is easy enough to do, and even if we ended up with several
 arrays (like one for bufmgr locks) it wouldn't be too hard to write
 code to figure out which array contains it and emit the appropriate
 string.  The only real problem that I see with this is that it might
 cause a performance hit.  A performance hit when running with
 trace_lwlocks or LWLOCK_STATS is not really a problem, but people
 won't like if --enable-dtrace slow things up.

This seems like an interesting idea- but this and my other thought
(having a defined array of strings) seem like they might fall over in
the face of DSMs.

 Preferences, other ideas?

My preference would generally be use more memory rather than CPU time;
so I'd vote for adding identifying info rather than having the code hunt
through arrays to try and find it.

 None of these ideas are a complete solution for LWLOCK_STATS.  In the
 other three cases noted above, we only need an identifier for the lock
 instantaneously, so that we can pass it off to the logger or dtrace
 or whatever.  But LWLOCK_STATS wants to hold on to data about the
 locks that were visited until the end of the session, and it does that
 using an array that is *indexed* by lwlockid.  I guess we could
 replace that with a hash table.  Ugh.  Any suggestions?

Yeah, that's not fun.  No good suggestions here offhand.

Thanks,

Stephen


signature.asc
Description: Digital signature

Re: [HACKERS] dynamic shared memory and locks

2014-01-06 Thread Tom Lane

Andres Freund and...@2ndquadrant.com writes:
 On 2014-01-06 10:35:59 +0200, Heikki Linnakangas wrote:
 That assumes that you never hold more than one spinlock at a time, otherwise
 you can get deadlocks. I think that assumptions holds currently, because
 acquiring two spinlocks at a time would be bad on performance grounds
 anyway.

 I think that's a pretty dangerous assumption

I think it's a fine assumption.  Code that could possibly do that should
never get within hailing distance of being committed, because you are only
supposed to have short straight-line bits of code under a spinlock.
Taking another spinlock breaks both the straight line code and the no
loops aspects of that coding rule.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic shared memory and locks

2014-01-06 Thread Tom Lane

Robert Haas robertmh...@gmail.com writes:
 I guess the question boils down to: why are we keeping
 --disable-spinlocks around?  If we're expecting that people might
 really use it for serious work, then it needs to remain and it needs
 to work with dynamic shared memory.  If we're expecting that people
 might use it while porting PostgreSQL to a new platform but only as an
 aid to get things up and running, then it needs to remain, but it's
 probably OK for dynamic shared memory not to work when this option is
 in use.  If nobody uses it at all, then we can just forget the whole
 thing.

I think we can eliminate the first of those.  Semaphores for spinlocks
were a performance disaster fifteen years ago, and the situation has
surely only gotten worse since then.  I do, however, believe that
--disable-spinlocks has some use while porting to a new platform.
(And I don't believe the argument advanced elsewhere in this thread
that everybody uses gcc; much less that gcc's atomic intrinsics are
of uniformly high quality even on oddball architectures.)

Heikki's idea has some attraction for porting work whether or not
you believe that DSM needs to work during initial porting.  By
default, PG will try to create upwards of ten thousand spinlocks
just for buffer headers.  It's likely that that will fail unless
you whack around the kernel's SysV parameters.  Being able to
constrain the number of semaphores to something sane would probably
be useful.

Having said all that, I'm not personally going to take the time to
implement it, and I don't think the DSM patch should be required to
either.  Somebody who actually needs it can go make it work.

But I think Heikki's idea points the way forward, so I'm now against
having the DSM code reject --disable-spinlocks.  I think benign
neglect is the way for now.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic shared memory and locks

2014-01-06 Thread Andres Freund

On 2014-01-06 09:59:49 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2014-01-06 10:35:59 +0200, Heikki Linnakangas wrote:
  That assumes that you never hold more than one spinlock at a time, 
  otherwise
  you can get deadlocks. I think that assumptions holds currently, because
  acquiring two spinlocks at a time would be bad on performance grounds
  anyway.
 
  I think that's a pretty dangerous assumption
 
 I think it's a fine assumption.  Code that could possibly do that should
 never get within hailing distance of being committed, because you are only
 supposed to have short straight-line bits of code under a spinlock.
 Taking another spinlock breaks both the straight line code and the no
 loops aspects of that coding rule.

I think it's fair to say that no such code should be accepted - but I
personally don't trust the review process to prevent such cases and not
all spinlock usages are obvious (e.g. LockBufferHdr). And having rules
that only cause breakage in configurations that nobody ever uses doesn't
inspire much trust in that configuration.
If we go that way, we should add an Assert preventing recursive spinlock
acquiration...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic shared memory and locks

2014-01-06 Thread Robert Haas

On Mon, Jan 6, 2014 at 9:59 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Andres Freund and...@2ndquadrant.com writes:
 On 2014-01-06 10:35:59 +0200, Heikki Linnakangas wrote:
 That assumes that you never hold more than one spinlock at a time, otherwise
 you can get deadlocks. I think that assumptions holds currently, because
 acquiring two spinlocks at a time would be bad on performance grounds
 anyway.

 I think that's a pretty dangerous assumption

 I think it's a fine assumption.  Code that could possibly do that should
 never get within hailing distance of being committed, because you are only
 supposed to have short straight-line bits of code under a spinlock.
 Taking another spinlock breaks both the straight line code and the no
 loops aspects of that coding rule.

OK, so we could do this.  But should we?  If we're going to get rid of
--disable-spinlocks for other reasons, then there's not much point in
spending the time to make it work with dynamic shared memory segments
that contain locks.

In fact, even if we *aren't* going to get rid of it, it's not 100%
clear to me that there's much point in supporting that intersection: a
big point of the point of dsm is to enable parallelism, and the point
of parallelism is to be fast, and if you want things to be fast, you
should probably have working spinlocks.  Of course, there are some
other applications for dynamic shared memory as well, so this isn't a
watertight argument, and in a quick test on my laptop, the performance
of --disable-spinlocks wasn't horrible, so this argument isn't
watertight.

I guess the question boils down to: why are we keeping
--disable-spinlocks around?  If we're expecting that people might
really use it for serious work, then it needs to remain and it needs
to work with dynamic shared memory.  If we're expecting that people
might use it while porting PostgreSQL to a new platform but only as an
aid to get things up and running, then it needs to remain, but it's
probably OK for dynamic shared memory not to work when this option is
in use.  If nobody uses it at all, then we can just forget the whole
thing.

I guess my bottom line here is that if --disable-spinlocks is
important to somebody, then I'm willing to expend some effort on
keeping it working.  But if it's just inertia, then I'd rather not
spend a lot of time on it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic shared memory and locks

2014-01-06 Thread Robert Haas

On Mon, Jan 6, 2014 at 11:22 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 I think we can eliminate the first of those.  Semaphores for spinlocks
 were a performance disaster fifteen years ago, and the situation has
 surely only gotten worse since then.  I do, however, believe that
 --disable-spinlocks has some use while porting to a new platform.
 (And I don't believe the argument advanced elsewhere in this thread
 that everybody uses gcc; much less that gcc's atomic intrinsics are
 of uniformly high quality even on oddball architectures.)

 Heikki's idea has some attraction for porting work whether or not
 you believe that DSM needs to work during initial porting.  By
 default, PG will try to create upwards of ten thousand spinlocks
 just for buffer headers.  It's likely that that will fail unless
 you whack around the kernel's SysV parameters.  Being able to
 constrain the number of semaphores to something sane would probably
 be useful.

 Having said all that, I'm not personally going to take the time to
 implement it, and I don't think the DSM patch should be required to
 either.  Somebody who actually needs it can go make it work.

Well, I took a look at this and it turns out not to be very hard, so
here's a patch.  Currently, we allocate 3 semaphore per shared buffer
and a bunch of others, but the 3 per shared buffer dominates, so you
end up with ~49k spinlocks for the default of 128MB shared_buffers.  I
chose to peg the number of semaphores at 1024, which is quite small
compared to the current allocation, but the number of spinlock
allocations that can be in progress at any given time is limited by
the number of running backends.  Even allowing for the birthday
paradox, that should be enough to run at least a few dozen backends
without suffering serious problems due to the multiplexing -
especially because in real workloads, contention is usually
concentrated around a small number of spinlocks that are unlikely to
all be mapped to the same underlying semaphore.

I'm happy enough with this way forward.  Objections?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 048a189..7eae2a1 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -471,6 +471,9 @@ typedef struct
 	slock_t*ShmemLock;
 	VariableCache ShmemVariableCache;
 	Backend*ShmemBackendArray;
+#ifndef HAVE_SPINLOCKS
+	PGSemaphore	SpinlockSemaArray;
+#endif
 	LWLock	   *LWLockArray;
 	slock_t*ProcStructLock;
 	PROC_HDR   *ProcGlobal;
@@ -5626,6 +5629,9 @@ save_backend_variables(BackendParameters *param, Port *port,
 	param-ShmemVariableCache = ShmemVariableCache;
 	param-ShmemBackendArray = ShmemBackendArray;
 
+#ifndef HAVE_SPINLOCKS
+	param-SpinlockSemaArray = SpinlockSemaArray;
+#endif
 	param-LWLockArray = LWLockArray;
 	param-ProcStructLock = ProcStructLock;
 	param-ProcGlobal = ProcGlobal;
@@ -5854,6 +5860,9 @@ restore_backend_variables(BackendParameters *param, Port *port)
 	ShmemVariableCache = param-ShmemVariableCache;
 	ShmemBackendArray = param-ShmemBackendArray;
 
+#ifndef HAVE_SPINLOCKS
+	SpinlockSemaArray = param-SpinlockSemaArray;
+#endif
 	LWLockArray = param-LWLockArray;
 	ProcStructLock = param-ProcStructLock;
 	ProcGlobal = param-ProcGlobal;
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 040c7aa..ad663ec 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -105,6 +105,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 		 * need to be so careful during the actual allocation phase.
 		 */
 		size = 10;
+		size = add_size(size, SpinlockSemaSize());
 		size = add_size(size, hash_estimate_size(SHMEM_INDEX_SIZE,
  sizeof(ShmemIndexEnt)));
 		size = add_size(size, BufferShmemSize());
diff --git a/src/backend/storage/ipc/shmem.c b/src/backend/storage/ipc/shmem.c
index 18ba426..4efd0fb 100644
--- a/src/backend/storage/ipc/shmem.c
+++ b/src/backend/storage/ipc/shmem.c
@@ -116,9 +116,24 @@ InitShmemAllocation(void)
 	Assert(shmhdr != NULL);
 
 	/*
-	 * Initialize the spinlock used by ShmemAlloc.	We have to do the space
-	 * allocation the hard way, since obviously ShmemAlloc can't be called
-	 * yet.
+	 * If spinlocks are disabled, initialize emulation layer.  We have to do
+	 * the space allocation the hard way, since obviously ShmemAlloc can't be
+	 * called yet.
+	 */
+#ifndef HAVE_SPINLOCKS
+	{
+		PGSemaphore spinsemas;
+
+		spinsemas = (PGSemaphore) (((char *) shmhdr) + shmhdr-freeoffset);
+		shmhdr-freeoffset += MAXALIGN(SpinlockSemaSize());
+		SpinlockSemaInit(spinsemas);
+		Assert(shmhdr-freeoffset = shmhdr-totalsize);
+	}
+#endif
+
+	/*
+	 * Initialize the spinlock used by ShmemAlloc; we have to do this the hard
+	 * way, too, for the same reasons as above.
 	 */
 	ShmemLock = (slock_t *) (((char *) shmhdr) + shmhdr-freeoffset);
 	shmhdr-freeoffset +=

Re: [HACKERS] dynamic shared memory and locks

2014-01-06 Thread Tom Lane

Andres Freund and...@2ndquadrant.com writes:
 On 2014-01-05 14:06:52 -0500, Tom Lane wrote:
 I seem to recall that there was some good reason for keeping all the
 LWLocks in an array, back when the facility was first designed.
 I'm too lazy to research the point right now, but you might want to
 go back and look at the archives around when lwlock.c was written.

 Your proposal is at
 http://www.postgresql.org/message-id/1054.1001520...@sss.pgh.pa.us -
 afaics there hasn't been much discussion about implementation details at
 that level.

Yeah, there wasn't :-(.  On reflection I believe that my reason for
building it like this was partially to reduce the number of pointer
variables needed, and partially to avoid exposing the contents of the
LWLock struct type to the rest of the backend.

Robert's idea of continuing to keep the fixed LWLocks in an array
would solve the first problem, but I don't think there's any way to
avoid exposing the struct type if we want to embed the things into
other structs, or even just rely on array indexing.

OTOH, the LWLock mechanism has been stable for long enough now that
we can probably suppose this struct is no more subject to churn than
any other widely-known one, so maybe that consideration is no longer
significant.

Another nice benefit of switching to separate allocations that we could
get rid of the horrid kluge that lwlock.h is the place that defines
NUM_BUFFER_PARTITIONS and some other constants that don't belong there.

So on the whole I'm in favor of this, as long as we can avoid any
notational churn at most call sites.

Should we get rid of RequestAddinLWLocks() and LWLockAssign(), in
favor of telling extensions to just allocate some space and do
LWLockInit() on it?

BTW, NumLWLocks() also becomes a big problem if lwlock creation
gets decentralized.  This is only used by SpinlockSemas() ...
so maybe we need to use Heikki's modulo idea to get rid of the
need for that to know the number of LWLocks, independently of DSM.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic shared memory and locks

2014-01-06 Thread Tom Lane

Robert Haas robertmh...@gmail.com writes:
 Well, I took a look at this and it turns out not to be very hard, so
 here's a patch.  Currently, we allocate 3 semaphore per shared buffer
 and a bunch of others, but the 3 per shared buffer dominates, so you
 end up with ~49k spinlocks for the default of 128MB shared_buffers.  I
 chose to peg the number of semaphores at 1024, which is quite small
 compared to the current allocation, but the number of spinlock
 allocations that can be in progress at any given time is limited by
 the number of running backends.  Even allowing for the birthday
 paradox, that should be enough to run at least a few dozen backends
 without suffering serious problems due to the multiplexing -
 especially because in real workloads, contention is usually
 concentrated around a small number of spinlocks that are unlikely to
 all be mapped to the same underlying semaphore.

 I'm happy enough with this way forward.  Objections?

-1 for the any_spinlock_held business (useless overhead IMO, as it doesn't
have anything whatsoever to do with enforcing the actual coding rule).
And I'd suggest defining NUM_SPINLOCK_SEMAPHORES in pg_config_manual.h,
and maybe dropping SpinlockSemas() altogether in favor of just referencing
the constant.  Otherwise this seems reasonable.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic shared memory and locks

2014-01-06 Thread Robert Haas

On Mon, Jan 6, 2014 at 2:48 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 Well, I took a look at this and it turns out not to be very hard, so
 here's a patch.  Currently, we allocate 3 semaphore per shared buffer
 and a bunch of others, but the 3 per shared buffer dominates, so you
 end up with ~49k spinlocks for the default of 128MB shared_buffers.  I
 chose to peg the number of semaphores at 1024, which is quite small
 compared to the current allocation, but the number of spinlock
 allocations that can be in progress at any given time is limited by
 the number of running backends.  Even allowing for the birthday
 paradox, that should be enough to run at least a few dozen backends
 without suffering serious problems due to the multiplexing -
 especially because in real workloads, contention is usually
 concentrated around a small number of spinlocks that are unlikely to
 all be mapped to the same underlying semaphore.

 I'm happy enough with this way forward.  Objections?

 -1 for the any_spinlock_held business (useless overhead IMO, as it doesn't
 have anything whatsoever to do with enforcing the actual coding rule).

Hmm.  I thought that was a pretty well-aimed bullet myself; why do you
think that it isn't?  I don't particularly mind ripping it out, but it
seemed like a good automated test to me.

 And I'd suggest defining NUM_SPINLOCK_SEMAPHORES in pg_config_manual.h,
 and maybe dropping SpinlockSemas() altogether in favor of just referencing
 the constant.  Otherwise this seems reasonable.

As far as pg_config_manual.h is concerned, is this the sort of thing
you have in mind?

#ifndef HAVE_SPINLOCKS
#define NUM_SPINLOCK_SEMAPHORES 1024
#endif

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic shared memory and locks

2014-01-06 Thread Robert Haas

On Mon, Jan 6, 2014 at 1:55 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 OTOH, the LWLock mechanism has been stable for long enough now that
 we can probably suppose this struct is no more subject to churn than
 any other widely-known one, so maybe that consideration is no longer
 significant.

On the whole, I'd say it's been more stable than most.  But even if we
do decide to change it, I'm not sure that really matters very much.
We whack widely-used data structures around fairly regularly, and I
haven't observed that to cause many problems.  Just as one concrete
example, PGPROC changes pretty regularly, and I'm not aware of much
code that's been broken by that.

 Should we get rid of RequestAddinLWLocks() and LWLockAssign(), in
 favor of telling extensions to just allocate some space and do
 LWLockInit() on it?

I don't really see any reason to deprecate that mechanism - it'll just
make it harder for people who want to write code that works on
multiple PostgreSQL releases to do so easily, and it's my belief that
there's a decent amount of third-party code that uses those APIs.
EnterpriseDB has some, for example.  Broadly, I don't see any problem
with presuming that most code that uses lwlocks will be happy to keep
those lwlocks in the main array while allowing them to be stored
elsewhere in those cases where that's not convenient.  Perhaps in the
long run that won't be true, but then again perhaps it will.  Either
way, I don't see a real reason to rush into a change in policy.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic shared memory and locks

2014-01-06 Thread Tom Lane

Robert Haas robertmh...@gmail.com writes:
 On Mon, Jan 6, 2014 at 2:48 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 -1 for the any_spinlock_held business (useless overhead IMO, as it doesn't
 have anything whatsoever to do with enforcing the actual coding rule).

 Hmm.  I thought that was a pretty well-aimed bullet myself; why do you
 think that it isn't?  I don't particularly mind ripping it out, but it
 seemed like a good automated test to me.

The coding rule is only short straight-line code while holding a
spinlock.  That could be violated in any number of nasty ways without
trying to take another spinlock.  And since the whole point of the rule is
that spinlock-holding code segments should be quick, adding more overhead
to them doesn't seem very nice, even if it is only done in Assert builds.

I agree it'd be nicer if we had some better way than mere manual
inspection to enforce proper use of spinlocks; but this change doesn't
seem to me to move the ball downfield by any meaningful distance.

 And I'd suggest defining NUM_SPINLOCK_SEMAPHORES in pg_config_manual.h,
 and maybe dropping SpinlockSemas() altogether in favor of just referencing
 the constant.  Otherwise this seems reasonable.

 As far as pg_config_manual.h is concerned, is this the sort of thing
 you have in mind?

 #ifndef HAVE_SPINLOCKS
 #define NUM_SPINLOCK_SEMAPHORES 1024
 #endif

I think we can just define it unconditionally, don't you?  It shouldn't
get referenced in HAVE_SPINLOCK builds.  Or is the point that you want
to enforce that?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic shared memory and locks

2014-01-06 Thread Tom Lane

Robert Haas robertmh...@gmail.com writes:
 On Mon, Jan 6, 2014 at 1:55 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 OTOH, the LWLock mechanism has been stable for long enough now that
 we can probably suppose this struct is no more subject to churn than
 any other widely-known one, so maybe that consideration is no longer
 significant.

 On the whole, I'd say it's been more stable than most.  But even if we
 do decide to change it, I'm not sure that really matters very much.

Actually, the real value of a module-local struct definition is that you
can be pretty darn sure that nothing except the code in that file is
manipulating the struct contents.  I would've preferred that we expose
only an abstract struct definition, but don't quite see how to do that
if we're going to embed the things in buffer headers.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic shared memory and locks

2014-01-06 Thread Robert Haas

On Mon, Jan 6, 2014 at 3:32 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Mon, Jan 6, 2014 at 2:48 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 -1 for the any_spinlock_held business (useless overhead IMO, as it doesn't
 have anything whatsoever to do with enforcing the actual coding rule).

 Hmm.  I thought that was a pretty well-aimed bullet myself; why do you
 think that it isn't?  I don't particularly mind ripping it out, but it
 seemed like a good automated test to me.

 The coding rule is only short straight-line code while holding a
 spinlock.  That could be violated in any number of nasty ways without
 trying to take another spinlock.  And since the whole point of the rule is
 that spinlock-holding code segments should be quick, adding more overhead
 to them doesn't seem very nice, even if it is only done in Assert builds.

 I agree it'd be nicer if we had some better way than mere manual
 inspection to enforce proper use of spinlocks; but this change doesn't
 seem to me to move the ball downfield by any meaningful distance.

Well, my thought was mainly that, while it may be a bad idea to take
another spinlock while holding a spinlock under any circumstances,
somebody might do it and it might appear to work just fine.  The most
likely sequences seems to me to be something like SpinLockAcquire(...)
followed by LWLockConditionalAcquire(), thinking that things are OK
because the lock acquisition is conditional - but in fact the
conditional acquire still takes the spinlock unconditionally.  Even
so, without --disable-spinlocks, this will probably work just fine.
Most likely there won't even be a performance problem, because a
lwlock has to be *very* hot for the spinlock acquisitions to become a
problem.  But with --disable-spinlocks, it will unpredictably
self-deadlock.  And since few developers are likely to test with
--disable-spinlocks, the problem most likely won't be noticed.  When
someone does then try to use --disable-spinlocks to port to a new
problem and starts hitting the deadlocks, they may not know enough to
attribute them to the real cause.  All in all it doesn't seem like
unduly expensive insurance, but I can remove it if the consensus is
against it.

 And I'd suggest defining NUM_SPINLOCK_SEMAPHORES in pg_config_manual.h,
 and maybe dropping SpinlockSemas() altogether in favor of just referencing
 the constant.  Otherwise this seems reasonable.

 As far as pg_config_manual.h is concerned, is this the sort of thing
 you have in mind?

 #ifndef HAVE_SPINLOCKS
 #define NUM_SPINLOCK_SEMAPHORES 1024
 #endif

 I think we can just define it unconditionally, don't you?  It shouldn't
 get referenced in HAVE_SPINLOCK builds.  Or is the point that you want
 to enforce that?

I don't think it really matters much; seems like a style question to
me.  That's why I asked.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic shared memory and locks

2014-01-06 Thread Robert Haas

On Mon, Jan 6, 2014 at 3:40 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Mon, Jan 6, 2014 at 1:55 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 OTOH, the LWLock mechanism has been stable for long enough now that
 we can probably suppose this struct is no more subject to churn than
 any other widely-known one, so maybe that consideration is no longer
 significant.

 On the whole, I'd say it's been more stable than most.  But even if we
 do decide to change it, I'm not sure that really matters very much.

 Actually, the real value of a module-local struct definition is that you
 can be pretty darn sure that nothing except the code in that file is
 manipulating the struct contents.  I would've preferred that we expose
 only an abstract struct definition, but don't quite see how to do that
 if we're going to embed the things in buffer headers.

Agreed.  I think it's good to keep struct definitions private as much
as possible, and I do.  But I don't think this is going to cause a big
problem either; lwlocks have been around for a long time and the
conventions for using them are pretty well understood, I think.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic shared memory and locks

2014-01-06 Thread Tom Lane

Robert Haas robertmh...@gmail.com writes:
 On Mon, Jan 6, 2014 at 3:32 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I agree it'd be nicer if we had some better way than mere manual
 inspection to enforce proper use of spinlocks; but this change doesn't
 seem to me to move the ball downfield by any meaningful distance.

 Well, my thought was mainly that, while it may be a bad idea to take
 another spinlock while holding a spinlock under any circumstances,
 somebody might do it and it might appear to work just fine.  The most
 likely sequences seems to me to be something like SpinLockAcquire(...)
 followed by LWLockConditionalAcquire(), thinking that things are OK
 because the lock acquisition is conditional - but in fact the
 conditional acquire still takes the spinlock unconditionally.

The point I'm making is that no such code should get past review,
whether it's got an obvious performance problem or not.

There's a huge amount of stuff that in principle we could add overhead
to Assert-enabled builds for, but we prefer not to --- an example not
too far afield from this issue is that there's no mechanism for detecting
deadlocks among multiple LWLock holders.  If we go too far down the path
of adding useless checks to Assert builds, people will stop using such
builds for routine development, which would surely be a large net negative
reliability-wise.

I'd be okay with adding overhead to SpinLockAcquire/Release if it had some
meaningful probability of catching real bugs, but I don't see that that
claim can be made here.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic shared memory and locks

2014-01-06 Thread Robert Haas

On Mon, Jan 6, 2014 at 3:57 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Mon, Jan 6, 2014 at 3:32 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I agree it'd be nicer if we had some better way than mere manual
 inspection to enforce proper use of spinlocks; but this change doesn't
 seem to me to move the ball downfield by any meaningful distance.

 Well, my thought was mainly that, while it may be a bad idea to take
 another spinlock while holding a spinlock under any circumstances,
 somebody might do it and it might appear to work just fine.  The most
 likely sequences seems to me to be something like SpinLockAcquire(...)
 followed by LWLockConditionalAcquire(), thinking that things are OK
 because the lock acquisition is conditional - but in fact the
 conditional acquire still takes the spinlock unconditionally.

 The point I'm making is that no such code should get past review,
 whether it's got an obvious performance problem or not.

Sure, I agree, but we all make mistakes.  It's just a judgement call
as to how likely you think it is that someone might make this
particular mistake, a topic upon which opinions may vary.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic shared memory and locks

2014-01-06 Thread Robert Haas

On Mon, Jan 6, 2014 at 9:48 AM, Stephen Frost sfr...@snowman.net wrote:
 None of these ideas are a complete solution for LWLOCK_STATS.  In the
 other three cases noted above, we only need an identifier for the lock
 instantaneously, so that we can pass it off to the logger or dtrace
 or whatever.  But LWLOCK_STATS wants to hold on to data about the
 locks that were visited until the end of the session, and it does that
 using an array that is *indexed* by lwlockid.  I guess we could
 replace that with a hash table.  Ugh.  Any suggestions?

 Yeah, that's not fun.  No good suggestions here offhand.

Replacing it with a hashtable turns out not to be too bad, either in
terms of code complexity or performance, so I think that's the way to
go.  I did some test runs with pgbench -S, scale factor 300, 32
clients, shared_buffers=8GB, five minute runs and got these results:

resultsr.lwlock-stats.32.300.300:tps = 195493.037962 (including
connections establishing)
resultsr.lwlock-stats.32.300.300:tps = 189985.964658 (including
connections establishing)
resultsr.lwlock-stats.32.300.300:tps = 197641.293892 (including
connections establishing)
resultsr.lwlock-stats-htab.32.300.300:tps = 193286.066063 (including
connections establishing)
resultsr.lwlock-stats-htab.32.300.300:tps = 191832.100877 (including
connections establishing)
resultsr.lwlock-stats-htab.32.300.300:tps = 191899.834211 (including
connections establishing)
resultsr.master.32.300.300:tps = 197939.111998 (including connections
establishing)
resultsr.master.32.300.300:tps = 198641.337720 (including connections
establishing)
resultsr.master.32.300.300:tps = 198675.404349 (including connections
establishing)

master is the master branch, commit
10a82cda67731941c18256e009edad4a784a2994.  lwlock-stats is the same,
but with LWLOCK_STATS defined.  lwlock-stats-htab is the same, with
the attached patch and LWLOCK_STATS defined.  The runs were
interleaved, but the results are shown here grouped by test run.  If
we assume that the 189k result is an outlier, then there's probably
some regression associated with the lwlock-stats-htab patch, but not a
lot.  Considering that this code isn't even compiled unless you have
LWLOCK_STATS defined, I think that's OK.

This is only part of the solution, of course: a complete solution will
involve making the hash table key something other than the lock ID.
What I'm thinking we can do is making the lock ID consist of two
unsigned 32-bit integers.  One of these will be stored in the lwlock
itself, which if my calculations are correct won't increase the size
of LWLockPadded on any common platforms (a 64-bit integer would).
Let's call this the tranch id.  The other will be derived from the
LWLock address.  Let's call this the instance ID.  We'll keep a
table of tranch IDs, which will be assigned consecutively starting
with 0.  We'll keep an array of metadata for tranches, indexed by
tranch ID, and each entry will have three associated pieces of
information: an array base, a stride length, and a printable name.
When we need to identify an lwlock in the log or to dtrace, we'll
fetch the tranch ID from the lwlock itself and use that to index into
the tranch metadata array.  We'll then take the address of the lwlock,
subtract the array base address for the tranch, and divide by the
stride length; the result is the instance ID.  When reporting the
user, we can report either the tranch ID directly or the associated
name for that tranch; in either case, we'll also report the instance
ID.

So initially we'll probably just have tranch 0: the main LWLock array.
 If we move buffer content and I/O locks to the buffer headers, we'll
define tranch 1 and tranch 2 with the same base address: the start of
the buffer descriptor array, and the same stride length, the size of a
buffer descriptor.  One will have the associated name buffer content
lock and the other buffer I/O lock.  If we want, we can define
split the main LWLock array into several tranches so that we can more
easily identify lock manager locks, predicate lock manager locks, and
buffer mapping locks.

I like this system because it's very cheap - we only need a small
array of metadata and a couple of memory accesses to name a lock - but
it still lets us report data in a way that's actually *more*
human-readable than what we have now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 4f88d3f..c1da2a8 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -32,6 +32,10 @@
 #include storage/proc.h
 #include storage/spin.h
 
+#ifdef LWLOCK_STATS
+#include utils/hsearch.h
+#endif
+
 
 /* We use the ShmemLock spinlock to protect LWLockAssign */
 extern slock_t *ShmemLock;
@@ -91,11 +95,17 @@ static int	lock_addin_request = 0;
 static bool lock_addin_request_allowed = true;
 
 #ifdef LWLOCK_STATS
+typedef struct lwlock_stats
+{
+

Re: [HACKERS] dynamic shared memory and locks

2014-01-06 Thread Jim Nasby


On 1/6/14, 2:59 PM, Robert Haas wrote:

On Mon, Jan 6, 2014 at 3:57 PM, Tom Lane t...@sss.pgh.pa.us wrote:

Robert Haas robertmh...@gmail.com writes:

On Mon, Jan 6, 2014 at 3:32 PM, Tom Lane t...@sss.pgh.pa.us wrote:

I agree it'd be nicer if we had some better way than mere manual
inspection to enforce proper use of spinlocks; but this change doesn't
seem to me to move the ball downfield by any meaningful distance.



Well, my thought was mainly that, while it may be a bad idea to take
another spinlock while holding a spinlock under any circumstances,
somebody might do it and it might appear to work just fine.  The most
likely sequences seems to me to be something like SpinLockAcquire(...)
followed by LWLockConditionalAcquire(), thinking that things are OK
because the lock acquisition is conditional - but in fact the
conditional acquire still takes the spinlock unconditionally.


The point I'm making is that no such code should get past review,
whether it's got an obvious performance problem or not.


Sure, I agree, but we all make mistakes.  It's just a judgement call
as to how likely you think it is that someone might make this
particular mistake, a topic upon which opinions may vary.


There's been discussions in the past about the overhead added by testing and 
having more than one level of test capabilities so that facilities like the 
buildfarm can run more expensive testing without burdening developers with 
that. ISTM this is another case of that (presumably Tom's concern is the 
performance hit of adding an assert in a critical path).

If we had a more aggressive form of assert (assert_anal? :)) we could use that 
here and let the buildfarm bore the brunt of that cost.
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic shared memory and locks

2014-01-06 Thread Alvaro Herrera

Jim Nasby escribió:
 On 1/6/14, 2:59 PM, Robert Haas wrote:
 On Mon, Jan 6, 2014 at 3:57 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 The point I'm making is that no such code should get past review,
 whether it's got an obvious performance problem or not.
 
 Sure, I agree, but we all make mistakes.  It's just a judgement call
 as to how likely you think it is that someone might make this
 particular mistake, a topic upon which opinions may vary.

I agree that excessive optimism might cause problems in the future.
Maybe it makes sense to have such a check #ifdef'ed out on most builds
to avoid extra overhead, but not having any check at all just because we
trust the review process too much doesn't strike me as the best of
ideas.

 There's been discussions in the past about the overhead added by testing and 
 having more than one level of test capabilities so that facilities like the 
 buildfarm can run more expensive testing without burdening developers with 
 that. ISTM this is another case of that (presumably Tom's concern is the 
 performance hit of adding an assert in a critical path).
 
 If we had a more aggressive form of assert (assert_anal? :)) we could use 
 that here and let the buildfarm bore the brunt of that cost.

Sounds good, but let's not enable it blindly on all machines.  There are
some that are under very high pressure to build and test all the
branches, so if we add something too costly on them it might cause
trouble.  So whatever we do, it should at least have the option to
opt-out, or perhaps even make it opt-in.  (I'd think opt-out is better,
because otherwise very few machines are going to have it enabled at
all.)

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] dynamic shared memory and locks

2014-01-05 Thread Robert Haas

One of the things that you might want to do with dynamic shared memory
is store a lock in it.  In fact, my bet is that almost everything that
uses dynamic shared memory will want to do precisely that, because, of
course, it's dynamic *shared* memory, which means that it is
concurrently accessed by multiple processes, which tends to require
locking.  Typically, what you're going to want are either spinlocks
(for very short critical sections) or lwlocks (for longer ones).  It
doesn't really make sense to talk about storing heavyweight locks in
dynamic shared memory, because we're talking about storing locks with
the data structures that they protect, and heavyweight locks are used
to protect database or shared objects, not shared memory structures.
Of course, someone might think of trying to provide a mechanism for
the heavyweight lock manager to overflow to dynamic shared memory, but
that's a different thing altogether and not what I'm talking about
here.

Right now, storing spinlocks in dynamic shared memory *almost* works,
but there are problems with --disable-spinlocks.  In that
configuration, we use semaphores to simulate spinlocks.  Every time
someone calls SpinLockInit(), it's going to allocate a new semaphore
which will never be returned to the operating system, so you're pretty
quickly going to run out.  There are a couple of things we could do
about this:

1. Decide we don't care.  If you compile with --disable-spinlocks, and
then you try to use dynamic shared memory, it's going to leak
semaphores until none remain, and then start failing from there until
the postmaster is restarted.  If you don't like that, provide a
working spinlock implementation for your platform.

2. Forbid the use of dynamic shared memory when compiling with
--disable-spinlocks.  This is a more polite version of #1.  It seems
likely to me that nearly every piece of code that uses dynamic shared
memory will require locking.  Instead of letting people allocate
dynamic shared memory segments anyway and then having them start
failing shortly after postmaster startup, we could just head the
problem off at the pass by denying the request for dynamic shared
memory in the first place.  Dynamic shared memory allocation can
always fail (e.g. because we're out of memory) and also has an
explicit off switch that will make all requests fail
(dynamic_shared_memory_type=none), so any code that uses dynamic
shared memory has to be prepared for a failure at that point, whereas
a failure in SpinLockInit() might be more surprising.

3. Provide an inverse for SpinLockInit, say SpinLockDestroy, and
require all code written for dynamic shared memory to invoke this
function on every spinlock before the shared memory segment is
destroyed.  I initially thought that this could be done using the
on_dsm_detach infrastructure, but it turns out that doesn't really
work.  The on_dsm_detach infrastructure is designed to make sure that
you *release* all of your locks when detaching - i.e. those hooks get
invoked for each process that detaches.  For this, you'd need an
on_dsm_final_detach callback that gets called only for the very last
detach (and after prohibiting any other processes from attaching).  I
can certainly engineer all that, but it's a decent amount of extra
work for everyone who wants to use dynamic shared memory to write the
appropriate callback, and because few people actually use
--disable-spinlocks, I think those callbacks will tend to be rather
lightly tested and thus a breeding ground for marginal bugs that
nobody's terribly excited about fixing.

4. Drop support for --disable-spinlocks.

For what it's worth, my vote is currently for #2.  I can't think of
many interesting to do with dynamic shared memory without having at
least spinlocks, so I don't think we'd be losing much.  #1 seems
needlessly unfriendly, #3 seems like a lot of work for not much, and
#4 seems excessive at least as a solution to this particular problem,
though there may be other arguments for it.  What do others think?

I think we're also going to want to be able to create LWLocks in
dynamic shared memory: some critical sections won't be short enough or
safe enough to be protected by spinlocks.  At some level this seems
easy: change LWLockAcquire and friends to accept an LWLock * rather
than an LWLockId, and similarly change held_lwlocks[] to hold LWLock
pointers rather than LWLockIds.  One tricky point is that you'd better
try not to detach a shared memory segment while you're holding lwlocks
inside that segment, but I think just making that a coding rule
shouldn't cause any great problem, and conversely you'd better release
all lwlocks in the segment before detaching it, but this seems mostly
OK: throwing an error will call LWLockReleaseAll before doing the
resource manager cleanups that will unmap the dynamic shared memory
segment, so that's probably OK too.  There may be corner cases I
haven't thought about, though.  A bigger problem is that I think we
want to avoid

Re: [HACKERS] dynamic shared memory and locks

2014-01-05 Thread Andres Freund

On 2014-01-05 12:56:05 -0500, Robert Haas wrote:
 Right now, storing spinlocks in dynamic shared memory *almost* works,
 but there are problems with --disable-spinlocks.  In that
 configuration, we use semaphores to simulate spinlocks.  Every time
 someone calls SpinLockInit(), it's going to allocate a new semaphore
 which will never be returned to the operating system, so you're pretty
 quickly going to run out.  There are a couple of things we could do
 about this:

 4. Drop support for --disable-spinlocks.

I very strongly vote 4). I think we're going to hit this more and more
often and it's a facility that benefits almost nobody. Just about every
new platform will be/is on gcc or clang and you can just duplicate the
compiler provided generic implementation we have for arm for there.

The atomics implementation make this an automatic fallback if there's
no compiler specific variant around.

 I think we're also going to want to be able to create LWLocks in
 dynamic shared memory: some critical sections won't be short enough or
 safe enough to be protected by spinlocks.

Agreed.

 At some level this seems  easy: change LWLockAcquire and friends to
 accept an LWLock * rather than an LWLockId, and similarly change
 held_lwlocks[] to hold LWLock pointers rather than LWLockIds.

My primary reason isn't dsm TBH but wanting to embed the buffer lwlocks
in the bufferdesc, on the same cacheline as the buffer headers
spinlock. All the embedded ones can be allocated without padding, while
the relatively low number of non-embedded ones can be padded to the full
cacheline size.

 A bigger problem is that I think we
 want to avoid having a large amount of notational churn.  The obvious
 way to do that is to get rid of the LWLockId array and instead declare
 each fixed LWLock separately as e.g. LWLock *ProcArrayLock.  However,
 creating a large number of new globals that will need to be
 initialized in every new EXEC_BACKEND process seems irritating.  So
 maybe a better idea is to do something like this:

 #define BufFreelistLock (fixedlwlocks[0])
 #define ShmemIndexLock (fixedlwlocks[1])
 ...
 #define AutoFileLock (fixedlwlocks[36])
 #define NUM_FIXED_LWLOCKS 37
 
 Comments, suggestions?

My idea here was to just have two APIs, a legacy one that works like the
current one, and a new one that locks the lwlocks passed in by a
pointer. After all, LWLockAssign() which you use in extensions currently
returns a LWLockId. Seems ugly to turn that into a pointer.

But perhaps your idea is better anyway, no matter the hackery of turning
LWLockId into a pointer.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic shared memory and locks

2014-01-05 Thread Tom Lane

Robert Haas robertmh...@gmail.com writes:
 For what it's worth, my vote is currently for #2.  I can't think of
 many interesting to do with dynamic shared memory without having at
 least spinlocks, so I don't think we'd be losing much.  #1 seems
 needlessly unfriendly, #3 seems like a lot of work for not much, and
 #4 seems excessive at least as a solution to this particular problem,
 though there may be other arguments for it.  What do others think?

I agree with this position.  There may be some good reason to drop
--disable-spinlocks altogether in future, but DSM isn't a sufficient
excuse.

 I think we're also going to want to be able to create LWLocks in
 dynamic shared memory: some critical sections won't be short enough or
 safe enough to be protected by spinlocks.  At some level this seems
 easy: change LWLockAcquire and friends to accept an LWLock * rather
 than an LWLockId, and similarly change held_lwlocks[] to hold LWLock
 pointers rather than LWLockIds.

I seem to recall that there was some good reason for keeping all the
LWLocks in an array, back when the facility was first designed.
I'm too lazy to research the point right now, but you might want to
go back and look at the archives around when lwlock.c was written.

 creating a large number of new globals that will need to be
 initialized in every new EXEC_BACKEND process seems irritating.

This might've been the good reason, but not sure --- I think LWLocks
predate our Windows support.

In the end, though, that decision was taken before we were concerned
about being able to add new LWLocks on the fly, which is what this is
really about (whether they're stored in DSM or not is a secondary point).
The pressure for that has gotten strong enough that it may be time to
change the tradeoff.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic shared memory and locks

2014-01-05 Thread Andres Freund

On 2014-01-05 14:06:52 -0500, Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
  For what it's worth, my vote is currently for #2.  I can't think of
  many interesting to do with dynamic shared memory without having at
  least spinlocks, so I don't think we'd be losing much.  #1 seems
  needlessly unfriendly, #3 seems like a lot of work for not much, and
  #4 seems excessive at least as a solution to this particular problem,
  though there may be other arguments for it.  What do others think?
 
 I agree with this position.  There may be some good reason to drop
 --disable-spinlocks altogether in future, but DSM isn't a sufficient
 excuse.

Agreed that DSM isn't sufficient cause. The reasons for removing it for
future reasons I see are:
* It's not tested at all and it has been partially broken for
  significants of time. Afair Heikki just noticed the breakage
  accidentally when adding enough new spinlocks recently.
* It's showed up as problematic in a couple of patches while adding not
  much value (at least dsm, atomic ops, afair some others)
* It's slow enough that it's not of a practical value.
* Implementing simple support for spinlocks on realistic platforms isn't
  hard these days due to compiler intrinsics.
* The platforms that don't have a barrier implementation will rely on
  spinlocks. And for correctness those spinlocks should employ
  barriers. That might be more of an argument for getting rid of that
  fallback tho.


  I think we're also going to want to be able to create LWLocks in
  dynamic shared memory: some critical sections won't be short enough or
  safe enough to be protected by spinlocks.  At some level this seems
  easy: change LWLockAcquire and friends to accept an LWLock * rather
  than an LWLockId, and similarly change held_lwlocks[] to hold LWLock
  pointers rather than LWLockIds.
 
 I seem to recall that there was some good reason for keeping all the
 LWLocks in an array, back when the facility was first designed.
 I'm too lazy to research the point right now, but you might want to
 go back and look at the archives around when lwlock.c was written.

Your proposal is at
http://www.postgresql.org/message-id/1054.1001520...@sss.pgh.pa.us -
afaics there hasn't been much discussion about implementation details at
that level.

 In the end, though, that decision was taken before we were concerned
 about being able to add new LWLocks on the fly, which is what this is
 really about (whether they're stored in DSM or not is a secondary point).
 The pressure for that has gotten strong enough that it may be time to
 change the tradeoff.

I personally find the sharing of a cacheline between a buffer headers
spinlock and the lwlock much more interesting than DSM...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic shared memory and locks

2014-01-05 Thread Robert Haas

On Sun, Jan 5, 2014 at 2:06 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I seem to recall that there was some good reason for keeping all the
 LWLocks in an array, back when the facility was first designed.
 I'm too lazy to research the point right now, but you might want to
 go back and look at the archives around when lwlock.c was written.

To some extent it's an orthogonal question.  It's true that there
isn't much point in using LWLock * rather than LWLockId to refer to
LWLocks unless we wish to be able to store them outside the shared
memory segment, but the reverse is not true: just because we have the
ability to move things outside the main array in the general case
doesn't make it a good idea in any particular case.  Andres's email
seems to indicate that he sees performance advantages in moving buffer
locks elsewhere, and I have a sneaking suspicion that we'll find that
it's more convenient to move some other things around as well, but
that's policy, not mechanism.  Very little of the core LWLock
machinery actually cares how the locks are stored, so the attached
patch to make it use LWLock * rather than LWLockId as a handle is
pretty straightforward.

The only real problem I see here is that we occasionally *print out*
LWLockIds as a way of identifying lwlocks.  This is not a great
system, but printing out pointers is almost certainly much worse (e.g.
because of ASLR).  The cases where this is currently an issue are:

- You try to release a lwlock you haven't acquired.  We elog(ERROR) the ID.
- You define LWLOCK_STATS.  The resulting reports are print the lock ID.
- You define LOCK_DEBUG and set trace_lwlocks=true.  We print the lock
ID in the trace messages.
- You compile with --enable-dtrace.  We pass the lock IDs to the dtrace probes.

In the attached patch I handled the first case by printing the pointer
(which I don't feel too bad about) and the remaining three cases by
leaving them broken.  I wouldn't shed a tear about ripping out
trace_lwlocks, but LWLOCK_STATS is useful and I bet somebody is using
--enable-dtrace, so we probably need to fix those cases at least.  I
suppose one option is to make LWLOCK_STATS and the dtrace probes only
look at locks in the main array and just ignore everything else.  But
that's kind of crappy, especially if we're going to soon move buffer
locks out of the main array.

Another idea is to include some identifying information in the lwlock.
 For example, each lwlock could have a char *name in it, and we could
print the name.  In theory, this could be a big step forward in terms
of usability, because it'd spare us all needing to remember that 4 ==
ProcArrayLock.  But it's awkward for buffer locks, of which there
might be a great many, and we surely don't want to allocate a
dynamically-generated string in shared memory for each one.  You could
do a bit better by making the identifying information a string plus an
integer, because then all the buffer locks could set the string to a
static constant like buffer content lock and the integer to the
buffer number, and similarly for lock manager partition locks and so
on.  This is appealing, but would increase the size of LWLockPadded
from 16 bytes to 32 on 64-bit platforms where slock_t is four bytes or
less, which I'm not that excited about even though it would reduce
cache line contention in some cases.

Yet a third idea is to try to reverse-engineer a name for a given
lwlock from the pointer address.  If it's an offset into the main
array, this is easy enough to do, and even if we ended up with several
arrays (like one for bufmgr locks) it wouldn't be too hard to write
code to figure out which array contains it and emit the appropriate
string.  The only real problem that I see with this is that it might
cause a performance hit.  A performance hit when running with
trace_lwlocks or LWLOCK_STATS is not really a problem, but people
won't like if --enable-dtrace slow things up.

Preferences, other ideas?

None of these ideas are a complete solution for LWLOCK_STATS.  In the
other three cases noted above, we only need an identifier for the lock
instantaneously, so that we can pass it off to the logger or dtrace
or whatever.  But LWLOCK_STATS wants to hold on to data about the
locks that were visited until the end of the session, and it does that
using an array that is *indexed* by lwlockid.  I guess we could
replace that with a hash table.  Ugh.  Any suggestions?

(Incidentally, while developing this patch I found a bug in the
current code: lock.c iterates over all PGPROCs from 0 to
ProcGlobal-allProcCount and takes the backendLock for each, but if
max_prepared_transactions0 then the PGPROCs for prepared transactions
do not have a backendLock and so we take and release BufFreelistLock -
i.e. 0 - a number of times equal to max_prepared_transactions.  That's
not cool.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/access/transam/slru.c

Re: [HACKERS] Dynamic Shared Memory stuff

2013-12-18 Thread Robert Haas

On Tue, Dec 10, 2013 at 6:26 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Noah Misch n...@leadboat.com writes:
 On Tue, Dec 10, 2013 at 07:50:20PM +0200, Heikki Linnakangas wrote:
 Let's not add more cases like that, if we can avoid it.

 Only if we can avoid it for a modicum of effort and feature compromise.
 You're asking for PostgreSQL to reshape its use of persistent resources so 
 you
 can throw around killall -9 postgres; rm -rf $PGDATA without so much as a
 memory leak.  That use case, not PostgreSQL, has the defect here.

 The larger point is that such a shutdown process has never in the history
 of Postgres been successful at removing shared-memory (or semaphore)
 resources.  I do not feel a need to put a higher recoverability standard
 onto the DSM code than what we've had for fifteen years with SysV shmem.

 But, by the same token, it shouldn't be any *less* recoverable.  In this
 context that means that starting a new postmaster should recover the
 resources, at least 90% of the time (100% if we still have the old
 postmaster lockfile).  I think the idea of keeping enough info in the
 SysV segment to permit recovery of DSM resources is a good one.  Then,
 any case where the existing logic is able to free the old SysV segment
 is guaranteed to also work for DSM.

So I'm taking a look at this.  There doesn't appear to be anything
intrinsically intractable here, but it seems important to time the
order of operations so as to minimize the chances that things fail in
awkward places.  The point where we remove the old shared memory
segment from the previous postmaster invocation is here:

/*
 * The segment appears to be from a dead Postgres process, or from a
 * previous cycle of life in this same process.  Zap it, if possible.
 * This probably shouldn't fail, but if it does, assume the segment
 * belongs to someone else after all, and continue quietly.
 */
shmdt(memAddress);
if (shmctl(shmid, IPC_RMID, NULL)  0)
continue;

My first thought was to remember the control segment ID from the
header just before the shmdt() there, and then later when the DSM
module is starting, do cleanup.  But that doesn't seem like a good
idea, because then if startup fails after we remove the old shared
memory segment and before we get to the DSM initialization code, we'll
have lost the information on what control segment needs to be cleaned
up.  A subsequent startup attempt won't see the old shm again, because
it's already gone.  I'm fairly sure that this would be a net reduction
in reliability vs. the status quo.

So now what I'm thinking is that we ought to actually perform the DSM
cleanup before detaching the old segment and trying to remove it.
That shouldn't fail, but if it does, or if we get killed before
completing it, the next run will hopefully find the same old shm and
finish the cleanup.  But that kind of flies in the face of the comment
above: if we perform DSM cleanup and then discover that the segment
wasn't ours after all, that means we just stomped all over somebody
else's stuff.  That's not too good. But trying to remove the segment
first and then perform the cleanup creates a window where, if we get
killed, the next restart will have lost information about how to
finish cleaning up.  So it seems that some kind of logic tweak is
needed here, but I'm not sure exactly what.  As I see it, the options
are:

1. Make failure to remove the shared memory segment we thought was
ours an error.  This will definitely show up any problems, but only
after we've burned down some other processes's dynamic shared memory
segments.  The most likely outcome is creating failure-to-start
problems that don't exist today.

2. Make it a warning.  We'll still burn down somebody else's DSMs, but
at least we'll still start ourselves.  Sadly, WARNING: You have just
done a bad thing.  It's too late to fix it.  Sorry! is not very
appealing.

3. Remove the old shared memory segment first, then perform the
cleanup immediately afterwards.  If we get killed before completing
the cleanup, we'll leak the un-cleaned-up stuff.  Decide that's OK and
move on.

4. Adopt some sort of belt-and-suspenders approach, keeping the state
file we have now and backstopping it with this mechanism, so that we
really only need this to work when $PGDATA has been blown away and
recreated.  This seems pretty inelegant, and I'm not sure who it'll
benefit other than those (few?) people who kill -9 the postmaster (or
it segfaults or otherwise dies without running the code to remove
shared memory segments) and then remove $PGDATA and then re-initdb and
then expect cleanup to find the old DSMs.

5. Give up on this approach.  We could keep what we have now, or make
the DSM control segment land at a well-known address as we do for the
main segment.

What do people prefer?

The other thing I'm a bit squidgy about is injecting more code that
runs before we get the new shared memory segment up and

Re: [HACKERS] Dynamic Shared Memory stuff

2013-12-10 Thread Noah Misch

On Thu, Dec 05, 2013 at 06:12:48PM +0200, Heikki Linnakangas wrote:
 On 11/20/2013 09:58 PM, Robert Haas wrote:
 On Wed, Nov 20, 2013 at 8:32 AM, Heikki Linnakangas 
 hlinnakan...@vmware.com wrote:
 * As discussed in the Something fishy happening on frogmouth thread, I
 don't like the fact that the dynamic shared memory segments will be
 permanently leaked if you kill -9 postmaster and destroy the data directory.

 Your test elicited different behavior for the dsm code vs. the main
 shared memory segment because it involved running a new postmaster
 with a different data directory but the same port number on the same
 machine, and expecting that that new - and completely unrelated -
 postmaster would clean up the resources left behind by the old,
 now-destroyed cluster.  I tend to view that as a defect in your test
 case more than anything else, but as I suggested previously, we could
 potentially change the code to use something like 100 + (port *
 100) with a forward search for the control segment identifier, instead
 of using a state file, mimicking the behavior of the main shared
 memory segment.  I'm not sure we ever reached consensus on whether
 that was overall better than what we have now.

 I really think we need to do something about it. To use your earlier  
 example of parallel sort, it's not acceptable to permanently leak a 512  
 GB segment on a system with 1 TB of RAM.

I don't.  Erasing your data directory after an unclean shutdown voids any
expectations for a thorough, automatic release of system resources.  Don't do
that.  The next time some new use of a persistent resource violates your hope
for this scenario, there may be no remedy.

Having said that, I'm not opposed to storing the DSM control segment handle in
PGShmemHeader, which would enable DSM cleanup whenever we find cause to
reclaim a main sysv segment.

 One idea is to create the shared memory object with shm_open, and wait  
 until all the worker processes that need it have attached to it. Then,  
 shm_unlink() it, before using it for anything. That way the segment will  
 be automatically released once all the processes close() it, or die. In  
 particular, kill -9 will release it. (This is a variant of my earlier  
 idea to create a small number of anonymous shared memory file  
 descriptors in postmaster startup with shm_open(), and pass them down to  
 child processes with fork()). I think you could use that approach with  
 SysV shared memory as well, by destroying the segment with  
 sgmget(IPC_RMID) immediately after all processes have attached to it.

That leaves a window in which we still leak the segment, and it is less
general: not every use of DSM is conducive to having all processes attach in a
short span of time.

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Dynamic Shared Memory stuff

2013-12-10 Thread Heikki Linnakangas


On 12/10/2013 07:27 PM, Noah Misch wrote:

On Thu, Dec 05, 2013 at 06:12:48PM +0200, Heikki Linnakangas wrote:

On 11/20/2013 09:58 PM, Robert Haas wrote:

On Wed, Nov 20, 2013 at 8:32 AM, Heikki Linnakangas hlinnakan...@vmware.com 
wrote:

* As discussed in the Something fishy happening on frogmouth thread, I
don't like the fact that the dynamic shared memory segments will be
permanently leaked if you kill -9 postmaster and destroy the data directory.


Your test elicited different behavior for the dsm code vs. the main
shared memory segment because it involved running a new postmaster
with a different data directory but the same port number on the same
machine, and expecting that that new - and completely unrelated -
postmaster would clean up the resources left behind by the old,
now-destroyed cluster.  I tend to view that as a defect in your test
case more than anything else, but as I suggested previously, we could
potentially change the code to use something like 100 + (port *
100) with a forward search for the control segment identifier, instead
of using a state file, mimicking the behavior of the main shared
memory segment.  I'm not sure we ever reached consensus on whether
that was overall better than what we have now.


I really think we need to do something about it. To use your earlier
example of parallel sort, it's not acceptable to permanently leak a 512
GB segment on a system with 1 TB of RAM.


I don't.  Erasing your data directory after an unclean shutdown voids any
expectations for a thorough, automatic release of system resources.  Don't do
that.  The next time some new use of a persistent resource violates your hope
for this scenario, there may be no remedy.


Well, the point of erasing the data directory is to release system 
resources. I would normally expect killall -9 process; rm -rf data 
dir to thorougly get rid of the running program and all the resources. 
It's surprising enough that the regular shared memory segment is left 
behind, but at least that one gets cleaned up when you start a new 
server (on same port). Let's not add more cases like that, if we can 
avoid it.


BTW, what if the data directory is seriously borked, and the server 
won't start? Sure, don't do that, but it would be nice to have a way to 
recover if you do anyway. (docs?)



One idea is to create the shared memory object with shm_open, and wait
until all the worker processes that need it have attached to it. Then,
shm_unlink() it, before using it for anything. That way the segment will
be automatically released once all the processes close() it, or die. In
particular, kill -9 will release it. (This is a variant of my earlier
idea to create a small number of anonymous shared memory file
descriptors in postmaster startup with shm_open(), and pass them down to
child processes with fork()). I think you could use that approach with
SysV shared memory as well, by destroying the segment with
sgmget(IPC_RMID) immediately after all processes have attached to it.


That leaves a window in which we still leak the segment,


A small window is better than a large one.

Another refinement is to wait for all the processes to attach before 
setting the segment's size with ftruncate(). That way, when the window 
is open for leaking the segment, it's still 0-sized so leaking it is not 
a big deal.



and it is less
general: not every use of DSM is conducive to having all processes attach in a
short span of time.


Let's cross that bridge when we get there. AFAICS it fits all the use 
cases discussed this far.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Dynamic Shared Memory stuff

2013-12-10 Thread Noah Misch

On Tue, Dec 10, 2013 at 07:50:20PM +0200, Heikki Linnakangas wrote:
 On 12/10/2013 07:27 PM, Noah Misch wrote:
 On Thu, Dec 05, 2013 at 06:12:48PM +0200, Heikki Linnakangas wrote:
 On Wed, Nov 20, 2013 at 8:32 AM, Heikki Linnakangas 
 hlinnakan...@vmware.com wrote:
 * As discussed in the Something fishy happening on frogmouth thread, I
 don't like the fact that the dynamic shared memory segments will be
 permanently leaked if you kill -9 postmaster and destroy the data 
 directory.

 I really think we need to do something about it. To use your earlier
 example of parallel sort, it's not acceptable to permanently leak a 512
 GB segment on a system with 1 TB of RAM.
 
 I don't.  Erasing your data directory after an unclean shutdown voids any
 expectations for a thorough, automatic release of system resources.  Don't do
 that.  The next time some new use of a persistent resource violates your hope
 for this scenario, there may be no remedy.
 
 Well, the point of erasing the data directory is to release system
 resources. I would normally expect killall -9 process; rm -rf
 data dir to thorougly get rid of the running program and all the
 resources. It's surprising enough that the regular shared memory
 segment is left behind

Your expectation is misplaced.  Processes and files are simply not the only
persistent system resources of interest.

 but at least that one gets cleaned up when
 you start a new server (on same port).

In the most-typical case, yes.  In rare cases involving multiple postmasters
starting and stopping, the successor to the erased data directory will not
clean up the sysv segment.

 Let's not add more cases like that, if we can avoid it.

Only if we can avoid it for a modicum of effort and feature compromise.
You're asking for PostgreSQL to reshape its use of persistent resources so you
can throw around killall -9 postgres; rm -rf $PGDATA without so much as a
memory leak.  That use case, not PostgreSQL, has the defect here.

 BTW, what if the data directory is seriously borked, and the server
 won't start? Sure, don't do that, but it would be nice to have a way
 to recover if you do anyway. (docs?)

If something is corrupting your data directory in an open-ended manner, you
have bigger problems than a memory leak until reboot.  Recovering DSM happens
before we read the control file, so the damage would need to fall among a
short list of files for this to happen (bugs excluded).  Nonetheless, I don't
object to documenting the varieties of system resources that PostgreSQL may
reserve and referencing the OS facilities for inspecting them.

Are you actually using PostgreSQL this way: frequent killall -9 postgres; rm
-rf $PGDATA after arbitrarily-bad $PGDATA corruption?  Some automated fault
injection test rig, perhaps?

 One idea is to create the shared memory object with shm_open, and wait
 until all the worker processes that need it have attached to it. Then,
 shm_unlink() it, before using it for anything. That way the segment will
 be automatically released once all the processes close() it, or die. In
 particular, kill -9 will release it. (This is a variant of my earlier
 idea to create a small number of anonymous shared memory file
 descriptors in postmaster startup with shm_open(), and pass them down to
 child processes with fork()). I think you could use that approach with
 SysV shared memory as well, by destroying the segment with
 sgmget(IPC_RMID) immediately after all processes have attached to it.
 
 That leaves a window in which we still leak the segment,
 
 A small window is better than a large one.

Yes.

 Another refinement is to wait for all the processes to attach before
 setting the segment's size with ftruncate(). That way, when the
 window is open for leaking the segment, it's still 0-sized so
 leaking it is not a big deal.
 
 and it is less
 general: not every use of DSM is conducive to having all processes attach in 
 a
 short span of time.
 
 Let's cross that bridge when we get there. AFAICS it fits all the
 use cases discussed this far.

It does fit the use cases discussed thus far.

nm

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Dynamic Shared Memory stuff

2013-12-10 Thread Andres Freund

On 2013-12-10 18:12:53 -0500, Noah Misch wrote:
 On Tue, Dec 10, 2013 at 07:50:20PM +0200, Heikki Linnakangas wrote:
  On 12/10/2013 07:27 PM, Noah Misch wrote:
  On Thu, Dec 05, 2013 at 06:12:48PM +0200, Heikki Linnakangas wrote:
  Let's not add more cases like that, if we can avoid it.
 
 Only if we can avoid it for a modicum of effort and feature compromise.
 You're asking for PostgreSQL to reshape its use of persistent resources so you
 can throw around killall -9 postgres; rm -rf $PGDATA without so much as a
 memory leak.  That use case, not PostgreSQL, has the defect here.

Empathically seconded.

   Another refinement is to wait for all the processes to attach before 
   setting
   the segment's size with ftruncate(). That way, when the window is open for
   leaking the segment, it's still 0-sized so leaking it is not a big deal.
 
  and it is less
  general: not every use of DSM is conducive to having all processes attach 
  in a
  short span of time.

  Let's cross that bridge when we get there. AFAICS it fits all the
  use cases discussed this far.

The primary use case I have for dsm, namely writing extensions that can
use shared memory without having to be listed in
shared_preload_libraries, certainly wouldn't work in any sensible way
with such a restriction.
And I don't think that's an insignificant usecase.

So I really fail to see what this would buy us.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Dynamic Shared Memory stuff

2013-12-10 Thread Tom Lane

Noah Misch n...@leadboat.com writes:
 On Tue, Dec 10, 2013 at 07:50:20PM +0200, Heikki Linnakangas wrote:
 Let's not add more cases like that, if we can avoid it.

 Only if we can avoid it for a modicum of effort and feature compromise.
 You're asking for PostgreSQL to reshape its use of persistent resources so you
 can throw around killall -9 postgres; rm -rf $PGDATA without so much as a
 memory leak.  That use case, not PostgreSQL, has the defect here.

The larger point is that such a shutdown process has never in the history
of Postgres been successful at removing shared-memory (or semaphore)
resources.  I do not feel a need to put a higher recoverability standard
onto the DSM code than what we've had for fifteen years with SysV shmem.

But, by the same token, it shouldn't be any *less* recoverable.  In this
context that means that starting a new postmaster should recover the
resources, at least 90% of the time (100% if we still have the old
postmaster lockfile).  I think the idea of keeping enough info in the
SysV segment to permit recovery of DSM resources is a good one.  Then,
any case where the existing logic is able to free the old SysV segment
is guaranteed to also work for DSM.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Dynamic Shared Memory stuff

2013-12-06 Thread Robert Haas

On Thu, Dec 5, 2013 at 4:06 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 That's a very interesting idea.  I've been thinking that we needed to
 preserve the property that new workers could attach to the shared
 memory segment at any time, but that might not be necessary in all
 case.  We could introduce a new dsm operation that means i promise no
 one else needs to attach to this segment.  Further attachments would
 be disallowed by dsm.c regardless of the implementation in use, and
 dsm_impl.c would also be given a chance to perform
 implementation-specific operations, like shm_unlink and
 shmctl(IPC_RMID).  This new operation, when used, would help to reduce
 the chance of leaks and perhaps catch other programming errors as
 well.

 What should we call it?  dsm_finalize() is the first thing that comes
 to mind, but I'm not sure I like that.

 dsm_unlink() would mirror the underlying POSIX shm_unlink() call, and would
 be familiar to anyone who understands how unlink() on a file works on Unix.

OK, let me work on that once this CommitFest is done.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Dynamic Shared Memory stuff

2013-12-05 Thread Heikki Linnakangas


On 11/20/2013 09:58 PM, Robert Haas wrote:

On Wed, Nov 20, 2013 at 8:32 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

How many allocations? What size will they have have typically, minimum and
maximum?


The facility is intended to be general, so the answer could vary
widely by application.  The testing that I have done so far suggests
that for message-passing, relatively small queue sizes (a few kB,
perhaps 1 MB at the outside) should be sufficient.  However,
applications such as parallel sort could require vast amounts of
shared memory.  Consider a machine with 1TB of memory performing a
512GB internal sort.  You're going to need 512GB of shared memory for
that.


Hmm. Those two use cases are quite different. For message-passing, you 
want a lot of small queues, but for parallel sort, you want one huge 
allocation. I wonder if we shouldn't even try a one-size-fits-all solution.


For message-passing, there isn't much need to even use dynamic shared 
memory. You could just assign one fixed-sized, single-reader 
multiple-writer queue for each backend.


For parallel sort, you'll want to utilize all the available memory and 
all CPUs for one huge sort. So all you really need is a single huge 
shared memory segment. If one process is already using that 512GB 
segment to do a sort, you do *not* want to allocate a second 512GB 
segment. You'll want to wait for the first operation to finish first. Or 
maybe you'll want to have 3-4 somewhat smaller segments in use at the 
same time, but not more than that.



* As discussed in the Something fishy happening on frogmouth thread, I
don't like the fact that the dynamic shared memory segments will be
permanently leaked if you kill -9 postmaster and destroy the data directory.


Your test elicited different behavior for the dsm code vs. the main
shared memory segment because it involved running a new postmaster
with a different data directory but the same port number on the same
machine, and expecting that that new - and completely unrelated -
postmaster would clean up the resources left behind by the old,
now-destroyed cluster.  I tend to view that as a defect in your test
case more than anything else, but as I suggested previously, we could
potentially change the code to use something like 100 + (port *
100) with a forward search for the control segment identifier, instead
of using a state file, mimicking the behavior of the main shared
memory segment.  I'm not sure we ever reached consensus on whether
that was overall better than what we have now.


I really think we need to do something about it. To use your earlier 
example of parallel sort, it's not acceptable to permanently leak a 512 
GB segment on a system with 1 TB of RAM.


One idea is to create the shared memory object with shm_open, and wait 
until all the worker processes that need it have attached to it. Then, 
shm_unlink() it, before using it for anything. That way the segment will 
be automatically released once all the processes close() it, or die. In 
particular, kill -9 will release it. (This is a variant of my earlier 
idea to create a small number of anonymous shared memory file 
descriptors in postmaster startup with shm_open(), and pass them down to 
child processes with fork()). I think you could use that approach with 
SysV shared memory as well, by destroying the segment with 
sgmget(IPC_RMID) immediately after all processes have attached to it.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Dynamic Shared Memory stuff

2013-12-05 Thread Robert Haas

On Thu, Dec 5, 2013 at 11:12 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Hmm. Those two use cases are quite different. For message-passing, you want
 a lot of small queues, but for parallel sort, you want one huge allocation.
 I wonder if we shouldn't even try a one-size-fits-all solution.

 For message-passing, there isn't much need to even use dynamic shared
 memory. You could just assign one fixed-sized, single-reader multiple-writer
 queue for each backend.

True, although if the queue needs to 1MB, or even 128kB, that would
bloat the static shared-memory footprint over the server pretty
significantly.  And I don't know that we know that a small queue will
be adequate in all cases.  If you've got a worker backend feeding data
back to the user backend, the size of the queue limits how far ahead
of the user backend that worker can get.  Big is good, because then
the user backend won't stall on read, but small is also good, in case
the query is cancelled or hits an error.  It is far from obvious to me
that one-size-fits-all is the right solution.

 For parallel sort, you'll want to utilize all the available memory and all
 CPUs for one huge sort. So all you really need is a single huge shared
 memory segment. If one process is already using that 512GB segment to do a
 sort, you do *not* want to allocate a second 512GB segment. You'll want to
 wait for the first operation to finish first. Or maybe you'll want to have
 3-4 somewhat smaller segments in use at the same time, but not more than
 that.

This is all true, but it has basically nothing to do with parallelism.
 work_mem is a poor model, but I didn't invent it.  Hopefully some day
someone will fix it, maybe even me, but that's a separate project.

 I really think we need to do something about it. To use your earlier example
 of parallel sort, it's not acceptable to permanently leak a 512 GB segment
 on a system with 1 TB of RAM.

 One idea is to create the shared memory object with shm_open, and wait until
 all the worker processes that need it have attached to it. Then,
 shm_unlink() it, before using it for anything. That way the segment will be
 automatically released once all the processes close() it, or die. In
 particular, kill -9 will release it. (This is a variant of my earlier idea
 to create a small number of anonymous shared memory file descriptors in
 postmaster startup with shm_open(), and pass them down to child processes
 with fork()). I think you could use that approach with SysV shared memory as
 well, by destroying the segment with sgmget(IPC_RMID) immediately after all
 processes have attached to it.

That's a very interesting idea.  I've been thinking that we needed to
preserve the property that new workers could attach to the shared
memory segment at any time, but that might not be necessary in all
case.  We could introduce a new dsm operation that means i promise no
one else needs to attach to this segment.  Further attachments would
be disallowed by dsm.c regardless of the implementation in use, and
dsm_impl.c would also be given a chance to perform
implementation-specific operations, like shm_unlink and
shmctl(IPC_RMID).  This new operation, when used, would help to reduce
the chance of leaks and perhaps catch other programming errors as
well.

What should we call it?  dsm_finalize() is the first thing that comes
to mind, but I'm not sure I like that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Dynamic Shared Memory stuff

2013-12-05 Thread Heikki Linnakangas


On 12/05/2013 09:34 PM, Robert Haas wrote:

On Thu, Dec 5, 2013 at 11:12 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

One idea is to create the shared memory object with shm_open, and wait until
all the worker processes that need it have attached to it. Then,
shm_unlink() it, before using it for anything. That way the segment will be
automatically released once all the processes close() it, or die. In
particular, kill -9 will release it. (This is a variant of my earlier idea
to create a small number of anonymous shared memory file descriptors in
postmaster startup with shm_open(), and pass them down to child processes
with fork()). I think you could use that approach with SysV shared memory as
well, by destroying the segment with sgmget(IPC_RMID) immediately after all
processes have attached to it.


That's a very interesting idea.  I've been thinking that we needed to
preserve the property that new workers could attach to the shared
memory segment at any time, but that might not be necessary in all
case.  We could introduce a new dsm operation that means i promise no
one else needs to attach to this segment.  Further attachments would
be disallowed by dsm.c regardless of the implementation in use, and
dsm_impl.c would also be given a chance to perform
implementation-specific operations, like shm_unlink and
shmctl(IPC_RMID).  This new operation, when used, would help to reduce
the chance of leaks and perhaps catch other programming errors as
well.

What should we call it?  dsm_finalize() is the first thing that comes
to mind, but I'm not sure I like that.


dsm_unlink() would mirror the underlying POSIX shm_unlink() call, and 
would be familiar to anyone who understands how unlink() on a file works 
on Unix.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Dynamic Shared Memory stuff

2013-11-23 Thread Jeremy Harris


On 20/11/13 19:58, Robert Haas wrote:

Parallel sort, and then parallel other stuff.  Eventually general
parallel query.

I have recently updated https://wiki.postgresql.org/wiki/Parallel_Sort
and you may find that interesting/helpful as a statement of intent.


I've been playing with an internal merge sort which may be of interest
in this area of work.  While I've not worked on parallelizing the merge
stages it does not feel like an impossible task.  More to the immediate
point, the input stage and readout stage can do useful work
overlapped with the data source and sink (respectively).

The current implementation uses the same amount of memory as the
quicksort one, and does approximately the same number of compares
(on random input).  The overheads are higher than the quicksort
implementation (up to 50% higher cpu time, on a single key of
random integers).

Its performance shines on partially- or reverse-sorted input.

Transition to (the existing) external sort is implemented, as is
the limited random-access facility, and bounded output.

It supports unique-output.  The planner interface is extended to
return an indication of dedup guarantee, and the Unique node uses
this to elide itself at planning time.  Dedup operations also
cover external sorts.
--
Cheers,
   Jeremy


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Dynamic Shared Memory stuff

2013-11-23 Thread Peter Geoghegan

On Sat, Nov 23, 2013 at 4:21 PM, Jeremy Harris j...@wizmail.org wrote:
 Its performance shines on partially- or reverse-sorted input.

Search the archives for the work I did on timsort support a while
back. A patch was posted, that had some impressive results provided
you just considered the number of comparisons (and not TPS when
sorting text), but at the time my sense was that it didn't have broad
enough applicability for me to pursue further. That doesn't mean the
idea wasn't useful, and it certainly doesn't mean that my rough patch
couldn't be improved upon. For one thing, if there was a type that had
a comparator that was, say, an order of magnitude more expensive than
bttextcmp, it would definitely be a big win.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Dynamic Shared Memory stuff

2013-11-20 Thread Heikki Linnakangas

I'm trying to catch up on all of this dynamic shared memory stuff. A 
bunch of random questions and complaints:


What kind of usage are we trying to cater with the dynamic shared 
memory? How many allocations? What size will they have have typically, 
minimum and maximum? I looked at the message queue implementation you 
posted, but I wonder if that's the use case you're envisioning for this, 
or if you have more things in mind.



* dsm_handle is defined in dsm_impl.h, but it's exposed in the function 
signatures in dsm.h. ISTM it should be moved to dsm.h


* The DSM API contains functions for resizing the segment. That's not 
exercised by the MQ or TOC facilities. Is that going to stay dead code, 
or do you envision a user for it?


* dsm_impl_can_resize() incorrectly returns false for DSM_IMPL_MMAP. The 
mmap() implementation can resize.


* This is an issue I've seen for some time with git master, while 
working on various things. Sometimes, when I kill the server with 
CTRL-C, I get this in the log:


^CLOG:  received fast shutdown request
LOG:  aborting any active transactions
FATAL:  terminating connection due to administrator command
LOG:  autovacuum launcher shutting down
LOG:  shutting down
LOG:  database system is shut down
LOG:  could not remove shared memory segment /PostgreSQL.1804289383: 
Tiedostoa tai hakemistoa ei ole


(that means ENOENT)

And I just figured out why that happens: If you take a base backup of a 
running system, the pg_dynshmem/state file is included in the backup. If 
you now start up a standby from the backup on the same system, it will 
clean up and reuse the dynshmem segment still used by the master 
system. Now, when you shut down the master, you get that message in the 
log. If the segment was actually used for something, the master would 
naturally crash.


* As discussed in the Something fishy happening on frogmouth thread, I 
don't like the fact that the dynamic shared memory segments will be 
permanently leaked if you kill -9 postmaster and destroy the data directory.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Dynamic Shared Memory stuff

2013-11-20 Thread Robert Haas

On Wed, Nov 20, 2013 at 8:32 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 I'm trying to catch up on all of this dynamic shared memory stuff. A bunch
 of random questions and complaints:

 What kind of usage are we trying to cater with the dynamic shared memory?

Parallel sort, and then parallel other stuff.  Eventually general
parallel query.

I have recently updated https://wiki.postgresql.org/wiki/Parallel_Sort
and you may find that interesting/helpful as a statement of intent.

 How many allocations? What size will they have have typically, minimum and
 maximum?

The facility is intended to be general, so the answer could vary
widely by application.  The testing that I have done so far suggests
that for message-passing, relatively small queue sizes (a few kB,
perhaps 1 MB at the outside) should be sufficient.  However,
applications such as parallel sort could require vast amounts of
shared memory.  Consider a machine with 1TB of memory performing a
512GB internal sort.  You're going to need 512GB of shared memory for
that.

 I looked at the message queue implementation you posted, but I
 wonder if that's the use case you're envisioning for this, or if you have
 more things in mind.

I consider that to be the first application of dynamic shared memory
and expect it to be used for (1) returning errors from background
workers to the user backend and (2) funneling tuples from one portion
of a query tree that has been split off to run in a background worker
back to the user backend.  However, I expect that many clients of the
dynamic shared memory system will want to roll their own data
structures.  Parallel internal sort (or external sort) is obviously
one, and in addition to that we might have parallel construction of
in-memory hash tables for a hash join or hash agg, or, well, anything
else you'd like to parallelize.  I expect that many of those case will
result in much larger allocations than what we need just for message
passing.

 * dsm_handle is defined in dsm_impl.h, but it's exposed in the function
 signatures in dsm.h. ISTM it should be moved to dsm.h

Well, dsm_impl.h is the low-level stuff, and dsm.h is intended as the
user API.  Unfortunately, whichever file declares that will have to be
included by the other one, so the separation is not as clean as I
would like, but I thought it made more sense for the high-level stuff
to depend on the low-level stuff rather than the other way around.

 * The DSM API contains functions for resizing the segment. That's not
 exercised by the MQ or TOC facilities. Is that going to stay dead code, or
 do you envision a user for it?

I dunno.  It certainly seems like a useful thing to be able to do - if
we run out of memory, go get some more.  It'd obviously be more useful
if we had a full-fledged allocator for dynamic shared memory, which is
something that I'd like to build but haven't built yet.  However,
after discovering that it doesn't work either on Windows or with
System V shared memory, I'm less sanguine about the chances of finding
good uses for it.  I haven't completely given up hope, but I don't
have anything concrete in mind at the moment.  It'd be a little more
plausible if we adjusted things so that the mmap() implementation
works on Windows.

 * dsm_impl_can_resize() incorrectly returns false for DSM_IMPL_MMAP. The
 mmap() implementation can resize.

Oops, that's a bug.

 * This is an issue I've seen for some time with git master, while working on
 various things. Sometimes, when I kill the server with CTRL-C, I get this in
 the log:

 ^CLOG:  received fast shutdown request
 LOG:  aborting any active transactions
 FATAL:  terminating connection due to administrator command
 LOG:  autovacuum launcher shutting down
 LOG:  shutting down
 LOG:  database system is shut down
 LOG:  could not remove shared memory segment /PostgreSQL.1804289383:
 Tiedostoa tai hakemistoa ei ole

 (that means ENOENT)

 And I just figured out why that happens: If you take a base backup of a
 running system, the pg_dynshmem/state file is included in the backup. If you
 now start up a standby from the backup on the same system, it will clean
 up and reuse the dynshmem segment still used by the master system. Now,
 when you shut down the master, you get that message in the log. If the
 segment was actually used for something, the master would naturally crash.

Ooh.  Well, pg_basebackup can be fixed not to copy that, but there's
still going to be a problem with old-style base backups.  We could try
to figure out some additional sanity check for the dsm code to use, to
determine whether or not it belongs to the same cluster, like storing
the port number or the system identifier or some other value in the
shared memory segment and then comparing it to verify whether we've
got the same one.  Or perhaps we could store the PID of the creating
postmaster in there and check whether that PID is still alive,
although we might get confused if the PID has been recycled.

 * As

Re: [HACKERS] dynamic shared memory

2013-10-14 Thread Robert Haas

On Sun, Oct 13, 2013 at 3:07 AM, Amit Kapila amit.kapil...@gmail.com wrote:
 1. Do you think we should add information about pg_dynshmem file at link:
 http://www.postgresql.org/docs/devel/static/storage-file-layout.html
 It contains information about all files/folders in data directory

 2.
 +/*
 + * Forget that a temporary file is owned by a ResourceOwner
 + */
 +void
 +ResourceOwnerForgetDSM(ResourceOwner owner, dsm_segment *seg)
 +{

 Above function description should use 'dynamic shmem segment' rather
 than temporary file.
 Forget that a dynamic shmem segment is owned by a ResourceOwner

Good catches, will fix.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic shared memory

2013-10-14 Thread Amit Kapila

On Mon, Oct 14, 2013 at 5:11 PM, Robert Haas robertmh...@gmail.com wrote:
 On Sun, Oct 13, 2013 at 3:07 AM, Amit Kapila amit.kapil...@gmail.com wrote:
 1. Do you think we should add information about pg_dynshmem file at link:
 http://www.postgresql.org/docs/devel/static/storage-file-layout.html
 It contains information about all files/folders in data directory

 2.
 +/*
 + * Forget that a temporary file is owned by a ResourceOwner
 + */
 +void
 +ResourceOwnerForgetDSM(ResourceOwner owner, dsm_segment *seg)
 +{

 Above function description should use 'dynamic shmem segment' rather
 than temporary file.
 Forget that a dynamic shmem segment is owned by a ResourceOwner

 Good catches, will fix.

During test, I found one issue in Windows implementation.

During startup, when it tries to create new control segment for
dynamic shared memory, it loops until an unused identifier is found,
but for Windows implementation (dsm_impl_windows()), it was returning
error for EEXIST. This error will convert into FATAL as it is during
postmaster startup and will not allow server to start.

Please find attached patch to fix the problem.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


fix_error_handling_eexist_windows.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic shared memory

2013-10-14 Thread Robert Haas

On Mon, Oct 14, 2013 at 11:11 AM, Amit Kapila amit.kapil...@gmail.com wrote:
 During test, I found one issue in Windows implementation.

 During startup, when it tries to create new control segment for
 dynamic shared memory, it loops until an unused identifier is found,
 but for Windows implementation (dsm_impl_windows()), it was returning
 error for EEXIST. This error will convert into FATAL as it is during
 postmaster startup and will not allow server to start.

 Please find attached patch to fix the problem.

Committed, thanks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic shared memory

2013-10-13 Thread Amit Kapila

On Wed, Oct 9, 2013 at 1:10 AM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Sep 26, 2013 at 9:27 AM, Noah Misch n...@leadboat.com wrote:
  There's no data corruption problem if we proceed - but there likely
  has been one leading to the current state.

 +1 for making this one a PANIC, though.  With startup behind us, a valid dsm
 state file pointed us to a control segment with bogus contents.  The
 conditional probability of shared memory corruption seems higher than that of
 a DBA editing the dsm state file of a running cluster to incorrectly name as
 the dsm control segment some other existing shared memory segment.

 To respond specifically to this point... inability to open a file on
 disk does not mean that shared memory is corrupted.  Full stop.

 A scenario I have seen a few times is that someone changes the
 permissions on part or all of $PGDATA while the server is running.  I
 have only ever seen this happen on Windows.  What typically happens
 today - depending on the exact scenario - is that the checkpoints will
 fail, but the server will remain up, sometimes even committing
 transactions under synchronous_commit=off, even though it can't write
 out its data.  If you fix the permissions before shutting down the
 server, you don't even lose any data.  Making inability to read a file
 into a PANIC condition will cause any such cluster to remain up only
 as long as nobody tries to use dynamic shared memory, and then throw
 up its guts.  I don't think users will appreciate that.

 I am tempted to commit the latest version of this patch as I have it.

1. Do you think we should add information about pg_dynshmem file at link:
http://www.postgresql.org/docs/devel/static/storage-file-layout.html
It contains information about all files/folders in data directory

2.
+/*
+ * Forget that a temporary file is owned by a ResourceOwner
+ */
+void
+ResourceOwnerForgetDSM(ResourceOwner owner, dsm_segment *seg)
+{

Above function description should use 'dynamic shmem segment' rather
than temporary file.
Forget that a dynamic shmem segment is owned by a ResourceOwner


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] dynamic shared memory: wherein I am punished for good intentions

2013-10-10 Thread Robert Haas

Since, as has been previously discussed in this forum on multiple
occasions [citation needed], the default System V shared memory limits
are absurdly low on many systems, the dynamic shared memory patch
defaults to POSIX shared memory, which has often been touted as a
superior alternative [citation needed].  Unfortunately, the buildfarm
isn't entirely happy with this decision.  On buildfarm member anole
(HP-UX B.11.31), allocation of dynamic shared memory fails with a
Permission denied error, and on smew (Debian GNU/Linux 6.0), it
fails with Function not implemented, which according to a forum
post[1] I found probably indicates that /dev/shm doesn't mount a tmpfs
on that box.

What shall we do about this?  I see a few options.

(1) Define the issue as not our problem.  IOW, as of now, if you
want to use PostgreSQL, you've got to either make POSIX shared memory
work on your machine, or change the GUC that selects the type of
dynamic shared memory used.

(2) Default to using System V shared memory.  If people want POSIX
shared memory, let them change the default.

(3) Add a new setting that auto-probes for a type of shared memory
that works.  Try POSIX first, then System V.  Maybe even fall back to
mmap'd files if neither of those works.

(4) Remove the option to use POSIX shared memory.  System V FTW!

After some consideration, I think my vote is for option #2.  Option #1
seems too user-hostile, especially for a facility that has no in-core
users yet, though I can imagine we might want to go that way
eventually, especially if we at some point try to dump System V shared
memory altogether, as has been proposed.  Option #4 seems sad; we know
that System V shared memory limits are a problem for plenty of people,
so it'd be a shame not to have a way to use the POSIX facilities if
they're available.  Option #3 is fine as far as it goes, but it adds a
significant amount of complexity I'd rather not deal with.

Other votes?  Other ideas?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] http://forums.justlinux.com/showthread.php?142429-shm_open-problem


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic shared memory: wherein I am punished for good intentions

2013-10-10 Thread Claudio Freire

On Thu, Oct 10, 2013 at 1:13 PM, Robert Haas robertmh...@gmail.com wrote:
 (1) Define the issue as not our problem.  IOW, as of now, if you
 want to use PostgreSQL, you've got to either make POSIX shared memory
 work on your machine, or change the GUC that selects the type of
 dynamic shared memory used.

 (2) Default to using System V shared memory.  If people want POSIX
 shared memory, let them change the default.

 (3) Add a new setting that auto-probes for a type of shared memory
 that works.  Try POSIX first, then System V.  Maybe even fall back to
 mmap'd files if neither of those works.

 (4) Remove the option to use POSIX shared memory.  System V FTW!

 After some consideration, I think my vote is for option #2.  Option #1
 seems too user-hostile, especially for a facility that has no in-core
 users yet, though I can imagine we might want to go that way
 eventually, especially if we at some point try to dump System V shared
 memory altogether, as has been proposed.  Option #4 seems sad; we know
 that System V shared memory limits are a problem for plenty of people,
 so it'd be a shame not to have a way to use the POSIX facilities if
 they're available.  Option #3 is fine as far as it goes, but it adds a
 significant amount of complexity I'd rather not deal with.

 Other votes?  Other ideas?


I believe option 2 is not only good for now, but also a necessary
previous step in the way to option 3, which I believe should be the
goal.

So, ship a version with option 2, then implement 3, and make it the
default when enough people (using option 2) have successfully tested
pg's implementation of POSIX shared memory.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic shared memory: wherein I am punished for good intentions

2013-10-10 Thread Andrew Dunstan



On 10/10/2013 12:13 PM, Robert Haas wrote:

Since, as has been previously discussed in this forum on multiple
occasions [citation needed], the default System V shared memory limits
are absurdly low on many systems, the dynamic shared memory patch
defaults to POSIX shared memory, which has often been touted as a
superior alternative [citation needed].  Unfortunately, the buildfarm
isn't entirely happy with this decision.  On buildfarm member anole
(HP-UX B.11.31), allocation of dynamic shared memory fails with a
Permission denied error, and on smew (Debian GNU/Linux 6.0), it
fails with Function not implemented, which according to a forum
post[1] I found probably indicates that /dev/shm doesn't mount a tmpfs
on that box.

What shall we do about this?  I see a few options.

(1) Define the issue as not our problem.  IOW, as of now, if you
want to use PostgreSQL, you've got to either make POSIX shared memory
work on your machine, or change the GUC that selects the type of
dynamic shared memory used.

(2) Default to using System V shared memory.  If people want POSIX
shared memory, let them change the default.

(3) Add a new setting that auto-probes for a type of shared memory
that works.  Try POSIX first, then System V.  Maybe even fall back to
mmap'd files if neither of those works.

(4) Remove the option to use POSIX shared memory.  System V FTW!

After some consideration, I think my vote is for option #2.  Option #1
seems too user-hostile, especially for a facility that has no in-core
users yet, though I can imagine we might want to go that way
eventually, especially if we at some point try to dump System V shared
memory altogether, as has been proposed.  Option #4 seems sad; we know
that System V shared memory limits are a problem for plenty of people,
so it'd be a shame not to have a way to use the POSIX facilities if
they're available.  Option #3 is fine as far as it goes, but it adds a
significant amount of complexity I'd rather not deal with.

Other votes?  Other ideas?



5) test and set it in initdb.

cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic shared memory: wherein I am punished for good intentions

2013-10-10 Thread Robert Haas

On Thu, Oct 10, 2013 at 2:21 PM, Andrew Dunstan and...@dunslane.net wrote:
 Other votes?  Other ideas?

 5) test and set it in initdb.

Are you advocating for that option, or just calling out that it's
possible?  I'd say that's closely related to option #3, except at
initdb time rather than run-time - and it might be preferable to #3
for some of the same reasons discussed on the thread about tuning
work_mem, namely, that having it change from one postmaster lifetime
to the next might lead to user astonishment.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic shared memory: wherein I am punished for good intentions

2013-10-10 Thread Peter Geoghegan

On Thu, Oct 10, 2013 at 9:13 AM, Robert Haas robertmh...@gmail.com wrote:
 (2) Default to using System V shared memory.  If people want POSIX
 shared memory, let them change the default.

 After some consideration, I think my vote is for option #2.

Wouldn't that become the call of packagers? Wasn't there already some
reason why it was advantageous for FreeBSD to continue to use System V
shared memory?

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic shared memory: wherein I am punished for good intentions

2013-10-10 Thread Robert Haas

On Thu, Oct 10, 2013 at 2:36 PM, Peter Geoghegan p...@heroku.com wrote:
 On Thu, Oct 10, 2013 at 9:13 AM, Robert Haas robertmh...@gmail.com wrote:
 (2) Default to using System V shared memory.  If people want POSIX
 shared memory, let them change the default.

 After some consideration, I think my vote is for option #2.

 Wouldn't that become the call of packagers?

Packagers can certainly override whatever we do, but we still need to
make the buildfarm green again.

 Wasn't there already some
 reason why it was advantageous for FreeBSD to continue to use System V
 shared memory?

Yes, but this code doesn't affect the main shared memory segment, so I
think that's sort of a separate point.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic shared memory: wherein I am punished for good intentions

2013-10-10 Thread Andrew Dunstan



On 10/10/2013 02:35 PM, Robert Haas wrote:

On Thu, Oct 10, 2013 at 2:21 PM, Andrew Dunstan and...@dunslane.net wrote:

Other votes?  Other ideas?

5) test and set it in initdb.

Are you advocating for that option, or just calling out that it's
possible?  I'd say that's closely related to option #3, except at
initdb time rather than run-time - and it might be preferable to #3
for some of the same reasons discussed on the thread about tuning
work_mem, namely, that having it change from one postmaster lifetime
to the next might lead to user astonishment.




Mainly just to throw it into the mix, But like you I think it's probably 
a better option than #3 for the reason you give. It also has the 
advantage of keeping any probing code out of the backend.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic shared memory: wherein I am punished for good intentions

2013-10-10 Thread Stephen Frost

* Robert Haas (robertmh...@gmail.com) wrote:
 On Thu, Oct 10, 2013 at 2:36 PM, Peter Geoghegan p...@heroku.com wrote:
  On Thu, Oct 10, 2013 at 9:13 AM, Robert Haas robertmh...@gmail.com wrote:
  (2) Default to using System V shared memory.  If people want POSIX
  shared memory, let them change the default.
 
  After some consideration, I think my vote is for option #2.
 
  Wouldn't that become the call of packagers?
 
 Packagers can certainly override whatever we do, but we still need to
 make the buildfarm green again.

While I agree that making the buildfarm green is valuable, I really
wonder about a system where /dev/shm is busted.

Personally, I like Andrew's suggestion to test and set accordingly, with
the default being POSIX if it's available and a fall-back to SysV (maybe
with a warning..).  Going back to the situation where our default set-up
limits us to the ridiculously small SysV value would *really* suck; even
if we don't have any users today, we're certainly going to have some
soon and I don't think they'll be happy with a 24MB (or whatever) limit.

Thanks,

Stephen


signature.asc
Description: Digital signature

Re: [HACKERS] dynamic shared memory: wherein I am punished for good intentions

2013-10-10 Thread Merlin Moncure

On Thu, Oct 10, 2013 at 11:13 AM, Robert Haas robertmh...@gmail.com wrote:
 Since, as has been previously discussed in this forum on multiple
 occasions [citation needed], the default System V shared memory limits
 are absurdly low on many systems, the dynamic shared memory patch
 defaults to POSIX shared memory, which has often been touted as a
 superior alternative [citation needed].  Unfortunately, the buildfarm
 isn't entirely happy with this decision.  On buildfarm member anole
 (HP-UX B.11.31), allocation of dynamic shared memory fails with a
 Permission denied error, and on smew (Debian GNU/Linux 6.0), it
 fails with Function not implemented, which according to a forum
 post[1] I found probably indicates that /dev/shm doesn't mount a tmpfs
 on that box.

 What shall we do about this?  I see a few options.

 (1) Define the issue as not our problem.  IOW, as of now, if you
 want to use PostgreSQL, you've got to either make POSIX shared memory
 work on your machine, or change the GUC that selects the type of
 dynamic shared memory used.

 (2) Default to using System V shared memory.  If people want POSIX
 shared memory, let them change the default.

Doesn't #2 negate all advantages of this effort?  Bringing sysv
management back on the table seems like a giant step backwards -- or
am I missing something?

http://www.postgresql.org/docs/9.3/interactive/kernel-resources.html#SYSVIPC

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic shared memory: wherein I am punished for good intentions

2013-10-10 Thread Andrew Dunstan



On 10/10/2013 02:45 PM, Robert Haas wrote:

On Thu, Oct 10, 2013 at 2:36 PM, Peter Geoghegan p...@heroku.com wrote:

On Thu, Oct 10, 2013 at 9:13 AM, Robert Haas robertmh...@gmail.com wrote:

(2) Default to using System V shared memory.  If people want POSIX
shared memory, let them change the default.
After some consideration, I think my vote is for option #2.

Wouldn't that become the call of packagers?

Packagers can certainly override whatever we do, but we still need to
make the buildfarm green again.





I really dislike throwing things over the wall to packagers like this, 
anyway. Quite apart from anything else, not everyone uses pre-built 
packages, and we should make things as as easy as possible for those who 
don't, too.



cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic shared memory: wherein I am punished for good intentions

2013-10-10 Thread David Fetter

On Thu, Oct 10, 2013 at 12:13:20PM -0400, Robert Haas wrote:
 Since, as has been previously discussed in this forum on multiple
 occasions [citation needed], the default System V shared memory limits
 are absurdly low on many systems, the dynamic shared memory patch
 defaults to POSIX shared memory, which has often been touted as a
 superior alternative [citation needed].  Unfortunately, the buildfarm
 isn't entirely happy with this decision.  On buildfarm member anole
 (HP-UX B.11.31), allocation of dynamic shared memory fails with a
 Permission denied error, and on smew (Debian GNU/Linux 6.0), it
 fails with Function not implemented, which according to a forum
 post[1] I found probably indicates that /dev/shm doesn't mount a tmpfs
 on that box.
 
 What shall we do about this?  I see a few options.
 
 (1) Define the issue as not our problem.  IOW, as of now, if you
 want to use PostgreSQL, you've got to either make POSIX shared memory
 work on your machine, or change the GUC that selects the type of
 dynamic shared memory used.

+1 for this.

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic shared memory: wherein I am punished for good intentions

2013-10-10 Thread Robert Haas

On Thu, Oct 10, 2013 at 4:00 PM, Merlin Moncure mmonc...@gmail.com wrote:
 (2) Default to using System V shared memory.  If people want POSIX
 shared memory, let them change the default.

 Doesn't #2 negate all advantages of this effort?  Bringing sysv
 management back on the table seems like a giant step backwards -- or
 am I missing something?

Not unless there's no difference between the default and the only option.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic shared memory: wherein I am punished for good intentions

2013-10-10 Thread Josh Berkus

Robert,

 Doesn't #2 negate all advantages of this effort?  Bringing sysv
 management back on the table seems like a giant step backwards -- or
 am I missing something?
 
 Not unless there's no difference between the default and the only option.

Well, per our earlier discussion about the first 15 minutes, there
actually isn't a difference.

We got rid of SHMMAX for the majority of our users for 9.3.  We should
NOT revert that just so we can support older platforms -- especially
since, if anything, Kernel.org is going to cripple SysV support even
further in the future.  The platforms where it does work represent the
vast majority of our users, and are only on the increase.

I can only see two reasonable alternatives:

This one:
 (3) Add a new setting that auto-probes for a type of shared memory
 that works.  Try POSIX first, then System V.  Maybe even fall back to
 mmap'd files if neither of those works.

Or:

(5) Default to POSIX, and allow for SysV as a compile-time option for
platforms with poor POSIX memory support.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic shared memory: wherein I am punished for good intentions

2013-10-10 Thread Andres Freund

On 2013-10-10 12:13:20 -0400, Robert Haas wrote:
 and on smew (Debian GNU/Linux 6.0), it
 fails with Function not implemented, which according to a forum
 post[1] I found probably indicates that /dev/shm doesn't mount a tmpfs
 on that box.

It would be nice to get confirmed what the reason for that is - afaik
debian has mounted /dev/shm for ages. The most likely issue I can see is
an incorrectly setup chroot.
If it's just that I wouldn't be too concerned about it. That's a
scenario that won't happen to many users and relatively easily fixed

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

1 2 >

1 - 100 of 137 matches

Mail list logo