Re: [HACKERS] Atomics for heap_parallelscan_nextpage()

2017-08-16 Thread David Rowley
On 17 August 2017 at 01:20, Heikki Linnakangas  wrote:
> Looks reasonable. I edited the comments and the variable names a bit, to my
> liking, and committed. Thanks!

Thanks for committing. I've just been catching up on all that went on
while I was sleeping. Thanks for handling the cleanup too.

I'll feel better once pademelon goes green again. From looking at the
latest failure on it, it appears that your change from
pg_atomic_write_u64 to pg_atomic_init_u64 should fix it. My apologies
for that mistake.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Atomics for heap_parallelscan_nextpage()

2017-08-16 Thread Tom Lane
Andres Freund  writes:
> On August 16, 2017 3:09:27 PM PDT, Tom Lane  wrote:
>> I wonder whether it's sensible to have --enable-cassert have the effect
>> of filling memory allocated by ShmemAlloc or the DSA code with junk (as
>> palloc does) instead of leaving it at zeroes.  It's not modeling the
>> same kind of effect, since we have no shmem-freeing primitives, but
>> it might be useful for this sort of thing.

> We kind of do - crash restarts... So yes, that's probably a good idea.

Crash restart releases the shmem segment and acquires a new one,
doesn't it?  Or am I misremembering?  I thought that it did do so,
if only to make darn sure that no old processes remain connected
to shmem.

regards, tom lane




Re: [HACKERS] Atomics for heap_parallelscan_nextpage()

2017-08-16 Thread Andres Freund


On August 16, 2017 3:09:27 PM PDT, Tom Lane  wrote:
>Heikki Linnakangas  writes:
>> On 08/17/2017 12:20 AM, Tom Lane wrote:
>>> Indeed, gaur fails with
>>> 2017-08-16 17:09:38.315 EDT [13043:11] PANIC:  stuck spinlock detected
>>> at pg_atomic_compare_exchange_u64_impl, atomics.c:196
>
>> I was able to reproduce this locally, with --disable-atomics, but only
>> after hacking it to fill the struct with garbage, before initializing
>> it. IOW, it failed to fail, because the spinlock happened to be
>> initialized correctly by accident. Perhaps that's happening on
>> piculet, too.
>
>Oh, right.  HPPA is unique among our platforms, I think, in that the
>"unlocked" state of a spinlock is not "all zeroes".  So if you're dealing
>with pre-zeroed memory, which shmem generally would be, failing to
>initialize a spinlock does not cause visible errors except on HPPA.
>
>I wonder whether it's sensible to have --enable-cassert have the effect
>of filling memory allocated by ShmemAlloc or the DSA code with junk (as
>palloc does) instead of leaving it at zeroes.  It's not modeling the
>same kind of effect, since we have no shmem-freeing primitives, but
>it might be useful for this sort of thing.

We kind of do - crash restarts... So yes, that's probably a good idea.

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: [HACKERS] Atomics for heap_parallelscan_nextpage()

2017-08-16 Thread Tom Lane
Heikki Linnakangas  writes:
> On 08/17/2017 12:20 AM, Tom Lane wrote:
>> Indeed, gaur fails with
>> 2017-08-16 17:09:38.315 EDT [13043:11] PANIC:  stuck spinlock detected at 
>> pg_atomic_compare_exchange_u64_impl, atomics.c:196

> I was able to reproduce this locally, with --disable-atomics, but only 
> after hacking it to fill the struct with garbage, before initializing 
> it. IOW, it failed to fail, because the spinlock happened to be 
> initialized correctly by accident. Perhaps that's happening on piculet, too.

Oh, right.  HPPA is unique among our platforms, I think, in that the
"unlocked" state of a spinlock is not "all zeroes".  So if you're dealing
with pre-zeroed memory, which shmem generally would be, failing to
initialize a spinlock does not cause visible errors except on HPPA.
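
To make that masking effect concrete, here is a tiny standalone model
(toy types only; the nonzero "free" value is just a stand-in for the
real slock_t conventions, not the actual HPPA encoding):

#include <stdio.h>

/*
 * On most platforms the "free" state of a spinlock is 0, so a lock
 * sitting in pre-zeroed shared memory happens to look initialized even
 * when SpinLockInit() was forgotten.  On a platform whose free state is
 * nonzero, the same zeroed memory looks permanently held.
 */
#ifdef EMULATE_NONZERO_FREE_STATE
#define SLOCK_FREE_VALUE 1			/* made-up nonzero "free" state */
#else
#define SLOCK_FREE_VALUE 0
#endif

typedef int toy_slock_t;

static int
toy_spin_try(toy_slock_t *lock)		/* returns 1 if acquired */
{
	if (*lock != SLOCK_FREE_VALUE)
		return 0;					/* looks held: caller would spin, then PANIC */
	*lock = !SLOCK_FREE_VALUE;		/* acquire */
	return 1;
}

int
main(void)
{
	toy_slock_t lock_in_zeroed_shmem = 0;	/* SpinLockInit() never called */

	if (toy_spin_try(&lock_in_zeroed_shmem))
		printf("bug is masked: zero happens to equal the free state\n");
	else
		printf("stuck spinlock: zero is not the free state here\n");
	return 0;
}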

I wonder whether it's sensible to have --enable-cassert have the effect
of filling memory allocated by ShmemAlloc or the DSA code with junk (as
palloc does) instead of leaving it at zeroes.  It's not modeling the
same kind of effect, since we have no shmem-freeing primitives, but
it might be useful for this sort of thing.

regards, tom lane




Re: [HACKERS] Atomics for heap_parallelscan_nextpage()

2017-08-16 Thread Heikki Linnakangas

On 08/17/2017 12:20 AM, Tom Lane wrote:

> Andres Freund  writes:
>> On 2017-08-16 16:20:28 +0300, Heikki Linnakangas wrote:
>> +	pg_atomic_write_u64(&target->phs_nallocated, 0);
>
>> It's not ok to initialize an atomic with pg_atomic_write_u64 - works
>> well enough for "plain" atomics, but the fallback implementation isn't
>> ok with it. You're probably going to get a failure on the respective
>> buildfarm animal soon.
>
> Indeed, gaur fails with
>
> 2017-08-16 17:09:38.315 EDT [13043:11] PANIC:  stuck spinlock detected at pg_atomic_compare_exchange_u64_impl, atomics.c:196
> 2017-08-16 17:09:38.315 EDT [13043:12] STATEMENT:  select count(*) from a_star;
> 2017-08-16 17:09:40.350 EDT [12437:3] LOG:  server process (PID 13043) was terminated by signal 6
> 2017-08-16 17:09:40.350 EDT [12437:4] DETAIL:  Failed process was running: select count(*) from a_star;
>
> and I'm sure pademelon will fail once it gets to that.

Fixed.

> I thought we had other buildfarm animals testing the fallback path,
> though?

Yes, at least piculet is building with --disable-atomics.

I was able to reproduce this locally, with --disable-atomics, but only 
after hacking it to fill the struct with garbage, before initializing 
it. IOW, it failed to fail, because the spinlock happened to be 
initialized correctly by accident. Perhaps that's happening on piculet, too.


- Heikki




Re: [HACKERS] Atomics for heap_parallelscan_nextpage()

2017-08-16 Thread Tom Lane
Andres Freund  writes:
> On 2017-08-16 16:20:28 +0300, Heikki Linnakangas wrote:
> + pg_atomic_write_u64(&target->phs_nallocated, 0);

> It's not ok to initialize an atomic with pg_atomic_write_u64 - works
> well enough for "plain" atomics, but the fallback implementation isn't
> ok with it. You're probably going to get a failure on the respective
> buildfarm animal soon.

Indeed, gaur fails with

2017-08-16 17:09:38.315 EDT [13043:11] PANIC:  stuck spinlock detected at pg_atomic_compare_exchange_u64_impl, atomics.c:196
2017-08-16 17:09:38.315 EDT [13043:12] STATEMENT:  select count(*) from a_star;
2017-08-16 17:09:40.350 EDT [12437:3] LOG:  server process (PID 13043) was terminated by signal 6
2017-08-16 17:09:40.350 EDT [12437:4] DETAIL:  Failed process was running: select count(*) from a_star;

and I'm sure pademelon will fail once it gets to that.  I thought we
had other buildfarm animals testing the fallback path, though?

regards, tom lane




Re: [HACKERS] Atomics for heap_parallelscan_nextpage()

2017-08-16 Thread Tom Lane
Heikki Linnakangas  writes:
> On 08/16/2017 09:00 PM, Tom Lane wrote:
>> I don't buy that argument.  A caller might think "Why do I need
>> shm_toc_estimate, when I can compute the *exact* size I need?".
>> And it would have worked, up till this proposed patch.

> Well, no. The size of the shm_toc struct is subtracted from the size 
> that you give to shm_toc_create. As well as the sizes of the TOC 
> entries. And those sizes are private to shm_toc.c, so a caller has no 
> way to know what size it should pass to shm_toc_create(), in order to 
> have N bytes of space actually usable. You really need to use 
> shm_toc_estimate() if you want any guarantees on what will fit.

Good point --- objection withdrawn.

regards, tom lane




Re: [HACKERS] Atomics for heap_parallelscan_nextpage()

2017-08-16 Thread Andres Freund
On 2017-08-16 16:20:28 +0300, Heikki Linnakangas wrote:
> On 05/06/2017 04:57 PM, David Rowley wrote:
> > Andres mentioned in [2] that it might be worth exploring using atomics
> > to do the same job. So I went ahead and did that, and came up with the
> > attached, which is a slight variation on what he mentioned in the
> > thread.
> > 
> > To keep things a bit more simple, and streamline, I ended up pulling
> > out the logic for setting the startblock into another function, which
> > we only call once before the first call to
> > heap_parallelscan_nextpage().  I also ended up changing phs_cblock and
> > replacing it with a counter that always starts at zero. The actual
> > block is calculated based on that + the startblock modulo nblocks.
> > This makes things a good bit more simple for detecting when we've
> > allocated all the blocks to the workers, and also works nicely when
> > wrapping back to the start of a relation when we started somewhere in
> > the middle due to piggybacking with a synchronous scan.

> Looks reasonable. I edited the comments and the variable names a bit, to my
> liking, and committed. Thanks!

Brief post-commit review:

+* phs_nallocated tracks how many pages have been allocated to workers
+* already.  When phs_nallocated >= rs_nblocks, all blocks have been
+* allocated.

allocated seems a bit of a confusing terminology.


@@ -1635,8 +1637,8 @@ heap_parallelscan_initialize(ParallelHeapScanDesc target, Relation relation,
 		!RelationUsesLocalBuffers(relation) &&
 		target->phs_nblocks > NBuffers / 4;
 	SpinLockInit(&target->phs_mutex);
-	target->phs_cblock = InvalidBlockNumber;
 	target->phs_startblock = InvalidBlockNumber;
+	pg_atomic_write_u64(&target->phs_nallocated, 0);
 	SerializeSnapshot(snapshot, target->phs_snapshot_data);
 }

It's not ok to initialize an atomic with pg_atomic_write_u64 - works
well enough for "plain" atomics, but the fallback implementation isn't
ok with it. You're probably going to get a failure on the respective
buildfarm animal soon.
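
For anyone not familiar with the fallback: without native 64-bit
atomics, pg_atomic_uint64 is emulated with an embedded spinlock, and
only pg_atomic_init_u64() sets that lock up.  A rough standalone sketch
of the distinction (toy names, not the real atomics.c code):

#include <stdint.h>
#include <stdio.h>

#define TOY_SLOCK_FREE 1			/* assume a nonzero "free" state, as on HPPA */

typedef struct
{
	int			sema;				/* embedded spinlock used by the fallback */
	uint64_t	value;
} toy_atomic_u64;

static void
toy_atomic_init_u64(toy_atomic_u64 *a, uint64_t v)
{
	a->sema = TOY_SLOCK_FREE;		/* the step a plain write cannot perform */
	a->value = v;
}

static int
toy_atomic_write_u64(toy_atomic_u64 *a, uint64_t v)
{
	if (a->sema != TOY_SLOCK_FREE)
		return -1;					/* lock never set up: would spin, then PANIC */
	a->sema = 0;					/* acquire */
	a->value = v;
	a->sema = TOY_SLOCK_FREE;		/* release */
	return 0;
}

int
main(void)
{
	toy_atomic_u64 counter = {0, 0};	/* pre-zeroed (or garbage) shared memory */

	if (toy_atomic_write_u64(&counter, 0) != 0)
		printf("write-as-init fails: the embedded lock was never initialized\n");

	toy_atomic_init_u64(&counter, 0);	/* correct initialization */
	if (toy_atomic_write_u64(&counter, 42) == 0)
		printf("value = %llu\n", (unsigned long long) counter.value);
	return 0;
}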


- Andres




Re: [HACKERS] Atomics for heap_parallelscan_nextpage()

2017-08-16 Thread Andres Freund
On 2017-08-16 14:09:08 -0400, Tom Lane wrote:
> >> I'm not sure that that's good enough, and I'm damn sure that it
> >> shouldn't be undocumented.
> 
> > 8 byte alignment would be good enough, so BUFFERALIGN ought to be
> > sufficient. But it'd be nicer to have a separate more descriptive knob.
> 
> What I meant by possibly not good enough is that pg_atomic_uint64 used
> in other places isn't going to be very safe.

Well, it's not used otherwise in core so far, leaving test code
aside. It's correctly aligned if it's part of an aligned struct - the
atomics code itself can't really do anything about alignment if the
struct itself isn't aligned.


> We might be effectively all right as long as we have a coding rule that
> pg_atomic_uint64 can only be placed in memory handed out by ShmemAlloc
> or shm_toc_allocate, which both have bigger-than-MAXALIGN alignment
> practices.  But this needs to be documented.

Well, one could argue the alignment checks in every function are that
:). But yea, we probably should mention it more than that.

Greetings,

Andres Freund




Re: [HACKERS] Atomics for heap_parallelscan_nextpage()

2017-08-16 Thread Heikki Linnakangas

On 08/16/2017 09:00 PM, Tom Lane wrote:

> Robert Haas  writes:
>> On Wed, Aug 16, 2017 at 1:44 PM, Tom Lane  wrote:
>>> I was feeling a bit uncomfortable with the BUFFERALIGN_DOWN() for a
>>> different reason: if the caller has specified the exact amount of space it
>>> needs, having shm_toc_create discard some could lead to an unexpected
>>> failure.
>
>> Well, that's why Heikki also patched shm_toc_estimate.  With the
>> patch, if the size being used in shm_toc_create comes from
>> shm_toc_estimate, it will always be aligned and nothing bad will
>> happen.
>
> Sure.  So the point is entirely about what should happen if someone
> doesn't use shm_toc_estimate.
>
>> If the user invents another size out of whole cloth, then
>> they might get a few bytes less than they expect, but that's what you
>> get for not using shm_toc_estimate().
>
> I don't buy that argument.  A caller might think "Why do I need
> shm_toc_estimate, when I can compute the *exact* size I need?".
> And it would have worked, up till this proposed patch.


Well, no. The size of the shm_toc struct is subtracted from the size 
that you give to shm_toc_create. As well as the sizes of the TOC 
entries. And those sizes are private to shm_toc.c, so a caller has no 
way to know what size it should pass to shm_toc_create(), in order to 
have N bytes of space actually usable. You really need to use 
shm_toc_estimate() if you want any guarantees on what will fit.
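
For illustration, the bookkeeping works roughly like this (toy layout
and sizes; the real ones live in shm_toc.c and are private to it):

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Usable space = nbytes - header - one entry per key, so a caller cannot
 * derive the nbytes needed for "N usable bytes" on its own.
 */
typedef struct
{
	uint64_t	key;
	size_t		offset;
} toy_toc_entry;

typedef struct
{
	uint64_t	magic;
	size_t		total_bytes;
	size_t		allocated_bytes;
	size_t		nentry;
	toy_toc_entry entry[];		/* entries grow forward; chunks grow backward */
} toy_toc;

/* what a shm_toc_estimate()-style helper adds on top of the payload */
static size_t
toy_estimate(size_t nkeys, size_t space_for_chunks)
{
	return offsetof(toy_toc, entry)
		+ nkeys * sizeof(toy_toc_entry)
		+ space_for_chunks;
}

int
main(void)
{
	size_t		payload = 1000;	/* bytes the caller actually wants to store */

	printf("pass %zu to shm_toc_create, not %zu\n",
		   toy_estimate(1, payload), payload);
	return 0;
}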


I've pushed the patch to fix this, with some extra comments and 
reformatting shm_toc_estimate.



>> 8 byte alignment would be good enough, so BUFFERALIGN ought to be
>> sufficient. But it'd be nicer to have a separate more descriptive knob.
>
> What I meant by possibly not good enough is that pg_atomic_uint64 used
> in other places isn't going to be very safe.
>
> We might be effectively all right as long as we have a coding rule that
> pg_atomic_uint64 can only be placed in memory handed out by ShmemAlloc
> or shm_toc_allocate, which both have bigger-than-MAXALIGN alignment
> practices.  But this needs to be documented.


Yeah. We are implicitly also relying on ShmemAlloc() to return 
sufficiently-aligned memory. Palloc() too, although you probably 
wouldn't use atomic ops on a palloc'd struct. I think we should 
introduce a new ALIGNOF macro for that, and document that those 
functions return memory with enough alignment. GENUINEMAX_ALIGNOF? 
MAXSTRUCT_ALIGNOF?


- Heikki




Re: [HACKERS] Atomics for heap_parallelscan_nextpage()

2017-08-16 Thread Tom Lane
Andres Freund  writes:
> On 2017-08-16 13:40:09 -0400, Tom Lane wrote:
>> I was wondering why the shm_toc code was using BUFFERALIGN and not
>> MAXALIGN, and I now suspect that the answer is "it's an entirely
>> undocumented kluge to make the atomics code not crash on 32-bit
>> machines, so long as nobody puts a pg_atomic_uint64 anywhere except in
>> a shm_toc".

> I don't think there were any atomics in affected code until earlier
> today... And given it didn't work for shm_toc anyway, I'm not quite
> following.

Right, Robert pointed out that it's pre-existing code.  My point should
be read as "it's just blind luck that shm_toc is using bigger than
MAXALIGN alignment, or this would never work on 32-bit machines".

>> I'm not sure that that's good enough, and I'm damn sure that it
>> shouldn't be undocumented.

> 8 byte alignment would be good enough, so BUFFERALIGN ought to be
> sufficient. But it'd be nicer to have a separate more descriptive knob.

What I meant by possibly not good enough is that pg_atomic_uint64 used
in other places isn't going to be very safe.

We might be effectively all right as long as we have a coding rule that
pg_atomic_uint64 can only be placed in memory handed out by ShmemAlloc
or shm_toc_allocate, which both have bigger-than-MAXALIGN alignment
practices.  But this needs to be documented.

regards, tom lane




Re: [HACKERS] Atomics for heap_parallelscan_nextpage()

2017-08-16 Thread Tom Lane
Robert Haas  writes:
> On Wed, Aug 16, 2017 at 1:44 PM, Tom Lane  wrote:
>> I was feeling a bit uncomfortable with the BUFFERALIGN_DOWN() for a
>> different reason: if the caller has specified the exact amount of space it
>> needs, having shm_toc_create discard some could lead to an unexpected
>> failure.

> Well, that's why Heikki also patched shm_toc_estimate.  With the
> patch, if the size being used in shm_toc_create comes from
> shm_toc_estimate, it will always be aligned and nothing bad will
> happen.

Sure.  So the point is entirely about what should happen if someone
doesn't use shm_toc_estimate.

> If the user invents another size out of whole cloth, then
> they might get a few bytes less than they expect, but that's what you
> get for not using shm_toc_estimate().

I don't buy that argument.  A caller might think "Why do I need
shm_toc_estimate, when I can compute the *exact* size I need?".
And it would have worked, up till this proposed patch.

I think the new API spec for such cases ought to be "if you supply an
unaligned size, we'll error out", not "if you supply an unaligned size,
we'll silently lose some of it".  The former is going to behave a lot
more predictably.
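
A sketch of that suggested behaviour (not what was committed; toy names,
and the alignment boundary here is only illustrative):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define TOY_BUFFERALIGN 32				/* stand-in for the real ALIGNOF_BUFFER */

static void
toy_toc_create(uint64_t magic, void *address, size_t nbytes)
{
	(void) magic;
	(void) address;
	if (nbytes % TOY_BUFFERALIGN != 0)
	{
		/* in the backend this would be an elog(ERROR, ...) or an Assert */
		fprintf(stderr, "shm_toc_create: size %zu is not buffer-aligned\n", nbytes);
		exit(EXIT_FAILURE);
	}
	/* ... otherwise set up the TOC as today ... */
}

int
main(void)
{
	static char region[93312];

	toy_toc_create(0x1234, region, 93268);	/* an unaligned size: refused */
	return 0;
}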

regards, tom lane




Re: [HACKERS] Atomics for heap_parallelscan_nextpage()

2017-08-16 Thread Andres Freund


On August 16, 2017 10:47:23 AM PDT, Robert Haas  wrote:
>On Wed, Aug 16, 2017 at 1:40 PM, Tom Lane  wrote:
>> I was wondering why the shm_toc code was using BUFFERALIGN and not
>> MAXALIGN, and I now suspect that the answer is "it's an entirely
>> undocumented kluge to make the atomics code not crash on 32-bit
>> machines, so long as nobody puts a pg_atomic_uint64 anywhere except
>> in a shm_toc".
>
>Well, shm_toc considerably predates 64-bit atomics, so I think the
>causality cannot run in that direction.  shm_toc.c first appeared in
>the tree in January of 2014.  src/include/port/atomics didn't show up
>until September of that year, and 64-bit atomics weren't actually
>usable in practice until e8fdbd58fe564a29977f4331cd26f9697d76fc40 in
>April of 2017.

Well, not for core code.  I certainly know about production code using it, 
because crusty platforms are considered irrelevant...

Independent of that, a comment explaining what the BUFFERALIGN is intended
to achieve would be good.

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: [HACKERS] Atomics for heap_parallelscan_nextpage()

2017-08-16 Thread Robert Haas
On Wed, Aug 16, 2017 at 1:44 PM, Tom Lane  wrote:
> I was feeling a bit uncomfortable with the BUFFERALIGN_DOWN() for a
> different reason: if the caller has specified the exact amount of space it
> needs, having shm_toc_create discard some could lead to an unexpected
> failure.

Well, that's why Heikki also patched shm_toc_estimate.  With the
patch, if the size being used in shm_toc_create comes from
shm_toc_estimate, it will always be aligned and nothing bad will
happen.  If the user invents another size out of whole cloth, then
they might get a few bytes less than they expect, but that's what you
get for not using shm_toc_estimate().

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Atomics for heap_parallelscan_nextpage()

2017-08-16 Thread Andres Freund
On 2017-08-16 13:44:28 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > Don't think we require BUFFERALIGN - MAXALIGN ought to be
> > sufficient.
> 
> Uh, see my other message just now.

Yup, you're right.


> > The use of BUFFERALIGN presumably is to space out things
> > into different cachelines, but that doesn't really seem to be important
> > with this.  Then we can just avoid defining the new macro...
> 
> I was feeling a bit uncomfortable with the BUFFERALIGN_DOWN() for a
> different reason: if the caller has specified the exact amount of space it
> needs, having shm_toc_create discard some could lead to an unexpected
> failure.

Well, that's why shm_toc_estimate() increases the size appropriately.


> I wonder whether maybe shm_toc_create should just error out if the
> number it's handed isn't aligned already.

I think that's going to be harder atm, because it's not the size shm_toc
computes, it's what the caller to shm_toc_estimate_chunk() provides.
And that size is already guaranteed to be upsized by BUFFERALIGN in
shm_toc_estimate_chunk().  It's just that the base-offset from where the
allocations start isn't aligned.

Greetings,

Andres Freund




Re: [HACKERS] Atomics for heap_parallelscan_nextpage()

2017-08-16 Thread Robert Haas
On Wed, Aug 16, 2017 at 1:40 PM, Tom Lane  wrote:
> I was wondering why the shm_toc code was using BUFFERALIGN and not
> MAXALIGN, and I now suspect that the answer is "it's an entirely
> undocumented kluge to make the atomics code not crash on 32-bit
> machines, so long as nobody puts a pg_atomic_uint64 anywhere except
> in a shm_toc".

Well, shm_toc considerably predates 64-bit atomics, so I think the
causality cannot run in that direction.  shm_toc.c first appeared in
the tree in January of 2014.  src/include/port/atomics didn't show up
until September of that year, and 64-bit atomics weren't actually
usable in practice until e8fdbd58fe564a29977f4331cd26f9697d76fc40 in
April of 2017.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Atomics for heap_parallelscan_nextpage()

2017-08-16 Thread Andres Freund
On 2017-08-16 13:40:09 -0400, Tom Lane wrote:
> I wrote:
> > I can confirm that on dromedary, that regression test case is attempting
> > to create a TOC with a not-well-aligned size: 93268 = 0x16c54 bytes.
> 
> ... although, on closer look, it still seems like we have a fundamental
> bit of schizophrenia here, because on this machine
> 
> $ grep ALIGN pg_config.h
> #define ALIGNOF_DOUBLE 4
> #define ALIGNOF_INT 4
> #define ALIGNOF_LONG 4
> #define ALIGNOF_LONG_LONG_INT 4
> #define ALIGNOF_SHORT 2
> #define MAXIMUM_ALIGNOF 4
> 
> Basically, therefore, ISTM that it is not a good thing that the atomics
> code thinks it can rely on 8-byte-aligned data when the entire rest of
> the system believes that 4-byte alignment is enough for anything.

That's a hardware requirement, we can't do much about it. Several
[micro-]architectures don't support unaligned atomic 8 byte writes.


> I was wondering why the shm_toc code was using BUFFERALIGN and not
> MAXALIGN, and I now suspect that the answer is "it's an entirely
> undocumented kluge to make the atomics code not crash on 32-bit
> machines, so long as nobody puts a pg_atomic_uint64 anywhere except in
> a shm_toc".

I don't think there were any atomics in affected code until earlier
today... And given it didn't work for shm_toc anyway, I'm not quite
following.


> I'm not sure that that's good enough, and I'm damn sure that it
> shouldn't be undocumented.

8 byte alignment would be good enough, so BUFFERALIGN ought to be
sufficient. But it'd be nicer to have a separate more descriptive knob.

Greetings,

Andres Freund




Re: [HACKERS] Atomics for heap_parallelscan_nextpage()

2017-08-16 Thread Tom Lane
Andres Freund  writes:
> Don't think we require BUFFERALIGN - MAXALIGN ought to be
> sufficient.

Uh, see my other message just now.

> The use of BUFFERALIGN presumably is to space out things
> into different cachelines, but that doesn't really seem to be important
> with this.  Then we can just avoid defining the new macro...

I was feeling a bit uncomfortable with the BUFFERALIGN_DOWN() for a
different reason: if the caller has specified the exact amount of space it
needs, having shm_toc_create discard some could lead to an unexpected
failure.  I wonder whether maybe shm_toc_create should just error out if
the number it's handed isn't aligned already.

>> +    return BUFFERALIGN(
>> +        add_size(offsetof(shm_toc, toc_entry),
>> +                 add_size(mul_size(e->number_of_keys, sizeof(shm_toc_entry)),
>> +                          e->space_for_chunks)));

> I think splitting this into separate statements would be better.

+1, it was too complicated already.

regards, tom lane




Re: [HACKERS] Atomics for heap_parallelscan_nextpage()

2017-08-16 Thread Tom Lane
I wrote:
> I can confirm that on dromedary, that regression test case is attempting
> to create a TOC with a not-well-aligned size: 93268 = 0x16c54 bytes.

... although, on closer look, it still seems like we have a fundamental
bit of schizophrenia here, because on this machine

$ grep ALIGN pg_config.h
#define ALIGNOF_DOUBLE 4
#define ALIGNOF_INT 4
#define ALIGNOF_LONG 4
#define ALIGNOF_LONG_LONG_INT 4
#define ALIGNOF_SHORT 2
#define MAXIMUM_ALIGNOF 4

Basically, therefore, ISTM that it is not a good thing that the atomics
code thinks it can rely on 8-byte-aligned data when the entire rest of
the system believes that 4-byte alignment is enough for anything.

I was wondering why the shm_toc code was using BUFFERALIGN and not
MAXALIGN, and I now suspect that the answer is "it's an entirely
undocumented kluge to make the atomics code not crash on 32-bit
machines, so long as nobody puts a pg_atomic_uint64 anywhere except
in a shm_toc".

I'm not sure that that's good enough, and I'm damn sure that it
shouldn't be undocumented.

regards, tom lane




Re: [HACKERS] Atomics for heap_parallelscan_nextpage()

2017-08-16 Thread Tom Lane
Heikki Linnakangas  writes:
> On 08/16/2017 08:10 PM, Andres Freund wrote:
>> Seems like we'd just have to add alignment of the total size to
>> shm_toc_estimate()?

> Yeah, that's the gist of it.

I can confirm that on dromedary, that regression test case is attempting
to create a TOC with a not-well-aligned size: 93268 = 0x16c54 bytes.

regards, tom lane




Re: [HACKERS] Atomics for heap_parallelscan_nextpage()

2017-08-16 Thread Andres Freund
Hi,

On 2017-08-16 20:19:29 +0300, Heikki Linnakangas wrote:
> diff --git a/src/backend/storage/ipc/shm_toc.c b/src/backend/storage/ipc/shm_toc.c
> index 9f259441f0..121d5a1ec9 100644
> --- a/src/backend/storage/ipc/shm_toc.c
> +++ b/src/backend/storage/ipc/shm_toc.c
> @@ -44,7 +44,7 @@ shm_toc_create(uint64 magic, void *address, Size nbytes)
>   Assert(nbytes > offsetof(shm_toc, toc_entry));
>   toc->toc_magic = magic;
>   SpinLockInit(&toc->toc_mutex);
> - toc->toc_total_bytes = nbytes;
> + toc->toc_total_bytes = BUFFERALIGN_DOWN(nbytes);
>   toc->toc_allocated_bytes = 0;
>   toc->toc_nentry = 0;

Don't think we require BUFFERALIGN - MAXALIGN ought to be
sufficient. The use of BUFFERALIGN presumably is to space out things
into different cachelines, but that doesn't really seem to be important
with this.  Then we can just avoid defining the new macro...


> @@ -252,7 +252,8 @@ shm_toc_lookup(shm_toc *toc, uint64 key, bool noError)
>  Size
>  shm_toc_estimate(shm_toc_estimator *e)
>  {
> -    return add_size(offsetof(shm_toc, toc_entry),
> -    add_size(mul_size(e->number_of_keys, sizeof(shm_toc_entry)),
> -             e->space_for_chunks));
> +    return BUFFERALIGN(
> +        add_size(offsetof(shm_toc, toc_entry),
> +                 add_size(mul_size(e->number_of_keys, sizeof(shm_toc_entry)),
> +                          e->space_for_chunks)));
>  }

I think splitting this into separate statements would be better. I think
we also should rather *ALIGN the size without e->space_for_chunks. The
latter is already aligned properly, and the important bit is that we
have enough space for e->space_for_chunks afterwards, and only padding
is in the way of that...
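
A quick standalone check of that point, with toy numbers and a toy
alignment macro: when the chunk space is already a multiple of the
alignment, padding only the header-plus-entries part gives the same
total as padding the whole sum.

#include <stddef.h>
#include <stdio.h>

#define TOY_ALIGN(x) (((x) + 31) & ~(size_t) 31)	/* BUFFERALIGN-like, illustrative */

int
main(void)
{
	size_t		header = 40;		/* offsetof(shm_toc, toc_entry), say */
	size_t		entries = 3 * 16;	/* number_of_keys * sizeof(shm_toc_entry), say */
	size_t		chunks = 7 * 32;	/* space_for_chunks, already aligned by estimate_chunk */

	size_t		whole_sum_aligned = TOY_ALIGN(header + entries + chunks);
	size_t		header_only_aligned = TOY_ALIGN(header + entries) + chunks;

	/* both print the same value */
	printf("align whole sum: %zu, align header+entries only: %zu\n",
		   whole_sum_aligned, header_only_aligned);
	return 0;
}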

Greetings,

Andres Freund




Re: [HACKERS] Atomics for heap_parallelscan_nextpage()

2017-08-16 Thread Robert Haas
On Wed, Aug 16, 2017 at 1:19 PM, Heikki Linnakangas  wrote:
> Yeah, that's the gist of it.
>
> The attached patch seems to fix this. I'm not too familiar with this DSM
> stuff, but this seems right to me. Unless someone has a better idea soon,
> I'll commit this to make the buildfarm happy.

Mmm, clever.  Yeah, that looks reasonable at first glance.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Atomics for heap_parallelscan_nextpage()

2017-08-16 Thread Heikki Linnakangas

On 08/16/2017 08:10 PM, Andres Freund wrote:

> Afaict shm_create/shm_toc_allocate don't actually guarantee that the end
> of the toc's memory is suitably aligned.  But I didn't yet have any
> coffee, so ...
>
> Robert, I'm not quite sure what the intended behaviour of shm_toc is wrt
> alignment. I see that individual chunks are BUFFERALIGNed (both during
> estimation, and allocation). But I don't see how the size of the entire
> toc is aligned, which seems a requirement, given we allocate from the
> end.  Seems like we'd just have to add alignment of the total size to
> shm_toc_estimate()?


Yeah, that's the gist of it.

The attached patch seems to fix this. I'm not too familiar with this DSM 
stuff, but this seems right to me. Unless someone has a better idea 
soon, I'll commit this to make the buildfarm happy.
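
The arithmetic behind it, with illustrative numbers (93268 is the TOC
size reported for dromedary elsewhere in the thread; the alignment macro
is only a stand-in for BUFFERALIGN_DOWN):

#include <stdint.h>
#include <stdio.h>

#define TOY_ALIGN_DOWN(x)	((x) & ~(uintptr_t) 31)
#define IS_ALIGNED(p)		(((p) % 32) == 0)

int
main(void)
{
	uintptr_t	base = 0x100000;	/* assume the mapping itself is aligned */
	uintptr_t	total = 93268;		/* unaligned total size handed to shm_toc_create */
	uintptr_t	nbytes = 128;		/* a request that is already BUFFERALIGNed */

	/* shm_toc_allocate hands out chunks from the end of the region */
	uintptr_t	chunk = base + total - nbytes;

	printf("chunk at base+%lu, aligned: %s\n",
		   (unsigned long) (chunk - base), IS_ALIGNED(chunk) ? "yes" : "no");

	/* with the fix, the total is rounded down once, so every chunk is aligned */
	total = TOY_ALIGN_DOWN(total);
	chunk = base + total - nbytes;
	printf("chunk at base+%lu, aligned: %s\n",
		   (unsigned long) (chunk - base), IS_ALIGNED(chunk) ? "yes" : "no");
	return 0;
}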


- Heikki
diff --git a/src/backend/storage/ipc/shm_toc.c b/src/backend/storage/ipc/shm_toc.c
index 9f259441f0..121d5a1ec9 100644
--- a/src/backend/storage/ipc/shm_toc.c
+++ b/src/backend/storage/ipc/shm_toc.c
@@ -44,7 +44,7 @@ shm_toc_create(uint64 magic, void *address, Size nbytes)
 	Assert(nbytes > offsetof(shm_toc, toc_entry));
 	toc->toc_magic = magic;
 	SpinLockInit(&toc->toc_mutex);
-	toc->toc_total_bytes = nbytes;
+	toc->toc_total_bytes = BUFFERALIGN_DOWN(nbytes);
 	toc->toc_allocated_bytes = 0;
 	toc->toc_nentry = 0;
 
@@ -252,7 +252,8 @@ shm_toc_lookup(shm_toc *toc, uint64 key, bool noError)
 Size
 shm_toc_estimate(shm_toc_estimator *e)
 {
-	return add_size(offsetof(shm_toc, toc_entry),
-	add_size(mul_size(e->number_of_keys, sizeof(shm_toc_entry)),
-			 e->space_for_chunks));
+	return BUFFERALIGN(
+		add_size(offsetof(shm_toc, toc_entry),
+ add_size(mul_size(e->number_of_keys, sizeof(shm_toc_entry)),
+		  e->space_for_chunks)));
 }
diff --git a/src/include/c.h b/src/include/c.h
index 9066e3c578..af799dc1df 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -598,6 +598,7 @@ typedef NameData *Name;
 #define LONGALIGN_DOWN(LEN)		TYPEALIGN_DOWN(ALIGNOF_LONG, (LEN))
 #define DOUBLEALIGN_DOWN(LEN)	TYPEALIGN_DOWN(ALIGNOF_DOUBLE, (LEN))
 #define MAXALIGN_DOWN(LEN)		TYPEALIGN_DOWN(MAXIMUM_ALIGNOF, (LEN))
+#define BUFFERALIGN_DOWN(LEN)	TYPEALIGN_DOWN(ALIGNOF_BUFFER, (LEN))
 
 /*
  * The above macros will not work with types wider than uintptr_t, like with



Re: [HACKERS] Atomics for heap_parallelscan_nextpage()

2017-08-16 Thread Andres Freund
On 2017-08-16 09:57:35 -0700, Peter Geoghegan wrote:
> On Wed, Aug 16, 2017 at 9:55 AM, Andres Freund  wrote:
> > On 2017-08-16 11:16:58 -0400, Tom Lane wrote:
> >> Heikki Linnakangas  writes:
> >> > A couple of 32-bit x86 buildfarm members don't seem to be happy with
> >> > this. I'll investigate, but if anyone has a clue, I'm all ears...
> >>
> >> dromedary's issue seems to be alignment:
> >>
> >> TRAP: UnalignedPointer("(((uintptr_t) ((uintptr_t)(ptr)) + ((8) - 1)) & 
> >> ~((uintptr_t) ((8) - 1))) != (uintptr_t)(ptr)", File: 
> >> "../../../../src/include/port/atomics.h", Line: 452)
> >> 2017-08-16 11:11:38.558 EDT [75693:3] LOG:  server process (PID 76277) was 
> >> terminated by signal 6: Abort trap
> >> 2017-08-16 11:11:38.558 EDT [75693:4] DETAIL:  Failed process was running: 
> >> select count(*) from a_star;
> >>
> >> Not sure if this is your bug or if it's exposing a pre-existing
> >> deficiency in the atomics code, viz, failure to ensure that
> >> pg_atomic_uint64 is actually a 64-bit-aligned type.  Andres?
> >
> > I suspect it's the former.  Suspect that the shared memory that holds
> > the "parallel desc" isn't properly aligned:
> 
> Clang has an -fsanitize=alignment option that may be useful here.

Given that we're crashing in an assert that does nothing but check for
alignment, before the memory is actually used in a "dangerous" manner, I
don't quite see how?

Greetings,

Andres Freund




Re: [HACKERS] Atomics for heap_parallelscan_nextpage()

2017-08-16 Thread Andres Freund
Robert, Tom,


On 2017-08-16 09:55:15 -0700, Andres Freund wrote:
> > Not sure if this is your bug or if it's exposing a pre-existing
> > deficiency in the atomics code, viz, failure to ensure that
> > pg_atomic_uint64 is actually a 64-bit-aligned type.  Andres?

> I suspect it's the former.  Suspect that the shared memory that holds
> the "parallel desc" isn't properly aligned:

Or, well, a mixture of both, because it seems like a deficiency in the
shm_toc code, rather than the atomics code.


> Afaict shm_create/shm_toc_allocate don't actually guarantee that the end
> of the toc's memory is suitably aligned.  But I didn't yet have any
> coffee, so ...

Robert, I'm not quite sure what the intended behaviour of shm_toc is wrt
alignment. I see that individual chunks are BUFFERALIGNed (both during
estimation, and allocation). But I don't see how the size of the entire
toc is aligned, which seems a requirement, given we allocate from the
end.
Seems like we'd just have to add alignment of the total size to
shm_toc_estimate()?

Greetings,

Andres Freund




Re: [HACKERS] Atomics for heap_parallelscan_nextpage()

2017-08-16 Thread Peter Geoghegan
On Wed, Aug 16, 2017 at 9:55 AM, Andres Freund  wrote:
> On 2017-08-16 11:16:58 -0400, Tom Lane wrote:
>> Heikki Linnakangas  writes:
>> > A couple of 32-bit x86 buildfarm members don't seem to be happy with
>> > this. I'll investigate, but if anyone has a clue, I'm all ears...
>>
>> dromedary's issue seems to be alignment:
>>
>> TRAP: UnalignedPointer("(((uintptr_t) ((uintptr_t)(ptr)) + ((8) - 1)) & 
>> ~((uintptr_t) ((8) - 1))) != (uintptr_t)(ptr)", File: 
>> "../../../../src/include/port/atomics.h", Line: 452)
>> 2017-08-16 11:11:38.558 EDT [75693:3] LOG:  server process (PID 76277) was 
>> terminated by signal 6: Abort trap
>> 2017-08-16 11:11:38.558 EDT [75693:4] DETAIL:  Failed process was running: 
>> select count(*) from a_star;
>>
>> Not sure if this is your bug or if it's exposing a pre-existing
>> deficiency in the atomics code, viz, failure to ensure that
>> pg_atomic_uint64 is actually a 64-bit-aligned type.  Andres?
>
> I suspect it's the former.  Suspect that the shared memory that holds
> the "parallel desc" isn't properly aligned:

Clang has an -fsanitize=alignment option that may be useful here.

-- 
Peter Geoghegan




Re: [HACKERS] Atomics for heap_parallelscan_nextpage()

2017-08-16 Thread Andres Freund
On 2017-08-16 11:16:58 -0400, Tom Lane wrote:
> Heikki Linnakangas  writes:
> > A couple of 32-bit x86 buildfarm members don't seem to be happy with 
> > this. I'll investigate, but if anyone has a clue, I'm all ears...
> 
> dromedary's issue seems to be alignment:
> 
> TRAP: UnalignedPointer("(((uintptr_t) ((uintptr_t)(ptr)) + ((8) - 1)) & 
> ~((uintptr_t) ((8) - 1))) != (uintptr_t)(ptr)", File: 
> "../../../../src/include/port/atomics.h", Line: 452)
> 2017-08-16 11:11:38.558 EDT [75693:3] LOG:  server process (PID 76277) was 
> terminated by signal 6: Abort trap
> 2017-08-16 11:11:38.558 EDT [75693:4] DETAIL:  Failed process was running: 
> select count(*) from a_star;
> 
> Not sure if this is your bug or if it's exposing a pre-existing
> deficiency in the atomics code, viz, failure to ensure that
> pg_atomic_uint64 is actually a 64-bit-aligned type.  Andres?

I suspect it's the former.  Suspect that the shared memory that holds
the "parallel desc" isn't properly aligned:

void
ExecSeqScanInitializeDSM(SeqScanState *node,
                         ParallelContext *pcxt)
{
...
    pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
    heap_parallelscan_initialize(pscan,
                                 node->ss.ss_currentRelation,
                                 estate->es_snapshot);


/*
...
 * We allocate backwards from the end of the segment, so that the TOC entries
 * can grow forward from the start of the segment.
 */
extern void *
shm_toc_allocate(shm_toc *toc, Size nbytes)
{
    volatile shm_toc *vtoc = toc;
    Size        total_bytes;
    Size        allocated_bytes;
    Size        nentry;
    Size        toc_bytes;

    /* Make sure request is well-aligned. */
    nbytes = BUFFERALIGN(nbytes);
...
    return ((char *) toc) + (total_bytes - allocated_bytes - nbytes);
}


/*
 * Initialize a region of shared memory with a table of contents.
 */
shm_toc *
shm_toc_create(uint64 magic, void *address, Size nbytes)
{
    shm_toc    *toc = (shm_toc *) address;

    Assert(nbytes > offsetof(shm_toc, toc_entry));
    toc->toc_magic = magic;
    SpinLockInit(&toc->toc_mutex);
    toc->toc_total_bytes = nbytes;
    toc->toc_allocated_bytes = 0;
    toc->toc_nentry = 0;

    return toc;
}

Afaict shm_create/shm_toc_allocate don't actually guarantee that the end
of the toc's memory is suitably aligned.  But I didn't yet have any
coffee, so ...

- Andres




Re: [HACKERS] Atomics for heap_parallelscan_nextpage()

2017-08-16 Thread Tom Lane
Heikki Linnakangas  writes:
> A couple of 32-bit x86 buildfarm members don't seem to be happy with 
> this. I'll investigate, but if anyone has a clue, I'm all ears...

dromedary's issue seems to be alignment:

TRAP: UnalignedPointer("(((uintptr_t) ((uintptr_t)(ptr)) + ((8) - 1)) & 
~((uintptr_t) ((8) - 1))) != (uintptr_t)(ptr)", File: 
"../../../../src/include/port/atomics.h", Line: 452)
2017-08-16 11:11:38.558 EDT [75693:3] LOG:  server process (PID 76277) was 
terminated by signal 6: Abort trap
2017-08-16 11:11:38.558 EDT [75693:4] DETAIL:  Failed process was running: 
select count(*) from a_star;

Not sure if this is your bug or if it's exposing a pre-existing
deficiency in the atomics code, viz, failure to ensure that
pg_atomic_uint64 is actually a 64-bit-aligned type.  Andres?

regards, tom lane




Re: [HACKERS] Atomics for heap_parallelscan_nextpage()

2017-08-16 Thread Heikki Linnakangas

On 08/16/2017 04:20 PM, Heikki Linnakangas wrote:

> On 05/06/2017 04:57 PM, David Rowley wrote:
>> Andres mentioned in [2] that it might be worth exploring using
>> atomics to do the same job. So I went ahead and did that, and came
>> up with the attached, which is a slight variation on what he
>> mentioned in the thread.
>>
>> To keep things a bit more simple, and streamline, I ended up
>> pulling out the logic for setting the startblock into another
>> function, which we only call once before the first call to
>> heap_parallelscan_nextpage().  I also ended up changing phs_cblock
>> and replacing it with a counter that always starts at zero. The
>> actual block is calculated based on that + the startblock modulo
>> nblocks. This makes things a good bit more simple for detecting
>> when we've allocated all the blocks to the workers, and also works
>> nicely when wrapping back to the start of a relation when we
>> started somewhere in the middle due to piggybacking with a
>> synchronous scan.
>
> Looks reasonable. I edited the comments and the variable names a bit,
> to my liking, and committed. Thanks!


A couple of 32-bit x86 buildfarm members don't seem to be happy with 
this. I'll investigate, but if anyone has a clue, I'm all ears...


- Heikki




Re: [HACKERS] Atomics for heap_parallelscan_nextpage()

2017-08-16 Thread Heikki Linnakangas

On 05/06/2017 04:57 PM, David Rowley wrote:

> Andres mentioned in [2] that it might be worth exploring using atomics
> to do the same job. So I went ahead and did that, and came up with the
> attached, which is a slight variation on what he mentioned in the
> thread.
>
> To keep things a bit more simple, and streamline, I ended up pulling
> out the logic for setting the startblock into another function, which
> we only call once before the first call to
> heap_parallelscan_nextpage().  I also ended up changing phs_cblock and
> replacing it with a counter that always starts at zero. The actual
> block is calculated based on that + the startblock modulo nblocks.
> This makes things a good bit more simple for detecting when we've
> allocated all the blocks to the workers, and also works nicely when
> wrapping back to the start of a relation when we started somewhere in
> the middle due to piggybacking with a synchronous scan.

Looks reasonable. I edited the comments and the variable names a bit, to
my liking, and committed. Thanks!


- Heikki




Re: [HACKERS] Atomics for heap_parallelscan_nextpage()

2017-05-08 Thread Amit Kapila
On Sat, May 6, 2017 at 7:27 PM, David Rowley wrote:
>
> I've also noticed that both the atomics patch and unpatched master do
> something that looks a bit weird with synchronous seq-scans. If the
> parallel seq-scan piggybacked on another scan, then subsequent
> parallel scans will start at the same non-zero block location, even
> when no other concurrent scans exist.
>

That is what we expect.  The same is true for non-parallel scans as
well, refer below code for non-parallel scans.

heapgettup_pagemode()
{
..

finished = (page == scan->rs_startblock) ||
(scan->rs_numblocks != InvalidBlockNumber ? --scan->rs_numblocks == 0 : false);


/*
* Report our new scan position for synchronization purposes. We
* don't do that when moving backwards, however. That would just
* mess up any other forward-moving scanners.
*
* Note: we do this before checking for end of scan so that the
* final state of the position hint is back at the start of the
* rel.  That's not strictly necessary, but otherwise when you run
* the same query multiple times the starting position would shift
* a little bit backwards on every invocation, which is confusing.
* We don't guarantee any specific ordering in general, though.
*/
if (scan->rs_syncscan)
ss_report_location(scan->rs_rd, page);
..
}


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




[HACKERS] Atomics for heap_parallelscan_nextpage()

2017-05-06 Thread David Rowley
Hi,

A while back I did some benchmarking on a big 4 socket machine to
explore a bit around the outer limits of parallel aggregates.  I
discovered along the way that, given enough workers, and a simple
enough task, that seq-scan workers were held up waiting for the lock
to be released in heap_parallelscan_nextpage().

I've since done a little work in this area to try to improve things. I
ended up posting about it yesterday in [1]. My original patch used
batching to solve the issue; instead of allocating 1 block at a time,
the batching patch allocated a range of 10 blocks for the worker to
process. However, the implementation still needed a bit of work around
reporting sync-scan locations.

Andres mentioned in [2] that it might be worth exploring using atomics
to do the same job. So I went ahead and did that, and came up with the
attached, which is a slight variation on what he mentioned in the
thread.

To keep things a bit more simple, and streamline, I ended up pulling
out the logic for setting the startblock into another function, which
we only call once before the first call to
heap_parallelscan_nextpage().  I also ended up changing phs_cblock and
replacing it with a counter that always starts at zero. The actual
block is calculated based on that + the startblock modulo nblocks.
This makes things a good bit more simple for detecting when we've
allocated all the blocks to the workers, and also works nicely when
wrapping back to the start of a relation when we started somewhere in
the middle due to piggybacking with a synchronous scan.
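
As a standalone sketch of that scheme (C11 atomics rather than the
pg_atomic_* API, and toy names throughout): each worker atomically grabs
the next counter value, and the block it scans is
(startblock + counter) modulo nblocks, so wrap-around for synchronized
scans falls out of the arithmetic.

#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

#define TOY_INVALID_BLOCK UINT64_MAX

typedef struct
{
	_Atomic uint64_t nallocated;	/* pages handed out to workers so far */
	uint64_t	startblock;			/* where this scan began (sync scan) */
	uint64_t	nblocks;
} toy_parallel_scan;

static uint64_t
toy_nextpage(toy_parallel_scan *scan)
{
	uint64_t	seq = atomic_fetch_add(&scan->nallocated, 1);

	if (seq >= scan->nblocks)
		return TOY_INVALID_BLOCK;	/* every block has been claimed */
	return (scan->startblock + seq) % scan->nblocks;
}

int
main(void)
{
	toy_parallel_scan scan;
	uint64_t	blk;

	atomic_init(&scan.nallocated, 0);
	scan.startblock = 7;				/* pretend we piggybacked on a sync scan */
	scan.nblocks = 10;

	while ((blk = toy_nextpage(&scan)) != TOY_INVALID_BLOCK)
		printf("%llu ", (unsigned long long) blk);	/* prints 7 8 9 0 1 ... 6 */
	printf("\n");
	return 0;
}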

Performance:

With parallel_workers=71, it looks something like:

Query 1: 881 GB, ~6 billion row TPC-H lineitem table.

tpch=# select count(*) from lineitem;
   count

 589709
(1 row)

-- Master
Time: 123421.283 ms (02:03.421)
Time: 118895.846 ms (01:58.896)
Time: 118632.546 ms (01:58.633)

-- Atomics patch
Time: 74038.813 ms (01:14.039)
Time: 73166.200 ms (01:13.166)
Time: 72492.338 ms (01:12.492)

-- Batching Patch: Batching 10 pages at a time in heap_parallelscan_nextpage()
Time: 76364.215 ms (01:16.364)
Time: 75808.900 ms (01:15.809)
Time: 74927.756 ms (01:14.928)

Query 2: Single int column table with 2 billion rows.

tpch=# select count(*) from a;
   count

 20
(1 row)

-- Master
Time: 5853.918 ms (00:05.854)
Time: 5925.633 ms (00:05.926)
Time: 5859.223 ms (00:05.859)

-- Atomics patch
Time: 5825.745 ms (00:05.826)
Time: 5849.139 ms (00:05.849)
Time: 5815.818 ms (00:05.816)

-- Batching Patch: Batching 10 pages at a time in heap_parallelscan_nextpage()
Time: 5789.237 ms (00:05.789)
Time: 5837.395 ms (00:05.837)
Time: 5821.492 ms (00:05.821)

I've also attached a text file with the perf report for the lineitem
query. You'll notice that the heap_parallelscan_nextpage() is very
visible in master, but not on each of the two patches.

With the 2nd query, heap_parallelscan_nextpage is fairly insignificant
on master's profile, it's only showing up as 0.48%. Likely this must
be due to more tuples being read from the page, and more aggregation
work getting done before the next page is needed. I'm uncertain why I
previously saw a speed up in this case in [1].

I've also noticed that both the atomics patch and unpatched master do
something that looks a bit weird with synchronous seq-scans. If the
parallel seq-scan piggybacked on another scan, then subsequent
parallel scans will start at the same non-zero block location, even
when no other concurrent scans exist. I'd have expected this should go
back to block 0 again, but maybe I'm just failing to understand the
reason for reporting the startblock to ss_report_location() at the end
of the scan.

I'll now add this to the first commitfest of pg11. I just wanted to
note that I've done this, so that it's less likely someone else goes
and repeats the same work.

[1] 
https://www.postgresql.org/message-id/CAKJS1f-XhfQ2-%3D85wgYo5b3WtEs%3Dys%3D2Rsq%3DNuvnmaV4ZsM1XQ%40mail.gmail.com

[2] 
https://www.postgresql.org/message-id/20170505023646.3uhnmf2hbwtm63lc%40alap3.anarazel.de

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
-- Unpatched select count(*) from lineitem; 71 workers
  Children  Self  Command   Shared Object Symbol
+   99.83% 0.00%  postgres  libpthread-2.17.so[.] __restore_rt
+   99.83% 0.00%  postgres  postgres  [.] sigusr1_handler
+   99.83% 0.00%  postgres  postgres  [.] maybe_start_bgworkers
+   99.83% 0.00%  postgres  postgres  [.] do_start_bgworker
+   99.83% 0.93%  postgres  postgres  [.] ExecProcNode
+   99.83% 0.00%  postgres  postgres  [.] StartBackgroundWorker
+   99.83% 0.00%  postgres  postgres  [.] ParallelWorkerMain
+   99.83% 0.00%  postgres  postgres  [.] ParallelQueryMain
+   99.83% 0.00%  postgres  postgres  [.] ExecutorRun
+   99.83% 0.00%