Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-03-04 Thread Tom Lane
Robert Haas  writes:
> On Fri, Mar 4, 2016 at 11:17 AM, Tom Lane  wrote:
>> Huh?  Parallel workers are read-only; what would they be doing sending
>> any of these messages?

> Mumble.  I have no idea what's happening here.

OK, after inserting a bunch of debug logging I have figured out what is
happening.  The updates on trunc_stats_test et al, being updates, are
done in the session's main backend.  But we also have these queries:

-- do a seqscan
SELECT count(*) FROM tenk2;
-- do an indexscan
SELECT count(*) FROM tenk2 WHERE unique1 = 1;

These can be, and are, done in parallel worker processes (and not
necessarily the same one, either).  AFAICT, the parallel worker
processes send their stats messages to the stats collector more or
less immediately after processing their queries.  However, because
of the rate-limiting logic in pgstat_report_stat, the main backend
doesn't.  The point of that "pg_sleep(1.0)" (which was actually added
*after* wait_for_stats) is to ensure that the half-second delay in
the rate limiter has been soaked up, and the stats messages sent,
before we start waiting for the results to become visible in the
stats collector's output.

So the sequence of events when we get a failure looks like

1. parallel workers send stats updates for seqscan and indexscan
on tenk2.

2. stats collector emits output files, probably as a result of
an autovacuum request.

3. session's main backend finishes "pg_sleep(1.0)" and sends
stats updates for what it's done lately, including the
updates on trunc_stats_test et al.

4. wait_for_stats() observes that the tenk2 idx_scan count has
already advanced and figures it need not wait at all.

5. We print stale stats for trunc_stats_test et al.

So it appears to me that to make this robust, we need to adjust
wait_for_stats to verify advances on *all three of* the tenk2
seq_scan count, the tenk2 idx_scan count, and at least one of
the trunc_stats_test tables' counters, because those could be
coming from three different backend processes.

If we ever allow parallel workers to do writes, this will really
become a mess.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-03-04 Thread Robert Haas
On Fri, Mar 4, 2016 at 11:17 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Fri, Mar 4, 2016 at 11:03 AM, Tom Lane  wrote:
>>> Well, that would make the function more complicated, but maybe it's a
>>> better answer.  On the other hand, we know that the stats updates are
>>> delivered in a deterministic order, so why not simply replace the
>>> existing test in the wait function with one that looks for the truncation
>>> updates?  If we've gotten those, we must have gotten the earlier ones.
>
>> I'm not sure if that's actually true with parallel mode.  I'm pretty
>> sure the earlier workers will have terminated before the later ones
>> start, but is that enough to guarantee that the stats collector sees
>> the messages in that order?
>
> Huh?  Parallel workers are read-only; what would they be doing sending
> any of these messages?

Mumble.  I have no idea what's happening here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-03-04 Thread Tom Lane
Robert Haas  writes:
> On Fri, Mar 4, 2016 at 11:03 AM, Tom Lane  wrote:
>> Well, that would make the function more complicated, but maybe it's a
>> better answer.  On the other hand, we know that the stats updates are
>> delivered in a deterministic order, so why not simply replace the
>> existing test in the wait function with one that looks for the truncation
>> updates?  If we've gotten those, we must have gotten the earlier ones.

> I'm not sure if that's actually true with parallel mode.  I'm pretty
> sure the earlier workers will have terminated before the later ones
> start, but is that enough to guarantee that the stats collector sees
> the messages in that order?

Huh?  Parallel workers are read-only; what would they be doing sending
any of these messages?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-03-04 Thread Alvaro Herrera
Robert Haas wrote:

> I'm not sure if that's actually true with parallel mode.  I'm pretty
> sure the earlier workers will have terminated before the later ones
> start, but is that enough to guarantee that the stats collector sees
> the messages in that order?

Um.  So if you have two queries that run in sequence, it's possible
for workers of the first query to be still running when workers for the
second query finish?  That would be very strange.

If that's not what you're saying, I don't understand what guarantees you
say we don't have.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-03-04 Thread Robert Haas
On Fri, Mar 4, 2016 at 11:03 AM, Tom Lane  wrote:
> Alvaro Herrera  writes:
>> Tom Lane wrote:
>>> That's what it looks like to me.  I now think that the apparent
>>> connection to parallel query is a mirage.  The reason we've only
>>> seen a few cases so far is that the flapping test is new: it
>>> wad added in commit d42358efb16cc811, on 20 Feb.
>
>> It was added on Feb 20 all right, but of *last year*.  It's been there
>> working happily for a year now.
>
> Wup, you're right, failed to look closely enough at the commit log
> entry.  So that puts us back to wondering why exactly parallel query
> is triggering this.  Still, Robert's experiment with removing the
> pg_sleep seems fairly conclusive: it is possible to get the failure
> without parallel query.
>
>> Instead of adding another sleep function, another possibility is to add
>> two booleans, one for the index counter and another for the truncate
>> counters, and only terminate the sleep if both are true.  I don't see
>> any reason to make this test any slower than it already is.
>
> Well, that would make the function more complicated, but maybe it's a
> better answer.  On the other hand, we know that the stats updates are
> delivered in a deterministic order, so why not simply replace the
> existing test in the wait function with one that looks for the truncation
> updates?  If we've gotten those, we must have gotten the earlier ones.

I'm not sure if that's actually true with parallel mode.  I'm pretty
sure the earlier workers will have terminated before the later ones
start, but is that enough to guarantee that the stats collector sees
the messages in that order?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-03-04 Thread Robert Haas
On Fri, Mar 4, 2016 at 10:33 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> Sure.  If you have an idea what the right thing to do is, please go
>> ahead.
>
> Yeah, I'll modify the patch and commit sometime later today.

OK, if you're basing that on the patch I sent upthread, please credit
Rahila Syed as the original author of that code.  (I modified it
before posting, but only trivially.)  Of course if you do something
else, then never mind.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-03-04 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> That's what it looks like to me.  I now think that the apparent
>> connection to parallel query is a mirage.  The reason we've only
>> seen a few cases so far is that the flapping test is new: it
>> wad added in commit d42358efb16cc811, on 20 Feb.

> It was added on Feb 20 all right, but of *last year*.  It's been there
> working happily for a year now.

Wup, you're right, failed to look closely enough at the commit log
entry.  So that puts us back to wondering why exactly parallel query
is triggering this.  Still, Robert's experiment with removing the
pg_sleep seems fairly conclusive: it is possible to get the failure
without parallel query.

> Instead of adding another sleep function, another possibility is to add
> two booleans, one for the index counter and another for the truncate
> counters, and only terminate the sleep if both are true.  I don't see
> any reason to make this test any slower than it already is.

Well, that would make the function more complicated, but maybe it's a
better answer.  On the other hand, we know that the stats updates are
delivered in a deterministic order, so why not simply replace the
existing test in the wait function with one that looks for the truncation
updates?  If we've gotten those, we must have gotten the earlier ones.

In any case, the real answer to making the test less slow is to get rid of
that vestigial pg_sleep.  I'm wondering why we failed to remove that when
we put in the wait_for_stats function...

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-03-04 Thread Alvaro Herrera
Tom Lane wrote:
> Robert Haas  writes:
> > Sure.  If you have an idea what the right thing to do is, please go
> > ahead.
> 
> Yeah, I'll modify the patch and commit sometime later today.
> 
> > I actually don't have a clear idea what's going on here.  I
> > guess it's that the wait_for_stats() guarantees that the stats message
> > from the index insertion has been received but the status messages
> > from the "trunc" tables might not have gotten there yet.
> 
> That's what it looks like to me.  I now think that the apparent
> connection to parallel query is a mirage.  The reason we've only
> seen a few cases so far is that the flapping test is new: it
> wad added in commit d42358efb16cc811, on 20 Feb.  If we left it
> as-is, I think we'd eventually see the same failure without forcing
> parallel mode.  In fact, that's pretty much what you describe below,
> isn't it?  The pg_sleep is sort of half-bakedly substituting for
> a proper wait.

It was added on Feb 20 all right, but of *last year*.  It's been there
working happily for a year now.

The reason I added the trunc test in the middle of the index update
tests is that I dislike tests that sleep for long without real purpose;
it seems pretty reasonable to me to have both sleeps actually be the
same wait.

Instead of adding another sleep function, another possibility is to add
two booleans, one for the index counter and another for the truncate
counters, and only terminate the sleep if both are true.  I don't see
any reason to make this test any slower than it already is.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-03-04 Thread Tom Lane
Robert Haas  writes:
> Sure.  If you have an idea what the right thing to do is, please go
> ahead.

Yeah, I'll modify the patch and commit sometime later today.

> I actually don't have a clear idea what's going on here.  I
> guess it's that the wait_for_stats() guarantees that the stats message
> from the index insertion has been received but the status messages
> from the "trunc" tables might not have gotten there yet.

That's what it looks like to me.  I now think that the apparent
connection to parallel query is a mirage.  The reason we've only
seen a few cases so far is that the flapping test is new: it
wad added in commit d42358efb16cc811, on 20 Feb.  If we left it
as-is, I think we'd eventually see the same failure without forcing
parallel mode.  In fact, that's pretty much what you describe below,
isn't it?  The pg_sleep is sort of half-bakedly substituting for
a proper wait.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-03-04 Thread Robert Haas
On Fri, Mar 4, 2016 at 12:46 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> A couple of my colleagues have been looking into this.  It's not
>> entirely clear to me what's going on here yet, but it looks like the
>> stats get there if you wait long enough.  Rahila Syed was able to
>> reproduce the problem and says that the attached patch fixes it.  But
>> I don't quite understand why this should fix it.
>
> I don't like this patch much.  While the new function is not bad in
> itself, it looks really weird to call it immediately after the other
> wait function.  And the reason for that, AFAICT, is that somebody dropped
> the entire "truncation stats" test sequence into the middle of unrelated
> tests, evidently in the vain hope that that way they could piggyback
> on the existing wait.  Which these failures are showing us is wrong.
>
> I think we should move all the inserted logic down so that it's not in the
> middle of unrelated testing.

Sure.  If you have an idea what the right thing to do is, please go
ahead.  I actually don't have a clear idea what's going on here.  I
guess it's that the wait_for_stats() guarantees that the stats message
from the index insertion has been received but the status messages
from the "trunc" tables might not have gotten there yet.  I thought
maybe that works without parallelism because all of those messages are
coming from the same backend, and therefore if you have the later one
you must have all of the earlier ones, too.  But if you're running
some of the queries in parallel workers then it's possible for a stats
message from a worker run later to arrive later.

But it's not that after all, because when I run the regression tests
with the pg_sleep removed, I get this:

*** /Users/rhaas/pgsql/src/test/regress/expected/stats.out
2016-03-04 08:55:33.0 -0500
--- /Users/rhaas/pgsql/src/test/regress/results/stats.out
2016-03-04 09:00:29.0 -0500
***
*** 127,140 
   1
  (1 row)

- -- force the rate-limiting logic in pgstat_report_tabstat() to time out
- -- and send a message
- SELECT pg_sleep(1.0);
-  pg_sleep
- --
-
- (1 row)
-
  -- wait for stats collector to update
  SELECT wait_for_stats();
   wait_for_stats
--- 127,132 
***
*** 148,158 
   WHERE relname like 'trunc_stats_test%' order by relname;
relname  | n_tup_ins | n_tup_upd | n_tup_del | n_live_tup
| n_dead_tup
  
---+---+---+---++
!  trunc_stats_test  | 3 | 0 | 0 |  0
|  0
!  trunc_stats_test1 | 4 | 2 | 1 |  1
|  0
!  trunc_stats_test2 | 1 | 0 | 0 |  1
|  0
!  trunc_stats_test3 | 4 | 0 | 0 |  2
|  2
!  trunc_stats_test4 | 2 | 0 | 0 |  0
|  2
  (5 rows)

  SELECT st.seq_scan >= pr.seq_scan + 1,
--- 140,150 
   WHERE relname like 'trunc_stats_test%' order by relname;
relname  | n_tup_ins | n_tup_upd | n_tup_del | n_live_tup
| n_dead_tup
  
---+---+---+---++
!  trunc_stats_test  | 0 | 0 | 0 |  0
|  0
!  trunc_stats_test1 | 0 | 0 | 0 |  0
|  0
!  trunc_stats_test2 | 0 | 0 | 0 |  0
|  0
!  trunc_stats_test3 | 0 | 0 | 0 |  0
|  0
!  trunc_stats_test4 | 0 | 0 | 0 |  0
|  0
  (5 rows)

  SELECT st.seq_scan >= pr.seq_scan + 1,
***
*** 163,169 
   WHERE st.relname='tenk2' AND cl.relname='tenk2';
   ?column? | ?column? | ?column? | ?column?
  --+--+--+--
!  t| t| t| t
  (1 row)

  SELECT st.heap_blks_read + st.heap_blks_hit >= pr.heap_blks + cl.relpages,
--- 155,161 
   WHERE st.relname='tenk2' AND cl.relname='tenk2';
   ?column? | ?column? | ?column? | ?column?
  --+--+--+--
!  f| f| f| f
  (1 row)

  SELECT st.heap_blks_read + st.heap_blks_hit >= pr.heap_blks + cl.relpages,
***
*** 172,178 
   WHERE st.relname='tenk2' AND cl.relname='tenk2';
   ?column? | ?column?
  --+--
!  t| t
  (1 row)

  SELECT pr.snap_ts < pg_stat_get_snapshot_timestamp() as snapshot_newer
--- 164,170 
   WHERE st.relname='tenk2' AND cl.relname='tenk2';
   ?column? | ?column?
  --+--
!  t| f
  (1 row)

  SELECT pr.snap_ts < pg_stat_get_snapshot_timestamp() as snapshot_newer

That looks suspiciously similar to the failure we're getting with the
force_parallel_mode testing, but I'm still confused.

>> BTW, this comment is obsolete:
>
>> -- force the rate-limiting logic in pgstat_report_tabstat() to time out
>> -- and send a message
>> SELECT pg_sleep(1.

Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-03-03 Thread Tom Lane
Robert Haas  writes:
> A couple of my colleagues have been looking into this.  It's not
> entirely clear to me what's going on here yet, but it looks like the
> stats get there if you wait long enough.  Rahila Syed was able to
> reproduce the problem and says that the attached patch fixes it.  But
> I don't quite understand why this should fix it.

I don't like this patch much.  While the new function is not bad in
itself, it looks really weird to call it immediately after the other
wait function.  And the reason for that, AFAICT, is that somebody dropped
the entire "truncation stats" test sequence into the middle of unrelated
tests, evidently in the vain hope that that way they could piggyback
on the existing wait.  Which these failures are showing us is wrong.

I think we should move all the inserted logic down so that it's not in the
middle of unrelated testing.

> BTW, this comment is obsolete:

> -- force the rate-limiting logic in pgstat_report_tabstat() to time out
> -- and send a message
> SELECT pg_sleep(1.0);
>  pg_sleep
> --

> (1 row)

> That function was renamed in commit
> 93c701edc6c6f065cd25f77f63ab31aff085d6ac, but apparently Tom forgot to
> grep for other uses of that identifier name.

Duh :-(.  Actually, do we need that sleep at all anymore?  Seems like
wait_for_stats ought to cover it.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-03-03 Thread Robert Haas
On Thu, Mar 3, 2016 at 1:10 AM, Tom Lane  wrote:
> Noah Misch  writes:
>> I've modified buildfarm member mandrill to use force_parallel_mode=regress 
>> and
>> max_parallel_degree=5; a full run passes.  We'll now see if it intermittently
>> fails the stats test, like Tom witnessed:
>> http://www.postgresql.org/message-id/30385.1456077...@sss.pgh.pa.us
>
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2016-03-02%2023%3A34%3A10

A couple of my colleagues have been looking into this.  It's not
entirely clear to me what's going on here yet, but it looks like the
stats get there if you wait long enough.  Rahila Syed was able to
reproduce the problem and says that the attached patch fixes it.  But
I don't quite understand why this should fix it.

BTW, this comment is obsolete:

-- force the rate-limiting logic in pgstat_report_tabstat() to time out
-- and send a message
SELECT pg_sleep(1.0);
 pg_sleep
--

(1 row)

That function was renamed in commit
93c701edc6c6f065cd25f77f63ab31aff085d6ac, but apparently Tom forgot to
grep for other uses of that identifier name.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


wait-for-trunc-stats.patch
Description: application/download

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-03-02 Thread Tom Lane
Noah Misch  writes:
> I've modified buildfarm member mandrill to use force_parallel_mode=regress and
> max_parallel_degree=5; a full run passes.  We'll now see if it intermittently
> fails the stats test, like Tom witnessed:
> http://www.postgresql.org/message-id/30385.1456077...@sss.pgh.pa.us

http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2016-03-02%2023%3A34%3A10

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-02-29 Thread Robert Haas
On Sat, Feb 27, 2016 at 7:05 PM, Noah Misch  wrote:
> On Fri, Feb 26, 2016 at 04:16:58PM +0530, Robert Haas wrote:
>> Committed these patches after revising the comment you wrote and
>> adding documentation.
>
> I've modified buildfarm member mandrill to use force_parallel_mode=regress and
> max_parallel_degree=5; a full run passes.  We'll now see if it intermittently
> fails the stats test, like Tom witnessed:
> http://www.postgresql.org/message-id/30385.1456077...@sss.pgh.pa.us

Thank you.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-02-27 Thread Noah Misch
On Fri, Feb 26, 2016 at 04:16:58PM +0530, Robert Haas wrote:
> Committed these patches after revising the comment you wrote and
> adding documentation.

I've modified buildfarm member mandrill to use force_parallel_mode=regress and
max_parallel_degree=5; a full run passes.  We'll now see if it intermittently
fails the stats test, like Tom witnessed:
http://www.postgresql.org/message-id/30385.1456077...@sss.pgh.pa.us


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-02-26 Thread Robert Haas
On Wed, Feb 24, 2016 at 12:59 PM, Thomas Munro
 wrote:
> On Wed, Feb 24, 2016 at 5:48 PM, Thomas Munro
>  wrote:
>> Here is a first pass at that. [...]
>
> On Wed, Feb 24, 2016 at 1:23 AM, Robert Haas  wrote:
>> file_fdw is parallel-safe, ...
>
> And here is a patch to apply on top of the last one, to make file_fdw
> return true.  But does it really work correctly under parallelism?

Seems like it.  Running the regression tests for file_fdw under
force_parallel_mode=regress, max_parallel_degree>0 passes; you can
verify that's actually doing something by using
force_parallel_mode=on, which will result in some predictable
failures.  From a theoretical point of view, there's no reason I can
see why reading a file shouldn't work just as well from a parallel
worker as from the leader.  They both have the same view of the
filesystem, and in neither case are we trying to write any data; we're
just trying to read it.

Committed these patches after revising the comment you wrote and
adding documentation.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-02-23 Thread Thomas Munro
On Wed, Feb 24, 2016 at 5:48 PM, Thomas Munro
 wrote:
> Here is a first pass at that. [...]

On Wed, Feb 24, 2016 at 1:23 AM, Robert Haas  wrote:
> file_fdw is parallel-safe, ...

And here is a patch to apply on top of the last one, to make file_fdw
return true.  But does it really work correctly under parallelism?

-- 
Thomas Munro
http://www.enterprisedb.com


file-fdw-parallel-safe.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-02-23 Thread Thomas Munro
On Tue, Feb 23, 2016 at 6:45 PM, Robert Haas  wrote:
> On Tue, Feb 23, 2016 at 2:06 AM, Tom Lane  wrote:
>> Robert Haas  writes:
 Foreign tables are supposed to be categorically excluded from
 parallelism.  Not sure why that's not working in this instance.
>>
>> BTW, I wonder where you think that's supposed to be enforced, because
>> I sure can't find any such logic.
>>
>> I suppose that has_parallel_hazard() would be the logical place to
>> notice foreign tables, but it currently doesn't even visit RTEs,
>> much less contain any code to check if their tables are foreign.
>> Or did you have another place in mind to do that?
>
> RTEs are checked in set_rel_consider_parallel(), and I thought there
> was a check there related to foreign tables, but there isn't.  Oops.
> In view of 69d34408e5e7adcef8ef2f4e9c4f2919637e9a06, we shouldn't
> blindly assume that foreign scans are not parallel-safe, but we can't
> blindly assume the opposite either.  Maybe we should assume that the
> foreign scan is parallel-safe only if one or more of the new methods
> introduced by the aforementioned commit are set, but actually that
> doesn't seem quite right.  That would tell us whether the scan itself
> can be parallelized, not whether it's safe to run serially but within
> a parallel worker.  I think maybe we need a new FDW API that gets
> called from set_rel_consider_parallel() with the root, rel, and rte as
> arguments and which can return a Boolean.  If the callback is not set,
> assume false.

Here is a first pass at that.  The patch adds
IsForeignScanParallelSafe to the FDW API.   postgres_fdw returns false
(unnecessary but useful to verify that the regression test breaks if
you change it to true), and others don't provide the function so fall
back to false.

I suspect there may be opportunities to return true even if snapshots
and uncommitted reads aren't magically coordinated among workers.  For
example: users of MongoDB-type systems and text files have no
expectation of either snapshot semantics or transaction isolation in
the first place, so doing stuff in parallel won't be any less safe
than usual as far as visibility is concerned; postgres_fdw could in
theory export/import snapshots and allow parallelism in limited cases
if it can somehow prove there have been no uncommitted writes; and
non-MVCC/snapshot RDBMSs might be OK in lower isolation levels if you
haven't written anything or have explicitly opted in to uncommitted
reads (otherwise you'd risk invisible deadlock against the leader when
trying to read what it has written).

Please also find attached a tiny patch to respect TEMP_CONFIG for contribs.

-- 
Thomas Munro
http://www.enterprisedb.com


temp-config-for-contribs.patch
Description: Binary data


fdw-parallel-safe-api.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-02-23 Thread Robert Haas
On Tue, Feb 23, 2016 at 11:47 AM, Tom Lane  wrote:
> Even if there were, it would not fix this bug, because AFAICS the only
> thing that set_rel_consider_parallel is chartered to do is set the
> per-relation consider_parallel flag.  The failure that is happening in
> that regression test with force_parallel_mode turned on happens because
> standard_planner plasters a Gather node at the top of the plan, causing
> the whole plan including the FDW access to happen inside a parallel
> worker.  The only way to prevent that is to clear the
> wholePlanParallelSafe flag, which as far as I can tell (not that any of
> this is documented worth a damn) isn't something that
> set_rel_consider_parallel is supposed to do.

Hmm.  Well, if you tested it, or looked at the places where
wholePlanParallelSafe is cleared, you would find that it DOES fix the
bug.  create_plan() clears wholePlanParallelSafe if the plan is not
parallel-safe, and the plan won't be parallel-safe unless
consider_parallel was set for the underlying relation.  In case you'd
like to test it for yourself, here's the PoC patch I wrote:

diff --git a/src/backend/optimizer/path/allpaths.c
b/src/backend/optimizer/path/allpaths.c
index bcb668f..8a4179e 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -527,6 +527,11 @@ set_rel_consider_parallel(PlannerInfo *root,
RelOptInfo *rel,
return;
return;
}
+
+   /* Not for foreign tables. */
+   if (rte->relkind == RELKIND_FOREIGN_TABLE)
+   return;
+
break;

case RTE_SUBQUERY:

Adding that makes the postgres_fdw case pass.

> It looks to me like there is a good deal of fuzzy thinking here about the
> difference between locally parallelizable and globally parallelizable
> plans, ie Gather at the top vs Gather somewhere else.

If you have a specific complaint, I'm happy to try to improve things,
or you can.  I think however that it is also possible that you haven't
fully understood the code I've spent the last year or so developing
yet, possibly because I haven't documented it well enough, but
possibly also because you haven't spent much time looking on it yet.
I'm glad you are, by the way, because I'm sure there are a bunch of
things here that you can improve over what I was able to do,
especially on the planner side of things, and that would be great.
However, a bit of forbearance would be appreciated.

> I also note with
> dismay that turning force_parallel_mode on seems to pretty much disable
> any testing of local parallelism.

No, I don't think so.  It doesn't push a Gather node on top of a plan
that already contains a Gather, because such a plan isn't
parallel_safe.   Nor does it suppress generation of parallel paths
otherwise.

>> In view of 69d34408e5e7adcef8ef2f4e9c4f2919637e9a06, we shouldn't
>> blindly assume that foreign scans are not parallel-safe, but we can't
>> blindly assume the opposite either.  Maybe we should assume that the
>> foreign scan is parallel-safe only if one or more of the new methods
>> introduced by the aforementioned commit are set, but actually that
>> doesn't seem quite right.  That would tell us whether the scan itself
>> can be parallelized, not whether it's safe to run serially but within
>> a parallel worker.  I think maybe we need a new FDW API that gets
>> called from set_rel_consider_parallel() with the root, rel, and rte as
>> arguments and which can return a Boolean.  If the callback is not set,
>> assume false.
>
> Meh.  As things stand, postgres_fdw would have to aver that it can't ever
> be safely parallelized, which doesn't seem like a very satisfactory answer
> even if there are other FDWs that work differently (and which would those
> be?  None that use a socket-style connection to an external server.)

file_fdw is parallel-safe, and KaiGai posted a patch that makes it
parallel-aware, though that would have needed more work than I'm
willing to put in right now to make it committable.  So in other
words...

> The commit you mention above seems to me to highlight the dangers of
> accepting hook patches with no working use-case to back them up.
> AFAICT it's basically useless for typical FDWs because of this
> multiple-connection problem.

...I didn't ignore this principal.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-02-22 Thread Tom Lane
Robert Haas  writes:
> On Tue, Feb 23, 2016 at 2:06 AM, Tom Lane  wrote:
>> Robert Haas  writes:
>>> Foreign tables are supposed to be categorically excluded from
>>> parallelism.  Not sure why that's not working in this instance.

>> BTW, I wonder where you think that's supposed to be enforced, because
>> I sure can't find any such logic.

> RTEs are checked in set_rel_consider_parallel(), and I thought there
> was a check there related to foreign tables, but there isn't.  Oops.

Even if there were, it would not fix this bug, because AFAICS the only
thing that set_rel_consider_parallel is chartered to do is set the
per-relation consider_parallel flag.  The failure that is happening in
that regression test with force_parallel_mode turned on happens because
standard_planner plasters a Gather node at the top of the plan, causing
the whole plan including the FDW access to happen inside a parallel
worker.  The only way to prevent that is to clear the
wholePlanParallelSafe flag, which as far as I can tell (not that any of
this is documented worth a damn) isn't something that
set_rel_consider_parallel is supposed to do.

It looks to me like there is a good deal of fuzzy thinking here about the
difference between locally parallelizable and globally parallelizable
plans, ie Gather at the top vs Gather somewhere else.  I also note with
dismay that turning force_parallel_mode on seems to pretty much disable
any testing of local parallelism.

> In view of 69d34408e5e7adcef8ef2f4e9c4f2919637e9a06, we shouldn't
> blindly assume that foreign scans are not parallel-safe, but we can't
> blindly assume the opposite either.  Maybe we should assume that the
> foreign scan is parallel-safe only if one or more of the new methods
> introduced by the aforementioned commit are set, but actually that
> doesn't seem quite right.  That would tell us whether the scan itself
> can be parallelized, not whether it's safe to run serially but within
> a parallel worker.  I think maybe we need a new FDW API that gets
> called from set_rel_consider_parallel() with the root, rel, and rte as
> arguments and which can return a Boolean.  If the callback is not set,
> assume false.

Meh.  As things stand, postgres_fdw would have to aver that it can't ever
be safely parallelized, which doesn't seem like a very satisfactory answer
even if there are other FDWs that work differently (and which would those
be?  None that use a socket-style connection to an external server.)

The commit you mention above seems to me to highlight the dangers of
accepting hook patches with no working use-case to back them up.
AFAICT it's basically useless for typical FDWs because of this
multiple-connection problem.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-02-22 Thread Robert Haas
On Tue, Feb 23, 2016 at 2:06 AM, Tom Lane  wrote:
> Robert Haas  writes:
>>> Foreign tables are supposed to be categorically excluded from
>>> parallelism.  Not sure why that's not working in this instance.
>
> BTW, I wonder where you think that's supposed to be enforced, because
> I sure can't find any such logic.
>
> I suppose that has_parallel_hazard() would be the logical place to
> notice foreign tables, but it currently doesn't even visit RTEs,
> much less contain any code to check if their tables are foreign.
> Or did you have another place in mind to do that?

RTEs are checked in set_rel_consider_parallel(), and I thought there
was a check there related to foreign tables, but there isn't.  Oops.
In view of 69d34408e5e7adcef8ef2f4e9c4f2919637e9a06, we shouldn't
blindly assume that foreign scans are not parallel-safe, but we can't
blindly assume the opposite either.  Maybe we should assume that the
foreign scan is parallel-safe only if one or more of the new methods
introduced by the aforementioned commit are set, but actually that
doesn't seem quite right.  That would tell us whether the scan itself
can be parallelized, not whether it's safe to run serially but within
a parallel worker.  I think maybe we need a new FDW API that gets
called from set_rel_consider_parallel() with the root, rel, and rte as
arguments and which can return a Boolean.  If the callback is not set,
assume false.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-02-22 Thread Tom Lane
Thomas Munro  writes:
> On Tue, Feb 23, 2016 at 4:03 AM, Tom Lane  wrote:
>> I've not looked at the test case to see if this is exactly what's
>> going wrong, but it's pretty easy to see how there might be a problem:
>> consider a STABLE user-defined function that does a SELECT from a foreign
>> table.  If that function call gets pushed down into a parallel worker
>> then it would fail as described.

> I thought user defined functions were not a problem since it's the
> user's responsibility to declare functions' parallel safety correctly.
> The manual says: "In general, if a function is labeled as being safe
> when it is restricted or unsafe, or if it is labeled as being
> restricted when it is in fact unsafe, it may throw errors or produce
> wrong answers when used in a parallel query"[1].

Hm.  I'm not terribly happy with this its-the-users-problem approach to
things, mainly because I have little confidence that somebody couldn't
figure out a security exploit based on it.

> The case of a plain old SELECT (as seen in the failing regression
> test) is definitely a problem though and FDW access there needs to be
> detected automatically.

Yes, the problem we're actually seeing in that regression test is not
dependent on a function wrapper.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-02-22 Thread Thomas Munro
On Tue, Feb 23, 2016 at 4:03 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Mon, Feb 22, 2016 at 11:02 AM, Thomas Munro
>>  wrote:
>>> The postgres_fdw failure is a visibility-of-my-own-uncommitted-work
>>> problem.  The first command in a transaction updates a row via an FDW,
>>> and then the SELECT expects to see the effects, but when run in a
>>> background worker it creates a new FDW connection that can't see the
>>> uncommitted UPDATE.
>
>> Foreign tables are supposed to be categorically excluded from
>> parallelism.  Not sure why that's not working in this instance.
>
> I've not looked at the test case to see if this is exactly what's
> going wrong, but it's pretty easy to see how there might be a problem:
> consider a STABLE user-defined function that does a SELECT from a foreign
> table.  If that function call gets pushed down into a parallel worker
> then it would fail as described.

I thought user defined functions were not a problem since it's the
user's responsibility to declare functions' parallel safety correctly.
The manual says: "In general, if a function is labeled as being safe
when it is restricted or unsafe, or if it is labeled as being
restricted when it is in fact unsafe, it may throw errors or produce
wrong answers when used in a parallel query"[1].  Uncommitted changes
on foreign tables are indeed invisible to functions declared as
PARALLEL SAFE, when run with force_parallel_mode = on,
max_parallel_degree = 2, but the default is UNSAFE and in that case
the containing query is never parallelised.  Perhaps the documentation
could use a specific mention of this subtlety with FDWs in the
PARALLEL section?

The case of a plain old SELECT (as seen in the failing regression
test) is definitely a problem though and FDW access there needs to be
detected automatically.  I also thought that
has_parallel_hazard_walker might be the right place for that logic, as
you suggested in your later message.

[1] http://www.postgresql.org/docs/devel/static/sql-createfunction.html

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-02-22 Thread Tom Lane
Robert Haas  writes:
>> Foreign tables are supposed to be categorically excluded from
>> parallelism.  Not sure why that's not working in this instance.

BTW, I wonder where you think that's supposed to be enforced, because
I sure can't find any such logic.

I suppose that has_parallel_hazard() would be the logical place to
notice foreign tables, but it currently doesn't even visit RTEs,
much less contain any code to check if their tables are foreign.
Or did you have another place in mind to do that?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-02-22 Thread Tom Lane
Robert Haas  writes:
> On Mon, Feb 22, 2016 at 11:02 AM, Thomas Munro
>  wrote:
>> The postgres_fdw failure is a visibility-of-my-own-uncommitted-work
>> problem.  The first command in a transaction updates a row via an FDW,
>> and then the SELECT expects to see the effects, but when run in a
>> background worker it creates a new FDW connection that can't see the
>> uncommitted UPDATE.

> Foreign tables are supposed to be categorically excluded from
> parallelism.  Not sure why that's not working in this instance.

I've not looked at the test case to see if this is exactly what's
going wrong, but it's pretty easy to see how there might be a problem:
consider a STABLE user-defined function that does a SELECT from a foreign
table.  If that function call gets pushed down into a parallel worker
then it would fail as described.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-02-22 Thread Robert Haas
On Mon, Feb 22, 2016 at 11:02 AM, Thomas Munro
 wrote:
> On Tue, Feb 16, 2016 at 12:12 PM, Noah Misch  wrote:
>> On Mon, Feb 15, 2016 at 06:07:48PM -0500, Tom Lane wrote:
>>> Noah Misch  writes:
>>> > I configured a copy of animal "mandrill" that way and launched a test run.
>>> > The postgres_fdw suite failed as attached.  A manual "make -C contrib
>>> > installcheck" fails the same way on a ppc64 GNU/Linux box, but it passes 
>>> > on
>>> > x86_64 and aarch64.  Since contrib test suites don't recognize 
>>> > TEMP_CONFIG,
>>> > check-world passes everywhere.
>>>
>>> Hm, is this with or without the ppc-related atomics fix you just found?
>>
>> Without those.  The ppc64 GNU/Linux configuration used gcc, though, and the
>> atomics change affects xlC only.  Also, the postgres_fdw behavior does not
>> appear probabilistic; it failed twenty times in a row.
>
> The postgres_fdw failure is a visibility-of-my-own-uncommitted-work
> problem.  The first command in a transaction updates a row via an FDW,
> and then the SELECT expects to see the effects, but when run in a
> background worker it creates a new FDW connection that can't see the
> uncommitted UPDATE.
>
> I wonder if parallelism of queries involving an FDW should not be
> allowed if your transaction has written through the FDW.

Foreign tables are supposed to be categorically excluded from
parallelism.  Not sure why that's not working in this instance.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-02-21 Thread Thomas Munro
On Tue, Feb 16, 2016 at 12:12 PM, Noah Misch  wrote:
> On Mon, Feb 15, 2016 at 06:07:48PM -0500, Tom Lane wrote:
>> Noah Misch  writes:
>> > I configured a copy of animal "mandrill" that way and launched a test run.
>> > The postgres_fdw suite failed as attached.  A manual "make -C contrib
>> > installcheck" fails the same way on a ppc64 GNU/Linux box, but it passes on
>> > x86_64 and aarch64.  Since contrib test suites don't recognize TEMP_CONFIG,
>> > check-world passes everywhere.
>>
>> Hm, is this with or without the ppc-related atomics fix you just found?
>
> Without those.  The ppc64 GNU/Linux configuration used gcc, though, and the
> atomics change affects xlC only.  Also, the postgres_fdw behavior does not
> appear probabilistic; it failed twenty times in a row.

The postgres_fdw failure is a visibility-of-my-own-uncommitted-work
problem.  The first command in a transaction updates a row via an FDW,
and then the SELECT expects to see the effects, but when run in a
background worker it creates a new FDW connection that can't see the
uncommitted UPDATE.

I wonder if parallelism of queries involving an FDW should not be
allowed if your transaction has written through the FDW.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-02-16 Thread Andrew Dunstan



On 02/15/2016 07:57 PM, Tom Lane wrote:

Noah Misch  writes:

On Mon, Feb 15, 2016 at 07:31:40PM -0500, Robert Haas wrote:

Oh, crap.  I didn't realize that TEMP_CONFIG didn't affect the contrib
test suites.  Is there any reason for that, or is it just kinda where
we ended up?

To my knowledge, it's just the undesirable place we ended up.

Yeah.  +1 for fixing that, if it's not unreasonably painful.





+1 for fixing it everywhere.

Historical note: back when TEMP_CONFIG was implemented, the main 
regression set was just about the only set of tests the buildfarm ran 
using a temp install. That wasn't even available for contrib and the 
PLs, IIRC.


cheers

andrew





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-02-15 Thread Tom Lane
Noah Misch  writes:
> On Mon, Feb 15, 2016 at 07:31:40PM -0500, Robert Haas wrote:
>> Oh, crap.  I didn't realize that TEMP_CONFIG didn't affect the contrib
>> test suites.  Is there any reason for that, or is it just kinda where
>> we ended up?

> To my knowledge, it's just the undesirable place we ended up.

Yeah.  +1 for fixing that, if it's not unreasonably painful.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-02-15 Thread Noah Misch
On Mon, Feb 15, 2016 at 07:31:40PM -0500, Robert Haas wrote:
> On Mon, Feb 15, 2016 at 5:52 PM, Noah Misch  wrote:
> > On Mon, Feb 08, 2016 at 02:49:27PM -0500, Robert Haas wrote:
> >> force_parallel_mode=regress
> >> max_parallel_degree=2
> >>
> >> And then run this: make check-world 
> >> TEMP_CONFIG=/path/to/aforementioned/file

> > I configured a copy of animal "mandrill" that way and launched a test run.
> > The postgres_fdw suite failed as attached.  A manual "make -C contrib
> > installcheck" fails the same way on a ppc64 GNU/Linux box, but it passes on
> > x86_64 and aarch64.  Since contrib test suites don't recognize TEMP_CONFIG,
> > check-world passes everywhere.
> 
> Oh, crap.  I didn't realize that TEMP_CONFIG didn't affect the contrib
> test suites.  Is there any reason for that, or is it just kinda where
> we ended up?

To my knowledge, it's just the undesirable place we ended up.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-02-15 Thread Robert Haas
On Mon, Feb 15, 2016 at 5:52 PM, Noah Misch  wrote:
> On Mon, Feb 08, 2016 at 02:49:27PM -0500, Robert Haas wrote:
>> Well, what I've done is push into the buildfarm code that will allow
>> us to do *the most exhaustive* testing that I know how to do in an
>> automated fashion. Which is to create a file that says this:
>>
>> force_parallel_mode=regress
>> max_parallel_degree=2
>>
>> And then run this: make check-world TEMP_CONFIG=/path/to/aforementioned/file
>>
>> Now, that is not going to find bugs in the deadlock.c portion of the
>> group locking patch, but it's been wildly successful in finding bugs
>> in other parts of the parallelism code, and there might well be a few
>> more that we haven't found yet, which is why I'm hoping that we'll get
>> this procedure running regularly either on all buildfarm machines, or
>> on some subset of them, or on new animals that just do this.
>
> I configured a copy of animal "mandrill" that way and launched a test run.
> The postgres_fdw suite failed as attached.  A manual "make -C contrib
> installcheck" fails the same way on a ppc64 GNU/Linux box, but it passes on
> x86_64 and aarch64.  Since contrib test suites don't recognize TEMP_CONFIG,
> check-world passes everywhere.

Oh, crap.  I didn't realize that TEMP_CONFIG didn't affect the contrib
test suites.  Is there any reason for that, or is it just kinda where
we ended up?

Retrying it the way you did it, I see the same errors here, so I think
this isn't a PPC-specific problem, but just a problem in general.
I've actually seen these kinds of errors before in earlier versions of
the testing code that eventually became force_parallel_mode.  I got
fooled into believing I'd fixed the problem because of my confusion
about how TEMP_CONFIG worked.  I think this is more likely to be a bug
in force_parallel_mode than a bug in the code that checks whether a
normal parallel query is safe, but I'll have to track it down before I
can say for sure.

Thanks for testing this.  It's not delightful to discover that I
muffed this, but better to find it now than in 6 months.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-02-15 Thread Noah Misch
On Mon, Feb 15, 2016 at 06:07:48PM -0500, Tom Lane wrote:
> Noah Misch  writes:
> > I configured a copy of animal "mandrill" that way and launched a test run.
> > The postgres_fdw suite failed as attached.  A manual "make -C contrib
> > installcheck" fails the same way on a ppc64 GNU/Linux box, but it passes on
> > x86_64 and aarch64.  Since contrib test suites don't recognize TEMP_CONFIG,
> > check-world passes everywhere.
> 
> Hm, is this with or without the ppc-related atomics fix you just found?

Without those.  The ppc64 GNU/Linux configuration used gcc, though, and the
atomics change affects xlC only.  Also, the postgres_fdw behavior does not
appear probabilistic; it failed twenty times in a row.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-02-15 Thread Tom Lane
Noah Misch  writes:
> I configured a copy of animal "mandrill" that way and launched a test run.
> The postgres_fdw suite failed as attached.  A manual "make -C contrib
> installcheck" fails the same way on a ppc64 GNU/Linux box, but it passes on
> x86_64 and aarch64.  Since contrib test suites don't recognize TEMP_CONFIG,
> check-world passes everywhere.

Hm, is this with or without the ppc-related atomics fix you just found?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers