Re: [HACKERS] a raft of parallelism-related bug fixes

2016-02-21 Thread Tom Lane
Robert Haas  writes:
> On Mon, Feb 8, 2016 at 2:36 PM, Joshua D. Drake  wrote:
>> I have no problem running any test cases you wish on a branch in a loop for
>> the next week and reporting back any errors.

> Well, what I've done is push into the buildfarm code that will allow
> us to do *the most exhaustive* testing that I know how to do in an
> automated fashion. Which is to create a file that says this:

> force_parallel_mode=regress
> max_parallel_degree=2

I did a few dozen runs of the core regression tests with those settings
(using current HEAD plus my lockGroupLeaderIdentifier-ectomy patch).
Roughly one time in ten, it fails in the stats test, with diffs as
attached.  I interpret this as meaning that parallel workers don't
reliably transmit stats to the stats collector, though maybe there
is something else happening.
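
For anyone wanting to reproduce this, it boils down to something like
the sketch below, assuming a settings file with the two lines quoted
above at an arbitrary path, and an arbitrary repeat count:

  # /tmp/parallel.conf holds force_parallel_mode=regress and
  # max_parallel_degree=2, as described above
  for i in $(seq 1 50); do
      make -C src/test/regress check TEMP_CONFIG=/tmp/parallel.conf || break
  done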

regards, tom lane

*** /home/postgres/pgsql/src/test/regress/expected/stats.out	Wed Mar  4 00:55:25 2015
--- /home/postgres/pgsql/src/test/regress/results/stats.out	Sun Feb 21 12:59:27 2016
***
*** 148,158 
   WHERE relname like 'trunc_stats_test%' order by relname;
relname  | n_tup_ins | n_tup_upd | n_tup_del | n_live_tup | n_dead_tup 
  ---+---+---+---++
!  trunc_stats_test  | 3 | 0 | 0 |  0 |  0
!  trunc_stats_test1 | 4 | 2 | 1 |  1 |  0
!  trunc_stats_test2 | 1 | 0 | 0 |  1 |  0
!  trunc_stats_test3 | 4 | 0 | 0 |  2 |  2
!  trunc_stats_test4 | 2 | 0 | 0 |  0 |  2
  (5 rows)
  
  SELECT st.seq_scan >= pr.seq_scan + 1,
--- 148,158 
   WHERE relname like 'trunc_stats_test%' order by relname;
relname  | n_tup_ins | n_tup_upd | n_tup_del | n_live_tup | n_dead_tup 
  ---+---+---+---++
!  trunc_stats_test  | 0 | 0 | 0 |  0 |  0
!  trunc_stats_test1 | 0 | 0 | 0 |  0 |  0
!  trunc_stats_test2 | 0 | 0 | 0 |  0 |  0
!  trunc_stats_test3 | 0 | 0 | 0 |  0 |  0
!  trunc_stats_test4 | 0 | 0 | 0 |  0 |  0
  (5 rows)
  
  SELECT st.seq_scan >= pr.seq_scan + 1,





Re: [HACKERS] a raft of parallelism-related bug fixes

2016-02-18 Thread Michael Paquier
On Thu, Feb 18, 2016 at 9:45 PM, Craig Ringer  wrote:
> That lets you  make assertions about replication behaviour. It was built for
> BDR and I think we'll need something along those lines in core if/when any
> kind of logical replication facilities land, for things like testing
> failover slots, etc.
>
> The patch is at:
>
> http://git.postgresql.org/gitweb/?p=2ndquadrant_bdr.git;a=commit;h=d859de3b13d39d4eddd91f3e6f316a48d31ee0fe
>
> and might be something it's worth having in core as we expand testing of
> replication, failover, etc.

Maybe there is an advantage to having it, but it's hard to form an
opinion without a complicated test case. At first sight, though, the two
could clearly work with each other: PostgresNode can set up a set of
nodes, and this patch would be in charge of more complex scenarios where
the same connection or transaction block is needed.
-- 
Michael




Re: [HACKERS] a raft of parallelism-related bug fixes

2016-02-18 Thread Craig Ringer
On 18 February 2016 at 20:35, Michael Paquier  wrote:

> On Thu, Feb 18, 2016 at 5:35 PM, Amit Langote  wrote:
> > On 2016/02/18 16:38, Craig Ringer wrote:
> >> I should resurrect Abhijit's patch to allow the isolationtester to talk to
> >> multiple servers. We'll want that when we're doing tests like "assert that
> >> this change isn't visible on the replica before it becomes visible on the
> >> master". (Well, except we violate that one with our funky
> >> synchronous_commit implementation...)
> >
> > How much does (or does not) that overlap with the recovery test suite work
> > undertaken by Michael et al? I saw some talk of testing for patches in the
> > works on the N synchronous standbys thread.
>
> This sounds like poll_query_until in PostgresNode.pm (already on HEAD)
> where the query used is something on pg_stat_replication for a given
> LSN to see if a standby has reached a given replay position.
>

No, it's quite different, though that's something handy to have that I've
emulated in the isolationtester using a plpgsql function.

The isolationtester changes in question allow isolationtester specs to run
different blocks against different hosts/ports/DBs.

That lets you  make assertions about replication behaviour. It was built
for BDR and I think we'll need something along those lines in core if/when
any kind of logical replication facilities land, for things like testing
failover slots, etc.

The patch is at:

http://git.postgresql.org/gitweb/?p=2ndquadrant_bdr.git;a=commit;h=d859de3b13d39d4eddd91f3e6f316a48d31ee0fe

and might be something it's worth having in core as we expand testing of
replication, failover, etc.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] a raft of parallelism-related bug fixes

2016-02-18 Thread Michael Paquier
On Thu, Feb 18, 2016 at 5:35 PM, Amit Langote  wrote:
> On 2016/02/18 16:38, Craig Ringer wrote:
>> I should resurrect Abhijit's patch to allow the isolationtester to talk to
>> multiple servers. We'll want that when we're doing tests like "assert that
>> this change isn't visible on the replica before it becomes visible on the
>> master". (Well, except we violate that one with our funky
>> synchronous_commit implementation...)
>
> How much does (or does not) that overlap with the recovery test suite work
> undertaken by Michael et al? I saw some talk of testing for patches in the
> works on the N synchronous standbys thread.

This sounds like poll_query_until in PostgresNode.pm (already on HEAD)
where the query used is something on pg_stat_replication for a given
LSN to see if a standby has reached a given replay position.
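
To illustrate the idea only (the LSN, the single-standby assumption and
the default connection settings below are made up for the example), it
boils down to rerunning a query like this on the primary until it
returns true:

  # wait until the (only) standby has replayed past a given LSN
  until psql -At -c "SELECT replay_location >= '0/3000000'::pg_lsn
                     FROM pg_stat_replication" | grep -qx t
  do
      sleep 1
  done
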
-- 
Michael




Re: [HACKERS] a raft of parallelism-related bug fixes

2016-02-18 Thread Amit Langote
On 2016/02/18 16:38, Craig Ringer wrote:
> I should resurrect Abhijit's patch to allow the isolationtester to talk to
> multiple servers. We'll want that when we're doing tests like "assert that
> this change isn't visible on the replica before it becomes visible on the
> master". (Well, except we violate that one with our funky
> synchronous_commit implementation...)

How much does (or does not) that overlap with the recovery test suite work
undertaken by Michael et al? I saw some talk of testing for patches in the
works on the N synchronous standbys thread.

Thanks,
Amit






Re: [HACKERS] a raft of parallelism-related bug fixes

2016-02-17 Thread Craig Ringer
On 9 February 2016 at 03:00, Joshua D. Drake  wrote:


>
> I think this further points to the need for more reviewers and fewer
> feature pushes. There are fundamental features that we could use, this is
> one of them. It is certainly more important than say pgLogical or BDR (not
> that those aren't useful but that we do have external solutions for that
> problem).



Well, with the pglogical and BDR work most of the work has been along
similar lines - getting the infrastructure in place. Commit timestamps,
logical decoding, and other features that are useful way beyond
pglogical/BDR. Logical decoding in particular is rapidly becoming a really
significant feature as people start to see the potential for it in
integration and ETL processes.

I'm not sure anyone takes the pglogical downstream submission as a serious
attempt at inclusion in 9.6, and even submitting the upstream was
significantly an RFC at least as far as 9.6 is concerned. I don't think the
downstream submission took any significant time or attention away from
other work.

The main result has been useful discussions on remaining pieces needed for
DDL replication etc and some greater awareness among others in the
community about what's going on in the area. I think that's a generally
useful thing.

>
>
> Oh: another thing that I would like to do is commit the isolation
>> tests I wrote for the deadlock detector a while back, which nobody has
>> reviewed either, though Tom and Alvaro seemed reasonably positive
>> about the concept.  Right now, the deadlock.c part of this patch isn't
>> tested at all by any of our regression test suites, because NOTHING in
>> deadlock.c is tested by any of our regression test suites.  You can
>> blow it up with dynamite and the regression tests are perfectly happy,
>> and that's pretty scary.
>>
>
> Test test test. Please commit.
>
>
Yeah. Enhancing the isolation tests would be useful. Please commit those
changes. Even if they broke something in the isolation tester - which isn't
likely - forward movement in test infrastructure is important and we should
IMO have a lower bar for committing changes there. They won't directly
affect code end users are running.

I should resurrect Abhijit's patch to allow the isolationtester to talk to
multiple servers. We'll want that when we're doing tests like "assert that
this change isn't visible on the replica before it becomes visible on the
master". (Well, except we violate that one with our funky
synchronous_commit implementation...)

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] a raft of parallelism-related bug fixes

2016-02-17 Thread Jim Nasby

On 2/8/16 4:39 PM, Peter Geoghegan wrote:

On Mon, Feb 8, 2016 at 2:35 PM, Andres Freund  wrote:

I think having a public git tree, that contains the current state, is
greatly helpful for that. Just announce that you're going to screw
wildly with history, and that you're not going to be terribly careful
about commit messages.  That means observers can just do a fetch and a
reset --hard to see the absolutely latest and greatest.  By all means
post a series to the list every now and then, but I think for minor
changes it's perfectly sane to say 'pull to see the fixups for the
issues you noticed'.


I would really like for there to be a way to do that more often. It
would be a significant time saver, because it removes problems with
minor bitrot.


Yeah, I think it's rather silly that we limit ourselves to only pushing 
patches through a mailing list. That's OK (maybe even better) for simple 
stuff, but once there's more than 1 patch it's a PITA.


There's an official github mirror of the code; ISTM it'd be good for 
major features to get posted to github forks in their own branches. I 
think that would also make it easy for buildfarm owners to run tests 
against trusted forks/branches.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] a raft of parallelism-related bug fixes

2016-02-08 Thread Andres Freund
On 2016-02-02 15:41:45 -0500, Robert Haas wrote:
> group-locking-v1.patch is a vastly improved version of the group
> locking patch that we discussed, uh, extensively last year.  I realize
> that there was a lot of doubt about this approach, but I still believe
> it's the right approach, I have put a lot of work into making it work
> correctly, I don't think anyone has come up with a really plausible
> alternative approach (except one other approach I tried which turned
> out to work but with significantly more restrictions), and I'm
> committed to fixing it in whatever way is necessary if it turns out to
> be broken, even if that amounts to a full rewrite.  Review is welcome,
> but I honestly believe it's a good idea to get this into the tree
> sooner rather than later at this point, because automated regression
> testing falls to pieces without these changes, and I believe that
> automated regression testing is a really good idea to shake out
> whatever bugs we may have in the parallel query stuff.  The code in
> this patch is all mine, but Amit Kapila deserves credit as co-author
> for doing a lot of prototyping (that ended up getting tossed) and
> testing.  This patch includes comments and an addition to
> src/backend/storage/lmgr/README which explain in more detail what this
> patch does, how it does it, and why that's OK.

I see you pushed group locking support. I do wonder if somebody has
actually reviewed this? On a quick scrollthrough it seems fairly
invasive, touching some parts where bugs are really hard to find.

I realize that this stuff has all been brewing long, and that there's
still a lot to do. So you gotta keep moving. And I'm not sure that
there's anything wrong or if there's any actually better approach. But
pushing an unreviewed, complex patch that originated in a thread
originally about different relatively small/mundane items, for a
contentious issue, a few days after the initial post. Hm. Not sure how
you'd react if you weren't the author.

Greetings,

Andres Freund




Re: [HACKERS] a raft of parallelism-related bug fixes

2016-02-08 Thread Robert Haas
On Mon, Feb 8, 2016 at 10:17 AM, Andres Freund  wrote:
> On 2016-02-02 15:41:45 -0500, Robert Haas wrote:
>> group-locking-v1.patch is a vastly improved version of the group
>> locking patch that we discussed, uh, extensively last year.  I realize
>> that there was a lot of doubt about this approach, but I still believe
>> it's the right approach, I have put a lot of work into making it work
>> correctly, I don't think anyone has come up with a really plausible
>> alternative approach (except one other approach I tried which turned
>> out to work but with significantly more restrictions), and I'm
>> committed to fixing it in whatever way is necessary if it turns out to
>> be broken, even if that amounts to a full rewrite.  Review is welcome,
>> but I honestly believe it's a good idea to get this into the tree
>> sooner rather than later at this point, because automated regression
>> testing falls to pieces without these changes, and I believe that
>> automated regression testing is a really good idea to shake out
>> whatever bugs we may have in the parallel query stuff.  The code in
>> this patch is all mine, but Amit Kapila deserves credit as co-author
>> for doing a lot of prototyping (that ended up getting tossed) and
>> testing.  This patch includes comments and an addition to
>> src/backend/storage/lmgr/README which explain in more detail what this
>> patch does, how it does it, and why that's OK.
>
> I see you pushed group locking support. I do wonder if somebody has
> actually reviewed this? On a quick scrollthrough it seems fairly
> invasive, touching some parts where bugs are really hard to find.
>
> I realize that this stuff has all been brewing long, and that there's
> still a lot to do. So you gotta keep moving. And I'm not sure that
> there's anything wrong or if there's any actually better approach. But
> pushing an unreviewed, complex patch that originated in a thread
> originally about different relatively small/mundane items, for a
> contentious issue, a few days after the initial post. Hm. Not sure how
> you'd react if you weren't the author.

Probably not very well.  Do you want me to revert it?

I mean, look.  Without that patch, parallel query is definitely
broken.  Just revert the patch and try running the regression tests
with force_parallel_mode=regress and max_parallel_degree>0.  It hangs
all over the place.  With the patch, every regression test suite we
have runs cleanly with those settings.  Without the patch, it's
trivial to construct a test case where parallel query experiences an
undetected deadlock.  With the patch, it appears to work reliably.
Could there be bugs someplace?  Yes, there absolutely could.  Do I really
think anybody was going to spend the time to understand deadlock.c
well enough to verify my changes?  No, I don't.  What I think would
have happened is that the patch would have sat around like an
albatross around my neck - totally ignored by everyone - until the end
of the last CF, and then the discussion would have gone one of three
ways:

1. Boy, this patch is complicated and I don't understand it.  Let's
reject it, even though without it parallel query is trivially broken!
Uh, we'll just let parallel query be broken.
2. Like #1, but we rip out parallel query in its entirety on the eve of beta.
3. Oh well, Robert says we need this, I guess we'd better let him commit it.

I don't find any of those options to be better than the status quo.
If the patch is broken, another two months of having it in the tree give
us a better chance of finding the bugs, especially because, combined
with the other patch which I also pushed, it enables *actual automated
regression testing* of the parallelism code, which I personally think
is a really good thing - and I'd like to see the buildfarm doing that
as soon as possible, so that we can find some of those bugs before
we're deep in beta.  Not just bugs in group locking but all sorts of
parallelism bugs that might be revealed by end-to-end testing.  The
*entire stack of patches* that began this thread was a response to
problems that were found by the automated testing that you can't do
without this patch.  Those bug fixes resulted in a huge increase in
the robustness of parallel query, and that would not have happened
without this code.  Every single one of those problems, some of them
in commits dating back years, was detected by the same method: run the
regression tests with parallel mode and parallel workers used for
every query for which that seems to be safe.

And, by the way, the patch, aside from the deadlock.c portion, was
posted back in October, admittedly without much fanfare, but nobody
reviewed that or any other patch on this thread.  If I'd waited for
those reviews to come in, parallel query would not be committed now,
nor probably in 9.6, nor probably in 9.7 or 9.8 either.  The whole
project would just be impossible on its face.  It would be impossible
in the first instance if I did not have a commit bit, because there is
just not enough committer bandwidth - even reviewer bandwidth more
generally - to review the number of patches that I've submitted
related to parallelism, so in the end some, perhaps many, of those are
going to be committed mostly on the strength of my personal opinion
that committing them is better than not committing them.  I am going
to have a heck of a lot of egg on my face if it turns out that I've
been too aggressive in pushing this stuff into the tree.  But,
basically, the alternative is that we don't get the feature, and I
think the feature is important enough to justify taking some risk.

Re: [HACKERS] a raft of parallelism-related bug fixes

2016-02-08 Thread Joshua D. Drake

On 02/08/2016 10:45 AM, Robert Haas wrote:

On Mon, Feb 8, 2016 at 10:17 AM, Andres Freund  wrote:

On 2016-02-02 15:41:45 -0500, Robert Haas wrote:



I realize that this stuff has all been brewing long, and that there's
still a lot to do. So you gotta keep moving. And I'm not sure that
there's anything wrong or if there's any actually better approach. But
pushing an unreviewed, complex patch that originated in a thread
originally about different relatively small/mundane items, for a
contentious issue, a few days after the initial post. Hm. Not sure how
you'd react if you weren't the author.


Probably not very well.  Do you want me to revert it?


If I am off base, please feel free to yell Latin at me again but isn't 
this exactly what different trees are for in Git? Would it be possible 
to say:


Robert says, "Hey pull XYZ, run ABC tests. They are what the parallelism 
fixes do"?


I can't review this patch but I can run a test suite on a number of 
platforms and see if it behaves as expected.




albatross around my neck - totally ignored by everyone - until the end
of the last CF, and then the discussion would have gone one of three
ways:

1. Boy, this patch is complicated and I don't understand it.  Let's
reject it, even though without it parallel query is trivially broken!
Uh, we'll just let parallel query be broken.
2. Like #1, but we rip out parallel query in its entirety on the eve of beta.
3. Oh well, Robert says we need this, I guess we'd better let him commit it.


4. We need to push the release so we can test this.



I don't find any of those options to be better than the status quo.
If the patch is broken, another two months of having it in the tree give
us a better chance of finding the bugs, especially because, combined


I think this further points to the need for more reviewers and fewer 
feature pushes. There are fundamental features that we could use, this 
is one of them. It is certainly more important than say pgLogical or BDR 
(not that those aren't useful but that we do have external solutions for 
that problem).




Oh: another thing that I would like to do is commit the isolation
tests I wrote for the deadlock detector a while back, which nobody has
reviewed either, though Tom and Alvaro seemed reasonably positive
about the concept.  Right now, the deadlock.c part of this patch isn't
tested at all by any of our regression test suites, because NOTHING in
deadlock.c is tested by any of our regression test suites.  You can
blow it up with dynamite and the regression tests are perfectly happy,
and that's pretty scary.


Test test test. Please commit.

Sincerely,

JD



--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.




Re: [HACKERS] a raft of parallelism-related bug fixes

2016-02-08 Thread Robert Haas
On Mon, Feb 8, 2016 at 4:11 PM, Peter Geoghegan  wrote:
> All that I wanted to do was look at EXPLAIN ANALYZE output that showed
> a parallel seq scan on my laptop, simply because I wanted to see a
> cool thing happen. I had to complain about it [1] to get clarification
> from you [2].
>
> I accept that this might have been a somewhat isolated incident (that
> I couldn't easily get *at least* a little instant gratification), but
> it still should be avoided. You've accused me of burying the lead
> plenty of times. Don't tell me that it was too hard to prominently
> place those details somewhere where I or any other contributor could
> reasonably expect to find them, like the CF app page, or a wiki page
> that is maintained on an ongoing basis (and linked to at the start of
> each thread). If I said that that was too much to you, you'd probably
> shout at me. If I persisted, you wouldn't commit my patch, and for me
> that probably means it's DOA.
>
> I don't think I'm asking for much here.

I don't think you are asking for too much; what I think is that Amit
and I were trying to do exactly the thing you asked for, and mostly
did.  On March 20th, Amit posted version 11 of the sequential scan
patch, and included directions about the order in which to apply the
patches:

http://www.postgresql.org/message-id/CAA4eK1JSSonzKSN=l-dwucewdlqkbmujvfpe3fgw2tn2zpo...@mail.gmail.com

On March 25th, Amit posted version 12 of the sequential scan patch,
and again included directions about which patches to apply:

http://www.postgresql.org/message-id/caa4ek1l50y0y1ogt_dh2eouyq-rqcnpvjboon2pcgjq+1by...@mail.gmail.com

On March 27th, Amit posted version 13 of the sequential scan patch,
which did not include those directions:

http://www.postgresql.org/message-id/caa4ek1lfr8sr9viuplpmkrquvcrhefdjsz1019rpwgjyftr...@mail.gmail.com

While perhaps Amit might have included directions again, I think it's
pretty reasonable that he felt that it might not be entirely necessary
to do so given that he had already done it twice in the last week.
This was still the state of affairs when you asked your question on
April 20th.  Two days after you asked that question, Amit posted
version 14 of the patch, and again included directions about what
patches to apply:

http://www.postgresql.org/message-id/caa4ek1jlv+2y1awjhsqpfiskhbf7jwf_nzirmzyno9upbrc...@mail.gmail.com

Far from the negligence that you seem to be implying, I think Amit was
remarkably diligent about providing these kinds of updates.  I
admittedly didn't duplicate those same updates on the parallel
mode/contexts thread to which you replied, but that's partly because I
would often whack around that patch first and then Amit would adjust
his patch to cope with my changes after the fact.  That doesn't seem
to have been the case in this particular example, but if this is the
closest thing you can come up with to a process failure during the
development of parallel query, I'm not going to be sad about it: I'm
going to have a beer.  Seriously: we worked really hard at this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] a raft of parallelism-related bug fixes

2016-02-08 Thread Peter Geoghegan
On Mon, Feb 8, 2016 at 12:18 PM, Robert Haas  wrote:
> So, there may be a person who knows how to do all of that
> work and get it done in a reasonable time frame and also knows how to
> make sure that everybody has the opportunity to be as involved in the
> process as they want to be and that there are no bugs or controversial
> design decisions, but I am not that person.  I am doing my best.
>
>> To be more specific, I thought it was really hard to test parallel
>> sequential scan a few months ago, because there were so many threads
>> and so many dependencies. I appreciate that we now use git
>> format-patch patch series for complicated stuff these days, but it's
>> important to make it clear how everything fits together. That's
>> actually what I was thinking about when I said we need to be clear on
>> how things fit together from the CF app patch page, because there
>> doesn't seem to be a culture of being particular about that, having
>> good "annotations", etc.
>
> I agree that you had to be pretty deeply involved in that thread to
> follow everything that was going on.  But it's not entirely fair to
> say that it was impossible for anyone else to get involved.

All that I wanted to do was look at EXPLAIN ANALYZE output that showed
a parallel seq scan on my laptop, simply because I wanted to see a
cool thing happen. I had to complain about it [1] to get clarification
from you [2].

I accept that this might have been a somewhat isolated incident (that
I couldn't easily get *at least* a little instant gratification), but
it still should be avoided. You've accused me of burying the lead
plenty of times. Don't tell me that it was too hard to prominently
place those details somewhere where I or any other contributor could
reasonably expect to find them, like the CF app page, or a wiki page
that is maintained on an ongoing basis (and linked to at the start of
each thread). If I said that that was too much to you, you'd probably
shout at me. If I persisted, you wouldn't commit my patch, and for me
that probably means it's DOA.

I don't think I'm asking for much here.

[1] 
http://www.postgresql.org/message-id/CAM3SWZSefE4uQk3r_3gwpfDWWtT3P51SceVsL4=g8v_me2a...@mail.gmail.com
[2] 
http://www.postgresql.org/message-id/ca+tgmoarttf8eptbhinwxukfkctsfc7wtzfhgegqywe8e2v...@mail.gmail.com
-- 
Peter Geoghegan




Re: [HACKERS] a raft of parallelism-related bug fixes

2016-02-08 Thread Joshua D. Drake

On 02/08/2016 01:11 PM, Peter Geoghegan wrote:

On Mon, Feb 8, 2016 at 12:18 PM, Robert Haas  wrote:



I accept that this might have been a somewhat isolated incident (that
I couldn't easily get *at least* a little instant gratification), but
it still should be avoided. You've accused me of burying the lead
plenty of times. Don't tell me that it was too hard to prominently
place those details somewhere where I or any other contributor could
reasonably expect to find them, like the CF app page, or a wiki page
that is maintained on an ongoing basis (and linked to at the start of
each thread). If I said that that was too much to you, you'd probably
shout at me. If I persisted, you wouldn't commit my patch, and for me
that probably means it's DOA.

I don't think I'm asking for much here.

[1] 
http://www.postgresql.org/message-id/CAM3SWZSefE4uQk3r_3gwpfDWWtT3P51SceVsL4=g8v_me2a...@mail.gmail.com
[2] 
http://www.postgresql.org/message-id/ca+tgmoarttf8eptbhinwxukfkctsfc7wtzfhgegqywe8e2v...@mail.gmail.com


This part of the thread seems like something that should be a new thread 
about how to write patches. I agree that patches that are large features 
that are in depth discussed on a maintained wiki page would be awesome. 
Creating that knowledge base without having to troll through code would 
be priceless in value.


Sincerely,

JD



--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.




Re: [HACKERS] a raft of parallelism-related bug fixes

2016-02-08 Thread Robert Haas
On Mon, Feb 8, 2016 at 2:00 PM, Joshua D. Drake  wrote:
> If I am off base, please feel free to yell Latin at me again but isn't this
> exactly what different trees are for in Git? Would it be possible to say:
>
> Robert says, "Hey pull XYZ, run ABC tests. They are what the parallelism
> fixes do"?
>
> I can't review this patch but I can run a test suite on a number of
> platforms and see if it behaves as expected.

Sure, I'd love to have the ability to push a branch into the buildfarm
and have the tests get run on all the buildfarm machines and let that
bake for a while before putting it into the main tree.  The problem
here is that the complicated part of this patch is something that's
only going to be tested in very rare cases.  The simple part of the
patch, which handles the simple-deadlock case, is easy to hit,
although apparently zero people other than Amit and me have found it in
the few months since parallel sequential scan was committed, which
makes me think people haven't tried very hard to break any part of
parallel query, which is a shame.  The really hairy part is in
deadlock.c, and it's actually very hard to hit that case.  It won't be
hit in real life except in pretty rare circumstances.  So testing is
good, but you not only need to know what you are testing, you probably
also need an automated tool that can run the test a gazillion times in
a loop, or to be really clever and find a test case that Amit and I
didn't foresee.  And the reality is that getting anybody independent
of the parallel query effort to take an interest in deep testing has
not gone anywhere at all up until now.  I'd be happy for that to
change, whether because of this commit or for any other reason.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] a raft of parallelism-related bug fixes

2016-02-08 Thread Alvaro Herrera
Robert Haas wrote:

> Oh: another thing that I would like to do is commit the isolation
> tests I wrote for the deadlock detector a while back, which nobody has
> reviewed either, though Tom and Alvaro seemed reasonably positive
> about the concept.  Right now, the deadlock.c part of this patch isn't
> tested at all by any of our regression test suites, because NOTHING in
> deadlock.c is tested by any of our regression test suites.  You can
> blow it up with dynamite and the regression tests are perfectly happy,
> and that's pretty scary.

FWIW a couple of months back I thought you had already pushed that one
and was surprised to find that you hadn't.  +1 from me on pushing it.
(I don't mean specifically the deadlock tests, but rather the
isolationtester changes that allowed you to have multiple blocked
backends.)

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] a raft of parallelism-related bug fixes

2016-02-08 Thread Robert Haas
On Mon, Feb 8, 2016 at 2:48 PM, Peter Geoghegan  wrote:
> FWIW, I appreciate your candor. However, I think that you could have
> done a better job of making things easier for reviewers, even if that
> might not have made an enormous difference. I suspect I would have not
> been able to get UPSERT done as a non-committer if it wasn't for the
> epic wiki page, that made it at least possible for someone to jump in.

I'm not going to argue with the proposition that it could have been
done better.  Equally, I'm going to disclaim having the ability to
have done it better.  I've been working on this for three years, and
most of the work that I've put into it has gone into tinkering with C
code that was not in any way user-testable.  I've modified essentially
every major component of the system.  We had a shared memory facility;
I built another one.  We had background workers; I overhauled them.  I
invented a message queueing system, and then layered a modified
version of the FE/BE protocol on top of that message queue, and then
later layered tuple-passing on top of that same message queue and then
invented a bespoke protocol that is used to handle typemod mapping.
We had a transaction system; I made substantial, invasive
modifications to it.  I tinkered with the GUC subsystem, the combocid
system, and the system for loading loadable modules.  Amit added read
functions to a whole class of nodes that never had them before and
together we overhauled core pieces of the executer machinery.  Then I
hit the planner with hammer.  Finally there's this patch, which
affects heavyweight locking and deadlock detection.  I don't believe
that during the time I've been involved with this project anyone else
has ever attempted a project that required changing as many subsystems
as this one did - in some cases rather lightly, but in a number of
cases in pretty significant, invasive ways.  No other project in
recent memory has been this invasive to my knowledge.  Hot Standby
probably comes closest, but I think (admittedly being much closer to
this work than I was to that work) that this has its fingers in more
places.  So, there may be a person who knows how to do all of that
work and get it done in a reasonable time frame and also knows how to
make sure that everybody has the opportunity to be as involved in the
process as they want to be and that there are no bugs or controversial
design decisions, but I am not that person.  I am doing my best.

> To be more specific, I thought it was really hard to test parallel
> sequential scan a few months ago, because there were so many threads
> and so many dependencies. I appreciate that we now use git
> format-patch patch series for complicated stuff these days, but it's
> important to make it clear how everything fits together. That's
> actually what I was thinking about when I said we need to be clear on
> how things fit together from the CF app patch page, because there
> doesn't seem to be a culture of being particular about that, having
> good "annotations", etc.

I agree that you had to be pretty deeply involved in that thread to
follow everything that was going on.  But it's not entirely fair to
say that it was impossible for anyone else to get involved.   Both
Amit and I, mostly Amit, posted directions at various times saying:
here is the sequence of patches that you currently need to apply as of
this time.  There was not a heck of a lot of evidence that anyone was
doing that, though, though I think a few people did, and towards the
end things changed very quickly as I committed patches in the series.
We certainly knew what each other were doing and not because of some
hidden off-list collaboration that we kept secret from the community -
we do talk every week, but almost all of our correspondence on those
patches was on-list.

I think it's an inherent peril of complicated patch sets that people
who are not intimately involved in what is going on will have trouble
following just because it takes a lot of work.  Is anybody here
following what is going on on the postgres_fdw join pushdown thread?
There's only one patch to apply there right now (though there have
been as many as four at times in the past) and the people who are
actually working on it can follow along, but I'm not a bit surprised
if other people feel lost.  It's hard to think that the cause of that
is anything other than "it's hard to find the time to get invested in
a patch that other people are already working hard and apparently
diligently on, especially if you're not personally interested in
seeing that patch get committed, but sometimes even if you are".  For
example, I really want the work Fabien and Andres are doing on the
checkpointer to get committed this release.  I am reading the emails,
but I haven't tried the patches and I probably won't.  I don't have
time to be that involved in every patch.  I'm trusting that whatever
Andres commits - which will probably be a whole lot more complex than

Re: [HACKERS] a raft of parallelism-related bug fixes

2016-02-08 Thread Peter Geoghegan
On Mon, Feb 8, 2016 at 10:45 AM, Robert Haas  wrote:
> And, by the way, the patch, aside from the deadlock.c portion, was
> posted back in October, admittedly without much fanfare, but nobody
> reviewed that or any other patch on this thread.  If I'd waited for
> those reviews to come in, parallel query would not be committed now,
> nor probably in 9.6, nor probably in 9.7 or 9.8 either.  The whole
> project would just be impossible on its face.  It would be impossible
> in the first instance if I did not have a commit bit, because there is
> just not enough committer bandwidth - even reviewer bandwidth more
> generally - to review the number of patches that I've submitted
> related to parallelism, so in the end some, perhaps many, of those are
> going to be committed mostly on the strength of my personal opinion
> that committing them is better than not committing them.  I am going
> to have a heck of a lot of egg on my face if it turns out that I've
> been too aggressive in pushing this stuff into the tree.  But,
> basically, the alternative is that we don't get the feature, and I
> think the feature is important enough to justify taking some risk.

FWIW, I appreciate your candor. However, I think that you could have
done a better job of making things easier for reviewers, even if that
might not have made an enormous difference. I suspect I would have not
been able to get UPSERT done as a non-committer if it wasn't for the
epic wiki page, that made it at least possible for someone to jump in.

To be more specific, I thought it was really hard to test parallel
sequential scan a few months ago, because there were so many threads
and so many dependencies. I appreciate that we now use git
format-patch patch series for complicated stuff these days, but it's
important to make it clear how everything fits together. That's
actually what I was thinking about when I said we need to be clear on
how things fit together from the CF app patch page, because there
doesn't seem to be a culture of being particular about that, having
good "annotations", etc.

-- 
Peter Geoghegan




Re: [HACKERS] a raft of parallelism-related bug fixes

2016-02-08 Thread Joshua D. Drake

On 02/08/2016 11:24 AM, Robert Haas wrote:

On Mon, Feb 8, 2016 at 2:00 PM, Joshua D. Drake  wrote:

If I am off base, please feel free to yell Latin at me again but isn't this
exactly what different trees are for in Git? Would it be possible to say:

Robert says, "Hey pull XYZ, run ABC tests. They are what the parallelism
fixes do"?

I can't review this patch but I can run a test suite on a number of
platforms and see if it behaves as expected.


Sure, I'd love to have the ability to push a branch into the buildfarm
and have the tests get run on all the buildfarm machines and let that
bake for a while before putting it into the main tree.  The problem
here is that the complicated part of this patch is something that's
only going to be tested in very rare cases.  The simple part of the



I have no problem running any test cases you wish on a branch in a loop 
for the next week and reporting back any errors.


Where this gets tricky is the tooling itself. For me to be able to do so 
(and others really) I need to be able to do this:


* Download (preferably a tarball but I can do a git pull)
* Exact instructions on how to set up the tests
* Exact instructions on how to run the tests
* Exact instructions on how to report the tests

If anyone takes the time to do that, I will take the time and resources 
to run them.


What I can't do, is fiddle around trying to figure out how to set this 
stuff up. I don't have the time and it isn't productive for me. I don't 
think I am the only one in this boat.


Let's be honest, a lot of people won't even bother to play with this
until we release 9.6.0, even though it is easily one of the best
features we have coming in 9.6. That is a bad time to be testing.


The easier we make it for people like me - practitioners - to test, the
better it is for the whole project.


Sincerely,

JD




--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.




Re: [HACKERS] a raft of parallelism-related bug fixes

2016-02-08 Thread Robert Haas
On Mon, Feb 8, 2016 at 2:36 PM, Joshua D. Drake  wrote:
> I have no problem running any test cases you wish on a branch in a loop for
> the next week and reporting back any errors.
>
> Where this gets tricky is the tooling itself. For me to be able to do so
> (and others really) I need to be able to do this:
>
> * Download (preferably a tarball but I can do a git pull)
> * Exact instructions on how to set up the tests
> * Exact instructions on how to run the tests
> * Exact instructions on how to report the tests
>
> If anyone takes the time to do that, I will take the time and resources to
> run them.

Well, what I've done is push into the buildfarm code that will allow
us to do *the most exhaustive* testing that I know how to do in an
automated fashion. Which is to create a file that says this:

force_parallel_mode=regress
max_parallel_degree=2

And then run this: make check-world TEMP_CONFIG=/path/to/aforementioned/file
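
Spelled out as shell commands, with an arbitrary location for the extra
config file, that's roughly:

  printf 'force_parallel_mode=regress\nmax_parallel_degree=2\n' > /tmp/parallel.conf
  make check-world TEMP_CONFIG=/tmp/parallel.conf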

Now, that is not going to find bugs in the deadlock.c portion of the
group locking patch, but it's been wildly successful in finding bugs
in other parts of the parallelism code, and there might well be a few
more that we haven't found yet, which is why I'm hoping that we'll get
this procedure running regularly either on all buildfarm machines, or
on some subset of them, or on new animals that just do this.

Testing the deadlock.c changes is harder.  I don't know of a good way
to do it in an automated fashion, which is why I also posted the test
code Amit devised which allows construction of manual test cases.
Constructing a manual test case is *hard* but doable.  I think it
would be good to automate this and if somebody's got a good idea about
how to fuzz test it I think that would be *great*.  But that's not
easy to do.  We haven't had any testing at all of the deadlock
detector up until now, but somehow the deadlock detector itself has
been in the tree for  a very long time...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] a raft of parallelism-related bug fixes

2016-02-08 Thread Robert Haas
On Mon, Feb 8, 2016 at 4:54 PM, Peter Geoghegan  wrote:
> On Mon, Feb 8, 2016 at 1:45 PM, Robert Haas  wrote:
>> Far from the negligence that you seem to be implying, I think Amit was
>> remarkably diligent about providing these kinds of updates.
>
> I don't think I remotely implied negligence. That word has very severe
> connotations (think "criminal negligence") that are far from what I
> intended.

OK, sorry, I think I misread your tone.

> I don't want to get stuck on that one example, which I acknowledged
> might not be representative when I raised it. I'm not really talking
> about parallel query in particular anyway. I'm mostly arguing for a
> consistent way to get instructions on how to at least build the patch,
> where that might be warranted.
>
> The CF app is one way. Another good way is: As long as we're using a
> patch series, be explicit about what goes where in the commit message.
> Have message-id references. That sort of thing. I already try to do
> that. That's all.

Yeah, me too.  Generally, although with some exceptions, my practice
is to keep reposting the whole patch stack, so that everything is in
one email.  In this particular case, though, there were patches from
me and patches from Amit, so that was harder to do.  I wasn't using
his patches to test my patches; I had other test code for that.  He
was using my patches as a base for his patches, but linked to them
instead of reposting them.  That's an unusually complicated scenario,
though: it's pretty rare around here to have two developers working
together on something as closely as Amit and I did on those patches.

> Thank you (and Amit) for working really hard on parallelism.

Thanks.

By the way, it bears saying, or if I've said it before repeating, that
although most of the parallelism code that has been committed was
written by me, Amit has made an absolutely invaluable contribution to
parallel query, and it wouldn't be committed today or maybe ever
without that contribution.  In addition to those parts of the code
that were committed as he wrote them, he prototyped quite a number of
things that I ended up rewriting, reviewed a ton of code that I wrote
and found bugs in it, wrote numerous bits and pieces of test code, and
generally put up with an absolutely insane level of me nitpicking his
work, breaking it by committing pieces of it or committing different
pieces that replaced pieces he had, demanding repeated rebases on
short time scales, and generally beating him up in just about every
conceivable way.  I am deeply appreciative of him being willing to
jump into this project, do a ton of work, and put up with me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] a raft of parallelism-related bug fixes

2016-02-08 Thread Peter Geoghegan
On Mon, Feb 8, 2016 at 1:45 PM, Robert Haas  wrote:
> Far from the negligence that you seem to be implying, I think Amit was
> remarkably diligent about providing these kinds of updates.

I don't think I remotely implied negligence. That word has very severe
connotations (think "criminal negligence") that are far from what I
intended.

> I admittedly didn't duplicate those same updates on the parallel
> mode/contexts thread to which you replied, but that's partly because I
> would often whack around that patch first and then Amit would adjust
> his patch to cope with my changes after the fact.  That doesn't seem
> to have been the case in this particular example, but if this is the
> closest thing you can come up with to a process failure during the
> development of parallel query, I'm not going to be sad about it: I'm
> going to have a beer.  Seriously: we worked really hard at this.

I don't want to get stuck on that one example, which I acknowledged
might not be representative when I raised it. I'm not really talking
about parallel query in particular anyway. I'm mostly arguing for a
consistent way to get instructions on how to at least build the patch,
where that might be warranted.

The CF app is one way. Another good way is: As long as we're using a
patch series, be explicit about what goes where in the commit message.
Have message-id references. That sort of thing. I already try to do
that. That's all.

Thank you (and Amit) for working really hard on parallelism.

-- 
Peter Geoghegan




Re: [HACKERS] a raft of parallelism-related bug fixes

2016-02-08 Thread Peter Geoghegan
On Mon, Feb 8, 2016 at 2:35 PM, Andres Freund  wrote:
> I think having a public git tree, that contains the current state, is
> greatly helpful for that. Just announce that you're going to screw
> wildly with history, and that you're not going to be terribly careful
> about commit messages.  That means observers can just do a fetch and a
> reset --hard to see the absolutely latest and greatest.  By all means
> post a series to the list every now and then, but I think for minor
> changes it's perfectly sane to say 'pull to see the fixups for the
> issues you noticed'.

I would really like for there to be a way to do that more often. It
would be a significant time saver, because it removes problems with
minor bitrot.

-- 
Peter Geoghegan




Re: [HACKERS] a raft of parallelism-related bug fixes

2016-02-08 Thread Andres Freund
Hi Robert,

On 2016-02-08 13:45:37 -0500, Robert Haas wrote:
> > I realize that this stuff has all been brewing long, and that there's
> > still a lot to do. So you gotta keep moving. And I'm not sure that
> > there's anything wrong or if there's any actually better approach. But
> > pushing an unreviewed, complex patch that originated in a thread
> > originally about different relatively small/mundane items, for a
> > contentious issue, a few days after the initial post. Hm. Not sure how
> > you'd react if you weren't the author.
> 
> Probably not very well.  Do you want me to revert it?

No. I want(ed) to express that I am not comfortable with how this got
in. My aim wasn't to generate a flurry of responses with everybody
piling on, or anything like that. But it's unfortunately hard to
avoid. I wish I knew a way, besides only sending private mails. Which I
don't think is a great approach either.

I do agree that we need something to tackle this problem, and that this
quite possibly is the least bad way to do this. And certainly the only
one that's been implemented and posted with any degree of completeness.

But even given the last paragraph, posting a complex new patch in a
somewhat related thread, and then pushing it 5 days later is pretty darn
quick.


> I mean, look.  [explanation why we need the infrastructure].  Do I really
> think anybody was going to spend the time to understand deadlock.c
> well enough to verify my changes?  No, I don't.  What I think would
> have happened is that the patch would have sat around like an
> albatross around my neck - totally ignored by everyone - until the end
> of the last CF, and then the discussion would have gone one of three
> ways:

Yes, believe me, I really get that. It's awfully hard to get substantial
review for pieces of code that require a lot of context.

But I think posting this patch in a new thread, posting a message that
you're intending to commit unless somebody protests with substantial
arguments and/or a timeline for review, and then waiting a few days, is
something that should be done for a major piece of new infrastructure,
especially when it's somewhat controversial.

This doesn't just affect parallel execution, it affects one of the
least understood parts of the postgres code, one where hard-to-find
bugs, likely to trigger only in production, are to be expected.


> And, by the way, the patch, aside from the deadlock.c portion, was
> posted back in October, admittedly without much fanfare, but nobody
> reviewed that or any other patch on this thread.

I think it's unrealistic to expect random patches without a commitfest
entry, posted somewhere deep in a thread, to get a review when there are
so many open commitfest entries that haven't gotten feedback, and which
we are supposed to look at.


> If I'd waited for those reviews to come in, parallel query would not
> be committed now, nor probably in 9.6, nor probably in 9.7 or 9.8
> either.  The whole project would just be impossible on its face.

Yes, that's a problem. But you're not the only one facing it, and you've
argued hard against such an approach in some other cases.


> I think it's myopic to say "well, but this patch might have bugs".
> Very true.  But also, all the other parallelism patches that are
> already committed or that are still under review but which can't be
> properly tested without this patch might have bugs, too, so you've got
> to weigh the risk that this patch might get better if I wait longer to
> commit it against the possibility that not having committed it reduces
> the chances of finding bugs elsewhere.  I don't want it to seem like
> I'm forcing this down the community's throat - I don't have a right to
> do that, and I will certainly revert this patch if that is the
> consensus.  But that is not what I think best myself.  What I think
> would be better is to (1) make an effort to get the buildfarm testing
> which this patch enables up and running as soon as possible and (2)
> for somebody to read over the committed code and raise any issues that
> they find.  Or, for that matter, to read over the committed code for
> any of the *other* parallelism patches and raise any issues that they
> find with *that* code.  There's certainly scads of code here and this
> is far from the only bit that might have bugs.

I think you are, and *you have to*, walk a very thin line here. I agree
that realistically there's just nobody with the bandwidth to keep up
with a fully loaded Robert. Not without ignoring their own stuff at
least. And I think the importance of what you're building means we need
to be flexible.  But I think that thin line in turn means that you have
to be *doubly* careful about communication. I.e. post new infrastructure
to new threads, "warn" that you're intending to commit something
potentially needing debate/review, etc.


> Oh: another thing that I would like to do is commit the isolation
> tests I wrote for the deadlock detector a while back, which nobody has
> 

Re: [HACKERS] a raft of parallelism-related bug fixes

2016-02-08 Thread Andres Freund
On 2016-02-08 15:18:13 -0500, Robert Haas wrote:
> I agree that you had to be pretty deeply involved in that thread to
> follow everything that was going on.  But it's not entirely fair to
> say that it was impossible for anyone else to get involved.   Both
> Amit and I, mostly Amit, posted directions at various times saying:
> here is the sequence of patches that you currently need to apply as of
> this time.  There was not a heck of a lot of evidence that anyone was
> doing that, though, though I think a few people did, and towards the
> end things changed very quickly as I committed patches in the series.
> We certainly knew what each other were doing and not because of some
> hidden off-list collaboration that we kept secret from the community -
> we do talk every week, but almost all of our correspondence on those
> patches was on-list.

I think having a public git tree, that contains the current state, is
greatly helpful for that. Just announce that you're going to screw
wildly with history, and that you're not going to be terribly careful
about commit messages.  That means observers can just do a fetch and a
reset --hard to see the absolutely latest and greatest.  By all means
post a series to the list every now and then, but I think for minor
changes it's perfectly sane to say 'pull to see the fixups for the
issues you noticed'.


> I think it's an inherent peril of complicated patch sets that people
> who are not intimately involved in what is going on will have trouble
> following just because it takes a lot of work.

True. But it becomes doubly hard if there's no up-to-date high level
design overview somewhere outside $sizeable_brain. I know it sucks to
write these, believe me. Especially because one definitely feels that
nobody is reading those.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] a raft of parallelism-related bug fixes

2016-02-08 Thread Robert Haas
On Mon, Feb 8, 2016 at 5:27 PM, Andres Freund  wrote:
>> > contentious issue, a few days after the initial post. Hm. Not sure how
>> > you'd react if you weren't the author.
>>
>> Probably not very well.  Do you want me to revert it?
>
> No. I want(ed) to express that I am not comfortable with how this got
> in. My aim wasn't to generate a flurry of responses with everybody
> piling on, or anything like that. But it's unfortunately hard to
> avoid. I wish I knew a way, besides only sending private mails. Which I
> don't think is a great approach either.
>
> I do agree that we need something to tackle this problem, and that this
> quite possibly is the least bad way to do this. And certainly the only
> one that's been implemented and posted with any degree of completeness.
>
> But even given the last paragraph, posting a complex new patch in a
> somewhat related thread, and then pushing it 5 days later is pretty darn
> quick.

Sorry.  I understand your discomfort, and you're probably right.  I'll
try to handle it better next time.  I think my frustration with the
process got the better of me a little bit here.  This patch may very
well not be perfect, but it's sure as heck better than doing nothing,
and if I'd gone out of my way to say "hey, everybody, here's a patch
that you might want to object to" I'm sure I could have found some
volunteers to do just that.  But, you know, that's not really what I
want.  What I want is somebody to do a detailed review and help me fix
whatever the problems the patch may have.  And ideally, I'd like that
person to understand that you can't have parallel query without doing
something in this area - which I think you do, but certainly not
everybody probably did - and that a lot of simplistic, non-invasive
ideas for how to handle this are going to be utterly inadequate in
complex cases.  Unless you or Noah want to take a hand, I don't expect
to get that sort of review.  Now, that having been said, I think your
frustration with the way I handled it is somewhat justified, and since
you are not arguing for a revert I'm not sure what I can do except try
not to let my frustration get in the way next time.  Which I will try
to do.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] a raft of parallelism-related bug fixes

2016-02-08 Thread Andres Freund
Hi!

Thanks for the answer. Sounds good.

On 2016-02-08 18:47:18 -0500, Robert Haas wrote:
> and if I'd gone out of my way to say "hey, everybody, here's a patch
> that you might want to object to" I'm sure I could have found some
> volunteers to do just that.  But, you know, that's not really what I
> want.

Sometimes I wonder if three shooting-from-the-hip answers shouldn't cost
a jog around the block or such (of which I'm sometimes guilty as
well!). Wouldn't just help the on-list volume, but also our collective
health ;)

> Unless you or Noah want to take a hand, I don't expect to get that
> sort of review.  Now, that having been said, I think your frustration
> with the way I handled it is somewhat justified, and since you are not
> arguing for a revert I'm not sure what I can do except try not to let
> my frustration get in the way next time.  Which I will try to do.

FWIW, I do hope to put more time into reviewing parallelism stuff in the
coming weeks. It's hard to balance all that one likes to do.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] a raft of parallelism-related bug fixes

2016-02-02 Thread Robert Haas
On Mon, Oct 19, 2015 at 12:02 PM, Robert Haas  wrote:
> On Sat, Oct 17, 2015 at 9:16 PM, Andrew Dunstan  wrote:
>> If all that is required is a #define, like CLOBBER_CACHE_ALWAYS, then no
>> special buildfarm support is required - you would just add that to the
>> animal's config file, more or less like this:
>>
>>  config_env =>
>>  {
>>  CPPFLAGS => '-DGRATUITOUSLY_PARALLEL',
>>  },
>>
>> I try to make things easy :-)
>
> Wow, that's great.  So, I'll try to rework the test code I posted
> previously into something less hacky, and eventually add a #define
> like this so we can run it on the buildfarm.  There's a few other
> things that need to get done before that really makes sense - like
> getting the rest of the bug fix patches committed - otherwise any
> buildfarm critters we add will just be permanently red.

OK, so after a bit more delay than I would have liked, I now have a
working set of patches that we can use to ensure automated testing of
the parallel mode infrastructure.  I ended up doing something that
does not require a #define, so I'll need some guidance on what to do
on the BF side given that context.  Please find attached three
patches, two of them for commit.

group-locking-v1.patch is a vastly improved version of the group
locking patch that we discussed, uh, extensively last year.  I realize
that there was a lot of doubt about this approach, but I still believe
it's the right approach, I have put a lot of work into making it work
correctly, I don't think anyone has come up with a really plausible
alternative approach (except one other approach I tried which turned
out to work but with significantly more restrictions), and I'm
committed to fixing it in whatever way is necessary if it turns out to
be broken, even if that amounts to a full rewrite.  Review is welcome,
but I honestly believe it's a good idea to get this into the tree
sooner rather than later at this point, because automated regression
testing falls to pieces without these changes, and I believe that
automated regression testing is a really good idea to shake out
whatever bugs we may have in the parallel query stuff.  The code in
this patch is all mine, but Amit Kapila deserves credit as co-author
for doing a lot of prototyping (that ended up getting tossed) and
testing.  This patch includes comments and an addition to
src/backend/storage/lmgr/README which explain in more detail what this
patch does, how it does it, and why that's OK.

force-parallel-mode-v1.patch is what adds the actual infrastructure
for automated testing.  You can set force_parallel_mode=on to force
queries to be run in a worker whenever possible; this can help test
whether your user-defined functions have been erroneously labeled as
PARALLEL SAFE.  If they error out or misbehave with this setting
enabled, you should label them PARALLEL RESTRICTED or PARALLEL UNSAFE.
If you set force_parallel_mode=regress, then some additional changes
intended specifically for regression testing kick in; those changes
are intended to ensure that you get exactly the same output from
running the regression tests with the parallelism infrastructure
forcibly enabled that you would have gotten anyway.  Most of this code
is mine, but there are also contributions from Amit Kapila and Rushabh
Lathia.
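
To make the first mode a bit more concrete, here's a rough, made-up sketch
of the kind of manual check a function author could run (my_func is just a
placeholder, not anything in the patches):

SET force_parallel_mode = on;
SET max_parallel_degree = 2;
CREATE FUNCTION my_func(x bigint) RETURNS bigint
    LANGUAGE sql PARALLEL SAFE  -- the label under test; if it's wrong, expect errors or bogus answers below
    AS 'SELECT x + 1';
SELECT my_func(q1) FROM int8_tbl;  -- should now be pushed into a parallel worker wherever possible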

With both of these patches, you can create a file that says:

force_parallel_mode=regress
max_parallel_degree=2

Then you can run: make check-world TEMP_CONFIG=/path/to/aforementioned/file

If you do, you'll find that while the core regression tests pass
(whee!) the pg_upgrade regression tests fail (oops) because of a
pre-existing bug in the parallelism code introduced by neither of
these two patches.  I'm not exactly sure how to fix that bug yet - I
have a couple of ideas - but I think the fact that this test code
promptly found a bug is good sign that it provides enough test
coverage to be useful.  Sticking a Gather node on top of every query
where it looks safe just turns out to exercise a lot of things: the
code that decides whether it's safe to put that Gather node, the code
to launch and manage parallel workers, the code those workers
themselves run, etc.  The point is just to force as much of the
parallel code to be used as possible even when it's not expected to
make anything faster.

test-group-locking-v1.patch is useful for testing possible deadlock
scenarios with the group locking patch.  It's not otherwise safe to
use this, like, at all, and the patch is not proposed for commit.
This patch is entirely by Amit Kapila.

In addition to what's in these patches, I'd like to add a new chapter
to the documentation explaining which queries can be parallelized and
in what ways, what the restrictions are that keep parallel query from
getting used, and some high-level details of how parallelism "works"
in PostgreSQL from a user perspective.  Things will obviously change
here as we get more capabilities, but I think 

Re: [HACKERS] a raft of parallelism-related bug fixes

2015-11-06 Thread Robert Haas
On Mon, Nov 2, 2015 at 9:29 PM, Robert Haas  wrote:
> On Wed, Oct 28, 2015 at 10:23 AM, Robert Haas  wrote:
>> On Sun, Oct 18, 2015 at 12:17 AM, Robert Haas  wrote:
>>>> So reviewing patch 13 isn't possible without prior knowledge.
>>>
>>> The basic question for patch 13 is whether ephemeral record types can
>>> occur in executor tuples in any contexts that I haven't identified.  I
>>> know that a tuple table slot can contain a column that is of type
>>> record or record[], and those records can themselves contain
>>> attributes of type record or record[], and so on as far down as you
>>> like.  I *think* that's the only case.  For example, I don't believe
>>> that a TupleTableSlot can contain a *named* record type that has an
>>> anonymous record buried down inside of it somehow.  But I'm not
>>> positive I'm right about that.
>>
>> I have done some more testing and investigation and determined that
>> this optimism was unwarranted.  It turns out that the type information
>> for composite and record types gets stored in two different places.
>> First, the TupleTableSlot has a type OID, indicating the sort of the
>> value it expects to be stored for that slot attribute.  Second, the
>> value itself contains a type OID and typmod.  And these don't have to
>> match.  For example, consider this query:
>>
>> select row_to_json(i) from int8_tbl i(x,y);
>>
>> Without i(x,y), the HeapTuple passed to row_to_json is labelled with
>> the pg_type OID of int8_tbl.  But with the query as written, it's
>> labeled as an anonymous record type.  If I jigger things by hacking
>> the code so that this is planned as Gather (single-copy) -> SeqScan,
>> with row_to_json evaluated at the Gather node, then the sequential
>> scan kicks out a tuple with a transient record type and stores it into
>> a slot whose type OID is still that of int8_tbl.  My previous patch
>> failed to deal with that; the attached one does.
>>
>> The previous patch was also defective in a few other respects.  The
>> most significant of those, maybe, is that it somehow thought it was OK
>> to assume that transient typmods from all workers could be treated
>> interchangeably rather than individually.  To fix this, I've changed
>> the TupleQueueFunnel implemented by tqueue.c to be merely a
>> TupleQueueReader which handles reading from a single worker only.
>> nodeGather.c therefore creates one TupleQueueReader per worker instead
>> of a single TupleQueueFunnel for all workers; accordingly, the logic
>> for multiplexing multiple queues now lives in nodeGather.c.  This is
>> probably how I should have done it originally - someone, I think Jeff
>> Davis - complained previously that tqueue.c had no business embedding
>> the round-robin policy decision, and he was right.  So this addresses
>> that complaint as well.
>
> Here is an updated version.  This is rebased over recent commits, and
> I added a missing check for attisdropped.

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] a raft of parallelism-related bug fixes

2015-11-02 Thread Robert Haas
On Wed, Oct 28, 2015 at 10:23 AM, Robert Haas  wrote:
> On Sun, Oct 18, 2015 at 12:17 AM, Robert Haas  wrote:
>>> So reviewing patch 13 isn't possible without prior knowledge.
>>
>> The basic question for patch 13 is whether ephemeral record types can
>> occur in executor tuples in any contexts that I haven't identified.  I
>> know that a tuple table slot can contain a column that is of type
>> record or record[], and those records can themselves contain
>> attributes of type record or record[], and so on as far down as you
>> like.  I *think* that's the only case.  For example, I don't believe
>> that a TupleTableSlot can contain a *named* record type that has an
>> anonymous record buried down inside of it somehow.  But I'm not
>> positive I'm right about that.
>
> I have done some more testing and investigation and determined that
> this optimism was unwarranted.  It turns out that the type information
> for composite and record types gets stored in two different places.
> First, the TupleTableSlot has a type OID, indicating the sort of the
> value it expects to be stored for that slot attribute.  Second, the
> value itself contains a type OID and typmod.  And these don't have to
> match.  For example, consider this query:
>
> select row_to_json(i) from int8_tbl i(x,y);
>
> Without i(x,y), the HeapTuple passed to row_to_json is labelled with
> the pg_type OID of int8_tbl.  But with the query as written, it's
> labeled as an anonymous record type.  If I jigger things by hacking
> the code so that this is planned as Gather (single-copy) -> SeqScan,
> with row_to_json evaluated at the Gather node, then the sequential
> scan kicks out a tuple with a transient record type and stores it into
> a slot whose type OID is still that of int8_tbl.  My previous patch
> failed to deal with that; the attached one does.
>
> The previous patch was also defective in a few other respects.  The
> most significant of those, maybe, is that it somehow thought it was OK
> to assume that transient typmods from all workers could be treated
> interchangeably rather than individually.  To fix this, I've changed
> the TupleQueueFunnel implemented by tqueue.c to be merely a
> TupleQueueReader which handles reading from a single worker only.
> nodeGather.c therefore creates one TupleQueueReader per worker instead
> of a single TupleQueueFunnel for all workers; accordingly, the logic
> for multiplexing multiple queues now lives in nodeGather.c.  This is
> probably how I should have done it originally - someone, I think Jeff
> Davis - complained previously that tqueue.c had no business embedding
> the round-robin policy decision, and he was right.  So this addresses
> that complaint as well.

Here is an updated version.  This is rebased over recent commits, and
I added a missing check for attisdropped.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From fa31300a884cc942d22c66d6a30fa4c2fcba3c6f Mon Sep 17 00:00:00 2001
From: Robert Haas 
Date: Wed, 7 Oct 2015 12:43:22 -0400
Subject: [PATCH 5/5] Modify tqueue infrastructure to support transient record
 types.

Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend.  Transferring such tuples without modification between two
cooperating backends does not work.  This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves.  The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender.  That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.

Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues.  This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was.  typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
---
 src/backend/executor/nodeGather.c | 138 --
 src/backend/executor/tqueue.c | 983 +-
 src/include/executor/tqueue.h |  12 +-
 src/include/nodes/execnodes.h |   4 +-
 src/tools/pgindent/typedefs.list  |   2 +-
 5 files changed, 986 insertions(+), 153 deletions(-)

diff --git a/src/backend/executor/nodeGather.c 

Re: [HACKERS] a raft of parallelism-related bug fixes

2015-10-28 Thread Robert Haas
On Sun, Oct 18, 2015 at 12:17 AM, Robert Haas  wrote:
>> So reviewing patch 13 isn't possible without prior knowledge.
>
> The basic question for patch 13 is whether ephemeral record types can
> occur in executor tuples in any contexts that I haven't identified.  I
> know that a tuple table slot can contain a column that is of type
> record or record[], and those records can themselves contain
> attributes of type record or record[], and so on as far down as you
> like.  I *think* that's the only case.  For example, I don't believe
> that a TupleTableSlot can contain a *named* record type that has an
> anonymous record buried down inside of it somehow.  But I'm not
> positive I'm right about that.

I have done some more testing and investigation and determined that
this optimism was unwarranted.  It turns out that the type information
for composite and record types gets stored in two different places.
First, the TupleTableSlot has a type OID, indicating the sort of the
value it expects to be stored for that slot attribute.  Second, the
value itself contains a type OID and typmod.  And these don't have to
match.  For example, consider this query:

select row_to_json(i) from int8_tbl i(x,y);

Without i(x,y), the HeapTuple passed to row_to_json is labelled with
the pg_type OID of int8_tbl.  But with the query as written, it's
labeled as an anonymous record type.  If I jigger things by hacking
the code so that this is planned as Gather (single-copy) -> SeqScan,
with row_to_json evaluated at the Gather node, then the sequential
scan kicks out a tuple with a transient record type and stores it into
a slot whose type OID is still that of int8_tbl.  My previous patch
failed to deal with that; the attached one does.
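
(If you want to see the two labelings from the SQL level, here's a rough
illustration against the regression database's int8_tbl, whose real columns
are q1 and q2; output shown approximately:)

SELECT row_to_json(i) FROM int8_tbl i ORDER BY q1, q2 LIMIT 1;
-- {"q1":123,"q2":456}  -- field names come from the named composite type int8_tbl
SELECT row_to_json(i) FROM int8_tbl i(x,y) ORDER BY x, y LIMIT 1;
-- {"x":123,"y":456}    -- field names come from an anonymous RECORD whose transient
--                         typmod is only meaningful inside the backend that built it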

The previous patch was also defective in a few other respects.  The
most significant of those, maybe, is that it somehow thought it was OK
to assume that transient typmods from all workers could be treated
interchangeably rather than individually.  To fix this, I've changed
the TupleQueueFunnel implemented by tqueue.c to be merely a
TupleQueueReader which handles reading from a single worker only.
nodeGather.c therefore creates one TupleQueueReader per worker instead
of a single TupleQueueFunnel for all workers; accordingly, the logic
for multiplexing multiple queues now lives in nodeGather.c.  This is
probably how I should have done it originally - someone, I think Jeff
Davis - complained previously that tqueue.c had no business embedding
the round-robin policy decision, and he was right.  So this addresses
that complaint as well.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From db5b2a90ec35adf3f5fac72483679ebcefdb29af Mon Sep 17 00:00:00 2001
From: Robert Haas 
Date: Wed, 7 Oct 2015 12:43:22 -0400
Subject: [PATCH 7/8] Modify tqueue infrastructure to support transient record
 types.

Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend.  Transferring such tuples without modification between two
cooperating backends does not work.  This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves.  The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender.  That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.

Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues.  This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was.  typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
---
 src/backend/executor/nodeGather.c | 139 --
 src/backend/executor/tqueue.c | 977 +-
 src/include/executor/tqueue.h |  12 +-
 src/include/nodes/execnodes.h |   4 +-
 src/tools/pgindent/typedefs.list  |   2 +-
 5 files changed, 980 insertions(+), 154 deletions(-)

diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 9c1533e..312302a 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -36,11 +36,13 @@
 #include "executor/nodeGather.h"
 #include "executor/nodeSubplan.h"
 #include "executor/tqueue.h"
+#include 

Re: [HACKERS] a raft of parallelism-related bug fixes

2015-10-22 Thread Robert Haas
On Wed, Oct 21, 2015 at 9:04 AM, Amit Langote  wrote:
> ... node *need* not be parallel aware?

Yes, thanks.  Committed that way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] a raft of parallelism-related bug fixes

2015-10-22 Thread Robert Haas
On Tue, Oct 20, 2015 at 6:12 PM, Simon Riggs  wrote:
> Not on your case in a big way, just noting the need for change there.

Yes, I appreciate your attitude.  I think we are on the same wavelength.

> I'll help as well, but if you could start with enough basics to allow me to
> ask questions that will help. Thanks.

Will try to keep pushing in that direction.  May be easier once some
of the dust has settled.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] a raft of parallelism-related bug fixes

2015-10-21 Thread Amit Kapila
On Tue, Oct 20, 2015 at 8:16 PM, Robert Haas  wrote:
>
> On Sat, Oct 17, 2015 at 6:17 PM, Robert Haas  wrote:
> > It's good to have your perspective on how this can be improved, and
> > I'm definitely willing to write more documentation.  Any lack in that
> > area is probably due to being too close to the subject area, having
> > spent several years on parallelism in general, and 200+ emails on
> > parallel sequential scan in particular.  Your point about the lack of
> > a good header file comment for execParallel.c is a good one, and I'll
> > rectify that next week.
>
> Here is a patch to add a hopefully-useful file header comment to
> execParallel.c.  I included one for nodeGather.c as well, which seems
> to be contrary to previous practice, but actually it seems like
> previous practice is not the greatest: surely it's not self-evident
> what all of the executor nodes do.
>

+ * any ParamListInfo associated witih the query, buffer usage info, and
+ * the actual plan to be passed down to the worker.

typo 'witih'.

+ * return the results.  Therefore, a plan used with a single-copy Gather
+ * node not be parallel-aware.

"node not" seems to be incomplete.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] a raft of parallelism-related bug fixes

2015-10-21 Thread Amit Langote
On Wednesday, 21 October 2015, Amit Kapila  wrote:

> On Tue, Oct 20, 2015 at 8:16 PM, Robert Haas  wrote:
> >
> > On Sat, Oct 17, 2015 at 6:17 PM, Robert Haas  wrote:
> > > It's good to have your perspective on how this can be improved, and
> > > I'm definitely willing to write more documentation.  Any lack in that
> > > area is probably due to being too close to the subject area, having
> > > spent several years on parallelism in general, and 200+ emails on
> > > parallel sequential scan in particular.  Your point about the lack of
> > > a good header file comment for execParallel.c is a good one, and I'll
> > > rectify that next week.
> >
> > Here is a patch to add a hopefully-useful file header comment to
> > execParallel.c.  I included one for nodeGather.c as well, which seems
> > to be contrary to previous practice, but actually it seems like
> > previous practice is not the greatest: surely it's not self-evident
> > what all of the executor nodes do.
> >
>
> + * any ParamListInfo associated witih the query, buffer usage info, and
> + * the actual plan to be passed down to the worker.
>
> typo 'witih'.
>
> + * return the results.  Therefore, a plan used with a single-copy Gather
> + * node not be parallel-aware.
>
> "node not" seems to be incomplete.
>

... node *need* not be parallel aware?

Thanks,
Amit


Re: [HACKERS] a raft of parallelism-related bug fixes

2015-10-20 Thread Robert Haas
On Sat, Oct 17, 2015 at 6:17 PM, Robert Haas  wrote:
> It's good to have your perspective on how this can be improved, and
> I'm definitely willing to write more documentation.  Any lack in that
> area is probably due to being too close to the subject area, having
> spent several years on parallelism in general, and 200+ emails on
> parallel sequential scan in particular.  Your point about the lack of
> a good header file comment for execParallel.c is a good one, and I'll
> rectify that next week.

Here is a patch to add a hopefully-useful file header comment to
execParallel.c.  I included one for nodeGather.c as well, which seems
to be contrary to previous practice, but actually it seems like
previous practice is not the greatest: surely it's not self-evident
what all of the executor nodes do.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 3bb8206..d99e170 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -6,6 +6,14 @@
  * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
  *
+ * This file contains routines that are intended to support setting up,
+ * using, and tearing down a ParallelContext from within the PostgreSQL
+ * executor.  The ParallelContext machinery will handle starting the
+ * workers and ensuring that their state generally matches that of the
+ * leader; see src/backend/access/transam/README.parallel for details.
+ * However, we must save and restore relevant executor state, such as
+ * any ParamListInfo associated witih the query, buffer usage info, and
+ * the actual plan to be passed down to the worker.
  *
  * IDENTIFICATION
  *	  src/backend/executor/execParallel.c
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 7e2272f..017adf2 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -6,6 +6,20 @@
  * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
  *
+ * A Gather executor launches parallel workers to run multiple copies of a
+ * plan.  It can also run the plan itself, if the workers are not available
+ * or have not started up yet.  It then merges all of the results it produces
+ * and the results from the workers into a single output stream.  Therefore,
+ * it will normally be used with a plan where running multiple copies of the
+ * same plan does not produce duplicate output, such as PartialSeqScan.
+ *
+ * Alternatively, a Gather node can be configured to use just one worker
+ * and the single-copy flag can be set.  In this case, the Gather node will
+ * run the plan in one worker and will not execute the plan itself.  In
+ * this case, it simply returns whatever tuples were returned by the worker.
+ * If a worker cannot be obtained, then it will run the plan itself and
+ * return the results.  Therefore, a plan used with a single-copy Gather
+ * node not be parallel-aware.
  *
  * IDENTIFICATION
  *	  src/backend/executor/nodeGather.c

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] a raft of parallelism-related bug fixes

2015-10-20 Thread Simon Riggs
On 17 October 2015 at 18:17, Robert Haas  wrote:


> It's good to have your perspective on how this can be improved, and
> I'm definitely willing to write more documentation.  Any lack in that
> area is probably due to being too close to the subject area, having
> spent several years on parallelism in general, and 200+ emails on
> parallel sequential scan in particular.  Your point about the lack of
> a good header file comment for execParallel.c is a good one, and I'll
> rectify that next week.
>

Not on your case in a big way, just noting the need for change there.

I'll help as well, but if you could start with enough basics to allow me to
ask questions that will help. Thanks.

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] a raft of parallelism-related bug fixes

2015-10-19 Thread Robert Haas
On Sat, Oct 17, 2015 at 9:16 PM, Andrew Dunstan  wrote:
> If all that is required is a #define, like CLOBBER_CACHE_ALWAYS, then no
> special buildfarm support is required - you would just add that to the
> animal's config file, more or less like this:
>
>  config_env =>
>  {
>  CPPFLAGS => '-DGRATUITOUSLY_PARALLEL',
>  },
>
> I try to make things easy :-)

Wow, that's great.  So, I'll try to rework the test code I posted
previously into something less hacky, and eventually add a #define
like this so we can run it on the buildfarm.  There's a few other
things that need to get done before that really makes sense - like
getting the rest of the bug fix patches committed - otherwise any
buildfarm critters we add will just be permanently red.

Thanks to Noah and Stephen for your replies also - it is good to hear
that if I spend the time to make this committable, somebody will use
it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] a raft of parallelism-related bug fixes

2015-10-17 Thread Andrew Dunstan



On 10/17/2015 06:17 PM, Robert Haas wrote:

> However, I'm pretty sure that we don't want to switch the *entire*
> buildfarm to using lots of unnecessary parallelism.  What we might be
> able to do is have some critters that people spin up for this precise
> purpose.  Just like we currently have CLOBBER_CACHE_ALWAYS buildfarm
> members, we could have GRATUITOUSLY_PARALLEL buildfarm members.  If
> Andrew is willing to add buildfarm support for that option and a few
> people are willing to run critters in that mode, I will be happy -
> more than happy, really - to put the test code into committable form,
> guarded by a #define, and away we go.

If all that is required is a #define, like CLOBBER_CACHE_ALWAYS, then no 
special buildfarm support is required - you would just add that to the 
animal's config file, more or less like this:


 config_env =>
 {
 CPPFLAGS => '-DGRATUITOUSLY_PARALLEL',
 },

I try to make things easy :-)


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] a raft of parallelism-related bug fixes

2015-10-17 Thread Stephen Frost
* Noah Misch (n...@leadboat.com) wrote:
> On Sat, Oct 17, 2015 at 06:17:37PM -0400, Robert Haas wrote:
> > people are willing to run critters in that mode, I will be happy -
> > more than happy, really - to put the test code into committable form,
> > guarded by a #define, and away we go.
> 
> I would make one such animal.

We're also looking at what animals it makes sense to run as part of
pginfra and I expect we'd be able to include an animal for these tests
also (though Stefan is the one really driving that effort).

Thanks!

Stephen




Re: [HACKERS] a raft of parallelism-related bug fixes

2015-10-17 Thread Noah Misch
On Sat, Oct 17, 2015 at 06:17:37PM -0400, Robert Haas wrote:
> One idea that I think would provide
> *excellent* test coverage is to take the test code included on this
> thread and run it on the buildfarm.  The idea of the code is to
> basically run the regression test suite with every parallel-eligible
> query forced to unnecessarily use parallelism.  Turning that on and
> running 'make check' found, directly or indirectly, all of these bugs.
> Doing that on the whole buildfarm would probably find more.
> 
> However, I'm pretty sure that we don't want to switch the *entire*
> buildfarm to using lots of unnecessary parallelism.  What we might be
> able to do is have some critters that people spin up for this precise
> purpose.  Just like we currently have CLOBBER_CACHE_ALWAYS buildfarm
> members, we could have GRATUITOUSLY_PARALLEL buildfarm members.  If
> Andrew is willing to add buildfarm support for that option and a few

What, if anything, would this mode require beyond adding a #define?  If
nothing, it won't require specific support in the buildfarm script.
CLOBBER_CACHE_ALWAYS has no specific support.

> people are willing to run critters in that mode, I will be happy -
> more than happy, really - to put the test code into committable form,
> guarded by a #define, and away we go.

I would make one such animal.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] a raft of parallelism-related bug fixes

2015-10-17 Thread Simon Riggs
On 12 October 2015 at 18:04, Robert Haas  wrote:


> My recent commit of the Gather executor node has made it relatively
> simple to write code that does an end-to-end test of all of the
> parallelism-relate commits which have thus far gone into the tree.
>

I've been wanting to help here for a while, but time remains limited for
next month or so.

From reading this my understanding is that there isn't a test suite
included with this commit?

I've tried to review the Gather node commit and I note that the commit
message contains a longer description of the functionality in that patch
than any comments in the patch as a whole. No design comments, no README,
no file header comments. For such a major feature that isn't acceptable - I
would reject a patch from others on that basis alone (and have done so). We
must keep the level of comments high if we are to encourage wider
participation in the project.

So reviewing patch 13 isn't possible without prior knowledge.

Hoping we'll be able to find some time on this at PGConf.eu; thanks for
coming over.

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] a raft of parallelism-related bug fixes

2015-10-17 Thread Robert Haas
On Sat, Oct 17, 2015 at 9:16 AM, Simon Riggs  wrote:
> From reading this my understanding is that there isn't a test suite included
> with this commit?

Right.  The patches on the thread contain code that can be used for
testing, but the committed code does not itself include test coverage.
I welcome thoughts on how we could perform automated testing of this
code.  I think at least part of the answer is that I need to press on
toward getting the rest of Amit's parallel sequential scan patch
committed, because then this becomes a user-visible feature and I
expect that to make it much easier to find whatever bugs remain.  A
big part of the difficulty in testing this up until now is that I've
been building towards, hey, we have parallel query.  Until we actually
do, you need to write C code to test this, which raises the bar
considerably.

Now, that does not mean we shouldn't test this in other ways, and of
course I have, and so have Amit and other people from the community -
of late, Noah Misch and Haribabu Kommi have found several bugs through
code inspection and testing, which included some of the same ones that
I was busy finding and fixing using the test code attached to this
thread.  That's one of the reasons why I wanted to press forward with
getting the fixes for those bugs committed. It's just a waste of
everybody's time if we keep re-finding known bugs for which fixes already
exist.

But the question of how to test this in the buildfarm is a good one,
and I don't have a complete answer.  Once the rest of this goes in,
which I hope will be soon, we can EXPLAIN or EXPLAIN ANALYZE or just
straight up run parallel queries in the regression test suite and see
that they behave as expected.  But I don't expect that to provide
terribly good test coverage.  One idea that I think would provide
*excellent* test coverage is to take the test code included on this
thread and run it on the buildfarm.  The idea of the code is to
basically run the regression test suite with every parallel-eligible
query forced to unnecessarily use parallelism.  Turning that on and
running 'make check' found, directly or indirectly, all of these bugs.
Doing that on the whole buildfarm would probably find more.
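
(For the simpler EXPLAIN-style checks mentioned above, the sort of thing a
regression test could eventually do is sketched below - against the
regression suite's tenk1 table, with the plan shape obviously depending on
which patches are in by then:)

SET max_parallel_degree = 2;
EXPLAIN (COSTS OFF) SELECT count(*) FROM tenk1 WHERE hundred > 1;
-- check that a Gather node shows up in the plan, and that the query
-- still returns the same answer as a non-parallel run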

However, I'm pretty sure that we don't want to switch the *entire*
buildfarm to using lots of unnecessary parallelism.  What we might be
able to do is have some critters that people spin up for this precise
purpose.  Just like we currently have CLOBBER_CACHE_ALWAYS buildfarm
members, we could have GRATUITOUSLY_PARALLEL buildfarm members.  If
Andrew is willing to add buildfarm support for that option and a few
people are willing to run critters in that mode, I will be happy -
more than happy, really - to put the test code into committable form,
guarded by a #define, and away we go.

Of course, other ideas for testing are also welcome.

> I've tried to review the Gather node commit and I note that the commit
> message contains a longer description of the functionality in that patch
> than any comments in the patch as a whole. No design comments, no README, no
> file header comments. For such a major feature that isn't acceptable - I
> would reject a patch from others on that basis alone (and have done so). We
> must keep the level of comments high if we are to encourage wider
> participation in the project.

It's good to have your perspective on how this can be improved, and
I'm definitely willing to write more documentation.  Any lack in that
area is probably due to being too close to the subject area, having
spent several years on parallelism in general, and 200+ emails on
parallel sequential scan in particular.  Your point about the lack of
a good header file comment for execParallel.c is a good one, and I'll
rectify that next week.

It's worth noting, though, that the executor files in general don't
contain great gobs of comments, and the executor README even has this
vintage 2001 comment:

XXX a great deal more documentation needs to be written here...

Well, yeah.  It's taken me a long time to understand how the executor
actually works, and there are parts of it - particularly related to
EvalPlanQual - that I still don't fully understand.  So some of the
lack of comments in, for example, nodeGather.c is because it copies
the style of other executor nodes, like nodeSeqscan.c.  It's not
exactly clear to me what more to document there.  You either
understand what a rescan node is, in which case the code for each
node's rescan method tends to be fairly self-evident, or you don't -
but that clearly shouldn't be re-explained in every file.  So I guess
what I'm saying is I could use some advice on what kinds of things would
be most useful to document, and where to put that documentation.

Right now, the best explanation of how parallelism works is in
src/backend/access/transam/README.parallel -- but, as you rightly
point out, that doesn't cover the executor bits.  Should we have SGML
documentation under "VII. 

Re: [HACKERS] a raft of parallelism-related bug fixes

2015-10-16 Thread Robert Haas
On Mon, Oct 12, 2015 at 1:04 PM, Robert Haas  wrote:
> Attached are 14 patches.  Patches #1-#4 are
> essential for testing purposes but are not proposed for commit,
> although some of the code they contain may eventually become part of
> other patches which are proposed for commit.  Patches #5-#12 are
> largely boring patches fixing fairly uninteresting mistakes; I propose
> to commit these on an expedited basis.  Patches #13-14 are also
> proposed for commit but seem to me to be more in need of review.

Hearing no objections, I've now gone and committed #5-#12.

> 0013-Modify-tqueue-infrastructure-to-support-transient-re.patch
> attempts to address a deficiency in the tqueue.c/tqueue.h machinery I
> recently introduced: backends can have ephemeral record types for
> which they use backend-local typmods that may not be the same between
> the leader and the worker.  This patch has the worker send metadata
> about the tuple descriptor for each such type, and the leader
> registers the same tuple descriptor and then remaps the typmods from
> the worker's typmod space to its own.  This seems to work, but I'm a
> little concerned that there may be cases it doesn't cover.  Also,
> there's room to question the overall approach.  The only other
> alternative that springs readily to mind is to try to arrange things
> during the planning phase so that we never try to pass records between
> parallel backends in this way, but that seems like it would be hard to
> code (and thus likely to have bugs) and also pretty limiting.

I am still hoping someone will step up to review this.

> 0014-Fix-problems-with-ParamListInfo-serialization-mechan.patch, which
> I just posted on the Parallel Seq Scan thread as a standalone patch,
> fixes pretty much what the name of the file suggests.  This actually
> fixes two problems, one of which Noah spotted and commented on over on
> that thread.  By pure coincidence, the last 'make check' regression
> failure I was still troubleshooting needed a fix for that issue plus a
> fix to plpgsql_param_fetch.  However, as I mentioned on the other
> thread, I'm not quite sure which way to go with the change to
> plpgsql_param_fetch so scrutiny of that point, in particular, would be
> appreciated.  See also
> http://www.postgresql.org/message-id/CA+TgmobN=wadvautwsh-xqvcdovkerasuxw2k3r6vmpwig7...@mail.gmail.com

Noah's been helping with this issue on the other thread.  I'll revise
this patch along the lines discussed there and resubmit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers