On Fri, Sep 1, 2017 at 7:42 PM, Thomas Munro <thomas.mu...@enterprisedb.com> wrote:
>> I'm thinking about something like this:
>>
>> Gather
>> -> Nested Loop
>>   -> Parallel Seq Scan
>>   -> Hash Join
>>     -> Seq Scan
>>     -> Parallel Hash
>>       -> Parallel Seq Scan
>>
>> The hash join has to be rescanned for every iteration of the nested loop.
>
> I think you mean:
>
> Gather
> -> Nested Loop
>   -> Parallel Seq Scan
>   -> Parallel Hash Join
>     -> Parallel Seq Scan
>     -> Parallel Hash
>       -> Parallel Seq Scan

I don't, though, because that's nonsense.  Maybe what I wrote is also
nonsense, but it is at least different nonsense.  Let's try it again
with some table names:

Gather
-> Nested Loop
  -> Parallel Seq Scan on a
  -> (Parallel?) Hash Join
    -> Seq Scan on b (NOT A PARALLEL SEQ SCAN)
    -> Parallel Hash
      -> Parallel Seq Scan on c

I argue that this is a potentially valid plan.  b, of course, has to
be scanned in its entirety by every worker every time through, which
is why it's not a Parallel Seq Scan, but that requirement does not
apply to c.  If we take all the rows in c and stick them into a
DSM-based hash table, we can reuse them every time the hash join is
rescanned and, AFAICS, that should work just fine, and it's probably a
win over letting each worker build a separate copy of the hash table
on c, too.

Of course, there's the "small" problem that I have no idea what to do
if the b-c join is (or becomes) multi-batch.  When I was thinking
about this before, I was imagining that this case might Just Work with
your patch provided that you could generate a plan shaped like this,
but now I see that that's not actually true, because of multiple
batches.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
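
To make the build-once, reuse-on-rescan idea above concrete, here is a
toy single-process sketch in Python.  It is not PostgreSQL executor
code: SharedHash, hash_join_b_c, nested_loop, and the column names x
and k are all hypothetical, the "DSM-based hash table" is modeled as an
ordinary in-memory dict, and the multi-batch case is not handled at
all.  It only shows that the table on c is built once while b is
scanned in full on every rescan.

```python
from collections import defaultdict


class SharedHash:
    """Toy stand-in for a hash table on c that is built once and reused."""

    def __init__(self, rows, key):
        self.rows = rows
        self.key = key
        self.table = None
        self.builds = 0  # how many times the table was actually built

    def get(self):
        # Build lazily on first use; later rescans reuse the same table.
        if self.table is None:
            table = defaultdict(list)
            for row in self.rows:
                table[row[self.key]].append(row)
            self.table = table
            self.builds += 1
        return self.table


def hash_join_b_c(b_rows, shared_hash):
    """Inner side of the nested loop: scan all of b, probe the table on c."""
    table = shared_hash.get()
    for b in b_rows:
        for c in table.get(b["k"], []):
            yield b, c


def nested_loop(a_rows, b_rows, shared_hash):
    """Rescan the b-c hash join once per outer row of a."""
    out = []
    for a in a_rows:
        for b, c in hash_join_b_c(b_rows, shared_hash):
            if a["x"] == b["x"]:
                out.append((a["id"], b["id"], c["id"]))
    return out


# Made-up sample rows, purely for illustration.
a = [{"id": 1, "x": 10}, {"id": 2, "x": 20}]
b = [{"id": 1, "x": 10, "k": "p"},
     {"id": 2, "x": 20, "k": "q"},
     {"id": 3, "x": 20, "k": "z"}]
c = [{"id": 1, "k": "p"}, {"id": 2, "k": "q"}, {"id": 3, "k": "q"}]

shared = SharedHash(c, "k")
result = nested_loop(a, b, shared)
```

After the run, shared.builds shows how many times the table on c was
built, no matter how many outer rows of a drove a rescan.  In a real
parallel plan each worker would drive its own share of a against the
one shared table; the sketch collapses that to a single loop.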