On Sun, Jan 11, 2015 at 6:01 AM, Stephen Frost <sfr...@snowman.net> wrote:
> So, for my 2c, I've long expected us to parallelize at the relation-file
> level for these kinds of operations.  This goes back to my other
> thoughts on how we should be thinking about parallelizing inbound data
> for bulk data loads, but it seems appropriate to consider it here also.
> One of the issues there is that 1G still feels like an awful lot for a
> minimum work size for each worker, and it would mean we don't parallelize
> for relations less than that size.
Yes, I think that's a killer objection.

> [ .. ] and
> how this thinking is an utter violation of the modularity we currently
> have there.

As is that.

My thinking is more along the lines that we might need to issue explicit
prefetch requests when doing a parallel sequential scan, to make up for
any failure of the OS to do that for us.

>> So, if the workers have been started but aren't keeping up, the master
>> should do nothing until they produce tuples rather than participating?
>> That doesn't seem right.
>
> Having the master jump in and start working could screw things up also
> though.

I don't think there's any reason why that should screw things up.
There's no reason why the master's participation should look any
different from that of one more worker.  Look at my parallel_count code
on the other thread to see what I mean: the master and all the workers
are running the same code, and if fewer workers show up than expected,
or run unduly slowly, it's easily tolerated.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
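[Editor's note: a minimal sketch of the kind of explicit prefetch request
meant above, assuming posix_fadvise(POSIX_FADV_WILLNEED) as the underlying
mechanism.  The function name and PREFETCH_DISTANCE constant are
illustrative only, not taken from any actual patch.]

    #define _XOPEN_SOURCE 600       /* for posix_fadvise */
    #include <fcntl.h>

    #define BLCKSZ 8192
    #define PREFETCH_DISTANCE 32    /* assumption: how far to stay ahead */

    /*
     * Hint the kernel to start reading a block we will want soon, so it
     * is (hopefully) already cached when a scan participant asks for it.
     * Failure here only costs us an optimization, so errors are ignored.
     */
    static void
    prefetch_ahead(int fd, unsigned long next_block, unsigned long nblocks)
    {
        unsigned long target = next_block + PREFETCH_DISTANCE;

        if (target >= nblocks)
            target = nblocks - 1;

        (void) posix_fadvise(fd,
                             (off_t) target * BLCKSZ,
                             BLCKSZ,
                             POSIX_FADV_WILLNEED);
    }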
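[Editor's note: and a rough sketch of the "master is just one more worker"
pattern described above.  Every participant, leader included, runs the
same loop and claims the next block from a shared counter, so a missing
or slow worker simply means the others claim more blocks.  The
shared_scan_t structure and scan_one_block() are hypothetical stand-ins,
not the parallel_count code itself.]

    #include <stdatomic.h>
    #include <stdint.h>

    typedef struct shared_scan_t
    {
        atomic_uint_fast64_t next_block;    /* next block to hand out */
        uint64_t             nblocks;       /* total blocks in relation */
    } shared_scan_t;

    extern void scan_one_block(uint64_t blkno);  /* assumed per-block work */

    /*
     * Run identically by the leader and by every worker.  Because blocks
     * are claimed dynamically, it doesn't matter how many participants
     * actually show up or how fast each one runs: whoever is present
     * keeps claiming blocks until none remain.
     */
    void
    participate_in_scan(shared_scan_t *scan)
    {
        for (;;)
        {
            uint64_t blkno = atomic_fetch_add(&scan->next_block, 1);

            if (blkno >= scan->nblocks)
                break;          /* all blocks claimed; we're done */

            scan_one_block(blkno);
        }
    }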