On Wed, Jan 28, 2015 at 9:12 AM, Thom Brown <t...@linux.com> wrote: > On 28 January 2015 at 14:03, Robert Haas <robertmh...@gmail.com> wrote: >> The problem here, as I see it, is that we're flying blind. If there's >> just one spindle, I think it's got to be right to read the relation >> sequentially. But if there are multiple spindles, it might not be, >> but it seems hard to predict what we should do. We don't know what >> the RAID chunk size is or how many spindles there are, so any guess as >> to how to chunk up the relation and divide up the work between workers >> is just a shot in the dark. > > Can't the planner take effective_io_concurrency into account?
Maybe. It's answering a somewhat the right question -- to tell us how many parallel I/O channels we think we've got. But I'm not quite sure what the to do with that information in this case. I mean, if we've got effective_io_concurrency = 6, does that mean it's right to start scans in 6 arbitrary places in the relation and hope that keeps all the drives busy? That seems like throwing darts at the wall. We have no idea which parts are on which underlying devices. Or maybe it mean we should prefetch 24MB, on the assumption that the RAID stripe is 4MB? That's definitely blind guesswork. Considering the email Amit just sent, it looks like on this machine, regardless of what algorithm we used, the scan took between 3 minutes and 5.5 minutes, and most of them took between 4 minutes and 5.5 minutes. The results aren't very predictable, more workers don't necessarily help, and it's not really clear that any algorithm we've tried is clearly better than any other. I experimented with prefetching a bit yesterday, too, and it was pretty much the same. Some settings made it slightly faster. Others made it slower. Whee! -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers