On 1/23/15 10:16 PM, Amit Kapila wrote:
> Further, if we want to just get the benefit of parallel I/O, then
> I think we can get that by parallelising partition scan where different
> table partitions reside on different disk partitions, however that is
> a matter of separate patch.

I don't think we even have to go that far.

My experience with Postgres is that it is *very* sensitive to IO latency (not
bandwidth). I believe this is because complex queries tend to interleave
CPU-intensive code between IO requests. So we see this pattern:

Wait 5ms on IO
Compute for a few ms
Wait 5ms on IO
Compute for a few ms
...

We blindly assume that the kernel will magically do read-ahead for us, but I've
never seen that work very well. It certainly falls apart on something like an
index scan, where the heap reads are not sequential.

If we could instead do this:

Wait for first IO, issue second IO request
Compute
Already have second IO request, issue third
...

We'd be a lot less sensitive to IO latency.
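
To put rough numbers on the two patterns above, here is a toy timing model. It is only a sketch: the function names are made up for illustration, the 5ms IO wait comes from the pattern above, the 3ms compute time is an assumed stand-in for "a few ms", and it assumes just one outstanding IO request at a time. It models nothing about how Postgres or the kernel actually schedule IO.

```python
def serial_time(n_blocks, io_ms, cpu_ms):
    """Wait on IO, then compute, for every block; nothing overlaps."""
    return n_blocks * (io_ms + cpu_ms)


def pipelined_time(n_blocks, io_ms, cpu_ms):
    """Issue the next IO request as soon as the current block arrives,
    then compute on the current block while that request is in flight.
    Assumes a prefetch depth of one (hypothetical, for illustration)."""
    if n_blocks == 0:
        return 0
    io_ready = io_ms   # the first IO has nothing to overlap with
    cpu_free = 0       # time at which the CPU finishes its current block
    for _ in range(n_blocks):
        # Start computing once the CPU is free and the block has arrived.
        start = max(cpu_free, io_ready)
        # The next IO request is issued at the same moment.
        io_ready = start + io_ms
        cpu_free = start + cpu_ms
    return cpu_free


serial_ms = serial_time(100, io_ms=5, cpu_ms=3)        # 100 * (5 + 3) = 800
pipelined_ms = pipelined_time(100, io_ms=5, cpu_ms=3)  # 5 + 99 * 5 + 3 = 503
```

In this model the pipelined total is one IO wait, plus the larger of the two costs for each remaining block, plus one final compute step, so the smaller cost is hidden almost entirely.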

I wonder what kind of gains we would see if every SeqScan in a query spawned a
worker just to read tuples and shove them in a queue (or shove a pointer to a
buffer in the queue). Similarly, an IndexScan could have one worker reading the
index and another worker taking index tuples and reading heap tuples...
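
As a rough illustration of the reader-plus-queue shape, here is a sketch using Python threads and a bounded queue. Everything in it (reader_worker, scan_with_worker, queue_depth, the list-of-lists standing in for heap pages) is hypothetical and only shows the producer/consumer structure; it is not how a Postgres background worker or shared-memory queue would be written.

```python
import queue
import threading

_DONE = object()  # sentinel: the reader has no more tuples


def reader_worker(blocks, out_q):
    """Hypothetical scan worker: read each block and queue its tuples."""
    for block in blocks:        # each block stands in for one heap page
        for tup in block:
            out_q.put(tup)      # blocks when the queue is full (back-pressure)
    out_q.put(_DONE)


def scan_with_worker(blocks, queue_depth=16):
    """Yield tuples produced by a background reader thread."""
    q = queue.Queue(maxsize=queue_depth)
    worker = threading.Thread(target=reader_worker, args=(blocks, q), daemon=True)
    worker.start()
    while True:
        item = q.get()          # the consumer only waits if the reader is behind
        if item is _DONE:
            break
        yield item
    worker.join()


blocks = [[1, 2, 3], [4, 5], [6]]
total = sum(scan_with_worker(blocks))
```

The bounded queue is the design choice that matters here: it lets the reader run ahead of the consumer by at most queue_depth tuples, so the reader keeps IO in flight without buffering the whole table.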
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

