In the 'wishful hand waving' department :

read index -> determine the (tuple id, page) pairs to hit in the table -> for each of these, tell the OS 'I'm gonna need these' via a NON-BLOCKING call. Non-blocking because you feed the information to the OS as you read the index, streaming it.
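The closest thing that exists today for the 'I'm gonna need these' half is probably posix_fadvise() with POSIX_FADV_WILLNEED, which hints at a byte range without waiting for the read. It has no callback, no reordering contract and no 'I'm full' answer, so it only covers part of the idea. A rough Python sketch, where hint_pages and the 8 kB page size are my own assumptions for illustration:

```python
import os

PAGE_SIZE = 8192  # assumed page size, for illustration only


def hint_pages(fd, pages, page_size=PAGE_SIZE):
    """Tell the kernel these pages will be needed soon, without waiting.

    hint_pages is a hypothetical helper. Returns the number of hints
    issued, or 0 on platforms that lack posix_fadvise.
    """
    if not hasattr(os, "posix_fadvise"):
        return 0
    for page in pages:
        # Advisory only: the kernel may start readahead for this range.
        os.posix_fadvise(fd, page * page_size, page_size,
                         os.POSIX_FADV_WILLNEED)
    return len(pages)
```

The application would still issue ordinary reads afterwards; the hint only gives the kernel a chance to start fetching early.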

Meanwhile, the OS accumulates the requests in an internal FIFO, reorganizes them according to the order best suited to good disk head movements, then reads them in clusters, and calls a callback inside the application when it has data available. Or the application polls it once in a while to get a bucketload of pages. The 'I'm gonna need these()' syscall would also sometimes return 'hey, I'm full, read the pages I have here waiting for you before asking for new ones'.

A flag would tell the OS if the application wanted the results in any order, or with order preserved.
Without order preservation, if the application has requested the same page twice with different tuple ids, the OS would call the callback only once, giving it a list of the tuple ids associated with that page.
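To make the hand waving a bit more concrete, here is a toy model of that queue in Python. Everything in it (PrefetchQueue, submit, drain, the capacity parameter) is a made-up name for illustration, not an existing kernel or postgres interface, and sorting by page number stands in for whatever ordering the disk would actually prefer:

```python
class PrefetchQueue:
    """Toy model of the proposed request queue (all names hypothetical)."""

    def __init__(self, capacity, preserve_order=False):
        self.capacity = capacity
        self.preserve_order = preserve_order
        self.pending = []  # (page, tuple_id) pairs in submission order

    def submit(self, page, tuple_id):
        """Non-blocking request; False means 'I'm full, drain me first'."""
        if len(self.pending) >= self.capacity:
            return False
        self.pending.append((page, tuple_id))
        return True

    def drain(self):
        """Return the queued work as a list of (page, [tuple_ids])."""
        requests, self.pending = self.pending, []
        if self.preserve_order:
            # One entry per request, in exactly the order it was asked for.
            return [(page, [tid]) for page, tid in requests]
        # Any order allowed: one entry per page, sorted by page number,
        # with every tuple id that asked for that page attached.
        by_page = {}
        for page, tid in requests:
            by_page.setdefault(page, []).append(tid)
        return sorted(by_page.items())
```

With a capacity of 4, submitting (7, 'a'), (2, 'b'), (7, 'c'), (5, 'd') fills the queue, a fifth submit is refused, and drain() hands back page 7 once with both tuple ids.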

It involves a tradeoff between memory and performance : as the size of the FIFO increases, the likelihood of good contiguous disk reading increases. However, the memory structure would only contain page numbers and tuple ids, so it could be pretty small.

Returning the results in-order would also need more memory.
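A back-of-the-envelope way to see the FIFO size effect: take a stream of page numbers, sort it in batches of 'window' requests, and add up how far the head travels. This is only a sketch under crude assumptions (seek cost equals page distance, each batch is swept in ascending order, seek_distance is a made-up helper), not a model of a real disk:

```python
def seek_distance(pages, window):
    """Total head travel when requests are sorted in batches of `window`."""
    total = 0
    head = None  # position of the last page read, None before the first read
    for start in range(0, len(pages), window):
        # Each batch is what the FIFO held when it was flushed to disk.
        for page in sorted(pages[start:start + window]):
            if head is not None:
                total += abs(page - head)
            head = page
    return total
```

For the request stream [9, 1, 8, 2, 7, 3, 6, 4], a window of 1 (no reordering) travels 35 pages, a window of 4 travels 18, and a window of 8 travels 8.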

It could be made very generic if instead of 'tuple id' you read 'opaque application data', and instead of 'page' you read '(offset, length)'.
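In that generic form, 'reads them in clusters' could look like the sketch below: requests are (offset, length, opaque data) triples, and ranges that touch or overlap are merged into one bigger read that carries everybody's opaque data. The name coalesce_reads and the merge rule are my assumptions, not something that exists:

```python
def coalesce_reads(requests):
    """Merge touching or overlapping (offset, length, cookie) requests.

    Returns a list of (offset, length, [cookies]) sorted by offset.
    """
    merged = []  # each entry is [start, end, cookies]
    for offset, length, cookie in sorted(requests, key=lambda r: (r[0], r[1])):
        end = offset + length
        if merged and offset <= merged[-1][1]:
            # Starts inside or right at the end of the previous range.
            last = merged[-1]
            last[1] = max(last[1], end)
            last[2].append(cookie)
        else:
            merged.append([offset, end, [cookie]])
    return [(start, end - start, cookies) for start, end, cookies in merged]
```

Given (8192, 8192, 'b'), (0, 8192, 'a'), (32768, 4096, 'c') and (4096, 2048, 'd'), the first, second and fourth collapse into a single 16384-byte read at offset 0, and the request at 32768 stays on its own.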

This structure actually exists already in the Linux kernel: it's called the elevator (the block I/O scheduler), but it works for scheduling reads between threads.

You can also read 'internal not yet developed postgres cache manager' instead of OS if you don't feel like talking kernel developers into implementing this thing.

(Those are ReadFileScatter and WriteFileGather)
