Thanks for the extensive testing! Did you see the same syscall pattern in the strace output that I did?
If so, the only explanation I can think of for the regression with my patch is that on your SSD the SATA interface was already maxed out by sequential reads, while the very low latency of SSDs allows thousands of seek() operations per second. I was using an HDD, and in older measurements a VM with a volume mounted over iSCSI. The first imposes physical limits on the number of seeks, and the second imposes network round-trip limits. So you are right: it's probably very platform dependent, and the most important fix was to enlarge the underlying block size, which you have done.

Dimitris
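
P.S. For anyone who wants to compare the two access patterns on their own hardware, here is a minimal sketch. It is not the code from the patch: the function names, the block size and the stride are all made up for illustration, and the results only mean something if the file is not already in the page cache (use a file larger than RAM, or drop caches first).

```python
import os
import tempfile
import time


def read_sequential(path, block_size):
    """Read the whole file front to back in block_size chunks."""
    total = 0
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    return total


def read_with_seeks(path, block_size, stride):
    """Read one block at the start of every stride-sized region, seeking past the rest."""
    total = 0
    size = os.path.getsize(path)
    with open(path, "rb", buffering=0) as f:
        offset = 0
        while offset < size:
            f.seek(offset)
            total += len(f.read(block_size))
            offset += stride
    return total


def timed(fn, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical sizes, chosen only to keep the demo quick.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(8 * 1024 * 1024))
        path = tmp.name
    try:
        print("sequential:", timed(read_sequential, path, 64 * 1024))
        print("with seeks:", timed(read_with_seeks, path, 4096, 64 * 1024))
    finally:
        os.remove(path)
```

The seeking variant reads far fewer bytes, so whether it wins depends on whether the device is limited by bandwidth or by the cost of each seek, which is the platform dependence discussed above.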
