On Wednesday, 18 November, 2015 20:36, Nico Williams <nico at cryptonector.com> said:

> On Thu, Nov 19, 2015 at 12:39:41AM +0000, Simon Slavin wrote:
> > On 19 Nov 2015, at 12:26am, Nico Williams <nico at cryptonector.com> wrote:
> > > two concurrent scans of the same table should be able to go faster
> > > than the same two scans in series.
> >
> > SQLite is not processor-bound, it's file-bound. Both
> > threads/processes need to continually read blocks from disk and a disk
> > can only answer one request at a time.
>
> Two table scans of the same data in sequence are going to cost more than
> two table scans of the same data concurrently.

The original was talking about the "same table"; "same data" is somewhat sloppy terminology. Assuming that "same data" == "same table", then unless the pages read by the first scan get evicted from the cache when the two table scans are run in series, the total physical I/O and CPU usage are the same in both scenarios. The total elapsed wall-clock time is also the same in both cases, assuming that both table scans, even when parallelized, run on the same thread, and that physical I/O due to cache eviction is eliminated by running the two table scans in lockstep.

If the two parallelized scans are executed on separate threads, each running on a different CPU (or a proper core, not a pretend-core), still in lockstep, then the total CPU and physical I/O used will remain the same, but the elapsed time will be slightly less due to the achievable CPU MPR improvement. However, since the scan is likely I/O-limited rather than CPU-limited, that improvement will be negligible (if there is any at all).

Of course, if running the two scans serially results in more physical I/O operations, and physical I/O is the bottleneck, then running the scans in parallel (in lockstep) will provide an improvement equal to the ratio between the physical I/O counts of the two methods (assuming CPU is not the limiting factor).

For the mathematical calculations, see Amdahl's Law: https://en.wikipedia.org/wiki/Amdahl's_law
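To make the cache-eviction and Amdahl's Law points concrete, here is a minimal Python sketch. It assumes a hypothetical fixed-size LRU page cache (SQLite's actual page cache is not modeled) and counts every cache miss as one physical read. The table size, cache capacities and the 10% CPU fraction are made-up illustrative numbers, not measurements.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical fixed-size page cache; each miss counts as one physical read."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()
        self.physical_reads = 0

    def read(self, page):
        if page in self.pages:
            self.pages.move_to_end(page)  # hit: mark as most recently used
        else:
            self.physical_reads += 1  # miss: the page must come from disk
            self.pages[page] = True
            if len(self.pages) > self.capacity:
                self.pages.popitem(last=False)  # evict least recently used page


def serial_scans(n_pages, capacity):
    """Scan the whole table once, then scan it again."""
    cache = LRUCache(capacity)
    for _ in range(2):
        for p in range(n_pages):
            cache.read(p)
    return cache.physical_reads


def lockstep_scans(n_pages, capacity):
    """Two scans advance together, so scan B always finds scan A's page cached."""
    cache = LRUCache(capacity)
    for p in range(n_pages):
        cache.read(p)  # scan A
        cache.read(p)  # scan B, same page
    return cache.physical_reads


def amdahl_speedup(parallel_fraction, n_workers):
    """Amdahl's Law: only parallel_fraction of the work benefits from n_workers."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_workers)


print(serial_scans(1000, 2000))    # cache holds the whole table
print(serial_scans(1000, 100))     # cache smaller than the table
print(lockstep_scans(1000, 100))
print(round(amdahl_speedup(0.10, 2), 3))
```

With a cache large enough to hold the whole table, the serial scans read each page from disk only once (1000 reads). With a cache smaller than the table, LRU eviction makes the second serial scan miss on every page (2000 reads), while the lockstep scans still read each page once (1000 reads), giving the 2:1 I/O ratio described above. If only an assumed 10% of the elapsed time is CPU work that can be split across two cores, Amdahl's Law gives a speedup of only about 1.05.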