[sqlite] Diff two tables as fast as diff(1) (of sorted files)

Keith Medcalf Wed, 18 Nov 2015 23:22:51 -0700

On Wednesday, 18 November, 2015 20:36, Nico Williams <nico at cryptonector.com> 
said:
> On Thu, Nov 19, 2015 at 12:39:41AM +0000, Simon Slavin wrote:
> > On 19 Nov 2015, at 12:26am, Nico Williams <nico at cryptonector.com> wrote:
> > > two concurrent scans of the same table should be able to go faster
> > > than the same two scans in series.
>
> > SQLite is not processor-bound, it's file-bound.  Both
> > threads/processes need to continually read blocks from disk and a disk
> > can only answer one request at a time.


> Two table scans of the same data in sequence are going to cost more than
> two table scans of the same data concurrently.

The original was talking about the "same table"; "same data" is somewhat sloppy
terminology.  Assuming that "same data" == "same table", then unless pages read
by the first scan get evicted from the cache before the second scan reaches them
when the two table scans run in series, the total physical I/O and CPU usage are
the same in both scenarios, and so is the total elapsed wall-clock time.  This
assumes that both table scans, even when parallelized, run on the same thread,
and that physical I/O due to cache eviction is eliminated by running the two
table scans in lockstep.
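
To make the cache argument concrete, here is a toy simulation that counts physical page reads under an assumed pure-LRU page cache. It is a sketch, not a model of SQLite's actual pager or the OS cache. The function names, the 1000-page table and the cache sizes are all hypothetical. It shows that serial and lockstep scans issue the same number of physical reads when the whole table stays cached, and that the serial order doubles them once the cache is too small to hold the table.

```python
from collections import OrderedDict


def physical_reads(page_accesses, cache_pages):
    """Count cache misses for a sequence of page accesses under an LRU cache."""
    cache = OrderedDict()
    misses = 0
    for page in page_accesses:
        if page in cache:
            cache.move_to_end(page)  # hit: mark as most recently used
        else:
            misses += 1  # miss: page must be fetched from disk
            cache[page] = True
            if len(cache) > cache_pages:
                cache.popitem(last=False)  # evict the least recently used page
    return misses


def serial_scans(n_pages):
    """Scan A reads every page, then scan B reads every page."""
    return list(range(n_pages)) + list(range(n_pages))


def lockstep_scans(n_pages):
    """Scans A and B advance together: both touch page i before either moves on."""
    accesses = []
    for page in range(n_pages):
        accesses.extend([page, page])
    return accesses


if __name__ == "__main__":
    n = 1000  # hypothetical table size in pages
    for cache in (1000, 100):  # table fits in cache / cache holds only 10%
        print(cache,
              physical_reads(serial_scans(n), cache),
              physical_reads(lockstep_scans(n), cache))
```

With the 1000-page cache both orders cost 1000 reads; with the 100-page cache the serial order costs 2000 while lockstep stays at 1000.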

If the two parallelized scans are executed on separate threads, each running on
a different CPU (or a proper core, not a pretend-core), still in lockstep, then
the total CPU used and physical I/O will remain the same, but the elapsed time
will be slightly less due to the achievable CPU MPR improvement.  However, since
the scan is likely I/O-limited rather than CPU-limited, the improvement will be
negligible (if there is any at all).
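
A back-of-the-envelope model of the two-core case follows. It makes several hypothetical assumptions: disk reads are serialized on a single device, I/O and CPU work do not overlap, and only the CPU portion splits across cores. Under those assumptions, for a scan that spends a hypothetical 9 s waiting on disk and 1 s on CPU, a second core saves only half a second.

```python
def elapsed_seconds(io_seconds: float, cpu_seconds: float, cores: int) -> float:
    """Wall-clock time for the lockstep scans under a simple no-overlap model.

    Total I/O and total CPU are the same regardless of core count; only the
    CPU portion is divided across cores, while I/O stays serialized on one disk.
    """
    return io_seconds + cpu_seconds / cores


if __name__ == "__main__":
    io, cpu = 9.0, 1.0  # hypothetical I/O-bound scan: 90% of time waiting on disk
    one_core = elapsed_seconds(io, cpu, 1)
    two_cores = elapsed_seconds(io, cpu, 2)
    print(one_core, two_cores, round(one_core / two_cores, 3))
```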

Of course, if running the two scans serially results in more physical I/O
operations and physical I/O is the bottleneck, then running the scans in
parallel (lockstep) will provide an improvement equal to the ratio between the
physical I/O counts of the two methods (assuming CPU is not the limiting
factor).
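
Under a set of simplifying assumptions, the improvement reduces to a ratio of read counts. The assumptions are a pure-LRU cache of at least one page, purely sequential scans, and an I/O-bound workload whose elapsed time is proportional to the number of physical reads. A minimal sketch with hypothetical function names:

```python
def serial_reads(n_pages: int, cache_pages: int) -> int:
    """Physical reads for two back-to-back sequential scans under pure LRU.

    If the whole table fits, the second scan is served from cache; otherwise
    LRU evicts every page before the second scan gets back to it.
    """
    return n_pages if cache_pages >= n_pages else 2 * n_pages


def lockstep_reads(n_pages: int) -> int:
    """Physical reads when both scans touch each page while it is still cached."""
    return n_pages


def io_bound_speedup(n_pages: int, cache_pages: int) -> float:
    """Serial elapsed time divided by lockstep elapsed time, assuming elapsed
    time is proportional to physical reads (I/O-bound, CPU not limiting)."""
    return serial_reads(n_pages, cache_pages) / lockstep_reads(n_pages)


if __name__ == "__main__":
    print(io_bound_speedup(1000, 1000))  # table fits in cache -> 1.0
    print(io_bound_speedup(1000, 100))   # cache holds only 10% of the table -> 2.0
```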

For the mathematical calculations, see Amdahl's Law:
https://en.wikipedia.org/wiki/Amdahl's_law
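
For reference, Amdahl's law gives the overall speedup $S$ when a fraction $p$ of the work is sped up by a factor $s$. As a hypothetical illustration, take $p = 0.1$ for the CPU share of an I/O-bound scan and $s = 2$ cores:

```latex
S(s) = \frac{1}{(1 - p) + \dfrac{p}{s}},
\qquad
S(2)\big|_{p = 0.1} = \frac{1}{0.9 + 0.05} = \frac{1}{0.95} \approx 1.053
```

That is roughly a 5% reduction in elapsed time, which is why the second core buys little when the scan is I/O-limited.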
