[sqlite] Diff two tables as fast as diff(1) (of sorted files)

Simon Slavin Wed, 18 Nov 2015 23:52:14 +0000

On 18 Nov 2015, at 8:11pm, Nico Williams <nico at cryptonector.com> wrote:


> SELECT 'added., a.* FROM
>  (SELECT a.* FROM a EXCEPT SELECT b.* FROM b) a
> UNION ALL
> SELECT 'deleted', d.* FROM
>  (SELECT b.* FROM b EXCEPT SELECT a.* FROM a) d;
> 
> In other words, the SQL analog of a straightforward diff(1) of two
> [sorted] files.  I think this is the canonical way to do this in SQL,
> though there are variants.

I don't know why anyone would want to do that in SQL.  If you need to diff two 
tables you also need to know about UPDATE, not just INSERT and DELETE.

> Now, diff(1) (i.e., the Unix command) can do this very efficiently
> because it just moves a "cursor" in both files looking for different
> records (lines).

Text files have the concept of order.  SQL tables do not.  You are imposing an 
arbitrary idea of order on table rows which doesn't really make sense for data.

> SQLite3, on the other hand, does four table scans and further scans two
> sub-queries' result row-sets.

Instead of the above try the following.  Suppose that the primary key (i.e. a 
significant unique value like a company name) for the above tables is a column 
'pk', and suppose further that if this is not literally the primary key that 
there's a UNIQUE INDEX on it ...

SELECT 'added., a.* FROM
 (SELECT a.pk FROM a EXCEPT SELECT b.pk FROM b) a
UNION ALL
SELECT 'deleted', d.pk FROM
 (SELECT b.pk FROM b EXCEPT SELECT a.pk FROM a) d;

Simon.

[sqlite] Diff two tables as fast as diff(1) (of sorted files)

Reply via email to