[sqlite] Diff two tables as fast as diff(1) (of sorted files)

2015-11-19 Thread Nico Williams
On Wed, Nov 18, 2015 at 11:22:51PM -0700, Keith Medcalf wrote: > On Wednesday, 18 November, 2015 20:36, Nico Williams cryptonector.com> said: > > On Thu, Nov 19, 2015 at 12:39:41AM +, Simon Slavin wrote: > > > On 19 Nov 2015, at 12:26am, Nico Williams > > > wrote: > > > > two concurrent

[sqlite] Diff two tables as fast as diff(1) (of sorted files)

2015-11-19 Thread Stadin, Benjamin
Sqlitediff diffs a same db (one db changed over time). If you are looking for diffing data using the same db scheme that has been generated at different times (producing different instances of the db scheme, eg you import data from somewhere and crate a same db scheme from it, but different

[sqlite] Diff two tables as fast as diff(1) (of sorted files)

2015-11-19 Thread Simon Slavin
On 19 Nov 2015, at 12:26am, Nico Williams wrote: > two concurrent scans of the same table should be able to go faster > than the same two scans in series. SQLite is not processor-bound, it's file-bound. Both threads/processes need to continually read blocks from disk and a disk can only

[sqlite] Diff two tables as fast as diff(1) (of sorted files)

2015-11-18 Thread Simon Slavin
On 18 Nov 2015, at 10:58pm, Nico Williams wrote: > Can SQLite3 run the sub-queries of a UNION ALL with any degree of > concurrency? E.g., with its co-routines? SQLite always has the bottleneck of access to the file storage medium. Many posts to this list have complained that implementing

[sqlite] Diff two tables as fast as diff(1) (of sorted files)

2015-11-18 Thread Simon Slavin
On 18 Nov 2015, at 8:11pm, Nico Williams wrote: > SELECT 'added., a.* FROM > (SELECT a.* FROM a EXCEPT SELECT b.* FROM b) a > UNION ALL > SELECT 'deleted', d.* FROM > (SELECT b.* FROM b EXCEPT SELECT a.* FROM a) d; > > In other words, the SQL analog of a straightforward diff(1) of two >

[sqlite] Diff two tables as fast as diff(1) (of sorted files)

2015-11-18 Thread Stadin, Benjamin
I've written a Sqlite diff tool that does this fast, but using a different approach. I'm adding a column named md5checksum to each to be diffed table, and add a index to it. A small tool then generates checksums for all rows in the table. Instead of comparing all columns, only the difference

[sqlite] Diff two tables as fast as diff(1) (of sorted files)

2015-11-18 Thread Keith Medcalf
On Wednesday, 18 November, 2015 20:36, Nico Williams said: > On Thu, Nov 19, 2015 at 12:39:41AM +, Simon Slavin wrote: > > On 19 Nov 2015, at 12:26am, Nico Williams wrote: > > > two concurrent scans of the same table should be able to go faster > > > than the same two scans in series. > > >

[sqlite] Diff two tables as fast as diff(1) (of sorted files)

2015-11-18 Thread E.Pasma
op 18-11-2015 21:23 schreef Richard Hipp op drh at sqlite.org: > On 11/18/15, Nico Williams wrote: >> Consider two tables with the same columns and primary keys, and then >> consider this query: >> >> SELECT 'added., a.* FROM >> (SELECT a.* FROM a EXCEPT SELECT b.* FROM b) a >> UNION ALL >>

[sqlite] Diff two tables as fast as diff(1) (of sorted files)

2015-11-18 Thread Nico Williams
On Wed, Nov 18, 2015 at 09:11:13PM -0500, Richard Hipp wrote: > On 11/18/15, Nico Williams wrote: > > > > Now, diff(1) (i.e., the Unix command) can do this very efficiently > > because it just moves a "cursor" in both files looking for different > > records (lines). > > > > Umm.. Diff is a

[sqlite] Diff two tables as fast as diff(1) (of sorted files)

2015-11-18 Thread Nico Williams
On Thu, Nov 19, 2015 at 12:39:41AM +, Simon Slavin wrote: > On 19 Nov 2015, at 12:26am, Nico Williams wrote: > > two concurrent scans of the same table should be able to go faster > > than the same two scans in series. > > SQLite is not processor-bound, it's file-bound. Both >

[sqlite] Diff two tables as fast as diff(1) (of sorted files)

2015-11-18 Thread Nico Williams
On Thu, Nov 19, 2015 at 12:43:18AM +, Stadin, Benjamin wrote: > Sqlitediff diffs a same db (one db changed over time). If you are > looking for diffing data using the same db scheme that has been > generated at different times (producing different instances of the db > scheme, eg you import

[sqlite] Diff two tables as fast as diff(1) (of sorted files)

2015-11-18 Thread Richard Hipp
On 11/18/15, Nico Williams wrote: > > Now, diff(1) (i.e., the Unix command) can do this very efficiently > because it just moves a "cursor" in both files looking for different > records (lines). > Umm.. Diff is a little more complicated than that. See

[sqlite] Diff two tables as fast as diff(1) (of sorted files)

2015-11-18 Thread Nico Williams
On Wed, Nov 18, 2015 at 11:56:50PM +, Simon Slavin wrote: > On 18 Nov 2015, at 10:58pm, Nico Williams wrote: > > Can SQLite3 run the sub-queries of a UNION ALL with any degree of > > concurrency? E.g., with its co-routines? > > SQLite always has the bottleneck of access to the file storage

[sqlite] Diff two tables as fast as diff(1) (of sorted files)

2015-11-18 Thread Nico Williams
On Wed, Nov 18, 2015 at 11:52:14PM +, Simon Slavin wrote: > On 18 Nov 2015, at 8:11pm, Nico Williams wrote: > > SELECT 'added., a.* FROM > > (SELECT a.* FROM a EXCEPT SELECT b.* FROM b) a > > UNION ALL > > SELECT 'deleted', d.* FROM > > (SELECT b.* FROM b EXCEPT SELECT a.* FROM a) d; > > >

[sqlite] Diff two tables as fast as diff(1) (of sorted files)

2015-11-18 Thread Nico Williams
Can SQLite3 run the sub-queries of a UNION ALL with any degree of concurrency? E.g., with its co-routines? That might help three union'ed scans of the same table go pretty fast, maybe not much slower than one scan. In which case this might compare quite well to diff(1). Nico --

[sqlite] Diff two tables as fast as diff(1) (of sorted files)

2015-11-18 Thread Nico Williams
On Wed, Nov 18, 2015 at 11:10:47PM +0100, E.Pasma wrote: > op 18-11-2015 21:23 schreef Richard Hipp op drh at sqlite.org: > > The "sqldiff.exe" command-line utility has an (undocumented and > > unsupported) "--debug 2" option that will show you the SQL that it > > uses to do a diff between two

[sqlite] Diff two tables as fast as diff(1) (of sorted files)

2015-11-18 Thread Nico Williams
On Wed, Nov 18, 2015 at 03:23:33PM -0500, Richard Hipp wrote: > If it's so common, why are you the first to request it :-) And, yeah, > according to the WWPD principal, if Postgres doesn't do it, we > probably won't be in a big hurry to do it either... I've written this sort of query many times.

[sqlite] Diff two tables as fast as diff(1) (of sorted files)

2015-11-18 Thread Richard Hipp
On 11/18/15, Nico Williams wrote: > Consider two tables with the same columns and primary keys, and then > consider this query: > > SELECT 'added., a.* FROM > (SELECT a.* FROM a EXCEPT SELECT b.* FROM b) a > UNION ALL > SELECT 'deleted', d.* FROM > (SELECT b.* FROM b EXCEPT SELECT a.* FROM a)

[sqlite] Diff two tables as fast as diff(1) (of sorted files)

2015-11-18 Thread Nico Williams
Consider two tables with the same columns and primary keys, and then consider this query: SELECT 'added., a.* FROM (SELECT a.* FROM a EXCEPT SELECT b.* FROM b) a UNION ALL SELECT 'deleted', d.* FROM (SELECT b.* FROM b EXCEPT SELECT a.* FROM a) d; In other words, the SQL analog of a