On 3 Apr 2012, at 12:27am, Webdude <[email protected]> wrote:

> A key part of the design requires that if another user who is using the same 
> program, (and probably would have to be using the same version of the SQLite 
> engine I suspect), if they put exactly the same items into their database and 
> the program runs a hash on their resulting file, then both hash values would 
> be the same (= both files are exactly the same).

Two identical SQLite database files will, of course, represent the same SQLite 
data.  But you can do the same INSERT commands in a different order and produce 
two different database files which represent the same data.  Most SQL commands 
will return the same results for both files, but the files on disk will have 
different contents and therefore hash differently.  And if you know what you're 
doing you can arrange it so that all SQL commands would return the same data 
but the files would hash differently.

> Does anyone know if SQLite stores additional unique internal information such 
> as timestamps etc. that would affect this, and if so could these "additional 
> to the data" variable features be disabled in any way?

SQLite files do contain some metadata like row numbers and internal page 
hashes, but nothing like timestamps unless the programmer generated them in 
software.

> Also, would they have to add the data in the same order etc?

Inserting the same data in the same order on the same platform with the same 
(PRAGMA) settings would result in the files matching identically.  But you want 
more than that.

>  Or would/could the data be encouraged to be internally ordered the same?

> 
> Parsing through the database item by item in some fixed pattern to try to 
> determine a unique content id is not an option, the two separately produced 
> files must be the exactly the same.
> 
> In short is it possible to use or tweak SQLite so that 2 identical sets of 
> data produced by 2 separate people would both file hash to the same value?

There is no simple way to modify SQLite (or any SQL engine) to do this.  The 
flexibility to, for example, enter an 'A' entry after ten thousand 'Z' entries 
are what makes SQL engines a fast way to manipulate data.  There is a process 
you can go through to prove that two SQLite files have the same data in, but it 
will take time (vaguely) proportional to the amount of data in the file, and 
for multi-megabyte databases it can be pretty long.

SQL databases aren't unusual in this.  You will find the same problem with 
anything more than a simple text file.  Take a Word file, insert an 'x', then 
delete it, then save the resulting file under a different name.  Same content 
in terms of meaning, but the files will have many bits different internally and 
their hashes will be different.

Simon.
_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to