On 3 Apr 2012, at 12:27am, Webdude <[email protected]> wrote: > A key part of the design requires that if another user who is using the same > program, (and probably would have to be using the same version of the SQLite > engine I suspect), if they put exactly the same items into their database and > the program runs a hash on their resulting file, then both hash values would > be the same (= both files are exactly the same).
Two identical SQLite database files will, of course, represent the same SQLite data. But you can do the same INSERT commands in a different order and produce two different database files which represent the same data. Most SQL commands will return the same results for both files, but the files on disk will have different contents and therefore hash differently. And if you know what you're doing you can arrange it so that all SQL commands would return the same data but the files would hash differently. > Does anyone know if SQLite stores additional unique internal information such > as timestamps etc. that would affect this, and if so could these "additional > to the data" variable features be disabled in any way? SQLite files do contain some metadata like row numbers and internal page hashes, but nothing like timestamps unless the programmer generated them in software. > Also, would they have to add the data in the same order etc? Inserting the same data in the same order on the same platform with the same (PRAGMA) settings would result in the files matching identically. But you want more than that. > Or would/could the data be encouraged to be internally ordered the same? > > Parsing through the database item by item in some fixed pattern to try to > determine a unique content id is not an option, the two separately produced > files must be the exactly the same. > > In short is it possible to use or tweak SQLite so that 2 identical sets of > data produced by 2 separate people would both file hash to the same value? There is no simple way to modify SQLite (or any SQL engine) to do this. The flexibility to, for example, enter an 'A' entry after ten thousand 'Z' entries are what makes SQL engines a fast way to manipulate data. There is a process you can go through to prove that two SQLite files have the same data in, but it will take time (vaguely) proportional to the amount of data in the file, and for multi-megabyte databases it can be pretty long. SQL databases aren't unusual in this. You will find the same problem with anything more than a simple text file. Take a Word file, insert an 'x', then delete it, then save the resulting file under a different name. Same content in terms of meaning, but the files will have many bits different internally and their hashes will be different. Simon. _______________________________________________ sqlite-users mailing list [email protected] http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

