Re: [sqlite] Hashing 2 SQLite db files with the same data

Simon Slavin Mon, 02 Apr 2012 20:37:17 -0700

On 3 Apr 2012, at 2:42am, Webdude <webd...@thewebdudes.com> wrote:

> Hi Simon,
> 
> thanks for helping me with this.
> 
>> Inserting the same data in the same order on the same platform
>> with the same (PRAGMA) settings would result in the files
>> matching identically.
> 
> Do you feel that the platform - Hardware / OS / some other factor could 
> influence the way SQLite performed its sequence?


SQLite stores data inside its files in blocks called 'pages'.  When you create 
a new database file, SQLite has to pick a page size.  The page size it picks 
depends on some details about the hard disk the file will be created on (and 
also on some compilation settings).  To optimize speed it might, for instance, 
make pages the size of the disk's sectors.  So if you run the same code twice, 
once writing your file to one hard disk and once writing it to a hard disk with 
a different sector size, you can end up with two files with different page 
sizes, and these files will, of course, have different hashes.  For details, see

<http://www.sqlite.org/pragma.html#pragma_page_size>
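A minimal Python sketch of the effect, using the standard-library sqlite3 module (make_db and header_page_size are hypothetical helper names, not part of SQLite). Instead of relying on two disks with different sector sizes, it forces two different page sizes with PRAGMA page_size, then reads the value back from the file header, which stores the page size as a 2-byte big-endian number at offset 16. Issued before the first table is created, the same pragma is also how you would pin the page size so it no longer depends on the disk.

```python
import os
import sqlite3
import tempfile

def make_db(path, page_size):
    """Hypothetical helper: create a one-table database with a chosen page size."""
    conn = sqlite3.connect(path)
    # Must run before the first table is created; in the default
    # rollback-journal mode a later change only takes effect after a VACUUM.
    conn.execute(f"PRAGMA page_size = {int(page_size)}")
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.execute("INSERT INTO t VALUES (1)")
    conn.commit()
    conn.close()

def header_page_size(path):
    """Read the page size from the 100-byte file header (offset 16, big-endian)."""
    with open(path, "rb") as f:
        header = f.read(100)
    size = int.from_bytes(header[16:18], "big")
    return 65536 if size == 1 else size  # the stored value 1 stands for 65536

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        a, b = os.path.join(d, "a.db"), os.path.join(d, "b.db")
        make_db(a, 1024)  # same SQL, same data...
        make_db(b, 4096)  # ...different page size
        print(header_page_size(a), header_page_size(b))  # 1024 4096
        with open(a, "rb") as fa, open(b, "rb") as fb:
            print(fa.read() == fb.read())  # False
```

Same table, same row, different bytes on disk, so any hash of the file will differ.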

> Here is my reply that I just sent to Nico Williams, for more insight:
> 
> 
> It's not important that the 2 db files are exactly the same all the time 
> that people are editing them, but only when they 'finalise' a 'package'.
> So what if some code in the 'packaging' process performed a sequence of 
> queries that read all the data from the db, table by table, and inserted it 
> into a new db.
> Would that same code process, running on the same data but on 2 different 
> machines, produce the exact same file byte for byte?
> Would hardware / OS / anything else affect the final sequence of bytes in the 
> file?

Correct question, answered as above: it's possible that the two files would be 
different on disk even if they contain identical SQL data.  That answer assumes 
the documentation is accurate and the OS returns correct information about your 
hard disks.
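A sketch of the packaging step Webdude describes, in Python with the standard-library sqlite3 module. repackage, file_sha256 and make_source are hypothetical names, and the sketch assumes ordinary rowid tables only (no views, triggers or WITHOUT ROWID tables). It pins the page size and copies tables, rows and indexes in a fixed order into a brand-new file, which removes the variables discussed above. It does not guarantee byte-identical output across machines: the file header also records, among other things, the version number of the SQLite library that last wrote the file, so everyone producing the package would at least need the same SQLite version and settings.

```python
import hashlib
import os
import sqlite3
import tempfile

def quote_ident(name):
    """Quote an SQL identifier such as a table name."""
    return '"' + name.replace('"', '""') + '"'

def repackage(src_path, dst_path, page_size=4096):
    """Hypothetical: copy src_path into a brand-new file in a fixed, repeatable order.

    Sketch only: assumes ordinary rowid tables; views, triggers and
    WITHOUT ROWID tables are not handled.
    """
    if os.path.exists(dst_path):
        raise FileExistsError(dst_path)  # must start from an empty file
    dst = sqlite3.connect(dst_path, isolation_level=None)  # we issue BEGIN/COMMIT
    # Pin the page size so it no longer depends on the disk the file lands on.
    dst.execute(f"PRAGMA page_size = {int(page_size)}")
    dst.execute("ATTACH DATABASE ? AS src", (src_path,))
    tables = dst.execute(
        "SELECT name, sql FROM src.sqlite_master WHERE type = 'table' "
        "AND name NOT GLOB 'sqlite_*' ORDER BY name").fetchall()
    indexes = dst.execute(
        "SELECT sql FROM src.sqlite_master WHERE type = 'index' "
        "AND sql IS NOT NULL ORDER BY name").fetchall()  # skips automatic indexes
    dst.execute("BEGIN")
    for name, sql in tables:
        dst.execute(sql)  # identical CREATE TABLE text, created in main
        q = quote_ident(name)
        # Insert rows in rowid order, whatever order they were added to the source.
        dst.execute(f"INSERT INTO main.{q} SELECT * FROM src.{q} ORDER BY rowid")
    for (sql,) in indexes:
        dst.execute(sql)  # indexes are built after the data, in name order
    dst.execute("COMMIT")
    dst.execute("DETACH DATABASE src")
    dst.close()

def file_sha256(path):
    """Hash the raw bytes of a file."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def make_source(path, rows, page_size):
    """Hypothetical sample data: the same rows, added in any order."""
    conn = sqlite3.connect(path)
    conn.execute(f"PRAGMA page_size = {int(page_size)}")
    conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE INDEX item_name ON item (name)")
    conn.executemany("INSERT INTO item VALUES (?, ?)", rows)
    conn.commit()
    conn.close()

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        a, b, ap, bp = (os.path.join(d, n) for n in ("a.db", "b.db", "ap.db", "bp.db"))
        make_source(a, [(1, "x"), (2, "y"), (3, "z")], 1024)
        make_source(b, [(3, "z"), (1, "x"), (2, "y")], 8192)
        print(file_sha256(a) == file_sha256(b))  # False: different page sizes
        repackage(a, ap)
        repackage(b, bp)
        print(file_sha256(ap) == file_sha256(bp))
```

On one machine with one SQLite build, the two packaged files should hash the same even though their sources used different page sizes and row orders; whether that holds across all the machines you care about is something you would have to test.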

> I don't mind the extra coding, and reluctantly can put up with the extra time 
> taken to package at the end if need be.
> But I really need the final files to be the same so that anyone can confirm 
> the content by hashing the file itself even if they don't have the program 
> that reads it.
> Also, given a list of the contents, anyone could recreate the same exact file 
> using the program but can still prove the content just by using an 
> independent hash checker.

Sorry, I can't think of any way of checking that the data is the same without 
using real SQLite code to read the files.  The obvious thing to do would be to 
dump the SQL data the same way the sqlite3 shell tool's '.dump' command does: it 
writes out a text file of the SQL commands needed to recreate the database.  This 
can take a long time and generate a long file.  And even then you can /still/ get 
different text out even if the data stored in the files is identical.  You'd 
have to know how the software worked to know if the differences mattered.
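If what needs checking is the data rather than the bytes, one option is to hash the data itself, read through real SQLite code as suggested above. A minimal Python sketch with the standard-library sqlite3 module; content_hash and make_sample are hypothetical names. It hashes each table's CREATE statement text and its rows, sorted on every column, so page size, insertion order and hidden rowids do not affect the result. Those are design choices, and they illustrate the point above that you have to decide which differences matter: here, CREATE statements that differ only in whitespace hash differently, and indexes, views and triggers are ignored.

```python
import hashlib
import os
import sqlite3
import tempfile

def quote_ident(name):
    """Quote an SQL identifier such as a table name."""
    return '"' + name.replace('"', '""') + '"'

def content_hash(path):
    """Hypothetical: SHA-256 over the schema text and rows of every table, not the file bytes.

    Each table is treated as an unordered collection of rows: rows are sorted
    on all columns, so insertion order, hidden rowids and page size are ignored.
    """
    h = hashlib.sha256()
    conn = sqlite3.connect(path)
    try:
        tables = conn.execute(
            "SELECT name, sql FROM sqlite_master WHERE type = 'table' "
            "AND name NOT GLOB 'sqlite_*' ORDER BY name").fetchall()
        for name, sql in tables:
            h.update(b"T" + repr((name, sql)).encode("utf-8"))  # "T" marks a table
            q = quote_ident(name)
            ncols = len(conn.execute(f"SELECT * FROM {q} LIMIT 0").description)
            order = ", ".join(str(i) for i in range(1, ncols + 1))
            for row in conn.execute(f"SELECT * FROM {q} ORDER BY {order}"):
                # "R" marks a row; repr() keeps 1, 1.0, '1' and b'1' distinct.
                h.update(b"R" + repr(row).encode("utf-8"))
    finally:
        conn.close()
    return h.hexdigest()

def make_sample(path, rows, page_size):
    """Hypothetical sample: one table, rows added in the given order."""
    conn = sqlite3.connect(path)
    conn.execute(f"PRAGMA page_size = {int(page_size)}")
    conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    conn.commit()
    conn.close()

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        a, b = os.path.join(d, "a.db"), os.path.join(d, "b.db")
        make_sample(a, [(1, "x"), (2, "y")], 1024)
        make_sample(b, [(2, "y"), (1, "x")], 8192)
        print(content_hash(a) == content_hash(b))  # True
```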

You'll have this problem with many different pieces of software.  So the 
situation is:

A) Hashes match: files are identical (except for the rare possibility of a hash 
collision)
B) Hashes differ: files may or may not contain the same information

Simon.
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
