Right, you need to use the birthday-problem ("birthday attack") calculation to determine the collision risk.
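As a rough illustration (my own sketch, not a figure from the thread): the birthday bound for n rows hashed into a 64-bit space is roughly 1 - exp(-n*n / 2^65). Assuming an SQLite 3.35+ build with the math functions (exp) compiled in, it can be evaluated directly:

  -- Birthday-bound approximation for a 64-bit hash.
  -- For n = 1e9 rows this works out to roughly a 2.7% chance of at
  -- least one collision somewhere in the table.
  SELECT 1.0 - exp(-(1.0e9 * 1.0e9) / (2.0 * 18446744073709551616.0))
         AS approx_collision_probability;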
I wouldn't mind hashing - but there is an additional complication - the query I want the covered field in the index for is this:

  SELECT ExtraCol FROM Foo WHERE UniqueCol > range_start AND UniqueCol < range_end

So I would need a hashing algorithm that's usably small and doesn't collide, yet preserves the property that if a < b then hash(a) < hash(b).

- Deon

-----Original Message-----
From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On Behalf Of Paul Sanderson
Sent: Wednesday, March 1, 2017 2:19 PM
To: p...@sandersonforensics.com
Cc: SQLite mailing list <sqlite-users@mailinglists.sqlite.org>
Subject: Re: [sqlite] Non-unique columns in unique index

Hmm - a bit of Google foo and refreshing my memory on the chances of collisions means my 1:9,223,372,036,854,775,808 is way off. That is the chance of any two hashes colliding - you'll have lots of hashes.

The basic idea might be sound though; even if you stick with a full MD5 it should save a chunk of storage.

Paul
www.sandersonforensics.com
skype: r3scue193
twitter: @sandersonforens
Tel +44 (0)1326 572786
http://sandersonforensics.com/forum/content.php?195-SQLite-Forensic-Toolkit
-Forensic Toolkit for SQLite
email from a work address for a fully functional demo licence

On 1 March 2017 at 22:13, Paul Sanderson <sandersonforens...@gmail.com> wrote:
> As a bit of an off-the-wall suggestion, you could try an MD5 (or even a
> partial MD5 - half of the bytes):
>
>   CREATE TABLE hashes (
>     hash INTEGER PRIMARY KEY  -- just the first 64 bits of the hash of uniquecol and extracol
>   );
>
> As an INTEGER PRIMARY KEY the hash would be an alias of the rowid, and
> so storage would be 8 bytes plus admin.
>
> The chance of a random collision based on a 64-bit hash would be (I
> think) 1:9,223,372,036,854,775,808.
>
> MD5 is broken but would work OK for this.
>
> Use a trigger to abort the insert into blah if the insert into hashes
> fails.
>
> Paul
>
> On 1 March 2017 at 21:54, Simon Slavin <slav...@bigfraud.org> wrote:
>>
>> On 1 Mar 2017, at 9:41pm, Deon Brewis <de...@outlook.com> wrote:
>>
>> > Yeah ok, but that is paltry compared with the GBs of disk space
>> > that the actual second index takes up. But thanks for clarifying.
>>
>> Ah. If it's really GBs of disk space then I can see why you'd look
>> for alternative solutions.
>>
>> But I have a 43 GB database file which could be 20 GB without an
>> extra index. I could have written that extra check in code, and
>> reduced the file size, but I decided not to. Because once I had
>> developed procedures to handle a 20 GB file, I might as well be
>> dealing with a 43 GB file anyway.
>>
>> Simon.
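For reference, a minimal sketch of the trigger approach Paul describes above - my own illustration, not code from the thread. It assumes Deon's Foo(UniqueCol, ExtraCol) table already exists and that a user-defined hash64() SQL function (e.g. the first 64 bits of an MD5, registered via sqlite3_create_function or a loadable extension) is available:

  CREATE TABLE hashes (
    hash INTEGER PRIMARY KEY   -- alias of the rowid: 8 bytes plus overhead
  );

  CREATE TRIGGER foo_hash_guard
  BEFORE INSERT ON Foo
  BEGIN
    -- The INTEGER PRIMARY KEY on hashes rejects a duplicate hash; the
    -- default ABORT conflict resolution then aborts the outer INSERT
    -- into Foo as well.
    INSERT INTO hashes(hash) VALUES (hash64(NEW.UniqueCol || NEW.ExtraCol));
  END;

Note that this enforces uniqueness of the (UniqueCol, ExtraCol) pair without a second full index, but it cannot serve Deon's range query, since a conventional hash does not preserve ordering.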