Right, you need to use the birthday problem when estimating the collision
risk.

I wouldn't mind hashing - but there is an additional complication - the query I 
want the covered field in the index for is this:

SELECT ExtraCol FROM Foo WHERE UniqueCol > range_start AND UniqueCol < range_end

So I would need a hashing algorithm that's usably small and doesn't collide,
yet preserves the property that if a < b then hash(a) < hash(b).
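Cryptographic hashes like MD5 deliberately destroy ordering, so they can't satisfy this. The closest common trick is truncating a fixed-width big-endian encoding, which is monotone (a < b implies key(a) <= key(b)) but lets nearby values collide - so it only narrows a range scan rather than replacing the column. A sketch, with names of my own choosing:

```python
def prefix_key(value: int, keep_bytes: int = 4, width: int = 8) -> int:
    """Order-preserving truncation: keep the high-order bytes of the
    fixed-width big-endian encoding of `value`. Monotone but NOT
    collision-free - nearby values map to the same key, so this is a
    range-bucketing key, not a substitute for a unique index."""
    raw = value.to_bytes(width, "big")
    return int.from_bytes(raw[:keep_bytes], "big")

# a < b always gives prefix_key(a) <= prefix_key(b), never >,
# but e.g. 5 and 6 share a key because they differ only in low bytes.
```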

- Deon

-----Original Message-----
From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On 
Behalf Of Paul Sanderson
Sent: Wednesday, March 1, 2017 2:19 PM
To: p...@sandersonforensics.com
Cc: SQLite mailing list <sqlite-users@mailinglists.sqlite.org>
Subject: Re: [sqlite] Non-unique columns in unique index

Hmm - a bit of Google foo and refreshing my memory on the chances of
collisions means my 1:9,223,372,036,854,775,808 is way off.

That is the chance of any two hashes colliding - you'll have lots of hashes.

The basic idea might be sound though. Even if you stick with a full MD5 it
should save a chunk of storage.
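The hash-as-rowid idea from the quoted message below could be sketched like this; table and column names are illustrative, and the 64-bit truncated MD5 is shifted into SQLite's signed integer range so it can serve as an INTEGER PRIMARY KEY (rowid alias):

```python
import hashlib
import sqlite3

def hash64(*cols: str) -> int:
    """First 64 bits of MD5 over the joined columns, converted to a
    signed integer so it fits SQLite's INTEGER PRIMARY KEY."""
    digest = hashlib.md5("|".join(cols).encode()).digest()
    unsigned = int.from_bytes(digest[:8], "big")
    return unsigned - (1 << 64) if unsigned >= (1 << 63) else unsigned

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE hashes (hash INTEGER PRIMARY KEY)")
con.execute("INSERT INTO hashes VALUES (?)", (hash64("u1", "x1"),))
try:
    # re-inserting the same (uniquecol, extracol) pair hits the
    # primary-key constraint, which is what enforces uniqueness
    con.execute("INSERT INTO hashes VALUES (?)", (hash64("u1", "x1"),))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Since the hash is the rowid, each row costs 8 bytes plus overhead rather than the full key width.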

Paul
www.sandersonforensics.com
skype: r3scue193
twitter: @sandersonforens
Tel +44 (0)1326 572786
http://sandersonforensics.com/forum/content.php?195-SQLite-Forensic-Toolkit
-Forensic Toolkit for SQLite
email from a work address for a fully functional demo licence

On 1 March 2017 at 22:13, Paul Sanderson <sandersonforens...@gmail.com>
wrote:

> As a bit of an off the wall suggestion you could try an MD5 (or even a 
> partial MD5 - half of the bytes)
>
> CREATE TABLE hashes (
>     hash INTEGER PRIMARY KEY -- just the first 64 bits of the hash of
> uniquecol and extracol
> );
>
> as an integer primary key the hash would be an alias of the rowid and 
> so storage would be 8 bytes plus admin
>
> the chance of a random collision based on a 64 bit hash would be (I
> think)
> 1:9,223,372,036,854,775,808
>
> MD5 is broken but would work OK for this
>
> use a trigger to abort the insert into blah if the insert into hashes 
> fails.
>
>
> Paul
> www.sandersonforensics.com
> skype: r3scue193
> twitter: @sandersonforens
> Tel +44 (0)1326 572786 <+44%201326%20572786>
> http://sandersonforensics.com/forum/content.php?195-SQLite-Forensic-Toolkit
> -Forensic Toolkit for SQLite
> email from a work address for a fully functional demo licence
>
> On 1 March 2017 at 21:54, Simon Slavin <slav...@bigfraud.org> wrote:
>
>>
>> On 1 Mar 2017, at 9:41pm, Deon Brewis <de...@outlook.com> wrote:
>>
>> > Yeah ok, but that is paltry compared with the GBs of disk space
>> > that the actual second index takes up. But thanks for clarifying.
>>
>> Ah.  If it’s really GBs of disk space then I can see why you’d look 
>> for alternative solutions.
>>
>> But I have a 43 GB database file which could be 20 GB without an
>> extra index.  I could have written that extra check in code, and
>> reduced the file size, but I decided not to.  Because once I had
>> developed procedures to handle a 20 GB file, I might as well be dealing
>> with a 43 GB file anyway.
>>
>> Simon.
>> _______________________________________________
>> sqlite-users mailing list
>> sqlite-users@mailinglists.sqlite.org
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>
>
>
