>On Thu, Jun 26, 2008 at 11:58 AM, Brooks, Phil <[EMAIL PROTECTED]> wrote:
>> Can anyone suggest additional things I might try to:
>>
>>    1) reduce db creation time
>> - or -
>>    2) reduce the size of the file?
>>

> Jay Kreibich suggested creating a hash number and making that a key for
> the match table, so I would create another column in the table that
> would contain a hashed integer, create an index on that, then when I
> need to do a query, I could select on the hashed index and then compare
> each record returned against my query string.  This is actually similar
> to the solution I am using now in my on-disk file format.

...

I created my hashes in a perl script:

        use Digest::MD5 qw(md5);

        $hash     = md5($key);                              # 16-byte binary digest
        $hash_num = unpack( "%32N*", $hash ) % 4294967295;  # fold to a 32-bit integer

so the hashes end up as large unsigned 32-bit integers.
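
As a sketch of how the lookup end of that scheme could look from Perl/DBI
-- the table, column, and index names below (string_tbl, str, str_hash,
id, str_hash_idx) are placeholders, not my actual schema:

        use DBI;
        use Digest::MD5 qw(md5);

        my $dbh = DBI->connect( "dbi:SQLite:dbname=match.db", "", "",
                                { RaiseError => 1 } );

        sub key_hash {
            my ($key) = @_;
            # same hashing scheme as above
            return unpack( "%32N*", md5($key) ) % 4294967295;
        }

        # index the integer hash column instead of the string itself
        $dbh->do( 'CREATE INDEX str_hash_idx ON string_tbl(str_hash)' );

        # select on the hashed index, then compare the full string to
        # weed out hash collisions
        my $sth = $dbh->prepare(
            'SELECT id, str FROM string_tbl WHERE str_hash = ? AND str = ?' );

        sub lookup {
            my ($key) = @_;
            $sth->execute( key_hash($key), $key );
            return @{ $sth->fetchall_arrayref };
        }

The idea is that SQLite only needs the small integer index to narrow
things down, and the string comparison runs only on the few candidate
rows that share a hash value.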

This saves a lot of space, but the integer indexes take vastly longer to
create than the plain string indexes did.  Perhaps because of the
randomness of the key values?  Or perhaps because of duplication?

-- my data is arranged by 'group', and the strings can be (and frequently
are) repeated from group to group, though they are unique within a group.

Any guesses on why the integer indexes take so much longer to create?


I finally set my script up to time each step:

When I key on the hash values:

import the two tables: 6 minutes.
create foo left and foo right indexes (the big table): about 25 minutes each.
create bar left and bar right indexes (the small table): about 2 minutes each.
database size: 8 GB.

When I key on the strings themselves:

import the two tables: 6 minutes.
create foo left and foo right indexes (the big table): about 11 minutes each.
create bar left and bar right indexes (the small table): about 1 minute each.
database size: 14 GB.
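
(For what it's worth, the timing itself is nothing fancy -- each step just
gets wrapped in a wall-clock check in the script, along the lines of the
sketch below.  The DDL, table, and index names here are made up for
illustration; the real schema differs.)

        use DBI;
        use Time::HiRes qw(time);

        my $dbh = DBI->connect( "dbi:SQLite:dbname=match.db", "", "",
                                { RaiseError => 1 } );

        # hypothetical steps; the real script imports the two tables and
        # then builds the four indexes
        my @steps = (
            [ 'create foo left index',
              'CREATE INDEX foo_left_idx ON foo(left_hash)' ],
            [ 'create foo right index',
              'CREATE INDEX foo_right_idx ON foo(right_hash)' ],
        );

        for my $step (@steps) {
            my ( $label, $sql ) = @$step;
            my $t0 = time;
            $dbh->do($sql);
            printf "%-25s %.1f minutes\n", $label, ( time - $t0 ) / 60;
        }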



