> On Thu, Jun 26, 2008 at 11:58 AM, Brooks, Phil <[EMAIL PROTECTED]> wrote:
>> Can anyone suggest additional things I might try to:
>>
>> 1) reduce db creation time
>>    - or -
>> 2) reduce the size of the file?
>>
> Jay Kreibich suggested creating a hash number and making that a key
> for the match table, so I would create another column in the table
> that would contain a hashed integer, create an index on that, then
> when I need to do a query, I could select on the hashed index and
> then compare each record returned against my query string.

This is actually similar to the solution I am using now in my on-disk
file format. ...

I created my hashes in a Perl script:

    use Digest::MD5 qw(md5);   # binary 16-byte digest

    $hash     = md5($key);
    $hash_num = unpack( "%32N*", $hash ) % 4294967295;

so they end up as large 32-bit integers. This saves a lot of space,
but the integer indexes take vastly longer to create than the simple
string indexes. Perhaps it is the randomness of the key values? Or
perhaps duplication? My data is arranged by 'group', and the strings
can be (and frequently are) repeated from group to group, though they
are unique within a group.

Any guesses on why the integer indexes take so much longer to create?
I finally set my script up to time each step.

Keying on the hashed integers:

  import the two tables:                                 6 minutes
  create foo left and foo right indexes (big table):     about 25 minutes each
  create bar left and bar right indexes (small table):   about 2 minutes each
  database size:                                         8 GB

When I key on the strings themselves:

  import the two tables:                                 6 minutes
  create foo left and foo right indexes (big table):     about 11 minutes each
  create bar left and bar right indexes (small table):   about 1 minute each
  database size:                                         14 GB

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
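For anyone following along, the select-on-the-hash-then-verify pattern
described above can be sketched in Python with the built-in sqlite3 module.
The `match` table, its column names, and the `key_hash` helper are all
hypothetical illustrations (not the poster's actual schema); the helper
mirrors the Perl `unpack("%32N*", md5($key)) % 4294967295` recipe, i.e. the
sum of the four big-endian 32-bit words of the MD5 digest, reduced mod
2**32 - 1:

```python
import hashlib
import sqlite3
import struct

def key_hash(key: str) -> int:
    """32-bit hash of a key string, mirroring the Perl recipe:
    sum the four big-endian 32-bit words of the MD5 digest
    (mod 2**32, as unpack's %32 checksum does), then mod 4294967295."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    words = struct.unpack(">4I", digest)   # four big-endian 32-bit ints
    return (sum(words) % 2**32) % 4294967295

# Hypothetical schema: a 'match' table storing the string key, its
# integer hash, and a payload; only the narrow integer column is indexed.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE match (key TEXT, key_hash INTEGER, payload TEXT)")
con.execute("CREATE INDEX match_hash_idx ON match(key_hash)")

for k, v in [("alpha", "a"), ("beta", "b"), ("gamma", "c")]:
    con.execute("INSERT INTO match VALUES (?, ?, ?)", (k, key_hash(k), v))

# Query: seek on the integer index, then compare the full string
# to weed out hash collisions.
target = "beta"
rows = con.execute(
    "SELECT payload FROM match WHERE key_hash = ? AND key = ?",
    (key_hash(target), target),
).fetchall()
print(rows)   # the payload(s) whose key is exactly 'beta'
```

The space saving comes from indexing a 4-byte integer instead of the full
string; the string comparison in the WHERE clause is what makes collisions
harmless.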