On Fri, Jun 27, 2008 at 12:20:28PM -0700, Brooks, Phil scratched on the wall:

> I created my hashes in a perl script:
> 
>         $hash=md5($key);
>         $hash_num = unpack( "%32N*", $hash ) % 4294967295;
> 
> so they end up being big 32 bit integer numbers.
> 
> This ends up saving a lot of space, but the indexes end 
> up taking vastly longer to create than the simple creation of string
> indices.  Perhaps the randomness of the key values?  Or perhaps
> duplication?

  The hash values are going to be effectively random.  If the original
  string values were inserted in somewhat sorted order, then the hash
  indexes will take a lot longer to build, since each value has to be
  placed in sorted position as it is inserted into the index, and
  random keys land all over the index rather than at the end.
  Duplicates shouldn't cost much more than the original string
  duplicates did.
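
  To see why the hashed keys arrive in effectively random order, here
  is a rough Python sketch of the quoted Perl computation.  It assumes
  that "%32N*" yields the 32-bit sum of the four big-endian words of
  the md5 digest; the function name hash_num and the sample keys are
  made up for illustration.

```python
import hashlib
import struct


def hash_num(key: bytes) -> int:
    """Approximate the quoted Perl: unpack("%32N*", md5($key)) % 4294967295."""
    # The md5 digest is 16 bytes: four big-endian 32-bit words ("N" in Perl).
    words = struct.unpack(">4I", hashlib.md5(key).digest())
    # Assumption: "%32" requests a 32-bit checksum, i.e. the sum mod 2**32.
    checksum = sum(words) % 2**32
    return checksum % 4294967295


# Hypothetical sample keys, already in sorted order.
keys = [b"key%04d" % i for i in range(100)]
hashes = [hash_num(k) for k in keys]

print("keys sorted:  ", keys == sorted(keys))
print("hashes sorted:", hashes == sorted(hashes))
```

  Sorted input keys come out as unsorted integers, so every insert
  into the index has to find its place somewhere in the middle.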

  Were you able to increase the cache size?  That will make a big
  difference in the sort process of the index creation.  See:

  PRAGMA cache_size=<size>
  http://www.sqlite.org/pragma.html

  If you have a typical desktop PC, try setting it to 250000 or so
  (the value is a number of pages).  Be aware that this pragma isn't
  "sticky": it applies only to the current connection, so you'll need
  to issue it in the specific session used to create the indexes.
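
  As a minimal sketch of what that looks like in practice, the Python
  below (standard sqlite3 module) sets the pragma on the same
  connection that builds the index, then opens a second connection to
  show that the setting did not carry over.  The table, column, index
  and file names are placeholders, not anything from your schema.

```python
import os
import sqlite3
import tempfile


def build_index(db_path, cache_pages=250000):
    """Create the index in a session that has the larger page cache."""
    conn = sqlite3.connect(db_path)
    try:
        # The pragma only affects this connection.
        conn.execute("PRAGMA cache_size=%d" % int(cache_pages))
        in_session = conn.execute("PRAGMA cache_size").fetchone()[0]

        # Placeholder table filled with scattered integer keys.
        conn.execute("CREATE TABLE IF NOT EXISTS items (hash_num INTEGER)")
        rows = [((n * 2654435761) % 4294967295,) for n in range(1000)]
        conn.executemany("INSERT INTO items VALUES (?)", rows)

        # The index is built while the larger cache is in effect.
        conn.execute(
            "CREATE INDEX IF NOT EXISTS items_hash_idx ON items (hash_num)"
        )
        conn.commit()
        return in_session
    finally:
        conn.close()


def cache_size_of_new_session(db_path):
    """Report the cache size a fresh connection starts with."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute("PRAGMA cache_size").fetchone()[0]
    finally:
        conn.close()


db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
during_build = build_index(db_path)
afterwards = cache_size_of_new_session(db_path)
print("during index build:", during_build)
print("in a new session:  ", afterwards)
```

  The second number is whatever default your SQLite build uses, which
  is why the pragma has to be reissued in each session that needs it.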

   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"'People who live in bamboo houses should not throw pandas.' Jesus said that."
   - "The Ninja", www.AskANinja.com, "Special Delivery 10: Pop!Tech 2006"
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
