I forgot to mention the hash function...

My personal choice would be the 64-bit MurmurHash2 function:
http://murmurhash.googlepages.com/
http://en.wikipedia.org/wiki/MurmurHash2 (lots of implementations are linked here)

It's fast (even in managed-code implementations), has good distribution properties, and is free.
Don't use CRC64...
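For reference, here is a minimal Python sketch of MurmurHash64A, the name Austin Appleby's reference code gives the 64-bit MurmurHash2 variant for 64-bit platforms. It is a transcription for illustration only and has not been checked against official test vectors; the function names `murmur64a` and `to_signed64`, the default seed of 0, and little-endian 8-byte block reads (what the C code does on x86) are my assumptions. Pure Python is much slower than C, so in practice you would use one of the compiled implementations linked above.

```python
MASK64 = (1 << 64) - 1
M = 0xC6A4A7935BD1E995  # MurmurHash64A multiplier
R = 47                  # MurmurHash64A shift


def murmur64a(key: bytes, seed: int = 0) -> int:
    """Sketch of MurmurHash64A; returns an unsigned 64-bit integer."""
    length = len(key)
    h = ((seed & MASK64) ^ (length * M)) & MASK64

    # Body: mix each 8-byte block (read little-endian, as on x86).
    nblocks = length // 8
    for i in range(nblocks):
        k = int.from_bytes(key[8 * i:8 * i + 8], "little")
        k = (k * M) & MASK64
        k ^= k >> R
        k = (k * M) & MASK64
        h ^= k
        h = (h * M) & MASK64

    # Tail: same effect as the C fall-through switch over the last 1..7 bytes.
    tail = key[nblocks * 8:]
    if tail:
        h ^= int.from_bytes(tail, "little")
        h = (h * M) & MASK64

    # Final avalanche.
    h ^= h >> R
    h = (h * M) & MASK64
    h ^= h >> R
    return h


def to_signed64(h: int) -> int:
    """Reinterpret an unsigned 64-bit value as signed (SQLite INTEGER is signed 64-bit)."""
    return h - (1 << 64) if h >= (1 << 63) else h
```

For example, `to_signed64(murmur64a("some key".encode("utf-8")))` yields a value that fits in a SQLite INTEGER column; the unsigned result on its own can exceed 2^63 - 1.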

P.S. Even a 64-bit hash can collide. Each new string has roughly a 1 in
10,000,000,000 chance (10^9 / 2^64, about 5.4e-11) of hashing to the same
value as one of the keys already in a 1-billion-key dictionary, and by the
birthday bound the chance of at least one collision somewhere among all
n = 10^9 keys is about n^2 / 2^65, or roughly 2.7%. So you should probably
keep a very small in-memory table of collision resolutions: each string key
whose hash collided is mapped to a substitute string key whose hash does
not collide. That's simple to do, removes the chance of a collision, and
keeps the extra check very fast because the table is tiny (I believe you
will almost never see anything in it at all).
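The collision table idea could look roughly like the following Python sketch. Everything here is hypothetical: the class `CollisionResolvingIndex`, the substitute-key scheme (appending a NUL character and a counter), and the use of BLAKE2b truncated to 8 bytes from `hashlib` as a stand-in for MurmurHash2 so the example has no dependencies. The dict `by_hash` plays the role of the big SQLite table; it remembers which key owns each hash only so that `insert` can detect a collision, which in a real schema would be a query against that table.

```python
import hashlib
from typing import Callable, Dict, Optional


def hash64(key: str) -> int:
    """Stand-in 64-bit hash: BLAKE2b truncated to 8 bytes (not MurmurHash2)."""
    digest = hashlib.blake2b(key.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


class CollisionResolvingIndex:
    """Hypothetical sketch: map string keys to 64-bit ids, with a tiny remap
    table for the rare keys whose hash collided with an existing key."""

    def __init__(self, hash_fn: Callable[[str], int] = hash64) -> None:
        self.hash_fn = hash_fn
        # Stands in for the big table: hash id -> key that owns it.
        self.by_hash: Dict[int, str] = {}
        # The small in-memory collision table: original key -> substitute key.
        self.remap: Dict[str, str] = {}

    def lookup(self, key: str) -> Optional[int]:
        """Return the id for key, or None if it was never inserted."""
        # Check the tiny remap table first; almost always a miss.
        effective = self.remap.get(key, key)
        h = self.hash_fn(effective)
        return h if self.by_hash.get(h) == key else None

    def insert(self, key: str) -> int:
        """Insert key (idempotently) and return its id."""
        existing = self.lookup(key)
        if existing is not None:
            return existing
        candidate, n = key, 0
        # On collision, derive substitute keys until one hashes to a free id.
        while self.hash_fn(candidate) in self.by_hash:
            n += 1
            candidate = f"{key}\x00{n}"
        if candidate != key:
            self.remap[key] = candidate
        h = self.hash_fn(candidate)
        self.by_hash[h] = key
        return h
```

Passing a deliberately weak hash such as `hash_fn=len` forces collisions and shows the remap table in action: after inserting "ab" and then "cd", only "cd" ends up in `remap`, and both keys still look up to distinct ids.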
-- 
View this message in context: 
http://www.nabble.com/very-large-SQLite-tables-tp24201098p24219678.html
Sent from the SQLite mailing list archive at Nabble.com.

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users