Alright George, I'm open to new ideas. Here's what I've got going. I'm running 64-bit Linux Mint on a 2-core laptop with 2 GB of RAM. My key is 128 bits, with ~256 bits per record, so my 1 GB file contains ~63 million records and ~32 million keys. About 8% of those will be dupes, leaving me with ~30 million keys.

I run a custom-built hash using separate chaining: a vector of pointers, each pointing to a chain of bignums. I'm willing to let chains run up to 5 keys each, so I need a vector of 6 million pointers. That's 48 MB for the array, plus another 480 MB for the bignums; call that sum .5 GB. I also keep another rather large bignum in memory, about .5 GB, that I use to reduce (but not eliminate) record duplication. I'm attempting to get this thing to run in two places, so I need two hashes. Adding it up, .5 + .5 + .5 is 1.5 GB, which is about my memory limit (there's a quick sketch of this arithmetic at the bottom of this message).

The generated keys are random, but I use one of the associated fields for sorting during the initial write to the hard drive. What goes into each of those files is totally random, though dupes do not run across files. Also, the number of keys is > 1e25.
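
In case it helps, here's a rough back-of-the-envelope sketch of that budget arithmetic in Racket. The per-object sizes are my assumptions, not measurements: 8-byte pointers on 64-bit Linux, ~16 bytes of payload per 128-bit bignum, and no allocator/GC overhead counted.

#lang racket
;; Rough check of the memory budget described above.
;; Assumed sizes: 8-byte pointers, 16 bytes per 128-bit bignum, no GC overhead.

(define live-keys (* 30 (expt 10 6)))          ; ~30 million keys after dropping ~8% dupes
(define keys-per-chain 5)                      ; max chain length I'll tolerate
(define buckets (/ live-keys keys-per-chain))  ; ~6 million chain-head pointers

(define pointer-bytes 8)
(define bignum-bytes 16)

(define array-bytes (* buckets pointer-bytes))          ; ~48 MB for the pointer vector
(define key-bytes   (* live-keys bignum-bytes))         ; ~480 MB for the bignums
(define one-hash-gb (/ (+ array-bytes key-bytes) 1e9))  ; ~0.53 GB per hash

(define dedup-gb 0.5)                          ; the big dedup bignum
(define total-gb (+ (* 2 one-hash-gb) dedup-gb))

(printf "one hash: ~a GB\ntotal:    ~a GB\n" one-hash-gb total-gb)

Running it prints roughly 0.53 GB per hash and ~1.56 GB total, which is where the "about my memory limit" figure comes from on a 2 GB machine once the OS takes its share.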