On Tue, 2016-01-26 at 18:40 -0800, Scotty C wrote:
> alright george, i'm open to new ideas. here's what i've got going.
> running 64 bit linux mint OS on a 2 core laptop with 2 gb of ram. my
> key is 128 bits with ~256 bits per record. so my 1 gb file contains
> ~63 million records and ~32 million keys. about 8% will be dupes
> leaving me with ~30 million keys. i run a custom built hash. i use
> separate chaining with a vector of bignums. i am willing to let my
> chains run up to 5 keys per chain so i need a vector of 6 million
> pointers. that's 48 mb for the array. another 480 mb for the bignums.
> let's round that sum to .5 gb. i have another rather large bignum
> (about .5 gb) in memory that i use to reduce but not eliminate record
> duplication. i'm attempting to get this thing to run in 2 places so i
> need 2 hashes. add this up .5+.5+.5 is 1.5 gb and that gets me to
> about my memory limit. the generated keys are random but i use one of
> the associated fields for sorting during the initial write to the
> hard drive. what goes in each of those files is totally random but
> dupes do not run across files. also, the number of keys is >1e25.
> 

Sorry, I haven't read through the entire conversation, so I hope I'm
not missing anything. Is there anything stopping you from restructuring
the data on disk and using the hash directly from there (possibly with
the help of a cache if speed is important)? For example, let's say each
entry is 256 bits (32 bytes). Use something like "(file-position dbfile
(* 32 (hash-custom key)))" to seek to the appropriate entry on disk and
read just the entry you need (using whatever collision resolution you
prefer). Then you'll be using no auxiliary memory (unless you're
caching, and the cache can just be a smaller in-RAM hash table). Unless
of course I'm just missing something completely.
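
If it helps, here is a rough Racket sketch of the lookup side. To be
clear, this is a minimal illustration and not your actual setup: it
assumes 32-byte records whose first 16 bytes are the 128-bit key, an
all-zero record marking an empty slot, and linear probing for
collisions; hash-custom, TABLE-SLOTS, and the record layout are all
made up here, so swap in your real ones.

  #lang racket
  ;; Sketch: probe an on-disk hash table in place. Assumptions (not
  ;; from the thread): 32-byte records, first 16 bytes are the key,
  ;; all-zero record = empty slot, linear probing, placeholder hash.

  (define RECORD-SIZE 32)          ; 256 bits per record
  (define KEY-SIZE 16)             ; 128-bit key
  (define TABLE-SLOTS (expt 2 26)) ; assumed table capacity on disk

  (define (hash-custom key-bytes)  ; stand-in for your real hash
    (modulo (equal-hash-code key-bytes) TABLE-SLOTS))

  ;; Return the 32-byte record for key-bytes from dbfile (an input
  ;; port opened on the table file), or #f if the key is absent.
  (define (disk-ref dbfile key-bytes)
    (define empty-slot (make-bytes RECORD-SIZE 0))
    (let loop ([slot (hash-custom key-bytes)] [probes 0])
      (and (< probes TABLE-SLOTS)  ; give up once every slot is probed
           (begin
             (file-position dbfile (* RECORD-SIZE slot))
             (let ([rec (read-bytes RECORD-SIZE dbfile)])
               (cond
                 [(or (eof-object? rec) (bytes=? rec empty-slot))
                  #f]              ; empty slot: key not stored
                 [(bytes=? (subbytes rec 0 KEY-SIZE) key-bytes)
                  rec]             ; key matches: found it
                 [else (loop (modulo (add1 slot) TABLE-SLOTS)
                             (add1 probes))]))))))

  ;; e.g. (define dbfile (open-input-file "table.bin" #:mode 'binary))
  ;;      (disk-ref dbfile some-16-byte-key)

The point is that each lookup costs one seek and one 32-byte read, so
memory use stays flat no matter how big the table file gets; a small
in-RAM hash of recently read slots in front of disk-ref would buy back
most of the speed without holding the whole table in memory.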

Regards,
Brandon Thomas

