On Tue, 2016-01-26 at 22:48 -0800, Scotty C wrote:
> ok brandon, that's a thought. build the hash on the hard drive at the
> time of data creation. you mention collision resolution. so let me
> build my hash on the hard drive using my 6 million buckets but
> increase the size of each bucket from 5 slots to 20. right?
> i can't exactly recreate my vector/bignum hash on the hard drive
> because i can't dynamically resize the buckets like i can the
> bignums. this gives me a 4 gb file whereas my original was 1 gb. i
> have enough space for that so that's not a problem.

Sure, a bigger file works. Your bignums are probably implemented as
linked lists, if I'm not mistaken; that's why they're O(1) resizable.
You could do the same thing on the hdd. The problem is that with
linked lists you might need to seek and read through the file more.
And since you're probably paging, your current implementation has the
same problem. You can get rid of it in both cases by using vectors,
as you suggest. And you can have a process in the background that
resizes the file (by creating a new one) when the current one gets
too full, so you don't block operations.
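
To make that concrete, here's a minimal sketch of the fixed-bucket
layout, using the numbers from your posts. The 8-byte key size is an
assumption on my part since I don't know your record width (your 1 gb
/ 4 gb figures suggest your keys are somewhat larger):

    #lang racket/base

    (define key-size 8)                ; bytes per key (assumed)
    (define slots-per-bucket 20)
    (define bucket-count 6000000)
    (define bucket-size (* key-size slots-per-bucket))

    ;; Fixed-size buckets make locating bucket i pure arithmetic:
    ;; one seek lands on it, with no pointer chasing as in a
    ;; linked-list layout.
    (define (bucket-offset i) (* i bucket-size))

    ;; Pre-size the file by writing its final byte; all-zero slots
    ;; mean "empty". Most filesystems will create this sparse.
    (define (create-hash-file path)
      (call-with-output-file path
        #:exists 'truncate
        (lambda (out)
          (file-position out (- (* bucket-count bucket-size) 1))
          (write-byte 0 out))))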

> so as my buckets fill up they head towards the average of 5 data
> items per bucket. so on average here's what happens with each hd hash
> record. i go to my hd hash and read 3.5 (think about it) items and
> 90% of the time i don't find my data so i do a write. in my process i
> do an initial write, then a read, a write, a read, a write. compare:
> 3.5 vs 2 reads; 1 vs 3 writes. the reads are more costly and if i
> exceed 20 items in a bucket the hd hash breaks. what do you think? is
> it worth it?
> 

I think that implementation has room for improvement. Say you look
for a record, it does not exist, and you need to write it. You seek
to the bucket's location and read the bucket into RAM (1 read). If
the entry isn't in the bucket, you seek to a free slot within the
bucket and write your entry (1 write). To do this, you'd need a
single function for your hash that searches for an entry and falls
back to writing the record if the search fails. That's a reasonable
function, though. You can also introduce a cache and have a
background process update the file, so your hash operations aren't
waiting on the hdd. Though your operating system's file cache might
already be taking care of this for you, so you might want to
do some benchmarking.
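
Building on the layout sketch above, that search-with-fallback
function might look something like this (untested; lookup-or-insert!
and the plain modulo hash are stand-ins for whatever your real code
does):

    ;; in and out are two ports on the same file:
    ;;   (define-values (in out)
    ;;     (open-input-output-file "hash.dat" #:exists 'update))
    ;; Returns #t if key was already present, #f if it had to be written.
    (define (lookup-or-insert! in out key)
      (define off (bucket-offset (modulo key bucket-count))) ; stand-in hash
      (file-position in off)
      (define bucket (read-bytes bucket-size in))            ; the 1 read
      (let loop ((slot 0))
        (if (= slot slots-per-bucket)
            (error 'lookup-or-insert! "bucket full")         ; your 20-slot limit
            (let ((k (integer-bytes->integer
                      bucket #f #t
                      (* slot key-size) (* (+ slot 1) key-size))))
              (cond ((= k key) #t)                           ; hit: no write
                    ((zero? k)                               ; empty slot: the 1 write
                     ;; (assumes 0 is never a real key)
                     (file-position out (+ off (* slot key-size)))
                     (write-bytes
                      (integer->integer-bytes key key-size #f #t) out)
                     (flush-output out)
                     #f)
                    (else (loop (+ slot 1))))))))

The write-behind cache would then just be an in-memory hash plus a
background thread draining queued inserts into this function; but as
I said, benchmark first, since the OS may already absorb most of
these writes.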

There are probably several other optimizations you can make, and you
can probably find them by going through the source/docs of other
(key,value) storage solutions. Or, if you want to cut down on work,
just use one of those (key,value) storage solutions directly through
Racket's FFI (ffi/unsafe, which is built on libffi).
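
If you go that route, note that Racket can already talk to SQLite
through its built-in db library, so you may not need to write FFI
bindings at all. A minimal sketch of that alternative (the table and
column names are mine, and it assumes your keys fit in a 64-bit
integer; store them as blobs otherwise):

    (require db)

    (define conn (sqlite3-connect #:database "seen-keys.db" #:mode 'create))
    (query-exec conn "CREATE TABLE IF NOT EXISTS seen (k INTEGER PRIMARY KEY)")

    ;; #t if key was already recorded, #f if this call inserted it.
    ;; sqlite takes care of the paging and caching questions for you.
    (define (seen-before? conn key)
      (cond ((query-maybe-value conn "SELECT k FROM seen WHERE k = ?" key)
             #t)
            (else
             (query-exec conn "INSERT INTO seen (k) VALUES (?)" key)
             #f)))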

Regards,
Brandon Thomas
