On Tue, 2016-01-26 at 22:48 -0800, Scotty C wrote:
> ok brandon, that's a thought. build the hash on the hard drive at the
> time of data creation. you mention collision resolution. so let me
> build my hash on the hard drive using my 6 million buckets but
> increase the size of each bucket from 5 slots to 20. right?
> i can't exactly recreate my vector/bignum hash on the hard drive
> because i can't dynamically resize the buckets like i can the
> bignums. this gives me a 4 gb file whereas my original was 1 gb. i
> have enough space for that so that's not a problem.
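A quick sanity check on those numbers first: 4 GB over 6,000,000
buckets of 20 slots each comes out to roughly 33 bytes per slot, which
also matches the ~1 GB you'd get at an average of 5 entries per
bucket, so the layout is self-consistent. Every slot then lives at a
fixed, computable byte offset, something like this (a sketch only; the
33-byte record size is my inference from your figures, and
equal-hash-code stands in for whatever hash function you're actually
using):

  #lang racket
  ;; Assumed layout constants (inferred from the thread, not confirmed):
  (define num-buckets 6000000)   ; total buckets in the file
  (define slots-per-bucket 20)   ; fixed slots per bucket
  (define record-size 33)        ; bytes per slot: ~4 GB / (6e6 * 20)

  ;; Byte offset of the first slot of the bucket a key hashes to.
  ;; modulo with a positive divisor is non-negative in Racket, so this
  ;; is safe even if the hash code is negative.
  (define (bucket-offset key)
    (* (modulo (equal-hash-code key) num-buckets)
       slots-per-bucket
       record-size))

  ;; Byte offset of slot i within key's bucket.
  (define (slot-offset key i)
    (+ (bucket-offset key) (* i record-size)))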
Sure, a bigger file works. Your bignums are probably implemented as
linked lists, if I'm not mistaken; that's why they're O(1) resizable.
You could do the same thing on the hard drive, but the problem is that
with linked lists you may need to seek and read through the file more.
Since you're probably paging, your current implementation has the same
problem. You can get rid of it in both cases by using vectors, as you
suggest. And you can have a background process that resizes the file
(by creating a new one) when the current one gets too full, so you
don't block operations.

> so as my buckets fill up they head towards the average of 5 data
> items per bucket. so on average here's what happens with each hd hash
> record. i go to my hd hash and read 3.5 (think about it) items and
> 90% of the time i don't find my data so i do a write. in my process i
> do an initial write, then a read, a write, a read, a write. compare:
> 3.5 vs 2 reads; 1 vs 3 writes. the reads are more costly and if i
> exceed 20 items in a bucket the hd hash breaks. what do you think? is
> it worth it?

I think that implementation has room for improvement. Let's say you
look for a record, it does not exist, and you need to write it. You
seek to the bucket's location and read the whole bucket into RAM
(1 read). If the entry isn't in the bucket, you seek to the first free
slot in that bucket and write your entry (1 write). To do this you'd
want a single function for your hash that searches for an entry and
falls back to writing the record when the search fails -- a perfectly
reasonable function to have (see the sketch in the P.S. below). You
can also introduce a cache and have a background process update the
file, so your hash operations aren't waiting on the hard drive. Though
your operating system's file cache might already be taking care of
this for you, so you'll want to do some benchmarking first. There are
probably several other optimizations you can make, and you can likely
find them by going through the source/docs of other (key,value)
storage solutions. Or, if you want to cut down on work, you can just
use one of those (key,value) storage solutions directly via libffi.

Regards,
Brandon Thomas
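P.S. Here's roughly what I mean by search-with-write-fallback, as a
sketch in Racket. It assumes the layout helpers from my earlier sketch
(bucket-offset, slot-offset, slots-per-bucket, record-size) and that
an all-zero slot means "empty" -- both are my assumptions, not your
actual format:

  ;; One read per probe, at most one write, using two ports over the
  ;; same file positioned independently with file-position.
  (define (find-or-insert! in out key record) ; record : bytes?
    (file-position in (bucket-offset key))
    ;; the 1 read: pull the whole bucket into memory at once
    (define bucket (read-bytes (* slots-per-bucket record-size) in))
    (define empty-slot (make-bytes record-size 0))
    (let loop ([i 0])
      (cond
        [(= i slots-per-bucket)
         (error 'find-or-insert! "bucket overflow")] ; >20 entries breaks it
        [else
         (define slot (subbytes bucket (* i record-size)
                                (* (add1 i) record-size)))
         (cond
           [(bytes=? slot record) 'found]            ; already present, no write
           [(bytes=? slot empty-slot)                ; first free slot: claim it
            (file-position out (slot-offset key i))
            (write-bytes record out)                 ; the 1 write
            (flush-output out)
            'inserted]
           [else (loop (add1 i))])])))

  ;; Usage:
  ;; (define in  (open-input-file "hash.dat"))
  ;; (define out (open-output-file "hash.dat" #:exists 'update))
  ;; (find-or-insert! in out key record-bytes)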