gneuner2 (George), you are overthinking this. My 1 GB test file is only a small sample. I can't even hash that small 1 GB at the time of data creation, because the hashed data won't fit in RAM.

When I put the redundant data on the hard drive, I do some constant-time sorting so that it ends up in roughly 200 usefully sorted files. Some of those files are small and can be hashed with a single read, hash, and write. Some are massive (the data won't fit in RAM) and must be split further, which is another kind of single read, hash, and write. The split pieces can then be fully hashed, which means a second read, hash, and write. Recombining the second-level files is virtually instantaneous (copy-port) relative to the effort spent getting to that point. All of these operations are constant time. It would be nice to cut into that big fat hard-drive-induced C, but I can't do it with a single read and write on the larger files.
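Since this is the Racket list, here is a minimal sketch of that two-tier pass, assuming the ~200 bucket files are line-oriented text and that "hashing" means deduplicating lines through an in-memory hash table. The memory budget, the fan-out of 16, and the names hash-pass, split-pass, and process-bucket are my own assumptions for illustration, not a description of the actual program.

#lang racket

(define MEMORY-BUDGET (* 512 1024 1024)) ; hypothetical RAM limit, in bytes
(define SUB-BUCKETS 16)                  ; hypothetical fan-out when splitting

;; Single read, hash, and write: dedup a bucket that fits in RAM.
(define (hash-pass in-path out-path)
  (define seen (make-hash))
  (call-with-output-file out-path #:exists 'replace
    (lambda (out)
      (call-with-input-file in-path
        (lambda (in)
          (for ([line (in-lines in)]
                #:unless (hash-has-key? seen line))
            (hash-set! seen line #t)
            (displayln line out)))))))

;; Single read and write: split an oversized bucket into SUB-BUCKETS
;; pieces, keyed so that duplicate lines land in the same piece.
(define (split-pass in-path out-dir)
  (make-directory* out-dir)
  (define outs
    (for/vector ([i (in-range SUB-BUCKETS)])
      (open-output-file (build-path out-dir (format "part-~a" i))
                        #:exists 'replace)))
  (call-with-input-file in-path
    (lambda (in)
      (for ([line (in-lines in)])
        (define i (modulo (equal-hash-code line) SUB-BUCKETS))
        (displayln line (vector-ref outs i)))))
  (for ([o (in-vector outs)]) (close-output-port o))
  (for/list ([i (in-range SUB-BUCKETS)])
    (build-path out-dir (format "part-~a" i))))

;; Small buckets get the single pass; big ones get split, each piece gets
;; its own hash pass (the "second read, hash and write"), and the results
;; are recombined with copy-port, which is cheap next to the passes above.
(define (process-bucket bucket-path out-path)
  (if (<= (file-size bucket-path) MEMORY-BUDGET)
      (hash-pass bucket-path out-path)
      (let ([pieces (split-pass bucket-path (format "~a-parts" out-path))])
        (call-with-output-file out-path #:exists 'replace
          (lambda (out)
            (for ([p (in-list pieces)])
              (define deduped (format "~a.dedup" p))
              (hash-pass p deduped)
              (call-with-input-file deduped
                (lambda (in) (copy-port in out)))))))))

However it's arranged, each byte of a large bucket still crosses the disk twice (once to split, once to hash the pieces), which is exactly the extra constant I'd like to shave but can't.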