On 1/27/2016 10:50 AM, Brandon Thomas wrote:
On Wed, 2016-01-27 at 04:01 -0500, George Neuner wrote:
> On Tue, 26 Jan 2016 23:00:01 -0500, Brandon Thomas
> <bthoma...@gmail.com> wrote:
>
> > Is there anything stopping you from restructuring
> > the data on disk and using the hash directly from there
>
> Scotty's hash table is much larger than he thinks it is and very
> likely is being paged to disk already.  Deliberately implementing it
> as a disk file is unlikely to improve anything.

That's a good point to keep in mind. But there are advantages, including faster startup time, lower RAM and swap usage, a file that is easier to keep updated, and an easier path to a resize solution. There are probably more, but these are basically the reasons why all the large (key, value) storage solutions I've heard of use an explicit file instead of swap.

I miscalculated the scale of Scotty's hash structure - it's not as bad as I initially thought. But even so, it is large enough to be unwieldy and bound to cause virtual memory problems unless the machine is dedicated to it.

Hashing is latency sensitive - it was designed as a memory-resident technique. Obviously it _can_ be done using file-based buckets ... the effect is that of querying an ISAM database on every access. The problem is that the latency increases by orders of magnitude: even when the blocks are resident, every access goes through the file system API and the kernel. You did mention (user space) caching, but that greatly complicates the solution.
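To make the latency point concrete, here is a minimal Racket sketch of what file-based buckets could look like - the bucket size, bucket count, and on-disk record layout are made up for illustration, and scan-bucket stands in for whatever in-memory search of a bucket's bytes you would actually write:

#lang racket
;; Hypothetical sketch of a file of fixed-size hash buckets (not Scotty's
;; code).  Sizes and the on-disk layout are assumptions for illustration.

(define BUCKET-SIZE 4096)      ; assumed fixed bucket size in bytes
(define BUCKET-COUNT 65536)    ; assumed number of buckets in the file

(define (bucket-index key)
  (modulo (equal-hash-code key) BUCKET-COUNT))

;; Each lookup seeks to the key's bucket and reads it through the port
;; layer and a system call; scan-bucket then searches the raw bytes in
;; memory for the key's record.
(define (disk-hash-ref path key scan-bucket)
  (call-with-input-file path
    (lambda (in)
      (file-position in (* (bucket-index key) BUCKET-SIZE))
      (scan-bucket (read-bytes BUCKET-SIZE in) key))))

Even when the bucket happens to be in the OS page cache, every disk-hash-ref still pays for the seek and the read crossing into the kernel, which is where the orders-of-magnitude gap over an in-memory hash-ref comes from.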

I don't think making the hash external is a win - it certainly will handle much bigger files, but it will handle every file more slowly. I think it is better to leverage the file system rather than fight it.

YMMV,
George
