> >>> rc = hp.heap()[0].bysize[0].byid[0].rp[5].theone
> >>> verinfos = set([verinfo for (verinfo,shnum) in rc.cache.keys()])
> >>> len(verinfos)
> 1
> >>>
>
> So there's only one 'verinfo' value there. The size of the cache repr()
> is indeed pretty big.
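(A side note on interpreting that repr() number: repr() of binary data inflates it substantially, since non-printable bytes become four-character \xAB escapes. A quick sketch with made-up data, not the real cache:)

```python
import os

# Stand-in for cached share data: 10kB of random bytes.
data = os.urandom(10000)

# repr() renders printable ASCII as-is (1 char) but turns most other
# bytes into 4-char '\xAB' escapes, so the repr of binary data is much
# longer than the data itself -- around 3x for uniformly random bytes,
# approaching 4x when nearly every byte needs an escape.
inflation = float(len(repr(data))) / len(data)
print(inflation)
```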
Huh, that knocks out my theory that the ResponseCache is living longer
than expected (and thus accumulating cache entries from many obsolete
verinfos). It might still be living longer than expected, but that's not
as obviously the problem as I previously thought it was.

> >>> len(repr(rc.cache))
> 258270546

Hm. How big is the mutable file? And what is k/N? If the
overlap-detecting code is working right, then we could basically store
the full contents of every share in the cache, for a size of roughly
N/k*filesize. (Remember that repr() will expand most bytes into \xAB,
for an extra 4x expansion.)

(The overlap-detecting code might not be doing the right thing, or might
not even be completely written, so we could also wind up storing the
same data multiple times, with slightly different start/end edges. But I
don't really think that's what's happening here.)

So, my root assumption when writing this code several years ago was that
SDMF files would remain Small, so things like a cache of all current
shares wouldn't be a significant memory problem. (This assumption
appears implicitly in many of the design choices, and was made explicit
in the hard-coded limit on mutable file size, but then I was talked into
removing that limit, allowing the less-obvious issues to surface
slowly.)

The cache is populated during servermap-update operations, and read
during file-retrieve operations (downloads). I think my original
intention was that the small reads we do during servermap update (4kB)
should not be wasted/duplicated when we later read the full share, to
reduce roundtrips on small files. Given that the mapupdate reads are so
small, I'm really confused as to how the cache got so big.

> Would it be possible that some field of the verinfo tuple (such as
> offset) are different between calls to mutable.filenode._add_to_cache()
> and mutable.filenode._read_from_cache()?

That would show up as multiple values of the verinfo tuple, right? But
you only saw one value, right?
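(For a rough sanity check, here's what that reasoning implies, working backwards from the reported repr() length. The 4x expansion factor and the k=3/N=10 encoding parameters are guesses -- the actual parameters for this grid weren't reported:)

```python
# Back-of-the-envelope estimate. Assumptions (not confirmed): the cache
# holds full share contents, repr() expands binary data ~4x, and the
# file uses the default 3-of-10 encoding.
repr_len = 258270546        # reported len(repr(rc.cache))
repr_expansion = 4.0        # '\xAB' escapes: 4 chars per binary byte
k, N = 3, 10                # guessed erasure-coding parameters

raw_cached = repr_len / repr_expansion      # actual cached share bytes
implied_filesize = raw_cached * k / N       # cache ~= N/k * filesize

print(implied_filesize / 1e6)               # roughly 19 MB
```

So even under the most pessimistic reading, this only adds up if the file is tens of megabytes, well beyond what SDMF was designed for.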
ds> (nodemaker.py create_from_cap will ensure that a NodeMaker creates
ds> only one node object for a given cap. However, there will be
ds> separate DirectoryNode objects and separate underlying
ds> MutableFileNode objects for read and write caps to a directory --
ds> see also ticket #1105. But this would not cause a performance
ds> regression since the two MutableFileNode objects would currently
ds> have separate ResponseCaches.)

From a correctness aspect, we really want to have exactly one
MutableFileNode object for any given readcap, to avoid UCWE-with-self.
We can have multiple DirectoryNodes that share a MutableFileNode; I
think the locking will handle that. And we can have a separate
readcap-node and writecap-node for the same file (since the readcap-node
won't ever modify the shares: it might fall out of date due to a local
read where it should have been automatically updated, but that shouldn't
cause UCWE).

So, the next step is probably to add some logging to the calls to
_add_to_cache and _read_from_cache, and figure out how they're being
called. And to look at the overall size of the file, to see whether
we're caching the "normal" amount of data but the file is just a lot
bigger than the design ever expected. If the latter, then either we
should get rid of the ResponseCache (and take an extra round-trip
performance hit for small files), or only use it on small files somehow.

The biggest question in my mind is how a series of 4kB reads (all of
which start at the beginning of the share, I think: constant offset)
managed to accumulate to 260MB.

cheers,
 -Brian

_______________________________________________
tahoe-dev mailing list
tahoe-dev@tahoe-lafs.org
http://tahoe-lafs.org/cgi-bin/mailman/listinfo/tahoe-dev