Shawn Willden wrote:
> Yeah, let TCP handle making sure the whole share arrives, then hash to
> verify. Why the concern about data buffered in the outbound socket?

To keep the memory footprint down. Think of it this way: we could take
an entire 1GB file, encrypt+encode it in a single pass, deliver
everything to socket.write(), and then just sit back and wait for the
ACK. But where will that share data live during the 12 hours it takes
to get everything through your DSL upstream? In RAM.

The actual call that Foolscap makes is transport.write(), which is
implemented in Twisted by appending the outbound data to a list and
marking the socket as writeable (so that select() or poll() will wake
up the process when that data can be written). The top-most 128KB of
the list is handed to the kernel's socket.write() call, which is
allowed to accept just part of it, leaving the rest in userspace.
Typically, the kernel will have some fixed buffer size that it's
willing to let userspace consume: that space is decreased when
socket.write() is called, and increased when the far end ACKs another
TCP segment (this basically extends TCP's buffering window into the
kernel). I don't know offhand how large this buffer is, but since every
open TCP socket in the whole system gets one, I suspect it's pretty
small, like 64KB or so. So the kernel will consume 64KB, and the
transport's list (in userspace/python/Twisted) will consume N/k*1GB.
Badness.
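(If you want to check that guess on your own system, the kernel's
per-socket send-buffer size is easy to query; here's a quick sketch.
The default varies by OS and is usually tunable via sysctl, so 64KB is
just a ballpark.)

    import socket

    # Ask the kernel how much outbound data it will buffer for a single
    # TCP socket before socket.write()/send() starts accepting only
    # part (or none) of what userspace hands it.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sndbuf = s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    print("kernel send buffer: %d bytes" % sndbuf)
    s.close()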
Whereas, if we just put off creating later segments until the earlier
ones have been retired, we never consume more than a segment's worth of
memory at any one time.

We've always had low memory footprint as a goal for Tahoe, especially
since the previous codebase it replaced could hit multiple hundreds of
MB and slam the entire system (its memory footprint was roughly
proportional to filesize, whereas in Tahoe it's constant, based upon
the 128KiB segment size).

In 1.5.0, uploading got a bit more clever: it creates multiple segments
(but not all of them) and writes them all to the kernel, to try to keep
the kernel's pipeline full. It uses a default 50kB pipeline size:
running full-steam ahead until there is more than 50kB of outstanding
(unACKed) data, then stalling the encoding process until that size
drops below 50kB. So in exchange for another spike of 50kB per
connection, we get a bit more pipeline fill. The details depend, of
course, on the RTT and bandwidth. In some quick tests, I got maybe a
10% speedup on small files over slow links.
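To make that concrete, here's a rough sketch of the idea (hypothetical
names, not the actual Tahoe code): the encoder hands each segment-write
to the pipeline along with its size, and gets back a Deferred that
fires immediately while less than 50kB is outstanding, but doesn't fire
(stalling the encoder) once the unACKed backlog exceeds that limit:

    from twisted.internet import defer

    class Pipeline:
        # A sketch, not the real Tahoe class: bound the amount of
        # written-but-not-yet-ACKed data to 'capacity' bytes. Assumes a
        # single producer (the encoder) that waits on the Deferred
        # returned by add() before producing the next segment.
        def __init__(self, capacity=50000):
            self.capacity = capacity  # the 50kB pipeline size
            self.gap = 0              # bytes sent but not yet retired
            self.waiting = None       # fires when the encoder may proceed

        def add(self, num_bytes, send_fn, *args):
            # send_fn(*args) must return a Deferred that fires when the
            # far end has acknowledged this chunk.
            self.gap += num_bytes
            send_fn(*args).addCallback(self._retired, num_bytes)
            if self.gap > self.capacity:
                # pipeline is full: hand back a Deferred that won't
                # fire until enough earlier chunks have been retired
                self.waiting = defer.Deferred()
                return self.waiting
            return defer.succeed(None)  # plenty of room: proceed now

        def _retired(self, res, num_bytes):
            self.gap -= num_bytes
            if self.waiting is not None and self.gap <= self.capacity:
                w, self.waiting = self.waiting, None
                w.callback(None)
            return res

The encoder just chains encode-the-next-segment onto the Deferred that
add() returns, so encoding naturally pauses whenever the pipeline is
full and resumes as ACKs retire earlier chunks.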
> I really, really like grid-side convergence. I'd vote for keeping it
> and combining the message semantics.

Yeah, me too. It feels like a good+useful place for convergence.
Without it, clients must do considerable work to achieve the same
savings (specifically time savings, by not encoding+uploading things
which are already in the grid).

> 1. My app -> local node -> helper -> grid
> 2. My app -> helper (using helper as client) -> grid
> 3. My app -> local node -> grid
>
> Option 1 seems to give the best performance. Option 3 obviously sucks
> because it means pushing the FEC-expanded data up my cable modem. It's
> not clear to me why 1 is better than 2. Maybe it's just from spreading
> the CPU load.

Yeah, that'd be my guess.

cheers,
 -Brian