Jano wrote:
> Ummm. Each branch is counted as a success? All of them must succeed? Just
> the longer one? Is this nonsense?

Not nonsense, but the choice between the various options seems fairly
arbitrary. On the other hand there's only one way to define route length
for requests, so I guess we could add a hop counter to ChkDataFound and
SskDataFound - something like the sketch below.
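This is only a rough sketch with made-up field and method names - I
haven't checked the real message classes:

    // Sketch only: the data source sends the reply with hopsTaken = 0, and
    // each node increments the counter before forwarding the reply towards
    // the requester, so the requester learns the route length on success.
    abstract class DataFoundMessage {

        private int hopsTaken = 0; // hops this reply has travelled so far

        int getHopsTaken() {
            return hopsTaken;
        }

        // Called by each node as it forwards the reply upstream.
        void incrementHopsTaken() {
            hopsTaken++;
        }
    }

It would also give us the option of weighting successes by route length
later on, if a plain success count turns out to be misleading.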
> * I'm dropping messages at the tail when queues reach 50,000 messages queued
> (for search and transfer queues). I implemented this in the hope of getting
> rid of OOMs. I'm getting them anyway, so I've screwed something up in the
> process.

Not necessarily - 500 peers * 2 queues * 50,000 messages is 50 million
queued messages in the worst case, which could easily eat a gig or two of
memory even at a few dozen bytes per message. (There's a sketch of the
drop-tail bound at the end of this mail.)

> I could only simulate up to 30 with LIFO queues and this change;
> see the graph. I don't think it's correct. Do we have any idea of the
> theoretical maximum throughput for the simulated network, as currently
> defined?

It depends on how far the data's travelling. Ignoring inserts and slow
nodes for the moment, half the requests are for CHKs and half are for
SSKs, so the average size of a reply is about 17 KB. The total capacity of
the network is 1500 KB per second, and the maximum sustainable throughput
(for FIFO at least) seems to be about 80,000 replies in 2 hours = 11
replies per second. That would imply that replies are travelling an
average of 1500/(11*17) = 8 hops, but that's a *very* rough estimate.

> * I'm counting just remote successes. If we are measuring the load balancing
> performance, I don't think the local hits are of any interest, and they could
> mask the remote ones.

Good point - maybe that's why our figures for backoff at high loads are
different. (I'm also at revision 11135, by the way.) It makes sense that
you'd see a low success rate but high throughput if requests were either
succeeding locally or not at all. Unfortunately this suggests that simply
counting the number of successes (or even remote successes) isn't an
adequate measure of throughput - being able to retrieve the nearest tenth
of the keyspace in one minute isn't equivalent to being able to retrieve
the entire keyspace in ten minutes... Any suggestions for a better metric?

> * I'm not computing failures anymore since dropped messages far exceed
> successes. Half an hour of simulation would produce nearly 1 GB of logs,
> given the number of messages dropped!

We should probably replace the logging statements with a static counter -
rough sketch below. Bear in mind that a dropped message doesn't
necessarily lead to a failure - under some circumstances the upstream node
can move on if it gets a timeout, so a search can suffer several dropped
messages and still succeed.
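Something like this, perhaps (sketch only, names made up):

    import java.util.concurrent.atomic.AtomicLong;

    // Sketch of the static counter: instead of writing a log line for
    // every dropped message, count the drops and report the total once
    // in the end-of-run summary.
    final class DropStats {

        private static final AtomicLong dropped = new AtomicLong();

        static void messageDropped() {
            dropped.incrementAndGet();
        }

        static long totalDropped() {
            return dropped.get();
        }
    }

i.e. replace each logging call with DropStats.messageDropped() and print
DropStats.totalDropped() at the end of the simulation.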
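And here's roughly what I understood by dropping at the tail - again just
a sketch with made-up names, not your actual code:

    import java.util.ArrayDeque;
    import java.util.Deque;

    // Sketch of a drop-at-tail bound on a message queue. Note that even
    // with the cap, 500 peers * 2 queues * 50,000 messages is 50 million
    // queued messages in the worst case, which may be why the OOMs persist.
    final class BoundedMessageQueue<T> {

        private static final int MAX_QUEUED = 50_000;
        private final Deque<T> queue = new ArrayDeque<T>();

        // Returns false (and counts a drop) if the queue is already full.
        boolean offer(T message) {
            if (queue.size() >= MAX_QUEUED) {
                DropStats.messageDropped(); // from the counter sketch above
                return false;
            }
            queue.addLast(message);
            return true;
        }

        // FIFO service; use pollLast() instead for LIFO.
        T poll() {
            return queue.pollFirst();
        }
    }

Cheers,
Michael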