Jano wrote:
> Ummm. Each branch is counted as a success? All of them must succeed? Just
> the longer one? Is this nonsense?

Not nonsense, but the choice between the various options seems to be 
fairly arbitrary. On the other hand there's only one way to define route 
length for requests, so I guess we could add a hop counter to 
ChkDataFound and SskDataFound.
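To make that concrete: a hop counter could be as simple as an extra field on the reply message, incremented at each node the reply passes through on its way back. A minimal sketch - the class and field names are illustrative, not Freenet's actual ChkDataFound/SskDataFound definitions:

```java
// Illustrative reply message carrying a hop counter so route length
// can be measured for successful requests. Not the real message class.
public class DataFoundReply {
    private int hopsTravelled; // bumped at each node the reply passes

    public void incrementHops() {
        hopsTravelled++;
    }

    public int getHopsTravelled() {
        return hopsTravelled;
    }
}
```

Each node would call incrementHops() before forwarding the reply upstream, and the originator reads the final count.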

> * I'm dropping messages at the tail when queues reach 50.000 messages queued
> (for search and transfer queues). I implemented this in the hope of getting
> rid of OOMs. I'm getting them anyway, so I've screwed something in the
> process.

Not necessarily - 500 peers * 2 queues * 50,000 messages could easily 
eat a gig or two of memory.
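As a back-of-envelope check - assuming, say, 40 bytes of overhead per queued message, which is almost certainly on the low side for a real message object:

```java
// Worst-case memory footprint of the bounded queues.
// BYTES_PER_MESSAGE is an assumption, not a measured figure.
public class QueueMemoryEstimate {
    static final long BYTES_PER_MESSAGE = 40;

    static long worstCaseBytes(int peers, int queuesPerPeer, int capPerQueue) {
        return (long) peers * queuesPerPeer * capPerQueue * BYTES_PER_MESSAGE;
    }

    public static void main(String[] args) {
        long bytes = worstCaseBytes(500, 2, 50_000);
        // 500 * 2 * 50,000 = 50 million messages; at 40 bytes each,
        // that's ~2 GB before the queues even hit their caps.
        System.out.println(bytes / (1024 * 1024) + " MB");
    }
}
```

So the tail-dropping caps alone wouldn't prevent OOMs - the caps themselves admit a multi-gigabyte worst case.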

> I could only simulate up to 30 with lifo queues and this change;
> see the graph. I don't think it's correct. Have we some idea on what is the
> theoretical maximum throughput for the simulated network, as currently
> defined?

It depends on how far the data's travelling. Ignoring inserts and slow 
nodes for the moment, half the requests are for CHKs and half are for 
SSKs, so the average size of a reply is about 17 KB. The total capacity 
of the network is 1500 KB per second, and the maximum sustainable 
throughput (for FIFO at least) seems to be about 80,000 replies in 2 
hours, or roughly 11 replies per second. That would imply that replies 
are travelling an average of 1500/(11*17) ≈ 8 hops, but that's a *very* 
rough estimate.
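The arithmetic behind that estimate, as a sketch (the input figures are the rough ones above, not measurements):

```java
// If the whole network moves capacityKBps of reply traffic and each
// reply of replyKB is delivered repliesPerSec times, every KB must be
// forwarded (capacity / (rate * size)) times on average - the hop count.
public class HopEstimate {
    static double averageHops(double capacityKBps, double repliesPerSec,
                              double replyKB) {
        return capacityKBps / (repliesPerSec * replyKB);
    }

    public static void main(String[] args) {
        // 1500 KB/s capacity, ~11 replies/s, ~17 KB average reply
        System.out.printf("%.1f hops%n", averageHops(1500, 11, 17));
    }
}
```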

> * I'm counting just remote successes. If we are measuring the load balancing
> performance, I don't think the local hits are of any interest and could
> mask the remote ones.

Good point - maybe that's why our figures for backoff at high loads are 
different. (I'm also at revision 11135 by the way.) It makes sense that 
you'd see a low success rate but high throughput if requests were either 
succeeding locally or not at all.

Unfortunately this suggests that simply counting the number of successes 
(or even remote successes) isn't an adequate measure of throughput - 
being able to retrieve the nearest tenth of the keyspace in one minute 
isn't equivalent to being able to retrieve the entire keyspace in ten 
minutes...

Any suggestions for a better metric?
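One possibility, purely as a sketch: weight each remote success by the bytes delivered times the hops travelled, so that retrieving distant keys counts for more than near-local hits. The class and method names here are hypothetical:

```java
import java.util.concurrent.atomic.AtomicLong;

// Candidate metric (an idea, not a settled answer): accumulate
// byte-hops per success, so throughput reflects how far data moved,
// not just how many requests succeeded.
public class ByteHopMeter {
    private final AtomicLong byteHops = new AtomicLong();

    public void recordSuccess(long payloadBytes, int hops) {
        byteHops.addAndGet(payloadBytes * hops);
    }

    public double byteHopsPerSecond(long elapsedSeconds) {
        return (double) byteHops.get() / elapsedSeconds;
    }
}
```

Under this measure a purely local success (hops = 0, or 1 if you count the local store lookup) contributes little, which matches the intuition above.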

> * I'm not computing failures anymore since messages dropped by far exceed
> successes. Half an hour of simulation would produce near 1GB of logs, given
> the amount of msgs dropped!

We should probably replace the logging statements with a static counter. 
Bear in mind that a dropped message doesn't necessarily lead to a 
failure - under some circumstances the upstream node can move on if it 
gets a timeout, so a search can suffer several dropped messages and 
still succeed.
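A static counter along these lines would keep the statistic without the log volume (names are illustrative):

```java
import java.util.concurrent.atomic.AtomicLong;

// Replace per-drop log lines with a counter that is incremented on
// each drop and dumped once at the end of the run.
public class DropStats {
    public static final AtomicLong droppedMessages = new AtomicLong();

    public static void messageDropped() {
        droppedMessages.incrementAndGet();
    }

    public static String summary() {
        return "dropped=" + droppedMessages.get();
    }
}
```

Call DropStats.messageDropped() where the log statement used to be, and print DropStats.summary() when the simulation finishes.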

Cheers,
Michael
