[freenet-dev] How to gather more data was Re: Beyond New Load Management: A proposal

2011-09-10 Thread Arne Babenhauserheide
Hi, 
At Sat, 03 Sep 2011 00:53:58 +0200,
Arne Babenhauserheide wrote:
> Am Freitag, 2. September 2011, 23:34:29 schrieb Matthew Toseland:
> > > If the load balancer does not have some hidden delicacies, there is a
> > > very  simple check to see if my understanding is right.
> > No, it takes into account that SSKs use very little bandwidth (1-2K versus
> > 32K).
> 
> Bandwidth: 75 down, 85 up. 

I changed the upload bandwidth limit to 120 kB/s out (about my real output 
bandwidth), and it adjusted itself to 70 down and 78 up. That is about 10-20% 
more than with a 90 kB/s limit, but far from using the full 120 (78/120 is 
only 65%). It is less than the overshooting one, though → the load limiter is 
a bit too strong. 

I assume that, because CHKs and SSKs are treated the same, the raised limit 
also increased the number of CHKs, and those, unlike SSKs, really do need the 
bandwidth. 

Assume a mean transfer time of about 45 s to 1 min for a 32k block on my node 
(with OLM this can be estimated as the successful CHK time minus the failed 
CHK time, since a failed request searches but never transfers; with NLM it is 
likely a bit more complex). 
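
A minimal sketch of that OLM estimate, in Python; the request times below are 
placeholder values, not my real stats:

    # Rough estimate of the 32k transfer time under OLM: a failed CHK
    # request does everything a successful one does except the transfer,
    # so the difference between the two times is the transfer time.
    t_success_chk = 70.0  # s, average successful CHK time (placeholder)
    t_failed_chk = 22.0   # s, average failed CHK time (placeholder)

    transfer_time = t_success_chk - t_failed_chk
    print(f"estimated 32k transfer time: {transfer_time:.0f} s")  # ~48 s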

Now the NLM search time of about 1 minute for unsuccessful downloads 
(successful ones take 1:24 min for me) will block IO. If we assume that a 
successful transfer still takes 45 s, almost half the time is spent searching 
without any transfer → wasted bandwidth. 
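
A quick check of the “almost half” figure with those timings (the 45 s 
transfer time is the assumption from above):

    # Fraction of a successful request's lifetime spent searching
    # rather than transferring, from my measured NLM timings.
    success_total = 84.0  # s, successful download time (1:24 min)
    transfer = 45.0       # s, assumed transfer time for the data

    search_fraction = (success_total - transfer) / success_total
    print(f"searching, no data flowing: {search_fraction:.0%}")  # 46%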

(Disclaimer: the following is just an idea. Treat any “should” as “it might be 
a good idea to”, which is much longer and harder to read, so I will stick with 
“should”.) 

I assume that the deeper issue, beyond just failing requests, is that waiting 
should not be accounted as bandwidth. A request should be accounted as 32 kB × 
its estimated success probability, so failing requests are not counted 
(conceptually; the implementation is another matter). The limiter should then 
use a time frame of about 2× the time to complete a request and try to fill 
that → since NLM has higher wait times, it also needs a bigger bandwidth 
limiter window. If completing a bulk request can take 3 minutes, the bandwidth 
limiter window needs to be 6 minutes. 
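
A minimal sketch of that accounting idea in Python (this is not Freenet's 
actual limiter; the function names and the 50% success probability are 
assumptions for illustration):

    BLOCK_SIZE = 32 * 1024  # bytes in one CHK block

    def accounted_bytes(success_probability: float) -> float:
        """Account a request as 32 kB weighted by its estimated success
        probability, so waiting/failing requests cost almost nothing."""
        return BLOCK_SIZE * success_probability

    def limiter_window(completion_time: float) -> float:
        """Limiter time frame: about twice the request completion time."""
        return 2.0 * completion_time

    def requests_per_window(limit_bytes_per_s: float,
                            success_probability: float,
                            completion_time: float) -> int:
        """How many requests the limiter would admit per window."""
        budget = limit_bytes_per_s * limiter_window(completion_time)
        return int(budget // accounted_bytes(success_probability))

    # Example: 120 kB/s limit, 50% success, 3 min bulk completion time
    # -> a 6 minute window, as argued above.
    print(requests_per_window(120 * 1024, 0.5, 180.0))  # 2700

Failing requests then cost only their expected share, so waiting no longer 
eats into the bandwidth budget. 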

… one more reason (assuming a 2 min limiter window) why NLM used so little 
bandwidth: an estimated 2 min search time plus a 1 min transfer time meant 
that essentially two thirds of the allocated bandwidth was overbooking, 
because two thirds of the transfers would not finish within the window, so 
their bandwidth reservation was carried over into the next period. This 
reduced the total number of running requests, which in turn increased the 
wait times (since fewer requests finished in a given time period). 
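
One way to see where the two-thirds figure comes from (assuming a request 
holds its bandwidth reservation for its whole lifetime):

    # With ~2 min of search before ~1 min of transfer, a request holds
    # its reservation for 3 min but moves data for only 1 of them.
    search = 120.0   # s, estimated NLM search time
    transfer = 60.0  # s, transfer time
    lifetime = search + transfer

    idle_fraction = search / lifetime
    print(f"reserved but unused: {idle_fraction:.0%}")  # 67%, about 2/3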

Conclusion: 


NLM changes some basic assumptions about load limiting. Because of
that we need parameter tweaking to integrate it cleanly. Likely that
can only be done in a real network. We already know that the network
*does not break down* with NLM, so live tweaking is possible.

Likely it also changes the assumptions for other parts, so these
will need some tweaking, too.

Toad spent years optimizing the parameters and helper parts to work
well with the assumptions from OLM. It’s natural that NLM, which has
slightly different characteristics, requires some optimization
outside of its main algorithms, too.


Best wishes, 
Arne

Besides: bad performance: if the inserter uses NLM and the downloader uses 
OLM, I think that might make it harder for the downloader to get the data 
(different routing). Worse: if some downloaders use NLM and some use OLM, the 
packets might take different paths (NLM has shorter paths), which is likely to 
affect caching negatively (the data needs to be cached along both paths). 

Besides 2: Regin has the theory that the thread scheduler was the bottleneck, 
because his node always hit the limit of 500 threads. 


