[freenet-dev] How to gather more data was Re: Beyond New Load Management: A proposal
Hi,

At Sat, 03 Sep 2011 00:53:58 +0200, Arne Babenhauserheide wrote:
> On Friday, 2 September 2011, 23:34:29, Matthew Toseland wrote:
> > > If the load balancer does not have some hidden delicacies, there is a
> > > very simple check to see if my understanding is right.
> > No, it takes into account that SSKs use very little bandwidth (1-2K versus
> > 32K).
> > Bandwidth: 75 down, 85 up.

I changed the upload bandwidth to 120 out (about my real output bandwidth), and it adjusted itself to 70 down and 78 up. This is about 10-20% more than with a 90 kB/s limit, but far away from using my 120 (it only uses 65%). It is less than the overshooting one, though → the load limiter is a bit too strong.

I assume that due to treating CHKs and SSKs the same, the load limiting also increased the number of CHKs, though these are the requests that really need the bandwidth.

Assume a mean transfer time of about 45s-1min for a 32k block on my node (with OLM: failed CHK time minus successful time; with NLM it is likely a bit more complex). The NLM search time of about 1 minute for unsuccessful downloads (successful ones take 1:24 min for me) will then block IO. If we assume that a successful transfer still takes 45s, almost half the time is spent searching without any transfer → wasted bandwidth.

(Disclaimer: The following is just an idea. Treat any “should” as “it might be a good idea to”, which is much longer and harder to read, so I go with “should”.)

I assume the deeper issue, beyond just failing requests, is that waiting should not be accounted as bandwidth. A request should be accounted as 32 kB × its estimated success probability, so failing requests are not counted (conceptually; the implementation is another matter). The limiter should then use a time frame of about 2× the time to complete a request and try to fill that → since NLM has higher wait times, it also needs a bigger bandwidth limiter window. If it can get to 3 minutes for bulk, the bandwidth limiter window needs to be 6 minutes.
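The accounting idea above could be sketched roughly like this (a minimal sketch of the idea only; `WindowedLimiter`, its method names, and the numbers are illustrative assumptions, not Freenet's actual load limiter):

```python
BLOCK_SIZE = 32 * 1024  # bytes in a 32k CHK block


def accounted_bytes(success_probability: float) -> float:
    """Charge a request at 32 kB times its estimated success
    probability, so requests expected to fail cost almost nothing."""
    return BLOCK_SIZE * success_probability


class WindowedLimiter:
    """Admit new requests while the bytes accounted inside the window
    stay below what the link can move during that window."""

    def __init__(self, bandwidth_bytes_per_s: float, mean_completion_s: float):
        # Window of ~2x the time to complete a request, as proposed above.
        self.window_s = 2 * mean_completion_s
        self.capacity = bandwidth_bytes_per_s * self.window_s
        self.accounted = 0.0  # bytes currently reserved within the window

    def try_admit(self, success_probability: float) -> bool:
        cost = accounted_bytes(success_probability)
        if self.accounted + cost > self.capacity:
            return False
        self.accounted += cost
        return True

    def on_finished(self, success_probability: float) -> None:
        self.accounted -= accounted_bytes(success_probability)


# With NLM bulk requests taking ~3 minutes, the window becomes 6 minutes:
limiter = WindowedLimiter(bandwidth_bytes_per_s=85 * 1024, mean_completion_s=180)
print(limiter.window_s / 60)  # -> 6.0
```

The point of the sketch is only that the charge scales with success probability and the window scales with completion time; how the real limiter would track in-flight reservations is left open.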
… one more reason (with the assumption of a 2 min limiter window) why NLM used so little bandwidth: estimated 2 min search times with 1 min transfer time meant that essentially two thirds of the allocated bandwidth was overbooking, because two thirds of the transfers would not finish within the window, so their bandwidth reservation was carried over into the next period. This reduced the total number of running requests, which in turn increased the wait times (since fewer requests finished in a given time period).

Conclusion: NLM changes some basic assumptions about load limiting. Because of that we need parameter tweaking to integrate it cleanly. Likely that can only be done in a real network. We already know that the network *does not break down* with NLM, so live tweaking is possible. Likely it also changes the assumptions for other parts, so these will need some tweaking, too. Toad spent years optimizing the parameters and helper parts to work well with the assumptions from OLM. It is natural that NLM, which has slightly different characteristics, requires some optimization outside of its main algorithms, too.

Best wishes,
Arne

Besides: bad performance: if the inserter uses NLM and the downloader uses OLM, I think that might make it harder for the downloader to get the data (different routing). Worse: if some downloaders use NLM and some use OLM, the packets might take different paths (NLM has shorter paths), which is likely to affect the caching negatively (the data needs to be cached on both paths).

Besides 2: Regin has the theory that the thread scheduler was the bottleneck, because his threads always hit the limit of 500 threads.
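The two-thirds overbooking estimate follows from a quick calculation (using the rough per-request times estimated in this thread, nothing measured):

```python
# Back-of-the-envelope check of the overbooking claim.
search_s = 120   # ~2 min NLM search time
transfer_s = 60  # ~1 min transfer time for the data itself
total_s = search_s + transfer_s  # bandwidth is reserved for the whole request

# Bandwidth reservation is only actually used while transferring:
useful_fraction = transfer_s / total_s
overbooked_fraction = 1 - useful_fraction

print(overbooked_fraction)  # -> 0.6666..., i.e. two thirds of the reservation moves no data
```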