[freenet-dev] Beyond New Load Management: A proposal
On Tuesday, 30 August 2011, at 01:08:16, Arne Babenhauserheide wrote:
> 5) solution: count each SSK as only
> average_SSK_success_rate * data_to_transfer_on_success.

Some more data: the chance of having at least this many successful transfers for 40 SSKs with a mean success rate of 16%:

for i in {0..16}; do echo $i $(./spielfaehig.py 0.16 40 $i); done
0 1.0
1 0.999064224991
2 0.99193451064
3 0.965452714478
4 0.901560126912
5 0.788987472629
6 0.634602118184
7 0.463062835467
8 0.304359825607
9 0.179664603573
10 0.0952149293922
11 0.0453494074947
12 0.0194452402752
13 0.00752109980912
14 0.0026291447461
15 0.000832100029072
16 0.00023879002726

What this means: if an SSK has a mean success rate of 0.16, then using 0.25 as the value makes sure that about 95% of the possible cases don't exhaust the bandwidth. We then use only 64% of the bandwidth on average, though. With 0.2, we'd have 68% of the possible distributions safe and use 80% of the bandwidth on average.

Note: this is just a binomial distribution:

from math import factorial

fac = factorial

def n_over_k(n, k):
    if k > n:
        return 0
    return fac(n) // (fac(k) * fac(n - k))

def binom(p, n, k):
    return n_over_k(n, k) * p**k * (1 - p)**(n - k)

def spielfaehig(p, n, min_spieler):
    return sum([binom(p, n, k) for k in range(min_spieler, n + 1)])

→ USK@6~ZDYdvAgMoUfG6M5Kwi7SQqyS-gTcyFeaNN1Pf3FvY,OSOT4OEeg4xyYnwcGECZUX6~lnmYrZsz05Km7G7bvOQ,AQACAAE/bab/9/Content-D426DC7.html

Best wishes,
Arne
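The tail probabilities and utilisation figures above can be double-checked with a few lines of Python. This is a sketch of mine, not part of spielfaehig.py; the names `tail_prob`, `overrun_25` etc. are my own, and `math.comb` replaces the hand-rolled factorial ratio:

```python
from math import comb

def tail_prob(p, n, k_min):
    """P(at least k_min successes out of n trials with per-trial success rate p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

# Budgeting each SSK at cost c instead of its true success rate p = 0.16
# allows n*c successful transfers before the bandwidth budget is exhausted.
p, n = 0.16, 40

# c = 0.25: budget of 40 * 0.25 = 10 transfers.
overrun_25 = tail_prob(p, n, 10)   # ~0.095, i.e. roughly 90-95% of cases stay safe
# c = 0.20: budget of 8 transfers.
overrun_20 = tail_prob(p, n, 8)    # ~0.304

# Average bandwidth utilisation is the ratio p/c:
util_25 = p / 0.25   # 0.64 -> 64% of the budgeted bandwidth used on average
util_20 = p / 0.20   # 0.80 -> 80%
```

Whether this yields "95% safe" or "90% safe" for c = 0.25 depends on whether exactly hitting the budget counts as exhausting it; the table values themselves reproduce exactly.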
[freenet-dev] Queueing doesn't use any bandwidth was Re: Beyond New Load Management
On Tuesday, 30 August 2011, at 12:32:17, Ian Clarke wrote:
> Regardless, even if queueing doesn't use additional bandwidth or CPU
> resources, it also doesn't use any less of these resources - so it doesn't
> actually help to alleviate any load (unless it results in a timeout, in which
> case it uses more of everything).

Queueing reduces the total bandwidth needed to transfer a given chunk, because it gives requests the leeway they need to choose the best route. This results in shorter routes. It is actually a very simple system, used in any train station: you wait before you board instead of just choosing another train and trying to find a different way. And the fewer contacts we have, the more important it gets to choose the right path.

Also, the increase in latency should be in the range of 20% for CHK requests and for SSKs which succeed. Only unsuccessful requests should have a much higher latency than with OLM, because they don't benefit from the faster transfers (shorter routes).

Best wishes,
Arne
[freenet-dev] Queueing doesn't use any bandwidth was Re: Beyond New Load Management
On Mon, Aug 29, 2011 at 6:42 PM, Matthew Toseland wrote:
> On Monday 29 Aug 2011 18:58:26 Ian Clarke wrote:
> > On Mon, Aug 29, 2011 at 12:37 PM, Matthew Toseland wrote:
> > Right, the same is true of queueing. If nodes are forced to do things to
> > deal with overloading that make the problem worse, then the load balancing
> > algorithm has failed. Its job is to prevent that from happening.
>
> Not true. Queueing does not make anything worse (for bulk requests, where we
> are not latency sensitive). **When a request is waiting for progress on a
> queue, it is not using any bandwidth!**

I thought there was some issue where outstanding requests occupied "slots" or something?

Regardless, even if queueing doesn't use additional bandwidth or CPU resources, it also doesn't use any less of these resources - so it doesn't actually help to alleviate any load (unless it results in a timeout, in which case it uses more of everything). And it does use more of one very important resource: the initial requestor's time. Ultimately the symptom of overloading is that requests take longer, and queueing makes that problem worse.

Queueing should be a last resort; the *right* load balancing algorithm should avoid situations where queueing must occur.

Ian.

--
Ian Clarke
Founder, The Freenet Project
Email: ian at freenetproject.org
[freenet-dev] Beyond New Load Management
On Tue, Aug 30, 2011 at 11:49 AM, Robert Hailey wrote:
> On 2011/08/29 (Aug), at 12:58 PM, Ian Clarke wrote:
> > The problem is that we come up with solutions that are too complicated to
> > analyze or fix when they don't work.
> > The cause is complexity, which just grows and grows as we try to fix
> > problems we don't understand by layering on more complexity.
>
> And what is the cause of that? Is the problem really one of *behavior*? emotions? workflow? organization?

A combination, but at its core I think it's a failure to recognize that most working systems are built on principles that are simple enough to understand that they are either "obviously" correct, or can easily be proven to be correct.

For example, AIMD, which is TCP's load management system, is simple enough that you can describe the basic idea in a sentence. It is also fairly obvious that it will work, or at least it's easy to simulate it and convince yourself that it will work.

In contrast, Freenet's current load management system is a combination of mechanisms that are perhaps individually of comparable complexity to AIMD (in fact, AIMD is part of it, although used in quite a different context to how it is used in TCP), but together they are far more complex, and they interact in ways that are hard to predict. For example, in a conversation with Matthew a few days ago, he realized that the "fair sharing" mechanism, which tries to allocate resources fairly among "downstream" nodes, was probably generating a bunch of rejected messages when it transitions from one mode of operation to another, with all kinds of negative consequences. Who knows how many other unintended interactions there are?

> Matthew has presented some very real problems which he is trying to work
> around (with much frustration, I'm sure).
> I think he needs more leverage.

My criticism is that Matthew's proposals for "fixing" the problem follow exactly the same pattern as all the previous "fixes" that turned out to have either no effect, or a negative effect. It starts out with a bunch of hand-wavy hypotheses about what the problem might be, followed by a bunch of fixes for what might be the problem. There is no evidence that these problems are real; I think a big part of the reason is that the existing system is too complicated to debug.

I know it sounds like I'm beating up on Matthew here, and that isn't my intention; he is following a precedent set by me and others that have long since left the project. Having (I hope) learned some tough lessons, and recognized the folly of our past approach, I'm now hoping it's not too late to rescue the project from an ignominious death. I think we're talking about weeks of work here, not months, and frankly I don't think we've got a choice if the project is to survive. Freenet is no use to anyone if it doesn't work, regardless of how cool our goals are, or how clever our algorithms and heuristics.

> If you read my suggestion below, we can discuss how it would allow:
>
> With an investment of developer time, we could separate the current freenet
> code into three interfaced sections (link-layer, routing-layer,
> user/client-interface-layer).
>
> If we then were to modify the outer layers to accept two routing-layers
> (e.g. client requests round-robin between the two but thereafter stay in
> that network) we could have "two networks in one": a stable-net (for the
> nay-sayers, a disaster/fallback, and as a control for measurement), and a
> development-net where experimentation could take place.
>
> Drawing the interface lines on theory (rather than present code-state)
> would be critical [e.g. load-balancing should be in the middle layer, imo].
> The goal being, reliable communication with near-guaranteed/methodical
> improvement.

While I'm sure there is much room for improvement in the way the code is architected and in the separation of concerns, I don't think refactoring is the answer. We need to refactor the fundamental way that we solve problems like load balancing.

Ian.

--
Ian Clarke
Founder, The Freenet Project
Email: ian at freenetproject.org
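For readers who haven't met it, the AIMD rule mentioned above really does fit in a few lines. This is a generic sketch of additive-increase/multiplicative-decrease, not Freenet's or TCP's actual code; the class name and constants are illustrative (the 1-up/halve-down values are the classic TCP choice):

```python
class AIMD:
    """Additive-Increase / Multiplicative-Decrease congestion window.

    Grow the window by a fixed step on every success; shrink it by a
    multiplicative factor on every sign of congestion (e.g. a dropped
    or rejected request). Senders converge on a fair share of capacity.
    """

    def __init__(self, increase=1.0, decrease=0.5, floor=1.0):
        self.window = floor        # current number of requests allowed in flight
        self.increase = increase   # additive step on success
        self.decrease = decrease   # multiplicative factor on congestion
        self.floor = floor         # never shrink below one request

    def on_success(self):
        self.window += self.increase

    def on_congestion(self):
        self.window = max(self.floor, self.window * self.decrease)

w = AIMD()
for _ in range(8):
    w.on_success()      # window climbs 1 -> 9 by additive steps
w.on_congestion()       # one congestion event halves it: 9 -> 4.5
```

The point being made in the mail is exactly this: the entire control law is two one-line rules, so its behaviour can be simulated and reasoned about directly.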
[freenet-dev] Beyond New Load Management
On 2011/08/30 (Aug), at 3:17 AM, Thomas Bruderer wrote:
> Thank you Ian! Good message! I am 100% behind your whole post! The
> routing must go back to a simple system!

Even tearing up the current systems for a FIFO queue is destructive without organization and a means of comparison. In the science of software, there are generally two kinds of commits:

* Features - divergent, potentially disruptive, tend to contain bugs
* Bugfixes - convergent, generally repairing feature disruption, rarely unmask other bugs

On 2011/08/29 (Aug), at 12:58 PM, Ian Clarke wrote:
> The problem is that we come up with solutions that are too
> complicated to analyze or fix when they don't work.
> The cause is complexity, which just grows and grows as we try to fix
> problems we don't understand by layering on more complexity.

And what is the cause of that? Is the problem really one of behavior? emotions? workflow? organization?

Matthew has presented some very real problems which he is trying to work around (with much frustration, I'm sure). I think he needs more leverage.

If you read my suggestion below, we can discuss how it would allow:

* a full-network-sized network to try out new features & bugfixes
* nearly-identical traffic on both networks
* zero traffic redundancy (network waste)
* easy 1:1 comparison of the performance of two implementations (for measuring network effectiveness)
* full network uptime (at worst, a totally-broken / 0% unstable side would yield a 50% reduction in effectiveness)

It is simply a matter of *organization* and *multi-versioning* (which, IMO, are both solved problems). This is also tied into the subject of a 'mandatory' node update deadline, as the utility of a split network would diminish if its stable side succumbs to the same 'chase' issues.
--
Robert Hailey

On 2011/08/26 (Aug), at 10:18 AM, Robert Hailey wrote:
>
> On 2011/08/25 (Aug), at 2:15 PM, Matthew Toseland wrote:
>
>> And we never, ever, ever, have enough data to evaluate a single
>> build, even on the simplest metrics (see the push-pull tests). I
>> could write a plugin to get more data, but digger3 promises to do
>> it eventually, and anyway I don't have time given the remaining
>> funding and the unlikeliness of getting more. And it's always been
>> this way!
>>
>> Our whole business model forces me to just do things and not
>> evaluate them!
>
> I think we had an idea for empirical stepwise advancement earlier.
>
> With an investment of developer time, we could separate the current
> freenet code into three interfaced sections (link-layer,
> routing-layer, user/client-interface-layer).
>
> If we then were to modify the outer layers to accept two
> routing-layers (e.g. client requests round-robin between the two but
> thereafter stay in that network) we could have "two networks in one":
> a stable-net (for the nay-sayers, a disaster/fallback, and as a
> control for measurement), and a development-net where
> experimentation could take place.
>
> Drawing the interface lines on theory (rather than present
> code-state) would be critical [e.g. load-balancing should be in the
> middle layer, imo]. The goal being, reliable communication with
> near-guaranteed/methodical improvement.
>
> --
> Robert Hailey
[freenet-dev] Beyond New Load Management
> The entire approach of coming up with hypotheses about what is wrong,
> building a solution based on these hypotheses (without actually
> confirming that the hypotheses are accurate) and deploying it is deja
> vu, we've been doing it for a decade, and we still haven't got load
> management right. We're just layering more complexity onto a system
> that we already don't understand, based on guesses as to what the
> problems were with the previous iteration that we can't test because
> the system is too complicated with too many interactions for anyone to
> get their heads around it.

Thank you Ian! Good message! I am 100% behind your whole post! The routing must go back to a simple system!
[freenet-dev] Beyond New Load Management: A proposal
From: "Matthew Toseland"
> But the other question is, can queueing ever be helpful? It can if it allows
> us to route more accurately (which NLM clearly does), and/or to run enough
> requests in parallel that the longer time taken for the request to reach its
> destination is offset.

Is this condition met? Experience with the deployed NLM showed that even in the fully congested case it had success rates of 60% for HTL 18, 17 and 16, compared to less than 40% for OLM. This means that requests are sent over fewer hops on average, because they find the content fewer hops away from the requester.

A download of 1 MiB which is sent over 2 hops needs 2 MiB in total network bandwidth. If it is sent over only 1.5 hops on average, then it needs only 1.5 MiB of total network bandwidth. So essentially NLM can distribute 30% more content with the same network resources¹. And these numbers are actual observations.

The only reason why this did not result in increased performance is that the nodes used less than 50% of their allocated bandwidth² - which is a problem with the bandwidth scheduler and not with queueing.

Best wishes,
Arne

¹: The relevant network resource is upload bandwidth.
²: Source: observations from me and two other freenet users.

PS: How exactly the bandwidth limiter is fixed is an implementation detail. I think you are actually the only person who can judge how to do this most efficiently.
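The bandwidth arithmetic in this mail is easy to make explicit. A minimal sketch (mine, not Arne's; the 2-hop vs. 1.5-hop figures are the ones from the mail itself):

```python
def total_traffic_mib(payload_mib, avg_hops):
    # Each hop forwards the full payload once, so network-wide traffic
    # is the payload size times the average route length.
    return payload_mib * avg_hops

olm = total_traffic_mib(1.0, 2.0)   # 2.0 MiB of traffic for a 1 MiB download
nlm = total_traffic_mib(1.0, 1.5)   # 1.5 MiB for the same download
extra_capacity = olm / nlm - 1      # ~0.33: roughly 30% more content
                                    # deliverable with the same upload bandwidth
```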
[freenet-dev] Beyond New Load Management: A proposal
> > …oblem with queueing;
> > the alternative is to allow a larger window between when we start
> > complaining and when stuff breaks, i.e. use less % of the total capacity
> > in NLM: q(SSK) ~ q(CHK) ~ τ(CHK), τ(CHK) lower due to better
> > routes → which might be faster in practice
> > τ(CHK) depends on the length of the route. with 25% better
> > success rates per hop, it should be much lower
> > → need NLM stats… do you have some handy?
> > let's estimate 60%/50%/50%/50% for HTL 18/17/16/15
> > and I currently have 45%/50%/25%/25% with OLM
> > starting with 1000 requests, in NLM 600 have 1 hop, 200 have 2
> > hops, 100 have 3 and 50 have 4; 50 have more → irrelevant.
> > -*- toad_ not following
> > in OLM 450 have 1 hop, 275 have 2 hops, 69 have 3 and 51 have 4; 150
> > have more. I'm trying to estimate the hops a transfer has to
> > take. we can't ignore the 150 with more than 4 hops in OLM,
> > I'll just go down to 50, too
> > what are you trying to compute?
> > ian is convinced that queueing always makes the underlying
> > problem worse
> > i'm inclined to agree with him unless you come up with a
> > persuasive theoretical argument
> > 120 have 5, 96 have 6, 77 have 7, 61 have 8, 50 have more
> > so 95% of the transfers in OLM take on average…
> > gah… need to divide the numbers, too
> > (I need to generate data to make an argument - that's what I'm
> > doing right now)
> > average hops for OLM: 450*1 + 275*2 + 69*3 + 51*4 + [now with
> > correction] 150*0.22*5 + 120*0.2*6 + 96*0.2*7 + 77*0.2*8 + 61*0.2*9
> > ≈ 2087.4
> > for NLM, 95% of 1000 transfers need 600*1 + 200*2 + 100*3 + 50*4
> > = 1500 hops together
> > that's 2.09 hops per transfer for OLM and 1.5 hops for NLM →
> > τ_nlm / τ_olm ~ 0.71
> > ArneBab: okay, that's plausible
> > ArneBab: however, it should be possible with smart load limiting
> > on the originator to achieve NLM-level success rates
> > but not the resilience
> > it still keeps freenet open to a DoS, NLM should help there.
> > now back to the queueing: OLM had: ⟨q⟩(SSK) ~ 16s, ⟨q⟩(CHK) ~
> > 18s, τ(CHK) ~ 45s (my stats)
> > possibly - fair sharing limits our vulnerability to a DoS,
> > possibly enough as long as we don't have to worry about incentives
> > issues
> > that's about: q = ⅓·τ (OLM)
> > NLM: q ~ τ (NLM)
> > time: 2·q + τ
> > OLM: time ~ 5/3 τ_olm
> > NLM: time = 3 · 0.72 τ_olm ≈ 2.15 τ_olm
> > toad_: Alright, it's alive. https://github.com/freenet/fred-staging/pull/55
> > → time_nlm / time_olm ~ 2.15 / (5/3) ~ 1.3
> > so the time to transfer should be a bit longer
> > (not yet finished: this is the current state)
> > now, if we decrease the timeout time, the chance that a given
> > timeout happens in the first 4 hops should be about 4/20 = 0.2
> > …cut that…
> > if we decrease the timeout time below the transfer time per
> > hop, there should be more misrouting → τ goes up, q might go down or up
> > → cut that. transfer time per hop in OLM ~ 45s / hops_olm =
> > 45s/2.09 = 21.5s
> > actually, the time in NLM is so dependent on transfer time
> > that the most efficient strategy would likely be to
> > decrease the block size… or to get a faster network
> > toad_: got it, damnit: NLM is so much slower than OLM because
> > it used less bandwidth!
> > the time is a function of the raw bandwidth (not so with OLM),
> > and NLM used only half my bandwidth after it had been deployed for 2
> > days (at the start much more)
> > when we double the bandwidth (1.8 years?), NLM should be
> > faster than OLM
> > operhiem1: cool!
> > toad_: actually I think the slot number calculation is flawed
> > → less bandwidth used than possible
> > that's why it did not break down, but slowed down to 1/5 of OLM.
> > From the math here I'd have guessed 1/2.6
> > but adding SSKs, with many more hops and time almost pure queue
> > time, it fits
> > q_nlm ~ 3·⟨q⟩_olm in the full bandwidth case
> > but with half bandwidth we actually are at 6·q_olm
> > → more slots should actually make it much better
> > toad_: summary: τ ~ bandwidth. q_olm ~ 16s, q_nlm ~ τ! → using
> > only 50% of the bandwidth (too few slots) massively slows down NLM.
> > the transfer times should actually be dominant,
> > though they are lower than the queue time.
> > and freenet should get faster with a faster network or lower
> > chunk sizes.
> > toad_: so first step: make sure all bandwidth gets used -
> > maybe by allocating more slots till about 2× the current number are
> > transferring
> > -*- ArneBab is happy
> > cool, lots of stuff to read tomorrow morning. :)
> > NLM should with the current network be slower than OLM by 23%.
> > But in 18 months it should actually be faster by ~8%, given Moore's Law
> > holds for upload bandwidth.
> > :)
> > with faster I mean time to complete a request.
> > reaction time ≠ latency
> > digger3: maybe you can doublecheck the reasoning

--
A man in the streets faces a knife.
Two policemen are there at once. They raise a sign:

"Illegal Scene! No one may watch this!"

The man gets robbed and stabbed and bleeds to death.
The police had to hold the sign.

…Welcome to Europe, citizen. Censorship is beautiful.

( http://draketo.de/stichwort/censorship )
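The back-of-the-envelope numbers in this log can be reproduced mechanically. This is my sketch of the calculation as I read it; the hop-count distributions, the 0.22/0.2 tail weights, and the q = τ/3 / q ~ τ / time = 2q + τ model are all taken from the log itself:

```python
# Hop-count distributions for 1000 requests (from the log).
olm_hops = [(450, 1), (275, 2), (69, 3), (51, 4),
            # tail beyond 4 hops, down-weighted as in the log:
            (150 * 0.22, 5), (120 * 0.2, 6), (96 * 0.2, 7),
            (77 * 0.2, 8), (61 * 0.2, 9)]
nlm_hops = [(600, 1), (200, 2), (100, 3), (50, 4)]

avg_olm = sum(n * h for n, h in olm_hops) / 1000   # ~2.09 hops per transfer
avg_nlm = sum(n * h for n, h in nlm_hops) / 1000   # 1.5 hops per transfer
tau_ratio = avg_nlm / avg_olm                      # ~0.72 = tau_nlm / tau_olm

# Time model from the log: time = 2*q + tau, with queue time q = tau/3
# for OLM and q ~ tau for NLM (tau = transfer time).
time_olm = 2 * (1 / 3) + 1            # = 5/3, in units of tau_olm
time_nlm = (2 * 1 + 1) * tau_ratio    # = 3 * ~0.72, in the same units
slowdown = time_nlm / time_olm        # ~1.3: NLM requests take ~30% longer
```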
[freenet-dev] Queueing doesn't use any bandwidth was Re: Beyond New Load Management
On Monday 29 Aug 2011 18:58:26 Ian Clarke wrote:
> On Mon, Aug 29, 2011 at 12:37 PM, Matthew Toseland <toad at amphibian.dyndns.org> wrote:
> > Misrouting is unacceptable, in general. Extremely overloaded or extremely
> > low capacity nodes may be routed around. We might even allow some bounded
> > amount of misrouting in the more general case (e.g. go to either of the top
> > two peers for the key). But in general, transforming load into misrouting
> > (or into reduced HTL, or any other bogus escape valve) is a bad idea. We
> > need to reduce the incoming load.
>
> Right, the same is true of queueing. If nodes are forced to do things to
> deal with overloading that make the problem worse then the load balancing
> algorithm has failed. Its job is to prevent that from happening.

Not true. Queueing does not make anything worse (for bulk requests, where we are not latency sensitive). **When a request is waiting for progress on a queue, it is not using any bandwidth!**
Re: [freenet-dev] Beyond New Load Management
On 2011/08/30 (Aug), at 3:17 AM, Thomas Bruderer wrote: Thank you Ian! Good message! I am 100% behind your whole post! The routing must go back to a simple system! Even tearing up the current systems for a fifo queue is destructive without organization and a means of comparison. In the science of software, there are generally two kinds of commits: * Features - divergent, potentially disruptive, tend to contain bugs * Bugfixes - convergent, generally repairing feature disruption, rarely unmask other bugs On 2011/08/29 (Aug), at 12:58 PM, Ian Clarke wrote: The problem is that we come up with solutions that are too complicated to analyze or fix when they don't work The cause is complexity, which just grows and grows as we try to fix problems we don't understand by layering on more complexity. And what is the cause of that? Is the problem really one of behavior? emotions? workflow? organization? Matthew has presented some very real problems which he is trying to work around (with much frustration, I'm sure). I think he needs more leverage If you read my suggestion below, we can discuss how it would allow: * a full-network-sized network to try out new features bugfixes * nearly-identical traffic on both networks * zero traffic redundancy (network waste) * easy 1:1 comparison of various performances of two implementations (for measuring network effectiveness) * full network uptime (at worst a totally-broken / 0% unstable-side would yield a 50% reduction in effectiveness) It is simply a matter of *organization* and *multi-versioning* (which IMO, are both solved problems). This is also tied into the subject of 'mandatory' node update deadline, as the utility of a split network would diminish if it's stable-side succumbs to the same 'chase' issues. 
--
Robert Hailey

On 2011/08/26 (Aug), at 10:18 AM, Robert Hailey wrote:
> On 2011/08/25 (Aug), at 2:15 PM, Matthew Toseland wrote:
>> And we never, ever, ever, have enough data to evaluate a single build, even on the simplest metrics (see the push-pull tests). I could write a plugin to get more data, but digger3 promises to do it eventually, and anyway I don't have time given the remaining funding and the unlikeliness of getting more. And it's always been this way! Our whole business model forces me to just do things and not evaluate them!
>
> I think we had an idea for empirical stepwise advancement earlier. With an investment of developer time, we could separate the current freenet code into three interfaced sections (link-layer, routing-layer, user/client-interface-layer).
>
> If we then were to modify the outer layers to accept two routing-layers (e.g. client requests round-robin between the two but thereafter stay in that network) we could have two networks in one: a stable-net (for the nay-sayers, a disaster/fallback, and as a control for measurement), and a development-net where experimentation could take place.
>
> Drawing the interface lines on theory (rather than present code-state) would be critical [e.g. load-balancing should be in the middle layer, imo]. The goal being reliable communication with near-guaranteed/methodical improvement.
>
> --
> Robert Hailey
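The round-robin-then-sticky dispatch described above can be sketched in a few lines. This is a hypothetical illustration, not Freenet code: the class and method names (`RoutingLayer`, `SplitDispatcher`, `submit`) are invented for the example, and the real interface lines would be drawn on theory as Robert says.

```python
import itertools

class RoutingLayer:
    """Interface both routing layers would implement (hypothetical name)."""
    def route(self, request):
        raise NotImplementedError

class StableNet(RoutingLayer):
    def route(self, request):
        return ("stable", request)

class DevNet(RoutingLayer):
    def route(self, request):
        return ("dev", request)

class SplitDispatcher:
    """New client requests round-robin between the two networks; once a
    request has entered one network, it stays in that network."""
    def __init__(self, stable, dev):
        self._cycle = itertools.cycle([stable, dev])
        self._assignment = {}  # request id -> chosen routing layer

    def submit(self, request_id, request):
        # Assign a network only on first sight of this request id,
        # so the cycle is not advanced for repeat submissions.
        if request_id not in self._assignment:
            self._assignment[request_id] = next(self._cycle)
        return self._assignment[request_id].route(request)
```

With both layers carrying near-identical traffic, the stable net serves as the control group and the development net as the experiment, allowing a direct 1:1 comparison.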
Re: [freenet-dev] Beyond New Load Management
On Tue, Aug 30, 2011 at 11:49 AM, Robert Hailey <rob...@freenetproject.org> wrote:
> On 2011/08/29 (Aug), at 12:58 PM, Ian Clarke wrote:
>> The problem is that we come up with solutions that are too complicated to analyze or fix when they don't work. The cause is complexity, which just grows and grows as we try to fix problems we don't understand by layering on more complexity.
>
> And what is the cause of that? Is the problem really one of *behavior*? emotions? workflow? organization?

A combination, but at its core I think it's a failure to recognize that most working systems are built on principles that are simple enough that they are either obviously correct, or can easily be proven to be correct.

For example, AIMD, which is TCP's load management system, is simple enough that you can describe the basic idea in a sentence. It is also fairly obvious that it will work, or at least it's easy to simulate it and convince yourself that it will work.

In contrast, Freenet's current load management system is a combination of mechanisms that are perhaps individually of comparable complexity to AIMD (in fact, AIMD is part of it, although used in quite a different context to how it is used in TCP), but together they are far more complex, and interact in ways that are hard to predict. For example, in a conversation with Matthew a few days ago, he realized that the fair sharing mechanism, which tries to allocate resources fairly among downstream nodes, was probably generating a bunch of rejected messages when it transitions from one mode of operation to another, with all kinds of negative consequences. Who knows how many other unintended interactions there are?

> Matthew has presented some very real problems which he is trying to work around (with much frustration, I'm sure).
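To illustrate the point about AIMD's simplicity: the whole rule fits in a dozen lines. This is a minimal sketch of the generic additive-increase/multiplicative-decrease idea, not TCP's actual implementation (which adds slow start, fast recovery, and so on), and the parameter names are my own.

```python
class AIMD:
    """Additive-increase / multiplicative-decrease in its simplest form:
    grow the window by a constant on each success, cut it by a factor
    on each sign of congestion."""
    def __init__(self, increase=1.0, decrease=0.5, floor=1.0):
        self.window = floor      # current allowed in-flight work
        self.increase = increase # additive step on success
        self.decrease = decrease # multiplicative factor on congestion
        self.floor = floor       # never shrink below this

    def on_ack(self):
        # Success: probe for more capacity, linearly.
        self.window += self.increase

    def on_congestion(self):
        # Congestion signal: back off sharply.
        self.window = max(self.floor, self.window * self.decrease)
```

That the description and the code are nearly the same length is exactly the property Ian is pointing at: the mechanism is simple enough to be obviously analyzable.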
> I think he needs more leverage.

My criticism is that Matthew's proposals for fixing the problem follow exactly the same pattern as all the previous fixes that turned out to have either no effect, or a negative effect. It starts out with a bunch of hand-wavey hypotheses about what the problem might be, which is followed by a bunch of fixes for what might be the problem. There is no evidence that these problems are real; I think a big part of the reason is that the existing system is too complicated to debug.

I know it sounds like I'm beating up on Matthew here, and that isn't my intention; he is following a precedent set by me and others that have long since left the project. Having (I hope) learned some tough lessons, and recognized the folly of our past approach, I'm now hoping it's not too late to rescue the project from an ignominious death. I think we're talking about weeks of work here, not months, and frankly I don't think we've got a choice if the project is to survive. Freenet is no use to anyone if it doesn't work, regardless of how cool our goals are, or how clever our algorithms and heuristics.

> If you read my suggestion below, we can discuss how it would allow:
>
> With an investment of developer time, we could separate the current freenet code into three interfaced sections (link-layer, routing-layer, user/client-interface-layer). If we then were to modify the outer layers to accept two routing-layers (e.g. client requests round-robin between the two but thereafter stay in that network) we could have two networks in one: a stable-net (for the nay-sayers, a disaster/fallback, and as a control for measurement), and a development-net where experimentation could take place. Drawing the interface lines on theory (rather than present code-state) would be critical [e.g. load-balancing should be in the middle layer, imo].
> The goal being reliable communication with near-guaranteed/methodical improvement.

While I'm sure there is much room for improvement in the way the code is architected, and in the separation of concerns, I don't think refactoring is the answer. We need to rethink the fundamental way that we solve problems like load balancing.

Ian.

--
Ian Clarke
Founder, The Freenet Project
Email: i...@freenetproject.org
Re: [freenet-dev] Queueing doesn't use any bandwidth was Re: Beyond New Load Management
On Mon, Aug 29, 2011 at 6:42 PM, Matthew Toseland <t...@amphibian.dyndns.org> wrote:
> On Monday 29 Aug 2011 18:58:26 Ian Clarke wrote:
>> On Mon, Aug 29, 2011 at 12:37 PM, Matthew Toseland wrote:
>> Right, the same is true of queueing. If nodes are forced to do things to deal with overloading that make the problem worse, then the load balancing algorithm has failed. Its job is to prevent that from happening.
>
> Not true. Queueing does not make anything worse (for bulk requests where we are not latency sensitive). **When a request is waiting for progress on a queue, it is not using any bandwidth!**

I thought there was some issue where outstanding requests occupied slots or something?

Regardless, even if queueing doesn't use additional bandwidth or CPU resources, it also doesn't use any less of these resources - so it doesn't actually help to alleviate any load (unless it results in a timeout, in which case it uses more of everything). And it does use more of one very important resource, which is the initial requestor's time. I mean, ultimately the symptom of overloading is that requests take longer, and queueing makes that problem worse. Queueing should be a last resort; the *right* load balancing algorithm should avoid situations where queueing must occur.

Ian.

--
Ian Clarke
Founder, The Freenet Project
Email: i...@freenetproject.org
Re: [freenet-dev] Queueing doesn't use any bandwidth was Re: Beyond New Load Management
On Tuesday, 30 August 2011, 12:32:17, Ian Clarke wrote:
> Regardless, even if queueing doesn't use additional bandwidth or CPU resources, it also doesn't use any less of these resources - so it doesn't actually help to alleviate any load (unless it results in a timeout in which case it uses more of everything).

Queueing reduces the total bandwidth needed to transfer a given chunk, because it gives the requests the leeway they need to be able to choose the best route. This results in shorter routes.

Actually it is a very simple system which is used in any train station: you wait before you get in, instead of just choosing another train and trying to find a different way. And the fewer contacts we have, the more important it gets to choose the right path.

Also, the increase in latency should be in the range of 20% for CHK requests and for SSKs which succeed. Only unsuccessful requests should have a much higher latency than with OLM, because they don't benefit from the faster transfers (shorter routes).

Best wishes,
Arne
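Arne's bandwidth argument can be made concrete with a toy model. This is an illustrative sketch only: the hop counts and busy fraction are assumed numbers, not measurements from Freenet. Each hop a chunk traverses consumes one chunk's worth of bandwidth somewhere in the network; a node that queues waits for the short route, while a node that never queues misroutes to a longer fallback route whenever the preferred next hop is busy.

```python
def total_bandwidth(n_requests, busy_fraction,
                    best_hops=2, fallback_hops=4, queue=True):
    """Toy model: bandwidth cost in hop-transfers for n_requests chunks.

    busy_fraction: share of requests that find the preferred next hop busy.
    best_hops / fallback_hops: assumed route lengths (illustrative values).
    With queueing, every request eventually takes the short route (paying
    latency, not bandwidth); without it, busy moments force the long route.
    """
    misrouted = 0 if queue else int(n_requests * busy_fraction)
    well_routed = n_requests - misrouted
    return well_routed * best_hops + misrouted * fallback_hops
```

Under these assumed numbers, 100 requests with the best next hop busy 30% of the time cost 200 hop-transfers with queueing versus 260 without, which is the sense in which waiting at the station beats taking a detour. The model deliberately ignores the latency cost of waiting, which is Ian's side of the trade-off.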
Re: [freenet-dev] Beyond New Load Management: A proposal
On Tuesday, 30 August 2011, 01:08:16, Arne Babenhauserheide wrote:
> 5) solution: count each SSK as only average_SSK_success_rate * data_to_transfer_on_success.

Some more data: chances of having at least this many successful transfers for 40 SSKs with a mean success rate of 16%:

for i in {0..16}; do echo $i $(./spielfaehig.py 0.16 40 $i); done
0 1.0
1 0.999064224991
2 0.99193451064
3 0.965452714478
4 0.901560126912
5 0.788987472629
6 0.634602118184
7 0.463062835467
8 0.304359825607
9 0.179664603573
10 0.0952149293922
11 0.0453494074947
12 0.0194452402752
13 0.00752109980912
14 0.0026291447461
15 0.000832100029072
16 0.00023879002726

What this means: if an SSK has a mean success rate of 0.16, then using 0.25 as its value makes sure that 95% of the possible cases don't exhaust the bandwidth. We then use only 64% of the bandwidth on average, though. With 0.2, we'd get 68% of the possible distributions safe and use 80% of the bandwidth on average.

Note: this is just a binomial spread:

from math import factorial
fac = factorial

def nük(n, k):
    if k > n:
        return 0
    return fac(n) / (fac(k) * fac(n - k))

def binom(p, n, k):
    return nük(n, k) * p**k * (1 - p)**(n - k)

def spielfähig(p, n, min_spieler):
    return sum([binom(p, n, k) for k in range(min_spieler, n + 1)])

→ USK@6~ZDYdvAgMoUfG6M5Kwi7SQqyS-gTcyFeaNN1Pf3FvY,OSOT4OEeg4xyYnwcGECZUX6~lnmYrZsz05Km7G7bvOQ,AQACAAE/bab/9/Content-D426DC7.html

Best wishes,
Arne
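The 95% figure above follows directly from the table: a budget of 0.25 per SSK over 40 SSKs covers 10 transfers, and the bandwidth is only exhausted when 11 or more succeed. A compact check, reimplementing Arne's spielfähig() with math.comb (Python 3.8+) rather than the factorial helper:

```python
from math import comb

def at_least(p, n, k_min):
    """P(X >= k_min) for X ~ Binomial(n, p); same quantity as
    Arne's spielfähig(p, n, k_min)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(k_min, n + 1))

# 40 SSKs at 16% mean success rate, budget 0.25 * 40 = 10 transfers:
# bandwidth is exhausted only if at least 11 SSKs succeed.
overflow = at_least(0.16, 40, 11)
print(round(1 - overflow, 2))  # -> 0.95, the "95% of cases are safe" claim
```

The average-utilization figure checks out the same way: the expected number of successes is 0.16 * 40 = 6.4, which is 64% of the 10-transfer budget.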