On Wednesday 08 August 2007 16:05, Matthew Toseland wrote:
> Continued at end to minimise confusion!
>
> On Wednesday 08 August 2007 15:59, Matthew Toseland wrote:
> > Unfortunately this thread is rather rambling; it includes lots of
> > discussion of token passing as well as the original premise.

----- Anonymous at o9_0DTuZniSf_+oDmRsonByWxsI ----- 2007.04.25 - 20:18:36GMT -----

I made some measurements of how a Freenet node behaves if the bandwidth limit is set low: 10 KBps down to 6 KBps (specifically the input bandwidth limit; the output bandwidth limit was set to at least 15 KBps, but as expected the actually used output bandwidth is comparable with - just slightly above - the actually used input bandwidth). The node itself was running Frost but no uploads/downloads, so the absolute majority of network traffic was forwarded CHK/SSK requests/inserts.

The results are interesting enough: CHK traffic becomes as low as 5% (in packets) of CHK+SSK, while at least 92% of SSK requests were not satisfied, for assorted failure reasons (plus quite a few more certainly resulted in a NotFound response because the key is missing from the whole network, but I don't have that number). This makes a low-traffic node work highly inefficiently and disproportionately slowly; it also slows down its peers with all the extra reject traffic. Worse, input bandwidth sometimes goes over the set limit, suggesting that on a hardware 33600/56000 bps modem, and even on ISDN, things will just get worse due to increased delays.

Another thing to note: a low bandwidth node (LBN) almost exclusively rejects requests with the "input bandwidth liability" reason, and extremely rarely for other reasons.

Speculating a bit, the same picture will likely be observed for the peers of a fast node (1 Mbps or more) when many of those peers have a typical home connection of 256 Kbps or less.

Not sure if simulations ever showed anything like this, but contributing mostly SSK service to the network (when the absolute majority of SSK requests fail!) is rather useless: an optimally working network is supposed to transfer at least one CHK block for each SSK key, and typically much, much more (a single 10MB file consists of 481 CHK blocks!), and even if you found an SSK but not the CHK the SSK points to, you still failed to find the information you requested.

OK, to make the long story short[er]: at the end of this message you will find a small patch that noticeably improves the LBN situation. The idea is to reserve some bandwidth for CHK transfers (and SSK inserts, as those are too rare to penalize, and more valuable than requests). The line directly before the inserted one implicitly penalizes CHK transfers (the much smaller SSK requests tend to re-reserve bandwidth the moment it is released after a CHK transfer finishes, while the much larger CHK requests do not get such a good chance), so bandwidth should be reserved for at least 2 CHKs (and tests show that's enough to make a difference).

Another thing I tried was increasing the 90-second period to 120. That had some positive effect (no numbers here; just "noticeable but small") on making traffic smoother and staying closer to the set limit without jumping up and down too much. Where did the 90-second number come from anyway, and how dangerous could 120 be?

Some pros observed and/or thought out during tests of the patch:
- I observe an increase in output payload of approx. 15% (of total traffic), making the LBN more useful to its peers.
- The change is negligibly small for faster nodes, so it should not break anything globally.
- Network-wide SSK flood traffic will be toned down a little (at temporarily overloaded nodes only), additionally simplifying life for LBNs: after all, requesting the same SSK every 15 seconds for 35 hours, 8100 times in total (actual numbers from one of the tests before this patch was applied; there are over 60 other SSKs that were requested more than 1000 times during the same period), is just way too much; SSKs are not inserted into the network THAT fast. [Would it be worth remembering recently seen SSK requests, and not forwarding a request if the same one was already forwarded within the last 10 minutes and resulted in DNF/RNF? A table of recently requested SSKs closest to the node's location should not be too big?]

And cons:
- In exceptional conditions (specifically, with fewer than 2 incoming CHK requests per 90 seconds, regardless of node bandwidth; I actually observe 2-7 CHK requests per second, i.e. 180-630 per 90 seconds), up to 800 Bps might end up unused. For a high-bandwidth node that's way too small to notice; for an LBN it is still acceptable (10% of 56 Kbps) and will decrease round-trip delays a bit, which is always a good thing on such slow links.

Other notes:
- The distribution of location closeness vs. number of SSK requests is very nice: only SSK requests with a location very close to the node's location get repeated frequently; the farther away the SSK location, the fewer requests the node sees, with SSKs seen only once or twice per 1-2 day period distributed evenly across the location space. This suggests that routing is working fine.
- As far as I understand, if the input bandwidth limit/liability is exceeded (but a packet has already been received anyway), the CHK/SSK request gets instantly rejected (throwing away the received bytes exactly when input bandwidth has no spare volume!); only otherwise does the node check whether the requested key exists in the store. Huh? This feels like a serious bug hurting overall network performance: better to query the store and hopefully send back the result (or still reject if the key is not found locally) than to wait for a retry request to waste more input bandwidth. At least for SSKs, a reject and a reply are comparable in output bandwidth usage, so it is worth a little delay in the response. Or am I missing something?
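Before the patch itself, to put rough numbers on that reserve: the "up to 800 Bps unused" worst case follows directly from the block size (assuming the standard 32 KiB CHK data block; the snippet below is only an illustration, not part of the patch):

===
// Worst-case cost of reserving bandwidth for 2 CHK transfers per
// 90-second liability window, assuming 32 KiB per CHK data block.
public class ReserveCost {
    public static void main(String[] args) {
        int chkBytes = 32 * 1024;   // one CHK data block
        int windowSeconds = 90;     // liability window
        int wasted = 2 * chkBytes / windowSeconds;
        // Prints 728 bytes/sec - the "up to 800 Bps" estimate above,
        // i.e. ~10% of a 56 kbps link (7000 bytes/sec) and negligible
        // for anything faster.
        System.out.println(wasted + " bytes/sec");
    }
}
===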
===
diff --git a/freenet/src/freenet/node/NodeStats.java b/freenet/src/freenet/node/NodeStats.java
index 3b091b4..fb9f8b9 100644
--- a/freenet/src/freenet/node/NodeStats.java
+++ b/freenet/src/freenet/node/NodeStats.java
@@ -414,6 +414,7 @@ public class NodeStats implements Persistable {
 				successfulChkInsertBytesReceivedAverage.currentValue() * node.getNumCHKInserts() +
 				successfulSskInsertBytesReceivedAverage.currentValue() * node.getNumSSKInserts();
 			bandwidthLiabilityInput += getSuccessfulBytes(isSSK, isInsert, true).currentValue();
+			if (isSSK && !isInsert) bandwidthLiabilityInput += successfulChkFetchBytesReceivedAverage.currentValue() + successfulChkInsertBytesReceivedAverage.currentValue(); // slightly penalize SSK requests by reserving bandwidth for 2 additional CHK transfers (or SSK inserts if any)
 			double bandwidthAvailableInput =
 				node.getInputBandwidthLimit() * 90; // 90 seconds at full power
 			if(bandwidthLiabilityInput > bandwidthAvailableInput) {
===

----- toad at zceUWxlSaHLmvEMnbr4RHnVfehA ----- 2007.04.26 - 16:56:59GMT -----

Most SSK requests fail. They DNF. The reason is that most SSK requests are polling for data that has not yet been inserted.

Bandwidth liability is usually the main reason for rejection. If we reach most of the other reasons, there is a problem - usually a cyclical problem. Its main purpose is to ensure that we don't accept so many requests that some of them needlessly time out even though they succeeded. The timeout is 120 seconds, so we need the actual transfer to take less than this; on a request, 30 seconds seems a reasonable upper bound for the search time. We don't throw out many bytes when we reject a request/insert, because the bulk of it hasn't been sent yet - except with SSKs, where typically a little under half of the total bytes will have been moved. Ideally we wouldn't send requests until we have a good idea that they will be accepted, but token passing load balancing is a long way off, not likely to happen for 0.7.0.

We cannot control input bandwidth usage precisely.

Any more info on SSK flooding? Is it simply Frost?

We can add a failure table - we had one before - however a failure table which results in actually blocking keys can be extremely dangerous; what I had envisaged was "per node failure tables", i.e. reroute requests which have recently failed to a different node, since we know the key isn't where it's supposed to be.

On what do you base the assertion about key closeness? It would be nice to have a histogram or circle on the stats pages showing recent keys on the keyspace - can you write a patch?

As far as your patch goes, surely rejecting more SSK requests would be counterproductive, as it wastes bandwidth? Shouldn't a slow node accept those requests it's likely to be able to handle?

I can see an argument that we shouldn't prefer SSKs, and that on slow nodes we do prefer SSKs... I'm not sure the above is the right way to deal with it, though. The effect of the patch would be to never accept any SSKs unless we have plenty of spare bandwidth, correct?
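A rough sketch of the "per node failure table" idea mentioned above - remember which peers recently failed on a key and route the next request for that key elsewhere, rather than blocking the key outright. All class and method names here are invented for illustration; this is not actual Freenet code:

===
import java.util.HashMap;
import java.util.Map;

// Per-node failure table sketch: a key is never blocked globally;
// we only avoid re-asking a peer that recently DNFed on that key.
public class PerNodeFailureTable {
    private static final long EXPIRY_MS = 10 * 60 * 1000; // 10 minutes
    // routing key -> (peer identity -> time of last failure)
    private final Map<String, Map<String, Long>> failures = new HashMap<>();

    public synchronized void reportFailure(String key, String peer) {
        failures.computeIfAbsent(key, k -> new HashMap<>())
                .put(peer, System.currentTimeMillis());
    }

    // True if this peer recently failed on this key, so routing
    // should prefer the next-closest peer instead.
    public synchronized boolean shouldAvoid(String key, String peer) {
        Map<String, Long> byPeer = failures.get(key);
        if (byPeer == null) return false;
        Long t = byPeer.get(peer);
        if (t == null) return false;
        if (System.currentTimeMillis() - t > EXPIRY_MS) {
            byPeer.remove(peer); // entry expired; forget it
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        PerNodeFailureTable table = new PerNodeFailureTable();
        table.reportFailure("SSK@example-routing-key", "peerA");
        System.out.println(table.shouldAvoid("SSK@example-routing-key", "peerA")); // true
        System.out.println(table.shouldAvoid("SSK@example-routing-key", "peerB")); // false
    }
}
===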
----- Anonymous at o9_0DTuZniSf_+oDmRsonByWxsI ----- 2007.04.26 - 18:41:32GMT -----

> Ideally we wouldn't send requests until we have a good idea that they
> will be accepted, but token passing load balancing is a long way off,
> not likely to happen for 0.7.0.

Well, even the current algorithm implementation has certain room for improvement. Here are the typical numbers I observe:

===
unclaimedFIFO Message Counts
 * FNPRejectOverload: 89 (45.2%)
 * FNPInsertTransfersCompleted: 59 (29.9%)
 * FNPDataNotFound: 15 (7.6%)
 * packetTransmit: 12 (6.1%)
 * FNPRejectLoop: 7 (3.6%)
 * FNPAccepted: 6 (3.0%)
 * FNPSwapRejected: 4 (2.0%)
 * FNPDataInsertRejected: 4 (2.0%)
 * FNPRouteNotFound: 1 (0.5%)
 * Unclaimed Messages Considered: 197
===

FNPRejectOverload always stays at the top, sometimes with hundreds of messages (for the last hour before unclaimed messages expire, that's a lot), which indicates there is some bug (or bugs) in obeying the bandwidth limits.

> Any more info on SSK flooding? Is it simply Frost?

Not local Frost, for sure; it generates just a few simultaneous SSK requests (by default max 8: 6 for boards plus 2 for filesharing, AFAIR; practically 2-4 simultaneous requests most of the time). The other 100 SSK requests (without the proposed patch) are forwarded ones.

> We can add a failure table, we had one before, however a failure table
> which results in actually blocking keys can be extremely dangerous;

Is it, with a timeout of a few minutes at most (i.e. at least a few times less than the SSK propagation time visible with Frost messages)? Is it more dangerous than the current waste of bandwidth on requests for the same SSK key several times per minute? Were any simulations done on that in the past?

BTW, doesn't the observed very low store hit rate result from prioritising the likely-to-fail SSKs?

BTW2: the failure table could also act as a targeted content propagation mechanism: if a node sees an SSK insert for a temporarily blacklisted (non-existing) SSK, then forwarding the insert (more likely a copy of the insert, for security reasons and for the sake of routing) to the original requestor should speed up propagation of new SSKs toward the nodes that already anticipate/await them.

> what I had envisaged was "per node failure tables" i.e. reroute requests
> which have recently failed to a different node since we know it isn't
> where it's supposed to be.

At a glance, a very nice idea. But LBNs typically answer with a reject, not a DNF... even with the current code. Such rerouting will probably increase SSK traffic toward LBNs even further, with a sharply increased volume of SSK rejects as a result. Hmm, some testing/simulation seems really needed here.

> On what do you base the assertion about key closeness? It would be nice
> to have a histogram or circle on the stats pages showing recent keys on
> the keyspace - can you write a patch?

Mmmm... in fact I just added custom logging, then a wild combination of grep/sed/sort/uniq to analyze the logs. But let me think; maybe visualizing a couple of the stats files I operate with will be rather trivial...

But I would rather stay away from stats page graphics at this time, as the stats files I operate with (filtered+sorted+uniqued) are rather large, 20-50MB each - too much memory for the toy. Unless your 'recent' means just 10-15 minutes at most?

> As far as your patch goes, surely rejecting more SSK requests would be
> counterproductive as it wastes bandwidth?

Tests show the opposite: without the patch, payload output on the stats page never exceeded 38%; with the patch it reaches 53% or a little more within several minutes of a node restart. So with the patch, the SSK/CHK forwarding behaviour 'feels' logical:

Without the patch:
- just several CHKs, and over 100 SSKs, very typical.

With the patch:
- most of the time (say, 75%) the number of currently forwarded CHK requests+inserts approximately equals the number of SSK requests+inserts (i.e. 10-25 each, depending on the set bandwidth limit);
- sometimes (say, 10%) CHK requests start to prevail, but current SSK requests+inserts never seem to go below the maximum that CHKs reach without the patch (i.e. 6-10). This is very typical when the number of CHK inserts gets several times higher than CHK requests (a close fast peer inserts something really large?);
- at other times (say, 15%) the CHK requests+inserts flow does not saturate the bandwidth, and the number of SSK requests quickly climbs to 50 or even over 100, as it typically does without the patch.

That's for an LBN. Raising the input bandwidth allotment, the number of SSKs quickly grows, resembling the situation without the patch.

So that's why I suggest reserving bandwidth for 2 CHK transfers; 3 would kill SSKs, while 1 still lets SSKs seriously prevail over CHKs (but it nonetheless gives a rather better ratio, so it is a legal value to try if the value of 2 alarms you too much). Actually, in the case of reserving bandwidth for 1 extra CHK the proposed patch is not really needed: simply comment out the line starting with "bandwidthLiabilityInput +=" and decrease the 90-second constant to 80 (10 seconds is roughly how long a 33.6 kbaud modem takes to transmit a single CHK - and using anything noticeably slower than 28800/33600 baud for Freenet will never work well anyway).

> Shouldn't a slow node accept those requests it's likely to be able to
> handle?

Considering the very high chance of SSK request failure (at least 92%), I would say the answer is no. But with a sane SSK failure rate (say 75% or below), SSK requests would likely not waste the limited, and thus precious, LBN bandwidth so fruitlessly.

The problem, I believe, is that the UDP packets are too small when SSK requests prevail: the PPP(oE)/TCP/FNP overhead becomes too large, while LBNs, unlike faster-link nodes, almost never coalesce packets, obviously.

----- toad at zceUWxlSaHLmvEMnbr4RHnVfehA ----- 2007.04.27 - 17:19:24GMT -----

The current algorithm is working, on most nodes, far better than it has in *ages*. I'm at 62% of a 700MB ISO, I started inserting it yesterday morning, and only a few of my peers are backed off - frequently none are backed off; right now it's 11 connected, 6 backed off, which is more backed off than I've seen for quite a while.

Re failure tables: yes, it is extremely dangerous. It can result in self-reinforcing key censorship, either as an attack or just occurring naturally. This happened on 0.5. And the hit ratio is only for CHKs, IIRC.

Even LBNs don't often send local RejectedOverloads on SSKs *once they have accepted them*. They may relay downstream ROs, but that is not fatal. And if they reject some requests, so what? It's a slow node; it's bound to reject some requests with the current load balancing system.

10-15 minutes would be interesting. We already show a circle and histogram of nearby nodes from swapping and of our peers; you'd just have to add another one. It would be good to have visual proof that routing is working, in the sense of adhering to node specialisations. I didn't expect it to be working given the load: I'm surprised that it does; it's an interesting result.

Packet size has nothing to do with it: ethernet has a 1472-byte maximum. Dial-up has a 576-byte max, but we ignore that and use fragmented packets (this sucks, obviously, as it greatly increases the chance of losing a packet and having to retransmit it).

Please explain why the patch doesn't result in never accepting a single SSK?

----- Anonymous at o9_0DTuZniSf_+oDmRsonByWxsI ----- 2007.04.27 - 19:31:14GMT -----

> Packet size has nothing to do with it, ethernet has a 1472 byte maximum.
> Dial-up has 576 bytes max, but we ignore it, and use fragmented packets
> (this sucks, obviously, as it greatly increases the chance of losing a
> packet and having to retransmit it).

I am talking about the typical/average packet size, not the MTU. LBNs, unlike faster nodes, rarely have a chance to coalesce reject responses (within the 100 ms maximum), and thus send disproportionately more tiny packets, resulting in much higher protocol overhead. So having LBNs mostly cater for SSKs rather than CHKs results in the lowest imaginable usefulness of LBNs for the network as a whole.

BTW, in my experience the typical/default dialup/PPP MTU is 1500 minus link-level headers, like ethernet. 576 is a reasonable adjustment for interactive traffic like ssh, but I can't remember it being the default since the time the super-fast 28800 baud modems became common. :) 1400+ is the typical size of GPRS PPP packets too, and the same holds true for other popular wireless media like Bluetooth or WiFi; so I have no concerns regarding IP fragmentation.

> Please explain why the patch doesn't result in never accepting a single
> SSK?

I can not. :) Can you explain why the current code, which penalizes CHKs, still gives 5% to them, even though CHKs are 25 times larger and correspondingly less frequent, and so have a really hard time arriving at the exact moment when bandwidth liability is not maxed out?

Seriously, I believe it comes down to 2 facts:

- SSK requests are much more frequent, so any temporary drop in the level of CHK requests enables the node to quickly get a bunch of new SSKs accepted for processing;
- the large CHK requests (at times when they prevail over SSKs) tend to hit other limits too, like the "output bandwidth liability" and "insufficient input/output bandwidth" throttles. Then the small SSK requests quickly pick up all the remaining bits of bandwidth.

But currently I do not have relevant statistics to prove that.

Anyway, please commit the following patch - it should equal out bandwidth rights for CHKs and SSKs at least halfway toward a fair/expected distribution (and the change will make any difference for high-/over-loaded nodes only).
Once most of my peers (and their peers) update, I will study the new node traffic forwarding efficiency and behavior at different bandwidth limits and with different penalization levels again - and then I will be in a better position to prove that the original proposal of reserving bandwidth for 2 CHKs is optimal (or maybe to withdraw it).

===
diff --git a/freenet/src/freenet/node/NodeStats.java b/freenet/src/freenet/node/NodeStats.java
index 3b091b4..98c82c3 100644
--- a/freenet/src/freenet/node/NodeStats.java
+++ b/freenet/src/freenet/node/NodeStats.java
@@ -399,9 +399,8 @@ public class NodeStats implements Persistable {
 				successfulSskFetchBytesSentAverage.currentValue() * node.getNumSSKRequests() +
 				successfulChkInsertBytesSentAverage.currentValue() * node.getNumCHKInserts() +
 				successfulSskInsertBytesSentAverage.currentValue() * node.getNumSSKInserts();
-			bandwidthLiabilityOutput += getSuccessfulBytes(isSSK, isInsert, false).currentValue();
 			double bandwidthAvailableOutput =
-				node.getOutputBandwidthLimit() * 90; // 90 seconds at full power; we have to leave some time for the search as well
+				node.getOutputBandwidthLimit() * 80; // 80 seconds at full power; we have to leave some time for the search as well
 			bandwidthAvailableOutput *= NodeStats.FRACTION_OF_BANDWIDTH_USED_BY_REQUESTS;
 			if(bandwidthLiabilityOutput > bandwidthAvailableOutput) {
 				preemptiveRejectReasons.inc("Output bandwidth liability");
@@ -413,9 +412,8 @@ public class NodeStats implements Persistable {
 				successfulSskFetchBytesReceivedAverage.currentValue() * node.getNumSSKRequests() +
 				successfulChkInsertBytesReceivedAverage.currentValue() * node.getNumCHKInserts() +
 				successfulSskInsertBytesReceivedAverage.currentValue() * node.getNumSSKInserts();
-			bandwidthLiabilityInput += getSuccessfulBytes(isSSK, isInsert, true).currentValue();
 			double bandwidthAvailableInput =
-				node.getInputBandwidthLimit() * 90; // 90 seconds at full power
+				node.getInputBandwidthLimit() * 80; // 80 seconds at full power
 			if(bandwidthLiabilityInput > bandwidthAvailableInput) {
 				preemptiveRejectReasons.inc("Input bandwidth liability");
 				return "Input bandwidth liability";
===

----- toad at zceUWxlSaHLmvEMnbr4RHnVfehA ----- 2007.04.28 - 17:05:53GMT -----

Why does assuming 80 seconds instead of 90 help? I would have expected it to make the situation worse.

Isn't what you want to increment the value you multiply the CHK byte counters by, when the request is an SSK? In any case I'm not convinced - we accept 32x as many SSKs as CHKs precisely because they use 32x less bandwidth. As far as I can see, incrementing the CHK counts, but only on a CHK, would just result in us never accepting an SSK...

But by all means continue to investigate.

----- mrogers at UU62+3E1vKT1k+7fR0Gx7ZN2IB0 ----- 2007.04.30 - 19:36:36GMT -----

> we accept 32x as many SSKs as CHKs precisely because they use 32x less
> bandwidth.

Sorry, but I don't understand the rationale behind this. It seems to be based on the assumption that equal resources should be allocated to SSKs and CHKs, regardless of whether there's equal demand for resources. If we're only getting, say, 16 times as many SSK requests as CHK requests, would we reject CHK requests to keep things "fair"?
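For scale, the 32x figure follows from the block sizes: a CHK data block is 32 KiB, an SSK data block roughly 1 KiB before headers (which is also why measured ratios come out nearer the 25x quoted earlier). The numbers below are ballpark only - the real code uses measured per-request byte averages:

===
// Ballpark arithmetic behind the "32x" figure discussed above.
public class KeyCost {
    public static void main(String[] args) {
        final int CHK_BYTES = 32 * 1024; // CHK data block
        final int SSK_BYTES = 1024;      // SSK data block, headers ignored
        System.out.println("SSKs per CHK at equal bandwidth: "
                + CHK_BYTES / SSK_BYTES); // 32
        // A 6 KBps LBN's 90-second liability window is ~540 KiB: room
        // for ~16 concurrent CHK transfers or ~540 SSK transfers.
        final int BUDGET = 6 * 1024 * 90;
        System.out.println(BUDGET / CHK_BYTES + " CHKs or "
                + BUDGET / SSK_BYTES + " SSKs per window");
    }
}
===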
----- toad at zceUWxlSaHLmvEMnbr4RHnVfehA ----- 2007.05.02 - 16:13:52GMT -----

Why should CHKs be prioritised over SSKs?

What do you think of the patch I committed, anyway?

----- Anonymous at o9_0DTuZniSf_+oDmRsonByWxsI ----- 2007.05.02 - 17:03:52GMT -----

> Why should CHKs be prioritised over SSKs?

Because the SSK success ratio is extremely low (remember the example I gave in the initial message of the thread: 8100 rejected requests for the same SSK key within 35 hours, plus an uncounted number of requests for the same key that resulted in DNF - and that key is nowhere near alone); linear programming would certainly suggest that SSK requests, as they currently are, should be totally disabled. I do not propose anything that drastic; but as SSKs currently use bandwidth very inefficiently, I propose to tone them down heuristically to approximately the CHK level while the node is short on bandwidth (but let SSKs go as much as the sending node needs if spare bandwidth is available).

[not that I meant to speak instead of mrogers]

----- mrogers at UU62+3E1vKT1k+7fR0Gx7ZN2IB0 ----- 2007.05.03 - 13:54:38GMT -----

>> Why should CHKs be prioritised over SSKs?
>
> Because SSK success ratio is extremely low

That doesn't make sense, for two reasons:

1) rejecting SSKs will make the SSK success ratio worse, not better;
2) "SSK not found" is useful information - for example, that's how you discover the current version of a USK - but "SSK rejected for not being a CHK" is not useful to anyone.

Let me use an analogy: you're designing a Cisco router. Some of the packets it gets asked to forward will be small SSH packets, others will be large HTTP packets. Do you say "half our resources should go to SSH because we don't want to prioritise HTTP, so we'll only forward one HTTP packet for every 32 SSH packets"? If you answered yes, you just lost your job at Cisco. The router's job is to deliver packets to the best of its ability, not to decide what kind of packets the end hosts "should" be sending.
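A minimal sketch of the "tone SSKs down only while short on bandwidth" heuristic proposed two messages up. The 0.75 threshold and all names are invented for illustration; this is not actual Freenet code:

===
// Sketch: accept SSK requests freely while there is spare bandwidth,
// but under load require headroom for a CHK-sized reserve too, so the
// small SSKs cannot re-grab every freed byte ahead of CHKs.
public class SskThrottle {
    static boolean acceptSsk(double committedBytes, double requestBytes,
                             double chkReserveBytes, double budgetBytes) {
        boolean shortOnBandwidth = committedBytes > 0.75 * budgetBytes;
        double needed = committedBytes + requestBytes
                      + (shortOnBandwidth ? chkReserveBytes : 0);
        return needed <= budgetBytes;
    }

    public static void main(String[] args) {
        double budget = 6 * 1024 * 90; // 6 KBps LBN, 90-second window
        System.out.println(acceptSsk(budget * 0.5, 2048, 65536, budget)); // true
        System.out.println(acceptSsk(budget * 0.9, 2048, 65536, budget)); // false
    }
}
===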
----- Anonymous ----- 2007.05.03 - 18:37:00GMT -----

Let's build an extreme case: there is a packet type which is 1 MB in size and one which is 1 byte in size. Both are constantly sent to a router at a higher rate than the router's bandwidth quota allows it to forward. If the following rules apply, not a single 1 MB packet will get served:

1. the router doesn't care about the size of a packet and doesn't penalise any kind of packet; all are considered equal;
2. if a packet arrives and the remaining bandwidth quota doesn't allow forwarding it, it gets instantly rejected;
3. no queueing of incoming packets is done (with part-allocation of available bandwidth if the packet size exceeds the free quota).

AFAIK this is a possible situation for Freenet; correct me if I am wrong.
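That starvation is easy to reproduce in a toy model of the three rules above (all rates here are invented for illustration):

===
// Toy token-bucket model: quota (in bytes) refills at a fixed rate;
// every arriving packet is forwarded at once if it fits in the
// remaining quota and instantly rejected otherwise - no queueing,
// no partial allocation. Small-packet demand alone exceeds the refill
// rate, so the quota hovers near empty and a 1 MB packet never fits.
public class Starvation {
    public static void main(String[] args) {
        final long BIG = 1 << 20;    // 1 MB packet
        final long REFILL = 100_000; // quota bytes gained per tick
        final long CAPACITY = 2 * BIG;
        long quota = 0, bigServed = 0, smallServed = 0;
        for (int tick = 0; tick < 10_000; tick++) {
            quota = Math.min(CAPACITY, quota + REFILL);
            long smallDemand = 200_000;                 // 1-byte packets offered
            long served = Math.min(quota, smallDemand); // rule 2: forward what
            quota -= served;                            // fits, reject the rest
            smallServed += served;
            if (BIG <= quota) { quota -= BIG; bigServed++; } // one 1 MB try per tick
        }
        // Prints "big: 0, small: 1000000000" - exactly the starvation
        // described above: the small packets always drain the quota
        // below 1 MB before a big packet can be accepted.
        System.out.println("big: " + bigServed + ", small: " + smallServed);
    }
}
===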