On Wednesday 08 August 2007 16:13, Matthew Toseland wrote:
> On Wednesday 08 August 2007 16:06, Matthew Toseland wrote:
> > On Wednesday 08 August 2007 16:05, Matthew Toseland wrote:
> > > Continued at end to minimise confusion!
> > >
> > > On Wednesday 08 August 2007 15:59, Matthew Toseland wrote:
> > > > Unfortunately this thread is rather rambling; it includes lots of
> > > > discussion on token passing as well as the original premise.
> > > >
> > > > ----- Anonymous at o9_0DTuZniSf_+oDmRsonByWxsI ----- 2007.04.25 - 20:18:36GMT -----
> > > >
> > > > I made some measurements on how a freenet node behaves if the bandwidth
> > > > limit is set low: 10KBps and down to 6KBps (specifically, the input
> > > > bandwidth limit; the output bandwidth limit was set to at least 15KBps,
> > > > but as expected the actually used output bandwidth is comparable to (just
> > > > slightly above) the actually used input bandwidth). The node itself was
> > > > running frost but no uploads/downloads, so the absolute majority of
> > > > network traffic was forwarded CHK/SSK requests/inserts.
> > > >
> > > > The results are interesting enough: CHK traffic becomes as low as 5% (in
> > > > packets) of CHK+SSK, while at least 92% of SSK requests were not
> > > > satisfied due to assorted failures (plus quite a few more certainly
> > > > resulted in a NotFound response due to the key missing from the whole
> > > > network, but I don't have that number). This makes a low traffic node
> > > > work highly inefficiently and disproportionately slowly; it also slows
> > > > down its peers with all the extra reject traffic. Worse, input bandwidth
> > > > sometimes goes over the set limit, suggesting that on a hardware
> > > > 33600/56000 bps modem and even on ISDN things will just get worse due to
> > > > increased delays.
> > > >
> > > > Another thing to note: a low bandwidth node (LBN) almost exclusively
> > > > rejects requests with the "input bandwidth liability" reason, and
> > > > extremely rarely for other reasons.
> > > >
> > > > Speculating a bit, the same picture will likely be observed for peers
> > > > of a fast node (1Mbps or more) with many peers having a typical home
> > > > connection of 256Kbps or less.
> > > >
> > > > Not sure if simulations ever showed anything like this, but
> > > > contributing mostly SSK service to the network (and the absolute
> > > > majority of SSK requests fail!) is rather useless: an optimally working
> > > > network is supposed to transfer at least one CHK block for each SSK key,
> > > > and typically much, much more (a single 10MB file consists of 481 CHK
> > > > blocks!), and even if you found the SSK but not the CHK the SSK points
> > > > to, then you failed to find the information you requested.
> > > >
> > > > OK, to make the long story short[er], at the end of this message you
> > > > will find a small patch that noticeably improves the LBN situation. The
> > > > idea is to reserve some bandwidth for CHK transfers (and SSK inserts, as
> > > > those are too rare to penalize, and more valuable than requests). The
> > > > line directly before the inserted one implicitly penalizes CHK
> > > > transfers (the much smaller SSK requests tend to re-reserve bandwidth
> > > > the moment it is released after a CHK transfer finishes, while the much
> > > > larger CHK requests do not have such a good chance), so bandwidth
> > > > should be reserved for 2 CHKs at least (and tests show that's enough
> > > > to make a difference).
> > > >
> > > > Another thing I tried was increasing the 90 seconds period up to 120.
> > > > That had some (no numbers here; just "noticeable but small") positive
> > > > effect on making traffic smoother and staying closer to the set limit,
> > > > without jumping up and down too much. Where did the 90 seconds number
> > > > come from anyway, and how dangerous could 120 be?
> > > >
> > > > Some pros observed and/or thought out during tests of the patch:
> > > > - I observe an increase of output payload by approx. 15% (of total
> > > > traffic), making the LBN more useful to its peers.
> > > > - the change is negligibly small for faster nodes, so it should not
> > > > break anything globally.
> > > > - the network-wide SSK flood traffic will be toned down a little bit
> > > > (at temporarily overloaded nodes only), additionally simplifying life
> > > > for LBNs: after all, requesting the same SSK every 15 seconds for 35
> > > > hours, 8100 times in total (actual numbers from one of the tests before
> > > > this patch was applied; there are over 60 other SSKs that were requested
> > > > more than 1000 times during the same period), is just way too much;
> > > > SSKs are not inserted into the network THAT fast. [Is it worth
> > > > remembering recently seen SSK requests, and not forwarding them if the
> > > > same request was already forwarded within the last 10 minutes and
> > > > resulted in DNF/RNF? A table of recently requested SSKs that are closest
> > > > to the node location should not be too big?]
> > > >
> > > > And cons:
> > > > - in exceptional conditions (specifically, with fewer than 2 incoming
> > > > CHK requests per 90 seconds; I actually observe 2-7 CHK requests per
> > > > second, that's 180-630 per 90 seconds), regardless of node bandwidth,
> > > > up to 800 Bps might end up unused. For a high bandwidth node that's
> > > > just way too small to notice; for an LBN that's still acceptable (10%
> > > > of 56Kbps) and will decrease roundtrip delays a bit, which is always a
> > > > good thing for such slow links.
> > > >
> > > > Other notes:
> > > > - the distribution of location closeness/number of SSK requests is very
> > > > nice: only SSK requests with a location very close to the node location
> > > > get repeated frequently; the farther an SSK's location is, the fewer
> > > > requests the node sees, with those SSKs seen only once or twice per 1-2
> > > > day period distributed evenly across the location space. This suggests
> > > > that routing is working fine.
> > > > - As far as I understand, if the input bandwidth limit/liability is
> > > > exceeded (but a packet was already received anyway), the CHK/SSK request
> > > > gets instantly rejected (thus throwing out received bytes while input
> > > > bandwidth has no spare volume!); only otherwise does the node check if
> > > > the requested key exists in the storage. Heh? This feels like a serious
> > > > bug hurting overall network performance: better to query the storage and
> > > > hopefully send back the result (or still reject if the key is not found
> > > > locally) rather than wait for a retry request to waste more input
> > > > bandwidth. At least for SSKs, reject and reply are comparable in output
> > > > bandwidth usage, so it is worth a little delay in the response. Or am I
> > > > missing something?
> > > >
> > > > ===
> > > > diff --git a/freenet/src/freenet/node/NodeStats.java b/freenet/src/freenet/node/NodeStats.java
> > > > index 3b091b4..fb9f8b9 100644
> > > > --- a/freenet/src/freenet/node/NodeStats.java
> > > > +++ b/freenet/src/freenet/node/NodeStats.java
> > > > @@ -414,6 +414,7 @@ public class NodeStats implements Persistable {
> > > >  		successfulChkInsertBytesReceivedAverage.currentValue() * node.getNumCHKInserts() +
> > > >  		successfulSskInsertBytesReceivedAverage.currentValue() * node.getNumSSKInserts();
> > > >  	bandwidthLiabilityInput += getSuccessfulBytes(isSSK, isInsert, true).currentValue();
> > > > +	if (isSSK && !isInsert) bandwidthLiabilityInput += successfulChkFetchBytesReceivedAverage.currentValue() + successfulChkInsertBytesReceivedAverage.currentValue(); // slightly penalize SSK requests by reserving bandwidth for 2 additional CHK transfers (or SSK inserts if any)
> > > >  	double bandwidthAvailableInput =
> > > >  		node.getInputBandwidthLimit() * 90; // 90 seconds at full power
> > > >  	if(bandwidthLiabilityInput > bandwidthAvailableInput) {
> > > > ===
> > > >
> > > > ----- toad at zceUWxlSaHLmvEMnbr4RHnVfehA ----- 2007.04.26 - 16:56:59GMT -----
> > > >
> > > > Most SSK requests fail. They DNF. The reason for this is that most SSK
> > > > requests are polling for data that has not yet been inserted.
> > > >
> > > > Bandwidth liability is usually the main reason for rejection. If we
> > > > reach most of the other reasons, there is a problem - usually a
> > > > cyclical problem. The main reason for it is to ensure that we don't
> > > > accept so many requests that some of them needlessly time out even
> > > > though they succeeded. The timeout is 120 seconds, so we need the
> > > > actual transfer to take less than this; on a request, 30 seconds
> > > > seems a reasonable upper bound for the search time. We don't throw
> > > > out many bytes when we reject a request/insert, because the bulk of it
> > > > hasn't been sent yet, except with SSKs, where typically a little under
> > > > half of the total bytes will have been moved. Ideally we wouldn't
> > > > send requests until we have a good idea that they will be accepted,
> > > > but token passing load balancing is a long way off, not likely to
> > > > happen for 0.7.0.
> > > >
> > > > We cannot control input bandwidth usage precisely.
> > > >
> > > > Any more info on SSK flooding? Is it simply Frost?
> > > >
> > > > We can add a failure table, we had one before; however a failure
> > > > table which results in actually blocking keys can be extremely
> > > > dangerous. What I had envisaged was "per node failure tables", i.e.
> > > > reroute requests which have recently failed to a different node, since
> > > > we know it isn't where it's supposed to be.
> > > >
> > > > On what do you base the assertion about key closeness? It would be
> > > > nice to have a histogram or circle on the stats pages showing recent
> > > > keys on the keyspace - can you write a patch?
> > > >
> > > > As far as your patch goes, surely rejecting more SSK requests would
> > > > be counterproductive, as it wastes bandwidth? Shouldn't a slow node
> > > > accept those requests it's likely to be able to handle?
> > > >
> > > > I can see an argument that we shouldn't prefer SSKs, and that on slow
> > > > nodes we do prefer SSKs... I'm not sure the above is the right way to
> > > > deal with it though.
> > > > The effect of the patch would be to never accept any SSKs unless we
> > > > have plenty of spare bandwidth, correct?
> > > >
> > > > ----- Anonymous at o9_0DTuZniSf_+oDmRsonByWxsI ----- 2007.04.26 - 18:41:32GMT -----
> > > >
> > > > > Ideally we wouldn't send requests until we have a good idea that they
> > > > > will be accepted, but token passing load balancing is a long way off,
> > > > > not likely to happen for 0.7.0.
> > > >
> > > > Well, even the current algorithm implementation has certain room for
> > > > improvement. Here are the typical numbers I observe:
> > > >
> > > > ===
> > > > unclaimedFIFO Message Counts
> > > > * FNPRejectOverload: 89 (45.2%)
> > > > * FNPInsertTransfersCompleted: 59 (29.9%)
> > > > * FNPDataNotFound: 15 (7.6%)
> > > > * packetTransmit: 12 (6.1%)
> > > > * FNPRejectLoop: 7 (3.6%)
> > > > * FNPAccepted: 6 (3.0%)
> > > > * FNPSwapRejected: 4 (2.0%)
> > > > * FNPDataInsertRejected: 4 (2.0%)
> > > > * FNPRouteNotFound: 1 (0.5%)
> > > > * Unclaimed Messages Considered: 197
> > > > ===
> > > >
> > > > FNPRejectOverload always stays at the top, sometimes with hundreds of
> > > > messages (for the last hour before unclaimed messages expire, that's
> > > > a lot), and so indicates that there is some bug (or bugs) with obeying
> > > > the bandwidth limit.
> > > >
> > > > > Any more info on SSK flooding? Is it simply Frost?
> > > >
> > > > Not local frost for sure: it generates just several simultaneous SSK
> > > > requests (by default max 8: 6 for boards plus 2 for filesharing,
> > > > AFAIR; practically 2-4 simultaneous requests most of the time). The
> > > > other 100 SSK requests (without the proposed patch) are forwarded ones.
> > > >
> > > > > We can add a failure table, we had one before; however a failure
> > > > > table which results in actually blocking keys can be extremely
> > > > > dangerous.
> > > >
> > > > Is it, with a timeout of at most a few minutes (i.e. at least a few
> > > > times less than the SSK propagation time visible with frost messages)?
> > > > Is it more dangerous than the current wastage of bandwidth on requests
> > > > for the same SSK key several times per minute? Have any simulations
> > > > been done on that in the past?
> > > >
> > > > BTW, doesn't the observed very low store hit rate result from
> > > > prioritising the likely-to-fail SSKs?
> > > >
> > > > BTW2, the failure table could also act as a targetted content
> > > > propagation mechanism: if a node sees an SSK insert for a temporarily
> > > > blacklisted (non-existing) SSK, then forwarding the insert (more
> > > > likely a copy of the insert, for security reasons and routing's sake)
> > > > to the original requestor should speed up propagation of new SSKs
> > > > toward the nodes that already anticipate/await them.
> > > >
> > > > > What I had envisaged was "per node failure tables", i.e. reroute
> > > > > requests which have recently failed to a different node, since we
> > > > > know it isn't where it's supposed to be.
> > > >
> > > > At a glance, a very nice idea. But LBNs typically answer with a reject,
> > > > not a DNF... even with the current code. Probably such rerouting will
> > > > even further increase SSK traffic toward LBNs, and get a sharply
> > > > increased volume of SSK rejects as a result. Hmm, some
> > > > testing/simulation seems really needed here.
> > > >
> > > > > On what do you base the assertion about key closeness? It would be
> > > > > nice to have a histogram or circle on the stats pages showing recent
> > > > > keys on the keyspace - can you write a patch?
> > > >
> > > > Mmmm... in fact I just added custom logging, then a wild combination
> > > > of grep/sed/sort/uniq to analyze the logs. But let me think, maybe
> > > > visualizing a couple of the stats files I operate with will be rather
> > > > trivial...
> > > >
> > > > But I would rather stay away from stats page graphics at this time,
> > > > as the stats files I operate with (filtered+sorted+uniqued) are
> > > > rather large, 20-50MB each - too much memory for the toy. Unless your
> > > > 'recent' means just 10-15 minutes at most?
> > > >
> > > > > As far as your patch goes, surely rejecting more SSK requests would
> > > > > be counterproductive, as it wastes bandwidth?
> > > >
> > > > Tests show the opposite: without the patch, payload output on the stats
> > > > page never exceeded 38%; with the patch it becomes 53% or a little more
> > > > several minutes after node restart. So, with the patch, the SSK/CHK
> > > > forwarding behaviour 'feels' logical:
> > > >
> > > > without the patch:
> > > > - just several CHKs, and over 100 SSKs, is very typical.
> > > >
> > > > with the patch:
> > > > - most of the time (say, 75%) the number of currently forwarded CHK
> > > > requests+inserts approximately equals the number of SSK
> > > > requests+inserts (i.e. 10-25 each, depending on the set bandwidth
> > > > limit);
> > > > - sometimes (say, 10%) CHK requests start to prevail, but current SSK
> > > > requests+inserts seem never to go below the level CHKs get at most
> > > > without the patch (i.e. 6-10). This is very typical when the number of
> > > > CHK inserts gets several times higher than CHK requests (a close fast
> > > > peer inserting something really large?).
> > > > - at other times (say, 15%) the CHK requests+inserts flow does not
> > > > saturate bandwidth, and the number of SSK requests quickly climbs to 50
> > > > or even over 100+, as it typically does without the patch.
> > > >
> > > > That's for an LBN. Raising the input bandwidth allotment, the number of
> > > > SSKs quickly grows, resembling the situation without the patch.
> > > >
> > > > So that's why I suggest reserving bandwidth for 2 CHK transfers; 3
> > > > would kill SSKs, while 1 still lets SSKs seriously prevail over CHKs
> > > > (but nonetheless gives a quite better ratio, so it is a legal value to
> > > > try if the value of 2 alarms you too much). In the case of reserving
> > > > bandwidth for just 1 extra CHK, the proposed patch is not really
> > > > needed: simply comment out the line starting with
> > > > "bandwidthLiabilityInput +=" and decrease the 90 seconds constant to 80
> > > > (10 seconds is roughly how long a 33.6kbps modem takes to transmit a
> > > > single CHK - using anything noticeably slower than 28800/33600 bps for
> > > > freenet will not ever work well anyway).
> > > >
> > > > > Shouldn't a slow node accept those requests it's likely to be able
> > > > > to handle?
> > > >
> > > > Considering the very high chance of SSK request failures (at least
> > > > 92%), I would say the answer is no. But with a sane SSK failure rate
> > > > (say 75% or below) SSK requests would likely not waste the limited,
> > > > thus precious, LBN bandwidth so fruitlessly.
> > > >
> > > > The problem, in my belief, is the too-small size of UDP packets when
> > > > SSK requests prevail: PPP(oE)/UDP/FNP overhead becomes too large,
> > > > while LBNs, unlike faster-link nodes, almost never coalesce packets,
> > > > obviously.
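To pin down the accept/reject logic being debated above, here is a minimal standalone sketch (in Java, like the NodeStats.java being patched) of an input-bandwidth-liability check with the proposed two-CHK reservation. All names, signatures and byte figures are illustrative stand-ins, not Freenet's actual API:

===
// Sketch only: the liability check discussed in this thread, with the
// proposed reservation of two CHK-equivalents when the incoming request
// is an SSK request. Hypothetical names and numbers throughout.
public final class LiabilitySketch {
    static final int LIABILITY_WINDOW_SECONDS = 90;

    /** Returns a reject reason, or null to accept the request. */
    static String shouldReject(boolean isSSK, boolean isInsert,
                               double liabilityInputBytes,
                               double expectedChkFetchBytes,
                               double expectedChkInsertBytes,
                               int inputLimitBytesPerSec) {
        double liability = liabilityInputBytes;
        if (isSSK && !isInsert) {
            // The reservation: leave room for two further CHK-sized transfers
            // so cheap, frequent SSKs cannot starve the rare, large CHKs.
            liability += expectedChkFetchBytes + expectedChkInsertBytes;
        }
        double available = inputLimitBytesPerSec * (double) LIABILITY_WINDOW_SECONDS;
        return liability > available ? "Input bandwidth liability" : null;
    }

    public static void main(String[] args) {
        // A 6000 B/s node already committed to ~500KB of expected incoming
        // transfers: the SSK request is rejected, the CHK request accepted.
        System.out.println(shouldReject(true, false, 500_000, 33_000, 33_000, 6_000));
        System.out.println(shouldReject(false, false, 500_000, 33_000, 33_000, 6_000));
    }
}
===

With ~500KB already committed on a 6KB/s link, the sketch rejects a new SSK request but still accepts a CHK request: exactly the asymmetry the patch argues for, and it only changes the outcome when the node is near its limit.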
> > > >
> > > > ----- toad at zceUWxlSaHLmvEMnbr4RHnVfehA ----- 2007.04.27 - 17:19:24GMT -----
> > > >
> > > > The current algorithm is working, on most nodes, far better than it
> > > > has in *ages*. I'm at 62% of a 700MB ISO, I started inserting it
> > > > yesterday morning, and only a few of my peers are backed off -
> > > > frequently none are backed off; right now it's 11 connected, 6 backed
> > > > off, which is more backed off than I've seen for quite a while.
> > > >
> > > > Re failure tables: yes, it is extremely dangerous. It can result in
> > > > self-reinforcing key censorship, either as an attack or just
> > > > occurring naturally. This happened on 0.5. And the hit ratio is only
> > > > for CHKs, iirc.
> > > >
> > > > Even LBNs don't often send local RejectedOverloads on SSKs *once
> > > > they have accepted them*. They may relay downstream ROs, but that is
> > > > not fatal. And if they reject some requests, so what? It's a slow
> > > > node; it's bound to reject some requests with the current load
> > > > balancing system.
> > > >
> > > > 10-15 minutes would be interesting. We already show a circle and
> > > > histogram of nearby nodes from swapping and of our peers; you'd just
> > > > have to add another one. It would be good to have visual proof that
> > > > routing is working on the level of adhering to node specialisations.
> > > > I didn't expect it to be working given the load: I'm surprised that
> > > > it does, it's an interesting result.
> > > >
> > > > Packet size has nothing to do with it; ethernet has a 1472 byte
> > > > maximum. Dial-up has 576 bytes max, but we ignore it and use
> > > > fragmented packets (this sucks, obviously, as it greatly increases
> > > > the chance of losing a packet and having to retransmit it).
> > > >
> > > > Please explain why the patch doesn't result in never accepting a
> > > > single SSK?
> > > >
> > > > ----- Anonymous at o9_0DTuZniSf_+oDmRsonByWxsI ----- 2007.04.27 - 19:31:14GMT -----
> > > >
> > > > > Packet size has nothing to do with it; ethernet has a 1472 byte
> > > > > maximum. Dial-up has 576 bytes max, but we ignore it and use
> > > > > fragmented packets (this sucks, obviously, as it greatly increases
> > > > > the chance of losing a packet and having to retransmit it).
> > > >
> > > > I am talking about the typical/average packet size, not the MTU. LBNs,
> > > > unlike faster nodes, rarely have a chance to coalesce reject responses
> > > > (over at most 100ms), and thus send disproportionately more tiny
> > > > packets, resulting in much higher protocol overhead. Thus having LBNs
> > > > mostly cater to SSKs, not CHKs, results in the lowest imaginable
> > > > usefulness of LBNs for the network as a whole.
> > > >
> > > > BTW, in my experience the typical/default dialup/PPP MTU is 1500 minus
> > > > link level headers, like ethernet. 576 is a reasonable adjustment for
> > > > interactive traffic like ssh, but I fail to remember it being used as a
> > > > default since the time the super fast 28800 bps modems became common.
> > > > :) 1400+ is the typical size of GPRS PPP packets too, and the same
> > > > holds true for other popular wireless media like Bluetooth or WiFi;
> > > > so I have no concerns regarding IP fragmentation.
> > > >
> > > > > Please explain why the patch doesn't result in never accepting a
> > > > > single SSK?
> > > >
> > > > I cannot.
> > > > :) Can you explain why the current code, which penalizes CHKs, still
> > > > gives 5% to them, even though CHKs are 25 times larger and similarly
> > > > less frequent, so they have a really hard time arriving at the exact
> > > > moment when bandwidth liability is not maxed out?
> > > >
> > > > Seriously, I believe it comes down to 2 facts:
> > > >
> > > > - SSK requests are much more frequent, so any temporary drop in the
> > > > CHK request level enables the node to quickly get a bunch of new SSKs
> > > > accepted for processing;
> > > > - the large CHK requests (at times when they prevail over SSKs) tend
> > > > to hit other limits too, like the "output bandwidth liability" and
> > > > "Insufficient input/output bandwidth" throttles. Then the small SSK
> > > > requests quickly pick up all the remaining bits of bandwidth.
> > > >
> > > > But currently I do not have relevant statistics to prove that.
> > > >
> > > > Anyway, please commit the following patch - it should even out
> > > > bandwidth rights for CHKs and SSKs at least half way toward a
> > > > fair/expected distribution (and the change will make a difference for
> > > > highly-/over-loaded nodes only). Once most of my peers (and their
> > > > peers) update, I will study the new node traffic forwarding
> > > > efficiency and behavior at different bandwidth limits and with
> > > > different penalization levels again - and will then be in a better
> > > > position to prove that the original proposal of reserving bandwidth
> > > > for 2 CHKs is optimal (or maybe withdraw it).
> > > >
> > > > ===
> > > > diff --git a/freenet/src/freenet/node/NodeStats.java b/freenet/src/freenet/node/NodeStats.java
> > > > index 3b091b4..98c82c3 100644
> > > > --- a/freenet/src/freenet/node/NodeStats.java
> > > > +++ b/freenet/src/freenet/node/NodeStats.java
> > > > @@ -399,9 +399,8 @@ public class NodeStats implements Persistable {
> > > >  		successfulSskFetchBytesSentAverage.currentValue() * node.getNumSSKRequests() +
> > > >  		successfulChkInsertBytesSentAverage.currentValue() * node.getNumCHKInserts() +
> > > >  		successfulSskInsertBytesSentAverage.currentValue() * node.getNumSSKInserts();
> > > > -	bandwidthLiabilityOutput += getSuccessfulBytes(isSSK, isInsert, false).currentValue();
> > > >  	double bandwidthAvailableOutput =
> > > > -		node.getOutputBandwidthLimit() * 90; // 90 seconds at full power; we have to leave some time for the search as well
> > > > +		node.getOutputBandwidthLimit() * 80; // 80 seconds at full power; we have to leave some time for the search as well
> > > >  	bandwidthAvailableOutput *= NodeStats.FRACTION_OF_BANDWIDTH_USED_BY_REQUESTS;
> > > >  	if(bandwidthLiabilityOutput > bandwidthAvailableOutput) {
> > > >  		preemptiveRejectReasons.inc("Output bandwidth liability");
> > > > @@ -413,9 +412,8 @@ public class NodeStats implements Persistable {
> > > >  		successfulSskFetchBytesReceivedAverage.currentValue() * node.getNumSSKRequests() +
> > > >  		successfulChkInsertBytesReceivedAverage.currentValue() * node.getNumCHKInserts() +
> > > >  		successfulSskInsertBytesReceivedAverage.currentValue() * node.getNumSSKInserts();
> > > > -	bandwidthLiabilityInput += getSuccessfulBytes(isSSK, isInsert, true).currentValue();
> > > >  	double bandwidthAvailableInput =
> > > > -		node.getInputBandwidthLimit() * 90; // 90 seconds at full power
> > > > +		node.getInputBandwidthLimit() * 80; // 80 seconds at full power
> > > >  	if(bandwidthLiabilityInput > bandwidthAvailableInput) {
> > > >  		preemptiveRejectReasons.inc("Input bandwidth liability");
> > > >  		return "Input bandwidth liability";
> > > > ===
> > > >
> > > > ----- toad at zceUWxlSaHLmvEMnbr4RHnVfehA ----- 2007.04.28 - 17:05:53GMT -----
> > > >
> > > > Why does assuming 80 seconds instead of 90 help? I would have
> > > > expected it to make the situation worse.
> > > >
> > > > Isn't what you want to increment the value you multiply the CHK byte
> > > > counters by when the request is an SSK? In any case I'm not
> > > > convinced - we accept 32x as many SSKs as CHKs precisely because they
> > > > use 32x less bandwidth. As far as I can see, incrementing the CHK
> > > > counts, but only on a CHK, would just result in us never accepting an
> > > > SSK...
> > > >
> > > > But by all means continue to investigate.
> > > >
> > > > ----- mrogers at UU62+3E1vKT1k+7fR0Gx7ZN2IB0 ----- 2007.04.30 - 19:36:36GMT -----
> > > >
> > > > > we accept 32x as many SSKs as CHKs precisely because they use 32x
> > > > > less bandwidth.
> > > >
> > > > Sorry, but I don't understand the rationale behind this. It seems to
> > > > be based on the assumption that equal resources should be allocated
> > > > to SSKs and CHKs, regardless of whether there's equal demand for
> > > > resources. If we're only getting, say, 16 times as many SSK requests
> > > > as CHK requests, would we reject CHK requests to keep things "fair"?
> > > >
> > > > ----- toad at zceUWxlSaHLmvEMnbr4RHnVfehA ----- 2007.05.02 - 16:13:52GMT -----
> > > >
> > > > Why should CHKs be prioritised over SSKs?
> > > >
> > > > What do you think of the patch I committed, anyway?
> > >
> > > ----- Anonymous at o9_0DTuZniSf_+oDmRsonByWxsI ----- 2007.05.02 - 17:03:52GMT -----
> > >
> > > > Why should CHKs be prioritised over SSKs?
> > >
> > > Because the SSK success ratio is extremely low (remember the example I
> > > gave in the initial message of the thread: 8100 rejected requests for
> > > the same SSK key within 35 hours, plus an uncounted number of requests
> > > for the same key resulting in DNF - and such a key is nowhere near
> > > alone); linear programming would certainly suggest that SSK requests,
> > > as they currently are, should be totally disabled. I do not propose
> > > anything so drastic; but as SSKs currently use bandwidth very
> > > inefficiently, I propose to tone them down heuristically to
> > > approximately the CHK level while the node is short on bandwidth (but
> > > let SSKs flow as much as the sending node needs if spare bandwidth is
> > > available).
> > >
> > > [not that I meant to speak instead of mrogers]
> > >
> > > ----- mrogers at UU62+3E1vKT1k+7fR0Gx7ZN2IB0 ----- 2007.05.03 - 13:54:38GMT -----
> > >
> > > >> Why should CHKs be prioritised over SSKs?
> > > >
> > > > Because the SSK success ratio is extremely low
> > >
> > > That doesn't make sense, for two reasons:
> > >
> > > 1) rejecting SSKs will make the SSK success ratio worse, not better
> > > 2) "SSK not found" is useful information - for example, that's how you
> > > discover the current version of a USK - but "SSK rejected for not being
> > > a CHK" is not useful to anyone
> > >
> > > Let me use an analogy: you're designing a Cisco router. Some of the
> > > packets it gets asked to forward will be small SSH packets, others will
> > > be large HTTP packets. Do you say "half our resources should go to SSH
> > > because we don't want to prioritise HTTP, so we'll only forward one
> > > HTTP packet for every 32 SSH packets"?
> > > If you answered yes, you just lost your job at Cisco. The router's job
> > > is to deliver packets to the best of its ability, not to decide what
> > > kind of packets the end hosts "should" be sending.
> >
> > ----- Anonymous at o9_0DTuZniSf_+oDmRsonByWxsI ----- 2007.05.03 - 15:56:54GMT -----
> >
> > 1) accepting more SSK requests will make the SSK success ratio worse,
> > not better. If that is done in addition to CHK requests (i.e. node
> > bandwidth stays well within the desired limits), it is not a big deal;
> > but if the resultless SSK traffic starts limiting CHK (i.e. real data)
> > transfers, that's a problem.
> >
> > The fact is that an SSK key is known well before the SSK content, and
> > in practice gets re-requested repeatedly. My profiling of the network
> > indicates that even with a datastore 10 times larger than the default,
> > less than 0.05% (and who knows how much less) of SSK requests (not SSK
> > keys!) are currently getting satisfied by an underloaded node; it is
> > even worse for an overloaded node (do you have different numbers?).
> > With such a miserable SSK request success rate I see a lot of sense in
> > penalizing them; once the SSK request flood situation improves (and
> > there certainly must be SSK-centric bugs hiding, or improvements
> > pending, that would either improve the flow of SSK requests or avoid
> > using SSKs where they are not the optimal construct), I will quite
> > possibly and readily change my opinion that underprioritising SSK
> > requests is acceptable and good. But the SSK flood problem does not
> > look like a low hanging fruit at all; expect it to take months if not
> > years for major improvements.
> >
> > But please do not argue with me that treating SSK and CHK requests
> > equally based on their count, not length, is good: I totally agree; my
> > point is that for the currently observed network behaviour it is
> > nowhere near the only good choice, and in the short-to-middle term not
> > necessarily the best one.
> >
> > 2) "SSK not found" is useful information indeed; but exactly the same
> > event repeated 8100 times brings extremely little additional entropy
> > (a basic fact of information theory), and in practice mostly means
> > wasted bandwidth. Then again, if the node is not overloaded at the
> > moment, I have nothing against repeating the DNF using otherwise idle
> > bandwidth - even 0.001 bit of entropy per extra SSK request is at least
> > a mediocre (and maybe a quite good) argument against transferring
> > exactly 0 bits of entropy (i.e. keeping the network link idle).
> >
> > ----- mrogers at UU62+3E1vKT1k+7fR0Gx7ZN2IB0 ----- 2007.05.04 - 11:03:09GMT -----
> >
> > Can you explain why you think accepting more SSK requests will make the
> > SSK success ratio worse? A rejected request will fail to reach some
> > parts of the network (at the minimum it won't reach the node that
> > rejected it), so it's more likely to fail.
> >
> > > But please do not argue with me that treating SSK and CHK requests
> > > equally based on their count, not length, is good: I totally agree
> >
> > But I don't agree. :-) I don't think we should try to treat them equally
> > in any way. We should just accept as many requests as possible, of
> > whatever kind, without trying to enforce any kind of "fairness" or
> > "equality" between CHKs and SSKs.
> > > > > "SSK not found" is a useful information indeed; but the exactly same > > > event > > > > repeated 8100 times brings extremely little additional entropy > > > > Very true - if the same client is really requesting the same key 8100 > > times then the client should be fixed (eg modified to use exponentially > > increasing delays when re-requesting a failed key). But we shouldn't try > > to fix the problem by rejecting (anonymous!) requests further down the > > line. How do we know it's one client making 8100 requests, not 810 > > clients making 10 requests each? > > > > ----- Anonymous at o9_0DTuZniSf_+oDmRsonByWxsI ----- 2007.05.10 - > > 20:36:10GMT ----- > > > > > Can you explain why you think accepting more SSK requests will make the > > > SSK > > > > success ratio worse? > > > > Generally, a set of requests for the same key succeeds only once; during > > moderately long time period (cache lifetime, probably few days to a > > couple of weeks) maximum (N-1) times where N is number of peers, provided > > that it was not the current node that inserted the key. ALL the previous > > requests (made before content is available) fail, and if you do more > > unsucessful requests you decrease success ratio. > > > > I see your rationale for a request made in search for content that was on > > network for some time already, and thus will be either found quickly, or > > after few attempts the original requestor will cease searching for it > > (and hopefully will insert it later, after FEC recovery, if applicable). > > But such behaviour is typical for CHK keys; while observation for SSK > > requests behaviour shows very different pattern: keys missing in network > > getting requested repeatedly, often without facing success at the end. > > > > > But I don't agree. :-) > > > > Ugh :-) > > > > > I don't think we should try to treat them equally in any way. We should > > > just > > > > accept as many requests as possible, of whatever kind, without trying to > > enforce any kind of "fairness" or "equality" between CHKs and SSKs. > > > > Doesn't that really means equal treating, or in other words: processing > > without prejudice? > > > > > How do we know it's one client making 8100 requests, not 810 clients > > > making > > > > 10 requests each? > > > > That makes no real difference for the request processing node. If it had > > the requested data, it would respond (at most N-1 times) and would no > > longer see requests for that particular key for days, due to peer caching > > (provided the node does not have malicious peers). If the requests for > > the same key get repeated again and again, that basically wastes traffic > > - and so if node works at its bandwidth limit, the limited traffic should > > better be used for work that is statistically expected to be more useful. > > > > Fixing apps is always a good thing. But there are 2 problems: first, in > > anonymous network it is supposed to be extremely hard to find out source > > node or software making the requests put aside actual bug tracking; > > second, the repeating requests could be malicious - and presense of > > spammer on the network is not sufficient reason for many other nodes to > > silently submit a share of bandwidth for the spammer. 
> >
> > With CHK keys, which reserve 15 times more bandwidth, the spam rate
> > drops quickly with each hop; but the small SSK keys tend to travel 15
> > times further (over overloaded links; no difference if all nodes are
> > underloaded), making an SSK flood attack, while not fatal or noticeable
> > without targetted network profiling, quite a bit more damaging to
> > network performance as a whole.
> >
> > ----- mrogers at UU62+3E1vKT1k+7fR0Gx7ZN2IB0 ----- 2007.05.11 - 14:30:49GMT -----
> >
> > > Generally, a set of requests for the same key succeeds only once
> >
> > OK, I see your point: if there are lots of redundant requests from the
> > same client, we can improve the success ratio by rejecting most of
> > them. But we shouldn't do it, for two reasons:
> >
> > First, as you pointed out, in an anonymous network we don't know how
> > many requests came from each client, so we don't know whether the
> > requests are redundant.
> >
> > Second, and more importantly, rejecting a request has side-effects: it
> > causes the client to slow down, and it causes the request to go
> > elsewhere. Rejections are interpreted as a sign of overload, just like
> > packet losses in TCP - rejecting a request because the data's not in
> > the network is like dropping a TCP packet instead of returning a 404
> > error. OK, it saves a bit of bandwidth, but it confuses the client.
> >
> > It *might* make sense to return DataNotFound if the same key was
> > recently searched for and not found, but as Toad has pointed out, that
> > might make it possible to blacklist certain keys (if you get a request
> > for a key you want to censor, return DataNotFound and all the upstream
> > nodes will cache the result).
> >
> > > keys missing from the network get requested repeatedly, often without
> > > ever meeting success.
> >
> > Nevertheless, "DataNotFound" is useful information. It's worth spending
> > bandwidth to find out that a key is not in the network.
> >
> > > Doesn't that really mean equal treatment, or in other words:
> > > processing without prejudice?
> >
> > Yes, processing without prejudice is what I meant - I thought you meant
> > "equal quotas" rather than "equal opportunities". Oh god, I hope
> > Lomion's not reading this thread. :-)
> >
> > > If the requests for the same key get repeated again and again, that
> > > basically wastes traffic
> >
> > Maybe so, but rejecting requests is not the solution, because it has
> > side-effects. Can you think of a way to cache the results of recent
> > searches without making it possible to blacklist certain keys?
> >
> > > the presence of a spammer on the network is not a sufficient reason
> > > for many other nodes to silently donate a share of bandwidth to the
> > > spammer.
> >
> > But your solution (preemptively rejecting SSK requests) would be even
> > worse - if one person started spamming, everyone's SSK requests would
> > start getting rejected. Everyone except the spammer would obey the
> > request throttle, and pretty soon the spammer would be the only one
> > sending SSK requests.
> >
> > ----- Anonymous at o9_0DTuZniSf_+oDmRsonByWxsI ----- 2007.05.14 - 15:43:40GMT -----
> >
> > > in an anonymous network we don't know how many requests came from
> > > each client, so we don't know whether the requests are redundant.
> >
> > Yes, but that does not change anything for an overloaded node.
> > On the other hand, as toad pointed out, redundant requests can be
> > caught at the request sending end, rather than the request receiving
> > end - which saves more bandwidth. So only the spammer's direct peers
> > will be at a disadvantage, which is acceptable (moreover, it is fair
> > that those supporting spammers get hit the most).
> >
> > > Second, and more importantly, rejecting a request has side-effects:
> > > it causes the client to slow down, and it causes the request to go
> > > elsewhere. Rejections are interpreted as a sign of overload
> >
> > Exactly. If the forwarding node is not overloaded, it should not reject
> > requests. If the client that caused the overload of other node(s) gets
> > slowed down somewhat, that just makes things more 'fair'.
> >
> > > Maybe so, but rejecting requests is not the solution, because it has
> > > side-effects. Can you think of a way to cache the results of recent
> > > searches without making it possible to blacklist certain keys?
> >
> > Rejecting requests during overload is the only possibility (aside from
> > node harakiri, maybe). As far as the algorithm goes, here The Bitch
> > Algorithm is proudly presented: :)
> >
> > - A node seeing a request for a key which recently failed responds with
> > DNF, with a little but important postscriptum attached: "and BTW do not
> > ever call me again". (Technically: "installed a listener for the key").
> > The request sending node(s) memorize that.
> >
> > - If such a listener already existed, then the request sending node is
> > misbehaving, and should be penalised. Seriously.
> >
> > - The request sender node might try to search for the key with other
> > peers, but not that aggressively.
> >
> > - The request sender's key listener queue is of limited length. If the
> > request sender needs to eject a queue element to conserve memory (or
> > maybe just has the element time out), it sends a notice: "was not even
> > going to" ("Listener lease expired"). This allows (but does not force)
> > the request receiving node to clear that element from its queue.
> >
> > - The request receiver's key listener queue is of limited length too.
> > So if the request receiving node decides to eject a queued element, it
> > sends a notice: "OK, maybe if you call me again we could chat a little
> > bit". This gets propagated back the same way as the "and BTW do not
> > ever call me again" reply, allowing the similar queue element to be
> > cleared on other nodes that searched for the key, and/or the request to
> > be retried (likely using a different peer).
> >
> > - The most interesting thing: if a node sees a key insert and/or key
> > transfer as the result of a request, it sends duplicate inserts to all
> > the peers who are waiting for the key (as they need the key) with the
> > virtual comment "here is your stupid toy, I just bought a similar one
> > for myself", and also creates new inserts directed to all the peers the
> > key was requested from, with the virtual comment "please do not be sad
> > and keep that toy, I just got an identical one: just have a look" - as
> > other nodes will most likely be searching for the key in the same
> > direction.
> >
> > Of course such a listener mechanism will form a tree graph, laid over
> > the network links, with the root located at the node that (at least
> > locally) has the closest location to the location of the key.
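As an illustration only, the listener table described above can be sketched as a small bounded structure like the following. Keys and peers are simplified to strings, the 200-entry bound follows the 100-200 key estimate in the next paragraph, and the various notices ("lease expired", duplicate inserts, penalties) are left to the caller; this is not Freenet code:

===
// Sketch of a bounded per-key listener table: remember which peers are
// waiting for a recently failed key, evict the oldest record when full,
// and hand back the waiting peers when the key finally arrives.
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public final class KeyListenerTable {
    private static final int MAX_ENTRIES = 200;
    // Insertion-ordered so the oldest record is evicted first.
    private final ArrayDeque<String> order = new ArrayDeque<>();
    private final Map<String, Set<String>> waitingPeers = new HashMap<>();

    /** A request for `key` from `peer` failed: remember who is waiting.
     *  Returns true if a record already existed, i.e. this is a repeat
     *  request that the algorithm above says should be penalised. */
    public synchronized boolean recordFailedRequest(String key, String peer) {
        Set<String> peers = waitingPeers.get(key);
        boolean existed = (peers != null);
        if (!existed) {
            if (order.size() >= MAX_ENTRIES) {
                String evicted = order.poll();   // caller sends "lease expired"
                waitingPeers.remove(evicted);
            }
            order.add(key);
            peers = new HashSet<>();
            waitingPeers.put(key, peers);
        }
        peers.add(peer);
        return existed;
    }

    /** The key finally arrived (insert or transfer): return everyone who
     *  was waiting, so the caller can forward duplicate inserts to them. */
    public synchronized Set<String> keyArrived(String key) {
        order.remove(key);
        Set<String> peers = waitingPeers.remove(key);
        return peers != null ? peers : new HashSet<>();
    }
}
===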
> >
> > Thinking of the stats I saw, I would say that just 100-200 SSK keys per
> > listener queue could reduce SSK traffic by as much as 30-50% maybe
> > (thanks to the good location distribution), while having those
> > listened-for keys, if they ever appear, show up on the original
> > requestor node ASAP. Even if I am overly optimistic and the decrease is
> > just 10%, that is still a lot considering the current SSK flood.
> >
> > Another note is that an application sending requests via FCP should see
> > the bitch response as distinguishable from a plain DNF, so that
> > non-malicious software can cooperate with nodes to keep useless traffic
> > smaller by issuing requests that are unlikely to succeed less
> > frequently.
> >
> > > But your solution (preemptively rejecting SSK requests) would be even
> > > worse - if one person started spamming, everyone's SSK requests would
> > > start getting rejected. Everyone except the spammer would obey the
> > > request throttle, and pretty soon the spammer would be the only one
> > > sending SSK requests.
> >
> > - aggressive throttling during overload ensures that malicious,
> > excessively high-volume traffic does not generally travel too far from
> > the spammer's own node.
> > - I have not yet proposed throttling traffic at a node that has idle
> > resources (but The Bitch Algorithm is well applicable to non-overloaded
> > nodes, too).
>
> ----- Anonymous at o9_0DTuZniSf_+oDmRsonByWxsI ----- 2007.05.14 - 15:43:40GMT -----
>
> > in an anonymous network we don't know how many requests came from each
> > client, so we don't know whether the requests are redundant.
>
> Yes, but that does not change anything for an overloaded node. On the
> other hand, as toad pointed out, redundant requests can be caught at the
> request sending end, rather than the request receiving end - which saves
> more bandwidth. So only the spammer's direct peers will be at a
> disadvantage, which is acceptable (moreover, it is fair that those
> supporting spammers get hit the most).
>
> ----- toad at zceUWxlSaHLmvEMnbr4RHnVfehA ----- 2007.05.15 - 23:42:54GMT -----
>
> Okay, this is an interesting argument. What you are proposing, then, is
> this:
> - When we get a DNF for a specific key requested by a specific peer, we
> remember it.
> - For X period afterwards, if that peer asks for that key, we DNF it. Or
> better, we send a fatal response other than DNF which indicates that we
> aren't passing the request on because it was requested recently by the
> same peer.
>
> This might work. The classic freenet 0.5 solution was the failure table.
> We removed it. The reason for that is that it wasn't per-peer, and could
> be attacked, and in fact generated bad behaviours all by itself:
>
> Let's say the cooldown is 30 minutes.
> At t=0, request the key. Since this has not been requested yet, it will
> go all the way. When it DNFs, each node along the route will remember to
> DNF it.
> At t=10 minutes, somebody else requests the key. He gets a DNF. The
> nodes along that route also remember this.
> At t=30.1 minutes, the key is now fetchable along the original path. We
> fetch it again. Because of the vagaries of routing, it gets routed to
> one of the nodes touched by the middle request. So it DNFs again, and
> the whole chain remembers this.
>
> You can get some very nasty effects. If however it is built in such a
> way that it doesn't propagate (it returns a failure other than DNF, or a
> flag on the DNF, to indicate that it must not be propagated), it could
> be a useful tool.
>
> Should it be per-peer?
> It limits the impact to a degree, but a full DNF will involve many
> nodes...
>
> ----- Anonymous at o9_0DTuZniSf_+oDmRsonByWxsI ----- 2007.05.25 - 12:07:23GMT -----
>
> The key point is that once the previously missing key appears in the
> network, it gets propagated toward the original requestor almost
> instantly.
>
> With that property, keeping the tables per-peer is not much needed (it
> is only needed to know where duplicate inserts should be directed to,
> once/if the key appears).
>
> Whenever a bitch record times out, thus breaking the virtual search
> backpath, the following happens:
>
> If the key is still not on the network:
>
> - the new request will bump into the next node with a non-timed-out
> bitch record, thus restoring the backpath (maybe making it more optimal
> this time).
>
> If the key appears on the network:
>
> - the key (its duplicate) will get inserted along the remaining
> backpath, up to the timed out node;
> - the next request for the key will most likely hit the remaining
> backpath (if not the ideal key storage node), and will quickly return
> the key from cache/store.
>
> Another important point is that, assuming routing mostly works
> correctly, the majority of the bitch records will reside on nodes with
> locations very close to the locations of the sought keys, which is going
> to enable certain performance, data structure coding, and network-wide
> optimizations. If on the other hand routing malfunctions (a malicious
> request sending node?), the proposed algorithm will not make things any
> worse.
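toad's two-step proposal above (remember per-peer DNFs for X period, and answer repeats with a fatal response that is never propagated, so the cooldown cannot cascade along a chain of nodes the way the 0.5 failure table did) could be sketched like this. The name RecentlyFailed, the 30-minute cooldown, and the string-keyed map are illustrative assumptions only:

===
// Sketch of a per-peer failure table with cooldown. On a hit, the caller
// sends a distinct, non-propagating "recently failed" response instead of
// forwarding the request or returning a cacheable DNF.
import java.util.HashMap;
import java.util.Map;

public final class PerPeerFailureTable {
    private static final long COOLDOWN_MS = 30 * 60 * 1000;  // hypothetical X
    private final Map<String, Long> expiryByPeerAndKey = new HashMap<>();

    private static String entry(String peer, String key) { return peer + "/" + key; }

    /** Record that a request for `key` from `peer` ended in DNF. */
    public synchronized void recordDnf(String peer, String key, long nowMs) {
        expiryByPeerAndKey.put(entry(peer, key), nowMs + COOLDOWN_MS);
    }

    /** True if this peer re-requested the key inside the cooldown window. */
    public synchronized boolean shouldShortCircuit(String peer, String key, long nowMs) {
        Long expiry = expiryByPeerAndKey.get(entry(peer, key));
        if (expiry == null) return false;
        if (nowMs >= expiry) {
            expiryByPeerAndKey.remove(entry(peer, key));  // cooldown over
            return false;
        }
        return true;
    }
}
===

Because the table is keyed on (peer, key) and the short-circuit response is never cached or relayed, the t=0/t=10/t=30.1 trap in toad's timeline cannot chain across nodes: each node only ever suppresses the specific peer that already got the DNF from it.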
----- toad at zceUWxlSaHLmvEMnbr4RHnVfehA ----- 2007.05.25 - 17:08:32GMT -----

Okay, you may have a point here. I don't think we should subscribe unless
the request has a flag saying we actually WANT to subscribe, and in that
case, we shouldn't add to the failure table unless we are going to
subscribe. I will post a variant on it as a proposal for lightweight
passive requests on the list and here. I will include your original thread.
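A sketch of what toad's closing suggestion might look like in code: a failure-table entry (and hence a subscription) is created only when the request explicitly carries a "want subscription" flag. All names here are hypothetical, since the actual proposal was still to be posted:

===
// Sketch: subscribe-on-request-flag handling for lightweight passive
// requests. Illustrative only; not the proposal's actual wire format.
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public final class PassiveRequestHandling {
    // key -> peers that asked to be told when the key appears
    private final Map<String, Set<String>> subscriptions = new HashMap<>();

    /** Called when a request from `peer` for `key` ends in DataNotFound. */
    public synchronized void onDataNotFound(String peer, String key,
                                            boolean wantsSubscription) {
        if (!wantsSubscription) return;  // no flag: no failure-table entry at all
        subscriptions.computeIfAbsent(key, k -> new HashSet<>()).add(peer);
    }

    /** Called when the key finally arrives: notify and drop the subscribers. */
    public synchronized Set<String> onKeyArrived(String key) {
        Set<String> peers = subscriptions.remove(key);
        return peers != null ? peers : new HashSet<>();
    }
}
===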