On Wednesday 08 August 2007 16:06, Matthew Toseland wrote:
> On Wednesday 08 August 2007 16:05, Matthew Toseland wrote:
> > Continued at end to minimise confusion!
> >
> > On Wednesday 08 August 2007 15:59, Matthew Toseland wrote:
> > > Unfortunately this thread is rather rambling; it includes lots of discussion on token passing as well as the original premise.
> > >
> > > ----- Anonymous at o9_0DTuZniSf_+oDmRsonByWxsI ----- 2007.04.25 - 20:18:36GMT -----
> > >
> > > I made some measurements of how a freenet node behaves if the bandwidth limit is set low: 10KBps and down to 6KBps (specifically the input bandwidth limit; the output bandwidth limit was set to at least 15KBps, but as expected the actually used output bandwidth is comparable to (just slightly above) the actually used input bandwidth). The node itself was running Frost but no uploads/downloads, so the vast majority of network traffic was forwarded CHK/SSK requests/inserts.
> > >
> > > The results are interesting enough: CHK traffic becomes as low as 5% (in packets) of CHK+SSK, while at least 92% of SSK requests were not satisfied due to assorted failures (plus quite a few more certainly resulted in a NotFound response because the key is missing from the whole network, but I don't have that number). This makes a low-traffic node highly inefficient and disproportionately slow; it also slows down its peers with all the extra reject traffic. Worse, input bandwidth sometimes goes over the set limit, suggesting that on a hardware 33600/56000 bps modem and even on ISDN things will just get worse due to increased delays.
> > >
> > > Another thing to note: a low bandwidth node (LBN) almost exclusively rejects requests with the "input bandwidth liability" reason, and extremely rarely for other reasons.
> > >
> > > Speculating a bit, the same picture will likely be observed for the peers of a fast node (1Mbps or more) when many of those peers have a typical home connection of 256Kbps or less.
> > >
> > > Not sure if simulations ever showed anything like this, but contributing mostly SSK service to the network (and the absolute majority of SSK requests fail!) is rather useless: an optimally working network is supposed to transfer at least one CHK block for each SSK key, and typically much, much more (a single 10MB file consists of 481 CHK blocks!), and even if you found the SSK but not the CHK it points to, you still failed to find the information you requested.
> > >
> > > OK, to make the long story short[er], at the end of this message you will find a small patch that noticeably improves the LBN situation. The idea is to reserve some bandwidth for CHK transfers (and SSK inserts, as those are too rare to penalize, and more valuable than requests). The line directly before the inserted one implicitly penalizes CHK transfers (the much smaller SSK requests tend to re-reserve bandwidth the moment it is released after a CHK transfer finishes, while the much larger CHK requests do not get such a good chance), so bandwidth should be reserved for at least 2 CHKs (and tests show that's enough to make a difference).
> > > Another thing I tried was increasing the 90 seconds period up to 120. That had some (no numbers here; just "noticeable but small") positive effect on making traffic smoother and staying closer to the set limit, without jumping up and down too much. Where did the 90 seconds number come from anyway, and how dangerous could 120 be?
> > >
> > > Some pros observed and/or thought out during tests of the patch:
> > > - I observe an increase in output payload of approx. 15% (of total traffic), making the LBN more useful to its peers.
> > > - the change is negligibly small for faster nodes so should not break anything globally.
> > > - network-wide SSK flood traffic will be toned down a little bit (at temporarily overloaded nodes only), additionally simplifying life for LBNs: after all, requesting the same SSK every 15 seconds for 35 hours, 8100 times in total (actual numbers from one of the tests before this patch was applied; there are over 60 other SSKs that were requested more than 1000 times during the same period), is just way too much; SSKs are not inserted into the network THAT fast. [Is it worth remembering recently seen SSK requests, and not forwarding them if the same request was already forwarded within the last 10 minutes and resulted in DNF/RNF? A table of the recently requested SSKs that are closest to the node location should not be too big?]
> > >
> > > And the cons:
> > > - in exceptional conditions (specifically, with fewer than 2 incoming CHK requests per 90 seconds; in practice I observe 2-7 CHK requests per second, that's 180-630 per 90 seconds), regardless of node bandwidth, up to 800 Bps might end up unused. For a high bandwidth node that's far too small to notice; for an LBN that's still acceptable (10% of 56Kbps) and will decrease round-trip delays a bit, which is always a good thing for such slow links.
> > >
> > > Other notes:
> > > - the distribution of location closeness vs. number of SSK requests is very nice: only SSK requests with a location very close to the node location get repeated frequently; the farther away an SSK's location is, the fewer requests for it the node sees, and the SSKs seen only once or twice per 1-2 day period are distributed evenly across the location space. This suggests that routing is working fine.
> > > - As far as I understand, if the input bandwidth limit/liability is exceeded (but a packet has already been received anyway), the CHK/SSK request gets instantly rejected (thus throwing away received bytes while input bandwidth has no spare volume!); only otherwise does the node check whether the requested key exists in the store. Heh? This feels like a serious bug hurting overall network performance: better to query the store and hopefully send back the result (or still reject if the key is not found locally) rather than wait for a retry request to waste more input bandwidth. At least for SSKs a reject and a reply are comparable in output bandwidth usage, so it is worth a little delay in the response. Or am I missing something?
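A minimal sketch of the ordering suggested in the last note above, with invented names rather than the real Freenet request-handler API: look the key up in the local store before applying the bandwidth-liability rejection, so that the bytes already spent receiving the request are not thrown away when the answer is sitting on disk anyway.

===
import java.util.HashMap;
import java.util.Map;

class AdmissionSketch {
    enum Decision { ANSWER_LOCALLY, REJECT_OVERLOAD, FORWARD }

    // stand-in for the datastore: routing key (as a hex string) -> stored block
    private final Map<String, byte[]> store = new HashMap<>();

    Decision decide(String routingKey, boolean overInputLiability) {
        if (store.containsKey(routingKey))
            return Decision.ANSWER_LOCALLY;   // answering from disk wastes none of the bytes already received
        if (overInputLiability)
            return Decision.REJECT_OVERLOAD;  // reject only after the cheap local lookup has failed
        return Decision.FORWARD;              // not overloaded and not stored locally: route it onwards
    }
}
===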
> > >
> > > ===
> > > diff --git a/freenet/src/freenet/node/NodeStats.java b/freenet/src/freenet/node/NodeStats.java
> > > index 3b091b4..fb9f8b9 100644
> > > --- a/freenet/src/freenet/node/NodeStats.java
> > > +++ b/freenet/src/freenet/node/NodeStats.java
> > > @@ -414,6 +414,7 @@ public class NodeStats implements Persistable {
> > >  			successfulChkInsertBytesReceivedAverage.currentValue() * node.getNumCHKInserts() +
> > >  			successfulSskInsertBytesReceivedAverage.currentValue() * node.getNumSSKInserts();
> > >  		bandwidthLiabilityInput += getSuccessfulBytes(isSSK, isInsert, true).currentValue();
> > > +		if (isSSK && !isInsert) bandwidthLiabilityInput+=successfulChkFetchBytesReceivedAverage.currentValue()+successfulChkInsertBytesReceivedAverage.currentValue(); // slightly penalize SSK requests by reserving bandwidth for 2 additional CHK transfers (or SSK inserts if any)
> > >  		double bandwidthAvailableInput =
> > >  			node.getInputBandwidthLimit() * 90; // 90 seconds at full power
> > >  		if(bandwidthLiabilityInput > bandwidthAvailableInput) {
> > > ===
> > >
> > > ----- toad at zceUWxlSaHLmvEMnbr4RHnVfehA ----- 2007.04.26 - 16:56:59GMT -----
> > >
> > > Most SSK requests fail. They DNF. The reason for this is that most SSK requests are polling for data that has not yet been inserted.
> > >
> > > Bandwidth liability is usually the main reason for rejection. If we reach most of the other reasons, there is a problem - usually a cyclical problem. The main reason for it is to ensure that we don't accept so many requests that some of them needlessly time out even though they succeeded. The timeout is 120 seconds, so we need the actual transfer to take less than this; on a request, 30 seconds seems a reasonable upper bound for the search time. We don't throw out many bytes when we reject a request/insert because the bulk of it hasn't been sent yet, except with SSKs, where typically a little under half of the total bytes will have been moved. Ideally we wouldn't send requests until we have a good idea that they will be accepted, but token passing load balancing is a long way off, not likely to happen for 0.7.0.
> > >
> > > We cannot control input bandwidth usage precisely.
> > >
> > > Any more info on SSK flooding? Is it simply Frost?
> > >
> > > We can add a failure table - we had one before - however a failure table which results in actually blocking keys can be extremely dangerous; what I had envisaged was "per node failure tables", i.e. reroute requests which have recently failed to a different node, since we know the data isn't where it's supposed to be.
> > >
> > > On what do you base the assertion about key closeness? It would be nice to have a histogram or circle on the stats pages showing recent keys on the keyspace - can you write a patch?
> > >
> > > As far as your patch goes, surely rejecting more SSK requests would be counterproductive as it wastes bandwidth? Shouldn't a slow node accept those requests it's likely to be able to handle?
> > >
> > > I can see an argument that we shouldn't prefer SSKs, and that on slow nodes we do prefer SSKs... I'm not sure the above is the right way to deal with it though. The effect of the patch would be to never accept any SSKs unless we have plenty of spare bandwidth, correct?
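The admission check being discussed can be summarised in a rough, self-contained sketch. The class, method and parameter names are assumptions for illustration, not the actual NodeStats code; only the ingredients come from the thread: a 90-second window (the 120-second transfer timeout minus roughly 30 seconds of search time), the expected bytes of already-accepted transfers, and the patch's extra reservation of two CHK transfers' worth of bandwidth when the incoming request is an SSK request.

===
public class BandwidthLiabilitySketch {

    static final int TRANSFER_TIMEOUT_SECONDS = 120; // a transfer must finish within this
    static final int SEARCH_TIME_SECONDS = 30;       // rough upper bound for the search phase
    static final int LIABILITY_WINDOW_SECONDS =
            TRANSFER_TIMEOUT_SECONDS - SEARCH_TIME_SECONDS; // the 90 seconds in the patch above

    /**
     * @param expectedBytesInFlight     bytes we expect to receive for already-accepted requests
     * @param expectedBytesThisRequest  average bytes received for a request of this type
     * @param avgChkBytes               average bytes received for one successful CHK transfer
     * @param inputLimitBytesPerSec     configured input bandwidth limit
     * @param isSskRequest              true if the new request is an SSK request (not an insert)
     * @return null to accept, or the rejection reason
     */
    static String checkInputLiability(double expectedBytesInFlight,
                                      double expectedBytesThisRequest,
                                      double avgChkBytes,
                                      int inputLimitBytesPerSec,
                                      boolean isSskRequest) {
        double liability = expectedBytesInFlight + expectedBytesThisRequest;
        if (isSskRequest) {
            // The proposed tweak: keep head-room for two CHK transfers so that small,
            // frequent SSK requests cannot grab every byte the moment it is freed.
            liability += 2 * avgChkBytes;
        }
        double available = (double) inputLimitBytesPerSec * LIABILITY_WINDOW_SECONDS;
        return liability > available ? "Input bandwidth liability" : null;
    }

    public static void main(String[] args) {
        // Example: a 6 KB/s node with ~500 KB already committed rejects a further SSK request.
        System.out.println(checkInputLiability(500_000, 2_000, 33_000, 6_000, true));
    }
}
===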
> > >
> > > ----- Anonymous at o9_0DTuZniSf_+oDmRsonByWxsI ----- 2007.04.26 - 18:41:32GMT -----
> > >
> > > > Ideally we wouldn't send requests until we have a good idea that they will be accepted, but token passing load balancing is a long way off, not likely to happen for 0.7.0.
> > >
> > > Well, even the current algorithm implementation has certain room for improvement. Here are the typical numbers I observe:
> > >
> > > ===
> > > unclaimedFIFO Message Counts
> > > * FNPRejectOverload: 89 (45.2%)
> > > * FNPInsertTransfersCompleted: 59 (29.9%)
> > > * FNPDataNotFound: 15 (7.6%)
> > > * packetTransmit: 12 (6.1%)
> > > * FNPRejectLoop: 7 (3.6%)
> > > * FNPAccepted: 6 (3.0%)
> > > * FNPSwapRejected: 4 (2.0%)
> > > * FNPDataInsertRejected: 4 (2.0%)
> > > * FNPRouteNotFound: 1 (0.5%)
> > > * Unclaimed Messages Considered: 197
> > > ===
> > >
> > > FNPRejectOverload always stays at the top, sometimes with hundreds of messages (for the last hour before unclaimed messages expire, that's a lot), and so indicates that there is some bug (or bugs) in how the bandwidth limit is obeyed.
> > >
> > > > Any more info on SSK flooding? Is it simply Frost?
> > >
> > > Not local Frost for sure; it generates just several simultaneous SSK requests (by default max 8: 6 for boards plus 2 for filesharing, AFAIR; in practice 2-4 simultaneous requests most of the time). The other 100 SSK requests (without the proposed patch) are forwarded ones.
> > >
> > > > We can add a failure table, we had one before, however a failure table which results in actually blocking keys can be extremely dangerous;
> > >
> > > Is it, with a timeout of at most a few minutes (i.e. at least a few times less than the SSK propagation time visible with Frost messages)? Is it more dangerous than the current waste of bandwidth on requests for the same SSK key several times per minute? Have any simulations been done on that in the past?
> > >
> > > BTW, doesn't the observed very low store hit rate result from prioritising the likely-to-fail SSKs?
> > >
> > > BTW2: the failure table could also act as a targeted content propagation mechanism: if a node sees an SSK insert for a temporarily blacklisted (non-existent) SSK, then forwarding the insert (more likely a copy of the insert, for security reasons and for routing's sake) to the original requestor should speed up propagation of new SSKs toward the nodes that already anticipate/await them.
> > >
> > > > what I had envisaged was "per node failure tables" i.e. reroute requests which have recently failed to a different node since we know it isn't where it's supposed to be.
> > >
> > > At a glance, a very nice idea. But LBNs typically answer with a reject, not a DNF... even with the current code. Such rerouting will probably increase SSK traffic toward LBNs even further, with a sharply increased volume of SSK rejects as a result. Hmm, some testing/simulation seems really needed here.
> > >
> > > > On what do you base the assertion about key closeness? It would be nice to have a histogram or circle on the stats pages showing recent keys on the keyspace - can you write a patch?
> > >
> > > Mmmm... in fact I just added custom logging, then a wild combination of grep/sed/sort/uniq to analyze the logs. But let me think; maybe visualizing a couple of the stats files I work with will be rather trivial...
> > >
> > > But I would rather stay away from stats page graphics at this time, as the stats files I work with (filtered+sorted+uniqued) are rather large, 20-50MB each - too much memory for the toy. Unless your 'recent' means just 10-15 minutes at most?
> > >
> > > > As far as your patch goes, surely rejecting more SSK requests would be counterproductive as it wastes bandwidth?
> > >
> > > Tests show the opposite: without the patch the payload output on the stats page never exceeded 38%; with the patch it becomes 53% or a little more within several minutes of node restart. So, with the patch the SSK/CHK forwarding behaviour 'feels' logical:
> > >
> > > without the patch:
> > > - just several CHKs, and over 100 SSKs, is very typical.
> > >
> > > with the patch:
> > > - most of the time (say, 75%) the number of currently forwarded CHK requests+inserts approximately equals the number of SSK requests+inserts (i.e. 10-25 each, depending on the set bandwidth limit);
> > > - sometimes (say, 10%) CHK requests start to prevail, but the current SSK requests+inserts never seem to go below the level CHKs reach at most without the patch (i.e. 6-10). This is very typical when the number of CHK inserts gets several times higher than CHK requests (a close fast peer inserting something really large?);
> > > - at other times (say, 15%) the CHK requests+inserts flow does not saturate the bandwidth, and the number of SSK requests quickly climbs to 50 or even over 100+, as it typically does without the patch.
> > >
> > > That's for an LBN. Raising the input bandwidth allotment, the number of SSKs quickly grows to resemble the situation without the patch.
> > >
> > > So that's why I suggest reserving bandwidth for 2 CHK transfers; 3 would kill SSKs, 1 still lets SSKs seriously prevail over CHKs (but nonetheless gives a rather better ratio, so it is a legitimate value to try if the value of 2 alarms you too much). Besides, to reserve bandwidth for 1 extra CHK the proposed patch is not really needed: simply comment out the line starting with "bandwidthLiabilityInput +=" and decrease the 90 seconds constant to 80 (10 seconds is roughly how long a 33.6k baud modem takes to transmit a single CHK - using anything noticeably slower than 28800/33600 baud for freenet will never work well anyway).
> > >
> > > > Shouldn't a slow node accept those requests it's likely to be able to handle?
> > >
> > > Considering the very high chance of SSK request failure (at least 92%), I would say the answer is no. But with a sane SSK failure rate (say 75% or below) SSK requests would likely not waste the limited and thus precious LBN bandwidth so fruitlessly.
> > >
> > > The problem, I believe, is the overly small size of UDP packets when SSK requests prevail: PPP(oE)/TCP/FNP overhead becomes too large, while LBNs, unlike nodes with faster links, almost never coalesce packets, obviously.
> > >
> > > ----- toad at zceUWxlSaHLmvEMnbr4RHnVfehA ----- 2007.04.27 - 17:19:24GMT -----
> > >
> > > The current algorithm is working, on most nodes, far better than it has in *ages*. I'm at 62% of a 700MB ISO that I started inserting yesterday morning, and only a few of my peers are backed off - frequently none are backed off; right now it's 11 connected, 6 backed off, which is more backed off than I've seen for quite a while.
> > >
> > > Re failure tables: Yes, it is extremely dangerous. It can result in self-reinforcing key censorship, either as an attack or just occurring naturally. This happened on 0.5. And the hit ratio is only for CHKs iirc.
> > >
> > > Even LBNs don't often send local RejectedOverload's on SSKs *once they have accepted them*. They may relay downstream RO's but that is not fatal. And if they reject some requests, so what, it's a slow node, it's bound to reject some requests with the current load balancing system.
> > >
> > > 10-15 minutes would be interesting. We already show a circle and histogram of nearby nodes from swapping and of our peers; you'd just have to add another one. It would be good to have visual proof that routing is working at the level of adhering to node specialisations. I didn't expect it to be working given the load: I'm surprised that it does, it's an interesting result.
> > >
> > > Packet size has nothing to do with it, ethernet has a 1472 byte maximum. Dial-up has 576 bytes max, but we ignore it, and use fragmented packets (this sucks, obviously, as it greatly increases the chance of losing a packet and having to retransmit it).
> > >
> > > Please explain why the patch doesn't result in never accepting a single SSK?
> > >
> > > ----- Anonymous at o9_0DTuZniSf_+oDmRsonByWxsI ----- 2007.04.27 - 19:31:14GMT -----
> > >
> > > > Packet size has nothing to do with it, ethernet has a 1472 byte maximum. Dial-up has 576 bytes max, but we ignore it, and use fragmented packets (this sucks, obviously, as it greatly increases the chance of losing a packet and having to retransmit it).
> > >
> > > I am talking about typical/average packet size, not MTU. LBNs, unlike faster nodes, rarely have a chance to coalesce reject responses (within the max 100ms), and thus send disproportionately more tiny packets, resulting in much higher protocol overhead. Thus having LBNs mostly serve SSKs rather than CHKs results in the lowest imaginable usefulness of LBNs for the network as a whole.
> > >
> > > BTW, in my experience the typical/default dialup/PPP MTU is 1500 minus link level headers, like ethernet. 576 is a reasonable adjustment for interactive traffic like ssh, but I don't remember it being used as a default since the time the super fast 28800 baud modems became common. :) 1400+ is the typical size of GPRS PPP packets too, and the same holds true for other popular wireless media like Bluetooth or WiFi; so I have no concerns regarding IP fragmentation.
> > >
> > > > Please explain why the patch doesn't result in never accepting a single SSK?
> > >
> > > I cannot. :) Can you explain why the current code that penalizes CHKs still gives them 5%, even though CHKs are 25 times larger and similarly less frequent, so they have a really hard time arriving at the exact moment when bandwidth liability is not maxed out?
> > >
> > > Seriously, I believe it comes down to 2 facts:
> > >
> > > - SSK requests are much more frequent, so any temporary drop in the CHK request level lets the node quickly get a bunch of new SSKs accepted for processing;
> > > - the large CHK requests (at times when they prevail over SSKs) tend to hit other limits too, like the "output bandwidth liability" and "Insufficient input/output bandwidth" throttles. Then the small SSK requests quickly pick up all the remaining bits of bandwidth.
> > >
> > > But currently I do not have relevant statistics to prove that.
> > >
> > > Anyway, please commit the following patch - it should even out bandwidth rights for CHKs and SSKs at least half way toward a fair/expected distribution (and the change will make a difference for highly-/over-loaded nodes only). Once most of my peers (and their peers) update, I will study the node's traffic forwarding efficiency and behavior at different bandwidth limits and with different penalization levels again - and will then be in a better position to prove that the original proposal of reserving bandwidth for 2 CHKs is optimal (or maybe withdraw it).
> > >
> > > ===
> > > diff --git a/freenet/src/freenet/node/NodeStats.java b/freenet/src/freenet/node/NodeStats.java
> > > index 3b091b4..98c82c3 100644
> > > --- a/freenet/src/freenet/node/NodeStats.java
> > > +++ b/freenet/src/freenet/node/NodeStats.java
> > > @@ -399,9 +399,8 @@ public class NodeStats implements Persistable {
> > >  			successfulSskFetchBytesSentAverage.currentValue() * node.getNumSSKRequests() +
> > >  			successfulChkInsertBytesSentAverage.currentValue() * node.getNumCHKInserts() +
> > >  			successfulSskInsertBytesSentAverage.currentValue() * node.getNumSSKInserts();
> > > -		bandwidthLiabilityOutput += getSuccessfulBytes(isSSK, isInsert, false).currentValue();
> > >  		double bandwidthAvailableOutput =
> > > -			node.getOutputBandwidthLimit() * 90; // 90 seconds at full power; we have to leave some time for the search as well
> > > +			node.getOutputBandwidthLimit() * 80; // 80 seconds at full power; we have to leave some time for the search as well
> > >  		bandwidthAvailableOutput *= NodeStats.FRACTION_OF_BANDWIDTH_USED_BY_REQUESTS;
> > >  		if(bandwidthLiabilityOutput > bandwidthAvailableOutput) {
> > >  			preemptiveRejectReasons.inc("Output bandwidth liability");
> > > @@ -413,9 +412,8 @@ public class NodeStats implements Persistable {
> > >  			successfulSskFetchBytesReceivedAverage.currentValue() * node.getNumSSKRequests() +
> > >  			successfulChkInsertBytesReceivedAverage.currentValue() * node.getNumCHKInserts() +
> > >  			successfulSskInsertBytesReceivedAverage.currentValue() * node.getNumSSKInserts();
> > > -		bandwidthLiabilityInput += getSuccessfulBytes(isSSK, isInsert, true).currentValue();
> > >  		double bandwidthAvailableInput =
> > > -			node.getInputBandwidthLimit() * 90; // 90 seconds at full power
> > > +			node.getInputBandwidthLimit() * 80; // 80 seconds at full power
> > >  		if(bandwidthLiabilityInput > bandwidthAvailableInput) {
> > >  			preemptiveRejectReasons.inc("Input bandwidth liability");
> > >  			return "Input bandwidth liability";
> > > ===
> > >
> > > ----- toad at zceUWxlSaHLmvEMnbr4RHnVfehA ----- 2007.04.28 - 17:05:53GMT -----
> > >
> > > Why does assuming 80 seconds instead of 90 help? I would have expected it to make the situation worse.
> > >
> > > Isn't what you want to increment the value you are multiplying the CHK byte counters by, if the request is an SSK? In any case I'm not convinced - we accept 32x as many SSKs as CHKs precisely because they use 32x less bandwidth. As far as I can see, incrementing the CHK counts but only on a CHK would just result in us never accepting an SSK...
> > >
> > > But by all means continue to investigate.
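For what it's worth, the arithmetic behind the 90-versus-80-second debate is easy to reproduce. The snippet below is only an editorial illustration: the 32 KiB figure is the usual CHK payload size and the link speeds are assumptions; it shows that one CHK needs roughly 8 seconds at 33.6 kbps before overhead, which is where the earlier "10 seconds per CHK on a 33.6k modem" estimate - and hence the 10-second reduction of the window - comes from.

===
public class ChkTransferTime {
    public static void main(String[] args) {
        int chkBytes = 32 * 1024;                            // one CHK data block
        int[] linkBitsPerSecond = { 33_600, 56_000, 128_000 };
        for (int bps : linkBitsPerSecond) {
            double seconds = chkBytes * 8.0 / bps;
            System.out.printf("%6d bps: %.1f s per CHK (before protocol overhead)%n", bps, seconds);
        }
        // ~7.8 s at 33.6 kbps and ~4.7 s at 56 kbps; with FNP/UDP overhead and
        // retransmits the 33.6 kbps case lands near the 10 seconds quoted earlier.
    }
}
===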
> > >
> > > ----- mrogers at UU62+3E1vKT1k+7fR0Gx7ZN2IB0 ----- 2007.04.30 - 19:36:36GMT -----
> > >
> > > > we accept 32x as many SSKs as CHKs precisely because they use 32x less bandwidth.
> > >
> > > Sorry but I don't understand the rationale behind this. It seems to be based on the assumption that equal resources should be allocated to SSKs and CHKs, regardless of whether there's equal demand for resources. If we're only getting, say, 16 times as many SSK requests as CHK requests, would we reject CHK requests to keep things "fair"?
> > >
> > > ----- toad at zceUWxlSaHLmvEMnbr4RHnVfehA ----- 2007.05.02 - 16:13:52GMT -----
> > >
> > > Why should CHKs be prioritised over SSKs?
> > >
> > > What do you think of the patch I committed anyway?
> >
> > ----- Anonymous at o9_0DTuZniSf_+oDmRsonByWxsI ----- 2007.05.02 - 17:03:52GMT -----
> >
> > > Why should CHKs be prioritised over SSKs?
> >
> > Because the SSK success ratio is extremely low (remember the example I gave in the initial message of the thread: 8100 rejected requests for the same SSK key within 35 hours, plus an uncounted number of requests for the same key that resulted in DNF - and that key is nowhere near alone); linear programming would certainly suggest that SSK requests, as they currently behave, should be totally disabled. I do not propose anything as drastic; but as SSKs currently use bandwidth very inefficiently, I propose to tone them down heuristically to approximately the CHK level while a node is short on bandwidth (but let SSKs go as much as the sending node needs if spare bandwidth is available).
> >
> > [not that I meant to speak for mrogers]
> >
> > ----- mrogers at UU62+3E1vKT1k+7fR0Gx7ZN2IB0 ----- 2007.05.03 - 13:54:38GMT -----
> >
> > >> Why should CHKs be prioritised over SSKs?
> > >
> > > Because the SSK success ratio is extremely low
> >
> > That doesn't make sense for two reasons:
> >
> > 1) rejecting SSKs will make the SSK success ratio worse, not better
> > 2) "SSK not found" is useful information - for example, that's how you discover the current version of a USK - but "SSK rejected for not being a CHK" is not useful to anyone
> >
> > Let me use an analogy: you're designing a Cisco router. Some of the packets it gets asked to forward will be small SSH packets, others will be large HTTP packets. Do you say "half our resources should go to SSH because we don't want to prioritise HTTP, so we'll only forward one HTTP packet for every 32 SSH packets"? If you answered yes, you just lost your job at Cisco. The router's job is to deliver packets to the best of its ability, not to decide what kind of packets the end hosts "should" be sending.
>
> ----- Anonymous at o9_0DTuZniSf_+oDmRsonByWxsI ----- 2007.05.03 - 15:56:54GMT -----
>
> 1) accepting more SSK requests will make the SSK success ratio worse, not better. If that is done in addition to CHK requests (i.e. node bandwidth stays well within the desired limits), it's not a big deal; but if the resultless SSK traffic starts limiting CHK (i.e. real data) transfer, that's a problem.
>
> The fact is that an SSK key is known well before the SSK content, and in practice gets re-requested repeatedly. My profiling of the network indicates that even with a datastore 10 times larger than the default, less than 0.05% (and who knows how much less) of SSK requests (not SSK keys!) are currently getting satisfied by an underloaded node; it is even worse for an overloaded node (do you have different numbers?).
>
> With such a miserable SSK request success rate I see a lot of sense in penalizing them; once the SSK request flood situation improves (and there are certainly SSK-centric bugs hiding, or improvements pending, that would either improve the SSK request flow or avoid using SSKs where they are not the optimal construct) I will quite possibly and readily change my opinion that underprioritising SSK requests is acceptable and good. But the SSK flood problem does not look like low hanging fruit at all; expect it to take months if not years for major improvements.
>
> But please do not argue with me that treating SSK and CHK requests equally based on their count, not their length, is good: I totally agree; my point is that for the currently observed network behaviour it is nowhere near the only good choice, and in the short-to-middle term not necessarily the best one.
>
> 2) "SSK not found" is useful information indeed; but the exact same event repeated 8100 times brings extremely little additional entropy (a basic fact of information theory), and in practice mostly means wasted bandwidth. Then again, if the node is not overloaded at the moment, I have nothing against repeating the DNF using otherwise idle bandwidth - even 0.001 bit of entropy per extra SSK request is at least a mediocre (and maybe quite a good) argument against transferring exactly 0 bits of entropy (i.e. keeping the network link idle).
>
> ----- mrogers at UU62+3E1vKT1k+7fR0Gx7ZN2IB0 ----- 2007.05.04 - 11:03:09GMT -----
>
> Can you explain why you think accepting more SSK requests will make the SSK success ratio worse? A rejected request will fail to reach some parts of the network (at the minimum it won't reach the node that rejected it), so it's more likely to fail.
>
> > But please do not argue with me that treating SSK and CHK requests equally based on their count, not their length, is good: I totally agree
>
> But I don't agree. :-) I don't think we should try to treat them equally in any way. We should just accept as many requests as possible, of whatever kind, without trying to enforce any kind of "fairness" or "equality" between CHKs and SSKs.
>
> > "SSK not found" is useful information indeed; but the exact same event repeated 8100 times brings extremely little additional entropy
>
> Very true - if the same client is really requesting the same key 8100 times then the client should be fixed (eg modified to use exponentially increasing delays when re-requesting a failed key). But we shouldn't try to fix the problem by rejecting (anonymous!) requests further down the line. How do we know it's one client making 8100 requests, not 810 clients making 10 requests each?
>
> ----- Anonymous at o9_0DTuZniSf_+oDmRsonByWxsI ----- 2007.05.10 - 20:36:10GMT -----
>
> > Can you explain why you think accepting more SSK requests will make the SSK success ratio worse?
>
> Generally, a set of requests for the same key succeeds only once; over a moderately long time period (the cache lifetime, probably a few days to a couple of weeks) at most (N-1) times, where N is the number of peers, provided that it was not the current node that inserted the key. ALL the previous requests (made before the content is available) fail, and if you make more unsuccessful requests you decrease the success ratio.
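A sketch of the client-side fix mrogers suggests above - exponentially increasing delays between re-requests of a key that keeps coming back DataNotFound - using an invented class rather than Frost's or the node's real API; the 15-second starting interval echoes the polling rate reported at the top of the thread.

===
class PollBackoff {
    private static final long INITIAL_DELAY_MS = 15_000;    // first retry after 15 seconds
    private static final long MAX_DELAY_MS = 60L * 60_000;  // cap the interval at one hour

    private long delayMillis = INITIAL_DELAY_MS;

    /** Call after each DataNotFound; returns how long to wait before polling the key again. */
    long nextDelayAfterFailure() {
        long current = delayMillis;
        delayMillis = Math.min(delayMillis * 2, MAX_DELAY_MS);
        return current;
    }

    /** Call when the key is finally found, so polling for newer editions starts fast again. */
    void reset() {
        delayMillis = INITIAL_DELAY_MS;
    }
}
===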
>
> I see your rationale for a request made in search of content that has been on the network for some time already, and thus will either be found quickly, or after a few attempts the original requestor will stop searching for it (and hopefully will insert it later, after FEC recovery, if applicable). But such behaviour is typical for CHK keys, while observation of SSK request behaviour shows a very different pattern: keys missing from the network get requested repeatedly, often without ever succeeding in the end.
>
> > But I don't agree. :-)
>
> Ugh :-)
>
> > I don't think we should try to treat them equally in any way. We should just accept as many requests as possible, of whatever kind, without trying to enforce any kind of "fairness" or "equality" between CHKs and SSKs.
>
> Doesn't that really mean equal treatment, or in other words: processing without prejudice?
>
> > How do we know it's one client making 8100 requests, not 810 clients making 10 requests each?
>
> That makes no real difference to the request processing node. If it had the requested data, it would respond (at most N-1 times) and would no longer see requests for that particular key for days, due to peer caching (provided the node does not have malicious peers). If the requests for the same key get repeated again and again, that basically wastes traffic - and so if a node works at its bandwidth limit, the limited traffic had better be used for work that is statistically expected to be more useful.
>
> Fixing apps is always a good thing. But there are 2 problems: first, in an anonymous network it is supposed to be extremely hard to find out the source node or software making the requests, let alone do actual bug tracking; second, the repeated requests could be malicious - and the presence of a spammer on the network is not a sufficient reason for many other nodes to silently donate a share of bandwidth to the spammer.
>
> With CHK keys, which reserve 15 times more bandwidth, the spam rate drops quickly with each hop; but the small SSK keys tend to travel 15 times further (over overloaded links; no difference if all nodes are underloaded), making an SSK flood attack, while not fatal or noticeable without targeted network profiling, rather more damaging to network performance as a whole.
>
> ----- mrogers at UU62+3E1vKT1k+7fR0Gx7ZN2IB0 ----- 2007.05.11 - 14:30:49GMT -----
>
> > Generally, a set of requests for the same key succeeds only once
>
> OK, I see your point: if there are lots of redundant requests from the same client we can improve the success ratio by rejecting most of them. But we shouldn't do it, for two reasons:
>
> First, as you pointed out, in an anonymous network we don't know how many requests came from each client, so we don't know whether the requests are redundant.
>
> Second, and more importantly, rejecting a request has side-effects: it causes the client to slow down, and it causes the request to go elsewhere. Rejections are interpreted as a sign of overload, just like packet losses in TCP - rejecting a request because the data's not in the network is like dropping a TCP packet instead of returning a 404 error. OK, it saves a bit of bandwidth, but it confuses the client.
>
> It *might* make sense to return DataNotFound if the same key was recently searched for and not found, but as Toad has pointed out that might make it possible to blacklist certain keys (if you get a request for a key you want to censor, return DataNotFound and all the upstream nodes will cache the result).
>
> > keys missing from the network get requested repeatedly, often without ever succeeding in the end.
>
> Nevertheless, "DataNotFound" is useful information. It's worth spending bandwidth to find out that a key is not in the network.
>
> > Doesn't that really mean equal treatment, or in other words: processing without prejudice?
>
> Yes, processing without prejudice is what I meant - I thought you meant "equal quotas" rather than "equal opportunities". Oh god, I hope Lomion's not reading this thread. :-)
>
> > If the requests for the same key get repeated again and again, that basically wastes traffic
>
> Maybe so, but rejecting requests is not the solution because it has side-effects. Can you think of a way to cache the results of recent searches without making it possible to blacklist certain keys?
>
> > the presence of a spammer on the network is not a sufficient reason for many other nodes to silently donate a share of bandwidth to the spammer.
>
> But your solution (preemptively rejecting SSK requests) would be even worse - if one person started spamming, everyone's SSK requests would start getting rejected. Everyone except the spammer would obey the request throttle, and pretty soon the spammer would be the only one sending SSK requests.
>
> ----- Anonymous at o9_0DTuZniSf_+oDmRsonByWxsI ----- 2007.05.14 - 15:43:40GMT -----
>
> > in an anonymous network we don't know how many requests came from each client, so we don't know whether the requests are redundant.
>
> Yes, but that does not change anything for an overloaded node. On the other hand, as toad pointed out, a redundant request can be caught at the request sending end, rather than the request receiving end - which saves more bandwidth. So only the spammer's direct peers will be at a disadvantage, which is acceptable (moreover, it is fair that those supporting spammers get hit most).
>
> > Second, and more importantly, rejecting a request has side-effects: it causes the client to slow down, and it causes the request to go elsewhere. Rejections are interpreted as a sign of overload
>
> Exactly. If the forwarding node is not overloaded, it should not reject requests. And if a client that caused the overload on other node(s) gets slowed down a bit, that just makes things more 'fair'.
>
> > Maybe so, but rejecting requests is not the solution because it has side-effects. Can you think of a way to cache the results of recent searches without making it possible to blacklist certain keys?
>
> Rejecting requests during overload is the only possibility (aside from node harakiri, maybe). As far as an algorithm goes, The Bitch Algorithm is hereby proudly presented: :)
>
> - A node seeing a request for a key which recently failed responds with DNF, with a small but important postscript attached: "and BTW do not ever call me again" (technically: "installed a listener for the key"). The request sending node(s) memorize that.
>
> - If such a listener already existed, then the request sending node is misbehaving, and should be penalised. Seriously.
>
> - The request sending node might try to search for the key via other peers, but not that aggressively.
>
> - The request sender's key listener queue is of limited length. If the request sender needs to eject a queue element to conserve memory (or maybe the element just timed out), it sends a notice: "was not even going to" ("listener lease expired"). This allows (but does not force) the request receiving node to clear that element from its queue.
>
> - The request receiver's key listener queue is of limited length too. So if the request receiving node decides to eject a queued element, it sends a notice: "OK, maybe if you call me again we could chat a little bit". This gets propagated back the same way as the "and BTW do not ever call me again" reply, allowing the corresponding queue element to be cleared on the other nodes that searched for the key, and/or the request to be retried (likely using a different peer).
>
> - The most interesting thing: if a node sees a key insert and/or a key transfer as the result of a request, it sends duplicate inserts to all the peers who are waiting for the key (as they need the key) with the virtual comment "here is your stupid toy, I just bought a similar one for myself", and also creates new inserts directed to all the peers the key was requested from, with the virtual comment "please do not be sad and keep that toy, I just got an identical one: just have a look" - as other nodes will most likely be searching for the key in the same direction.
>
> Of course such a listener mechanism will form a tree graph, laid over the network links, with the root located at the node whose location is (at least locally) closest to the location of the requested key.
>
> Thinking of the stats I saw, I would say that just 100-200 SSK keys per listener queue could reduce SSK traffic by as much as 30-50% maybe (thanks to the good location distribution), while making those listened-for keys, if they ever appear, show up on the original requestor node ASAP. Even if I am overly optimistic and the decrease is just 10%, that is still a lot considering the current SSK flood.
>
> Another note is that an application sending requests via FCP should see the bitch response as distinguishable from a plain DNF, so that non-malicious software can cooperate with nodes to keep useless traffic down by issuing requests that are unlikely to succeed less frequently.
>
> > But your solution (preemptively rejecting SSK requests) would be even worse - if one person started spamming, everyone's SSK requests would start getting rejected. Everyone except the spammer would obey the request throttle, and pretty soon the spammer would be the only one sending SSK requests.
>
> - aggressive throttling during overload ensures that malicious, excessively high-volume traffic generally does not travel too far from the spammer's own node.
> - I have not yet proposed throttling traffic at a node that has idle resources (but The Bitch Algorithm is perfectly applicable to non-overloaded nodes, too).
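A rough sketch of the listener idea described above - an editorial illustration with invented names, not anything that exists in the node: on a repeated failed request the node answers DNF and remembers which peer asked; if the key later arrives it hands back the set of waiting peers so copies can be pushed to them; the table is bounded and entries expire, which corresponds to the "listener lease expired" notice.

===
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class KeyListenerTable {
    private static final int MAX_ENTRIES = 200;            // "just 100-200 SSK keys per listener queue"
    private static final long EXPIRY_MS = 10 * 60 * 1000L; // assumed lease time: 10 minutes

    private static final class Entry {
        final long created = System.currentTimeMillis();
        final Set<String> waitingPeers = new HashSet<>();
    }

    // insertion order, so the head of the map is always the oldest lease
    private final LinkedHashMap<String, Entry> listeners = new LinkedHashMap<>();

    /** A request for this key just failed with DNF: remember who asked ("do not call me again"). */
    synchronized void addListener(String key, String peer) {
        listeners.computeIfAbsent(key, k -> new Entry()).waitingPeers.add(peer);
        evict();
    }

    /** The key finally showed up (insert or transfer): return the peers that should get a copy. */
    synchronized Set<String> keyArrived(String key) {
        Entry e = listeners.remove(key);
        return e == null ? new HashSet<String>() : e.waitingPeers;
    }

    /** Drop expired leases and, if the table is still too big, the oldest ones ("listener lease expired"). */
    private void evict() {
        long now = System.currentTimeMillis();
        Iterator<Map.Entry<String, Entry>> it = listeners.entrySet().iterator();
        while (it.hasNext()) {
            Entry oldest = it.next().getValue();
            if (listeners.size() > MAX_ENTRIES || now - oldest.created > EXPIRY_MS)
                it.remove();
            else
                break; // the head is fresh and the table is within bounds
        }
    }
}
===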
----- toad at zceUWxlSaHLmvEMnbr4RHnVfehA ----- 2007.05.16 - 00:01:10GMT -----

Sounds like a lightweight (= unreliable, but cheap) version of passive requests. Could be interesting. I think penalizing nodes that rerequest is a bit harsh though; just because it doesn't exist now doesn't mean it will never exist, and most likely it won't be inserted through this node.