On Wednesday 08 August 2007 16:06, Matthew Toseland wrote:
> On Wednesday 08 August 2007 16:05, Matthew Toseland wrote:
> > Continued at end to minimise confusion!
> >
> > On Wednesday 08 August 2007 15:59, Matthew Toseland wrote:
> > > Unfortunately this thread is rather rambling; it includes lots of
> > > discussion on token passing as well as the original premise.
> > >
> > > ----- Anonymous at o9_0DTuZniSf_+oDmRsonByWxsI ----- 2007.04.25 -
> > > 20:18:36GMT -----
> > >
> > > I made some measurements of how a freenet node behaves if the bandwidth
> > > limit is set low: 10KBps down to 6KBps (specifically the input
> > > bandwidth limit; the output bandwidth limit was set to at least 15KBps,
> > > but as expected the actual output bandwidth used was comparable to,
> > > just slightly above, the actual input bandwidth used). The node itself
> > > was running frost but no uploads/downloads, so the vast majority of
> > > network traffic was forwarded CHK/SSK
> > > requests/inserts.
> > >
> > > The results are interesting: CHK traffic drops as low as 5% (in
> > > packets) of CHK+SSK, while at least 92% of SSK requests were not
> > > satisfied due to assorted failures (plus quite a few more certainly
> > > resulted in a NotFound response because the key is missing from the
> > > whole network, but I don't have that number). This makes a low-traffic
> > > node highly inefficient and disproportionately slow; it also slows down
> > > its peers with all the extra reject traffic. Worse, input bandwidth
> > > sometimes goes over the set limit, suggesting that on a hardware
> > > 33600/56000 bps modem, and even on ISDN, things will just get worse due
> > > to increased delays.
> > >
> > > Another thing to note: a low bandwidth node (LBN) almost exclusively
> > > rejects requests with the "input bandwidth liability" reason, and
> > > extremely rarely for other reasons.
> > >
> > > Speculating a bit, the same picture will likely be observed for the
> > > peers of a fast node (1Mbps or more) when many of those peers have a
> > > typical home connection of 256Kbps or less.
> > >
> > > Not sure if simulations ever showed anything like this, but
> > > contributing mostly SSK service to the network (and the absolute
> > > majority of SSK requests fail!) is rather useless: an optimally working
> > > network is supposed to transfer at least one CHK block for each SSK
> > > key, and typically much, much more (a single 10MB file consists of 481
> > > CHK blocks!), and even if you found the SSK but not the CHK it points
> > > to, you have still failed to find the information you requested.
> > >
> > > OK, to make the long story short[er]: at the end of this message you
> > > will find a small patch that noticeably improves the LBN situation. The
> > > idea is to reserve some bandwidth for CHK transfers (and SSK inserts,
> > > as those are too rare to penalize, and more valuable than requests).
> > > The line directly before the inserted one implicitly penalizes CHK
> > > transfers (the much smaller SSK requests tend to re-reserve bandwidth
> > > the moment it gets released after a CHK transfer finishes, while the
> > > much larger CHK requests do not get such a good chance), so bandwidth
> > > should be reserved for at least 2 CHKs (and tests show that's enough to
> > > make a difference).
> > >
> > > Another thing I tried was increasing the 90 second period to 120. That
> > > had some (no numbers here; just "noticeable but small") positive
> > > effect, making traffic smoother and staying closer to the set limit
> > > without jumping up and down too much. Where did the 90 second number
> > > come from anyway, and how dangerous could 120 be?
> > >
> > > Some pros observed and/or thought out during tests of the patch:
> > > - I observe an increase in output payload of approx. 15% (of total
> > > traffic), making the LBN more useful to its peers.
> > > - the change is negligibly small for faster nodes, so it should not
> > > break anything globally.
> > > - network-wide SSK flood traffic will be toned down a little bit (at
> > > temporarily overloaded nodes only), additionally simplifying life for
> > > LBNs: after all, requesting the same SSK every 15 seconds for 35 hours,
> > > 8100 times in total (actual numbers from one of the tests before this
> > > patch was applied; there are over 60 other SSKs that were requested
> > > more than 1000 times during the same period), is just way too much;
> > > SSKs are not inserted into the network THAT fast. [Would it be worth
> > > remembering recently seen SSK requests, and not forwarding one if the
> > > same request was already forwarded within the last 10 minutes and
> > > resulted in DNF/RNF? A table of the recently requested SSKs closest to
> > > the node's location should not be too big - a sketch follows this
> > > list.]
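> > >
> > > [A minimal sketch of such a suppression table - names, types and sizes
> > > are invented for illustration, this is not actual freenet code;
> > > assuming a 10 minute window and a bounded LRU of routing keys:]
> > >
> > > ===
> > > import java.util.LinkedHashMap;
> > > import java.util.Map;
> > >
> > > // Hypothetical bounded table of recently failed SSK requests.
> > > public class RecentSskFailures {
> > >     private static final long SUPPRESS_MS = 10 * 60 * 1000L; // 10 minutes
> > >     private static final int MAX_ENTRIES = 1000;
> > >
> > >     // LRU map from routing key to the time the DNF/RNF was seen.
> > >     private final Map<String, Long> failures =
> > >         new LinkedHashMap<String, Long>(16, 0.75f, true) {
> > >             protected boolean removeEldestEntry(Map.Entry<String, Long> e) {
> > >                 return size() > MAX_ENTRIES;
> > >             }
> > >         };
> > >
> > >     public synchronized void recordFailure(String routingKey) {
> > >         failures.put(routingKey, System.currentTimeMillis());
> > >     }
> > >
> > >     // True if the same request already failed within the last 10
> > >     // minutes, so forwarding it again is likely wasted bandwidth.
> > >     public synchronized boolean shouldSuppress(String routingKey) {
> > >         Long t = failures.get(routingKey);
> > >         return t != null && System.currentTimeMillis() - t < SUPPRESS_MS;
> > >     }
> > > }
> > > ===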
> > >
> > > And contras:
> > > - in exceptional conditions (specifically, with fewer than 2 incoming
> > > CHK requests per 90 seconds, regardless of node bandwidth; in fact I
> > > observe 2-7 CHK requests per second, i.e. 180-630 per 90 seconds), up
> > > to 800 Bps may end up unused. For a high bandwidth node that's far too
> > > small to notice; for an LBN it is still acceptable (10% of 56Kbps) and
> > > will decrease round-trip delays a bit, which is always a good thing on
> > > such slow links.
> > >
> > > Other notes:
> > > - the distribution of location closeness vs. number of SSK requests is
> > > very nice: only SSK requests with a location very close to the node's
> > > location get repeated frequently; the farther away the SSK location is,
> > > the fewer requests the node sees, and the SSKs seen only once or twice
> > > per 1-2 day period are distributed evenly across the location space.
> > > This suggests that routing is working fine.
> > > - As far as I understand, if the input bandwidth limit/liability is
> > > exceeded (but a packet has already been received anyway), the CHK/SSK
> > > request gets instantly rejected (thus throwing away received bytes
> > > exactly when input bandwidth has no spare volume!); only otherwise does
> > > the node check whether the requested key exists in the store. Heh? This
> > > feels like a serious bug hurting overall network performance: better to
> > > query the store and hopefully send back the result (or still reject if
> > > the key is not found locally) than to wait for a retry request to waste
> > > more input bandwidth. At least for SSKs, a reject and a reply are
> > > comparable in output bandwidth usage, so it is worth a little delay in
> > > the response. Or am I missing something? (A sketch of the suggested
> > > reordering follows; the bandwidth patch mentioned earlier comes after
> > > it.)
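> > >
> > > [To illustrate the reordering being suggested - all names here are
> > > invented, this is not the actual freenet.node code path:]
> > >
> > > ===
> > > // Hypothetical admission check: consult the local store before the
> > > // bandwidth liability check, since the request bytes have already
> > > // been received and a local hit is cheap to answer.
> > > public class StoreFirstAdmission {
> > >     interface Store { boolean has(String key); }
> > >
> > >     private final Store store;
> > >     public StoreFirstAdmission(Store store) { this.store = store; }
> > >
> > >     // Stand-in for the real liability computation.
> > >     private boolean inputBandwidthLiabilityExceeded() { return false; }
> > >
> > >     // Returns a reject reason, or null to accept the request.
> > >     public String shouldReject(String key) {
> > >         if (store.has(key)) return null; // answer from the store
> > >         if (inputBandwidthLiabilityExceeded())
> > >             return "Input bandwidth liability"; // reject only on a miss
> > >         return null;
> > >     }
> > > }
> > > ===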
> > >
> > > ===
> > > diff --git a/freenet/src/freenet/node/NodeStats.java b/freenet/src/freenet/node/NodeStats.java
> > > index 3b091b4..fb9f8b9 100644
> > > --- a/freenet/src/freenet/node/NodeStats.java
> > > +++ b/freenet/src/freenet/node/NodeStats.java
> > > @@ -414,6 +414,7 @@ public class NodeStats implements Persistable {
> > >  			successfulChkInsertBytesReceivedAverage.currentValue() * node.getNumCHKInserts() +
> > >  			successfulSskInsertBytesReceivedAverage.currentValue() * node.getNumSSKInserts();
> > >  		bandwidthLiabilityInput += getSuccessfulBytes(isSSK, isInsert, true).currentValue();
> > > +		if (isSSK && !isInsert) bandwidthLiabilityInput += successfulChkFetchBytesReceivedAverage.currentValue() + successfulChkInsertBytesReceivedAverage.currentValue(); // slightly penalize SSK requests by reserving bandwidth for 2 additional CHK transfers (or SSK inserts if any)
> > >  		double bandwidthAvailableInput =
> > >  			node.getInputBandwidthLimit() * 90; // 90 seconds at full power
> > >  		if(bandwidthLiabilityInput > bandwidthAvailableInput) {
> > > ===
> > >
> > > ----- toad at zceUWxlSaHLmvEMnbr4RHnVfehA ----- 2007.04.26 - 16:56:59GMT
> > > -----
> > >
> > > Most SSK requests fail. They DNF. The reason for this is that most SSK
> > > requests are polling for data that has not yet been inserted.
> > >
> > > Bandwidth liability is usually the main reason for rejection. If we
> > > reach most of the other reasons, there is a problem - usually a
> > > cyclical problem. The main purpose of the liability limit is to ensure
> > > that we don't accept so many requests that some of them needlessly time
> > > out even though they succeeded. The timeout is 120 seconds, so we need
> > > the actual transfer to take less than this; on a request, 30 seconds
> > > seems a reasonable upper bound for the search time. We don't throw out
> > > many bytes when we reject a request/insert, because the bulk of it
> > > hasn't been sent yet - except with SSKs, where typically a little under
> > > half of the total bytes will have been moved. Ideally we wouldn't send
> > > requests until we have a good idea that they will be accepted, but
> > > token passing load balancing is a long way off, not likely to happen
> > > for 0.7.0.
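> > >
> > > [A back-of-envelope illustration of those numbers, assuming roughly
> > > 32KB received per successful CHK and about 1KB per SSK - illustrative
> > > figures, not exact freenet byte counts: with a 10KB/s input limit, 90
> > > seconds at full power is 900KB of projected successful-transfer bytes,
> > > i.e. room for about 28 CHK requests in flight but nearly 900 SSKs -
> > > which is where the 32x acceptance ratio discussed later in this thread
> > > comes from.]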
> > >
> > > We cannot control input bandwidth usage precisely.
> > >
> > > Any more info on SSK flooding? Is it simply Frost?
> > >
> > > We can add a failure table - we had one before - however a failure
> > > table which results in actually blocking keys can be extremely
> > > dangerous; what I had envisaged was "per node failure tables", i.e.
> > > reroute requests which have recently failed to a different node, since
> > > we know the data isn't where it's supposed to be.
> > >
> > > On what do you base the assertion about key closeness? It would be nice
> > > to have a histogram or circle on the stats pages showing recent keys on
> > > the keyspace - can you write a patch?
> > >
> > > As far as your patch goes, surely rejecting more SSK requests would be
> > > counterproductive as it wastes bandwidth? Shouldn't a slow node accept
> > > those requests it's likely to be able to handle?
> > >
> > > I can see an argument that we shouldn't prefer SSKs, and that on slow
> > > nodes we do prefer SSKs... I'm not sure the above is the right way to
> > > deal with it though. The effect of the patch would be to never accept
> > > any SSKs unless we have plenty of spare bandwidth, correct?
> > >
> > > ----- Anonymous at o9_0DTuZniSf_+oDmRsonByWxsI ----- 2007.04.26 -
> > > 18:41:32GMT -----
> > >
> > > > Ideally we wouldn't send requests until we have a good idea that
> > > > they will be accepted, but token passing load balancing is a long
> > > > way off, not likely to happen for 0.7.0.
> > >
> > > Well, even the current algorithm implementation has certain room for
> > > improvement. Here are the typical numbers I observe:
> > >
> > > ===
> > > unclaimedFIFO Message Counts
> > >     * FNPRejectOverload: 89 (45.2%)
> > >     * FNPInsertTransfersCompleted: 59 (29.9%)
> > >     * FNPDataNotFound: 15 (7.6%)
> > >     * packetTransmit: 12 (6.1%)
> > >     * FNPRejectLoop: 7 (3.6%)
> > >     * FNPAccepted: 6 (3.0%)
> > >     * FNPSwapRejected: 4 (2.0%)
> > >     * FNPDataInsertRejected: 4 (2.0%)
> > >     * FNPRouteNotFound: 1 (0.5%)
> > >     * Unclaimed Messages Considered: 197
> > > ===
> > >
> > > FNPRejectOverload always stays at the top, sometimes with hundreds of
> > > messages (for the last hour before unclaimed messages expire, that's a
> > > lot), which indicates that there is some bug (or bugs) in obeying the
> > > bandwidth limit.
> > >
> > > > Any more info on SSK flooding? Is it simply Frost?
> > >
> > > Not local frost for sure; it generates just several simultaneous SSK
> > > requests (by default max 8: 6 for boards plus 2 for filesharing, AFAIR;
> > > in practice 2-4 simultaneous requests most of the time). The other 100
> > > SSK requests (without the proposed patch) are forwarded ones.
> > >
> > > > We can add a failure table - we had one before - however a failure
> > > > table which results in actually blocking keys can be extremely
> > > > dangerous;
> > >
> > > Is it, with a timeout of at most a few minutes (i.e. at least a few
> > > times less than the SSK propagation time visible with frost messages)?
> > > Is it more dangerous than the current wastage of bandwidth on requests
> > > for the same SSK key several times per minute? Were any simulations
> > > done on that in the past?
> > >
> > > BTW, isn't the observed very low store hit rate a result of
> > > prioritising the likely-to-fail SSKs?
> > >
> > > BTW2: the failure table could also act as a targeted content
> > > propagation mechanism: if a node sees an SSK insert for a temporarily
> > > blacklisted (non-existent) SSK, then forwarding the insert (more likely
> > > a copy of the insert, for security reasons and for routing's sake) to
> > > the original requestor should speed up propagation of new SSKs toward
> > > the nodes that already anticipate/await them.
> > >
> > > > what I had envisaged was "per node failure tables", i.e. reroute
> > > > requests which have recently failed to a different node, since we
> > > > know the data isn't where it's supposed to be.
> > >
> > > At a glance, a very nice idea. But LBNs typically answer with a reject,
> > > not a DNF... even with the current code. Probably such rerouting would
> > > further increase SSK traffic toward LBNs, with a sharply increased
> > > volume of SSK rejects as the result. Hmm, some testing/simulation seems
> > > really needed here.
> > >
> > > > On what do you base the assertion about key closeness? It would be
> > > > nice to have a histogram or circle on the stats pages showing recent
> > > > keys on the keyspace - can you write a patch?
> > >
> > > Mmmm... in fact I just added custom logging, then a wild combination of
> > > grep/sed/sort/uniq to analyze the logs. But let me think; maybe
> > > visualizing a couple of the stats files I operate with will be rather
> > > trivial...
> > >
> > > But I would rather stay away from stats page graphics at this time, as
> > > the stats files I operate with (filtered+sorted+uniqued) are rather
> > > large, 20-50Mb each - too much memory for the toy. Unless your 'recent'
> > > means just 10-15 minutes at most?
> > >
> > > > As far as your patch goes, surely rejecting more SSK requests would
> > > > be counterproductive as it wastes bandwidth?
> > >
> > > Tests show the opposite: without the patch, payload output on the stats
> > > page never exceeded 38%; with the patch it reaches 53% or a little more
> > > within several minutes of a node restart. So, with the patch, the
> > > SSK/CHK forwarding behaviour 'feels' logical:
> > >
> > > without patch:
> > > - just several CHKs and over 100 SSKs is very typical.
> > >
> > > with patch:
> > > - most of the time (say, 75%) the number of currently forwarded CHK
> > > requests+inserts approximately equals the number of SSK
> > > requests+inserts (i.e. 10-25 each, depending on the set bandwidth
> > > limit);
> > > - sometimes (say, 10%) CHK requests start to prevail, but current SSK
> > > requests+inserts seem never to drop below the level CHKs reach at best
> > > without the patch (i.e. 6-10). This is very typical when the number of
> > > CHK inserts gets several times higher than CHK requests (a close fast
> > > peer inserting something really large?);
> > > - at other times (say, 15%) the CHK requests+inserts flow does not
> > > saturate bandwidth, and the number of SSK requests quickly climbs to 50
> > > or even over 100, as it typically does without the patch.
> > >
> > > That's for an LBN. Raising the input bandwidth allotment, the number of
> > > SSKs quickly grows, resembling the situation without the patch.
> > >
> > > So that's why I suggest reserving bandwidth for 2 CHK transfers; 3
> > > would kill SSKs, while 1 still lets SSKs seriously prevail over CHKs
> > > (but nonetheless gives a rather better ratio, so it is a legal value to
> > > try if the value of 2 alarms you too much). In fact, when reserving
> > > bandwidth for just 1 extra CHK the proposed patch is not really needed:
> > > simply comment out the line starting with "bandwidthLiabilityInput +="
> > > and decrease the 90 second constant to 80 (10 seconds is roughly how
> > > long a 33.6Kbaud modem takes to transmit a single CHK - and using
> > > anything noticeably slower than 28800/33600 baud for freenet will never
> > > work well anyway).
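> > >
> > > [Checking that figure: a 32KiB CHK block is 262144 bits; at 33600
> > > bit/s that is about 7.8 seconds raw, and with PPP/FNP overhead and
> > > retransmits roughly 10 seconds - so the estimate above looks about
> > > right. The 32KiB data size is standard for CHKs; the overhead
> > > allowance is a rough assumption.]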
> > >
> > > > Shouldn't a slow node accept those requests it's likely to be able
> > > > to handle?
> > >
> > > Considering the very high chance of SSK request failure (at least 92%),
> > > I would say the answer is no. But with a sane SSK failure rate (say 75%
> > > or below) SSK requests would likely not waste the limited, thus
> > > precious, LBN bandwidth so fruitlessly.
> > >
> > > The problem, I believe, is the too-small size of UDP packets when SSK
> > > requests prevail: the PPP(oE)/UDP/FNP overhead becomes too large, while
> > > LBNs, unlike nodes on faster links, almost never get to coalesce
> > > packets, obviously.
> > >
> > > ----- toad at zceUWxlSaHLmvEMnbr4RHnVfehA ----- 2007.04.27 - 17:19:24GMT
> > > -----
> > >
> > > The current algorithm is working, on most nodes, far better than it has
> > > in *ages*. I'm at 62% of a 700MB ISO, I started inserting it yesterday
> > > morning, and only a few of my peers are backed off - frequently none
> > > are backed off, right now it's 11 connected, 6 backed off, which is
> > > more backed off than I've seen for quite a while.
> > >
> > > Re failure tables: Yes, it is extremely dangerous. It can result in
> > > self-reinforcing key censorship, either as an attack or just occurring
> > > naturally. This happened on 0.5. And the hit ratio is only for CHKs,
> > > iirc.
> > >
> > > Even LBNs don't often send local RejectedOverloads on SSKs *once they
> > > have accepted them*. They may relay downstream ROs, but that is not
> > > fatal. And if they reject some requests, so what? It's a slow node;
> > > it's bound to reject some requests with the current load balancing
> > > system.
> > >
> > > 10-15 minutes would be interesting. We already show a circle and
> > > histogram of nearby nodes from swapping, and of our peers; you'd just
> > > have to add another one. It would be good to have visual proof that
> > > routing is working on the level of adhering to node specialisations. I
> > > didn't expect it to be working given the load; I'm surprised that it
> > > does - it's an interesting result.
> > >
> > > Packet size has nothing to do with it, ethernet has a 1472 byte
> > > maximum. Dial-up has 576 bytes max, but we ignore it, and use
> > > fragmented packets (this sucks, obviously, as it greatly increases the
> > > chance of losing a packet and having to retransmit it).
> > >
> > > Please explain why the patch doesn't result in never accepting a single
> > > SSK?
> > >
> > > ----- Anonymous at o9_0DTuZniSf_+oDmRsonByWxsI ----- 2007.04.27 -
> > > 19:31:14GMT -----
> > >
> > > > Packet size has nothing to do with it, ethernet has a 1472 byte
> > > > maximum. Dial-up has 576 bytes max, but we ignore it, and use
> > > > fragmented packets (this sucks, obviously, as it greatly increases
> > > > the chance of losing a packet and having to retransmit it).
> > >
> > > I am talking about typical/average packet size, not MTU. LBNs, unlike
> > > faster nodes, rarely get a chance to coalesce reject responses (within
> > > the max 100ms), and thus send disproportionately more tiny packets,
> > > resulting in much higher protocol overhead. Thus having LBNs mostly
> > > cater to SSKs, not CHKs, gives the lowest imaginable usefulness of LBNs
> > > for the network as a whole.
> > >
> > > BTW, in my experience the typical/default dialup/PPP MTU is 1500 minus
> > > link level headers, as with ethernet. 576 is a reasonable adjustment
> > > for interactive traffic like ssh, but I don't remember it being the
> > > default since the time the super fast 28800 baud modems became common.
> > > :) 1400+ is the typical size of GPRS PPP packets too, and the same
> > > holds true for other popular wireless media like Bluetooth or WiFi; so
> > > I have no concerns regarding IP fragmentation.
> > >
> > > > Please explain why the patch doesn't result in never accepting a
> > > > single SSK?
> > >
> > > I can not. :) Can you explain why the current code, which penalizes
> > > CHKs, still gives 5% to them, even though CHKs are 25 times larger and
> > > correspondingly less frequent, so they have a really hard time arriving
> > > at the exact moment when bandwidth liability is not maxed out?
> > >
> > > Seriously, I believe it comes down to 2 facts:
> > >
> > > - SSK requests are much more frequent, so any temporary drop in the CHK
> > > request level enables the node to quickly get a bunch of new SSKs
> > > accepted for processing;
> > > - the large CHK requests (at times when they prevail over SSKs) tend to
> > > hit other limits too, like the "output bandwidth liability" and
> > > "insufficient input/output bandwidth" throttles. Then the small SSK
> > > requests quickly pick up all the remaining bits of bandwidth.
> > >
> > > But currently I do not have the relevant statistics to prove that.
> > >
> > > Anyway, please commit the following patch - it should equal out the
> > > bandwidth rights of CHKs and SSKs at least half way toward a
> > > fair/expected distribution (and the change will make a difference for
> > > high-/over-loaded nodes only). Once most of my peers (and their peers)
> > > update, I will again study the node's traffic forwarding efficiency and
> > > behavior at different bandwidth limits and with different penalization
> > > levels - and will then be in a better position to prove that the
> > > original proposal of reserving bandwidth for 2 CHKs is optimal (or
> > > maybe to withdraw it).
> > >
> > > ===
> > > diff --git a/freenet/src/freenet/node/NodeStats.java b/freenet/src/freenet/node/NodeStats.java
> > > index 3b091b4..98c82c3 100644
> > > --- a/freenet/src/freenet/node/NodeStats.java
> > > +++ b/freenet/src/freenet/node/NodeStats.java
> > > @@ -399,9 +399,8 @@ public class NodeStats implements Persistable {
> > >  			successfulSskFetchBytesSentAverage.currentValue() * node.getNumSSKRequests() +
> > >  			successfulChkInsertBytesSentAverage.currentValue() * node.getNumCHKInserts() +
> > >  			successfulSskInsertBytesSentAverage.currentValue() * node.getNumSSKInserts();
> > > -		bandwidthLiabilityOutput += getSuccessfulBytes(isSSK, isInsert, false).currentValue();
> > >  		double bandwidthAvailableOutput =
> > > -			node.getOutputBandwidthLimit() * 90; // 90 seconds at full power; we have to leave some time for the search as well
> > > +			node.getOutputBandwidthLimit() * 80; // 80 seconds at full power; we have to leave some time for the search as well
> > >  		bandwidthAvailableOutput *= NodeStats.FRACTION_OF_BANDWIDTH_USED_BY_REQUESTS;
> > >  		if(bandwidthLiabilityOutput > bandwidthAvailableOutput) {
> > >  			preemptiveRejectReasons.inc("Output bandwidth liability");
> > > @@ -413,9 +412,8 @@ public class NodeStats implements Persistable {
> > >  			successfulSskFetchBytesReceivedAverage.currentValue() * node.getNumSSKRequests() +
> > >  			successfulChkInsertBytesReceivedAverage.currentValue() * node.getNumCHKInserts() +
> > >  			successfulSskInsertBytesReceivedAverage.currentValue() * node.getNumSSKInserts();
> > > -		bandwidthLiabilityInput += getSuccessfulBytes(isSSK, isInsert, true).currentValue();
> > >  		double bandwidthAvailableInput =
> > > -			node.getInputBandwidthLimit() * 90; // 90 seconds at full power
> > > +			node.getInputBandwidthLimit() * 80; // 80 seconds at full power
> > >  		if(bandwidthLiabilityInput > bandwidthAvailableInput) {
> > >  			preemptiveRejectReasons.inc("Input bandwidth liability");
> > >  			return "Input bandwidth liability";
> > > ===
> > >
> > > ----- toad at zceUWxlSaHLmvEMnbr4RHnVfehA ----- 2007.04.28 - 17:05:53GMT
> > > -----
> > >
> > > Why does assuming 80 seconds instead of 90 help? I would have expected
> > > it to make the situation worse.
> > >
> > > Isn't what you want to increment the value that the CHK byte counters
> > > are multiplied by when the request is an SSK? In any case I'm not
> > > convinced - we accept 32x as many SSKs as CHKs precisely because they
> > > use 32x less bandwidth. As far as I can see, incrementing the CHK
> > > counts but only on a CHK would just result in us never accepting an
> > > SSK...
> > >
> > > But by all means continue to investigate.
> > >
> > > ----- mrogers at UU62+3E1vKT1k+7fR0Gx7ZN2IB0 ----- 2007.04.30 -
> > > 19:36:36GMT -----
> > >
> > > > we accept 32x as many SSKs as CHKs precisely because they use 32x
> > > > less bandwidth.
> > >
> > > Sorry, but I don't understand the rationale behind this. It seems to be
> > > based on the assumption that equal resources should be allocated to
> > > SSKs and CHKs, regardless of whether there's equal demand for
> > > resources. If we're only getting, say, 16 times as many SSK requests as
> > > CHK requests, would we reject CHK requests to keep things "fair"?
> > >
> > > ----- toad at zceUWxlSaHLmvEMnbr4RHnVfehA ----- 2007.05.02 - 16:13:52GMT
> > > -----
> > >
> > > Why should CHKs be prioritised over SSKs?
> > >
> > > What do you think of the patch I committed anyway?
> >
> > ----- Anonymous at o9_0DTuZniSf_+oDmRsonByWxsI ----- 2007.05.02 -
> > 17:03:52GMT -----
> >
> > > Why should CHKs be prioritised over SSKs?
> >
> > Because the SSK success ratio is extremely low (remember the example I
> > gave in the initial message of the thread: 8100 rejected requests for the
> > same SSK key within 35 hours, plus an uncounted number of requests for
> > the same key that resulted in DNF - and that key is nowhere near alone);
> > linear programming would certainly suggest that SSK requests, as they
> > currently behave, should be totally disabled. I do not propose anything
> > that drastic; but as SSKs currently use bandwidth very inefficiently, I
> > propose toning them down heuristically to approximately the CHK level
> > while a node is short on bandwidth (but letting SSKs go as much as the
> > sending node needs if spare bandwidth is available).
> >
> > [not that I meant to speak instead of mrogers]
> >
> > ----- mrogers at UU62+3E1vKT1k+7fR0Gx7ZN2IB0 ----- 2007.05.03 - 13:54:38GMT
> > -----
> >
> > >> Why should CHKs be prioritised over SSKs?
> > >
> > > Because SSK success ratio is extremely low
> >
> > That doesn't make sense for two reasons:
> >
> > 1) rejecting SSKs will make the SSK success ratio worse, not better
> > 2) "SSK not found" is useful information - for example, that's how you
> > discover the current version of a USK - but "SSK rejected for not being a
> > CHK" is not useful to anyone
> >
> > Let me use an analogy: you're designing a Cisco router. Some of the
> > packets it gets asked to forward will be small SSH packets, others will
> > be large HTTP packets. Do you say "half our resources should go to SSH
> > because we don't want to prioritise HTTP, so we'll only forward one HTTP
> > packet for every 32 SSH packets"? If you answered yes, you just lost your
> > job at Cisco. The router's job is to deliver packets to the best of its
> > ability, not to decide what kind of packets the end hosts "should" be
> > sending.
>
> ----- Anonymous at o9_0DTuZniSf_+oDmRsonByWxsI ----- 2007.05.03 -
> 15:56:54GMT -----
>
> 1) accepting more SSK requests will make the SSK success ratio worse, not
> better. If that is done in addition to CHK requests (i.e. node bandwidth
> stays well within the desired limits), it is not a big deal; but if the
> resultless SSK traffic starts limiting CHK (i.e. real data) transfer,
> that's a problem.
>
> The fact is that an SSK key is known well before the SSK content, and in
> practice it gets rerequested repeatedly. My profiling of the network
> indicates that even with a datastore 10 times larger than the default,
> less than 0.05% (and who knows how much less) of SSK requests (not SSK
> keys!) currently get satisfied by an underloaded node; it is even worse
> for an overloaded node (do you have different numbers?). With such a
> miserable SSK request success rate I see a lot of sense in penalizing
> them; once the SSK request flood situation improves (and there certainly
> are SSK-centric bugs hiding, or improvements pending, that would either
> improve the SSK request flow or avoid using SSKs where they are not the
> optimal construct) I will quite possibly and readily change my opinion
> that underprioritising SSK requests is acceptable and good. But the SSK
> flood problem does not look like a low hanging fruit at all; expect it to
> take months if not years for major improvements.
>
> But please do not argue with me that treating SSK and CHK requests
> equally based on their count, not length, is good: I totally agree; my
> point is that for the currently observed network behaviour it is nowhere
> near being the only good choice, and in the short-to-middle term not
> necessarily the best one.
>
> 2) "SSK not found" is a useful information indeed; but the exactly same
> event repeated 8100 times brings extremely little additional entropy (basic
> fact of information theory), and mostly means wasted bandwidth on practice.
> Then again, if the node is not overloaded at the moment, I have nothing
> against repeating the DNF using otherwise idle bandwidth - even 0.001 bit
> of entropy per extra SSK request is at least a mediocre (and maybe a quite
> good) argument against transferring exactly 0 bits of entropy (i.e. keeping
> network link idle).
>
> ----- mrogers at UU62+3E1vKT1k+7fR0Gx7ZN2IB0 ----- 2007.05.04 - 11:03:09GMT
> -----
>
> Can you explain why you think accepting more SSK requests will make the SSK
> success ratio worse? A rejected request will fail to reach some parts of
> the network (at the minimum it won't reach the node that rejected it), so
> it's more likely to fail.
>
> > But please do not argue with me that treating SSK and CHK requests
> > equally based on their count, not length, is good: I totally agree
>
> But I don't agree. :-) I don't think we should try to treat them equally in
> any way. We should just accept as many requests as possible, of whatever
> kind, without trying to enforce any kind of "fairness" or "equality"
> between CHKs and SSKs.
>
> > "SSK not found" is a useful information indeed; but the exactly same
> > event
>
> repeated 8100 times brings extremely little additional entropy
>
> Very true - if the same client is really requesting the same key 8100
> times then the client should be fixed (e.g. modified to use exponentially
> increasing delays when re-requesting a failed key - a sketch follows). But
> we shouldn't try to fix the problem by rejecting (anonymous!) requests
> further down the line. How do we know it's one client making 8100
> requests, not 810 clients making 10 requests each?
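>
> [A minimal sketch of such client-side backoff - an illustration only, not
> Frost's actual code. With the 15 second base observed earlier in the
> thread, doubling per failure and a 6 hour cap would, very roughly, turn
> 8100 requests in 35 hours into a couple of dozen:]
>
> ===
> // Hypothetical exponential backoff for re-requesting a failed key.
> public class RerequestBackoff {
>     private static final long BASE_MS = 15000L;             // first retry delay
>     private static final long MAX_MS = 6L * 60 * 60 * 1000; // cap: 6 hours
>     private int failures = 0;
>
>     // Call after each DNF; returns how long to wait before retrying.
>     public long nextDelayMs() {
>         long d = BASE_MS << Math.min(failures, 20); // double per failure
>         failures++;
>         return Math.min(d, MAX_MS);
>     }
>
>     // Call when the key is finally found.
>     public void reset() { failures = 0; }
> }
> ===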
>
> ----- Anonymous at o9_0DTuZniSf_+oDmRsonByWxsI ----- 2007.05.10 -
> 20:36:10GMT -----
>
> > Can you explain why you think accepting more SSK requests will make the
> > SSK success ratio worse?
>
> Generally, a set of requests for the same key succeeds only once; over a
> moderately long time period (cache lifetime, probably a few days to a
> couple of weeks) at most (N-1) times, where N is the number of peers,
> provided that it was not the current node that inserted the key. ALL the
> previous requests (made before the content is available) fail, and if you
> make more unsuccessful requests you decrease the success ratio.
>
> I see your rationale for a request made in search of content that has
> been on the network for some time already, and thus will either be found
> quickly, or the original requestor will cease searching for it after a
> few attempts (and hopefully will insert it later, after FEC recovery, if
> applicable). But such behaviour is typical of CHK keys, while observation
> of SSK request behaviour shows a very different pattern: keys missing
> from the network get requested repeatedly, often without ever succeeding.
>
> > But I don't agree. :-)
>
> Ugh :-)
>
> > I don't think we should try to treat them equally in any way. We should
> > just accept as many requests as possible, of whatever kind, without
> > trying to enforce any kind of "fairness" or "equality" between CHKs and
> > SSKs.
>
> Doesn't that really mean equal treatment, or in other words: processing
> without prejudice?
>
> > How do we know it's one client making 8100 requests, not 810 clients
> > making 10 requests each?
>
> That makes no real difference to the request processing node. If it had
> the requested data, it would respond (at most N-1 times) and would no
> longer see requests for that particular key for days, due to peer caching
> (provided the node has no malicious peers). If requests for the same key
> get repeated again and again, that basically wastes traffic - and so if a
> node works at its bandwidth limit, the limited traffic had better be used
> for work that is statistically expected to be more useful.
>
> Fixing apps is always a good thing. But there are 2 problems: first, in
> an anonymous network it is supposed to be extremely hard to find out the
> source node or the software making the requests, let alone do actual bug
> tracking; second, the repeated requests could be malicious - and the
> presence of a spammer on the network is not a sufficient reason for many
> other nodes to silently donate a share of their bandwidth to the spammer.
>
> With CHK keys, which reserve 15 times more bandwidth, the spam rate drops
> quickly with each hop; but the small SSK keys tend to travel 15 times
> further (over overloaded links; there is no difference if all nodes are
> underloaded), making an SSK flood attack, while not fatal or noticeable
> without targeted network profiling, rather more damaging for network
> performance as a whole.
>
> ----- mrogers at UU62+3E1vKT1k+7fR0Gx7ZN2IB0 ----- 2007.05.11 - 14:30:49GMT
> -----
>
> > Generally, a set of requests for the same key succeeds only once
>
> OK, I see your point: if there are lots of redundant requests from the
> same client, we can improve the success ratio by rejecting most of them.
> But we shouldn't do it, for two reasons:
>
> First, as you pointed out, in an anonymous network we don't know how many
> requests came from each client, so we don't know whether the requests are
> redundant.
>
> Second, and more importantly, rejecting a request has side-effects: it
> causes the client to slow down, and it causes the request to go elsewhere.
> Rejections are interpreted as a sign of overload, just like packet losses
> in TCP - rejecting a request because the data's not in the network is like
> dropping a TCP packet instead of returning a 404 error. OK, it saves a bit
> of bandwidth, but it confuses the client.
>
> It *might* make sense to return DataNotFound if the same key was recently
> searched for and not found, but as Toad has pointed out that might make it
> possible to blacklist certain keys (if you get a request for a key you want
> to censor, return DataNotFound and all the upstream nodes will cache the
> result).
>
> > keys missing from the network get requested repeatedly, often without
> > ever succeeding.
>
> Nevertheless, "DataNotFound" is useful information. It's worth spending
> bandwidth to find out that a key is not in the network.
>
> > Doesn't that really mean equal treatment, or in other words: processing
> > without prejudice?
>
> Yes, processing without prejudice is what I meant - I thought you meant
> "equal quotas" rather than "equal opportunities". Oh god, I hope Lomion's
> not reading this thread. :-)
>
> > If requests for the same key get repeated again and again, that
> > basically wastes traffic
>
> Maybe so, but rejecting requests is not the solution because it has
> side-effects. Can you think of a way to cache the results of recent
> searches without making it possible to blacklist certain keys?
>
> > the presence of a spammer on the network is not a sufficient reason for
> > many other nodes to silently donate a share of their bandwidth to the
> > spammer.
>
> But your solution (preemptively rejecting SSK requests) would be even worse
> - if one person started spamming, everyone's SSK requests would start
> getting rejected. Everyone except the spammer would obey the request
> throttle, and pretty soon the spammer would be the only one sending SSK
> requests.
>
> ----- Anonymous at o9_0DTuZniSf_+oDmRsonByWxsI ----- 2007.05.14 -
> 15:43:40GMT -----
>
> > in an anonymous network we don't know how many requests came from each
> > client, so we don't know whether the requests are redundant.
>
> Yes, but that does not change anything for an overloaded node. On the
> other hand, as toad pointed out, a redundant request can be caught at the
> request-sending end rather than the request-receiving end - which saves
> more bandwidth. So only the spammer's direct peers will be at a
> disadvantage, which is acceptable (moreover, it is fair that those
> supporting spammers get hit the most).
>
> > Second, and more importantly, rejecting a request has side-effects: it
> > causes the client to slow down, and it causes the request to go
> > elsewhere. Rejections are interpreted as a sign of overload
>
> Exactly. If a forwarding node is not overloaded, it should not reject
> requests. If a client that overloaded other node(s) gets slowed down a
> bit, that just makes things more 'fair'.
>
> > Maybe so, but rejecting requests is not the solution because it has
> > side-effects. Can you think of a way to cache the results of recent
> > searches without making it possible to blacklist certain keys?
>
> Rejecting requests during overload is the only possibility (aside from
> node harakiri, maybe). As far as an algorithm goes, here The Bitch
> Algorithm is proudly presented: :)
>
> - A node seeing a request for a key which recently failed responds with
> DNF, with a little but important postscriptum attached: "and BTW do not
> ever call me again" (technically: "installed a listener for the key").
> The request-sending node(s) memorize that.
>
> - If such a listener already existed, then the request-sending node is
> misbehaving, and should be penalised. Seriously.
>
> - The request-sending node might try to search for the key with other
> peers, but not that aggressively.
>
> - The request sender's key listener queue is of limited length. If the
> request sender needs to eject a queue element to conserve memory (or
> maybe just has the element time out), it sends a notice "was not even
> going to" ("listener lease expired"). This allows (but does not force)
> the request-receiving node to clear that element from its queue.
>
> - The request receiver's key listener queue is of limited length too. So
> if the request-receiving node decides to eject a queued element, it sends
> a notice: "OK, maybe if you call me again we could chat a little bit".
> This gets propagated back the same way as the "and BTW do not ever call
> me again" reply, allowing the matching queue element to be cleared on
> other nodes that searched for the key, and/or the request to be retried
> (likely via a different peer).
>
> - The most interesting thing: if a node sees a key insert and/or a key
> transfer resulting from a request, it sends duplicate inserts to all the
> peers who are waiting for the key (as they need the key) with the virtual
> comment "here is your stupid toy, I just bought a similar one for
> myself", and also creates new inserts directed to all the peers the key
> was requested from, with the virtual comment "please do not be sad and
> keep that toy, I just got an identical one: just have a look" - as other
> nodes will most likely be searching for the key in the same direction.
>
> Of course such a listener mechanism will form a tree graph, laid over the
> network links, with the root located at the node that (at least locally)
> has the closest location to the key's location. (A rough sketch of the
> listener table follows.)
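>
> [A very rough sketch of the per-key listener table described above - all
> names are invented, nothing here is actual freenet code; FIFO eviction
> stands in for whatever lease policy would really be used:]
>
> ===
> import java.util.HashMap;
> import java.util.HashSet;
> import java.util.LinkedList;
> import java.util.Map;
> import java.util.Set;
>
> // Hypothetical per-key listener table for the scheme sketched above.
> public class KeyListenerTable {
>     private static final int MAX_KEYS = 200; // 100-200 keys as suggested
>
>     // key -> peers that asked and were told "do not call me again"
>     private final Map<String, Set<String>> listeners =
>         new HashMap<String, Set<String>>();
>     private final LinkedList<String> order = new LinkedList<String>();
>
>     // Returns true if this peer already holds a listener for the key,
>     // i.e. the repeated request that the algorithm would penalise.
>     public synchronized boolean addListener(String key, String peer) {
>         Set<String> peers = listeners.get(key);
>         if (peers == null) {
>             if (order.size() >= MAX_KEYS) {
>                 String evicted = order.removeFirst();
>                 listeners.remove(evicted);
>                 // here: send "listener lease expired" for the evicted key
>             }
>             peers = new HashSet<String>();
>             listeners.put(key, peers);
>             order.addLast(key);
>         }
>         return !peers.add(peer);
>     }
>
>     // On seeing an insert/transfer of the key: returns the peers that
>     // should get a duplicate insert, and drops the entry.
>     public synchronized Set<String> onKeyArrived(String key) {
>         order.remove(key);
>         Set<String> peers = listeners.remove(key);
>         return peers != null ? peers : new HashSet<String>();
>     }
> }
> ===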
>
> Thinking of the stats I saw, I would say that just 100-200 SSK keys per
> listener queue could reduce SSK traffic by as much as 30-50% maybe
> (thanks to the good location distribution), while making those
> listened-for keys, if they ever arrive, appear on the original requestor
> node ASAP. Even if I am overly optimistic and the decrease is just 10%,
> that is still a lot considering the current SSK flood.
>
> Another note: an application sending requests via FCP should be able to
> distinguish the bitch response from a plain DNF, so that non-malicious
> software can cooperate with nodes in keeping useless traffic down by
> issuing requests that are unlikely to succeed less frequently.
>
> > But your solution (preemptively rejecting SSK requests) would be even
> > worse - if one person started spamming, everyone's SSK requests would
> > start getting rejected. Everyone except the spammer would obey the
> > request throttle, and pretty soon the spammer would be the only one
> > sending SSK requests.
>
> - aggressive throttling during overload ensures that malicious,
> excessively high-volume traffic does not generally travel far from the
> spammer's own node.
> - I have not yet proposed throttling traffic on a node that has idle
> resources (but The Bitch Algorithm is well applicable to non-overloaded
> nodes, too).

----- toad at zceUWxlSaHLmvEMnbr4RHnVfehA ----- 2007.05.16 - 00:01:10GMT -----

Sounds like a lightweight (= unreliable, but cheap) version of passive
requests. Could be interesting. I think penalizing nodes that rerequest is
a bit harsh though; just because a key doesn't exist now doesn't mean it
will never exist, and most likely it won't be inserted through this node.
