On Wednesday 08 August 2007 16:13, Matthew Toseland wrote:
> On Wednesday 08 August 2007 16:06, Matthew Toseland wrote:
> > On Wednesday 08 August 2007 16:05, Matthew Toseland wrote:
> > > Continued at end to minimise confusion!
> > >
> > > On Wednesday 08 August 2007 15:59, Matthew Toseland wrote:
> > > > Unfortunately this thread is rather rambling; it includes lots of
> > > > discussion on token passing as well as the original premise.
> > > >
> > > > ----- Anonymous at o9_0DTuZniSf_+oDmRsonByWxsI ----- 2007.04.25 - 20:18:36GMT -----
> > > >
> > > > I made some measurements of how a freenet node behaves if the
> > > > bandwidth limit is set low: 10KBps and down to 6KBps (specifically,
> > > > the input bandwidth limit; the output bandwidth limit was set to at
> > > > least 15KBps, but as expected the actual output bandwidth used was
> > > > comparable to (just slightly above) the actual input bandwidth used).
> > > > The node itself was running frost but no uploads/downloads, so the
> > > > overwhelming majority of network traffic was forwarded CHK/SSK
> > > > requests/inserts.
> > > >
> > > > The results are interesting enough: CHK traffic drops as low as 5%
> > > > (in packets) of CHK+SSK, while at least 92% of SSK requests were not
> > > > satisfied due to assorted failures (plus presumably quite a few more
> > > > resulted in a NotFound response because the key is missing from the
> > > > whole network, but I don't have that number). This makes a
> > > > low-traffic node highly inefficient and disproportionately slow; it
> > > > also slows down its peers with all the extra reject traffic. Worse,
> > > > input bandwidth sometimes goes over the set limit, suggesting that on
> > > > a hardware 33600/56000 bps modem, and even on ISDN, things will just
> > > > get worse due to increased delays.
> > > >
> > > > Another thing to note: a low bandwidth node (LBN) almost exclusively
> > > > rejects requests with the "input bandwidth liability" reason, and
> > > > only extremely rarely for other reasons.
> > > >
> > > > Speculating a bit, the same picture will likely be observed for the
> > > > peers of a fast node (1Mbps or more) when many of those peers have a
> > > > typical home connection of 256Kbps or less.
> > > >
> > > > Not sure if simulations ever showed anything like this, but
> > > > contributing mostly SSK service to the network (when the overwhelming
> > > > majority of SSK requests fail!) is rather useless: an optimally
> > > > working network is supposed to transfer at least one CHK block for
> > > > each SSK key, and typically many more (a single 10MB file consists of
> > > > 481 CHK blocks!), and even if you found the SSK but not the CHK it
> > > > points to, you still failed to find the information you requested.
> > > >
> > > > OK, to make the long story short[er]: at the end of this message you
> > > > will find a small patch that noticeably improves the LBN situation.
> > > > The idea is to reserve some bandwidth for CHK transfers (and SSK
> > > > inserts, as those are too rare to penalize, and more valuable than
> > > > requests). The line directly before the inserted one implicitly
> > > > penalizes CHK transfers (the much smaller SSK requests tend to
> > > > re-reserve bandwidth the moment it is released after a CHK transfer
> > > > finishes, while the much larger CHK requests get no such chance), so
> > > > bandwidth should be reserved for at least 2 CHKs (and tests show
> > > > that's enough to make a difference).
> > > >
> > > > Another thing I tried was increasing the 90 seconds period up to 120.
> > > > That had some (no numbers here; just "noticeable but small") positive
> > > > effect on making traffic smoother and staying closer to the set
> > > > limit, without jumping up and down too much. Where did the 90 seconds
> > > > number come from anyway, and how dangerous could 120 be?
> > > >
> > > > Some pros observed and/or thought out during tests of the patch:
> > > > - I observe an increase in output payload of approx. 15% (of total
> > > > traffic), making the LBN more useful to its peers.
> > > > - the change is negligibly small for faster nodes, so it should not
> > > > break anything globally.
> > > > - SSK flood traffic across the entire network will be toned down a
> > > > little bit (at temporarily overloaded nodes only), additionally
> > > > simplifying life for LBNs: after all, requesting the same SSK every
> > > > 15 seconds for 35 hours, 8100 times in total (actual numbers from one
> > > > of the tests before this patch was applied; there are over 60 other
> > > > SSKs that were requested more than 1000 times during the same period)
> > > > is just way too much; SSKs are not inserted into the network THAT
> > > > fast. [Would it be worth remembering recently seen SSK requests, and
> > > > not forwarding one if the same request was already forwarded within
> > > > the last 10 minutes and resulted in DNF/RNF? A table of recently
> > > > requested SSKs closest to the node location should not be too big -
> > > > see the sketch after this list.]
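> > > >
> > > > [A rough sketch of the table I have in mind; all names are invented
> > > > and nothing like this exists in NodeStats.java - treat it as an
> > > > assumption-laden illustration, not a patch:]
> > > >
> > > > ===
> > > > // Sketch: remember SSKs that recently failed with DNF/RNF, and do
> > > > // not forward a repeated request for them within EXPIRY_MS.
> > > > import java.util.LinkedHashMap;
> > > > import java.util.Map;
> > > >
> > > > public class RecentSskFailureTable {
> > > >     private static final long EXPIRY_MS = 10 * 60 * 1000; // 10 min
> > > >     private static final int MAX_ENTRIES = 1000; // bound memory use
> > > >
> > > >     // routing key -> wall-clock time of the recent failure
> > > >     private final LinkedHashMap<String, Long> failures =
> > > >         new LinkedHashMap<String, Long>(16, 0.75f, false) {
> > > >             protected boolean removeEldestEntry(Map.Entry<String, Long> e) {
> > > >                 return size() > MAX_ENTRIES; // evict oldest entry
> > > >             }
> > > >         };
> > > >
> > > >     public synchronized void reportFailure(String key) {
> > > >         failures.put(key, System.currentTimeMillis());
> > > >     }
> > > >
> > > >     public synchronized boolean shouldSuppress(String key) {
> > > >         Long t = failures.get(key);
> > > >         if (t == null) return false;
> > > >         if (System.currentTimeMillis() - t > EXPIRY_MS) {
> > > >             failures.remove(key); // stale: forward normally again
> > > >             return false;
> > > >         }
> > > >         return true; // failed recently: answer DNF locally
> > > >     }
> > > > }
> > > > ===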
> > > >
> > > > And contras:
> > > > - in exceptional conditions (specifically, with fewer than 2 incoming
> > > > CHK requests per 90 seconds; in fact I observe 2-7 CHK requests per
> > > > second, that's 180-630 per 90 seconds), regardless of the node's
> > > > bandwidth, up to 800 Bps might end up unused. For a high bandwidth
> > > > node that's way too small to notice; for an LBN that's still
> > > > acceptable (10% of 56Kbps) and will decrease roundtrip delays a bit,
> > > > which is always a good thing for such slow links.
> > > >
> > > > Other notes:
> > > > - the distribution of location closeness/number of SSK requests is
> > > > very nice: only SSK requests with a location very close to the node's
> > > > location get repeated frequently; the farther away the SSK location
> > > > is, the fewer requests the node sees, and the SSKs seen only once or
> > > > twice per 1-2 day period are distributed evenly across the location
> > > > space. This suggests that routing is working fine.
> > > > - As far as I understand, if the input bandwidth limit/liability is
> > > > exceeded (but a packet was already received anyway), the CHK/SSK
> > > > request gets instantly rejected (thus throwing away the received
> > > > bytes exactly when input bandwidth has no spare volume!); only
> > > > otherwise does the node check whether the requested key exists in the
> > > > store. Heh? This feels like a serious bug hurting overall network
> > > > performance: better to query the store and hopefully send back the
> > > > result (or still reject if the key is not found locally) than to wait
> > > > for a retry request to waste more input bandwidth. At least for SSKs,
> > > > a reject and a reply are comparable in output bandwidth usage, so the
> > > > little extra delay in the response is worth it. Or am I missing
> > > > something?
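> > > >
> > > > [To illustrate the reordering I mean, a hypothetical sketch; the
> > > > store/liability names here are stand-ins, and the real accept/reject
> > > > path in freenet is of course more involved:]
> > > >
> > > > ===
> > > > // Sketch of the proposed ordering: consult the local store before
> > > > // the bandwidth liability check, so a request for a key we already
> > > > // hold is answered rather than rejected. All names are invented.
> > > > public class AcceptBeforeRejectSketch {
> > > >     interface Store { boolean has(String key); }
> > > >
> > > >     private final Store store;
> > > >     private double liabilityInput; // bytes already committed
> > > >     private double availableInput; // limit * 90s, as in NodeStats
> > > >
> > > >     AcceptBeforeRejectSketch(Store store) { this.store = store; }
> > > >
> > > >     /** Returns null to accept, or a reject reason string. */
> > > >     public String shouldReject(String key, boolean isInsert) {
> > > >         if (!isInsert && store.has(key))
> > > >             return null; // serve locally; received bytes not wasted
> > > >         if (liabilityInput > availableInput)
> > > >             return "Input bandwidth liability"; // as before
> > > >         return null; // accept and forward
> > > >     }
> > > > }
> > > > ===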
> > > >
> > > > ===
> > > > diff --git a/freenet/src/freenet/node/NodeStats.java b/freenet/src/freenet/node/NodeStats.java
> > > > index 3b091b4..fb9f8b9 100644
> > > > --- a/freenet/src/freenet/node/NodeStats.java
> > > > +++ b/freenet/src/freenet/node/NodeStats.java
> > > > @@ -414,6 +414,7 @@ public class NodeStats implements Persistable {
> > > >  		successfulChkInsertBytesReceivedAverage.currentValue() * node.getNumCHKInserts() +
> > > >  		successfulSskInsertBytesReceivedAverage.currentValue() * node.getNumSSKInserts();
> > > >  		bandwidthLiabilityInput += getSuccessfulBytes(isSSK, isInsert, true).currentValue();
> > > > +		if (isSSK && !isInsert) bandwidthLiabilityInput += successfulChkFetchBytesReceivedAverage.currentValue() + successfulChkInsertBytesReceivedAverage.currentValue(); // slightly penalize SSK requests by reserving bandwidth for 2 additional CHK transfers (or SSK inserts if any)
> > > >  		double bandwidthAvailableInput =
> > > >  			node.getInputBandwidthLimit() * 90; // 90 seconds at full power
> > > >  		if(bandwidthLiabilityInput > bandwidthAvailableInput) {
> > > > ===
> > > >
> > > > ----- toad at zceUWxlSaHLmvEMnbr4RHnVfehA ----- 2007.04.26 - 16:56:59GMT -----
> > > >
> > > > Most SSK requests fail. They DNF. The reason for this is that most
> > > > SSK requests are polling for data that has not yet been inserted.
> > > >
> > > > Bandwidth liability is usually the main reason for rejection. If we
> > > > hit most of the other reasons, there is a problem - usually a
> > > > cyclical problem. The main purpose of the liability limit is to
> > > > ensure that we don't accept so many requests that some of them
> > > > needlessly time out even though they succeeded. The timeout is 120
> > > > seconds, so we need the actual transfer to take less than this; on a
> > > > request, 30 seconds seems a reasonable upper bound for the search
> > > > time, hence the 90 seconds of transfer time assumed in the liability
> > > > calculation. We don't throw out many bytes when we reject a
> > > > request/insert, because the bulk of it hasn't been sent yet, except
> > > > with SSKs, where typically a little under half of the total bytes
> > > > will have been moved. Ideally we wouldn't send requests until we have
> > > > a good idea that they will be accepted, but token passing load
> > > > balancing is a long way off, not likely to happen for 0.7.0.
> > > >
> > > > We cannot control input bandwidth usage precisely.
> > > >
> > > > Any more info on SSK flooding? Is it simply Frost?
> > > >
> > > > We can add a failure table (we had one before), but a failure table
> > > > which results in actually blocking keys can be extremely dangerous;
> > > > what I had envisaged was "per node failure tables", i.e. rerouting
> > > > requests which have recently failed to a different node, since we
> > > > know the key isn't where it's supposed to be.
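> > > >
> > > > [Roughly what such a per-node failure table might look like - a
> > > > sketch with invented names, not anything in the tree:]
> > > >
> > > > ===
> > > > // Sketch: remember which peer recently failed for which key, and
> > > > // route around that peer next time. Peers and keys are simplified
> > > > // to strings; none of this is real node code.
> > > > import java.util.HashMap;
> > > > import java.util.List;
> > > > import java.util.Map;
> > > >
> > > > public class PerNodeFailureTable {
> > > >     private static final long EXPIRY_MS = 10 * 60 * 1000;
> > > >
> > > >     // (key, peer) -> time of the recent failure via that peer
> > > >     private final Map<String, Long> failed = new HashMap<String, Long>();
> > > >
> > > >     public synchronized void onFailure(String key, String peer) {
> > > >         failed.put(key + "@" + peer, System.currentTimeMillis());
> > > >     }
> > > >
> > > >     private synchronized boolean failedRecently(String key, String peer) {
> > > >         Long t = failed.get(key + "@" + peer);
> > > >         return t != null && System.currentTimeMillis() - t < EXPIRY_MS;
> > > >     }
> > > >
> > > >     /** Pick the closest peer that hasn't recently failed this key. */
> > > >     public String route(String key, List<String> peersByCloseness) {
> > > >         for (String peer : peersByCloseness)
> > > >             if (!failedRecently(key, peer))
> > > >                 return peer;
> > > >         // every peer failed recently: fall back to the closest one
> > > >         return peersByCloseness.isEmpty() ? null : peersByCloseness.get(0);
> > > >     }
> > > > }
> > > > ===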
> > > >
> > > > On what do you base the assertion about key closeness? It would be
> > > > nice to have a histogram or circle on the stats pages showing recent
> > > > keys on the keyspace - can you write a patch?
> > > >
> > > > As far as your patch goes, surely rejecting more SSK requests would
> > > > be counterproductive as it wastes bandwidth? Shouldn't a slow node
> > > > accept those requests it's likely to be able to handle?
> > > >
> > > > I can see an argument that we shouldn't prefer SSKs, and that on slow
> > > > nodes we do prefer SSKs... I'm not sure the above is the right way to
> > > > deal with it though. The effect of the patch would be to never accept
> > > > any SSKs unless we have plenty of spare bandwidth, correct?
> > > >
> > > > ----- Anonymous at o9_0DTuZniSf_+oDmRsonByWxsI ----- 2007.04.26 - 18:41:32GMT -----
> > > >
> > > > > Ideally we wouldn't send requests until we have a good idea that
> > > > > they will be accepted, but token passing load balancing is a long
> > > > > way off, not likely to happen for 0.7.0.
> > > >
> > > > Well, even the current algorithm implementation has certain room for
> > > > improvement. Here are the typical numbers I observe:
> > > >
> > > > ===
> > > > unclaimedFIFO Message Counts
> > > >     * FNPRejectOverload: 89 (45.2%)
> > > >     * FNPInsertTransfersCompleted: 59 (29.9%)
> > > >     * FNPDataNotFound: 15 (7.6%)
> > > >     * packetTransmit: 12 (6.1%)
> > > >     * FNPRejectLoop: 7 (3.6%)
> > > >     * FNPAccepted: 6 (3.0%)
> > > >     * FNPSwapRejected: 4 (2.0%)
> > > >     * FNPDataInsertRejected: 4 (2.0%)
> > > >     * FNPRouteNotFound: 1 (0.5%)
> > > >     * Unclaimed Messages Considered: 197
> > > > ===
> > > >
> > > > FNPRejectOverload always stays at the top, sometimes with hundreds of
> > > > messages (for the last hour before unclaimed messages expire, that's
> > > > a lot), which indicates that there is some bug (or bugs) in how the
> > > > bandwidth limit is obeyed.
> > > >
> > > > > Any more info on SSK flooding? Is it simply Frost?
> > > >
> > > > Not local frost, for sure; it generates just several simultaneous SSK
> > > > requests (by default max 8: 6 for boards plus 2 for filesharing,
> > > > AFAIR; in practice 2-4 simultaneous requests most of the time). The
> > > > other 100 SSK requests (without the proposed patch) are forwarded
> > > > ones.
> > > >
> > > > > We can add a failure table (we had one before), but a failure table
> > > > > which results in actually blocking keys can be extremely dangerous;
> > > >
> > > > Is it, with a timeout of a few minutes at most (i.e. at least a few
> > > > times less than the SSK propagation time visible with frost
> > > > messages)? Is it more dangerous than the current wastage of bandwidth
> > > > on requests for the same SSK key several times per minute? Have any
> > > > simulations been done on that in the past?
> > > >
> > > > BTW, doesn't the observed very low store hit rate result from
> > > > prioritising the likely-to-fail SSKs?
> > > >
> > > > BTW2, the failure table could also act as a targeted content
> > > > propagation mechanism: if a node sees an SSK insert for a temporarily
> > > > blacklisted (non-existing) SSK, then forwarding the insert (more
> > > > likely a copy of the insert, for security reasons and for routing's
> > > > sake) to the original requestor should speed up propagation of new
> > > > SSKs toward the nodes that already anticipate/await them.
> > > >
> > > > > what I had envisaged was "per node failure tables", i.e. rerouting
> > > > > requests which have recently failed to a different node, since we
> > > > > know the key isn't where it's supposed to be.
> > > >
> > > > At a glance, a very nice idea. But LBNs typically answer with a
> > > > reject, not a DNF... even with the current code. Such rerouting will
> > > > probably increase SSK traffic toward LBNs even further, with a
> > > > sharply increased volume of SSK rejects as the result. Hmm, some
> > > > testing/simulation seems really needed here.
> > > >
> > > > > On what do you base the assertion about key closeness? It would be
> > > > > nice to have a histogram or circle on the stats pages showing
> > > > > recent keys on the keyspace - can you write a patch?
> > > >
> > > > Mmmm... in fact I just added custom logging, then a wild combination
> > > > of grep/sed/sort/uniq to analyze the logs. But let me think; maybe
> > > > visualizing a couple of the stats files I operate with would be
> > > > rather trivial...
> > > >
> > > > But I would rather stay away from stats page graphics at this time,
> > > > as the stats files I operate with (filtered+sorted+uniqued) are
> > > > rather large, 20-50Mb each - too much memory for the toy. Unless your
> > > > 'recent' means just 10-15 minutes at most?
> > > >
> > > > > As far as your patch goes, surely rejecting more SSK requests would
> > > > > be counterproductive as it wastes bandwidth?
> > > >
> > > > Tests show the opposite: without the patch the payload output on the
> > > > stats page never exceeded 38%; with the patch it reaches 53% or a
> > > > little more within several minutes of a node restart. So, with the
> > > > patch, the SSK/CHK forwarding behaviour 'feels' logical:
> > > >
> > > > without patch:
> > > > - just several CHKs, and over 100 SSKs, is very typical.
> > > >
> > > > with patch:
> > > > - most of the time (say, 75%) the number of currently forwarded CHK
> > > > requests+inserts approximately equals the number of SSK
> > > > requests+inserts (i.e. 10-25 each, depending on the set bandwidth
> > > > limit);
> > > > - sometimes (say, 10%) CHK requests start to prevail, but the current
> > > > SSK requests+inserts seem never to go below the level CHKs get at
> > > > most without the patch (i.e. 6-10). This is very typical when the
> > > > number of CHK inserts gets several times higher than CHK requests (a
> > > > close fast peer inserting something really large?);
> > > > - other times (say, 15%) the CHK requests+inserts flow does not
> > > > saturate bandwidth, and the number of SSK requests quickly climbs to
> > > > 50 or even 100+, as it typically does without the patch.
> > > >
> > > > That's for an LBN. Raising the input bandwidth allotment, the number
> > > > of SSKs quickly grows, resembling the situation without the patch.
> > > >
> > > > So that's why I suggest reserving bandwidth for 2 CHK transfers; 3
> > > > would kill SSKs, and 1 still lets SSKs seriously prevail over CHKs
> > > > (but nonetheless gives a rather better ratio, so it is a legitimate
> > > > value to try if the value of 2 alarms you too much). In fact, to
> > > > reserve bandwidth for 1 extra CHK the proposed patch is not really
> > > > needed: simply comment out the line starting with
> > > > "bandwidthLiabilityInput +=" and decrease the 90 seconds constant to
> > > > 80 (10 seconds is roughly how long a 33.6k baud modem takes to
> > > > transmit a single CHK: a 32KB block at ~4200 bytes/sec is about 8
> > > > seconds, plus overhead - and using anything noticeably slower than
> > > > 28800/33600 baud for freenet will never work well anyway).
> > > >
> > > > >Shouldn't a slow node accept those requests it's likely to be able
> > > > > to handle?
> > > >
> > > > Considering the very high chance of an SSK request failing (at least
> > > > 92%), I would say the answer is no. But with a sane SSK failure rate
> > > > (say 75% or below), SSK requests would likely not waste the limited
> > > > and thus precious LBN bandwidth so fruitlessly.
> > > >
> > > > The problem, I believe, is the too-small size of the UDP packets when
> > > > SSK requests prevail: the PPP(oE)/UDP/FNP overhead becomes too large,
> > > > while LBNs, unlike nodes on faster links, almost never get to
> > > > coalesce packets, obviously.
> > > >
> > > > ----- toad at zceUWxlSaHLmvEMnbr4RHnVfehA ----- 2007.04.27 - 17:19:24GMT -----
> > > >
> > > > The current algorithm is working, on most nodes, far better than it
> > > > has in *ages*. I'm at 62% of a 700MB ISO that I started inserting
> > > > yesterday morning, and only a few of my peers are backed off -
> > > > frequently none are. Right now it's 11 connected, 6 backed off, which
> > > > is more backed off than I've seen for quite a while.
> > > >
> > > > Re failure tables: yes, it is extremely dangerous. It can result in
> > > > self-reinforcing key censorship, either as an attack or just
> > > > occurring naturally. This happened on 0.5. And the hit ratio is only
> > > > for CHKs, iirc.
> > > >
> > > > Even LBNs don't often send local RejectedOverloads on SSKs *once they
> > > > have accepted them*. They may relay downstream ROs, but that is not
> > > > fatal. And if they reject some requests, so what? It's a slow node;
> > > > it's bound to reject some requests with the current load balancing
> > > > system.
> > > >
> > > > 10-15 minutes would be interesting. We already show a circle and
> > > > histogram of nearby node locations from swapping, and of our peers;
> > > > you'd just have to add another one. It would be good to have visual
> > > > proof that routing is working, on the level of adhering to node
> > > > specialisations. I didn't expect it to be working given the load: I'm
> > > > surprised that it does; it's an interesting result.
> > > >
> > > > Packet size has nothing to do with it; ethernet has a 1472 byte
> > > > maximum. Dial-up has a 576 byte max, but we ignore it and use
> > > > fragmented packets (this sucks, obviously, as it greatly increases
> > > > the chance of losing a packet and having to retransmit it).
> > > >
> > > > Please explain why the patch doesn't result in never accepting a
> > > > single SSK?
> > > >
> > > > ----- Anonymous at o9_0DTuZniSf_+oDmRsonByWxsI ----- 2007.04.27 - 19:31:14GMT -----
> > > >
> > > > > Packet size has nothing to do with it; ethernet has a 1472 byte
> > > > > maximum. Dial-up has a 576 byte max, but we ignore it and use
> > > > > fragmented packets (this sucks, obviously, as it greatly increases
> > > > > the chance of losing a packet and having to retransmit it).
> > > >
> > > > I am talking about the typical/average packet size, not the MTU.
> > > > LBNs, unlike faster nodes, rarely have a chance to coalesce reject
> > > > responses (over at most 100ms), and thus send disproportionately more
> > > > tiny packets, resulting in much higher protocol overhead. So having
> > > > LBNs mostly cater to SSKs, not CHKs, results in the lowest imaginable
> > > > usefulness of LBNs for the network as a whole.
> > > >
> > > > BTW, in my experience the typical/default dialup/PPP MTU is 1500
> > > > minus link level headers, like ethernet. 576 is a reasonable
> > > > adjustment for interactive traffic like ssh, but I don't recall it
> > > > being the default since the time the super fast 28800 baud modems
> > > > became common. :) 1400+ is the typical size of GPRS PPP packets too,
> > > > and the same holds true for other popular wireless media like
> > > > BlueTooth or WiFi; so I have no concerns regarding IP fragmentation.
> > > >
> > > > > Please explain why the patch doesn't result in never accepting a
> > > > > single SSK?
> > > >
> > > > I can not. :) Can you explain why the current code that penalizes
> > > > CHKs still gives 5% to them, even though CHKs are 25 times larger and
> > > > correspondingly less frequent, so they have a really hard time
> > > > arriving at the exact moment when bandwidth liability is not maxed
> > > > out?
> > > >
> > > > Seriously, I believe that comes down to 2 facts:
> > > >
> > > > - SSK requests are much more frequent, so any temporary drop in the
> > > > level of CHK requests enables the node to quickly get a bunch of new
> > > > SSKs accepted for processing;
> > > > - the large CHK requests (at times when they prevail over SSKs) tend
> > > > to hit other limits too, like the "output bandwidth liability" and
> > > > "Insufficient input/output bandwidth" throttles. Then the small SSK
> > > > requests quickly pick up all the remaining bits of bandwidth.
> > > >
> > > > But currently I do not have relevant statistics to prove that.
> > > >
> > > > Anyway, please commit the following patch - it should equal out
> > > > bandwidth rights for CHKs and SSKs at least half way toward a
> > > > fair/expected distribution (and the change will make a difference for
> > > > highly loaded/overloaded nodes only). Once most of my peers (and
> > > > their peers) update, I will study the node's traffic forwarding
> > > > efficiency and behaviour at different bandwidth limits and with
> > > > different penalization levels again - and then I will be in a better
> > > > position to prove that the original proposal of reserving bandwidth
> > > > for 2 CHKs is optimal (or maybe to withdraw it).
> > > >
> > > > ===
> > > > diff --git a/freenet/src/freenet/node/NodeStats.java b/freenet/src/freenet/node/NodeStats.java
> > > > index 3b091b4..98c82c3 100644
> > > > --- a/freenet/src/freenet/node/NodeStats.java
> > > > +++ b/freenet/src/freenet/node/NodeStats.java
> > > > @@ -399,9 +399,8 @@ public class NodeStats implements Persistable {
> > > >  		successfulSskFetchBytesSentAverage.currentValue() * node.getNumSSKRequests() +
> > > >  		successfulChkInsertBytesSentAverage.currentValue() * node.getNumCHKInserts() +
> > > >  		successfulSskInsertBytesSentAverage.currentValue() * node.getNumSSKInserts();
> > > > -		bandwidthLiabilityOutput += getSuccessfulBytes(isSSK, isInsert, false).currentValue();
> > > >  		double bandwidthAvailableOutput =
> > > > -			node.getOutputBandwidthLimit() * 90; // 90 seconds at full power; we have to leave some time for the search as well
> > > > +			node.getOutputBandwidthLimit() * 80; // 80 seconds at full power; we have to leave some time for the search as well
> > > >  		bandwidthAvailableOutput *= NodeStats.FRACTION_OF_BANDWIDTH_USED_BY_REQUESTS;
> > > >  		if(bandwidthLiabilityOutput > bandwidthAvailableOutput) {
> > > >  			preemptiveRejectReasons.inc("Output bandwidth liability");
> > > > @@ -413,9 +412,8 @@ public class NodeStats implements Persistable {
> > > >  		successfulSskFetchBytesReceivedAverage.currentValue() * node.getNumSSKRequests() +
> > > >  		successfulChkInsertBytesReceivedAverage.currentValue() * node.getNumCHKInserts() +
> > > >  		successfulSskInsertBytesReceivedAverage.currentValue() * node.getNumSSKInserts();
> > > > -		bandwidthLiabilityInput += getSuccessfulBytes(isSSK, isInsert, true).currentValue();
> > > >  		double bandwidthAvailableInput =
> > > > -			node.getInputBandwidthLimit() * 90; // 90 seconds at full power
> > > > +			node.getInputBandwidthLimit() * 80; // 80 seconds at full power
> > > >  		if(bandwidthLiabilityInput > bandwidthAvailableInput) {
> > > >  			preemptiveRejectReasons.inc("Input bandwidth liability");
> > > >  			return "Input bandwidth liability";
> > > > ===
> > > >
> > > > ----- toad at zceUWxlSaHLmvEMnbr4RHnVfehA ----- 2007.04.28 - 17:05:53GMT -----
> > > >
> > > > Why does assuming 80 seconds instead of 90 help? I would have
> > > > expected it to make the situation worse.
> > > >
> > > > Isn't what you want to increment the value you multiply the CHK byte
> > > > counters by, if the request is an SSK? In any case I'm not convinced
> > > > - we accept 32x as many SSKs as CHKs precisely because they use 32x
> > > > less bandwidth. As far as I can see, incrementing the CHK counts, but
> > > > only on a CHK, would just result in us never accepting an SSK...
> > > >
> > > > But by all means continue to investigate.
> > > >
> > > > ----- mrogers at UU62+3E1vKT1k+7fR0Gx7ZN2IB0 ----- 2007.04.30 - 19:36:36GMT -----
> > > >
> > > > > we accept 32x as many SSKs as CHKs precisely because they use 32x
> > > > > less bandwidth.
> > > >
> > > > Sorry but I don't understand the rationale behind this. It seems to
> > > > be based on the assumption that equal resources should be allocated
> > > > to SSKs and CHKs, regardless of whether there's equal demand for
> > > > resources. If we're only getting, say, 16 times as many SSK requests
> > > > as CHK requests, would we reject CHK requests to keep things "fair"?
> > > >
> > > > ----- toad at zceUWxlSaHLmvEMnbr4RHnVfehA ----- 2007.05.02 - 16:13:52GMT -----
> > > >
> > > > Why should CHKs be prioritised over SSKs?
> > > >
> > > > What do you think of the patch I committed anyway?
> > >
> > > ----- Anonymous at o9_0DTuZniSf_+oDmRsonByWxsI ----- 2007.05.02 - 17:03:52GMT -----
> > >
> > > > Why should CHKs be prioritised over SSKs?
> > >
> > > Because the SSK success ratio is extremely low (remember the example I
> > > gave in the initial message of the thread: 8100 rejected requests for
> > > the same SSK key within 35 hours, plus an uncounted number of requests
> > > for the same key that resulted in DNF - and that key is nowhere near
> > > being alone); linear programming would certainly suggest that SSK
> > > requests, as they currently are, should be totally disabled. I do not
> > > propose anything that drastic; but as SSKs currently use bandwidth very
> > > inefficiently, I propose to tone them down heuristically to
> > > approximately the CHK level while the node is short on bandwidth (but
> > > let SSKs flow as freely as the sending node needs if spare bandwidth is
> > > available).
> > >
> > > [not that I meant to speak instead of mrogers]
> > >
> > > ----- mrogers at UU62+3E1vKT1k+7fR0Gx7ZN2IB0 ----- 2007.05.03 - 13:54:38GMT -----
> > >
> > > >> Why should CHKs be prioritised over SSKs?
> > > >
> > > > Because SSK success ratio is extremely low
> > >
> > > That doesn't make sense for two reasons:
> > >
> > > 1) rejecting SSKs will make the SSK success ratio worse, not better
> > > 2) "SSK not found" is useful information - for example, that's how you
> > > discover the current version of a USK - but "SSK rejected for not being
> > > a CHK" is not useful to anyone
> > >
> > > Let me use an analogy: you're designing a Cisco router. Some of the
> > > packets it gets asked to forward will be small SSH packets, others will
> > > be large HTTP packets. Do you say "half our resources should go to SSH
> > > because we don't want to prioritise HTTP, so we'll only forward one
> > > HTTP packet for every 32 SSH packets"? If you answered yes, you just
> > > lost your job at Cisco. The router's job is to deliver packets to the
> > > best of its ability, not to decide what kind of packets the end hosts
> > > "should" be sending.
> >
> > ----- Anonymous at o9_0DTuZniSf_+oDmRsonByWxsI ----- 2007.05.03 - 15:56:54GMT -----
> >
> > 1) accepting more SSK requests will make the SSK success ratio worse,
> > not better. If that is done in addition to CHK requests (i.e. node
> > bandwidth stays well within the desired limits), it's not a big deal;
> > but if the resultless SSK traffic starts limiting CHK (i.e. real data)
> > transfer, that's a problem.
> >
> > The fact is that an SSK key is known well before the SSK content, and in
> > practice gets re-requested repeatedly. My profiling of the network
> > indicates that even with a datastore 10 times larger than the default,
> > less than 0.05% (and who knows how much less) of SSK requests (not SSK
> > keys!) currently get satisfied by an underloaded node; it's even worse
> > for an overloaded node (do you have different numbers?). With such a
> > miserable SSK request success rate I see a lot of sense in penalizing
> > them; once the SSK request flood situation improves (and there certainly
> > are SSK-centric bugs hiding, or improvements pending, that would either
> > improve the SSK request flow or avoid using SSKs where they are not the
> > optimal construct), I will quite possibly and readily change my opinion
> > that underprioritising SSK requests is acceptable and good. But the SSK
> > flood problem does not look like low hanging fruit at all; expect it to
> > take months if not years for major improvements.
> >
> > But please do not argue with me that treating SSK and CHK requests
> > equally based on their count, not their length, is good: I totally
> > agree; my point is that for the currently observed network behaviour it
> > is nowhere near being the only good choice, and in the short-to-middle
> > term not necessarily the best one.
> >
> > 2) "SSK not found" is a useful information indeed; but the exactly same
> > event repeated 8100 times brings extremely little additional entropy
> > (basic fact of information theory), and mostly means wasted bandwidth on
> > practice. Then again, if the node is not overloaded at the moment, I have
> > nothing against repeating the DNF using otherwise idle bandwidth - even
> > 0.001 bit of entropy per extra SSK request is at least a mediocre (and
> > maybe a quite good) argument against transferring exactly 0 bits of
> > entropy (i.e. keeping network link idle).
> >
> > ----- mrogers at UU62+3E1vKT1k+7fR0Gx7ZN2IB0 ----- 2007.05.04 - 11:03:09GMT -----
> >
> > Can you explain why you think accepting more SSK requests will make the
> > SSK success ratio worse? A rejected request will fail to reach some parts
> > of the network (at the minimum it won't reach the node that rejected it),
> > so it's more likely to fail.
> >
> > > But please do not argue with me that treating SSK and CHK requests
> > > equally based on their count, not their length, is good: I totally
> > > agree
> >
> > But I don't agree. :-) I don't think we should try to treat them equally
> > in any way. We should just accept as many requests as possible, of
> > whatever kind, without trying to enforce any kind of "fairness" or
> > "equality" between CHKs and SSKs.
> >
> > > "SSK not found" is a useful information indeed; but the exactly same
> > > event
> >
> > repeated 8100 times brings extremely little additional entropy
> >
> > Very true - if the same client is really requesting the same key 8100
> > times, then the client should be fixed (e.g. modified to use
> > exponentially increasing delays when re-requesting a failed key; a
> > sketch follows). But we shouldn't try to fix the problem by rejecting
> > (anonymous!) requests further down the line. How do we know it's one
> > client making 8100 requests, not 810 clients making 10 requests each?
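> >
> > (For what it's worth, the kind of client-side fix I mean is just a
> > capped exponential backoff - a sketch with invented names, not anything
> > Frost actually implements:)
> >
> > ===
> > // Sketch: capped exponential backoff for re-requesting a key that
> > // DNFed. Purely illustrative.
> > public class RequestBackoff {
> >     private static final long INITIAL_DELAY_MS = 15 * 1000;      // 15s
> >     private static final long MAX_DELAY_MS = 4 * 60 * 60 * 1000; // 4h cap
> >     private long delay = INITIAL_DELAY_MS;
> >     private long nextAttempt = 0;
> >
> >     /** Call when a request for the key fails (DNF). */
> >     public void onFailure() {
> >         nextAttempt = System.currentTimeMillis() + delay;
> >         delay = Math.min(delay * 2, MAX_DELAY_MS); // double the wait
> >     }
> >
> >     /** Call when the key is finally found, to reset the schedule. */
> >     public void onSuccess() {
> >         delay = INITIAL_DELAY_MS;
> >         nextAttempt = 0;
> >     }
> >
> >     public boolean mayRequestNow() {
> >         return System.currentTimeMillis() >= nextAttempt;
> >     }
> > }
> > ===
> >
> > (With doubling delays capped at 4 hours, the 8100 requests over 35 hours
> > would collapse to roughly twenty.)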
> >
> > ----- Anonymous at o9_0DTuZniSf_+oDmRsonByWxsI ----- 2007.05.10 - 20:36:10GMT -----
> >
> > > Can you explain why you think accepting more SSK requests will make
> > > the SSK success ratio worse?
> >
> > Generally, a set of requests for the same key succeeds only once; during
> > a moderately long time period (cache lifetime, probably a few days to a
> > couple of weeks), at most (N-1) times, where N is the number of peers,
> > provided that it was not the current node that inserted the key. ALL the
> > previous requests (made before the content is available) fail, and if
> > you make more unsuccessful requests you decrease the success ratio.
> >
> > I see your rationale for a request made in search of content that has
> > already been on the network for some time, and thus will either be found
> > quickly, or after a few attempts the original requestor will cease
> > searching for it (and hopefully will insert it later, after FEC
> > recovery, if applicable). But such behaviour is typical of CHK keys,
> > while observation of SSK request behaviour shows a very different
> > pattern: keys missing from the network get requested repeatedly, often
> > without eventual success.
> >
> > > But I don't agree. :-)
> >
> > Ugh :-)
> >
> > > I don't think we should try to treat them equally in any way. We
> > > should just accept as many requests as possible, of whatever kind,
> > > without trying to enforce any kind of "fairness" or "equality" between
> > > CHKs and SSKs.
> >
> > Doesn't that really mean equal treatment, or in other words: processing
> > without prejudice?
> >
> > > How do we know it's one client making 8100 requests, not 810 clients
> > > making 10 requests each?
> >
> > That makes no real difference to the request processing node. If it had
> > the requested data, it would respond (at most N-1 times) and would no
> > longer see requests for that particular key for days, due to peer
> > caching (provided the node has no malicious peers). If requests for the
> > same key get repeated again and again, that basically wastes traffic -
> > and so if a node works at its bandwidth limit, the limited traffic had
> > better be used for work that is statistically expected to be more
> > useful.
> >
> > Fixing apps is always a good thing. But there are 2 problems: first, in
> > an anonymous network it is supposed to be extremely hard to find out the
> > source node or the software making the requests, actual bug tracking
> > aside; second, the repeating requests could be malicious - and the
> > presence of a spammer on the network is not a sufficient reason for many
> > other nodes to silently donate a share of their bandwidth to the
> > spammer.
> >
> > With CHK keys, which reserve 15 times more bandwidth, the spam rate
> > drops quickly with each hop; but the small SSK keys tend to travel 15
> > times further (over overloaded links; no difference if all nodes are
> > underloaded), making an SSK flood attack - while not fatal, nor
> > noticeable without targeted network profiling - quite a bit more
> > damaging for network performance as a whole.
> >
> > ----- mrogers at UU62+3E1vKT1k+7fR0Gx7ZN2IB0 ----- 2007.05.11 - 14:30:49GMT -----
> >
> > > Generally, a set of requests for the same key succeeds only once
> >
> > OK, I see your point: if there are lots of redundant requests from the
> > same client, we can improve the success ratio by rejecting most of them.
> > But we shouldn't do it, for two reasons:
> >
> > First, as you pointed out, in an anonymous network we don't know how many
> > requests came from each client, so we don't know whether the requests are
> > redundant.
> >
> > Second, and more importantly, rejecting a request has side-effects: it
> > causes the client to slow down, and it causes the request to go
> > elsewhere. Rejections are interpreted as a sign of overload, just like
> > packet losses in TCP - rejecting a request because the data's not in the
> > network is like dropping a TCP packet instead of returning a 404 error.
> > OK, it saves a bit of bandwidth, but it confuses the client.
> >
> > It *might* make sense to return DataNotFound if the same key was recently
> > searched for and not found, but as Toad has pointed out that might make
> > it possible to blacklist certain keys (if you get a request for a key you
> > want to censor, return DataNotFound and all the upstream nodes will cache
> > the result).
> >
> > > keys missing from the network get requested repeatedly, often without
> > > eventual success.
> >
> > Nevertheless, "DataNotFound" is useful information. It's worth spending
> > bandwidth to find out that a key is not in the network.
> >
> > > Doesn't that really mean equal treatment, or in other words:
> > > processing without prejudice?
> >
> > Yes, processing without prejudice is what I meant - I thought you meant
> > "equal quotas" rather than "equal opportunities". Oh god, I hope Lomion's
> > not reading this thread. :-)
> >
> > > If the requests for the same key get repeated again and again, that
> > > basically wastes traffic
> >
> > Maybe so, but rejecting requests is not the solution because it has
> > side-effects. Can you think of a way to cache the results of recent
> > searches without making it possible to blacklist certain keys?
> >
> > > the presence of a spammer on the network is not a sufficient reason
> > > for many other nodes to silently donate a share of their bandwidth to
> > > the spammer.
> >
> > But your solution (preemptively rejecting SSK requests) would be even
> > worse - if one person started spamming, everyone's SSK requests would
> > start getting rejected. Everyone except the spammer would obey the
> > request throttle, and pretty soon the spammer would be the only one
> > sending SSK requests.
> >
> > ----- Anonymous at o9_0DTuZniSf_+oDmRsonByWxsI ----- 2007.05.14 - 15:43:40GMT -----
> >
> > > in an anonymous network we don't know how many requests came from each
> > > client, so we don't know whether the requests are redundant.
> >
> > Yes, but that does not change anything for an overloaded node. On the
> > other hand, as toad pointed out, redundant requests can be caught at the
> > request sending end, rather than the request receiving end - which saves
> > more bandwidth. So only the spammer's direct peers will be at a
> > disadvantage, which is acceptable (moreover, it is fair that those
> > supporting spammers get hit the most).
> >
> > > Second, and more importantly, rejecting a request has side-effects: it
> > > causes the client to slow down, and it causes the request to go
> > > elsewhere. Rejections are interpreted as a sign of overload
> >
> > Exactly. If a forwarding node is not overloaded, it should not reject
> > requests. And if the client that overloaded the other node(s) gets
> > slowed down somewhat, that just makes things more 'fair'.
> >
> > > Maybe so, but rejecting requests is not the solution because it has
> > > side-effects. Can you think of a way to cache the results of recent
> > > searches without making it possible to blacklist certain keys?
> >
> > Rejecting requests during overload is the only possibility (aside from
> > node harakiri, maybe). As far as an algorithm goes, here The Bitch
> > Algorithm is proudly presented (a rough code sketch follows below): :)
> >
> > - A node seeing a request for a key which recently failed responds with
> > DNF, with a little but important postscriptum attached: "and BTW do not
> > ever call me again". (Technically: "installed a listener for the key".)
> > The request sending node(s) memorize that.
> >
> > - If such a listener already existed, then the request sending node is
> > misbehaving, and should be penalised. Seriously.
> >
> > - The request sender node might try to search for the key via other
> > peers, but not that aggressively.
> >
> > - The request sender's key listener queue is of limited length. If the
> > request sender needs to eject a queue element to conserve memory (or
> > maybe the element just timed out), it sends a notice "was not even going
> > to" ("Listener lease expired"). This allows (but does not force) the
> > request receiving node to clear that element from its queue.
> >
> > - The request receiver's key listener queue is of limited length too. So
> > if the request receiving node decides to eject a queued element, it
> > sends a notice: "OK, maybe if you call me again we could chat a little
> > bit". This gets propagated back the same way as the "and BTW do not ever
> > call me again" reply, allowing the similar queue element to be cleared
> > on the other nodes that searched for the key, and/or the request to be
> > retried (likely via a different peer).
> >
> > - The most interesting thing: if a node sees a key insert and/or a key
> > transfer resulting from a request, it sends duplicate inserts to all the
> > peers who are waiting for the key (as they need it), with the virtual
> > comment "here is your stupid toy, I just bought a similar one for
> > myself", and also creates new inserts directed to all the peers the key
> > was requested from, with the virtual comment "please do not be sad and
> > keep that toy, I just got an identical one: just have a look" - as other
> > nodes will most likely be searching for the key in the same direction.
> >
> > Of course such a listener mechanism will form a tree graph, laid over
> > the network links, with the root located at the node whose location is
> > (at least locally) closest to the location of the sought key.
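> >
> > (A condensed sketch of the core bookkeeping, with invented names and
> > invented message types - an illustration of the idea, not a proposed
> > patch:)
> >
> > ===
> > // Sketch of the listener table above. "DNF_FINAL" stands for the DNF
> > // plus "do not ever call me again"; all names are made up.
> > import java.util.HashMap;
> > import java.util.HashSet;
> > import java.util.Map;
> > import java.util.Set;
> >
> > public class SskListenerTable {
> >     private static final int MAX_KEYS = 200; // 100-200 keys, as above
> >
> >     // key -> peers told "do not call me again" and now waiting for it
> >     private final Map<String, Set<String>> waiters =
> >         new HashMap<String, Set<String>>();
> >
> >     /** A request arrives for a key that recently failed here. */
> >     public synchronized String onRequest(String key, String peer) {
> >         Set<String> peers = waiters.get(key);
> >         if (peers == null) {
> >             if (waiters.size() >= MAX_KEYS)
> >                 return "DNF"; // queue full: plain DNF, no listener
> >             peers = new HashSet<String>();
> >             waiters.put(key, peers);
> >         } else if (peers.contains(peer)) {
> >             return "PENALISE"; // peer ignored the earlier "do not call"
> >         }
> >         peers.add(peer);
> >         return "DNF_FINAL"; // DNF + "do not ever call me again"
> >     }
> >
> >     /** The key finally arrives: who should get duplicate inserts? */
> >     public synchronized Set<String> onKeyArrived(String key) {
> >         Set<String> peers = waiters.remove(key);
> >         return peers == null ? new HashSet<String>() : peers;
> >     }
> > }
> > ===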
> >
> > Thinking of the stats I saw, I would say that just 100-200 SSK keys per
> > listener queue could reduce SSK traffic by as much as 30-50% maybe
> > (thanks to the good location distribution), while having the
> > listened-for keys, if they ever appear, show up on the original
> > requestor node ASAP. Even if I am overly optimistic and the decrease is
> > just 10%, that is still a lot considering the current SSK flood.
> >
> > Another note: an application sending requests via FCP should see the
> > bitch response as distinguishable from a plain DNF, so that
> > non-malicious software can cooperate with nodes in keeping useless
> > traffic down, by issuing requests that are unlikely to succeed less
> > frequently.
> >
> > > But your solution (preemptively rejecting SSK requests) would be even
> > > worse - if one person started spamming, everyone's SSK requests would
> > > start getting rejected. Everyone except the spammer would obey the
> > > request throttle, and pretty soon the spammer would be the only one
> > > sending SSK requests.
> >
> > - aggressive throttling during overload ensures that malicious,
> > excessively high-volume traffic does not generally travel too far from
> > the spammer's own node.
> > - I have not yet proposed to throttle traffic on a node that has idle
> > resources (but The Bitch Algorithm is well applicable to non-overloaded
> > nodes, too).
>
> ----- toad at zceUWxlSaHLmvEMnbr4RHnVfehA ----- 2007.05.15 - 23:42:54GMT -----
>
> Okay, this is an interesting argument. What you are proposing, then, is
> this:
> - When we get a DNF for a specific key requested by a specific peer, we
> remember it.
> - For X period afterwards, if that peer asks for that key, we DNF it. Or
> better, we send a fatal response other than DNF, indicating that we
> aren't passing the request on because the same peer requested it
> recently.
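>
> (Roughly, in code - a sketch with invented names, nothing committed; the
> "RecentlyFailed" response is the hypothetical fatal, non-propagating
> reply mentioned above:)
>
> ===
> // Sketch of the per-peer variant: remember (peer, key) pairs we have
> // recently answered with DNF, and short-circuit a repeat from the same
> // peer with "RecentlyFailed" instead of routing the request again.
> import java.util.HashMap;
> import java.util.Map;
>
> public class PerPeerRecentDnfTable {
>     private final long periodMs; // the "X period" above
>
>     // peer + key -> when we returned the DNF to that peer
>     private final Map<String, Long> recentDnfs = new HashMap<String, Long>();
>
>     public PerPeerRecentDnfTable(long periodMs) { this.periodMs = periodMs; }
>
>     public synchronized void onDnfReturned(String peer, String key) {
>         recentDnfs.put(peer + "/" + key, System.currentTimeMillis());
>     }
>
>     /**
>      * True if this peer asked for this key within the period; the caller
>      * should reply "RecentlyFailed" (fatal, never forwarded or cached),
>      * so the table cannot be used to censor keys network-wide.
>      */
>     public synchronized boolean shortCircuit(String peer, String key) {
>         Long t = recentDnfs.get(peer + "/" + key);
>         return t != null && System.currentTimeMillis() - t < periodMs;
>     }
> }
> ===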
>
> This might work. The classic freenet 0.5 solution was the failure table.
> We removed it. The reason is that it wasn't per-peer, could be attacked,
> and in fact generated bad behaviours all by itself:
>
> Let's say the cooldown is 30 minutes.
> At t=0, request the key. Since it has not been requested yet, the request
> will go all the way. When it DNFs, each node along the route will
> remember to DNF it.
> At t=10 minutes, somebody else requests the key. He gets a DNF. The nodes
> along that route also remember this.
> At t=30.1 minutes, the key is now fetchable along the original path. We
> fetch it again. Because of the vagaries of routing, it gets routed to one
> of the nodes touched by the middle request. So it DNFs again, and the
> whole chain remembers this.
>
> You can get some very nasty effects. If however it is built in such a way
> that it doesn't propagate (it returns a failure other than DNF, or a flag
> on the DNF, to indicate that it must not be propagated), it could be a
> useful tool.
>
> Should it be per-peer? It limits the impact to a degree, but a full DNF
> will involve many nodes...
>
> ----- Anonymous at o9_0DTuZniSf_+oDmRsonByWxsI ----- 2007.05.25 - 12:07:23GMT -----
>
> The key point is that once the previously missing key appears in the
> network, it gets propagated toward the original requestor almost
> instantly.
>
> With that property, keeping the tables per-peer is not much needed (it is
> only needed to know where the duplicate inserts should be directed,
> once/if the key appears).
>
> Whenever a bitch record times out, thus breaking the virtual search
> backpath, the following happens:
>
> If the key is still not on the network:
>
> - the new request will bump into the next node with a non-timed-out bitch
> record, thus restoring the backpath (maybe making it more optimal this
> time).
>
> If the key appears on the network:
>
> - the key (its duplicate) will get inserted along the remaining backpath,
> up to the timed-out node;
> - the next request for the key will most likely hit the remaining
> backpath (if not the ideal key storage node), and will quickly return the
> key from cache/store.
>
> Another important point is that, assuming routing mostly works correctly,
> the majority of the bitch records will reside on nodes with locations
> very close to the locations of the sought keys, which will enable certain
> performance, data structure coding, and network-wide optimizations. If on
> the other hand routing malfunctions (a malicious request sending node?),
> the proposed algorithm will not make things any worse.

----- toad at zceUWxlSaHLmvEMnbr4RHnVfehA ----- 2007.05.25 - 17:08:32GMT -----

Okay, you may have a point here. I don't think we should subscribe unless the 
request has a flag saying we actually WANT to subscribe, and in that case, we 
shouldn't add to the failure table unless we are going to subscribe. I will 
post a variant on it as a proposal for lightweight passive requests on the 
list and here. I will include your original thread.