On Wednesday 08 August 2007 16:13, Matthew Toseland wrote:
> On Wednesday 08 August 2007 16:06, Matthew Toseland wrote:
> > On Wednesday 08 August 2007 16:05, Matthew Toseland wrote:
> > > Continued at end to minimise confusion!
> > >
> > > On Wednesday 08 August 2007 15:59, Matthew Toseland wrote:
> > > > Unfortunately this thread is rather rambling; it includes a lot of
> > > > discussion of token passing as well as the original premise.

----- Anonymous at o9_0DTuZniSf_+oDmRsonByWxsI ----- 2007.04.25 - 20:18:36GMT -----

I made some measurements on how a Freenet node behaves if the bandwidth limit is set low: 10KBps and down to 6KBps (specifically the input bandwidth limit; the output bandwidth limit was set to at least 15KBps, but as expected the actually used output bandwidth is comparable to - just slightly above - the actually used input bandwidth). The node itself was running Frost but no uploads/downloads, so the absolute majority of network traffic was forwarded CHK/SSK requests/inserts.

The results are interesting enough: CHK traffic becomes as low as 5% (in packets) of CHK+SSK, while at least 92% of SSK requests were not satisfied due to assorted failures (plus quite a few more certainly resulted in a NotFound response because the key was missing from the whole network, but I don't have that number). This makes a low-traffic node highly inefficient and disproportionately slow; it also slows down its peers with all the extra reject traffic. Worse, input bandwidth sometimes goes over the set limit, suggesting that on a hardware 33600/56000 bps modem, and even on ISDN, things will only get worse due to increased delays.

Another thing to note: a low bandwidth node (LBN) almost exclusively rejects requests with the "input bandwidth liability" reason, and extremely rarely for other reasons.

Speculating a bit, the same picture will likely be observed for the peers of a fast node (1Mbps or more) when many of those peers have a typical home connection of 256Kbps or less.

I am not sure if simulations ever showed anything like this, but contributing mostly SSK service to the network (when the absolute majority of SSK requests fail!) is rather useless: an optimally working network is supposed to transfer at least one CHK block for each SSK key, and typically much, much more (a single 10MB file consists of 481 CHK blocks!), and even if you found the SSK but not the CHK it points to, you still failed to find the information you requested.

OK, to make the long story short[er], at the end of this message you will find a small patch that noticeably improves the LBN situation. The idea is to reserve some bandwidth for CHK transfers (and SSK inserts, as those are too rare to penalize, and more valuable than requests). The line directly before the inserted one implicitly penalizes CHK transfers (the much smaller SSK requests tend to re-reserve bandwidth the moment it is released after a CHK transfer finishes, while the much larger CHK requests do not get such a good chance), so bandwidth should be reserved for at least 2 CHKs (and tests show that is enough to make a difference).

Another thing I tried was increasing the 90 seconds period up to 120.
That had some positive effect on making traffic smoother and staying closer to the set limit, without jumping up and down too much (no numbers here; just "noticeable but small"). Where did the 90 seconds number come from anyway, and how dangerous could 120 be?

Some pros observed and/or thought out during tests of the patch:
- I observe an increase in output payload of approx. 15% (of total traffic), making the LBN more useful to its peers.
- the change is negligibly small for faster nodes, so it should not break anything globally.
- SSK flood traffic across the whole network will be toned down a little (at temporarily overloaded nodes only), additionally simplifying life for LBNs. After all, requesting the same SSK every 15 seconds for 35 hours, 8100 times in total (actual numbers from one of the tests before this patch was applied; there are over 60 other SSKs that were requested more than 1000 times during the same period), is just way too much; SSKs are not inserted into the network THAT fast. [Is it worth remembering recently seen SSK requests, and not forwarding them if the same request was already forwarded within the last 10 minutes and resulted in DNF/RNF? A table of recently requested SSKs closest to the node location should not be too big? See the sketch below, just before the patch.]

And contras:
- in exceptional conditions (specifically, with fewer than 2 incoming CHK requests per 90 seconds; I actually observe 2-7 CHK requests per second, i.e. 180-630 per 90 seconds), regardless of node bandwidth, up to 800 Bps might end up unused. For a high bandwidth node that is far too small to notice; for an LBN it is still acceptable (10% of 56Kbps) and will decrease round-trip delays a bit, which is always a good thing on such slow links.

Other notes:
- the distribution of location closeness versus number of SSK requests is very nice: only SSK requests with a location very close to the node location get repeated frequently; the farther away the SSK location is, the fewer requests the node sees, and the SSKs seen only once or twice per 1-2 day period are distributed evenly across the location space. This suggests that routing is working fine.
- As far as I understand, if the input bandwidth limit/liability is exceeded (but a packet has already been received anyway), the CHK/SSK request gets instantly rejected (thus throwing away received bytes while input bandwidth has no spare volume!); only otherwise does the node check whether the requested key exists in the store. Heh? This feels like a serious bug hurting overall network performance: better to query the store and hopefully send back a result (or still reject, if the key is not found locally) than to wait for a retry request to waste more input bandwidth. At least for SSKs, reject and reply are comparable in output bandwidth usage, so it is worth a little delay in the response. Or am I missing something?
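A minimal sketch of the table suggested in brackets above, with hypothetical names (RecentSskFailures, reportFailure and recentlyFailed are made up for illustration; this is not actual Freenet code):

===
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: remember SSK requests that recently ended in
// DNF/RNF, and advise against re-forwarding them for 10 minutes.
public class RecentSskFailures {
    private static final long WINDOW_MS = 10 * 60 * 1000; // 10 minutes
    private static final int MAX_ENTRIES = 1000;          // bound memory use

    // Insertion-ordered map; the oldest entry is evicted when full.
    private final Map<String, Long> failureTimes =
        new LinkedHashMap<String, Long>(16, 0.75f, false) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Long> eldest) {
                return size() > MAX_ENTRIES;
            }
        };

    // Record a DNF/RNF result for this key.
    public synchronized void reportFailure(String sskKey) {
        failureTimes.put(sskKey, System.currentTimeMillis());
    }

    // Should we decline to forward a fresh request for this key?
    public synchronized boolean recentlyFailed(String sskKey) {
        Long t = failureTimes.get(sskKey);
        if (t == null) return false;
        if (System.currentTimeMillis() - t > WINDOW_MS) {
            failureTimes.remove(sskKey); // stale entry; forget it
            return false;
        }
        return true;
    }
}
===

As the rest of the thread makes clear, bluntly blocking keys this way risks self-reinforcing key censorship; the later discussion accordingly moves toward per-peer, non-propagating variants of the same idea.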
===
diff --git a/freenet/src/freenet/node/NodeStats.java b/freenet/src/freenet/node/NodeStats.java
index 3b091b4..fb9f8b9 100644
--- a/freenet/src/freenet/node/NodeStats.java
+++ b/freenet/src/freenet/node/NodeStats.java
@@ -414,6 +414,7 @@ public class NodeStats implements Persistable {
 			successfulChkInsertBytesReceivedAverage.currentValue() * node.getNumCHKInserts() +
 			successfulSskInsertBytesReceivedAverage.currentValue() * node.getNumSSKInserts();
 		bandwidthLiabilityInput += getSuccessfulBytes(isSSK, isInsert, true).currentValue();
+		if (isSSK && !isInsert) bandwidthLiabilityInput += successfulChkFetchBytesReceivedAverage.currentValue() + successfulChkInsertBytesReceivedAverage.currentValue(); // slightly penalize SSK requests by reserving bandwidth for 2 additional CHK transfers (or SSK inserts if any)
 		double bandwidthAvailableInput =
 			node.getInputBandwidthLimit() * 90; // 90 seconds at full power
 		if(bandwidthLiabilityInput > bandwidthAvailableInput) {
===

----- toad at zceUWxlSaHLmvEMnbr4RHnVfehA ----- 2007.04.26 - 16:56:59GMT -----

Most SSK requests fail. They DNF. The reason for this is that most SSK requests are polling for data that has not yet been inserted.

Bandwidth liability is usually the main reason for rejection. If we reach most of the other reasons, there is a problem - usually a cyclical problem. The main purpose of the liability limit is to ensure that we don't accept so many requests that some of them needlessly time out even though they succeeded. The timeout is 120 seconds, so we need the actual transfer to take less than this; on a request, 30 seconds seems a reasonable upper bound for the search time - hence the 90-second window. We don't throw out many bytes when we reject a request/insert, because the bulk of it hasn't been sent yet; except with SSKs, where typically a little under half of the total bytes will have been moved. Ideally we wouldn't send requests until we have a good idea that they will be accepted, but token passing load balancing is a long way off, not likely to happen for 0.7.0.

We cannot control input bandwidth usage precisely.

Any more info on SSK flooding? Is it simply Frost?

We can add a failure table - we had one before - however a failure table which results in actually blocking keys can be extremely dangerous; what I had envisaged was "per node failure tables", i.e. reroute requests which have recently failed to a different node, since we know the data isn't where it's supposed to be.

On what do you base the assertion about key closeness? It would be nice to have a histogram or circle on the stats pages showing recent keys on the keyspace - can you write a patch?

As far as your patch goes, surely rejecting more SSK requests would be counterproductive, as it wastes bandwidth? Shouldn't a slow node accept those requests it's likely to be able to handle?

I can see an argument that we shouldn't prefer SSKs, and that on slow nodes we do prefer SSKs... I'm not sure the above is the right way to deal with it though.
The effect of the patch would be to never accept any SSKs unless we have plenty of spare bandwidth, correct?

----- Anonymous at o9_0DTuZniSf_+oDmRsonByWxsI ----- 2007.04.26 - 18:41:32GMT -----

> Ideally we wouldn't send requests until we have a good idea that they will
> be accepted, but token passing load balancing is a long way off, not likely
> to happen for 0.7.0.

Well, even the current algorithm implementation has room for improvement. Here are the typical numbers I observe:

===
unclaimedFIFO Message Counts
 * FNPRejectOverload: 89 (45.2%)
 * FNPInsertTransfersCompleted: 59 (29.9%)
 * FNPDataNotFound: 15 (7.6%)
 * packetTransmit: 12 (6.1%)
 * FNPRejectLoop: 7 (3.6%)
 * FNPAccepted: 6 (3.0%)
 * FNPSwapRejected: 4 (2.0%)
 * FNPDataInsertRejected: 4 (2.0%)
 * FNPRouteNotFound: 1 (0.5%)
 * Unclaimed Messages Considered: 197
===

FNPRejectOverload always stays at the top, sometimes with hundreds of messages (for the last hour before unclaimed messages expire, that is a lot), which indicates that there is some bug (or bugs) in obeying the bandwidth limit.

> Any more info on SSK flooding? Is it simply Frost?

Not local Frost, for sure: it generates just a few simultaneous SSK requests (by default max 8: 6 for boards plus 2 for filesharing, AFAIR; in practice 2-4 simultaneous requests most of the time). The other 100 SSK requests (without the proposed patch) are forwarded ones.

> We can add a failure table, we had one before, however a failure table
> which results in actually blocking keys can be extremely dangerous;

Is it, with a timeout of a few minutes at most (i.e. at least several times less than the SSK propagation time visible with Frost messages)? Is it more dangerous than the current wastage of bandwidth on requests for the same SSK key several times per minute? Have any simulations been done on that in the past?

BTW, doesn't the observed very low store hit rate result from prioritising the likely-to-fail SSKs?

BTW2: the failure table could also act as a targeted content propagation mechanism: if a node sees an SSK insert for a temporarily blacklisted (non-existing) SSK, then forwarding the insert (more likely a copy of the insert, for security reasons and for routing's sake) to the original requestor should speed up propagation of new SSKs toward the nodes that already anticipate/await them.

> what I had envisaged was "per node failure tables" i.e. reroute requests
> which have recently failed to a different node since we know it isn't where
> it's supposed to be.

At a glance, a very nice idea. But LBNs typically answer with a reject, not a DNF... even with the current code. Such rerouting will probably increase SSK traffic toward LBNs even further, with a sharply increased volume of SSK rejects as the result. Hmm, some testing/simulation really seems needed here.

> On what do you base the assertion about key closeness? It would be nice to
> have a histogram or circle on the stats pages showing recent keys on the
> keyspace - can you write a patch?
Mmmm... in fact I just added custom logging, then a wild combination of grep/sed/sort/uniq to analyze the logs. But let me think; maybe visualizing a couple of the stats files I work with would be rather trivial...

But I would rather stay away from stats page graphics at this time, as the stats files I work with (filtered+sorted+uniqued) are rather large, 20-50MB each - too much memory for the toy. Unless your 'recent' means just 10-15 minutes at most?

> As far as your patch goes, surely rejecting more SSK requests would be
> counterproductive as it wastes bandwidth?

Tests show the opposite: without the patch, payload output on the stats page never exceeded 38%; with the patch it reaches 53% or a little more within several minutes of node restart. So, with the patch, the SSK/CHK forwarding behaviour 'feels' logical:

without the patch:
- just several CHKs, and over 100 SSKs - very typical.

with the patch:
- most of the time (say, 75%) the number of currently forwarded CHK requests+inserts approximately equals the number of SSK requests+inserts (i.e. 10-25 each, depending on the set bandwidth limit);
- sometimes (say, 10%) CHK requests start to prevail, but the current SSK requests+inserts seem never to go below the maximum that CHKs reach without the patch (i.e. 6-10). This is very typical when the number of CHK inserts gets several times higher than CHK requests (a close fast peer inserting something really large?);
- at other times (say, 15%) the flow of CHK requests+inserts does not saturate bandwidth, and the number of SSK requests quickly climbs to 50 or even over 100+, as it typically does without the patch.

That is for an LBN. Raising the input bandwidth allotment, the number of SSKs quickly grows, resembling the situation without the patch.

So that is why I suggest reserving bandwidth for 2 CHK transfers: 3 would kill SSKs, and 1 still lets SSKs seriously prevail over CHKs (but nonetheless gives a rather better ratio, so it is a legal value to try if the value of 2 alarms you too much). In fact, when reserving bandwidth for just 1 extra CHK, the proposed patch is not really needed: simply comment out the line starting with "bandwidthLiabilityInput +=" and decrease the 90 seconds constant to 80 (10 seconds is roughly how long a 33.6Kbaud modem takes to transmit a single CHK - and using anything noticeably slower than 28800/33600 baud for Freenet will never work well anyway).

> Shouldn't a slow node accept those requests it's likely to be able to
> handle?

Considering the very high chance of SSK request failure (at least 92%), I would say the answer is no. But with a sane SSK failure rate (say 75% or below) SSK requests would likely not waste the limited, thus precious, LBN bandwidth so fruitlessly.

The problem, I believe, is the too-small size of UDP packets when SSK requests prevail: the PPP(oE)/UDP/FNP overhead becomes too large, while LBNs, unlike faster-link nodes, almost never coalesce packets, obviously.
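For a sense of scale: reserving two CHK transfers' worth of input over the 90-second window comes to about 2 * 32768 bytes / 90 s ≈ 728 bytes/s (assuming the standard 32KiB CHK block size), consistent with the "up to 800 Bps might end up unused" estimate above. Below is a standalone sketch of the accept/reject decision being tuned here - hypothetical names and fixed constants, where the real NodeStats code uses measured running averages:

===
// Illustrative sketch only, not the actual NodeStats code.
public class InputLiabilitySketch {
    static final int CHK_BYTES = 32 * 1024; // assumed per-CHK transfer size
    static final int WINDOW_SECONDS = 90;   // 120 s timeout minus ~30 s search

    /**
     * Decide whether to accept a new request.
     * @param inFlightBytes expected bytes still to be received for
     *                      already accepted requests/inserts
     * @param inputLimitBps configured input bandwidth limit, bytes/sec
     * @param isSSKRequest  true if the candidate is an SSK request
     */
    static boolean acceptRequest(long inFlightBytes, int inputLimitBps,
                                 boolean isSSKRequest) {
        long liability = inFlightBytes;
        if (isSSKRequest) {
            // The proposed change: keep room for two more CHK-sized
            // transfers before letting another SSK in.
            liability += 2L * CHK_BYTES;
        }
        long available = (long) inputLimitBps * WINDOW_SECONDS;
        return liability <= available;
    }

    public static void main(String[] args) {
        // A 6 KB/s node with 500,000 bytes already committed:
        System.out.println(acceptRequest(500_000, 6_000, false)); // true
        System.out.println(acceptRequest(500_000, 6_000, true));  // false
    }
}
===

With a 6KB/s input limit, available = 540,000 bytes: a further CHK is accepted (500,000 <= 540,000) while an SSK is deferred (565,536 > 540,000) - exactly the "hold room for 2 CHKs" behaviour the patch aims for.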
----- toad at zceUWxlSaHLmvEMnbr4RHnVfehA ----- 2007.04.27 - 17:19:24GMT -----

The current algorithm is working, on most nodes, far better than it has in *ages*. I'm at 62% of a 700MB ISO, I started inserting it yesterday morning, and only a few of my peers are backed off - frequently none are backed off; right now it's 11 connected, 6 backed off, which is more backed off than I've seen for quite a while.

Re failure tables: yes, it is extremely dangerous. It can result in self-reinforcing key censorship, either as an attack or just occurring naturally. This happened on 0.5. And the hit ratio is only for CHKs, IIRC.

Even LBNs don't often send local RejectedOverloads on SSKs *once they have accepted them*. They may relay downstream ROs, but that is not fatal. And if they reject some requests, so what? It's a slow node; it's bound to reject some requests with the current load balancing system.

10-15 minutes would be interesting. We already show a circle and histogram of nearby nodes from swapping and of our peers; you'd just have to add another one. It would be good to have visual proof that routing is working on the level of adhering to node specialisations. I didn't expect it to be working given the load: I'm surprised that it does, it's an interesting result.

Packet size has nothing to do with it; ethernet has a 1472 byte maximum. Dial-up has 576 bytes max, but we ignore that and use fragmented packets (this sucks, obviously, as it greatly increases the chance of losing a packet and having to retransmit it).

Please explain why the patch doesn't result in never accepting a single SSK?

----- Anonymous at o9_0DTuZniSf_+oDmRsonByWxsI ----- 2007.04.27 - 19:31:14GMT -----

> Packet size has nothing to do with it; ethernet has a 1472 byte maximum.
> Dial-up has 576 bytes max, but we ignore that and use fragmented packets
> (this sucks, obviously, as it greatly increases the chance of losing a
> packet and having to retransmit it).

I am talking about the typical/average packet size, not the MTU. LBNs, unlike faster nodes, rarely have a chance to coalesce reject responses (over max 100ms), and thus send disproportionately more tiny packets, resulting in much higher protocol overhead. Thus having LBNs mostly cater to SSKs, not CHKs, results in the lowest imaginable usefulness of LBNs for the network as a whole.

BTW, in my experience the typical/default dialup/PPP MTU is 1500 minus link level headers, like ethernet. 576 is a reasonable adjustment for interactive traffic like ssh, but I fail to remember it being used as the default since the time the super-fast 28800 baud modems became common. :) 1400+ is the typical size of GPRS PPP packets too, and the same holds true for other popular wireless media like Bluetooth or WiFi; so I have no concerns regarding IP fragmentation.

> Please explain why the patch doesn't result in never accepting a single
> SSK?

I can not.
:) Can you explain why the current code, which penalizes CHKs, still gives 5% to them, even though CHKs are 25 times larger and similarly less frequent, and so have a really hard time arriving at the exact moment when bandwidth liability is not maxed out?

Seriously, I believe that comes down to 2 facts:

- SSK requests are much more frequent, so any temporary drop in the level of CHK requests enables the node to quickly get a bunch of new SSKs accepted for processing;
- the large CHK requests (at times when they prevail over SSKs) tend to hit other limits too, like the "output bandwidth liability" and "Insufficient input/output bandwidth" throttles. Then the small SSK requests quickly pick up all the remaining bits of bandwidth.

But currently I do not have relevant statistics to prove that.

Anyway, please commit the following patch - it should equal out bandwidth rights for CHKs and SSKs at least half way toward a fair/expected distribution (and the change will make any difference for highly-/over-loaded nodes only). Once most of my peers (and their peers) update, I will study the new node traffic forwarding efficiency and behavior at different bandwidth limits and with different penalization levels again - and then I will be in a better position to prove that the original proposal of reserving bandwidth for 2 CHKs is optimal (or maybe to withdraw it).

===
diff --git a/freenet/src/freenet/node/NodeStats.java b/freenet/src/freenet/node/NodeStats.java
index 3b091b4..98c82c3 100644
--- a/freenet/src/freenet/node/NodeStats.java
+++ b/freenet/src/freenet/node/NodeStats.java
@@ -399,9 +399,8 @@ public class NodeStats implements Persistable {
 			successfulSskFetchBytesSentAverage.currentValue() * node.getNumSSKRequests() +
 			successfulChkInsertBytesSentAverage.currentValue() * node.getNumCHKInserts() +
 			successfulSskInsertBytesSentAverage.currentValue() * node.getNumSSKInserts();
-		bandwidthLiabilityOutput += getSuccessfulBytes(isSSK, isInsert, false).currentValue();
 		double bandwidthAvailableOutput =
-			node.getOutputBandwidthLimit() * 90; // 90 seconds at full power; we have to leave some time for the search as well
+			node.getOutputBandwidthLimit() * 80; // 80 seconds at full power; we have to leave some time for the search as well
 		bandwidthAvailableOutput *= NodeStats.FRACTION_OF_BANDWIDTH_USED_BY_REQUESTS;
 		if(bandwidthLiabilityOutput > bandwidthAvailableOutput) {
 			preemptiveRejectReasons.inc("Output bandwidth liability");
@@ -413,9 +412,8 @@ public class NodeStats implements Persistable {
 			successfulSskFetchBytesReceivedAverage.currentValue() * node.getNumSSKRequests() +
 			successfulChkInsertBytesReceivedAverage.currentValue() * node.getNumCHKInserts() +
 			successfulSskInsertBytesReceivedAverage.currentValue() * node.getNumSSKInserts();
-		bandwidthLiabilityInput += getSuccessfulBytes(isSSK, isInsert, true).currentValue();
 		double bandwidthAvailableInput =
-			node.getInputBandwidthLimit() * 90; // 90 seconds at full power
+			node.getInputBandwidthLimit() * 80; // 80 seconds at full power
 		if(bandwidthLiabilityInput > bandwidthAvailableInput) {
 			preemptiveRejectReasons.inc("Input bandwidth liability");
 			return "Input bandwidth liability";
===

----- toad at zceUWxlSaHLmvEMnbr4RHnVfehA ----- 2007.04.28 - 17:05:53GMT -----

Why does assuming 80 seconds instead of 90 help? I would have expected it to make the situation worse.

Isn't what you want to increment the value you are multiplying the CHK byte counters by, if the request is an SSK? In any case I'm not convinced - we accept 32x as many SSKs as CHKs precisely because they use 32x less bandwidth. As far as I can see, incrementing the CHK counts but only on a CHK would just result in us never accepting an SSK...

But by all means continue to investigate.

----- mrogers at UU62+3E1vKT1k+7fR0Gx7ZN2IB0 ----- 2007.04.30 - 19:36:36GMT -----

> we accept 32x as many SSKs as CHKs precisely because they use 32x less
> bandwidth.

Sorry, but I don't understand the rationale behind this. It seems to be based on the assumption that equal resources should be allocated to SSKs and CHKs, regardless of whether there's equal demand for resources. If we're only getting, say, 16 times as many SSK requests as CHK requests, would we reject CHK requests to keep things "fair"?

----- toad at zceUWxlSaHLmvEMnbr4RHnVfehA ----- 2007.05.02 - 16:13:52GMT -----

Why should CHKs be prioritised over SSKs?

What do you think of the patch I committed, anyway?

----- Anonymous at o9_0DTuZniSf_+oDmRsonByWxsI ----- 2007.05.02 - 17:03:52GMT -----

> Why should CHKs be prioritised over SSKs?

Because the SSK success ratio is extremely low (remember the example I gave in the initial message of the thread: 8100 rejected requests for the same SSK key within 35 hours, plus an uncounted number of requests for the same key that resulted in DNF - and that key is nowhere near being alone); linear programming would certainly suggest that SSK requests, as they currently are, should be totally disabled. I do not propose anything as drastic; but as SSKs currently use bandwidth very inefficiently, I propose to tone them down heuristically to approximately the CHK level while the node is short on bandwidth (but to let SSKs go as much as the sending node needs if spare bandwidth is available).

[not that I meant to speak instead of mrogers]

----- mrogers at UU62+3E1vKT1k+7fR0Gx7ZN2IB0 ----- 2007.05.03 - 13:54:38GMT -----

> > Why should CHKs be prioritised over SSKs?
>
> Because the SSK success ratio is extremely low

That doesn't make sense, for two reasons:

1) rejecting SSKs will make the SSK success ratio worse, not better
2) "SSK not found" is useful information - for example, that's how you discover the current version of a USK - but "SSK rejected for not being a CHK" is not useful to anyone

Let me use an analogy: you're designing a Cisco router. Some of the packets it gets asked to forward will be small SSH packets, others will be large HTTP packets. Do you say "half our resources should go to SSH because we don't want to prioritise HTTP, so we'll only forward one HTTP packet for every 32 SSH packets"?
If you answered yes, you just lost your job at Cisco. The router's job is to deliver packets to the best of its ability, not to decide what kind of packets the end hosts "should" be sending.

----- Anonymous at o9_0DTuZniSf_+oDmRsonByWxsI ----- 2007.05.03 - 15:56:54GMT -----

1) Accepting more SSK requests will make the SSK success ratio worse, not better. If that is done in addition to CHK requests (i.e. node bandwidth stays well within the desired limits), it is not a big deal; but if the resultless SSK traffic starts limiting CHK (i.e. real data) transfer, that is a problem.

The fact is that an SSK key is known well before the SSK content, and in practice gets re-requested repeatedly. My profiling of the network indicates that even with a datastore 10 times larger than the default, less than 0.05% (and who knows how much less) of SSK requests (not SSK keys!) currently get satisfied by an underloaded node; it is even worse for an overloaded node (do you have different numbers?). With such a miserable SSK request success rate, I see a lot of sense in penalizing them; once the SSK request flood situation improves (and there certainly must be SSK-centric bugs hiding, or improvements pending, that would either improve the SSK request flow or avoid using SSKs where they are not the optimal construct), I will quite possibly and readily change my opinion that underprioritising SSK requests is acceptable and good. But the SSK flood problem does not look like a low hanging fruit at all; expect it to take months if not years for major improvements.

But please do not argue with me that treating SSK and CHK requests equally based on their count, not their length, is good: I totally agree; my point is that for the currently observed network behaviour it is nowhere near being the only good choice, and in the short-to-middle term not necessarily the best one.

2) "SSK not found" is useful information indeed; but the exact same event repeated 8100 times brings extremely little additional entropy (a basic fact of information theory), and in practice mostly means wasted bandwidth. Then again, if the node is not overloaded at the moment, I have nothing against repeating the DNF using otherwise idle bandwidth - even 0.001 bit of entropy per extra SSK request is at least a mediocre (and maybe quite a good) argument against transferring exactly 0 bits of entropy (i.e. keeping the network link idle).

----- mrogers at UU62+3E1vKT1k+7fR0Gx7ZN2IB0 ----- 2007.05.04 - 11:03:09GMT -----

Can you explain why you think accepting more SSK requests will make the SSK success ratio worse? A rejected request will fail to reach some parts of the network (at the minimum it won't reach the node that rejected it), so it's more likely to fail.

> But please do not argue with me that treating SSK and CHK requests equally
> based on their count, not their length, is good: I totally agree

But I don't agree. :-) I don't think we should try to treat them equally in any way. We should just accept as many requests as possible, of whatever kind, without trying to enforce any kind of "fairness" or "equality" between CHKs and SSKs.
> > > > > "SSK not found" is a useful information indeed; but the exactly same > > > event > > > > repeated 8100 times brings extremely little additional entropy > > > > Very true - if the same client is really requesting the same key 8100 > > times then the client should be fixed (eg modified to use exponentially > > increasing delays when re-requesting a failed key). But we shouldn't try > > to fix the problem by rejecting (anonymous!) requests further down the > > line. How do we know it's one client making 8100 requests, not 810 > > clients making 10 requests each? > > > > ----- Anonymous at o9_0DTuZniSf_+oDmRsonByWxsI ----- 2007.05.10 - > > 20:36:10GMT ----- > > > > > Can you explain why you think accepting more SSK requests will make the > > > SSK > > > > success ratio worse? > > > > Generally, a set of requests for the same key succeeds only once; during > > moderately long time period (cache lifetime, probably few days to a > > couple of weeks) maximum (N-1) times where N is number of peers, provided > > that it was not the current node that inserted the key. ALL the previous > > requests (made before content is available) fail, and if you do more > > unsucessful requests you decrease success ratio. > > > > I see your rationale for a request made in search for content that was on > > network for some time already, and thus will be either found quickly, or > > after few attempts the original requestor will cease searching for it > > (and hopefully will insert it later, after FEC recovery, if applicable). > > But such behaviour is typical for CHK keys; while observation for SSK > > requests behaviour shows very different pattern: keys missing in network > > getting requested repeatedly, often without facing success at the end. > > > > > But I don't agree. :-) > > > > Ugh :-) > > > > > I don't think we should try to treat them equally in any way. We should > > > just > > > > accept as many requests as possible, of whatever kind, without trying to > > enforce any kind of "fairness" or "equality" between CHKs and SSKs. > > > > Doesn't that really means equal treating, or in other words: processing > > without prejudice? > > > > > How do we know it's one client making 8100 requests, not 810 clients > > > making > > > > 10 requests each? > > > > That makes no real difference for the request processing node. If it had > > the requested data, it would respond (at most N-1 times) and would no > > longer see requests for that particular key for days, due to peer caching > > (provided the node does not have malicious peers). If the requests for > > the same key get repeated again and again, that basically wastes traffic > > - and so if node works at its bandwidth limit, the limited traffic should > > better be used for work that is statistically expected to be more useful. > > > > Fixing apps is always a good thing. But there are 2 problems: first, in > > anonymous network it is supposed to be extremely hard to find out source > > node or software making the requests put aside actual bug tracking; > > second, the repeating requests could be malicious - and presense of > > spammer on the network is not sufficient reason for many other nodes to > > silently submit a share of bandwidth for the spammer. 
----- Anonymous at o9_0DTuZniSf_+oDmRsonByWxsI ----- 2007.05.10 - 20:36:10GMT -----

> Can you explain why you think accepting more SSK requests will make the
> SSK success ratio worse?

Generally, a set of requests for the same key succeeds only once; over a moderately long time period (the cache lifetime, probably a few days to a couple of weeks), at maximum (N-1) times, where N is the number of peers, provided that it was not the current node that inserted the key. ALL the previous requests (made before the content is available) fail, and if you make more unsuccessful requests you decrease the success ratio.

I see your rationale for a request made in search of content that has been on the network for some time already, and thus will either be found quickly, or, after a few attempts, the original requestor will cease searching for it (and hopefully will insert it later, after FEC recovery, if applicable). But such behaviour is typical of CHK keys, while observation of SSK request behaviour shows a very different pattern: keys missing from the network get requested repeatedly, often without ever seeing success at the end.

> But I don't agree. :-)

Ugh :-)

> I don't think we should try to treat them equally in any way. We should
> just accept as many requests as possible, of whatever kind, without trying
> to enforce any kind of "fairness" or "equality" between CHKs and SSKs.

Doesn't that really mean equal treatment, or in other words: processing without prejudice?

> How do we know it's one client making 8100 requests, and not 810 clients
> making 10 requests each?

That makes no real difference to the request processing node. If it had the requested data, it would respond (at most N-1 times) and would no longer see requests for that particular key for days, due to peer caching (provided the node does not have malicious peers). If the requests for the same key get repeated again and again, that basically wastes traffic - and so, if a node works at its bandwidth limit, the limited traffic should better be used for work that is statistically expected to be more useful.

Fixing apps is always a good thing. But there are 2 problems: first, in an anonymous network it is supposed to be extremely hard to find out the source node or software making the requests, never mind the actual bug tracking; second, the repeating requests could be malicious - and the presence of a spammer on the network is not a sufficient reason for many other nodes to silently donate a share of their bandwidth to the spammer.

With CHK keys, which reserve 15 times more bandwidth, the spam rate drops quickly with each hop; but the small SSK keys tend to travel 15 times further (over overloaded links; there is no difference if all nodes are underloaded), making an SSK flood attack, while not fatal or noticeable without targeted network profiling, rather more damaging to network performance as a whole.

----- mrogers at UU62+3E1vKT1k+7fR0Gx7ZN2IB0 ----- 2007.05.11 - 14:30:49GMT -----

> Generally, a set of requests for the same key succeeds only once

OK, I see your point: if there are lots of redundant requests from the same client, we can improve the success ratio by rejecting most of them. But we shouldn't do it, for two reasons:

First, as you pointed out, in an anonymous network we don't know how many requests came from each client, so we don't know whether the requests are redundant.

Second, and more importantly, rejecting a request has side-effects: it causes the client to slow down, and it causes the request to go elsewhere. Rejections are interpreted as a sign of overload, just like packet losses in TCP - rejecting a request because the data's not in the network is like dropping a TCP packet instead of returning a 404 error. OK, it saves a bit of bandwidth, but it confuses the client.

It *might* make sense to return DataNotFound if the same key was recently searched for and not found, but as Toad has pointed out, that might make it possible to blacklist certain keys (if you get a request for a key you want to censor, return DataNotFound and all the upstream nodes will cache the result).

> keys missing from the network get requested repeatedly, often without ever
> seeing success at the end.

Nevertheless, "DataNotFound" is useful information. It's worth spending bandwidth to find out that a key is not in the network.

> Doesn't that really mean equal treatment, or in other words: processing
> without prejudice?

Yes, processing without prejudice is what I meant - I thought you meant "equal quotas" rather than "equal opportunities". Oh god, I hope Lomion's not reading this thread. :-)

> If the requests for the same key get repeated again and again, that
> basically wastes traffic

Maybe so, but rejecting requests is not the solution, because it has side-effects. Can you think of a way to cache the results of recent searches without making it possible to blacklist certain keys?

> the presence of a spammer on the network is not a sufficient reason for
> many other nodes to silently donate a share of their bandwidth to the
> spammer.

But your solution (preemptively rejecting SSK requests) would be even worse - if one person started spamming, everyone's SSK requests would start getting rejected. Everyone except the spammer would obey the request throttle, and pretty soon the spammer would be the only one sending SSK requests.

----- Anonymous at o9_0DTuZniSf_+oDmRsonByWxsI ----- 2007.05.14 - 15:43:40GMT -----

> in an anonymous network we don't know how many requests came from each
> client, so we don't know whether the requests are redundant.

Yes, but that does not change anything for an overloaded node.
On the other hand, as toad pointed out, a redundant request can be caught at the request sending end, rather than the request receiving end - which saves more bandwidth. So only the spammer's direct peers will be at a disadvantage, which is acceptable (moreover, it is fair that those supporting spammers get hit the most).

> Second, and more importantly, rejecting a request has side-effects: it
> causes the client to slow down, and it causes the request to go elsewhere.
> Rejections are interpreted as a sign of overload

Exactly. If the forwarding node is not overloaded, it should not reject requests. If the client that caused the overload of other node(s) gets slowed down somewhat, that just makes things more 'fair'.

> Maybe so, but rejecting requests is not the solution, because it has
> side-effects. Can you think of a way to cache the results of recent
> searches without making it possible to blacklist certain keys?

Rejecting requests during overload is the only possibility (aside from node harakiri, maybe). As far as the algorithm goes, here The Bitch Algorithm is proudly presented: :)

- A node seeing a request for a key which recently failed responds with DNF, with a little but important postscriptum attached: "and BTW do not ever call me again" (technically: "installed a listener for the key"). The request sending node(s) memorize that.

- If such a listener already existed, then the request sending node is misbehaving, and should be penalised. Seriously.

- The request sender node might try to search for the key with other peers, but not too aggressively.

- The request sender's key listener queue is of limited length. If the request sender needs to eject a queue element to conserve memory (or maybe just has the element time out), it sends a notice: "was not even going to" ("listener lease expired"). This allows (but does not force) the request receiving node to clear that element from its queue.

- The request receiver's key listener queue is of limited length too. So if the request receiving node decides to eject a queued element, it sends a notice: "OK, maybe if you call me again we could chat a little bit". This gets propagated back the same way as the "and BTW do not ever call me again" reply, allowing the corresponding queue element to be cleared on the other nodes that searched for the key, and/or the request to be retried (likely using a different peer).

- The most interesting thing: if a node sees a key insert and/or a key transfer as the result of a request, it sends duplicate inserts to all the peers who are waiting for the key (as they need it), with a virtual comment "here is your stupid toy, I just bought a similar one for myself", and also creates new inserts directed to all the peers the key was requested from, with a virtual comment "please do not be sad, keep that toy, I just got an identical one: have a look" - as other nodes will most likely be searching for the key in the same direction.

Of course such a listener mechanism will form a tree graph, laid over the network links, with the root located at the node that (at least locally) has the closest location to the location of the key.
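A minimal sketch of the listener bookkeeping this implies - hypothetical names, plain strings for keys and peers, and the colourful notices reduced to comments; real code would use node and key types:

===
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of the per-key listener queue described above.
public class KeyListenerTable {
    private static final int MAX_KEYS = 200; // "just 100-200 SSK keys per queue"

    // For each recently failed key: the peers that asked for it and are now
    // implicitly subscribed ("and BTW do not ever call me again").
    private final Map<String, Set<String>> listeners = new HashMap<>();
    private final Deque<String> evictionOrder = new ArrayDeque<>();

    // A request arrives for a key that recently failed. Returns true if this
    // peer was already listening, i.e. it is misbehaving and may be penalised.
    public boolean registerListener(String key, String peer) {
        Set<String> peers = listeners.get(key);
        if (peers == null) {
            if (listeners.size() >= MAX_KEYS) {
                // Queue full: forget the oldest key. Its listeners would be
                // sent the "OK, maybe call me again" notice here.
                String evicted = evictionOrder.removeFirst();
                listeners.remove(evicted);
            }
            peers = new HashSet<>();
            listeners.put(key, peers);
            evictionOrder.addLast(key);
        }
        return !peers.add(peer); // add() returns false if already present
    }

    // The key finally shows up (insert or transfer): return everyone who was
    // waiting, so duplicate inserts can be sent, and drop the subscription.
    public Set<String> keyArrived(String key) {
        evictionOrder.remove(key);
        Set<String> waiting = listeners.remove(key);
        return waiting == null ? Set.of() : waiting;
    }
}
===

The eviction on a full queue corresponds to the "OK, maybe call me again" notice, and a true return from registerListener marks the misbehaving repeat requester.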
Thinking of the stats I saw, I would say that just 100-200 SSK keys per listener queue could reduce SSK traffic by as much as 30-50%, maybe (thanks to the good location distribution), while having the listened-for keys, if they ever appear, show up on the original requestor node ASAP. Even if I am overly optimistic and the decrease is just 10%, that is still a lot considering the current SSK flood.

Another note is that an application sending requests via FCP should see the bitch response as distinguishable from a plain DNF, so that non-malicious software can cooperate with nodes to keep useless traffic down, by issuing requests that are unlikely to succeed less frequently.

> But your solution (preemptively rejecting SSK requests) would be even
> worse - if one person started spamming, everyone's SSK requests would
> start getting rejected. Everyone except the spammer would obey the request
> throttle, and pretty soon the spammer would be the only one sending SSK
> requests.

- aggressive throttling during overload ensures that malicious, excessively high-volume traffic does not generally travel too far from the spammer's own node.
- I had not yet proposed throttling traffic at a node that has idle resources (but The Bitch Algorithm is well applicable to non-overloaded nodes, too).

----- toad at zceUWxlSaHLmvEMnbr4RHnVfehA ----- 2007.05.15 - 23:42:54GMT -----

Okay, this is an interesting argument. What you are proposing, then, is this:
- When we get a DNF for a specific key requested by a specific peer, we remember it.
- For X period afterwards, if that peer asks for that key, we DNF it. Or better, we send a fatal response other than DNF, which indicates that we aren't passing the request on because it was requested recently by the same peer.

This might work. The classic Freenet 0.5 solution was the failure table. We removed it. The reason for that is that it wasn't per-peer, and could be attacked, and in fact generated bad behaviours all by itself:

Let's say the cooldown is 30 minutes.
At t=0, we request the key. Since it has not been requested yet, the request will go all the way. When it DNFs, each node along the route will remember to DNF it.
At t=10 minutes, somebody else requests the key. He gets a DNF. The nodes along that route also remember this.
At t=30.1 minutes, the key is now fetchable along the original path. We fetch it again. Because of the vagaries of routing, it gets routed to one of the nodes touched by the middle request. So it DNFs again, and the whole chain remembers this.

You can get some very nasty effects. If however it is built in such a way that it doesn't propagate (it returns a failure other than DNF, or a flag on the DNF, to indicate that it must not be propagated), it could be a useful tool.

Should it be per-peer? It limits the impact to a degree, but a full DNF will involve many nodes...
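A minimal sketch of that two-step rule, with hypothetical names; the essential point is that the remembered state is per (peer, key), and that the "recently failed" response is terminal for this hop only, never cached or relayed downstream:

===
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a per-peer failure table: remember (peer, key)
// pairs that recently DNFed, and answer repeats with a non-propagating
// "recently failed" response instead of forwarding the request again.
public class PerPeerFailureTable {
    public enum Response { FORWARD, RECENTLY_FAILED_DO_NOT_PROPAGATE }

    private static final long COOLDOWN_MS = 30 * 60 * 1000; // the "X period"

    private final Map<String, Long> recentDnfs = new HashMap<>();

    private static String pairKey(String peer, String key) {
        return peer + "|" + key;
    }

    // Record a DNF we returned to this peer for this key.
    public void rememberDnf(String peer, String key) {
        recentDnfs.put(pairKey(peer, key), System.currentTimeMillis());
    }

    // Decide how to handle an incoming request. The RECENTLY_FAILED response
    // must not be propagated, so it cannot poison other routes the way the
    // 0.5-style shared failure table did.
    public Response onRequest(String peer, String key) {
        Long t = recentDnfs.get(pairKey(peer, key));
        if (t != null && System.currentTimeMillis() - t < COOLDOWN_MS) {
            return Response.RECENTLY_FAILED_DO_NOT_PROPAGATE;
        }
        return Response.FORWARD;
    }
}
===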
----- Anonymous at o9_0DTuZniSf_+oDmRsonByWxsI ----- 2007.05.25 - 12:07:23GMT -----

The key point is that once the previously missing key appears in the network, it gets propagated toward the original requestor almost instantly.

With that property, keeping the tables per-peer is not much needed (it is only needed to know where the duplicate inserts should be directed, once/if the key appears).

Whenever a bitch record times out, thus breaking the virtual search backpath, the following happens:

If the key is still not on the network:

- the new request will bump into the next node with a non timed out bitch record, thus restoring the backpath (maybe making it more optimal this time).

If the key appears on the network:

- the key (its duplicate) will get inserted along the remaining backpath, up to the timed-out node;
- the next request for the key will most likely hit the remaining backpath (if not the ideal key storage node), and will quickly return the key from cache/store.

Another important point: assuming that routing mostly works correctly, the majority of the bitch records will reside on nodes with locations very close to the locations of the sought keys, which will enable certain performance, data-structure coding, and network-global optimizations. If on the other hand routing malfunctions (a malicious request sending node?), the proposed algorithm will not make things any worse.
----- toad at zceUWxlSaHLmvEMnbr4RHnVfehA ----- 2007.05.25 - 16:37:25GMT -----

Passive requests are really a lot harder than you think they are. The hard part is error handling: what happens when you lose a connection through which you were subscribed to a key? If you don't provide a reliable service, the client will have to poll for it. If you do provide a reliable service, you have to deal with every possible error scenario, which is a pig (I wasted significant time on this trying to get pub/sub to work). Making it implicit when a request fails, and tying it to the failure tables, is nice; however it is virtually guaranteed that when the key turns up it won't be on that path - especially if you block all requests for it on that path from actually continuing to find it.

----- Anonymous at o9_0DTuZniSf_+oDmRsonByWxsI ----- 2007.05.25 - 20:23:46GMT -----

Providing a totally reliable service is not reasonable due to RAM usage considerations anyway.

----- toad at zceUWxlSaHLmvEMnbr4RHnVfehA ----- 2007.05.25 - 22:17:01GMT -----

Not true, TCP does it.

----- Anonymous at o9_0DTuZniSf_+oDmRsonByWxsI ----- 2007.05.26 - 03:47:29GMT -----

TCP does not use key tables, either. :) As your variant of the proposal does not include 'subscription cancellation' notices or a similar mechanism (and that is indeed one of the legal options), you either keep an arbitrarily large list of recently requested keys in RAM, or give up on reliability, thus still requiring a reduced level of polling.

----- toad at zceUWxlSaHLmvEMnbr4RHnVfehA ----- 2007.05.26 - 23:09:20GMT -----

Sorry, I am getting two arguments mixed up here. You are right, we lose reliability. Ultra-lightweight passive requests, as I'm naming my variant on your proposal, are inherently unreliable. Full blown heavyweight passive requests would be reliable, and would incorporate not only cancellation notices but also downstream rerouting (this is necessary for security, since we can't have only the original requestor rerouting, and for minimising the network load). But they will be considerably harder to get right, and this is a good first step which hopefully we can implement in the near future, rather than in 0.8.