Continued at end to minimise confusion!

Unfortunately this thread is rather rambling; it includes lots of discussion on token passing as well as the original premise.

----- Anonymous at o9_0DTuZniSf_+oDmRsonByWxsI ----- 2007.04.25 - 20:18:36GMT -----

I made some measurements of how a freenet node behaves if the bandwidth limit is set low: 10KBps and down to 6KBps (specifically, the input bandwidth limit; the output bandwidth limit was set to at least 15KBps, but as expected the actual output bandwidth used is comparable to, just slightly above, the actual input bandwidth used). The node itself was running frost but no uploads/downloads, so the overwhelming majority of network traffic was forwarded CHK/SSK requests/inserts.

The results are interesting enough: CHK traffic becomes as low as 5% (in packets) of CHK+SSK, while at least 92% of SSK requests were not satisfied due to assorted failures (plus quite a few more certainly resulted in a NotFound response because the key is missing from the whole network, but I don't have that number). This makes a low-traffic node highly inefficient and disproportionately slow; it also slows down its peers with all the extra reject traffic.
Worse, the input bandwidth sometimes goes over the set limit, suggesting that on a hardware 33600/56000 bps modem and even on ISDN things will just get worse due to increased delays.

Another thing to note: a low bandwidth node (LBN) almost exclusively rejects requests with the "input bandwidth liability" reason, and extremely rarely for other reasons.

Speculating a bit, the same picture will likely be observed for the peers of a fast node (1Mbps or more) when many of those peers have a typical home connection of 256Kbps or less.

Not sure if simulations ever showed anything like this, but contributing mostly SSK service to the network (and the absolute majority of SSK requests fail!) is rather useless: an optimally working network is supposed to transfer at least one CHK block for each SSK key, and typically much, much more (a single 10MB file consists of 481 CHK blocks!), and even if you found the SSK but not the CHK it points to, you still failed to find the information you requested.

OK, to make the long story short[er], at the end of this message you will find a small patch that noticeably improves the LBN situation. The idea is to reserve some bandwidth for CHK transfers (and SSK inserts, as those are too rare to penalize, and more valuable than requests). The line directly before the inserted one implicitly penalizes CHK transfers (the much smaller SSK requests tend to re-reserve bandwidth the moment it is released after a CHK transfer finishes, while the much larger CHK requests do not get such a good chance), so bandwidth should be reserved for at least 2 CHKs (and tests show that's enough to make a difference).

Another thing I tried was increasing the 90 seconds period up to 120. That had some (no numbers here; just "noticeable but small") positive effect on making traffic smoother and staying closer to the set limit, without jumping up and down too much. Where did the 90 seconds number come from anyway, and how dangerous could 120 be?
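[A minimal illustrative sketch of the admission check under discussion: the node tracks the expected bytes of transfers it has already accepted ("liability") and rejects a new request if that liability would exceed what the input limit can carry in the 90-second window; the extra reservation for two CHK transfers is the proposed change. Names here are hypothetical; the real logic is in NodeStats.java and is shown in the patches below.]

===
// Simplified sketch only; hypothetical names, not the actual NodeStats.java code.
class LiabilitySketch {
	/** Returns a reject reason, or null to accept. */
	static String checkInputLiability(boolean isSSK, boolean isInsert,
			double liabilityBytes,       // expected bytes of already-accepted transfers
			double thisRequestBytes,     // expected bytes of the new request
			double avgChkTransferBytes,  // average bytes of one CHK transfer
			double inputLimitBytesPerSec) {
		liabilityBytes += thisRequestBytes;
		// Proposed change: an SSK request must also leave room for roughly two
		// CHK transfers, so that cheap SSKs cannot monopolise the window.
		if (isSSK && !isInsert)
			liabilityBytes += 2 * avgChkTransferBytes;
		// 90 seconds: the request timeout is 120s minus roughly 30s allowed for the
		// search, so everything accepted must be transferable within about 90s.
		double availableBytes = inputLimitBytesPerSec * 90;
		return (liabilityBytes > availableBytes) ? "Input bandwidth liability" : null;
	}
}
===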
Some pros observed and/or thought out during tests of the patch:
- I observe an increase of output payload by approx. 15% (of total traffic), making the LBN more useful for its peers.
- the change is negligibly small for faster nodes, so it should not break anything globally.
- the SSK flood traffic of the entire network will be toned down a little bit (at temporarily overloaded nodes only), additionally simplifying life for LBNs: after all, requesting the same SSK every 15 seconds for 35 hours, 8100 times in total (factual numbers from one of the tests before this patch was applied; there are over 60 other SSKs that were requested more than 1000 times during the same period), is just way too much - SSKs are not inserted into the network THAT fast. [is it worth remembering recently seen SSK requests, and not forwarding them if the same request was already forwarded within the last 10 minutes and resulted in DNF/RNF? A table of the recently requested SSKs closest to the node location should not be too big? See the sketch after these notes.]

And contras:
- in exceptional conditions (specifically, with fewer than 2 incoming CHK requests per 90 seconds; factually I observe 2-7 CHK requests per second, that's 180-630 per 90 seconds), regardless of node bandwidth, up to 800 Bps might end up unused. For a high bandwidth node that's way too small to notice; for an LBN that's still acceptable (10% of 56Kbps) and will decrease roundtrip delays a bit, which is always a good thing for such slow links.

Other notes:
- the distribution of location closeness versus number of SSK requests is very nice: only SSK requests with a location very close to the node location get repeated frequently; the farther away the SSK location, the fewer requests the node sees, and the SSKs seen only once or twice per 1-2 day period are distributed evenly across the location space. This suggests that routing is working fine.
- As far as I understand, if the input bandwidth limit/liability is exceeded (but a packet has already been received anyway), the CHK/SSK request gets instantly rejected (thus throwing out received bytes while the input bandwidth has no spare volume!); only otherwise does the node check whether the requested key exists in the store. Heh? This feels like a serious bug hurting overall network performance: better to query the store and hopefully send back a result (or still reject if the key is not found locally) than to wait for a retry request to waste more input bandwidth. At least for SSKs a reject and a reply are comparable in output bandwidth usage, so it is worth a little delay in the response. Or am I missing something?
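[A minimal sketch of the bracketed idea above: remembering SSK requests that recently failed with DNF/RNF and suppressing repeats for a few minutes. All names are hypothetical; no such table exists in the code being discussed.]

===
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical cache of SSK requests that recently failed (DNF/RNF),
// used to avoid re-forwarding the same key every few seconds.
class RecentSskFailureCache {
	private static final long TTL_MS = 10 * 60 * 1000; // forget after 10 minutes
	private static final int MAX_ENTRIES = 4096;       // keep the table small

	private final Map<String, Long> failedAt =
		new LinkedHashMap<String, Long>(16, 0.75f, true) {
			protected boolean removeEldestEntry(Map.Entry<String, Long> eldest) {
				return size() > MAX_ENTRIES; // evict the oldest entries first
			}
		};

	synchronized void recordFailure(String sskKey) {
		failedAt.put(sskKey, System.currentTimeMillis());
	}

	/** True if this key failed recently and the request need not be forwarded. */
	synchronized boolean shouldSuppress(String sskKey) {
		Long t = failedAt.get(sskKey);
		if (t == null) return false;
		if (System.currentTimeMillis() - t > TTL_MS) {
			failedAt.remove(sskKey); // entry expired
			return false;
		}
		return true;
	}
}
===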
===
diff --git a/freenet/src/freenet/node/NodeStats.java b/freenet/src/freenet/node/NodeStats.java
index 3b091b4..fb9f8b9 100644
--- a/freenet/src/freenet/node/NodeStats.java
+++ b/freenet/src/freenet/node/NodeStats.java
@@ -414,6 +414,7 @@ public class NodeStats implements Persistable {
 			successfulChkInsertBytesReceivedAverage.currentValue() * node.getNumCHKInserts() +
 			successfulSskInsertBytesReceivedAverage.currentValue() * node.getNumSSKInserts();
 		bandwidthLiabilityInput += getSuccessfulBytes(isSSK, isInsert, true).currentValue();
+		if (isSSK && !isInsert) bandwidthLiabilityInput += successfulChkFetchBytesReceivedAverage.currentValue() + successfulChkInsertBytesReceivedAverage.currentValue(); // slightly penalize SSK requests by reserving bandwidth for 2 additional CHK transfers (or SSK inserts if any)
 		double bandwidthAvailableInput =
 			node.getInputBandwidthLimit() * 90; // 90 seconds at full power
 		if(bandwidthLiabilityInput > bandwidthAvailableInput) {
===

----- toad at zceUWxlSaHLmvEMnbr4RHnVfehA ----- 2007.04.26 - 16:56:59GMT -----

Most SSK requests fail. They DNF.
The reason for this is that most SSK requests are polling for data that has not yet been inserted.

Bandwidth liability is usually the main reason for rejection. If we reach most of the other reasons, there is a problem - usually a cyclical problem. The main purpose of the liability limit is to ensure that we don't accept so many requests that some of them needlessly time out even though they succeeded. The timeout is 120 seconds, so we need the actual transfer to take less than this; on a request, 30 seconds seems a reasonable upper bound for the search time. We don't throw out many bytes when we reject a request/insert, because the bulk of it hasn't been sent yet, except with SSKs, where typically a little under half of the total bytes will have been moved. Ideally we wouldn't send requests until we have a good idea that they will be accepted, but token passing load balancing is a long way off, not likely to happen for 0.7.0.

We cannot control input bandwidth usage precisely.

Any more info on SSK flooding? Is it simply Frost?

We can add a failure table; we had one before. However, a failure table which results in actually blocking keys can be extremely dangerous; what I had envisaged was "per node failure tables", i.e. reroute requests which have recently failed to a different node, since we know the data isn't where it's supposed to be.

On what do you base the assertion about key closeness? It would be nice to have a histogram or circle on the stats pages showing recent keys on the keyspace - can you write a patch?

As far as your patch goes, surely rejecting more SSK requests would be counterproductive, as it wastes bandwidth? Shouldn't a slow node accept those requests it's likely to be able to handle?

I can see an argument that we shouldn't prefer SSKs, and that on slow nodes we do prefer SSKs...
I'm not sure the above is the right way to deal with it though. The effect of the patch would be to never accept any SSKs unless we have plenty of spare bandwidth, correct?

----- Anonymous at o9_0DTuZniSf_+oDmRsonByWxsI ----- 2007.04.26 - 18:41:32GMT -----

> Ideally we wouldn't send requests until we have a good idea that they will be accepted, but token passing load balancing is a long way off, not likely to happen for 0.7.0.

Well, even the current algorithm implementation has certain room for improvement. Here are the typical numbers I observe:

===
unclaimedFIFO Message Counts
 * FNPRejectOverload: 89 (45.2%)
 * FNPInsertTransfersCompleted: 59 (29.9%)
 * FNPDataNotFound: 15 (7.6%)
 * packetTransmit: 12 (6.1%)
 * FNPRejectLoop: 7 (3.6%)
 * FNPAccepted: 6 (3.0%)
 * FNPSwapRejected: 4 (2.0%)
 * FNPDataInsertRejected: 4 (2.0%)
 * FNPRouteNotFound: 1 (0.5%)
 * Unclaimed Messages Considered: 197
===

FNPRejectOverload always stays at the top, sometimes with hundreds of messages (for the last hour before unclaimed messages expire, that's a lot), and so indicates that there is some bug (or bugs) with obeying the bandwidth limit.

> Any more info on SSK flooding? Is it simply Frost?

Not local frost for sure: it generates just a few simultaneous SSK requests (by default max 8: 6 for boards plus 2 for filesharing, AFAIR; practically 2-4 simultaneous requests most of the time). The other 100 SSK requests (without the proposed patch) are forwarded ones.

> We can add a failure table, we had one before, however a failure table which results in actually blocking keys can be extremely dangerous;

Is it, with a timeout of at most a few minutes (i.e.
at least a few times less than the SSK propagation time visible with frost messages)? Is it more dangerous than the current waste of bandwidth on requests for the same SSK key several times per minute? Have any simulations been done on that in the past?

BTW, isn't the observed very low store hit rate a result of prioritising the likely-to-fail SSKs?

BTW2, the failure table could also act as a targeted content propagation mechanism: if a node sees an SSK insert for a temporarily blacklisted (non-existing) SSK, then forwarding the insert (more likely a copy of the insert, for security reasons and for routing's sake) to the original requestor should speed up propagation of new SSKs toward the nodes that already anticipate/await them.

> what I had envisaged was "per node failure tables" i.e. reroute requests which have recently failed to a different node since we know it isn't where it's supposed to be.

At a glance, a very nice idea. But LBNs typically answer with a reject, not a DNF... even with the current code. Probably such rerouting would further increase SSK traffic toward LBNs, and produce a sharply increased volume of SSK rejects as a result. Hmm, some testing/simulation seems really needed here.

> On what do you base the assertion about key closeness? It would be nice to have a histogram or circle on the stats pages showing recent keys on the keyspace - can you write a patch?

Mmmm... in fact I just added custom logging, then a wild combination of grep/sed/sort/uniq to analyze the logs. But let me think, maybe visualizing a couple of the stats files I operate with will be rather trivial...

But I would rather stay away from stats page graphics at this time, as the stats files I operate with (filtered+sorted+uniqued) are rather large, 20-50Mb each - too much memory for the toy. Unless your 'recent' means just 10-15 minutes at most?
> As far as your patch goes, surely rejecting more SSK requests would be counterproductive as it wastes bandwidth?

Tests show the opposite: without the patch the payload output on the stats page never exceeded 38%; with the patch it becomes 53% or a little more within several minutes of a node restart. So, with the patch the SSK/CHK forwarding behaviour 'feels' logical:

without patch:
- just several CHKs, and over 100 SSKs, is very typical.

with patch:
- most of the time (say, 75%) the number of currently forwarded CHK requests+inserts approximately equals the number of SSK requests+inserts (i.e. 10-25 each, depending on the set bandwidth limit);
- sometimes (say, 10%) CHK requests start to prevail, but the current SSK requests+inserts seem never to go below the amount CHKs get at most without the patch (i.e. 6-10). This is very typical when the number of CHK inserts gets several times higher than CHK requests (a close fast peer inserting something really large?);
- at other times (say, 15%) the flow of CHK requests+inserts does not saturate the bandwidth, and the number of SSK requests quickly climbs to 50 or even over 100+, as it typically does without the patch.

That's for an LBN. Raising the input bandwidth allotment, the number of SSKs quickly grows, resembling the situation without the patch.

So that's why I suggest reserving bandwidth for 2 CHK transfers; 3 would kill SSKs, and 1 still lets SSKs seriously prevail over CHKs (but nonetheless gives a rather better ratio, so it is a legitimate value to try if the value of 2 alarms you too much).
In fact, for reserving bandwidth for just 1 extra CHK the proposed patch is not really needed: simply comment out the line starting with "bandwidthLiabilityInput +=" and decrease the 90 seconds constant to 80 (10 seconds is roughly how long a 33.6Kbod modem takes to transmit a single CHK - using anything noticeably slower than 28800/33600bod for freenet will never work well anyway).

> Shouldn't a slow node accept those requests it's likely to be able to handle?

Considering the very high chance of SSK request failures (at least 92%), I would say the answer is no. But with a sane SSK failure rate (say 75% or below) SSK requests would likely not waste the limited, thus precious, LBN bandwidth so fruitlessly.

The problem, in my belief, is the too small size of UDP packets when SSK requests prevail: the PPP(oE)/TCP/FNP overhead becomes too large, while LBNs, unlike faster-link nodes, almost never coalesce packets, obviously.

----- toad at zceUWxlSaHLmvEMnbr4RHnVfehA ----- 2007.04.27 - 17:19:24GMT -----

The current algorithm is working, on most nodes, far better than it has in *ages*. I'm at 62% of a 700MB ISO, I started inserting it yesterday morning, and only a few of my peers are backed off - frequently none are backed off; right now it's 11 connected, 6 backed off, which is more backed off than I've seen for quite a while.

Re failure tables: Yes, it is extremely dangerous. It can result in self-reinforcing key censorship, either as an attack or just occurring naturally. This happened on 0.5. And the hit ratio is only for CHKs iirc.

Even LBNs don't often send local RejectedOverload's on SSKs *once they have accepted them*. They may relay downstream RO's, but that is not fatal. And if they reject some requests, so what - it's a slow node, it's bound to reject some requests with the current load balancing system.

10-15 minutes would be interesting.
We already show a circle and histogram of nearby nodes from swapping and of our peers; you'd just have to add another one. It would be good to have visual proof that routing is working on the level of adhering to node specialisations. I didn't expect it to be working given the load: I'm surprised that it does, it's an interesting result.

Packet size has nothing to do with it: ethernet has a 1472 byte maximum. Dial-up has 576 bytes max, but we ignore it and use fragmented packets (this sucks, obviously, as it greatly increases the chance of losing a packet and having to retransmit it).

Please explain why the patch doesn't result in never accepting a single SSK?

----- Anonymous at o9_0DTuZniSf_+oDmRsonByWxsI ----- 2007.04.27 - 19:31:14GMT -----

> Packet size has nothing to do with it, ethernet has a 1472 byte maximum. Dial-up has 576 bytes max, but we ignore it, and use fragmented packets (this sucks, obviously, as it greatly increases the chance of losing a packet and having to retransmit it).

I am talking about the typical/average packet size, not the MTU. LBNs, unlike faster nodes, rarely have a chance to coalesce reject responses (within the 100ms maximum), and thus send disproportionately more tiny packets, resulting in much higher protocol overhead. So having LBNs mostly cater to SSKs rather than CHKs results in the lowest imaginable usefulness of LBNs for the network as a whole.

BTW, in my experience the typical/default dialup/PPP MTU is 1500 minus link level headers, like ethernet. 576 is a reasonable adjustment for interactive traffic like ssh, but I don't remember it being the default since the time the super fast 28800 bod modems became common.
:) 1400+ is the typical size of GPRS PPP packets too, and the same holds true for other popular wireless media like BlueTooth or WiFi; so I have no concerns regarding IP fragmentation.

> Please explain why the patch doesn't result in never accepting a single SSK?

I can not. :) Can you explain why the current code, which penalizes CHKs, still gives 5% to them, even though CHKs are 25 times larger and correspondingly less frequent, and so have a really hard time arriving at the exact moment when the bandwidth liability is not maxed out?

Seriously, I believe it comes down to 2 facts:

- SSK requests are much more frequent, so any temporary drop in the level of CHK requests lets the node quickly get a bunch of new SSKs accepted for processing;
- the large CHK requests (at times when they prevail over SSKs) tend to hit other limits too, like the "output bandwidth liability" and "Insufficient input/output bandwidth" throttles. Then the small SSK requests quickly pick up all the remaining bits of bandwidth.

But currently I do not have relevant statistics to prove that.

Anyway, please commit the following patch - it should equal out bandwidth rights for CHKs and SSKs at least half way toward a fair/expected distribution (and the change will only make a difference for high-/over-loaded nodes). Once most of my peers (and their peers) update, I will study the new node traffic forwarding efficiency and behaviour at different bandwidth limits and with different penalization levels again - and then I will be in a better position to prove that the original proposal of reserving bandwidth for 2 CHKs is optimal (or maybe withdraw it).
===
diff --git a/freenet/src/freenet/node/NodeStats.java b/freenet/src/freenet/node/NodeStats.java
index 3b091b4..98c82c3 100644
--- a/freenet/src/freenet/node/NodeStats.java
+++ b/freenet/src/freenet/node/NodeStats.java
@@ -399,9 +399,8 @@ public class NodeStats implements Persistable {
 			successfulSskFetchBytesSentAverage.currentValue() * node.getNumSSKRequests() +
 			successfulChkInsertBytesSentAverage.currentValue() * node.getNumCHKInserts() +
 			successfulSskInsertBytesSentAverage.currentValue() * node.getNumSSKInserts();
-		bandwidthLiabilityOutput += getSuccessfulBytes(isSSK, isInsert, false).currentValue();
 		double bandwidthAvailableOutput =
-			node.getOutputBandwidthLimit() * 90; // 90 seconds at full power; we have to leave some time for the search as well
+			node.getOutputBandwidthLimit() * 80; // 80 seconds at full power; we have to leave some time for the search as well
 		bandwidthAvailableOutput *= NodeStats.FRACTION_OF_BANDWIDTH_USED_BY_REQUESTS;
 		if(bandwidthLiabilityOutput > bandwidthAvailableOutput) {
 			preemptiveRejectReasons.inc("Output bandwidth liability");
@@ -413,9 +412,8 @@ public class NodeStats implements Persistable {
 			successfulSskFetchBytesReceivedAverage.currentValue() * node.getNumSSKRequests() +
 			successfulChkInsertBytesReceivedAverage.currentValue() * node.getNumCHKInserts() +
 			successfulSskInsertBytesReceivedAverage.currentValue() * node.getNumSSKInserts();
-		bandwidthLiabilityInput += getSuccessfulBytes(isSSK, isInsert, true).currentValue();
 		double bandwidthAvailableInput =
-			node.getInputBandwidthLimit() * 90; // 90 seconds at full power
+			node.getInputBandwidthLimit() * 80; // 80 seconds at full power
 		if(bandwidthLiabilityInput > bandwidthAvailableInput) {
 			preemptiveRejectReasons.inc("Input bandwidth liability");
 			return "Input bandwidth liability";
===
----- toad at zceUWxlSaHLmvEMnbr4RHnVfehA ----- 2007.04.28 - 17:05:53GMT -----

Why does assuming 80 seconds instead of 90 help? I would have expected it to make the situation worse.

Isn't what you want to increment the value you are multiplying the CHK byte counters by if the request is an SSK? In any case I'm not convinced - we accept 32x as many SSKs as CHKs precisely because they use 32x less bandwidth. As far as I can see, incrementing the CHK counts but only on a CHK would just result in us never accepting an SSK...

But by all means continue to investigate.

----- mrogers at UU62+3E1vKT1k+7fR0Gx7ZN2IB0 ----- 2007.04.30 - 19:36:36GMT -----

> we accept 32x as many SSKs as CHKs precisely because they use 32x less bandwidth.

Sorry, but I don't understand the rationale behind this. It seems to be based on the assumption that equal resources should be allocated to SSKs and CHKs, regardless of whether there's equal demand for resources. If we're only getting, say, 16 times as many SSK requests as CHK requests, would we reject CHK requests to keep things "fair"?

----- toad at zceUWxlSaHLmvEMnbr4RHnVfehA ----- 2007.05.02 - 16:13:52GMT -----

Why should CHKs be prioritised over SSKs?

What do you think of the patch I committed anyway?

----- Anonymous at o9_0DTuZniSf_+oDmRsonByWxsI ----- 2007.05.02 - 17:03:52GMT -----

> Why should CHKs be prioritised over SSKs?
Because the SSK success ratio is extremely low (remember the example I gave in the initial message of the thread: 8100 rejected requests for the same SSK key within 35 hours, plus an uncounted number of requests for the same key that resulted in DNF - and such a key is nowhere near being alone); linear programming would certainly suggest that SSK requests, as they currently are, should be totally disabled. I do not propose anything that drastic; but as SSKs currently use bandwidth very inefficiently, I propose to tone them down heuristically to approximately the CHK level while a node is short on bandwidth (but let SSKs go as much as the sending node needs if spare bandwidth is available).

[not that I meant to speak instead of mrogers]

----- mrogers at UU62+3E1vKT1k+7fR0Gx7ZN2IB0 ----- 2007.05.03 - 13:54:38GMT -----

>> Why should CHKs be prioritised over SSKs?
> Because SSK success ratio is extremely low

That doesn't make sense for two reasons:

1) rejecting SSKs will make the SSK success ratio worse, not better
2) "SSK not found" is useful information - for example, that's how you discover the current version of a USK - but "SSK rejected for not being a CHK" is not useful to anyone

Let me use an analogy: you're designing a Cisco router. Some of the packets it gets asked to forward will be small SSH packets, others will be large HTTP packets. Do you say "half our resources should go to SSH because we don't want to prioritise HTTP, so we'll only forward one HTTP packet for every 32 SSH packets"? If you answered yes, you just lost your job at Cisco. The router's job is to deliver packets to the best of its ability, not to decide what kind of packets the end hosts "should" be sending.

----- Anonymous ----- 2007.05.03 - 18:37:00GMT -----

Let's build an extreme case: there is a packet type which is 1 MB in size and one which is 1 byte in size.
Both get constantly sent to a router at a higher rate than the router's bandwidth quota allows it to forward them. If the following rules apply, not a single 1 MB packet will get served:
1. the router doesn't care about the size of a packet or penalise any kind of packet; all are considered equal
2. if a packet arrives and the remaining bandwidth quota doesn't allow forwarding it, it gets instantly rejected
3. no queueing of incoming packets is done (with part-allocation of available bandwidth if the packet size exceeds the free quota)

Afaik this is a possible situation for freenet, correct me if I am wrong.

----- mrogers at UU62+3E1vKT1k+7fR0Gx7ZN2IB0 ----- 2007.05.04 - 11:15:09GMT -----

You're right, that pretty much describes the problem. I'm suggesting that we should fix it by modifying step 2:

2. if a packet arrives and the remaining bandwidth quota doesn't allow to forward *an average packet*, it gets instantly rejected

The average is calculated over all packets, large and small. In other words we ask:

a) how much bandwidth does a packet use, on average?
b) is there enough bandwidth for another average packet?

Let's say 75% of packets are 1 byte in size and 25% are 1 MB in size, so the average is 262,144.75 bytes. We accept packets (large or small) if we have at least 262,144.75 bytes of bandwidth left in the quota, and reject them otherwise. If there's a long stream of 1 byte packets followed by a 1 MB packet we might go over quota temporarily, but we'll match the quota in the long term.
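[A minimal sketch of the "room for an average packet" rule just described, reusing the 1-byte/1-MB example; the class and numbers are illustrative only, not existing code.]

===
// Illustrative admission check: accept a packet only if the remaining quota
// could hold a packet of average size (averaged over all packets seen so far).
class AverageSizeAdmission {
	private double totalBytes = 0;
	private long packetCount = 0;

	boolean accept(int packetSize, double remainingQuotaBytes) {
		totalBytes += packetSize;
		packetCount++;
		double averageSize = totalBytes / packetCount;
		// Large and small packets face the same test, so a stream of tiny
		// packets cannot permanently lock out the occasional large one.
		return remainingQuotaBytes >= averageSize;
	}

	public static void main(String[] args) {
		AverageSizeAdmission a = new AverageSizeAdmission();
		// Three 1-byte packets then one 1MB packet: the average is ~262,144.75
		// bytes, matching the example in the message above.
		for (int i = 0; i < 3; i++) a.accept(1, 1 << 20);
		boolean big = a.accept(1 << 20, 300000); // accepted: quota >= average
		System.out.println(big);
	}
}
===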
----- Anonymous at o9_0DTuZniSf_+oDmRsonByWxsI ----- 2007.05.10 - 21:01:58GMT -----

Oh no, getting stuff instantly rejected is most often not good (an exception would be certain kinds of realtime traffic, but that is not applicable to freenet at all). I have some experience with independent/competing ISPs and their broken traffic shaping routers that kept dropping packets which did not fit the current shaping limits; TCP performance took a major hit there, and several TCP connections running under the same shaping always took seriously unfair shares of the bandwidth (unless you use quite long intervals for the stats, like 10+ minutes). Changing the shaping to queue an over-quota packet (even a single-packet queue!) until the calculated average bandwidth allows it to be sent (thus slightly increasing the roundtrip) was always sufficient for a TCP flow to work at 100% of the shaped level, and for simultaneous TCP streams to share the available bandwidth very evenly even over sub-second stats intervals; no other working solution was found (aside from raising the shaping limit above the maximum speed of the TCP peers).

I am not sure that can be directly applied to the current freenet networking code; honestly, the mechanism of first quickly accepting packets and then slowly picking them out using some kind of filters looks unnecessarily complicated and suboptimal for performance, to say the least. I have another bright example why: the mechanism quite resembles the traditional O/S network packet handling (with received packets extracted from the NIC at highest priority - during the hardware interrupt - and then the CPU/server business logic failing to process all the received packets, leading to internal queue overflow), and after years and decades it is generally agreed that such an approach does not work well for server applications. Instead, linux has for several years had a mechanism named NAPI (optional for some NIC drivers - check the kernel config - but default and mandatory for most server-grade and/or 1Gb NIC drivers): the hardware interrupt just sets a flag/semaphore saying the NIC has received something, and instantly quits, leaving that NIC's interrupt line disabled (the actual algorithm is a little more complex, allowing the hardware interrupt to extract a very limited number of packets if the host is very idle).
Then a lowest-priority kernel thread (the "software interrupt"), woken up by the flag/semaphore, starts reading packets from the NIC into the O/S queues (where user-level read()s get satisfied from), extracting only a limited number of packets at a time (then yielding the CPU to other runnable processes), and re-enabling the NIC interrupts only when it has managed to empty the hardware queue - with TCP flow control, and with the modern ethernet hardware flow control, that works exceptionally well. Thus the server business logic (i.e. useful work), running at a priority much higher than the software interrupt thread, is never starved of CPU by hardware interrupts that first pull in packets which then cost CPU to drop from an overflowing system queue - resulting in smooth behaviour and the best sustained performance.

Or in short - on overload, delaying the reading/processing of input packets is better than dropping or rejecting them instantly.

Toad - if you know a simple way to delay freenet's reads from the UDP socket in order to enforce the configured input bandwidth limit, please do so. (And with that UDP read delay, I would be very interested to test a freenet node with no other input bandwidth limiters apart from input bandwidth liability - chances are the UDP socket read delay will be sufficient for quality shaping, with the valuable help of the sending node tracking the roundtrip - an already well implemented feature.)

If the delay can not be done easily with the current codebase, I will consider doing a major rewrite of the traffic accepting code. Not of the highest priority though, due to the anticipated large amount of work - but those high fruit look big and tasty.

----- mrogers at UU62+3E1vKT1k+7fR0Gx7ZN2IB0 ----- 2007.05.11 - 14:56:37GMT -----

> I am not sure that can be directly applied to the current freenet networking code;

We're working on an idea called token-passing that's supposed to address this: you can only send a search (request/insert) to a peer if you have a flow control token from that peer. If you don't have a token you either keep the search in a queue until you receive a token, or send it to the next-best peer if the queue is full.
> > > > > > > > > > > > > > > > > > > > > > > > > the mechanism quite resembles the traditional O/S > > > > > > > > > > > > > network packets handling > > > > > > > > > > > > > > > > > > > > > > > > (with received packets extracted from NIC at highest > > > > > > > > > > > > priority - during hardware interrupt, and then having > > > > > > > > > > > > CPU/server business logic failing to process all > > > > > > > > > > > > received packets leading to internal queues overflow) > > > > > > > > > > > > > > > > > > > > > > > > Interesting point - in the new congestion control > > > > > > > > > > > > layer, maybe the UDP reader shouldn't advance the > > > > > > > > > > > > receiver window until the internal queues have > > > > > > > > > > > > dropped below a certain size... but it might be > > > > > > > > > > > > tricky to implement because the internal queues all > > > > > > > > > > > > belong to different threads... > > > > > > > > > > > > > > > > > > > > > > > > > If the delay can not be done easily with the > > > > > > > > > > > > > current codebase, I will > > > > > > > > > > > > > > > > > > > > > > > > consider doing major rewrite of the traffic accepting > > > > > > > > > > > > code part. > > > > > > > > > > > > > > > > > > > > > > > > This is due to be rewritten soon anyway, so now's > > > > > > > > > > > > probably a good time to make suggestions. > > > > > > > > > > > > > > > > > > > > > > > > ----- Anonymous at o9_0DTuZniSf_+oDmRsonByWxsI ----- > > > > > > > > > > > > 2007.05.14 - 16:46:24GMT ----- > > > > > > > > > > > > > > > > > > > > > > > > While token passing would indeed smooth the traffic > > > > > > > > > > > > out, it feels excessive: > > > > > > > > > > > > > > > > > > > > > > > > - it adds extra traffic; > > > > > > > > > > > > - it creates additional traffic patterns, that quite > > > > > > > > > > > > simplify attacks (like those aiming at reliably > > > > > > > > > > > > proving that a particular request originates from > > > > > > > > > > > > attacked node) against a node which all connections > > > > > > > > > > > > are monitored (by ISP), and some of them are fast but > > > > > > > > > > > > compromised (compromised peers). - it requires to > > > > > > > > > > > > pull a > > > > > > > > > > > > multidimensional set of heurictics on whom to send > > > > > > > > > > > > new token out of a thin air, and those heuristics > > > > > > > > > > > > will tend to disagree for different connection types. > > > > > > > > > > > > > > > > > > > > > > > > The method of delaying network reads (thats important > > > > > > > > > > > > - and AFAIK the only major missing thing to get > > > > > > > > > > > > shaping rolling smoothly already) should work > > > > > > > > > > > > similarly well (might be even better): just consider > > > > > > > > > > > > the metric 'the current peer roundtrip time is lower > > > > > > > > > > > > than the [peer] average roundtrip time' as > > > > > > > > > > > > equivalence of 'the peer gave us few tokens', and > > > > > > > > > > > > enjoy the bandwidth/crypt(CPU) free virtual token > > > > > > > > > > > > passing which obeys both > > > > > > > > > > > > hardware/ISP traffic shaping imposed limits, as well > > > > > > > > > > > > as software configured limits - whichever is > > > > > > > > > > > > stricter. > > > > > > > > > > > > > > > > > > > > > > > > So I currently discorage implementing explicit token > > > > > > > > > > > > passing, in favor of lower, equially tasty fruit. 
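The "virtual token" metric suggested above - a round-trip time below the peer's running average counts as permission to send more - could be tracked with something as small as this. The class name, the smoothing factor and the exact comparison are assumptions for illustration, not part of any existing code:

    /** Sketch of the implicit-token heuristic: send more while RTT is below the peer's average. */
    class PeerRttTracker {
        private double avgRttMillis = -1;           // exponential moving average, unset at first
        private double lastRttMillis;
        private static final double ALPHA = 0.1;    // smoothing factor (assumed value)

        void onAckReceived(double rttMillis) {
            lastRttMillis = rttMillis;
            avgRttMillis = (avgRttMillis < 0) ? rttMillis
                    : (1 - ALPHA) * avgRttMillis + ALPHA * rttMillis;
        }

        /** Roughly "the peer has just given us a few tokens". */
        boolean peerLooksUnderloaded() {
            return avgRttMillis >= 0 && lastRttMillis < avgRttMillis;
        }
    }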
> > > > > > > > > > > > > > > > > > > > > > ----- mrogers at UU62+3E1vKT1k+7fR0Gx7ZN2IB0 ----- > > > > > > > > > > > 2007.05.17 - 21:40:27GMT ----- > > > > > > > > > > > > > > > > > > > > > > > - it adds extra traffic > > > > > > > > > > > > > > > > > > > > > > Um, right. "Here are n tokens" takes about 6 bytes: two > > > > > > > > > > > for the message type, two for the message size, and two > > > > > > > > > > > for the number of tokens (we're never going to hand out > > > > > > > > > > > more than 65535 tokens in one go). It uses less traffic > > > > > > > > > > > than "Can I send you a request?" "Yes" "Here's the > > > > > > > > > > > request", and it avoids a round-trip. It also uses less > > > > > > > > > > > traffic than "Can I send you a request?" "No", because > > > > > > > > > > > if you don't have a token, you don't need to ask! > > > > > > > > > > > > > > > > > > > > > > > - it creates additional traffic patterns, that quite > > > > > > > > > > > > simplify attacks (like > > > > > > > > > > > > > > > > > > > > > > those aiming at reliably proving that a particular > > > > > > > > > > > request originates from attacked node) against a node > > > > > > > > > > > which all connections are monitored (by ISP), and some > > > > > > > > > > > of them are fast but compromised (compromised peers). > > > > > > > > > > > > > > > > > > > > > > Please explain how handing my peer some tokens reveals > > > > > > > > > > > anything about traffic patterns that wasn't already > > > > > > > > > > > visible to traffic analysis. If they can see the > > > > > > > > > > > requests and results going back and forth, who cares if > > > > > > > > > > > they can also see the tokens? > > > > > > > > > > > > > > > > > > > > > > > - it requires to pull a multidimensional set of > > > > > > > > > > > > heurictics on whom to send > > > > > > > > > > > > > > > > > > > > > > new token out of a thin air, and those heuristics will > > > > > > > > > > > tend to disagree for different connection types. > > > > > > > > > > > > > > > > > > > > > > No magical heuristics are needed - we hand out tokens > > > > > > > > > > > as long as we're not overloaded (measured by total > > > > > > > > > > > queueing delay, including the bandwidth limiter). That > > > > > > > > > > > alone should be enough to outperform the current > > > > > > > > > > > system, because we'll avoid wasting traffic on rejected > > > > > > > > > > > searches. Then we can start thinking about clever token > > > > > > > > > > > allocation policies to enforce fairness when the > > > > > > > > > > > network's busy, without imposing unnecessary limits > > > > > > > > > > > when the network's idle, etc etc. But token passing > > > > > > > > > > > doesn't depend on any such policy - it's just a > > > > > > > > > > > lower-bandwidth alternative to pre-emptive rejection. > > > > > > > > > > > > > > > > > > > > ----- toad at zceUWxlSaHLmvEMnbr4RHnVfehA ----- 2007.05.19 - > > > > > > > > > > 17:38:09GMT ----- > > > > > > > > > > > > > > > > > > > > We hand out tokens when we're not overloaded? What if > > > > > > > > > > later on we get overloaded? How exactly do we determine > > > > > > > > > > the total appropriate number of tokens in the system? > > > > > > > > > > > > > > > > > > > > Token passing would be really nice, it would solve the > > > > > > > > > > flooding problem and remove a lot of alchemy... 
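For reference, the 6-byte "here are n tokens" message estimated a little earlier might be encoded roughly as below. The type number is made up and the real FNP wire format may differ; this only illustrates how cheap the message is:

    import java.nio.ByteBuffer;

    /** Illustrative 6-byte grant: 2 bytes type, 2 bytes payload size, 2 bytes token count. */
    class TokenGrant {
        static final short MSG_TYPE_TOKEN_GRANT = 0x7F01;   // made-up message type number

        static byte[] encode(int tokens) {
            if (tokens < 0 || tokens > 0xFFFF)
                throw new IllegalArgumentException("never more than 65535 tokens in one go");
            ByteBuffer buf = ByteBuffer.allocate(6);
            buf.putShort(MSG_TYPE_TOKEN_GRANT);   // message type
            buf.putShort((short) 2);              // payload size in bytes
            buf.putShort((short) tokens);         // number of tokens granted
            return buf.array();
        }
    }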
> > > > > > > > > > > > > > > > > > ----- mrogers at UU62+3E1vKT1k+7fR0Gx7ZN2IB0 ----- 2007.05.20 > > > > > > > > > - 11:19:36GMT ----- > > > > > > > > > > > > > > > > > > We determine the appropriate number of tokens by > > > > > > > > > experimentation: as we approach overload we produce tokens > > > > > > > > > more and more slowly, until at the overload point we don't > > > > > > > > > produce tokens at all. For example, delay between producing > > > > > > > > > tokens = MAX_DELAY/(MAX_DELAY - currentDelay) seconds, or > > > > > > > > > something similar. > > > > > > > > > > > > > > > > > > We stop producing tokens when all the buckets are full. To > > > > > > > > > limit the number of tokens in circulation, we adjust the > > > > > > > > > bucket sizes. Each peer's bucket starts at size 0, and is > > > > > > > > > reset to 0 whenever the peer reconnects. Whenever we add a > > > > > > > > > token to an empty bucket, we increase its size by 1. Thus > > > > > > > > > if a peer doesn't use the tokens we give it (ie doesn't > > > > > > > > > empty its bucket), its bucket stays small. If it uses the > > > > > > > > > tokens we give it (ie empties its bucket), we increase the > > > > > > > > > size of its bucket as long as we're not overloaded (the > > > > > > > > > bucket size is only increased when a token is handed out, > > > > > > > > > which doesn't happen when we're overloaded). > > > > > > > > > > > > > > > > ----- Anonymous at o9_0DTuZniSf_+oDmRsonByWxsI ----- 2007.05.25 > > > > > > > > - 11:45:36GMT ----- > > > > > > > > > > > > > > > > Even this brief description proposes 3 heuristics already: > > > > > > > > > > > > > > > > - "more slowly or something similar" > > > > > > > > - "adjust bucket sizes" > > > > > > > > - "stays small" > > > > > > > > > > > > > > > > No doubts complete practical implementation will involve > > > > > > > > quite more, and I see no way to ensure they will all work > > > > > > > > smoothly for all the wide variety of connection types. > > > > > > > > > > > > > > > > This also totally silences the burst problems, as well as > > > > > > > > decreasing buckets during the bursts/overload periods. Kinda > > > > > > > > extreme case: if you gave a node 5 tokens, and it decided to > > > > > > > > use them 3 minutes later when the node got overloaded, what > > > > > > > > did you gain with the tokens? That is a true Pandora box. > > > > > > > > > > > > > > > > In the end it is timing that matters, while tokens represent > > > > > > > > data amount (and if you think of a formula > > > > > > > > time*bandwidth=data, then bandwidth, generally speaking, is > > > > > > > > not constant due to highly irregular traffic from different > > > > > > > > peers and other apps sharing the same internet connection). > > > > > > > > Purely controlling the latter will always be just a rough > > > > > > > > approximation of controlling the former, alas. > > > > > > > > > > > > > > ----- mrogers at UU62+3E1vKT1k+7fR0Gx7ZN2IB0 ----- 2007.05.25 - > > > > > > > 17:20:00GMT ----- > > > > > > > > > > > > > > > I see no way to ensure they will all work smoothly for all > > > > > > > > the wide variety > > > > > > > > > > > > > > of connection types > > > > > > > > > > > > > > That may be true, but can you ensure that with the current > > > > > > > mechanism? > > > > > > > > > > > > > > > if you gave a node 5 tokens, and it decided to use them 3 > > > > > > > > minutes later when > > > > > > > > > > > > > > the node got overloaded, what did you gain with the tokens? 
> > > > > > > > > > > > > > It shouldn't ever reach that point, because when you give out 4 > > > > > > > tokens and get a burst that nearly overloads the node, you'll > > > > > > > stop increasing the size of the buckets. > > > > > > > > > > > > > > > Purely controlling the latter will always be just a rough > > > > > > > > approximation of > > > > > > > > > > > > > > controlling the former, alas. > > > > > > > > > > > > > > True, it's only a rough approximation and traffic is bursty and > > > > > > > we can't predict the future, but all those things also apply to > > > > > > > the current system. > > > > > > > > > > > > ----- Anonymous at o9_0DTuZniSf_+oDmRsonByWxsI ----- 2007.05.25 - > > > > > > 20:35:30GMT ----- > > > > > > > > > > > > - I gave 4 tokens to peer A, and 4 tokens for peer B; > > > > > > - peer A used up the token in one burst, causing local bandwidth > > > > > > overage already. > > > > > > - now if peer B decides to use up the tokens, my bandwidth is > > > > > > already flooded badly. > > > > > > > > > > > > - I gave 2 tokens (generally: very small amount) to peer A, and 2 > > > > > > tokens to peer B at this time; > > > > > > - the peer A used up both of them, but peer B does not have > > > > > > anything to request for now. > > > > > > - as result, the available bandwidth is only half-loaded until > > > > > > peer A explicitly given another few tokens; but giving away small > > > > > > amounts of tokens puts traffic overhead (often in form of small > > > > > > sized packets with highest overhead), and speed of feeding small > > > > > > packs of tokens is seriously limited by roundtrip delays. > > > > > > > > > > > > Thus nothing is really gained with tokens, and fighting that > > > > > > situation sounds fruitless, and the only reasonable approach is > > > > > > to control the roundtrip delay instead of fighting it, by > > > > > > delaying network reads as needed to shape incoming traffic. This > > > > > > gives sending node not just boolen flag can/cannot send > > > > > > immediatelly, but certain number to estimate optimal outgoing > > > > > > interpacket delay toward a particular peer. > > > > > > > > > > > > ----- mrogers at UU62+3E1vKT1k+7fR0Gx7ZN2IB0 ----- 2007.05.26 - > > > > > > 19:31:25GMT ----- > > > > > > > > > > > > You're making the jump from 0 to 4 tokens as if there are no > > > > > > intermediate stages. If 4 searches are enough to overload the > > > > > > node, then after you give 2 tokens each to A and B, you'll get a > > > > > > burst and stop increasing the size of the buckets. You should > > > > > > never reach a situation where there are more tokens in > > > > > > circulation than you can handle simultaneously. > > > > > > > > > > > > You're also confusing the number of tokens with the size of the > > > > > > buckets (probably my fault for not explaining it properly). If > > > > > > you give 2 tokens each to A and B, and A uses its tokens but B > > > > > > doesn't, you're free to give 2 more tokens to A; there's no > > > > > > reason for A to starve just because B is holding tokens. The only > > > > > > effect of B holding tokens is to limit the size of the bursts you > > > > > > can get from A to a level that will be safe even if you > > > > > > simultaneously get a burst from B. 
To avoid making these limits > > > > > > too strict, we start with small buckets and only increase the > > > > > > size of a bucket when it's empty, so idle peers are limited to > > > > > > smaller bursts than active peers, but the total burst size is > > > > > > still limited to what we can handle. > > > > > > > > > > > > Obviously we should also have TCP-style congestion control > > > > > > between peers, but that's not sufficient for network-wide load > > > > > > limiting in my opinion (eg look at the original Gnutella, which > > > > > > used TCP between peers but still had massive load limiting > > > > > > problems). > > > > > > > > > > ----- Anonymous at o9_0DTuZniSf_+oDmRsonByWxsI ----- 2007.05.27 - > > > > > 08:54:29GMT ----- > > > > > > > > > > Why you suppose each peer submits a stable flow of traffic? I might > > > > > be downloading long and time lengthy file, gaining a bunch of > > > > > tokens from you; and then as download finishes, I still possess > > > > > alot of tokens? > > > > > > > > > > As you start issuing tokens is smaller chunks, you experience > > > > > higher overhead from small token packets which failed to coalesce > > > > > within thin time frame. > > > > > > > > > > TCP style congestion control (ECN) is not between peers; its > > > > > existance is primarily targetted at ECN being directly supported by > > > > > all (most) TCP/IP routers - without ECN or with ECN not supported > > > > > by routers along the path TCP flow control enforced by packet drops > > > > > only (when roudtrip measurements happen to fail predicting real > > > > > momentarily network load). > > > > > > > > > > ----- mrogers at UU62+3E1vKT1k+7fR0Gx7ZN2IB0 ----- 2007.05.27 - > > > > > 15:06:51GMT ----- > > > > > > > > > > > Why you suppose each peer submits a stable flow of traffic? > > > > > > > > > > I don't, that's why I have frequently mentioned bursts. > > > > > > > > > > > I might be downloading long and time lengthy file, gaining a > > > > > > bunch of > > > > > > > > > > tokens from you; and then as download finishes, I still possess > > > > > alot of tokens? > > > > > > > > > > Yes, if I have previously been able to cope with large bursts of > > > > > traffic it's possible I will give you a large number of tokens. If > > > > > not, I won't. So what's the problem? > > > > > > > > > > > As you start issuing tokens is smaller chunks, you experience > > > > > > higher > > > > > > > > > > overhead from small token packets which failed to coalesce within > > > > > thin time frame. > > > > > > > > > > The problem of coalescing small messages on low-bandwidth links is > > > > > not specified to token-passing. It also applies to the current > > > > > mechanism, with its "can I send you a search?" "yes/no" messages > > > > > that would be eliminated by token-passing. > > > > > > > > > > Why you keep talking about ECN I have no idea... are you suggesting > > > > > that dropping/marking Freenet searches would have a similar effect > > > > > to dropping/marking TCP packets in ECN? I don't think the two cases > > > > > are comparable: in TCP you have a stream of packets between the > > > > > same endpoints, so it makes sense to ask the sender to slow down if > > > > > one of the routers is overloaded. In Freenet each search is routed > > > > > independently to a different location, so it makes no sense to ask > > > > > the sender to slow down if one of the nodes is overloaded - future > > > > > searches handled by that node will come from different senders > > > > > anyway. 
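Pulling the pieces of the proposed scheme together - token production slowing down as the measured delay approaches the overload point, and per-peer buckets that start at size 0 and only grow when a token lands in an empty bucket - a sketch could look like the following. The names, the 1000 ms overload point and the exact growth rule are illustrative, not a worked-out design:

    import java.util.HashMap;
    import java.util.Map;

    /** Sketch of adaptive per-peer token buckets as discussed above (illustrative only). */
    class TokenAllocator {
        private static final double MAX_DELAY_MS = 1000;   // overload point (assumed value)
        private final Map<String, Bucket> buckets = new HashMap<>();

        static class Bucket {
            int size = 0;     // starts at 0, reset to 0 when the peer reconnects
            int tokens = 0;
        }

        /** Seconds to wait before producing the next token, given the current queueing delay. */
        double tokenProductionDelaySeconds(double currentDelayMs) {
            if (currentDelayMs >= MAX_DELAY_MS) return Double.POSITIVE_INFINITY;   // overloaded: no tokens
            return MAX_DELAY_MS / (MAX_DELAY_MS - currentDelayMs);
        }

        /** Called each time we are allowed to produce one token for a peer. */
        void produceTokenFor(String peer) {
            Bucket b = buckets.computeIfAbsent(peer, p -> new Bucket());
            if (b.tokens == 0) b.size++;         // token added to an empty bucket: let the bucket grow
            if (b.tokens < b.size) b.tokens++;   // never fill past the current bucket size
        }

        void onPeerReconnected(String peer) {
            buckets.put(peer, new Bucket());     // bucket size resets to 0 on reconnect
        }
    }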
What you need to do is ask the overloaded node's neighbours > > > > > to put less load on it. If that means they can accept fewer > > > > > searches, then they should ask their neighbours to put less load on > > > > > them, and so on. > > > > > > > > > > ----- Anonymous at o9_0DTuZniSf_+oDmRsonByWxsI ----- 2007.05.27 - > > > > > 18:38:46GMT ----- > > > > > > > > > > > Yes, if I have previously been able to cope with large bursts of > > > > > > traffic > > > > > > > > > > it's possible I will give you a large number of tokens. If not, I > > > > > won't. So what's the problem? > > > > > > > > > > Once several dormant peers start sending new traffic at > > > > > approximately the same time, the tokens passing mechanism have no > > > > > even teoretical chance to avoid bandwidth overages/packets drops. > > > > > More subtle mechanisms needed in order for sending nodes to > > > > > know/decide not only about that fact that more traffic can be sent, > > > > > but also how fast that should be done if there is more than one > > > > > packet to send. Once such mechanisms are correctly implemented (and > > > > > in noticeable part they already are), I see no advantages the > > > > > explicit tokens passing could add. > > > > > > > > > > > so it makes no sense to ask the sender to slow down if one of the > > > > > > nodes is > > > > > > > > > > overloaded > > > > > > > > > > Oh, why it could be so? Requests for keys with close locations are > > > > > likely to get routed via the same overloaded node, and with peer > > > > > locations pattern I keep observing the 'closeness' might easily be > > > > > as wide as 0.4 and even more (the range of locations that prefer > > > > > the same peer). The requests sent recently (in term of hops passed, > > > > > not time) show even distribution of the keys along the 0..1 space, > > > > > so pushing 40% of traffic at a single peer and disregarding his > > > > > overload cries can not be seen as good. > > > > > > > > > > The ECN distinguishing feature is that congestion/impending > > > > > overload shared state gets maintained at all devices along the > > > > > route (including endpoints) that at least potentially do traffic > > > > > shaping (due to explicit settings or just overload). Lacking it, > > > > > both TCP and FNP have to resort downto carefully tracking > > > > > roundtrips, > > > > > scurpuluously calculating all delays at routers and final hosts in > > > > > order to react upon even minor degradations quickly (but smoothly). > > > > > FNP gets more complicated, as packet processing efforts (including > > > > > all the encryption/decryption) are not as negligible as with TCP, > > > > > but not indoable - the packets/protocol already carry the temporal > > > > > information implicitly, it just has to be handled a little bit more > > > > > careful. > > > > > > > > > > OTOH, adding explicit auxilarly traffic (be it tokens assing or > > > > > whatever) creates a certain dilemma: trying to maximize the > > > > > variable bandwidth usage without overages, the additional even > > > > > little (but for many links, not so little due to the massive > > > > > FNP+UDP+IP enveloping) overhead might turn out to be the bit that > > > > > still causes the overages (in fact, the 'bit' in extremely unluky > > > > > events flow can get up to 20% of traffic). Isn't it better to keep > > > > > using simpler protocol, maybe aiming it at 98% load if overages are > > > > > known to cause unreasonably frequent packet drops, or just believed > > > > > to be absolute evil? 
> > > > > > > > > > > > > > > On a slightly different topic, how about wireless connections like > > > > > radioethernet: high bandwidth and higher-than-average link delays > > > > > are combined with constant non-zero packets drop rate (0.1-2%, but > > > > > sometimes up to 5% in large city daytime, or in industrial area). > > > > > Similar things with satellite links, during bad weather or in polar > > > > > areas where any dish size is almost never sufficient. Explicitly > > > > > passing tokens would be dangerous because of very high bursts: > > > > > there might be 500KB data en-route for unidirectional SOHO grade > > > > > satellite link, 1MB for bidirectional one. But if outstanding > > > > > several hundreds tokens is not feasible and a little over a dozen > > > > > KB is a limit, such links will be usable at 2-5% of available > > > > > bandwidth at most. > > > > > > > > > > (You know how does TCP behave over such links? During periods of > > > > > lower drop rate it works more or less acceptably, especially if you > > > > > aren't in haste. At other times people simply get a teabrake or > > > > > find another task for the time being. Solution here could be SCTP > > > > > like protocols but they are very inmature and unpopular, and > > > > > unlikely suit freenet needs). > > > > > > > > > > ----- mrogers at UU62+3E1vKT1k+7fR0Gx7ZN2IB0 ----- 2007.05.28 - > > > > > 22:10:09GMT ----- > > > > > > > > > > > Once several dormant peers start sending new traffic at > > > > > > approximately the > > > > > > > > > > same time, the tokens passing mechanism have no even teoretical > > > > > chance to avoid bandwidth overages/packets drops. > > > > > > > > > > We're going in circles here. As I've already said, you never have > > > > > more tokens outstanding than you can handle if they come back > > > > > *simultaneously*. So I will only allow you to accumulate a large > > > > > number of tokens if I can handle all those tokens, and all other > > > > > outstanding tokens, being used at once. We don't have any magic way > > > > > of determining how many tokens we can handle at once, so we > > > > > *gradually* increase the limits until we get close to overload. > > > > > > > > > > > More subtle mechanisms needed in order for sending nodes to > > > > > > know/decide not > > > > > > > > > > only about that fact that more traffic can be sent, but also how > > > > > fast that should be done if there is more than one packet to send. > > > > > > > > > > Yes, we need a TCP-like mechanism to avoid congestion between > > > > > peers, but that's not enough to avoid overloading the peers > > > > > themselves (rather than the individual links). > > > > > > > > > > > pushing 40% of traffic at a single peer and disregarding his > > > > > > overload cries > > > > > > > > > > can not be seen as good > > > > > > > > > > When did I say we should ignore overload? I said the overloaded > > > > > node's neighbours should respond, rather than the source of the > > > > > traffic, because the overloaded node's neighbours have more control > > > > > over the amount of load it receives than a source elsewhere on the > > > > > network. > > > > > > > > > > > The ECN distinguishing feature is that congestion/impending > > > > > > overload shared > > > > > > > > > > state gets maintained at all devices along the route > > > > > > > > > > But my point was that in FNP there's no concept of a 'route' as > > > > > there is in TCP. 
A series of TCP packets will travel along (more or > > > > > less) the same route between the same endpoints. A series of FNP > > > > > requests will travel along diverging routes to completely different > > > > > endpoints. So asking the source to slow down when it hits an > > > > > overloaded node is shutting the stable door after the horse has > > > > > bolted: the source has already moved on to some other route that > > > > > might be more or less loaded than the current route. > > > > > > > > > > > Isn't it better to keep using simpler protocol > > > > > > > > > > Sorry, I don't see how the current backoff/throttling/preemptive > > > > > rejection mechanisms are simpler than token-passing. Equally > > > > > complicated, maybe, but not less. :-) > > > > > > > > > > > Solution here could be SCTP like protocols but they are very > > > > > > inmature and > > > > > > > > > > unpopular, and unlikely suit freenet needs > > > > > > > > > > Yup, when I was working on the new congestion control design last > > > > > year I considered copying DCCP or TCP Vegas to be more > > > > > loss-tolerant (and it's still a possibility, if you want to design > > > > > and document it). But in the end I decided the rest of the world > > > > > was going to carry on using TCP, no matter how unsuitable, and the > > > > > world's networks would be made suitable for TCP by hook or by crook > > > > > (or by ugly MAC-layer > > > > > retransmission hacks), so we could just copy TCP and let the layer > > > > > 2 engineers solve the problem. :-) > > > > > > > > ----- mrogers at UU62+3E1vKT1k+7fR0Gx7ZN2IB0 ----- 2007.05.28 - > > > > 22:10:09GMT ----- > > > > > > > > > Once several dormant peers start sending new traffic at > > > > > approximately the > > > > > > > > same time, the tokens passing mechanism have no even teoretical > > > > chance to avoid bandwidth overages/packets drops. > > > > > > > > We're going in circles here. As I've already said, you never have > > > > more tokens outstanding than you can handle if they come back > > > > *simultaneously*. So I will only allow you to accumulate a large > > > > number of tokens if I can handle all those tokens, and all other > > > > outstanding tokens, being used at once. We don't have any magic way > > > > of determining how many tokens we can handle at once, so we > > > > *gradually* increase the limits until we get close to overload. > > > > > > > > ----- toad at zceUWxlSaHLmvEMnbr4RHnVfehA ----- 2007.05.29 - 01:26:17GMT > > > > ----- > > > > > > > > Okay, so just like the current code, we avoid timeouts by sacrificing > > > > bandwidth - sometimes, sacrificing A LOT of bandwidth. Is that bad? > > > > > > > > ----- mrogers at UU62+3E1vKT1k+7fR0Gx7ZN2IB0 ----- 2007.05.29 - > > > > 19:47:54GMT ----- > > > > > > > > We'd be sacrificing a lot of bandwidth at startup, but eventually we > > > > should reach full speed... I guess it partly depends on how long we > > > > expect nodes to stay up. > > > > > > ----- Anonymous at o9_0DTuZniSf_+oDmRsonByWxsI ----- 2007.05.30 - > > > 11:03:00GMT ----- > > > > > > 1-2 hours at most. > > > > > > Even if freenet nodes typically stay online much much longer, daily > > > effective/alloted bandwidth fluctations make any preformance-related > > > statistics over longer periods pretty meaningless. Only stats like > > > average packet size numbers do not vary noticeably. 
> > > > ----- mrogers at UU62+3E1vKT1k+7fR0Gx7ZN2IB0 ----- 2007.06.02 - 16:44:09GMT > > ----- > > > > I've given this some more thought and realised you're right: we don't > > need to send special messages to inform our peers they have tokens. We > > can handle it all internally by delaying network reads whenever we're > > overloaded to cause the connections to block. (This will require some > > minor modifications to the new congestion control design, because it > > currently uses a huge receiver window - I'm chasing a deadline this > > weekend but I'll make the changes to the wiki next week.) > > > > I still think we should use the token idea internally, to enforce > > fairness by deciding which network connections to read from and which to > > leave blocked. But that can wait for a later stage - the current problem > > is avoiding overload, which doesn't require tokens. > > ----- toad at zceUWxlSaHLmvEMnbr4RHnVfehA ----- 2007.06.04 - 15:16:23GMT ----- > > I don't follow. Don't we want to tell the node when it can send a request, > so that it can deal with the situation where it doesn't have any tokens? > I'm really not convinced by this don't-read-from-network idea: there could > be *anything* in the incoming packet, it could be a request (in which case > we may want to delay it), it could be data returning from a request we did > (in which case we probably want to accept it, and not doing so may result > in a timeout and retry => even more load), etc. The only way to tell the > difference is to read and decode the packet.
----- Anonymous at o9_0DTuZniSf_+oDmRsonByWxsI ----- 2007.06.04 - 16:38:36GMT -----

The idea is to make a node that is working at the overload boundary look slightly slower (very smoothly), and therefore a slightly less attractive target for future traffic; conversely, if a node is underloaded its roundtrip time will drop, so peers will direct more traffic to it and spare the more loaded peers. Assuming the node does not have many more connections than recommended, even an overloaded node will still make steady progress reading each peer's packets, so timeouts should be uncommon. If a particular peer's packets are being read so slowly that there is a real danger of the peer timing out and resending them, it is probably reasonable to read [some of] that peer's packets quickly, but also to tell the peer explicitly that we are badly overloaded - either via the currently used backoff mechanism, or via instant overload notifications not tied to any particular key/request. That would not count as smooth bandwidth control, admittedly, but it should happen very rarely - certainly less often than backoffs do now.

----- toad at zceUWxlSaHLmvEMnbr4RHnVfehA ----- 2007.06.06 - 22:27:03GMT -----

We're not comparing it with the current mechanism, we're comparing it with token passing. Which has a number of major advantages: it won't cause existing transfers to time out, it will throttle requests at the point of them being sent, and it will prevent flooding (e.g. you can fit an awful lot of requests in a single 1K packet!).

----- mrogers at UU62+3E1vKT1k+7fR0Gx7ZN2IB0 ----- 2007.06.07 - 11:41:38GMT -----

True, but you don't have to process one packet at a time - you can process one FNP message at a time (see my other reply about socket-like interfaces).

----- toad at zceUWxlSaHLmvEMnbr4RHnVfehA ----- 2007.06.07 - 14:23:42GMT -----

So what exactly are you proposing? Not reading packets will result in more network load - they will be resent. What we can do is read the packets, decode them, and decide what action to take depending on the message type.

If it is a data block:
- If it is an answer to a local request, we can deal with it immediately, with no load.
- If it is an answer to a remote request, we have accepted that request already, and it has reached the rare state of actually transferring data.
- We pass it on: we are committed, the only alternative is a timeout. Which would probably result in another request - which might come back to us, or might be unsuccessful and then come back to us. Either way, we've wasted all the data sent so far, not just on this node but on all the other nodes it passed through!
- Congestion control will ensure we don't send data too quickly.
- Request-level limiting has already happened when we accepted the request. It has determined that it is viable to accept this many requests: that we didn't accept so many requests that they will have to time out because they can't be transferred fast enough to make the timeout deadline.

If it is a request, the calculation is obviously rather different. The two tools available at this point are to reject the request or to queue it. If the queue is full, we'd still have to reject it - or reject a previously queued request, but that would be mad.

All that explicit token passing does is inform the previous node of when we are likely to be able to accept a request without rejecting it, or queueing it for a long time. It propagates the queueing back towards the request source.
Which is what we want: It's a basic requirement of 0.7 load management that load must be quenched as close to the source as possible. Otherwise everyone constantly sends requests in the hope that they will shout louder than the other nodes! Using rejections to accomplish this function doesn't work, because the requests will be rerouted, and they will use even more load. Using fatal rejections doesn't work either because the client will just send more requests (certainly if it's a flooder). At the moment we use rejections and propagate them back to the source; the original client node is supposed to reduce its sending rate. This does not prevent flooding at all. Now, queueing may be a better option than rejection - but only marginally. When the queue is too long we will have to reject requests. Requests which have been on the queue for too long will also have to be rejected, and the previous node (or the client) will retry at a less overloaded node. Explicit token passing on the other hand saves a lot of time, unnecessary retries, and timeouts. ----- mrogers at UU62+3E1vKT1k+7fR0Gx7ZN2IB0 ----- 2007.06.07 - 15:05:21GMT ----- > Not reading packets will result in more network load - they will be resent. I'm talking about the FNP layer. They will be acked at the transport layer, so they won't need to be resent. Think about how it works with TCP: if you don't read from a socket the packets don't time out, a couple of packets get acked and sit in the buffer, and the sender blocks until the buffer is cleared. Likewise with TLS/SSL: the input has to be decoded one record at a time, but the application layer doesn't have to process a whole record at once, it doesn't even know about TLS records, it just sees a socket. > - We pass it on: We are committed, the only alternative is a timeout. Yes, we should do everything we can to prevent existing transfers from timing out. However, bear in mind that I'm only saying we should block when the outgoing queues are already full. Reading a data block from a socket and adding it to an overfull queue will not prevent a timeout. The only way we can prevent timeouts is by accepting new requests at a sustainable rate, and that means we sometimes have to make our peers and/or clients wait. > All that explicit token passing does is inform the previous node of when we are likely to be able to accept a request without rejecting it, or queueing it for a long time. Unfortunately it doesn't do that. I thought it did, but it doesn't, because as Anonymous pointed out, token-passing is based on how many requests we *predict* we'll be able to accept. It has the same problems as any other prediction of future load (eg the current bandwidth liability estimates): if we make a conservative prediction we don't use all the available resources, but if we make a less conservative predicition we risk getting timeouts due to natural bursts of load. How can we avoid needing to predict future load? Instead of asking "how many requests can I send you in the next minute" we could ask "can I send you a request right now". But that would require an extra round-trip per request. Luckily we don't need to ask, because we can see from the amount of data in the peer's queue whether it's overloaded. So we can avoid routing new requests to overloaded peers, as long as overloaded peers make us wait. > It's a basic requirement of 0.7 load management that load must be quenched as close to the source as possible. 
That's a good principle, and I look forward to discussing how it can be implemented in an untrusted network - I still think the token bucket ideas we discussed last year are promising. However, we shouldn't lump all of these different problems together (coalescing, congestion control, flow control, fairness). First we need to solve the problem of handling the load that exists in the network, then we can move on to the problem of deciding who gets to generate that load. ----- toad at zceUWxlSaHLmvEMnbr4RHnVfehA ----- 2007.06.07 - 15:58:57GMT ----- Again, we're not talking about "the amount of data in the peer's queue". We're talking about *the number of requests in the peer's queue*. As stated repeatedly, most messages need to be handled immediately in any case, to prevent timeouts. Only messages likely to cause significant future load and initiate a new process i.e. requests, inserts, and similar messages need to be managed at this level. Lumping all messages together makes no sense. I will not remove the current load limiting mechanisms - which DO propagate load back to sender, admittedly in an exploitable way - until we have a solution which also propagates load back to sender. However, as far as I can see, any queueing or token passing mechanism should do exactly this. What you propose is essentially a form of request queueing, correct? We can't refuse to acknowledge a request until we have completed it (because it would be resent). What we can do is track how many of our requests are in progress (including queued) on each of our peers, and not send more than a certain limit (set by the node). So what we have is something like implicit token passing: When a request completes, usually we can start another one, unless our maximum number of running requests is reduced, in which case we have to wait for another of our requests to complete. The only real difference between this and regular token passing is the assumption that we will never be granted a token because some *other* node finished a request, and the significantly smaller buckets. Are you assuming that we will never have more than one request running on a given node? I'm not sure this is reasonable - the search phase of a request uses virtually no bandwidth, but takes time; and the data source may very well be slower than the available bandwidth. And yes, we can discuss how best to adapt all these algorithms for opennet once we have some workable proposals for darknet. ----- mrogers at UU62+3E1vKT1k+7fR0Gx7ZN2IB0 ----- 2007.06.07 - 22:07:26GMT ----- > Again, we're not talking about "the amount of data in the peer's queue". We're talking about *the number of requests in the peer's queue*. You might be talking about that. I'm talking about the amount of data. > As stated repeatedly, most messages need to be handled immediately in any case, to prevent timeouts. Yes, and as stated repeatedly, I'm not suggesting we shouldn't ack the messages, I'm suggesting we should ack them and leave them in a buffer, and tell the sender not to send any more messages until the buffer has been cleared. > Only messages likely to cause significant future load and initiate a new process i.e. requests, inserts, and similar messages need to be managed at this level. Lumping all messages together makes no sense. I can see your point, but I can't see a mechanism that achieves what you want. However, I can see a mechanism that achieves something close to what you want, by lumping all messages together. 
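The "implicit token passing" described a little earlier - track how many of our requests are running (or queued) on each peer and never exceed a per-peer limit set by that node - could be as small as the sketch below; names and the starting limit are illustrative:

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    /** Sketch of a per-peer window of in-flight requests ("implicit token passing"). */
    class PeerRequestWindow {
        private final ConcurrentMap<String, Integer> inFlight = new ConcurrentHashMap<>();
        private volatile int maxPerPeer = 1;   // start conservatively; the peer may advertise a different limit

        /** Claim a slot before sending a request to this peer. */
        synchronized boolean tryStartRequest(String peer) {
            int running = inFlight.getOrDefault(peer, 0);
            if (running >= maxPerPeer) return false;   // no slot: queue it, or try the next-best peer
            inFlight.put(peer, running + 1);
            return true;
        }

        /** A completed (or failed) request frees a slot, so another can usually start. */
        synchronized void onRequestCompleted(String peer) {
            int running = inFlight.getOrDefault(peer, 0);
            inFlight.put(peer, Math.max(0, running - 1));
        }

        void setMaxPerPeer(int limit) { maxPerPeer = limit; }
    }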
If you have a better mechanism in mind, could you please describe it in detail? In particular, how do you avoid the choice between giving each peer as many tokens as you can handle in one burst, thus risking overload if several peers use their tokens at once, and dividing the number of tokens you can handle in one burst among your peers, thus risking underutilisation if not all peers use their tokens at once? > I will not remove the current load limiting mechanisms - which DO propagate load back to sender, admittedly in an exploitable way - until we have a solution which also propagates load back to sender. Fine, I'm not asking you to remove anything. I'm just saying we should design the replacement one layer at a time. We need to be able to control load before we can shape it. You can't push load back to the sender, or anywhere else, until you have a grip on it! > We can't refuse to acknowledge a request until we have completed it (because it would be resent). Right, and that's not what I'm suggesting. > The only real difference between this and regular token passing is the assumption that we will never be granted a token because some *other* node finished a request, and the significantly smaller buckets. Not sure what you mean by the assumption that we'll never be granted a token, but I agree it's similar to token-passing - in fact it's really the same thing but with buffers instead of buckets, and bytes instead of searches. I don't want to throw out the baby with the bathwater - tokens are still a good idea, it's just that the implementation I had in mind was getting more and more complex and still failing to address the problems with the current mechanism (ie the need to predict future load). If you want to limit incoming searches while allowing other traffic (which I agree is a good idea in principle), maybe we can come up with a mechanism that achieves the best of both worlds. But I honestly can't see how to do it - if we start cherry-picking messages from the incoming queue but leave searches in the queue until we have free bandwidth, then other messages will keep taking the bandwidth and the searches will sit in the queue until they time out. Classic starvation situation. > Are you assuming that we will never have more than one request running on a given node? Not at all - we can keep accepting new searches until our output queues get full. Then we stop processing incoming traffic until our output queues are no longer full. > And yes, we can discuss how best to adapt all these algorithms for opennet once we have some workable proposals for darknet. That will be ... interesting. :-) ----- toad at zceUWxlSaHLmvEMnbr4RHnVfehA ----- 2007.06.07 - 23:16:01GMT ----- As far as I can see the suggested mechanism doesn't achieve anything. In particular, it does not prevent us from starting so many requests that not only all those requests but also all our own requests and all our previously started requests will time out. We must limit load pre-emptively, by rejecting, queueing, or avoiding requests, rather than waiting for catastrophe and then doing very little to alleviate it. The original proposal for token passing limited load by limiting the number of requests. Even the current broken mechanisms limit load by limiting the number of requests. Congestion control is a separate matter - obviously we won't always be able to forward a data block immediately, but we can at least cache it. Not doing anything to limit requests will result in timeouts. 
Lots of timeouts, especially if we are flooded (flood protection being one of the main reasons for a new load management scheme). With regards to underutilisation: We will get underutilisation unless we either a) accept that we will have lots of timeouts, b) abolish timeouts altogether, or c) have such long timeouts, and such fast average data transfer, that avoiding timeouts will not prevent us using the connection more or less fully. The latter is the output bandwidth liability calculation: If we can transfer X bytes per second outbound to our requestors, a successful request takes Y bytes, and a request transfer must complete in Z seconds, then at most we can have N = X * Z / Y requests running. Now, will this result in bandwidth underutilisation? If an average request takes R bytes in Q seconds, we will use R * N / Q bytes per second; if this is greater than X, we won't undershoot. Now, lets plug some numbers in. X = usable part of output limit. Lets say 10kB/sec. Y = 32kB (approx for a CHK) Z = 90 seconds (we don't take search time into account) R = 6K (from my node for a remotely originated request) Q = 18 seconds (from my node) We can therefore have up to 10K * 90s / 32K = 28 CHK requests running at once. These will use 6KB * 28 / 18s = 4kB/sec on average, so we will underuse the bandwidth by a factor of 2.5. How to improve on this situation? Make requests more likely to succeed, make requests succeed more quickly when they do, or increase the timeout. Or accept that if all the requests succeed (which is not that rare a blip), some will timeout. Generally allowing requests to timeout is a very bad thing as they will undoubtedly be retried. Maybe we can try to adapt to this dynamically. One way to do that would be simply to have a very short term average for R? As far as starvation goes, transfers are caused by searches. The only thing that would strangle searches is direct f2f transfers, or ULPR subscription responses, and we can impose a maximum bandwidth usage on both. ----- mrogers at UU62+3E1vKT1k+7fR0Gx7ZN2IB0 ----- 2007.06.08 - 00:02:05GMT ----- > it does not prevent us from starting so many requests that not only all those requests but also all our own requests and all our previously started requests will time out. Yes it does. We read a message from the socket and process it. After processing it, if the data in the outgoing queues can be transmitted in a reasonable time, we read another message. Otherwise we wait, which causes the senders (peers and clients) to block. So we only start requests when we have spare capacity. > We must limit load pre-emptively, by rejecting, queueing, or avoiding requests, rather than waiting for catastrophe and then doing very little to alleviate it. No argument there. :-) This proposal combines queueing and avoiding: we have an outgoing queue for each peer, and if the queue gets too long we route around the peer. If all the queues get too long we stop processing incoming traffic, which will eventually cause our peers to route around us, and so on. (By "route around the peer" I mean we don't send searches to the peer. Obviously we continue any transfers that are in progress.) > We can therefore have up to 10K * 90s / 32K = 28 CHK requests running at once. These will use 6KB * 28 / 18s = 4kB/sec on average, so we will underuse the bandwidth by a factor of 2.5. I make the utilisation 9.3 kB/s, which isn't so bad. > Maybe we can try to adapt to this dynamically. One way to do that would be simply to have a very short term average for R? 
Actually I think we should have a long-term average. The bandwidth usage of the last request from neighbour X doesn't help us predict the bandwidth usage of the next request from neighbour Y, but the average bandwidth usage of the last hundred requests helps us predict the average bandwidth usage of the next hundred requests. Then we just need to be able to cope with bursts, eg a series of requests that all succeed at once. Assuming we're limited by the bandwidth limiter rather than congestion, we can drain the bandwidth bucket during a burst and refill it afterwards. But if we're limited by congestion we might get timeouts. > As far as starvation goes, transfers are caused by searches. Good point. But we can have a lot of transfers in progress at any time, so searches could still get stuck for a long time. ----- toad at zceUWxlSaHLmvEMnbr4RHnVfehA ----- 2007.06.08 - 12:59:40GMT ----- > Yes it does. We read a message from the socket and process it. After processing it, if the data in the outgoing queues can be transmitted in a reasonable time, we read another message. Otherwise we wait, which causes the senders (peers and clients) to block. So we only start requests when we have spare capacity. I don't follow. We read a request, we forward it, we read another request, we forward that, repeat for 10 seconds or so until the replies start coming in and the shit hits the fan... We could have a separate token bucket for requests based on the average bandwidth used by a request. In fact we already have that, but it's been superceded by the liability limiter. ----- mrogers at UU62+3E1vKT1k+7fR0Gx7ZN2IB0 ----- 2007.06.09 - 00:49:38GMT ----- Why does the shit hit the fan? If the queues fill up with reply traffic our peers will stop sending us new searches. I guess it's possible (though unlikely) that all the replies will arrive at once, in which case the transfers will proceed very slowly, but as long as they're making progress they shouldn't time out, right? ----- toad at zceUWxlSaHLmvEMnbr4RHnVfehA ----- 2007.06.09 - 14:09:37GMT ----- They will time out if they fail to complete within the overall transfer timeout. Currently this is set at 2 minutes (from when we get an Accepted). What you propose would result in us accepting an extremely large number of requests over the 10-20 secs it takes to find the data, then getting bogged down in replies and most requests timing out mid-transfer. ----- mrogers at UU62+3E1vKT1k+7fR0Gx7ZN2IB0 ----- 2007.06.09 - 20:37:09GMT ----- Instead of an overall timeout, would it make sense to use a shorter timeout that's reset each time a new piece of the transfer arrives? Presumably we don't want to abandon a transfer that's 90% complete just because it's progressing slowly, as long as it's still progressing. With regard to accepting more searches than we can handle: that shouldn't happen once we reach a steady state (ie the queues are nearly full most of the time), but you're right that it might be a problem in the first minute or so after we start up. We should accept input cautiously until we get an idea of our steady-state throughput. One way to do that would be to initialise the output rate averages to very conservative values. That will cause us to restrict the rate at which we accept input until the moving averages have had time to adapt. ----- toad at zceUWxlSaHLmvEMnbr4RHnVfehA ----- 2007.06.11 - 12:20:18GMT ----- Well, we want to cut it off at some point, don't we? We don't want it to start transferring and take an hour to finish! 
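For reference, re-running the liability figures quoted a few messages back (X = 10 kB/s usable output, Y = 32 kB per CHK, Z = 90 s, R = 6 kB average per request, Q = 18 s average duration):

    N = X * Z / Y = 10 kB/s * 90 s / 32 kB ≈ 28 requests running at once
    N * R / Q     = 28 * 6 kB / 18 s       ≈ 9.3 kB/s

which matches the 9.3 kB/s figure above rather than the 4 kB/s first quoted, so on those example numbers the undershoot is fairly small.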
A normal request should complete reasonably quickly i.e. within a reasonable time. And it might very well take a while to do the search and then have short gaps between blocks. I don't see how the node would adapt. The whole point is that we are accepting requests as if they had no side effects and only cost us the bandwidth involved in forwarding them. When in reality they cost us a LARGE amount of bandwidth later on. We MUST take into account the total lifespan bandwidth used by a request when deciding whether to accept it. Perceived elegance is no use if it doesn't work. ----- mrogers at UU62+3E1vKT1k+7fR0Gx7ZN2IB0 ----- 2007.06.12 - 22:52:29GMT ----- > And it might very well take a while to do the search and then have short gaps between blocks. Good point, I guess an overall timeout makes more sense in that situation. It just seems wrong to cut off a transfer that took a long time to get started but is now progressing, but I guess we have to draw the line somewhere. > We MUST take into account the total lifespan bandwidth used by a request when deciding whether to accept it. I understand your point, but we don't know the total overhead a priori, so no matter what mechanism we use for calculating how many requests we can accept, we'll have to start slowly and increase gradually. Whether we use explicit tokens, bandwidth liability estimation or blocking sockets, the mechanism will involve a feedback loop that starts at a conservative value and adapts gradually (gradually meaning 'on the order of the longest timeout'). That's the only way to discover our capacity without timeouts. If we expect peer connections to last for at least several minutes on average, we can afford to take a few minutes to reach maximum speed. During that time the first searches will have completed or timed out, and we'll have started to get a realistic idea of the total overhead of each search. That information goes back into the feedback loop. So there's really not much to choose between the various approaches as far as I can see, except that blocking sockets are simple. I don't think we'd be sacrificing anything for the sake of elegance. ----- toad at zceUWxlSaHLmvEMnbr4RHnVfehA ----- 2007.06.13 - 18:23:28GMT ----- It's the way to guarantee that we have *some* timeouts and that we use close to the maximum available bandwidth. The other alternative is to have no timeouts and use less than the available bandwidth sometimes. Which is what we do now. Anyway, what you just said assumes that we take into account the bandwidth used by requests. What you've been arguing for for the last ten messages is to NOT take it into account. ----- mrogers at UU62+3E1vKT1k+7fR0Gx7ZN2IB0 ----- 2007.06.14 - 21:32:43GMT ----- No, what I've been arguing is that we don't need to *explicitly* take the lifetime overhead into account, because the feedback loop should be slow enough that it will *implicitly* take it into account anyway. If the feedback loop takes a few minutes to adapt (which it must if we don't want to over-adapt to short-term conditions) then we're implicitly measuring the lifetime overhead of a search, and we don't need to make an additional explicit estimate of the lifetime overhead. ----- toad at zceUWxlSaHLmvEMnbr4RHnVfehA ----- 2007.06.15 - 17:12:52GMT ----- What makes you think it would work at all? It seems to me that it would simply oscillate - we accept too many requests, most of them time out, the we accept too many requests again, and most of them timeout again. 
KISS is one thing, but not slitting your own throat with occam's razor is another! ----- mrogers at UU62+3E1vKT1k+7fR0Gx7ZN2IB0 ----- 2007.06.15 - 21:44:34GMT ----- :) If the feedback loop is slow enough it shouldn't oscillate - that applies equally to bandwidth liability estimation, token passing or any other mechanism - we musn't get into the situation of saying "I accepted 100 requests in the last 10 seconds and nothing bad has happened yet, therefore I can accept another 100". As far as I can see, the only way to avoid that situation is by adapting very slowly, ie on the order of the longest timeout. Luckily we expect the connections to be relatively long-lived (minutes or hours) so we can afford to take a few minutes to get up to speed. Assuming that we have a suitably slow feedback loop in place, the next question is how to tell our peers when we're ready to accept another search. We could use tokens, preemptive rejection or blocking sockets. I don't have any religious opinions on this issue, but I thought Anonymous made a fairly good case for handling it in the same way as any other network app: don't read from the socket until you're ready to process more data. I realise there's a highly variable relationship between bytes in and bytes out, but regardless of which mechanism we use we'll have to rely on the slow feedback loop to smooth that out, because so far nobody's suggested a way of dealing with it that doesn't involve favouring some kinds of traffic over others. (I hope we agree that we don't want the whole network to end up favouring SSK traffic over CHK traffic just because it happens to come in smaller quanta. Maybe there are reasons for giving SSK traffic priority, but that isn't one of them.) ----- toad at zceUWxlSaHLmvEMnbr4RHnVfehA ----- 2007.06.18 - 14:25:49GMT ----- It's not possible using the mechanism you propose *on its own*, while it *is* interesting. Here's why: I have 48kB/sec output. With your proposed mechanism, the fact that I have 1MB/sec input won't matter. So a leaf node requests a splitfile through me, which is all in the network, but is not in my datastore. I route his requests. It takes 5 seconds for enough of the requests to start transferring to make a difference. Over that 5 seconds, he's sent me 240kB of requests. That's around 240k/50 = 4,915 requests. Which will yield 160MB of data. Unfortunately it will take me 160M/48k = nearly an hour to transfer all the data. This is not acceptable, because we want Freenet requests to complete in a reasonable amount of time - if only for fproxy support, which IMHO is an important application. And I don't like the idea of having different request priorities either; it gives away information on the traffic and allows for Bad Things. Imposing an arbitrary limit on the number of requests running is not the solution either. A limit may be imposed because of threads/memory usage, but this is likely to be heavily architecture dependant. The solution IMHO is to take into account likely future bandwidth usage. If we want to guarantee that there are no timeouts, this means bandwidth liability limiting. The current code will accept an SSK request only if it could also accept a CHK request, and vice versa, so I don't see any reason to think it is excessively biased in favour of CHKs. However if you like I will add code to collect the probability of rejection for SSKs and CHKs individually. For data blocks, clearly we cannot send one if there is no available space in the congestion window to the peer in question. 
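Spelling out the arithmetic behind the splitfile example above, with the ~50 bytes per request and 32 kB per CHK used there:

    48 kB/s * 5 s       = 240 kB of incoming requests
    240 kB / ~50 bytes  ≈ 4,900 requests
    4,900 * 32 kB       ≈ 160 MB of returning data
    160 MB / 48 kB/s    ≈ 3,400 s, i.e. close to an hour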
For data blocks, clearly we cannot send one if there is no available space in the congestion window to the peer in question. However we may want the data for ourselves, or for multiple peers, in which case the slowest peer should not necessarily dictate the transfer rate; AFAICS we must accept the data as fast as we can manage within memory/cpu/etc limits, and limit the incoming bandwidth at a higher level.

So given that we must limit traffic on the request level (as well as the congestion control level), how can we do this? We can either:

A) Wait for a request, and then either accept it or reject it, based on e.g. how many requests are running already.
   PRO:
   - Simple
   CON:
   - Wasteful of time and bandwidth when we have to ask multiple peers before one accepts
   - Difficult to adapt to propagate load back to source
   Queueing doesn't help much here because we still have to complete the request - including queueing it - in a reasonable time.

or

B) Tell our peers when they can send us a request. The obvious way to do this is to compute our overall quota of requests (or request-success-bytes), and allocate it to our peers (and tell them on every packet/response/etc), initially equally, but progressively allowing more to busier nodes and less to nodes that don't use their quota (but with diminishing returns: a node with a low quota that suddenly starts using more will take quota away from an established heavy requestor). Thus, initially, if most nodes aren't busy we won't run many requests, but as we identify those nodes which need a bigger quota, more requests run overall. Note that running fewer requests than we could isn't necessarily a problem anyway unless they complete really slowly.

How to avoid excessive misrouting? Options:

1: We don't need to queue requests, as they are queued on the next node for us (running requests can be regarded as queued requests). When we get a request, we identify the set of nodes that we are willing to route it to, and if none of them have any free request slots, we reject it.

2: Queueing requests helps, because we can match requests with nodes. When we get a request (if it's allowable by the node's current quota), we queue it. When we have a slot in an outgoing node's quota, we send the request which is closest to the node's location (which hasn't already been to that node), regardless of the time it's been queued for (if multiple nodes have space, we find the best request/node pair until they don't or we run out of requests); see the sketch below. If after a certain period a request hasn't been sent, we reject it. This avoids excessive misrouting without requiring arbitrary parameters (as the first option does), and sends more specialised requests to slower nodes.

3: Simulated queueing: remember the locations of recent requests we could have sent to a node, and calculate a threshold, which decays over time if we don't accept a request (obviously this will result in fewer requests being sent, but lower latency).
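A sketch of the matching step in option 2, assuming hypothetical Peer and QueuedRequest types (not actual Freenet classes): when an outgoing peer has a free slot in its quota, scan the queue for the request whose target location is circularly closest to that peer, skipping requests that have already visited it.

    import java.util.List;

    interface Peer { double location(); }
    interface QueuedRequest { double targetLocation(); boolean hasVisited(Peer p); }

    class RequestMatcher {
        // Distance on the keyspace circle of circumference 1.
        static double circularDistance(double a, double b) {
            double d = Math.abs(a - b);
            return Math.min(d, 1.0 - d);
        }

        // Returns the queued request best suited to this peer, or null if
        // nothing in the queue can be routed to it.
        static QueuedRequest pick(Peer peer, List<QueuedRequest> queue) {
            QueuedRequest best = null;
            double bestDistance = Double.MAX_VALUE;
            for (QueuedRequest request : queue) {
                if (request.hasVisited(peer)) continue; // never send a request back where it has been
                double d = circularDistance(request.targetLocation(), peer.location());
                if (d < bestDistance) {
                    bestDistance = d;
                    best = request;
                }
            }
            return best;
        }
    }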
Does this prevent flooding? Yes: we only accept, and offer, as many requests as we can handle, and with the right fairness algorithms, a flood will be limited to what the other peers don't need, although if our overall quota calculation is too generous flooding may still be an effective attack. Obviously on opennet, connecting to several hundred peers and flooding to each of them will be fairly effective; opennet sucks, we all know this, it's a transition phase.

Does this propagate load back to the originator? Not explicitly, as it's not an originator-throttles scheme (such schemes don't prevent flooding and may make it easier to identify the originator). However, the rate at which a node can send requests is quite clearly limited by the rate at which the rest of the network can handle them:

O - A - B - C

If B is a bottleneck, then O will only send requests at the rate that can be funneled through it (or satisfied on A).

----- mrogers at UU62+3E1vKT1k+7fR0Gx7ZN2IB0 ----- 2007.06.18 - 19:30:07GMT -----

> I have 48kB/sec output.
> It takes 5 seconds for enough of the requests to start transferring to make a difference. Over that 5 seconds, he's sent me 240kB of requests.

As I said on the mailing list, "argh". :-) You're still confusing steady-state behaviour with startup behaviour.

1) Startup behaviour: not knowing how quickly I can process searches, I accept very few. I do not accept 48 kB/s of traffic from my peers immediately - I start slowly and gradually increase during the lifetime of the first generation of searches. That means it will take at least the maximum lifetime of a search (something like three minutes?) to discover how quickly I can process searches.

2) Steady state behaviour: by the time the maximum lifetime of a search has elapsed, I should be accepting searches from my peers at the same rate older searches complete, whatever that rate happens to be. Think of it as a pipeline: by the time the first generation of searches has completed, the pipeline will be full of searches at all stages of the lifecycle, so by accepting new searches into the pipeline as space becomes available, I will implicitly be taking all stages of the lifecycle into account. I'll probably be accepting 48 kB/s of traffic from my peers at this point, but most of it will consist of existing transfers, not new searches.

Note that none of the above makes any mention of blocking sockets, tokens, preemptive rejection or any other mechanism for controlling how many searches I receive from my peers. The two issues are orthogonal: (a) deciding how many searches I can accept, and (b) limiting my peers to sending that many searches.

> Imposing an arbitrary limit on the number of requests running is not the solution either.

I agree - we have to discover how many searches we can handle, and the only way to do that is through *slow* adaptation. It has to be slow because the delay between accepting a search and paying the price is extremely long, and because the overhead is extremely variable (anywhere between about 50 bytes for a DNF and about 320 kB for a CHK insert with lots of backtracking).

> The solution IMHO is to take into account likely future bandwidth usage. If we want to guarantee that there are no timeouts, this means bandwidth liability limiting.

Great, I have nothing against that particular mechanism, as long as it doesn't favour SSKs over CHKs or requests over inserts.

> The current code will accept an SSK request only if it could also accept a CHK request, and vice versa, so I don't see any reason to think it is excessively biased in favour of CHKs. However if you like I will add code to collect the probability of rejection for SSKs and CHKs individually.

I'd be interested to see those stats, because from my reading of the code we do still accept SSKs under circumstances where we would reject CHKs. I understand the conditional +1 in shouldRejectRequest(), but further down, under "Do we have the bandwidth?", you're still checking whether there's enough bandwidth for a search of the current type, rather than for *any* type of search.
The answer to that question will be "yes" more often for SSKs than it will for CHKs, simply because they require less bandwidth. I realise that in the short term it seems stupid to reject an SSK request when we have enough bandwidth, just because we would have rejected a CHK request. But if we don't do that, we are implicitly making it harder for CHKs and inserts to succeed. A bandwidth-limited node is always operating at the margin: it saves up just enough bandwidth for an SSK request, and immediately spends it on an SSK request. So it never saves up enough bandwidth for an insert or a CHK request. No wonder Anonymous's low-bandwidth node is flooded with SSK requests - it rejects everything else! And because there are four separate throttles, the whole network sends fewer inserts and CHKs as a result. Not just to the slow node - to everywhere.

> the slowest peer should not necessarily dictate the transfer rate; AFAICS we must accept the data as fast as we can manage

Agreed, the data should be pushed onto the slow peer's queue and the transfer should be allowed to continue. We should only stop accepting data if most or all of our outgoing queues are full. This implies misrouting, just as advisory backoff currently implies misrouting: if a single peer is overloaded we act as if it's not there for the purposes of routing. If most of our peers are overloaded, we should start to accept traffic more slowly in order to push the load back.

> compute our overall quota of requests (or request-success-bytes), and allocate it to our peers (and tell them on every packet/response/etc), initially equally, but progressively allowing more to busier nodes and less to nodes that don't use their quota

Yes, this was the token-passing proposal and I still think it's a good idea. All I'm saying is that it might not be necessary to send an explicit message such as "you can send me a search now" or "don't send me any more searches for a while" or "send me searches at this rate on average", because if the transport layer blocks the sender when the receiver's buffer is full then we can send the same message by just not reading from the buffer. A nice side-effect is that if we're late reading from the buffer for any unexpected reason, e.g. if we're CPU starved by another process, the sender automatically blocks until we get back on our feet. But like I said before, I'm not married to this idea. We have to solve three problems, and I don't think it's particularly important which mechanism we use for each one as long as the problems get solved:

1) Find out, through a process of slow adaptation, how quickly we can process searches on average
2) Limit our peers to sending as many searches as we can process
3) Enforce fairness so that no single peer can use all our resources, without letting resources go unused

> How to avoid excessive misrouting?

OK, make that four problems. :-) (Fear, surprise, ruthless efficiency...)

> When we have a slot in an outgoing node's quota, we send the request which is closest to the node's location (which hasn't already been to that node), regardless of the time it's been queued for

Very interesting suggestion, but I'm a bit wary because it's hard to imagine the dynamic behaviour when searches overtake each other. For example if we have two peers at locations 0.4 and 0.5, it seems like searches for 0.45 will have a hard time making progress.
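For illustration, a sketch of the "check against any type of search" behaviour argued for above, with assumed cost constants (this is not the real shouldRejectRequest() code): even an SSK request is only accepted if there would still be room for a CHK, so the cheaper type can no longer soak up every scrap of spare bandwidth at the margin.

    // Sketch: under "Do we have the bandwidth?", require room for the most
    // expensive request type regardless of the type actually received, then
    // deduct only the actual expected cost of the request we accept.
    class FairBandwidthCheck {
        private static final int CHK_COST_BYTES = 32 * 1024; // approximate CHK data block
        private static final int SSK_COST_BYTES = 2 * 1024;  // assumed rough SSK response size

        private double bandwidthAvailable; // bytes we are still willing to commit this period

        synchronized boolean shouldAccept(boolean isSSK) {
            if (bandwidthAvailable < CHK_COST_BYTES) return false; // no room for the largest type: reject everything
            bandwidthAvailable -= isSSK ? SSK_COST_BYTES : CHK_COST_BYTES;
            return true;
        }

        synchronized void refill(double bytes) {
            bandwidthAvailable += bytes;
        }
    }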
----- toad at zceUWxlSaHLmvEMnbr4RHnVfehA ----- 2007.06.22 - 00:00:46GMT -----

Okay, so we have to determine how many searches we can accept from a given node. That I can agree with. One way to do it is to simply have AIMD on the number of searches (but only increase the window if we actually used it), and an overall calculated cap, as I outlined in another post. I believe this has been proposed at least once in the past. I don't have a big problem with output liability limiting functioning as a maximum guideline, rather than as the sole mechanism. AFAICS it doesn't favour particular types of requests. For the long-term token buckets, you are right, but these are very rarely used; output bandwidth liability is by far the dominant rejection reason on most people's nodes (unless they have traffic shaping, in which case it would be ping time). Maybe I should just delete that part of the code - or have it check for the maximum and only grab what that specific type needs?

As far as blocks go, see my other set of posts (monologues? :| ). I'm not convinced that any such scheme would solve the problems it's aimed at solving, and there really is no inherent flow control problem: we can spool everything to disk (the datastore) very comfortably. The real problem we need to deal with is that a node can cause more data to be transferred than it is able to or willing to receive - but as far as I can see that cannot easily be solved on a flow control level. By all means correct me if I'm wrong, but preferably on the other thread.

I'm not arguing for an explicit message. What I was proposing was that when we send a message to the node because e.g. a request has finished, we include its current allowed window size. This differs from traditional token passing in that in traditional token passing we'd finish a request and then send a token to a *different* node soliciting a request. Whereas here, we just tell our clients their quotas when we'd be talking to them anyway; if their quota shrinks and they exceed it we reject a request; if it increases, it probably won't increase fast enough to be a worry, though we might send an explicit notification in a packet on its own occasionally - but it won't be the rule, it'll be the exception.

> 1) Find out, through a process of slow adaptation, how quickly we can process searches on average
> 2) Limit our peers to sending as many searches as we can process
> 3) Enforce fairness so that no single peer can use all our resources, without letting resources go unused
> 4) How to avoid excessive misrouting?

These are basically the problems to be solved, right. And there are a range of proposals to deal with them. There is the additional question of whether we need end-to-end flow control or something similar on data transfer; that's a somewhat orthogonal question though from my perspective.

> > When we have a slot in an outgoing node's quota, we send the request which is closest to the node's location (which hasn't already been to that node), regardless of the time it's been queued for
>
> Very interesting suggestion, but I'm a bit wary because it's hard to imagine the dynamic behaviour when searches overtake each other. For example if we have two peers at locations 0.4 and 0.5, it seems like searches for 0.45 will have a hard time making progress.

Not sure I follow. It depends where the incoming requests' locations are, doesn't it.
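A sketch of the piggybacked-quota idea described above, using hypothetical names rather than real Freenet message classes: every packet we were going to send to a peer anyway carries that peer's current allowed window, and requests beyond the last window we advertised are simply rejected.

    // Sketch: quota updates ride on existing traffic instead of dedicated
    // token messages; an explicit notification packet would be the
    // exception, not the rule.
    class PeerRequestQuota {
        private int advertisedWindow = 4; // conservative starting window (arbitrary)
        private int requestsInFlight = 0;

        // Called whenever we build an outgoing packet to this peer for any reason.
        synchronized int windowToAdvertise(int currentWindow) {
            advertisedWindow = currentWindow;
            return advertisedWindow; // attached to the packet we were sending anyway
        }

        // Called when the peer sends us a new request.
        synchronized boolean acceptOrReject() {
            if (requestsInFlight >= advertisedWindow) return false; // peer exceeded its quota: reject
            requestsInFlight++;
            return true;
        }

        synchronized void requestFinished() {
            requestsInFlight--;
        }
    }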