[freenet-dev] Beyond New Load Management: A proposal

2011-08-30 Thread Arne Babenhauserheide
On Tuesday, 30 August 2011, 01:08:16, Arne Babenhauserheide wrote:
> 5) solution: count each SSK as only
>   average_SSK_success_rate * data_to_transfer_on_success.

Some more data: 

chances of having at least this many successful transfers for 40 SSKs with a 
mean success rate of 16%: 

for i in {0..16}; do echo $i $(./spielfaehig.py 0.16 40 $i); done

0 1.0
1 0.999064224991
2 0.99193451064
3 0.965452714478
4 0.901560126912
5 0.788987472629
6 0.634602118184
7 0.463062835467
8 0.304359825607
9 0.179664603573
10 0.0952149293922
11 0.0453494074947
12 0.0194452402752
13 0.00752109980912
14 0.0026291447461
15 0.000832100029072
16 0.00023879002726

What this means: if an SSK has a mean success rate of 0.16, then using 0.25 as 
its value makes sure that 95% of the possible cases don't exhaust the bandwidth. 
We then use only 64% of the bandwidth on average, though. With 0.2, we'd keep 
68% of the possible distributions safe and use 80% of the bandwidth on average.

Note: this is just a binomial spread: 

from math import factorial as fac

def n_choose_k(n, k):
   """Binomial coefficient: the number of ways to choose k out of n."""
   if k > n:
      return 0
   return fac(n) // (fac(k) * fac(n - k))

def binom(p, n, k):
   """Probability of exactly k successes out of n trials with success rate p."""
   return n_choose_k(n, k) * p**k * (1 - p)**(n - k)

def spielfaehig(p, n, min_spieler):
   """Probability of at least min_spieler successes out of n."""
   return sum(binom(p, n, k) for k in range(min_spieler, n + 1))
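To double-check the 95% / 64% figures above, here is a self-contained sketch 
(a recomputation, not code from the node; it uses math.comb from Python 3.8+, 
and the 0.25 weight and 40-SSK window are taken from the text):

```python
from math import comb

def p_at_least(p, n, k_min):
    # Probability of at least k_min successes out of n Bernoulli(p) trials.
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

# 40 SSKs, 16% mean success rate, each counted as 0.25 of a full transfer:
# the bandwidth budget then covers 40 * 0.25 = 10 successful transfers.
overflow = p_at_least(0.16, 40, 11)  # 11 or more successes exceed the budget
print(round(1 - overflow, 3))        # ~0.955 -> about 95% of cases stay within budget
print(0.16 / 0.25)                   # 0.64   -> 64% of the budget used on average
```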


→ USK at 6~ZDYdvAgMoUfG6M5Kwi7SQqyS-
gTcyFeaNN1Pf3FvY,OSOT4OEeg4xyYnwcGECZUX6~lnmYrZsz05Km7G7bvOQ,AQACAAE/bab/9/Content-
D426DC7.html


Best wishes, 
Arne
-- next part --
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 316 bytes
Desc: This is a digitally signed message part.
URL: 
<https://emu.freenetproject.org/pipermail/devl/attachments/20110830/d131ec81/attachment.pgp>


[freenet-dev] Queueing doesn't use any bandwidth was Re: Beyond New Load Management

2011-08-30 Thread Arne Babenhauserheide
On Tuesday, 30 August 2011, 12:32:17, Ian Clarke wrote:
> Regardless, even if queueing doesn't use additional bandwidth or CPU
> resources, it also doesn't use any less of these resources - so it doesn't
> actually help to alleviate any load (unless it results in a timeout in which
> case it uses more of everything).

Queueing reduces the total bandwidth needed to transfer a given chunk, because 
it gives the requests the leeway they need to be able to choose the best 
route. This results in shorter routes. 

Actually it is a very simple system, used in any train station: you wait 
before you get in instead of just choosing another train and trying to find a 
different way. And the fewer contacts we have, the more important it gets to 
choose the right path.

Also the increase in latency should be in the range of 20% for CHK requests 
and for SSKs which succeed. Only unsuccessful requests should have a much higher 
latency than with OLM, because they don't benefit from the faster transfers 
(shorter routes). 

Best wishes, 
Arne


[freenet-dev] Queueing doesn't use any bandwidth was Re: Beyond New Load Management

2011-08-30 Thread Ian Clarke
On Mon, Aug 29, 2011 at 6:42 PM, Matthew Toseland  wrote:

> On Monday 29 Aug 2011 18:58:26 Ian Clarke wrote:
> > On Mon, Aug 29, 2011 at 12:37 PM, Matthew Toseland <
> > Right, the same is true of queueing.  If nodes are forced to do things to
> > deal with overloading that make the problem worse then the load balancing
> > algorithm has failed.  Its job is to prevent that from happening.
>
> Not true. Queueing does not make anything worse (for bulk requests where we
> are not latency sensitive). **When a request is waiting for progress on a
> queue, it is not using any bandwidth!**
>

I thought there was some issue where outstanding requests occupied "slots"
or something?

Regardless, even if queueing doesn't use additional bandwidth or CPU
resources, it also doesn't use any less of these resources - so it doesn't
actually help to alleviate any load (unless it results in a timeout in which
case it uses more of everything).

And it does use more of one very important resource, which is the initial
requestor's time.  I mean, ultimately the symptom of overloading is that
requests take longer, and queueing makes that problem worse.

Queueing should be a last resort, the *right* load balancing algorithm
should avoid situations where queueing must occur.

Ian.

-- 
Ian Clarke
Founder, The Freenet Project
Email: ian at freenetproject.org


[freenet-dev] Beyond New Load Management

2011-08-30 Thread Ian Clarke
On Tue, Aug 30, 2011 at 11:49 AM, Robert Hailey
wrote:

> On 2011/08/29 (Aug), at 12:58 PM, Ian Clarke wrote:
>
> The problem is that we come up with solutions that are too complicated to
> analyze or fix when they don't work
>
> The cause is complexity, which just grows and grows as we try to fix
> problems we don't understand by layering on more complexity.
>
>
> And what is the cause of that? Is the problem really one of *behavior*?
> emotions? workflow? organization?
>

A combination, but at its core I think it's a failure to recognize that most
working systems are built on principles that are simple enough to understand
that they are either "obviously" correct, or can easily be proven to be
correct.  For example, AIMD, which is TCP's load management system, is
simple enough that you can describe the basic idea in a sentence.  It is
also fairly obvious that it will work, or at least it's easy to simulate it
and convince yourself that it will work.
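As an illustration (not Freenet code), the core of AIMD really does fit in a
few lines; the parameter values below are the classic TCP choices (add one
per success, halve on loss):

```python
def aimd_update(window, success, add=1.0, mult=0.5):
    """Additive Increase / Multiplicative Decrease.

    Grow the congestion window linearly while transfers succeed;
    cut it multiplicatively on any sign of overload (a lost packet,
    or in Freenet's case perhaps a rejected request).
    """
    if success:
        return window + add          # additive increase
    return max(1.0, window * mult)   # multiplicative decrease

# A sender converges toward the available capacity and backs off on loss:
w = 1.0
for outcome in [True, True, True, True, False, True, True]:
    w = aimd_update(w, outcome)
print(w)  # 4.5: grew to 5.0, halved to 2.5 on the loss, then recovered
```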

In contrast, Freenet's current load management system is a combination of
mechanisms that are perhaps individually of comparable complexity to AIMD
(in fact, AIMD is part of it although used in quite a different context to
how it is used in TCP), but together they are far more complex, and interact
in ways that are hard to predict.

For example, in a conversation with Matthew a few days ago, he realized that
the "fair sharing" mechanism, which tries to allocate resources fairly among
"downstream" nodes, was probably generating a bunch of rejected messages
when it transitions from one mode of operation to another, with all kinds of
negative consequences.  Who knows how many other unintended interactions
there are?


> Matthew has presented some very real problems which he is trying to work
> around (with much frustration, I'm sure). I think he needs more leverage
>

My criticism is that Matthew's proposals for "fixing" the problem follow
exactly the same pattern as all the previous "fixes" that turned out to have
either no effect, or a negative effect.  It starts out with a bunch of
hand-wavy hypotheses about what the problem might be, followed by a bunch of
fixes for what might be the problem.  There is no evidence that these
problems are real; I think a big part of the reason is that the existing
system is too complicated to debug.

I know it sounds like I'm beating up on Matthew here, and that isn't my
intention; he is following a precedent set by me and others who have long
since left the project.  Having (I hope) learned some tough lessons, and
recognized the folly of our past approach, I'm now hoping it's not too late
to rescue the project from an ignominious death.  I think we're talking
about weeks of work here, not months, and frankly I don't think we've got a
choice if the project is to survive.  Freenet is no use to anyone if it
doesn't work, regardless of how cool our goals are, or how clever our
algorithms and heuristics.


> If you read my suggestion below, we can discuss how it would allow:
>
> With an investment of developer time, we could separate the current freenet
> code into three interfaced sections (link-layer, routing-layer,
> user/client-interface-layer).
>
> If we then were to modify the outer layers to accept two routing-layers
> (e.g. client requests round-robin between the two but thereafter stay in
> that network) we could have "two networks in one" a stable-net (for the
> nay-sayers, a disaster/fallback, and as a control for measurement), and a
> development-net where experimentation could take place.
>
> Drawing the interface lines on theory (rather than present code-state)
> would be critical [e.g. load-balancing should be in the middle layer, imo].
> The goal being, reliable communication with near-guaranteed/methodical
> improvement
>
While I'm sure there is much room for improvement in the way the code is
architected, and the separation of concerns - I don't think refactoring is
the answer.  We need to refactor the fundamental way that we solve problems
like load balancing.

Ian.


-- 
Ian Clarke
Founder, The Freenet Project
Email: ian at freenetproject.org


[freenet-dev] Beyond New Load Management

2011-08-30 Thread Robert Hailey

On 2011/08/30 (Aug), at 3:17 AM, Thomas Bruderer wrote:

> Thank you Ian! Good message! I am 100% behind your whole post! The  
> routing must go back to a simple system!

Even tearing up the current systems for a FIFO queue is destructive  
without organization and a means of comparison.

In the science of software, there are generally two kinds of commits:

* Features - divergent, potentially disruptive, tend to contain bugs
* Bugfixes - convergent, generally repairing feature disruption,  
rarely unmask other bugs

On 2011/08/29 (Aug), at 12:58 PM, Ian Clarke wrote:

> The problem is that we come up with solutions that are too  
> complicated to analyze or fix when they don't work
> The cause is complexity, which just grows and grows as we try to fix  
> problems we don't understand by layering on more complexity.

And what is the cause of that? Is the problem really one of behavior?  
emotions? workflow? organization?

Matthew has presented some very real problems which he is trying to  
work around (with much frustration, I'm sure). I think he needs more  
leverage

If you read my suggestion below, we can discuss how it would allow:

* a full-network-sized network to try out new features & bugfixes
* nearly-identical traffic on both networks
* zero traffic redundancy (network waste)
* easy 1:1 comparison of various performances of two implementations  
(for measuring network effectiveness)
* full network uptime (at worst a totally-broken / 0% unstable-side  
would yield a 50% reduction in effectiveness)

It is simply a matter of *organization* and *multi-versioning* (which  
IMO, are both solved problems).

This is also tied into the subject of a 'mandatory' node update  
deadline, as the utility of a split network would diminish if its  
stable-side succumbs to the same 'chase' issues.

--
Robert Hailey


On 2011/08/26 (Aug), at 10:18 AM, Robert Hailey wrote:

>
> On 2011/08/25 (Aug), at 2:15 PM, Matthew Toseland wrote:
>
>> And we never, ever, ever, have enough data to evaluate a single  
>> build, even on the simplest metrics (see the push-pull tests). I  
>> could write a plugin to get more data, but digger3 promises to do  
>> it eventually and anyway I don't have time given the remaining  
>> funding and unlikeliness of getting more. And it's always been this  
>> way!
>>
>> Our whole business model forces me to just do things and not  
>> evaluate them!
>
> I think we had an idea for empirical stepwise advancement earlier.
>
> With an investment of developer time, we could separate the current  
> freenet code into three interfaced sections (link-layer, routing- 
> layer, user/client-interface-layer).
>
> If we then were to modify the outer layers to accept two routing- 
> layers (e.g. client requests round-robin between the two but  
> thereafter stay in that network) we could have "two networks in one"  
> a stable-net (for the nay-sayers, a disaster/fallback, and as a  
> control for measurement), and a development-net where  
> experimentation could take place.
>
> Drawing the interface lines on theory (rather than present code- 
> state) would be critical [e.g. load-balancing should be in the  
> middle layer, imo]. The goal being, reliable communication with near- 
> guaranteed/methodical improvement.
>
> --
> Robert Hailey
>
> ___
> Devl mailing list
> Devl at freenetproject.org
> http://freenetproject.org/cgi-bin/mailman/listinfo/devl



[freenet-dev] Beyond New Load Management

2011-08-30 Thread Thomas Bruderer

> The entire approach of coming up with hypotheses about what is wrong, 
> building a solution based on these hypotheses (without actually 
> confirming that the hypotheses are accurate) and deploying it is deja 
> vu, we've been doing it for a decade, and we still haven't got load 
> management right.  We're just layering more complexity onto a system 
> that we already don't understand, based on guesses as to what the 
> problems were with the previous iteration that we can't test because 
> the system is too complicated with too many interactions for anyone to 
> get their heads around it.
>

Thank you Ian! Good message! I am 100% behind your whole post! The 
routing must go back to a simple system!



[freenet-dev] Beyond New Load Management: A proposal

2011-08-30 Thread Arne Bab.
From: "Matthew Toseland"
>But the other question is, can queueing ever be helpful? It can if it allows 
>us to route more accurately (which NLM clearly does), and/or to run enough 
>requests in parallel that the longer time taken for the request to reach its 
>destination is offset. Is this condition met?

Experience with the deployed NLM showed that even in the fully congested case 
it had success rates of 60% for HTL 18, 17 and 16, compared to less than 40% for 
OLM. This means that the requests are sent over fewer hops on average, because 
they find the content fewer hops away from the requester.

A download of 1MiB which is sent over 2 hops needs 2 MiB in total network 
bandwidth.
If it is sent over only 1.5 hops on average, then it needs only 1.5 MiB total 
network bandwidth.
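The arithmetic above generalizes: total network bandwidth is payload size
times the average number of hops a transfer crosses. As a sketch (using the
2.0 and 1.5 hop figures from the text):

```python
def total_bandwidth_mib(payload_mib, avg_hops):
    # Every hop a block crosses consumes its size once in upload bandwidth.
    return payload_mib * avg_hops

olm = total_bandwidth_mib(1.0, 2.0)  # 2.0 MiB on the network per 1 MiB payload
nlm = total_bandwidth_mib(1.0, 1.5)  # 1.5 MiB on the network per 1 MiB payload
print(olm / nlm)  # ~1.33 -> roughly 30% more content for the same resources
```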

So essentially NLM can distribute 30% more content with the same network 
resources¹. And these numbers are actual observations. The only reason why this 
did not result in increased performance is that the nodes used less than 50% of 
their allocated bandwidth² - which is a problem with the bandwidth scheduler 
and not with queueing.

Best wishes,
Arne

¹: The relevant network resource is upload bandwidth.
²: Source: observations from me and two other freenet users.

PS: How exactly the bandwidth limiter is fixed is an implementation detail. I 
think you are actually the only person who can judge how to do this most 
efficiently.



[freenet-dev] Beyond New Load Management: A proposal

2011-08-30 Thread Arne Babenhauserheide
oblem with queueing;
> > the alternative is to allow a larger window between when we start
> > complaining and when stuff breaks, i.e. use less % of the total capacity
> >  in NLM: q(SSK) ~ q(CHK) ~ τ(CHK), τ(CHK) lower due to better
> > routes, which might be faster in practice
> >  τ(CHK) depends on the length of the route. with 25% better
> > success rates per hop, it should be much lower
> >  …need NLM stats… do you have some handy?
> >  let's estimate 60%/50%/50%/50% for HTL 18/17/16/15
> >  and I currently have 45%/50%/25%/25% with OLM
> >  starting with 1000 requests, in NLM 600 have 1 hop, 200 have 2
> > hops, 100 have 3 and 50 have 4; 50 have more → irrelevant.
> > -*- toad_ not following
> >  in OLM 450 have 1 hop, 275 have 2 hops, 69 have 3 and 51 have 4, 150
> > have more  I'm trying to estimate the hops a transfer has to
> > take  we can't ignore the 150 with more than 4 hops in OLM
> >  I'll just go down to 50, too
> >  what are you trying to compute?
> >  ian is convinced that queueing always makes the underlying
> > problem worse
> >  i'm inclined to agree with him unless you come up with a
> > persuasive theoretical argument
> >  120 have 5, 96 have 6, 77 have 7, 61 have 8, 50 have more
> >  so 95% of the transfers in OLM take on average …
> >  gah… need to divide the numbers, too
> >  (I need to generate data to make an argument - that's what I'm
> > doing right now)
> >  average hops for OLM: 450*1 + 275*2 + 69*3 + 51*4 + [now with
> > correction] 150*0.22*5+120*0.2*6+96*0.2*7+77*0.2*8+61*0.2*9
> >  ≈ 2087.4
> >  for NLM 95% of 1000 transfers need 600*1+200*2+100*3+50*4
> >  = 1500 hops together
> >  that's 2.09 hops per transfer for OLM and 1.5 hops for NLM →
> > τ_nlm / τ_olm ~ 0.71
> >  ArneBab: okay, that's plausible
> >  ArneBab: however, it should be possible with smart load limiting
> > on the originator to achieve NLM-level success rates
> >  but not the resilience
> >  it still keeps freenet open to a DoS, NLM should help there.
> >  now back to the queueing: OLM had: ⟨q⟩(SSK) ~ 16s, ⟨q⟩(CHK) ~
> > 18s, τ(CHK) ~ 45s (my stats)
> >  possibly - fair sharing limits our vulnerability to a DoS,
> > possibly enough as long as we don't have to worry about incentives
> > issues  that's about: q ≈ ⅓·τ (OLM)
> >  NLM: q ~ τ
> >  NLM: q ~ τ (NLM)
> >  time: 2·q + τ
> >  OLM: time ~ 5/3 τ_olm
> >  NLM: time = 3 × 0.72 τ_olm = 2.15 τ_olm
> >  toad_: Alright, it's alive. https://github.com/freenet/fred-
> > staging/pull/55
> >  → time_nlm / time_olm ~ 2.15 / (5/3) ~ 1.3
> >  so the time to transfer should be a bit longer
> >  (not yet finished: this is the current state)
> >  now, if we decrease the timeout time, the chance that a given
> > timeout happens in the first 4 hops should be about 4/20 = 0.2
> >  "cut that"
> >  if we decrease the timeout time below the transfer time per
> > hop, there should be more misrouting → τ goes up, q might go down or up
> > → cut that.  transfer time per hop in OLM ~ 45s / hops_olm =
> > 45s/2.09 = 21.5s  "actually, the time in NLM is so dependent
> > on transfer time, that the most efficient strategy would likely be to
> > decrease the block size"  or to get a faster network
> >  toad_: got it, damnit: NLM is so much slower than OLM, because
> > it used less bandwidth!
> >  the time is a function of the raw bandwidth (not so with OLM),
> > and NLM used only half my bandwidth after it had been deployed for 2
> > days (at the start much more)
> >  when we double the bandwidth (1.8 years?), NLM should be
> > faster than OLM
> >  operhiem1: cool!
> >  toad_: actually I think the slot number calculation is flawed
> > → less bandwidth used than possible
> >  that's why it did not break down, but slowed down to 1/5 OLM.
> > From the math here I'd have guessed 1/2.6
> >  but adding SSKs with many more hops and time almost pure queue
> > time it fits
> >  q_nlm ~ 3·⟨q⟩_olm; in the full bandwidth case
> >  but with half bandwidth we actually are at 6·q_olm
> >  → more slots should actually make it much better
> >  toad_: summary: τ ~ bandwidth. q_olm ~ 16s, q_nlm ~ τ! → using
> > only 50% of bandwidth (too few slots) massively slows down NLM.
> >  the transfer times should actually be dominant
> >  though they are lower than the queue time.
> >  and freenet should get faster with faster network or lower
> > chunk sizes.
> >  toad_: so first step: make sure all bandwidth gets used -
> > maybe by allocating more slots till about 2× the current number are
> > transferring -*- ArneBab is happy
> >  cool, lots of stuff to read tomorrow morning. :)
> >  NLM should with the current network be slower than OLM by 23%.
> > But in 18 months it should actually be faster by ~8%, given Moore's Law
> > holds for upload bandwidth.
> >  :)
> >  with faster I mean time to complete a request.
> >  reaction time ≈ latency
> >  digger3: maybe you can doublecheck the reasoning
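For reference, the hop-count arithmetic in the log can be replayed directly
(the per-hop counts, the "correction" factors, and the q ≈ ⅓·τ model are all
taken from the discussion above):

```python
# Hop distribution over 1000 requests, as estimated in the log.
olm = {1: 450, 2: 275, 3: 69, 4: 51}
# The OLM requests with more than 4 hops, spread over 5..9 hops:
olm_tail = 150*0.22*5 + 120*0.2*6 + 96*0.2*7 + 77*0.2*8 + 61*0.2*9
nlm = {1: 600, 2: 200, 3: 100, 4: 50}

olm_hops = sum(h * n for h, n in olm.items()) + olm_tail  # ~2087.4
nlm_hops = sum(h * n for h, n in nlm.items())             # 1500

tau_ratio = (nlm_hops / 1000) / (olm_hops / 1000)         # ~0.72

# Completion-time model from the log: time = 2*q + tau,
# with q ~ tau/3 for OLM and q ~ tau for NLM.
olm_time = 5 / 3              # in units of tau_olm
nlm_time = 3 * tau_ratio      # in units of tau_olm
print(round(nlm_time / olm_time, 2))  # ~1.3, i.e. ~30% longer per request
```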
--
A man in the streets faces a knife.
Two policemen are there at once. They raise a sign:

"Illegal scene! No one may watch this!"

The man gets robbed and stabbed and bleeds to death.
The police had to hold the sign.

"Welcome to Europe, citizen. Censorship is beautiful."

   ( http://draketo.de/stichwort/censorship )




[freenet-dev] Queueing doesn't use any bandwidth was Re: Beyond New Load Management

2011-08-30 Thread Matthew Toseland
On Monday 29 Aug 2011 18:58:26 Ian Clarke wrote:
> On Mon, Aug 29, 2011 at 12:37 PM, Matthew Toseland <
> toad at amphibian.dyndns.org> wrote:

> > Misrouting is unacceptable, in general. Extremely overloaded or extremely
> > low capacity nodes may be routed around. We might even allow some bounded
> > amount of misrouting in the more general case (e.g. go to either of the top
> > two peers for the key). But in general, transforming load into misrouting
> > (or into reduced HTL, or any other bogus escape valve) is a bad idea. We
> > need to reduce the incoming load.
> 
> Right, the same is true of queueing.  If nodes are forced to do things to
> deal with overloading that make the problem worse then the load balancing
> algorithm has failed.  Its job is to prevent that from happening.

Not true. Queueing does not make anything worse (for bulk requests where we are 
not latency sensitive). **When a request is waiting for progress on a queue, it 
is not using any bandwidth!**


[freenet-dev] Beyond New Load Management: A proposal

2011-08-30 Thread Matthew Toseland


[freenet-dev] Beyond New Load Management: A proposal

2011-08-30 Thread Matthew Toseland


Re: [freenet-dev] Beyond New Load Management

2011-08-30 Thread Thomas Bruderer




Re: [freenet-dev] Beyond New Load Management

2011-08-30 Thread Robert Hailey


On 2011/08/30 (Aug), at 3:17 AM, Thomas Bruderer wrote:

> Thank you Ian! Good message! I am 100% behind your whole post! The
> routing must go back to a simple system!


Even tearing up the current systems for a FIFO queue is destructive
without organization and a means of comparison.


In the science of software, there are generally two kinds of commits:

* Features - divergent, potentially disruptive, tend to contain bugs
* Bugfixes - convergent, generally repairing feature disruption,
rarely unmasking other bugs


On 2011/08/29 (Aug), at 12:58 PM, Ian Clarke wrote:

> The problem is that we come up with solutions that are too
> complicated to analyze or fix when they don't work
> The cause is complexity, which just grows and grows as we try to fix
> problems we don't understand by layering on more complexity.


And what is the cause of that? Is the problem really one of behavior?  
emotions? workflow? organization?


Matthew has presented some very real problems which he is trying to
work around (with much frustration, I'm sure). I think he needs more
leverage.


If you read my suggestion below, we can discuss how it would allow:

* a full-network-sized network to try out new features & bugfixes
* nearly-identical traffic on both networks
* zero traffic redundancy (network waste)
* easy 1:1 comparison of various performances of two implementations  
(for measuring network effectiveness)
* full network uptime (at worst a totally-broken / 0% unstable-side  
would yield a 50% reduction in effectiveness)


It is simply a matter of *organization* and *multi-versioning* (which,
IMO, are both solved problems).


This is also tied into the subject of the 'mandatory' node update
deadline, as the utility of a split network would diminish if its
stable-side succumbs to the same 'chase' issues.


--
Robert Hailey


On 2011/08/26 (Aug), at 10:18 AM, Robert Hailey wrote:



On 2011/08/25 (Aug), at 2:15 PM, Matthew Toseland wrote:

> And we never, ever, ever, have enough data to evaluate a single
> build, even on the simplest metrics (see the push-pull tests). I
> could write a plugin to get more data, but digger3 promises to do
> it eventually and anyway I don't have time given the remaining
> funding and unlikeliness of getting more. And it's always been this
> way!
>
> Our whole business model forces me to just do things and not
> evaluate them!


I think we had an idea for empirical stepwise advancement earlier.

With an investment of developer time, we could separate the current
freenet code into three interfaced sections (link-layer,
routing-layer, user/client-interface-layer).


If we then were to modify the outer layers to accept two
routing-layers (e.g. client requests round-robin between the two but
thereafter stay in that network) we could have two networks in one: a
stable-net (for the nay-sayers, a disaster/fallback, and as a control
for measurement), and a development-net where experimentation could
take place.


Drawing the interface lines on theory (rather than present code-state)
would be critical [e.g. load-balancing should be in the middle layer,
imo]. The goal being, reliable communication with
near-guaranteed/methodical improvement.
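The split proposed above could look something like the following sketch. All class names here are hypothetical, invented for illustration; nothing like them exists in the actual Freenet codebase:

```python
from abc import ABC, abstractmethod
from itertools import cycle


class RoutingLayer(ABC):
    """Middle layer: everything between the link layer and the client layer."""

    @abstractmethod
    def route(self, key):
        ...


class StableRouting(RoutingLayer):
    """The proven fallback network (control group for measurement)."""

    def route(self, key):
        return ("stable", key)


class ExperimentalRouting(RoutingLayer):
    """The development network where new features are tried out."""

    def route(self, key):
        return ("experimental", key)


class ClientInterface:
    """Round-robins *new* requests between the two routing layers;
    once a request has entered a network, it stays there."""

    def __init__(self, layers):
        self._rr = cycle(layers)
        self._pinned = {}  # request id -> routing layer it entered

    def submit(self, request_id, key):
        if request_id not in self._pinned:
            self._pinned[request_id] = next(self._rr)
        return self._pinned[request_id].route(key)
```

With near-identical traffic fed to both halves, comparing success rates and latencies between the two `route` implementations would give the 1:1 comparison Robert asks for.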


--
Robert Hailey


Re: [freenet-dev] Beyond New Load Management

2011-08-30 Thread Ian Clarke
On Tue, Aug 30, 2011 at 11:49 AM, Robert Hailey
rob...@freenetproject.org wrote:

> On 2011/08/29 (Aug), at 12:58 PM, Ian Clarke wrote:
>
>> The problem is that we come up with solutions that are too complicated to
>> analyze or fix when they don't work
>>
>> The cause is complexity, which just grows and grows as we try to fix
>> problems we don't understand by layering on more complexity.
>
> And what is the cause of that? Is the problem really one of *behavior*?
> emotions? workflow? organization?


A combination, but at its core I think it's a failure to recognize that most
working systems are built on principles that are simple enough to understand
that they are either obviously correct, or can easily be proven to be
correct.  For example, AIMD, which is TCP's load management system, is
simple enough that you can describe the basic idea in a sentence.  It is
also fairly obvious that it will work, or at least it's easy to simulate it
and convince yourself that it will work.
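AIMD really is compact enough to sketch in a few lines. A toy illustration (not TCP's or Freenet's actual code): grow the allowed window by a constant on each success, cut it multiplicatively on congestion.

```python
class AimdLimiter:
    """Additive-increase/multiplicative-decrease: raise the allowed
    in-flight window by a constant on success, halve it on congestion."""

    def __init__(self, increase=1.0, decrease=0.5, floor=1.0):
        self.window = floor        # current allowed in-flight requests
        self.increase = increase   # added per successful round trip
        self.decrease = decrease   # multiplier applied on overload signal
        self.floor = floor         # never drop below one request

    def on_success(self):
        self.window += self.increase

    def on_congestion(self):
        self.window = max(self.floor, self.window * self.decrease)
```

Senders probe upward until they see loss, then back off; that sawtooth is the whole one-sentence idea, and it is simple enough to simulate and convince yourself it converges.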

In contrast, Freenet's current load management system is a combination of
mechanisms that are perhaps individually of comparable complexity to AIMD
(in fact, AIMD is part of it although used in quite a different context to
how it is used in TCP), but together they are far more complex, and interact
in ways that are hard to predict.

For example, in a conversation with Matthew a few days ago, he realized that
the fair sharing mechanism, which tries to allocate resources fairly among
downstream nodes, was probably generating a bunch of rejected messages
when it transitions from one mode of operation to another, with all kinds of
negative consequences.  Who knows how many other unintended interactions
there are?


> Matthew has presented some very real problems which he is trying to work
> around (with much frustration, I'm sure). I think he needs more leverage.


My criticism is that Matthew's proposals for fixing the problem follow
exactly the same pattern as all the previous fixes that turned out to have
either no effect, or a negative effect.  It starts out with a bunch of
hand-wavey hypotheses about what the problem might be, which is followed by
a bunch of fixes for what might be the problem.  There is no evidence that
these problems are real; I think a big part of the reason is that the
existing system is too complicated to debug.

I know it sounds like I'm beating up on Matthew here, and that isn't my
intention, he is following a precedent set by me and others that have long
since left the project.  Having (I hope) learned some tough lessons, and
recognized the folly in our past approach, I'm now hoping it's not too late
to rescue the project from an ignominious death.  I think we're talking
about weeks of work here, not months, and frankly I don't think we've got a
choice if the project is to survive.  Freenet is no use to anyone if it
doesn't work, regardless of how cool our goals are, or how clever our
algorithms and heuristics.


> If you read my suggestion below, we can discuss how it would allow:
>
> With an investment of developer time, we could separate the current freenet
> code into three interfaced sections (link-layer, routing-layer,
> user/client-interface-layer).
>
> If we then were to modify the outer layers to accept two routing-layers
> (e.g. client requests round-robin between the two but thereafter stay in
> that network) we could have two networks in one: a stable-net (for the
> nay-sayers, a disaster/fallback, and as a control for measurement), and a
> development-net where experimentation could take place.
>
> Drawing the interface lines on theory (rather than present code-state)
> would be critical [e.g. load-balancing should be in the middle layer, imo].
> The goal being, reliable communication with near-guaranteed/methodical
> improvement.

While I'm sure there is much room for improvement in the way the code is
architected, and the separation of concerns, I don't think refactoring is
the answer.  We need to refactor the fundamental way that we solve problems
like load balancing.

Ian.


-- 
Ian Clarke
Founder, The Freenet Project
Email: i...@freenetproject.org

Re: [freenet-dev] Queueing doesn't use any bandwidth was Re: Beyond New Load Management

2011-08-30 Thread Ian Clarke
On Mon, Aug 29, 2011 at 6:42 PM, Matthew Toseland t...@amphibian.dyndns.org
 wrote:

> On Monday 29 Aug 2011 18:58:26 Ian Clarke wrote:
>> On Mon, Aug 29, 2011 at 12:37 PM, Matthew Toseland
>> Right, the same is true of queueing.  If nodes are forced to do things to
>> deal with overloading that make the problem worse then the load balancing
>> algorithm has failed.  Its job is to prevent that from happening.
>
> Not true. Queueing does not make anything worse (for bulk requests where we
> are not latency sensitive). **When a request is waiting for progress on a
> queue, it is not using any bandwidth!**


I thought there was some issue where outstanding requests occupied slots
or something?

Regardless, even if queueing doesn't use additional bandwidth or CPU
resources, it also doesn't use any less of these resources - so it doesn't
actually help to alleviate any load (unless it results in a timeout in which
case it uses more of everything).

And it does use more of one very important resource, which is the initial
requestor's time.  I mean, ultimately the symptom of overloading is that
requests take longer, and queueing makes that problem worse.

Queueing should be a last resort, the *right* load balancing algorithm
should avoid situations where queueing must occur.

Ian.

-- 
Ian Clarke
Founder, The Freenet Project
Email: i...@freenetproject.org

Re: [freenet-dev] Queueing doesn't use any bandwidth was Re: Beyond New Load Management

2011-08-30 Thread Arne Babenhauserheide
Am Dienstag, 30. August 2011, 12:32:17 schrieb Ian Clarke:
> Regardless, even if queueing doesn't use additional bandwidth or CPU
> resources, it also doesn't use any less of these resources - so it doesn't
> actually help to alleviate any load (unless it results in a timeout in which
> case it uses more of everything).

Queueing reduces the total bandwidth needed to transfer a given chunk, because
it gives the requests the leeway they need to be able to choose the best
route. This results in shorter routes.

Actually it is a very simple system, used at any train station: you
wait before you get in, instead of just choosing another train and trying to
find a different way. And the fewer contacts we have, the more important it
becomes to choose the right path.

Also the increase in latency should be in the range of 20% for CHK requests
and SSKs which succeed. Only unsuccessful requests should have a much higher
latency than with OLM, because they don’t benefit from the faster transfers
(shorter routes).
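As a toy cost model of the tradeoff being argued here (all numbers invented for illustration): queue time costs only latency, while every extra hop from rerouting immediately over a worse peer costs both time and transferred bytes.

```python
CHUNK_KiB = 32  # size of one transferred block

def request_cost(hops, wait_s, s_per_hop=2.0):
    """Return (latency_s, bandwidth_KiB): queue time adds only latency,
    while every hop costs both transfer time and transferred bytes."""
    return wait_s + hops * s_per_hop, hops * CHUNK_KiB

queued   = request_cost(hops=3, wait_s=10)  # waited for the best route
rerouted = request_cost(hops=5, wait_s=0)   # took a longer route at once
```

Under these made-up numbers the queued request finishes later (16 s vs 10 s) but moves fewer total bytes through the network (96 KiB vs 160 KiB), which is exactly the latency-for-bandwidth trade Arne describes.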

Best wishes,
Arne


Re: [freenet-dev] Beyond New Load Management: A proposal

2011-08-30 Thread Arne Babenhauserheide
Am Dienstag, 30. August 2011, 01:08:16 schrieb Arne Babenhauserheide:
> 5) solution: count each SSK as only
>   average_SSK_success_rate * data_to_transfer_on_success.

Some more data:

chances of having at least this many successful transfers for 40 SSKs with a
mean success rate of 16%:

for i in {0..16}; do echo $i $(./spielfaehig.py 0.16 40 $i); done

0 1.0
1 0.999064224991
2 0.99193451064
3 0.965452714478
4 0.901560126912
5 0.788987472629
6 0.634602118184
7 0.463062835467
8 0.304359825607
9 0.179664603573
10 0.0952149293922
11 0.0453494074947
12 0.0194452402752
13 0.00752109980912
14 0.0026291447461
15 0.000832100029072
16 0.00023879002726

What this means: if an SSK has a mean success rate of 0.16, then using 0.25 as
its value makes sure that 95% of the possible cases don’t exhaust the bandwidth.
We then use only 64% of the bandwidth on average, though. With 0.2, we’d get
68% of the possible distributions safe and use 80% of bandwidth on average.

Note: this is just the binomial distribution:

from math import factorial
fac = factorial
def nük(n, k):
    if k > n: return 0
    return fac(n) // (fac(k)*fac(n-k))

def binom(p, n, k):
    return nük(n, k) * p**k * (1-p)**(n-k)

def spielfähig(p, n, min_spieler):
    return sum([binom(p, n, k) for k in range(min_spieler, n+1)])
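The table and the value-choice tradeoff above can be reproduced with a self-contained sketch using `math.comb` (Python 3.8+). Here the per-SSK value v is read as reserving room for n·v transfers, which is one plausible reading of the post; the exact "safe" percentages depend on whether hitting the budget exactly still counts as safe, so the printed figures land near, not exactly on, the 95%/68% quoted above.

```python
from math import comb

def tail(p, n, k_min):
    """P(at least k_min successes) for a Binomial(n, p) variable."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(k_min, n + 1))

p, n = 0.16, 40                        # mean SSK success rate, SSKs in flight
for v in (0.25, 0.2):
    budget = int(n * v)                # transfers the value reserves room for
    safe = 1 - tail(p, n, budget + 1)  # chance we stay at or below the budget
    avg_use = p / v                    # average share of the reservation used
    print(f"value {v}: {safe:.0%} of cases safe, {avg_use:.0%} bandwidth used")
```

The average-utilisation figures follow directly from p/v: 0.16/0.25 = 64% and 0.16/0.2 = 80%, matching the post.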


→ USK@6~ZDYdvAgMoUfG6M5Kwi7SQqyS-
gTcyFeaNN1Pf3FvY,OSOT4OEeg4xyYnwcGECZUX6~lnmYrZsz05Km7G7bvOQ,AQACAAE/bab/9/Content-
D426DC7.html


Best wishes,
Arne
