[freenet-dev] Beyond New Load Management: A proposal

2011-08-29 Thread Matthew Toseland
we do about it? We would have to reject requests,
> > and if we're not sure, or if it could be a downstream troublemaker, couldn't
> > that result in misrouting and make things worse? On the other hand, any
> > malicious node would be most easily detected by the nodes directly connected
> > to it, so this could work much the same way as fair sharing - provided that
> > it can reject or terminate the requests.
> 
> No, we probably wouldn't start rejecting requests - more likely we would
> disconnect from that node.  Obviously we'd need to be fairly sure of abusive
> behavior before doing that - but abuse should be rare.

Well then we'd need to be very sure, and this means you could get away with a 
lot before triggering it. Weird topologies, for instance, could cause this sort 
of problem on a temporary basis.
> 
> > Arguably fair sharing achieves the same objective. It could be extended in
> > various ways e.g. if we are relaying a lot of slow-down notices to a peer we
> > could start to think about squeezing the peer's allocation.
> 
> Yes, except it doesn't work, and it's too complicated to reason about.

What makes you think it doesn't work? What makes you think it's complicated 
either?
> 
> > > Of course, lots of questions:
> > >
> > >- How do we compute the averages?  A decaying running average of some
> > >kind?  What time period?
> >
> > Yay more tunables. :(
> 
> Ideally we'd do this in a way that avoids tunables.  This is control theory,
> we shouldn't have to reinvent the wheel.

Control theory doesn't apply because we are not dealing with feedback going to 
a single system, it's thousands of nodes, which all have separate counters.
> 
> > >- How do we translate load into a desired rate of requests?
> >
> > And even more tunables. :(
> 
> Don't be pessimistic, hopefully we can avoid tunables if we are smart about
> it.
> 
> > >- What criteria indicate that a peer is abusing the system?  What is
> > the
> > >remedy?
> >
> > In the current system, don't accept their requests.
> 
> Yes, although I'm leaning towards disconnecting from them, which is
> more-or-less the same thing, just more permanent and they don't consume our
> resources any more.

Only if we can be really sure. And I'm not sure how realistic that is.
> 
> > In ANY system, there will always be a borderline below which they can get
> > away with it. We can either try to make this as small as possible or try not
> > to hit legitimate traffic too hard.
> 
> Abuse should be rare, we should bias against false positives when we look
> for it.

Which means that an abuser can stay below the threshold for a long time on a 
lot of nodes ...

IMHO telling the difference could be difficult.
> 
> > > This is basically control theory stuff, and we'll need robust answers to
> > > these questions before we proceed (ideally with a theoretical rather than
> > > experimental foundation).
> >
> > IMHO small tweaks to AIMDs, combined with fair sharing, will achieve much
> > the same without having to wait years for our theoreticians to wake up.
> 
> Yes, small tweaks have worked so well for us for the last decade, leaving us
> pretty much where we were in 2003.  No, we don't understand how the current
> system works, there is no point in trying to fix something when we don't
> even know what is broken.

I think we have a much better understanding of it than we have had until 
recently: All forms of load management tried so far transmute incoming load 
into something bad - misrouting, queueing delays, reduced effective HTL. We 
need to reduce the incoming load, WITHOUT these things. The simplest 
implementation of this concept is to send non-local RejectedOverload's when we 
are *close* to the limits, not only local ones when we are *at* the limits 
(which are then relayed as non-local); in other words, to send a slow-down 
message *before* it's too late.
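
A minimal sketch of that early-signal rule (names and thresholds here are 
assumptions for illustration, not fred's actual code):

    // Sketch only: accept-but-signal below the hard limit.
    final class EarlySlowDown {
        private static final double SOFT_LIMIT = 0.8; // assumed: start sending non-local slow-downs
        private static final double HARD_LIMIT = 1.0; // at the limit: local RejectedOverload

        enum Decision { ACCEPT, ACCEPT_AND_SIGNAL_SLOW_DOWN, REJECT_LOCALLY }

        /** Decide how to handle an incoming request at the current load fraction. */
        Decision onIncomingRequest(int runningRequests, int capacity) {
            double load = (double) runningRequests / capacity;
            if (load >= HARD_LIMIT) return Decision.REJECT_LOCALLY;
            if (load >= SOFT_LIMIT) return Decision.ACCEPT_AND_SIGNAL_SLOW_DOWN;
            return Decision.ACCEPT;
        }
    }
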
> 
> I am touched by your faith in "theoreticians", but I have a feeling we're in
> uncharted territory here.  We need to agree on an approach that is simple
> enough to reason about, and then build a simulation for it.  I think the
> simulation should just be a few days of work, we're not talking years here,
> hopefully not even months.

Weeks probably - and simulations themselves tend to have tunables. Anyway if I 
start writing simulations people get mad because I'm not fixing bugs.


[freenet-dev] Beyond New Load Management: A proposal

2011-08-29 Thread Matthew Toseland
and 1.5 hops for NLM → σ_nlm / σ_olm ~ 0.71
>  ArneBab: okay, that's plausible
>  ArneBab: however, it should be possible with smart load limiting on 
> the originator to achieve NLM-level success rates
>  but not the resilience
>  it still keeps freenet open to a DoS, NLM should help there.
>  now back to the queueing: OLM had: ⟨q⟩(SSK) ~ 16s, ⟨q⟩(CHK) ~ 18s, 
> σ(CHK) ~ 45s (my stats)
>  possibly - fair sharing limits our vulnerability to a DoS, possibly 
> enough as long as we don't have to worry about incentives issues
>  that's about: q = ⅓ × σ (OLM)
>  NLM: q ~ σ
>  NLM: q ~ σ (NLM)
>  time: 2×q + σ
>  OLM: time ~ 5/3 σ_olm
>  NLM: time = 3 × 0.72 σ_olm = 2.15 σ_olm
>  toad_: Alright, it's alive. https://github.com/freenet/fred-staging/pull/55
>  → time_nlm / time_olm ~ 2.15 / (5/3) ~ 1.3
>  so the time to transfer should be a bit longer
>  (not yet finished: this is the current state)
>  now, if we decrease the timeout time, the chance that a given 
> timeout happens in the first 4 hops should be about 4/20 = 0.2
>  “cut that”
>  if we decrease the timeout time below the transfer time per hop, 
> there should be more misrouting → σ goes up, q might go down or up → cut that.
>  transfer time per hop in OLM ~ 45s / hops_olm = 45s/2.09 = 21.5s
>  “actually, the time in NLM is so dependent on transfer time, that 
> the most efficient strategy would likely be to decrease the block size”
>  or to get a faster network
>  toad_: got it, damnit: NLM is so much slower than OLM, because it 
> used less bandwidth!
>  the time is a function of the raw bandwidth (not so with OLM), and 
> NLM used only half my bandwidth after it had been deployed for 2 days (at the 
> start much more)
>  when we double the bandwidth (1.8 years?), NLM should be faster 
> than 
> OLM
>  operhiem1: cool!
>  toad_: actually I think the slot number calculation is flawed → less 
> bandwidth used than possible
>  that's why it did not break down, but slowed down to 1/5 OLM. From 
> the math here I'd have guessed 1/2.6
>  but adding SSKs with many more hops and time almost pure queue time 
> it fits
>  q_nlm ~ 3×⟨q⟩_olm; in the full bandwidth case
>  but with half bandwidth we actually are at 6×q_olm
>  → more slots should actually make it much better
>  toad_: summary: σ ~ bandwidth. q_olm ~ 16s, q_nlm ~ σ! → using only 
> 50% of bandwidth (too few slots) massively slows down NLM.
>  the transfer times should actually be dominant
>  though they are lower than the queue time. 
>  and freenet should get faster with faster network or lower chunk 
> sizes.
>  toad_: so first step: make sure all bandwidth gets used - maybe by 
> allocating more slots till about 2× the current number are transferring
> -*- ArneBab is happy
>  cool, lots of stuff to read tomorrow morning. :)
>  NLM should, with the current network, be slower than OLM by 23%. But 
> in 18 months it should actually be faster by ~8%, given Moore's Law holds for 
> upload bandwidth.
>  :)
>  with faster I mean time to complete a request.
>  reaction time ≈ latency
>  digger3: maybe you can doublecheck the reasoning
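
Restating the timing estimate from the log above in one place (using the log's 
own assumptions: q ≈ σ/3 under OLM, q ≈ σ under NLM, and σ_nlm ≈ 0.72 σ_olm):

    t = 2q + \sigma
    t_{olm} \approx 2 \cdot \tfrac{1}{3}\sigma_{olm} + \sigma_{olm} = \tfrac{5}{3}\,\sigma_{olm}
    t_{nlm} \approx 2\sigma_{nlm} + \sigma_{nlm} = 3 \times 0.72\,\sigma_{olm} \approx 2.15\,\sigma_{olm}
    t_{nlm} / t_{olm} \approx 2.15 / (5/3) \approx 1.3
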


[freenet-dev] Beyond New Load Management: A proposal

2011-08-29 Thread Arne Babenhauserheide
freenet should get faster with faster network or lower chunk 
sizes.
 toad_: so first step: make sure all bandwidth gets used - maybe by 
allocating more slots till about 2× the current number are transferring
-*- ArneBab is happy
 cool, lots of stuff to read tomorrow morning. :)
 NLM should, with the current network, be slower than OLM by 23%. But 
in 18 months it should actually be faster by ~8%, given Moore's Law holds for 
upload bandwidth.
 :)
 with faster I mean time to complete a request.
 reaction time ≈ latency
 digger3: maybe you can doublecheck the reasoning


--
Konstruktive Kritik (constructive criticism): 

- http://draketo.de/licht/krude-ideen/konstruktive-kritik



[freenet-dev] Beyond New Load Management: A proposal

2011-08-29 Thread Matthew Toseland
On Monday 29 Aug 2011 20:09:58 Matthew Toseland wrote:
> On Monday 29 Aug 2011 19:28:50 Ian Clarke wrote:
> > Ok, thinking about it, here is a proposal, or  rather, the beginning of a
> > proposal.  I'm assuming that we get rid of NLM, fair sharing, and anything
> > else intended to control load, and replace it with this.  We will absolutely
> > need to simulate this before we write a single line of code to deploy it.
> 
> I have justified fair sharing in my other reply. However alternate mechanisms 
> might replace it.
> 
> I agree we should simulate, although this might be significant work.
> > 
> > The core idea is that a node will include a floating point number in
> > response to any kind of request showing how close that node is to being
> > overloaded.  0.0 would mean it's doing nothing, 1.0 would mean that it is
> > completely saturated and must reject requests.  Clearly the goal is to avoid
> > getting anywhere near to 1.0.
> > 
> > A node tracks several things:
> 
> All of these apply to both locally originated requests and remotely 
> originated requests? And are they for only our direct peers?
> > 
> >- What is the overall average load reported by responses this node has
> >received
> >- What is the average load reported by responses this node has received,
> >per remote node
> >- What is the average load reported by responses this node forwarded, per
> >remote node
> 
> Ahhh, this one could be interesting - you could use it to penalise nodes 
> which spam excessively.

Actually, thinking about it, I don't think this will work. Requests are usually 
distributed evenly across the keyspace, and downstream nodes only know they are 
from this node - not which previous node they are from. So the pain will be 
shared across all the peers sending us requests, including our own local 
requests, and won't be identifiable as due to any single node. You have to do 
it by volume, not by reported load.
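
If it were done by volume, the bookkeeping could be as simple as a decayed 
per-peer request count compared against an equal share - a sketch; the decay 
factor and the ×3 margin are assumptions, chosen to bias against false 
positives as discussed:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Sketch only: abuse detection by request volume, not downstream-reported load.
    final class PeerVolumeTracker {
        private final Map<String, Double> decayedCount = new ConcurrentHashMap<>();
        private final double decay = 0.99; // applied once per tick; needs tuning

        void onRequestAccepted(String peerId) {
            decayedCount.merge(peerId, 1.0, Double::sum);
        }

        void tick() { // call periodically to age out old traffic
            decayedCount.replaceAll((peer, count) -> count * decay);
        }

        /** True only if this peer sends far more than an equal share of volume. */
        boolean looksAbusive(String peerId, int peerCount) {
            double total = decayedCount.values().stream()
                    .mapToDouble(Double::doubleValue).sum();
            double share = decayedCount.getOrDefault(peerId, 0.0);
            return total > 0 && share > 3.0 * (total / peerCount);
        }
    }
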


[freenet-dev] Beyond New Load Management: A proposal

2011-08-29 Thread Matthew Toseland
On Monday 29 Aug 2011 19:28:50 Ian Clarke wrote:
> Ok, thinking about it, here is a proposal, or  rather, the beginning of a
> proposal.  I'm assuming that we get rid of NLM, fair sharing, and anything
> else intended to control load, and replace it with this.  We will absolutely
> need to simulate this before we write a single line of code to deploy it.

I have justified fair sharing in my other reply. However alternate mechanisms 
might replace it.

I agree we should simulate, although this might be significant work.
> 
> The core idea is that a node will include a floating point number in
> response to any kind of request showing how close that node is to being
> overloaded.  0.0 would mean it's doing nothing, 1.0 would mean that it is
> completely saturated and must reject requests.  Clearly the goal is to avoid
> getting anywhere near to 1.0.
> 
> A node tracks several things:

All of these apply to both locally originated requests and remotely originated 
requests? And are they for only our direct peers?
> 
>- What is the overall average load reported by responses this node has
>received
>- What is the average load reported by responses this node has received,
>per remote node
>- What is the average load reported by responses this node forwarded, per
>remote node

Ahhh, this one could be interesting - you could use it to penalise nodes which 
spam excessively.
> 
> I think, given these metrics, we should be able to do the following:
> 
>- Limit our overall rate of initiating local requests based on the global
>average reported load

We can already do this on the basis of AIMD's, based purely on behaviour of 
local requests. However, taking more data into account might result in more 
accurate results - although it might also dilute the feedback.

In any case, this part is "safe": it's not going to cause feedback problems, on 
its own. Conversely, it doesn't deal with problems such as network bottlenecks.

However, while AIMD's are based on the *global* average load (or rather the 
load of all the nodes along the chain), it sounds like you are proposing to 
only report the *local* load of our direct peers? How are you going to compute 
the global average load?
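
For reference, the AIMD mechanism this thread keeps referring to, reduced to 
its core - a sketch with illustrative constants, not fred's actual 
implementation:

    // Additive-increase/multiplicative-decrease window over in-flight local requests.
    final class AIMDWindow {
        private double window = 1.0; // how many local requests may be in flight

        synchronized void onSuccess()  { window += 1.0; }                      // additive increase
        synchronized void onSlowDown() { window = Math.max(1.0, window / 2); } // multiplicative decrease
        synchronized boolean maySend(int inFlight) { return inFlight < window; }
    }
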

>- Limit our rate of local requests based on the average load of the
>connection to the peer it would need to be forwarded to

Not sure I follow here. Are you proposing that we initiate more requests in 
keyspace areas where there is lower load?

>- Detect when remote peers are abusing the system by disregarding load -
>as evidenced by a significantly higher average load in replies forwarded to
>them

Okay, that is useful - although I doubt it would be an unambiguous detection, 
and what would we do about it? We would have to reject requests, and if we're 
not sure, or if it could be a downstream troublemaker, couldn't that result in 
misrouting and make things worse? On the other hand, any malicious node would 
be most easily detected by the nodes directly connected to it, so this could 
work much the same way as fair sharing - provided that it can reject or 
terminate the requests.

Arguably fair sharing achieves the same objective. It could be extended in 
various ways e.g. if we are relaying a lot of slow-down notices to a peer we 
could start to think about squeezing the peer's allocation.
> 
> Of course, lots of questions:
> 
>- How do we compute the averages?  A decaying running average of some
>kind?  What time period?

Yay more tunables. :(
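
The standard low-tunable option would be a time-decayed exponential average 
parameterised only by a half-life - a sketch with hypothetical names (the 
half-life itself of course remains a tunable):

    // Exponential average that decays correctly across irregular update intervals.
    final class DecayingAverage {
        private final double halfLifeMillis;
        private double value;
        private long lastUpdate = -1;

        DecayingAverage(double halfLifeMillis) { this.halfLifeMillis = halfLifeMillis; }

        synchronized double report(double sample, long nowMillis) {
            if (lastUpdate < 0) {
                value = sample; // first sample: no history to decay
            } else {
                // weight of the old value halves every halfLifeMillis
                double alpha = Math.exp(-(nowMillis - lastUpdate) * Math.log(2) / halfLifeMillis);
                value = alpha * value + (1 - alpha) * sample;
            }
            lastUpdate = nowMillis;
            return value;
        }
    }
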

>- How do we translate load into a desired rate of requests?

And even more tunables. :(

>- What criteria indicate that a peer is abusing the system?  What is the
>remedy?

In the current system, don't accept their requests.

In ANY system, there will always be a borderline below which they can get away 
with it. We can either try to make this as small as possible or try not to hit 
legitimate traffic too hard.
> 
> This is basically control theory stuff, and we'll need robust answers to
> these questions before we proceed (ideally with a theoretical rather than
> experimental foundation).

IMHO small tweaks to AIMDs, combined with fair sharing, will achieve much the 
same without having to wait years for our theoreticians to wake up.


[freenet-dev] Beyond New Load Management

2011-08-29 Thread Matthew Toseland
ook 
in, rather than be rejected at the same rate as everyone else.
> 
> > - The easiest way to implement #1 is with AIMD's. We can keep them, but
> > send slow-down messages earlier.
> > - We could keep NLM, i.e. queue for limited times to smooth out delays.
> 
> Doesn't that violate the principle that the load balancing stuff shouldn't
> do anything that makes the problem worse?  Queueing does exactly that.

So what do you do if the incoming requests are not at the same time as the 
completed outgoing requests? I guess you keep a large number of request slots 
free at any given time?
> 
> > Currently we RNF after a (longish) period waiting for a node to route to.
> > We would send a slow-down message when we have queued for more than a
> > certain period.  This means we can have a fixed, limited amount of
> > misrouting, we can keep NLM's clear benefits for routing accuracy (as
> > visible in success rates), and we can ensure that the input load is low
> > enough to avoid severe slowdown due to queueing for a long time.
> 
> No, I think we need to avoid queueing except in case of emergency.  Queueing
> only makes things worse by tying up more resources for longer.

I have the same attitude to misrouting.


[freenet-dev] Beyond New Load Management

2011-08-29 Thread Matthew Toseland
 *local* requests are initiated, based on 
> how many slow-down signals they get.
> 
> (This ensures that such gridlock or feedback loop problems can't happen)
> 
> The main problems are security:
> - Deliberate flooding: Malicious nodes not following AIMD's, in order to clog 
> up the network.
> 
> Fair sharing can limit this. Limiting backoff so that we only misroute when 
> nodes are severely overloaded can also help. We could queue (provided this 
> happens only rarely and briefly), allow bounded misrouting (e.g. route to one 
> of the top two routes), or terminate requests (thus freeing up resources).
> 
> - Opportunistic flooding: Users hack their nodes to not follow AIMD's because 
> it speeds up their downloads.
> 
> This is harder than solving deliberate flooding but also less urgent. Fair 
> sharing helps, but the slow-down signals, or rejections/terminations, may be 
> generated a long way away from the originator. In which case, they are only 
> proportionately affected, meaning it's still beneficial for them to flood. It 
> may be that this isn't resolvable and we'll have to continue to rely on a 
> certain amount of altruism; ask people to not use the Fr33n3t L33t L34ch 
> Cl13nt cos it slows things down for everyone else, should this arise.
> 
> - Local privacy: We can maybe identify which requests are local based on 
> timings.
> 
> This is somewhat speculative; while it is probably exploitable, there are 
> much easier attacks if you are directly connected to the target.
> 
> - Remote privacy: We can maybe correlate different bundles of requests based 
> on timings.
> 
> This is speculative, but it could be solved by using separate AIMD's for 
> separate unconnected bundles of requests (we will need some sort of 
> tagging/grouping mechanism like this when we have tunnels anyway).
> 
> 2. Nodes control the rate at which remote requests are initiated too, 
> propagating information on the capacity of the network. For instance, they 
> could estimate the capacity of the network and generate slow-down signals 
> when incoming requests are over that capacity (and eventually terminate 
> requests); this is self-feeding because their estimate of capacity depends on 
> *incoming* slow-down messages!
> 
> The main problem with this is that feedback between nodes could well result 
> in the estimated capacity collapsing to zero or whatever the minimum is. It 
> is probably possible to surmount that but would certainly require a great 
> deal of theoretical (simulation) work.
> 
> We can reuse existing code to a significant degree:
> - RejectedOverload should be split up. It already represents a slow-down 
> message (provided local=false). It should also indicate whether this is due 
> to an actual rejection or to using more than expected resources.
> - The easiest way to implement #1 is with AIMD's. We can keep them, but send 
> slow-down messages earlier.
> - We could keep NLM, i.e. queue for limited times to smooth out delays. 
> Currently we RNF after a (longish) period waiting for a node to route to. We 
> would send a slow-down message when we have queued for more than a certain 
> period.  This means we can have a fixed, limited amount of misrouting, we can 
> keep NLM's clear benefits for routing accuracy (as visible in success rates), 
> and we can ensure that the input load is low enough to avoid severe slowdown 
> due to queueing for a long time.

To be clear, the benefit of keeping NLM/queueing is that incoming request 
timings are often not precisely matched with the availability of outgoing 
slots. However it is essential that the load be propagated back to the 
originators AS WELL, so that queueing times are kept very low on average.
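
A sketch of that bounded-queueing rule (both deadlines are invented for 
illustration and would need tuning and simulation):

    // Queue briefly for a free slot; signal slow-down past one deadline,
    // give up (RNF/terminate) past a second, so queueing stays rare and short.
    final class BoundedQueue {
        private static final long SLOW_DOWN_AFTER_MS = 2_000;  // assumption
        private static final long GIVE_UP_AFTER_MS   = 10_000; // assumption

        enum Decision { ROUTE, QUEUE, QUEUE_AND_SLOW_DOWN, TERMINATE }

        Decision poll(boolean slotFree, long queuedAtMillis, long nowMillis) {
            if (slotFree) return Decision.ROUTE;
            long waited = nowMillis - queuedAtMillis;
            if (waited > GIVE_UP_AFTER_MS)   return Decision.TERMINATE;
            if (waited > SLOW_DOWN_AFTER_MS) return Decision.QUEUE_AND_SLOW_DOWN;
            return Decision.QUEUE;
        }
    }
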

(Culmination of a long conversation where ArneBab was arguing I shouldn't give 
up on NLM, see logs)
[18:37:55]  ArneBab: i think we might be able to keep NLM
[18:38:32]  mainly using queueing as a way to smooth out the fact that 
the inter-request interval is quite long, so there's a good chance there isn't 
a free slot immediately when a request arrives
[18:38:50]  ArneBab: we will need to combine it with some sort of 
early-slow-down-signal mechanism though
[18:38:55]  ArneBab: see my latest reply to ian on devl
[18:39:17]  yes
[18:39:33]  okay, so we are on roughly the same page
[18:40:34]  jupp


[freenet-dev] Beyond New Load Management

2011-08-29 Thread Matthew Toseland
riefly), allow bounded misrouting (e.g. route to one 
of the top two routes), or terminate requests (thus freeing up resources).

- Opportunistic flooding: Users hack their nodes to not follow AIMD's because 
it speeds up their downloads.

This is harder than solving deliberate flooding but also less urgent. Fair 
sharing helps, but the slow-down signals, or rejections/terminations, may be 
generated a long way away from the originator. In which case, they are only 
proportionately affected, meaning it's still beneficial for them to flood. It 
may be that this isn't resolvable and we'll have to continue to rely on a 
certain amount of altruism; ask people to not use the Fr33n3t L33t L34ch Cl13nt 
cos it slows things down for everyone else, should this arise.

- Local privacy: We can maybe identify which requests are local based on 
timings.

This is somewhat speculative; while it is probably exploitable, there are much 
easier attacks if you are directly connected to the target.

- Remote privacy: We can maybe correlate different bundles of requests based on 
timings.

This is speculative, but it could be solved by using separate AIMD's for 
separate unconnected bundles of requests (we will need some sort of 
tagging/grouping mechanism like this when we have tunnels anyway).

2. Nodes control the rate at which remote requests are initiated too, 
propagating information on the capacity of the network. For instance, they 
could estimate the capacity of the network and generate slow-down signals when 
incoming requests are over that capacity (and eventually terminate requests); 
this is self-feeding because their estimate of capacity depends on *incoming* 
slow-down messages!

The main problem with this is that feedback between nodes could well result in 
the estimated capacity collapsing to zero or whatever the minimum is. It is 
probably possible to surmount that but would certainly require a great deal of 
theoretical (simulation) work.
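
If approach #2 were ever tried, the collapse failure mode suggests at least 
flooring the estimate at our own local capacity - a purely speculative sketch, 
pending the simulation work described above:

    // Capacity estimate driven by incoming slow-down messages, with an explicit
    // floor so inter-node feedback cannot collapse it to zero.
    final class CapacityEstimator {
        private double estimate;
        private final double floor; // never estimate below our own local capacity

        CapacityEstimator(double localCapacity) {
            this.estimate = localCapacity;
            this.floor = localCapacity;
        }

        void onIncomingSlowDown() { estimate = Math.max(floor, estimate * 0.95); }
        void onQuietInterval()    { estimate += 1.0; } // additive recovery

        boolean overCapacity(double incomingRequestRate) { return incomingRequestRate > estimate; }
    }
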

We can reuse existing code to a significant degree:
- RejectedOverload should be split up. It already represents a slow-down 
message (provided local=false). It should also indicate whether this is due to 
an actual rejection or to using more than expected resources.
- The easiest way to implement #1 is with AIMD's. We can keep them, but send 
slow-down messages earlier.
- We could keep NLM, i.e. queue for limited times to smooth out delays. 
Currently we RNF after a (longish) period waiting for a node to route to. We 
would send a slow-down message when we have queued for more than a certain 
period.  This means we can have a fixed, limited amount of misrouting, we can 
keep NLM's clear benefits for routing accuracy (as visible in success rates), 
and we can ensure that the input load is low enough to avoid severe slowdown 
due to queueing for a long time.
> 
> Ian.


[freenet-dev] Fundraising

2011-08-29 Thread Steve Dougherty
The project's current state is reflected in a dollar amount of remaining
project funds, along with an estimate of how much additional development time
it corresponds to. This amount will decrease suddenly and significantly when a
payment is made. It's not - at least it's not supposed to be - a secret that the
Freenet project is set to run out of funds mid-September. I've mentioned to
people - on Sone, in #freenet - that the project is running out of funds and
they had no idea. This is a problem. I propose (in decreasing importance) that:

1) The donation page mention _when_ the last ~2-month payment was made.
- Or ideally, that the time remaining estimate include remaining
funded development time. This need not be exact: it would allow the
estimate to decrease gradually instead of in sudden chunks.

2) Fundraising be presented as drives for a month's worth of funding.
- This would allow donating to have goals, and be framed as increasing
toward something tangible, (a month of development) rather than building up
funds against sudden, quiet decrease.
- The estimate ceases to be meaningful if it sits at "8 days" for two
months.

3) Release announcements include statements on the project's current financial
state and how to donate.
- Fred could have integrated release announcements for greater visibility.
These could of course be disabled by the user.

4) Release announcements also be displayed on the freenetproject.org homepage
in something akin to the "Latest Releases"/"News" sidebar on winehq.org.
- Having only news posts that are several months old makes the project
appear inactive, which it is not.

It should go without saying that a 0.8 release would probably make good press,
assuming it can be made stable before funding runs out.



[freenet-dev] Fundraising

2011-08-29 Thread Ian Clarke
These are good ideas, Steve. I must confess that I've been a little
distracted lately - and we do definitely need to get our act together on
funding.  Matthew, what is our current "drop-dead" date as best as you can
estimate it?

Ian.

On Mon, Aug 29, 2011 at 4:18 PM, Steve Dougherty wrote:

> The project's current state is reflected in a dollar amount of remaining
> project funds, along with an estimate of how much additional development
> time
> it corresponds to. This amount will decrease suddenly and significantly
> when a
> payment is made. It's not - at least it's not supposed to be - a secret that
> the Freenet
> project is set to run out of funds mid-September. I've mentioned to people
> - on
> Sone, in #freenet - that the project is running out of funds and they had
> no
> idea. This is a problem. I propose (in decreasing importance) that:
>
> 1) The donation page mention _when_ the last ~2-month payment was made.
>- Or ideally, that the time remaining estimate include remaining
>funded development time. This need not be exact: it would allow the
>estimate to decrease gradually instead of in sudden chunks.
>
> 2) Fundraising be presented as drives for a month's worth of funding.
>- This would allow donating to have goals, and be framed as increasing
>toward something tangible, (a month of development) rather than building
> up
>funds against sudden, quiet decrease.
>- The estimate ceases to be meaningful if it sits at "8 days" for two
>months.
>
> 3) Release announcements include statements on the project's current
> financial
> state and how to donate.
>- Fred could have integrated release announcements for greater
> visibility.
>These could of course be disabled by the user.
>
> 4) Release announcements also be displayed on the freenetproject.org homepage
> in something akin to the "Latest Releases"/"News" sidebar on winehq.org.
>- Having only news posts that are several months old makes the project
>appear inactive, which it is not.
>
> It should go without saying that a 0.8 release would probably make good
> press,
> assuming it can be made stable before funding runs out.



-- 
Ian Clarke
Personal blog: http://blog.locut.us/


[freenet-dev] Beyond New Load Management: A proposal

2011-08-29 Thread Ian Clarke
On Mon, Aug 29, 2011 at 2:21 PM, Matthew Toseland  wrote:

>  > >- What is the average load reported by responses this node
> forwarded, per
> > >remote node
> >
> > Ahhh, this one could be interesting - you could use it to penalise nodes
> which spam excessively.
>
> Actually, thinking about it, I don't think this will work. Requests are
> usually distributed evenly across the keyspace, and downstream nodes only
> know they are from this node - not which previous node they are from. So the
> pain will be shared across all the peers sending us requests, including our
> own local requests, and won't be identifiable as due to any single node. You
> have to do it by volume, not by reported load.
>

But if a node is being abusive, won't the nodes it's requesting from tend to
have a higher than average load?  Perhaps you're right, though - this is the
kind of thought process we need.

Ian.

-- 
Ian Clarke
Founder, The Freenet Project
Email: ian at freenetproject.org


[freenet-dev] Beyond New Load Management: A proposal

2011-08-29 Thread Ian Clarke
in
> various ways e.g. if we are relaying a lot of slow-down notices to a peer we
> could start to think about squeezing the peer's allocation.
>

Yes, except it doesn't work, and it's too complicated to reason about.


> > Of course, lots of questions:
> >
> >- How do we compute the averages?  A decaying running average of some
> >kind?  What time period?
>
> Yay more tunables. :(
>

Ideally we'd do this in a way that avoids tunables.  This is control theory,
we shouldn't have to reinvent the wheel.


> >- How do we translate load into a desired rate of requests?
>
> And even more tunables. :(
>

Don't be pessimistic, hopefully we can avoid tunables if we are smart about
it.


> >- What criteria indicate that a peer is abusing the system?  What is
> the
> >remedy?
>
> In the current system, don't accept their requests.
>

Yes, although I'm leaning towards disconnecting from them, which is
more-or-less the same thing, just more permanent and they don't consume our
resources any more.


> In ANY system, there will always be a borderline below which they can get
> away with it. We can either try to make this as small as possible or try not
> to hit legitimate traffic too hard.
>

Abuse should be rare, we should bias against false positives when we look
for it.


> > This is basically control theory stuff, and we'll need robust answers to
> > these questions before we proceed (ideally with a theoretical rather than
> > experimental foundation).
>
> IMHO small tweaks to AIMDs, combined with fair sharing, will achieve much
> the same without having to wait years for our theoreticians to wake up.
>

Yes, small tweaks have worked so well for us for the last decade, leaving us
pretty much where we were in 2003.  No, we don't understand how the current
system works, there is no point in trying to fix something when we don't
even know what is broken.

I am touched by your faith in "theoreticians", but I have a feeling we're in
uncharted territory here.  We need to agree on an approach that is simple
enough to reason about, and then build a simulation for it.  I think the
simulation should just be a few days of work, we're not talking years here,
hopefully not even months.

Ian.

-- 
Ian Clarke
Founder, The Freenet Project
Email: ian at freenetproject.org


[freenet-dev] Beyond New Load Management: A proposal

2011-08-29 Thread Ian Clarke
Ok, thinking about it, here is a proposal, or  rather, the beginning of a
proposal.  I'm assuming that we get rid of NLM, fair sharing, and anything
else intended to control load, and replace it with this.  We will absolutely
need to simulate this before we write a single line of code to deploy it.

The core idea is that a node will include a floating point number in
response to any kind of request showing how close that node is to being
overloaded.  0.0 would mean it's doing nothing, 1.0 would mean that it is
completely saturated and must reject requests.  Clearly the goal is to avoid
getting anywhere near to 1.0.

A node tracks several things:

   - What is the overall average load reported by responses this node has
   received
   - What is the average load reported by responses this node has received,
   per remote node
   - What is the average load reported by responses this node forwarded, per
   remote node

I think, given these metrics, we should be able to do the following:

   - Limit our overall rate of initiating local requests based on the global
   average reported load
   - Limit our rate of local requests based on the average load of the
   connection to the peer it would need to be forwarded to
   - Detect when remote peers are abusing the system by disregarding load -
   as evidenced by a significantly higher average load in replies forwarded to
   them
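
A sketch of the bookkeeping the above implies (names are hypothetical, and a
fixed-weight average stands in for whatever averaging scheme is chosen below):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Sketch of the three tracked averages and the first two rate limits.
    final class LoadBook {
        private static final double ALPHA = 0.05;    // averaging weight: an assumption
        private static final double THRESHOLD = 0.8; // back-off point: an assumption

        private double globalLoad; // overall average load seen in responses (starts at 0.0)
        private final Map<String, Double> loadViaPeer = new ConcurrentHashMap<>();
        private final Map<String, Double> loadForwardedTo = new ConcurrentHashMap<>();

        private static double ema(Double old, double sample) {
            return old == null ? sample : (1 - ALPHA) * old + ALPHA * sample;
        }

        /** A response to one of our requests arrived via this peer. */
        synchronized void onResponseReceived(String peer, double load) {
            globalLoad = ema(globalLoad, load);
            loadViaPeer.compute(peer, (k, v) -> ema(v, load));
        }

        /** We forwarded a response downstream to this peer. */
        synchronized void onResponseForwarded(String peer, double load) {
            loadForwardedTo.compute(peer, (k, v) -> ema(v, load));
        }

        /** Limit local requests globally and per the route they would take. */
        synchronized boolean maySendLocalRequest(String viaPeer) {
            return globalLoad < THRESHOLD
                && loadViaPeer.getOrDefault(viaPeer, 0.0) < THRESHOLD;
        }
    }
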

Of course, lots of questions:

   - How do we compute the averages?  A decaying running average of some
   kind?  What time period?
   - How do we translate load into a desired rate of requests?
   - What criteria indicate that a peer is abusing the system?  What is the
   remedy?

This is basically control theory stuff, and we'll need robust answers to
these questions before we proceed (ideally with a theoretical rather than
experimental foundation).

Ian.

-- 
Ian Clarke
Founder, The Freenet Project
Email: ian at freenetproject.org


[freenet-dev] Beyond New Load Management

2011-08-29 Thread Ian Clarke
On Mon, Aug 29, 2011 at 12:37 PM, Matthew Toseland <
toad at amphibian.dyndns.org> wrote:

> That is because we do not have the time or funding to empirically test
> hypotheses. We don't gather enough data, we don't have a huge testnet to try
> stuff on over extended periods, and so on. Most software development has
> deadlines and resource limitations, but most software development doesn't
> NEED to be empirical, at least not very often! See my other mail about this.
>

I don't think the problem is a lack of time or funding.  The problem is that
we come up with solutions that are too complicated to analyze or fix when
they don't work.  Throwing more time and money at that might help, but that
is addressing the symptom, not the cause.  The cause is complexity, which
just grows and grows as we try to fix problems we don't understand by
layering on more complexity.


> Misrouting is unacceptable, in general. Extremely overloaded or extremely
> low capacity nodes may be routed around. We might even allow some bounded
> amount of misrouting in the more general case (e.g. go to either of the top
> two peers for the key). But in general, transforming load into misrouting
> (or into reduced HTL, or any other bogus escape valve) is a bad idea. We
> need to reduce the incoming load.
>

Right, the same is true of queueing.  If nodes are forced to do things to
deal with overloading that make the problem worse then the load balancing
algorithm has failed.  Its job is to prevent that from happening.


> > What if all nodes told other nodes that they were overloaded because all
> > their peers are overloaded.  Such a situation would basically cause the
> > entire network to think it was overloaded, even though nobody actually
> was!
> >  It becomes a bit like this cartoon: http://flic.kr/p/5npfm2  How can
> this
> > be avoided?
>
> Exactly. There are two possible approaches:
> 1. Nodes control the rate at which *local* requests are initiated, based on
> how many slow-down signals they get.
>
> (This ensures that such gridlock or feedback loop problems can't happen)
>

Ah, so slow-downs only ever affect local request initiation?  The problem
there is that if a peer is overloaded, only those peers that are directly
connected to it will slow down their rate of requests.  Peers elsewhere on
the network will continue to fire requests at it, and there will be no way
for them to know they need to slow down :-/

Unless... what if the DataReply message (or whatever it's called these days)
contains a slow-down request, and it's respected by every peer along that
path according to AIMD?  The problem is that there will be a very weak link
between the rate of requests, and the rate of slow-down messages.  It would
be like having 10,000 individual heaters with thermostats all trying to
regulate the temperature of a gigantic space.  Each individual
thermostat/heater will have an almost imperceptible impact.  Although
perhaps that would work...?
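
A sketch of that variant, in which every hop on the return path - not just the
originator - feeds the reply's slow-down flag into its own AIMD state. This is
hypothetical, and exactly the "many thermostats" scheme being questioned:

    // Each relay reacts to the slow-down flag carried by a passing reply.
    final class ReturnPathAIMD {
        private double window = 1.0; // requests this node will keep in flight

        /** Called at every hop as a reply travels back toward the originator. */
        synchronized void onDataReply(boolean slowDownFlag) {
            if (slowDownFlag) window = Math.max(1.0, window / 2); // back off
            else window += 1.0;                                   // probe for capacity
        }
    }
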

> Fair sharing can limit this. Limiting backoff so that we only misroute when
> nodes are severely overloaded can also help. We could queue (provided this
> happens only rarely and briefly), allow bounded misrouting (e.g. route to
> one of the top two routes), or terminate requests (thus freeing up
> resources).
>

I think "fair sharing" is one of those complexities that we need to go back
to the drawing board on.  Its clear that even now we have hypotheses about
how fair sharing could be causing problems.  Its too complicated to really
reason about, or to anticipate all of its implications.


> - The easiest way to implement #1 is with AIMD's. We can keep them, but
> send slow-down messages earlier.
> - We could keep NLM, i.e. queue for limited times to smooth out delays.


Doesn't that violate the principle that the load balancing stuff shouldn't
do anything that makes the problem worse?  Queueing does exactly that.


> Currently we RNF after a (longish) period waiting for a node to route to.
> We would send a slow-down message when we have queued for more than a
> certain period.  This means we can have a fixed, limited amount of
> misrouting, we can keep NLM's clear benefits for routing accuracy (as
> visible in success rates), and we can ensure that the input load is low
> enough to avoid severe slowdown due to queueing for a long time.
>

No, I think we need to avoid queueing except in case of emergency.  Queueing
only makes things worse by tying up more resources for longer.

Ian.

-- 
Ian Clarke
Founder, The Freenet Project
Email: ian at freenetproject.org


Re: [freenet-dev] Beyond New Load Management

2011-08-29 Thread Matthew Toseland
On Saturday 27 Aug 2011 21:35:58 Ian Clarke wrote:
 Matthew,
 
 This makes sense from the perspective of making incremental changes, but I
 think we need to be more drastic than that.  I think we need to go back to
 the drawing board with load management.  We need to find a solution that is
 simple enough to reason about, and to debug if we have problems with it.
 
 The entire approach of coming up with hypotheses about what is wrong,
 building a solution based on these hypotheses (without actually confirming
 that the hypotheses are accurate) and deploying it is déjà vu, we've been
 doing it for a decade, and we still haven't got load management right.
  We're just layering more complexity onto a system that we already don't
 understand, based on guesses as to what the problems were with the previous
 iteration that we can't test because the system is too complicated with too
 many interactions for anyone to get their heads around it.

That is because we do not have the time or funding to empirically test 
hypotheses. We don't gather enough data, we don't have a huge testnet to try 
stuff on over extended periods, and so on. Most software development has 
deadlines and resource limitations, but most software development doesn't NEED 
to be empirical, at least not very often! See my other mail about this.
 
 If something isn't working, and we don't understand for definite what is
 wrong with it, then we shouldn't build more on top of it in the hope that
 we'll accidentally solve the problem, we should replace it with something
 simple enough that we do understand it, right?
 
 The purpose of load management is relatively simple: *Don't allow clients to
 pump more requests into the network than the network can handle, while
 ensuring that this workload is distributed across the network in a
 reasonably efficient manner.  This must be done in a decentralized way that
 is not vulnerable to abuse.*
 *
 The current load management system includes many different interacting
 components that make it nearly impossible to understand or debug.  I think
 we need to go back to the drawing board with load management, starting from
 the goal I cite above.
 
 I would invite people to suggest the simplest possible load management
 schemes that might work, let's then discuss them, and figure out which is
 most likely to work, and if it doesn't work, which will be easiest to debug.
 
 We can bear in mind a few lessons we've learned though.  What
 characteristics should our load balancing system have?
 
- Dropping requests should be extremely rare because this
just exacerbates overloading
- Delaying requests should also be extremely rare for the same reason
- Misrouting requests should be limited, but perhaps acceptable
occasionally, for the same reason

Misrouting is unacceptable, in general. Extremely overloaded or extremely low 
capacity nodes may be routed around. We might even allow some bounded amount of 
misrouting in the more general case (e.g. go to either of the top two peers for 
the key). But in general, transforming load into misrouting (or into reduced 
HTL, or any other bogus escape valve) is a bad idea. We need to reduce the 
incoming load.
 
 It therefore really comes down to nodes anticipating whether the network is
 in danger of having to drop, delay, or misroute requests, and reduce the
 rate at which they are pumping requests into the network if this danger
 point is reached.
 
 So how do nodes inform each-other that they are at risk of having to drop,
 delay, or misroute requests?  There are two reasons this might happen.
  Either a node itself is close to its own capacity for relaying requests, or
 the other nodes it relays to are themselves at or close to capacity.
 
 One problem I'm concerned about when nodes share information about how
 overloaded their peers are is a gridlock:
 
 What if all nodes told other nodes that they were overloaded because all
 their peers are overloaded.  Such a situation would basically cause the
 entire network to think it was overloaded, even though nobody actually was!
  It becomes a bit like this cartoon: http://flic.kr/p/5npfm2  How can this
 be avoided?

Exactly. There are two possible approaches:
1. Nodes control the rate at which *local* requests are initiated, based on how 
many slow-down signals they get.

(This ensures that such gridlock or feedback loop problems can't happen)

The main problems are security:
- Deliberate flooding: Malicious nodes not following AIMD's, in order to clog 
up the network.

Fair sharing can limit this. Limiting backoff so that we only misroute when 
nodes are severely overloaded can also help. We could queue (provided this 
happens only rarely and briefly), allow bounded misrouting (e.g. route to one 
of the top two routes), or terminate requests (thus freeing up resources).

- Opportunistic flooding: Users hack their nodes to not follow AIMD's because 
it speeds up their downloads.

This is harder than 

Re: [freenet-dev] Beyond New Load Management

2011-08-29 Thread Matthew Toseland
On Monday 29 Aug 2011 18:37:39 Matthew Toseland wrote:
 On Saturday 27 Aug 2011 21:35:58 Ian Clarke wrote:
  Matthew,
  
  This makes sense from the perspective of making incremental changes, but I
  think we need to be more drastic than that.  I think we need to go back to
  the drawing board with load management.  We need to find a solution that is
  simple enough to reason about, and to debug if we have problems with it.
  
  The entire approach of coming up with hypotheses about what is wrong,
  building a solution based on these hypotheses (without actually confirming
  that the hypotheses are accurate) and deploying it is déjà vu, we've been
  doing it for a decade, and we still haven't got load management right.
   We're just layering more complexity onto a system that we already don't
  understand, based on guesses as to what the problems were with the previous
  iteration that we can't test because the system is too complicated with too
  many interactions for anyone to get their heads around it.
 
 That is because we do not have the time or funding to empirically test 
 hypotheses. We don't gather enough data, we don't have a huge testnet to try 
 stuff on over extended periods, and so on. Most software development has 
 deadlines and resource limitations, but most software development doesn't 
 NEED to be empirical, at least not very often! See my other mail about this.
  
  If something isn't working, and we don't understand for definite what is
  wrong with it, then we shouldn't build more on top of it in the hope that
  we'll accidentally solve the problem, we should replace it with something
  simple enough that we do understand it, right?
  
  The purpose of load management is relatively simple: *Don't allow clients to
  pump more requests into the network than the network can handle, while
  ensuring that this workload is distributed across the network in a
  reasonably efficient manner.  This must be done in a decentralized way that
  is not vulnerable to abuse.*
  *
  The current load management system includes many different interacting
  components that make it nearly impossible to understand or debug.  I think
  we need to go back to the drawing board with load management, starting from
  the goal I cite above.
  
  I would invite people to suggest the simplest possible load management
  schemes that might work, let's then discuss them, and figure out which is
  most likely to work, and if it doesn't work, which will be easiest to debug.
  
  We can bear in mind a few lessons we've learned though.  What
  characteristics should our load balancing system have?
  
 - Dropping requests should be extremely rare because this
 just exacerbates overloading
 - Delaying requests should also be extremely rare for the same reason
 - Misrouting requests should be limited, but perhaps acceptable
 occasionally, for the same reason
 
 Misrouting is unacceptable, in general. Extremely overloaded or extremely low 
 capacity nodes may be routed around. We might even allow some bounded amount 
 of misrouting in the more general case (e.g. go to either of the top two 
 peers for the key). But in general, transforming load into misrouting (or 
 into reduced HTL, or any other bogus escape valve) is a bad idea. We need to 
 reduce the incoming load.
  
  It therefore really comes down to nodes anticipating whether the network is
  in danger of having to drop, delay, or misroute requests, and reduce the
  rate at which they are pumping requests into the network if this danger
  point is reached.
  
  So how do nodes inform each-other that they are at risk of having to drop,
  delay, or misroute requests?  There are two reasons this might happen.
   Either a node itself is close to its own capacity for relaying requests, or
  the other nodes it relays to are themselves at or close to capacity.
  
  One problem I'm concerned about when nodes share information about how
  overloaded their peers are is a gridlock:
  
  What if all nodes told other nodes that they were overloaded because all
  their peers are overloaded.  Such a situation would basically cause the
  entire network to think it was overloaded, even though nobody actually was!
   It becomes a bit like this cartoon: http://flic.kr/p/5npfm2  How can this
  be avoided?
 
 Exactly. There are two possible approaches:
 1. Nodes control the rate at which *local* requests are initiated, based on 
 how many slow-down signals they get.
 
 (This ensures that such gridlock or feedback loop problems can't happen)
 
 The main problems are security:
 - Deliberate flooding: Malicious nodes not following AIMD's, in order to clog 
 up the network.
 
 Fair sharing can limit this. Limiting backoff so that we only misroute when 
 nodes are severely overloaded can also help. We could queue (provided this 
 happens only rarely and briefly), allow bounded misrouting (e.g. route to one 
 of the top two routes), or terminate requests (thus freeing up 

Re: [freenet-dev] Beyond New Load Management

2011-08-29 Thread Ian Clarke
On Mon, Aug 29, 2011 at 12:37 PM, Matthew Toseland 
t...@amphibian.dyndns.org wrote:

 That is because we do not have the time or funding to empirically test
 hypotheses. We don't gather enough data, we don't have a huge testnet to try
 stuff on over extended periods, and so on. Most software development has
 deadlines and resource limitations, but most software development doesn't
 NEED to be empirical, at least not very often! See my other mail about this.


I don't think the problem is a lack of time or funding.  The problem is that
we come up with solutions that are too complicated to analyze or fix when
they don't work.  Throwing more time and money at that might help, but that
is addressing the symptom, not the cause.  The cause is complexity, which
just grows and grows as we try to fix problems we don't understand by
layering on more complexity.


 Misrouting is unacceptable, in general. Extremely overloaded or extremely
 low capacity nodes may be routed around. We might even allow some bounded
 amount of misrouting in the more general case (e.g. go to either of the top
 two peers for the key). But in general, transforming load into misrouting
 (or into reduced HTL, or any other bogus escape valve) is a bad idea. We
 need to reduce the incoming load.


Right, the same is true of queueing.  If nodes are forced to do things to
deal with overloading that make the problem worse then the load balancing
algorithm has failed.  Its job is to prevent that from happening.


  What if all nodes told other nodes that they were overloaded because all
  their peers are overloaded.  Such a situation would basically cause the
  entire network to think it was overloaded, even though nobody actually
 was!
   It becomes a bit like this cartoon: http://flic.kr/p/5npfm2  How can
 this
  be avoided?

 Exactly. There are two possible approaches:
 1. Nodes control the rate at which *local* requests are initiated, based on
 how many slow-down signals they get.

 (This ensures that such gridlock or feedback loop problems can't happen)


Ah, so slow-downs only ever affect local request initiation?  The problem
there is that if a peer is overloaded, only those peers that are directly
connected to it will slow down their rate of requests.  Peers elsewhere on
the network will continue to fire requests at it, and there will be no way
for them to know they need to slow down :-/

Unless... what if the DataReply message (or whatever it's called these days)
contains a slow-down request, and it's respected by every peer along that
path according to AIMD?  The problem is that there will be a very weak link
between the rate of requests, and the rate of slow-down messages.  It would
be like having 10,000 individual heaters with thermostats all trying to
regulate the temperature of a gigantic space.  Each individual
thermostat/heater will have an almost imperceptible impact.  Although
perhaps that would work...?

Fair sharing can limit this. Limiting backoff so that we only misroute when
 nodes are severely overloaded can also help. We could queue (provided this
 happens only rarely and briefly), allow bounded misrouting (e.g. route to
 one of the top two routes), or terminate requests (thus freeing up
 resources).


I think fair sharing is one of those complexities that we need to go back
to the drawing board on.  It's clear that even now we have hypotheses about
how fair sharing could be causing problems.  It's too complicated to really
reason about, or to anticipate all of its implications.


 - The easiest way to implement #1 is with AIMD's. We can keep them, but
 send slow-down messages earlier.
 - We could keep NLM, i.e. queue for limited times to smooth out delays.


Doesn't that violate the principle that the load balancing stuff shouldn't
do anything that makes the problem worse?  Queueing does exactly that.


 Currently we RNF after a (longish) period waiting for a node to route to.
 We would send a slow-down message when we have queued for more than a
 certain period.  This means we can have a fixed, limited amount of
 misrouting, we can keep NLM's clear benefits for routing accuracy (as
 visible in success rates), and we can ensure that the input load is low
 enough to avoid severe slowdown due to queueing for a long time.


No, I think we need to avoid queueing except in case of emergency.  Queueing
only makes things worse by tying up more resources for longer.

Ian.

-- 
Ian Clarke
Founder, The Freenet Project
Email: i...@freenetproject.org

[freenet-dev] Beyond New Load Management: A proposal

2011-08-29 Thread Ian Clarke
Ok, thinking about it, here is a proposal, or  rather, the beginning of a
proposal.  I'm assuming that we get rid of NLM, fair sharing, and anything
else intended to control load, and replace it with this.  We will absolutely
need to simulate this before we write a single line of code to deploy it.

The core idea is that a node will include a floating point number in
response to any kind of request showing how close that node is to being
overloaded.  0.0 would mean it's doing nothing, 1.0 would mean that it is
completely saturated and must reject requests.  Clearly the goal is to avoid
getting anywhere near to 1.0.

A node tracks several things:

   - What is the overall average load reported by responses this node has
   received
   - What is the average load reported by responses this node has received,
   per remote node
   - What is the average load reported by responses this node forwarded, per
   remote node

I think, given these metrics, we should be able to do the following:

   - Limit our overall rate of initiating local requests based on the global
   average reported load
   - Limit our rate of local requests based on the average load of the
   connection to the peer it would need to be forwarded to
   - Detect when remote peers are abusing the system by disregarding load -
   as evidenced by a significantly higher average load in replies forwarded to
   them

Of course, lots of questions:

   - How do we compute the averages?  A decaying running average of some
   kind?  What time period?
   - How do we translate load into a desired rate of requests?
   - What criteria indicate that a peer is abusing the system?  What is the
   remedy?

This is basically control theory stuff, and we'll need robust answers to
these questions before we proceed (ideally with a theoretical rather than
experimental foundation).

Ian.

-- 
Ian Clarke
Founder, The Freenet Project
Email: i...@freenetproject.org

Re: [freenet-dev] Beyond New Load Management

2011-08-29 Thread Matthew Toseland
On Monday 29 Aug 2011 18:58:26 Ian Clarke wrote:
 On Mon, Aug 29, 2011 at 12:37 PM, Matthew Toseland 
 t...@amphibian.dyndns.org wrote:
 
  That is because we do not have the time or funding to empirically test
  hypotheses. We don't gather enough data, we don't have a huge testnet to try
  stuff on over extended periods, and so on. Most software development has
  deadlines and resource limitations, but most software development doesn't
  NEED to be empirical, at least not very often! See my other mail about this.
 
 I don't think the problem is a lack of time or funding.  The problem is that
 we come up with solutions that are too complicated to analyze or fix when
 they don't work.  Throwing more time and money at that might help, but that
 is addressing the symptom, not the cause.  The cause is complexity, which
 just grows and grows as we try to fix problems we don't understand by
 layering on more complexity.

Perhaps. Ripping everything out and starting again is always tempting, 
especially when subordinates have been doing things without asking you. :)
 
  Misrouting is unacceptable, in general. Extremely overloaded or extremely
  low capacity nodes may be routed around. We might even allow some bounded
  amount of misrouting in the more general case (e.g. go to either of the top
  two peers for the key). But in general, transforming load into misrouting
  (or into reduced HTL, or any other bogus escape valve) is a bad idea. We
  need to reduce the incoming load.
 
 Right, the same is true of queueing.  If nodes are forced to do things to
 deal with overloading that make the problem worse then the load balancing
 algorithm has failed.  Its job is to prevent that from happening.

Okay.
 
   What if all nodes told other nodes that they were overloaded because all
   their peers are overloaded?  Such a situation would basically cause the
   entire network to think it was overloaded, even though nobody actually
   was!  It becomes a bit like this cartoon: http://flic.kr/p/5npfm2  How can
   this be avoided?
 
  Exactly. There are two possible approaches:
  1. Nodes control the rate at which *local* requests are initiated, based on
  how many slow-down signals they get.
 
  (This ensures that such gridlock or feedback loop problems can't happen)
 
 Ah, so slow-downs only ever affect local request initiation?  The problem
 there is that if a peer is overloaded, only those peers that are directly
 connected to it will slow down their rate of requests.  Peers elsewhere on
 the network will continue to fire requests at it, and there will be no way
 for them to know they need to slow down :-/

No, because the slow-downs are propagated back to the original request 
originator, and only the request originator takes notice of them. That's how it 
works now.
 
 Unless... what if the DataReply message (or whatever it's called these days)
 contains a slow-down request, and it's respected by every peer along that
 path according to AIMD?  The problem is that there will be a very weak link
 between the rate of requests, and the rate of slow-down messages.  It would
 be like having 10,000 individual heaters with thermostats all trying to
 regulate the temperature of a gigantic space.  Each individual
 thermostat/heater will have an almost imperceptible impact.  Although
 perhaps that would work...?

The rest is plausible - feedback might be too slow, but this can be helped in 
part by an increased window size between the point at which we start sending 
slow-down messages and the point at which things start breaking more badly.
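
For illustration, a sketch of that window (the thresholds and helper names
are invented, not proposals for actual values):

    // Hypothetical two-threshold scheme: a wide band between "start telling
    // originators to slow down" and "we genuinely cannot accept this".
    // Assumes accept(), reject() and Request.flagSlowDown() exist in the
    // surrounding node code.
    static final double SLOW_DOWN_AT = 0.5; // start sending slow-down messages
    static final double REJECT_AT = 0.9;    // hard limit, emergencies only

    void handleIncoming(Request req, double localLoad) {
        if (localLoad >= REJECT_AT) {
            reject(req);                    // should almost never happen
            return;
        }
        accept(req);
        if (localLoad >= SLOW_DOWN_AT)
            req.flagSlowDown();             // relayed back to the originator
    }

The wider the band between the two thresholds, the more time originators have
to back off before we are forced to reject anything.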
 
 Fair sharing can limit this. Limiting backoff so that we only misroute when
  nodes are severely overloaded can also help. We could queue (provided this
  happens only rarely and briefly), allow bounded misrouting (e.g. route to
  one of the top two routes), or terminate requests (thus freeing up
  resources).
 
 I think fair sharing is one of those complexities that we need to go back
 to the drawing board on.  It's clear that even now we have hypotheses about
 how fair sharing could be causing problems.  It's too complicated to really
 reason about, or to anticipate all of its implications.

It is vital for:
1. Security: It quashes DoS attacks/floods quickly. Without it, any attacker 
can easily swamp any node it is connected to, using all of its capacity for 
sending its flood onward. With it, they are limited to a fraction of the node's 
capacity. This is a concern that load management must deal with!
2. Nodes with low numbers of peers, and slower nodes in general, to get a look 
in, rather than be rejected at the same rate as everyone else.
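
For reference, the core of fair sharing is small (a toy version with invented
names; the real logic is more involved):

    // Toy fair sharing: no peer may hold more than its share of our request
    // slots, so a flooding peer saturates only its own allocation, not the
    // whole node.
    boolean mayAccept(int slotsHeldByPeer, int totalSlots, int peerCount) {
        int share = Math.max(1, totalSlots / peerCount);
        return slotsHeldByPeer < share;
    }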
 
  - The easiest way to implement #1 is with AIMD's. We can keep them, but
  send slow-down messages earlier.
  - We could keep NLM, i.e. queue for limited times to smooth out delays.
 
 Doesn't that violate the principle that the load balancing stuff shouldn't
 do anything that makes the problem worse?  Queueing does exactly that.

So what do you 

Re: [freenet-dev] Beyond New Load Management: A proposal

2011-08-29 Thread Matthew Toseland
On Monday 29 Aug 2011 19:28:50 Ian Clarke wrote:
 Ok, thinking about it, here is a proposal, or  rather, the beginning of a
 proposal.  I'm assuming that we get rid of NLM, fair sharing, and anything
 else intended to control load, and replace it with this.  We will absolutely
 need to simulate this before we write a single line of code to deploy it.

I have justified fair sharing in my other reply. However alternate mechanisms 
might replace it.

I agree we should simulate, although this might be significant work.
 
 The core idea is that a node will include a floating point number in
 response to any kind of request showing how close that node is to being
 overloaded.  0.0 would mean it's doing nothing, 1.0 would mean that it is
 completely saturated and must reject requests.  Clearly the goal is to avoid
 getting anywhere near to 1.0.
 
 A node tracks several things:

All of these apply to both locally originated requests and remotely originated 
requests? And are they for only our direct peers?
 
- What is the overall average load reported by responses this node has
received
- What is the average load reported by responses this node has received,
per remote node
- What is the average load reported by responses this node forwarded, per
remote node

Ahhh, this one could be interesting - you could use it to penalise nodes which 
spam excessively.
 
 I think, given these metrics, we should be able to do the following:
 
- Limit our overall rate of initiating local requests based on the global
average reported load

We can already do this on the basis of AIMD's, based purely on behaviour of 
local requests. However, taking more data into account might result in more 
accurate results - although it might also dilute the feedback.

In any case, this part is safe: it's not going to cause feedback problems, on 
its own. Conversely, it doesn't deal with problems such as network bottlenecks.

However, while AIMD's are based on the *global* average load (or rather the 
load of all the nodes along the chain), it sounds like you are proposing to 
only report the *local* load of our direct peers? How are you going to compute 
the global average load?

- Limit our rate of local requests based on the average load of the
connection to the peer it would need to be forwarded to

Not sure I follow here. Are you proposing that we initiate more requests in 
keyspace areas where there is lower load?

- Detect when remote peers are abusing the system by disregarding load -
as evidenced by a significantly higher average load in replies forwarded to
them

Okay, that is useful - although I doubt it would be an unambiguous detection, 
and what would we do about it? We would have to reject requests, and if we're 
not sure, or if it could be a downstream troublemaker, couldn't that result in 
misrouting and make things worse? On the other hand, any malicious node would 
be most easily detected by the nodes directly connected to it, so this could 
work much the same way as fair sharing - provided that it can reject or 
terminate the requests.

Arguably fair sharing achieves the same objective. It could be extended in 
various ways e.g. if we are relaying a lot of slow-down notices to a peer we 
could start to think about squeezing the peer's allocation.
 
 Of course, lots of questions:
 
- How do we compute the averages?  A decaying running average of some
kind?  What time period?

Yay more tunables. :(

- How do we translate load into a desired rate of requests?

And even more tunables. :(

- What criteria indicate that a peer is abusing the system?  What is the
remedy?

In the current system, don't accept their requests.

In ANY system, there will always be a borderline below which they can get away 
with it. We can either try to make this as small as possible or try not to hit 
legitimate traffic too hard.
 
 This is basically control theory stuff, and we'll need robust answers to
 these questions before we proceed (ideally with a theoretical rather than
 experimental foundation).

IMHO small tweaks to AIMDs, combined with fair sharing, will achieve much the 
same without having to wait years for our theoreticians to wake up.



Re: [freenet-dev] Beyond New Load Management: A proposal

2011-08-29 Thread Matthew Toseland
On Monday 29 Aug 2011 20:09:58 Matthew Toseland wrote:
 On Monday 29 Aug 2011 19:28:50 Ian Clarke wrote:
  Ok, thinking about it, here is a proposal, or  rather, the beginning of a
  proposal.  I'm assuming that we get rid of NLM, fair sharing, and anything
  else intended to control load, and replace it with this.  We will absolutely
  need to simulate this before we write a single line of code to deploy it.
 
 I have justified fair sharing in my other reply. However alternate mechanisms 
 might replace it.
 
 I agree we should simulate, although this might be significant work.
  
  The core idea is that a node will include a floating point number in
  response to any kind of request showing how close that node is to being
  overloaded.  0.0 would mean it's doing nothing, 1.0 would mean that it is
  completely saturated and must reject requests.  Clearly the goal is to avoid
  getting anywhere near to 1.0.
  
  A node tracks several things:
 
 All of these apply to both locally originated requests and remotely 
 originated requests? And are they for only our direct peers?
  
 - What is the overall average load reported by responses this node has
 received
 - What is the average load reported by responses this node has received,
 per remote node
 - What is the average load reported by responses this node forwarded, per
 remote node
 
 Ahhh, this one could be interesting - you could use it to penalise nodes 
 which spam excessively.

Actually, thinking about it, I don't think this will work. Requests are usually 
distributed evenly across the keyspace, and downstream nodes only know they are 
from this node - not which previous node they are from. So the pain will be 
shared across all the peers sending us requests, including our own local 
requests, and won't be identifiable as due to any single node. You have to do 
it by volume, not by reported load.
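
Something like this, i.e. count what each peer actually sends us, rather than
inferring anything from downstream load reports (a sketch; all names are
hypothetical):

    import java.util.HashMap;
    import java.util.Map;

    // Sketch: per-peer request volume over the current accounting period.
    class VolumeAccounting {
        private final Map<String, Long> requests = new HashMap<String, Long>();

        void onIncomingRequest(String peer) {
            Long n = requests.get(peer);
            requests.put(peer, n == null ? 1L : n + 1L);
        }

        // A peer is suspect if it sends far more than an equal share; the
        // factor of 2 is arbitrary slack against false positives.
        boolean looksAbusive(String peer, long totalAccepted, int peerCount) {
            long fairShare = Math.max(1, totalAccepted / peerCount);
            Long n = requests.get(peer);
            return n != null && n > 2 * fairShare;
        }
    }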



Re: [freenet-dev] Beyond New Load Management: A proposal

2011-08-29 Thread Ian Clarke
On Mon, Aug 29, 2011 at 2:09 PM, Matthew Toseland t...@amphibian.dyndns.org
 wrote:

 On Monday 29 Aug 2011 19:28:50 Ian Clarke wrote:
  Ok, thinking about it, here is a proposal, or  rather, the beginning of a
  proposal.  I'm assuming that we get rid of NLM, fair sharing, and
 anything
  else intended to control load, and replace it with this.  We will
 absolutely
  need to simulate this before we write a single line of code to deploy it.

 I have justified fair sharing in my other reply. However alternate
 mechanisms might replace it.

 I agree we should simulate, although this might be significant work.


I doubt it would be more work than trying it without simulation and failing
for a decade, which was our previous approach.


  The core idea is that a node will include a floating point number in
  response to any kind of request showing how close that node is to being
  overloaded.  0.0 would mean it's doing nothing, 1.0 would mean that it is
  completely saturated and must reject requests.  Clearly the goal is to
 avoid
  getting anywhere near to 1.0.
 
  A node tracks several things:

 All of these apply to both locally originated requests and remotely
 originated requests? And are they for only our direct peers?


Every peer along the reply path will take note of the load value, so it will
be a mixture of the loads of close-by peers, and more distant peers.
 Obviously, since we are more likely to receive messages from peers that are
closer, our average will automatically be biased towards closer peers.

- What is the average load reported by responses this node forwarded,
 per
 remote node

 Ahhh, this one could be interesting - you could use it to penalise nodes
 which spam excessively.


Exactly.


  I think, given these metrics, we should be able to do the following:
 
 - Limit our overall rate of initiating local requests based on the
 global
 average reported load

 We can already do this on the basis of AIMD's, based purely on behaviour of
 local requests. However, taking more data into account might result in more
 accurate results - although it might also dilute the feedback.


Right, AIMD tends to control the rate at which a specific event occurs, like
a dropped IP packet with TCP, or a dropped request as we currently use it.
 This is a bit like signaling that you have a headache by banging your head
against the wall.  It is also a rather coarse way to deliver feedback.

A floating point load number conveys more information in a non-destructive
way.
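
To make the contrast concrete, a sketch (not a tuned controller; the constants
and the target value are invented):

    // AIMD: binary feedback - we only learn anything when a request is
    // rejected, i.e. when the wall has already been hit.
    double rate = 1.0; // local requests per second we allow ourselves

    void onRequestFinished(boolean wasRejected) {
        if (wasRejected) rate *= 0.5; // multiplicative decrease on the bad event
        else rate += 0.1;             // additive increase otherwise
    }

    // Load-number feedback: a continuous signal, so we can react before
    // anything actually breaks.
    void onLoadReported(double load) {       // load in [0.0, 1.0]
        final double target = 0.5;           // comfortable operating point
        rate *= 1.0 + 0.1 * (target - load); // nudge toward the target
    }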

In any case, this part is safe: it's not going to cause feedback problems,
 on its own. Conversely, it doesn't deal with problems such as network
 bottlenecks.


Well, it depends.  If we are tracking load on a per-outbound-connection
basis then we can potentially take action to manage load per-connection in
addition to overall.


 However, while AIMD's are based on the *global* average load (or rather the
 load of all the nodes along the chain), it sounds like you are proposing to
 only report the *local* load of our direct peers? How are you going to
 compute the global average load?


No, it's not the local load of our direct peers.  It's the load reported in
all reply messages that we forward, some of those will be from our direct
peers, but most will be from remote peers.


 - Limit our rate of local requests based on the average load of the
 connection to the peer it would need to be forwarded to

 Not sure I follow here. Are you proposing that we initiate more requests in
 keyspace areas where there is lower load?


Well, kinda, except the opposite.  We might reduce requests in keyspaces
where there is higher load.
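
i.e. roughly this (a sketch; routeFor(), the per-peer load average and rng are
hypothetical stand-ins):

    // Sketch: probabilistically defer local requests whose next hop reports
    // high load, so hot regions of the keyspace see fewer of our requests.
    boolean shouldStartLocalRequest(Key key) {
        Peer next = routeFor(key);            // where this key would be routed
        double load = forwardedLoad(next);    // decaying average for that peer
        return rng.nextDouble() >= load;      // hotter route => fewer starts
    }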


 - Detect when remote peers are abusing the system by disregarding load
 -
 as evidenced by a significantly higher average load in replies
 forwarded to
 them

 Okay, that is useful - although I doubt it would be an unambiguous
 detection, and what would we do about it? We would have to reject requests,
 and if we're not sure, or if it could be a downstream troublemaker, couldn't
 that result in misrouting and make things worse? On the other hand, any
 malicious node would be most easily detected by the nodes directly connected
 to it, so this could work much the same way as fair sharing - provided that
 it can reject or terminate the requests.


No, we probably wouldn't start rejecting requests - more likely we would
disconnect from that node.  Obviously we'd need to be fairly sure of abusive
behavior before doing that - but abuse should be rare.


 Arguably fair sharing achieves the same objective. It could be extended in
 various ways e.g. if we are relaying a lot of slow-down notices to a peer we
 could start to think about squeezing the peer's allocation.


Yes, except it doesn't work, and it's too complicated to reason about.


  Of course, lots of questions:
 
 - How do we compute the averages?  A decaying running average of some
 kind?  What time period?

 Yay more tunables. 

Re: [freenet-dev] Beyond New Load Management: A proposal

2011-08-29 Thread Ian Clarke
On Mon, Aug 29, 2011 at 2:21 PM, Matthew Toseland t...@amphibian.dyndns.org
 wrote:

   - What is the average load reported by responses this node
 forwarded, per
  remote node
 
  Ahhh, this one could be interesting - you could use it to penalise nodes
 which spam excessively.

 Actually, thinking about it, I don't think this will work. Requests are
 usually distributed evenly across the keyspace, and downstream nodes only
 know they are from this node - not which previous node they are from. So the
 pain will be shared across all the peers sending us requests, including our
 own local requests, and won't be identifiable as due to any single node. You
 have to do it by volume, not by reported load.


But if a node is being abusive, won't the nodes it's requesting from tend to
have a higher than average load?  Perhaps you are right though, but this is
the kind of thought-process we need.

Ian.

-- 
Ian Clarke
Founder, The Freenet Project
Email: i...@freenetproject.org

Re: [freenet-dev] Beyond New Load Management: A proposal

2011-08-29 Thread Arne Babenhauserheide
On Monday, 29 August 2011, at 14:32:13, Ian Clarke wrote:
 Yes, small tweaks have worked so well for us for the last decade, leaving us
 pretty-much where we were in 2003.  No, we don't understand how the current
 system works, there is no point in trying to fix something when we don't
 even know what is broken.

I’d like to present a clue about what is broken in NLM. Before I kill you with
the log, here’s the result:


With NLM the latency of a request is a function of the raw bandwidth
(not so with OLM), and NLM used only half my bandwidth after it had
been deployed for 2 days (at the start much more).


τ ~ bandwidth. q_olm ~ 16s, q_nlm ~ τ! ; with τ: transfer time, q: queue time
(time to find the node), nlm: new load management, olm: old load management.



So first step: make sure all bandwidth gets used - maybe by allocating more
slots till we use all allowed bandwidth. Better having to throttle a transfer
than not using bandwidth.


*With the current network, NLM should be slower than OLM by 23%. But in 18
months it should actually be faster by ~8% — given Moore’s Law holds for upload
bandwidth — because the routes are shorter.*


The main advantage of NLM is that it should be much more resilient against
attackers (DoS).


Now to the log - it’s math and not cleaned up; you have been warned :)


ArneBab SSK-time: σ, CHK-time: ψ, success: Xs, fail: Xf.
ArneBab queue-time: q, transfer-time: τ, hops remaining: h, total hops: h₀,
w: success rate
ArneBab ψs = τ(h) + q(h)
ArneBab ψf = q(h)
ArneBab ψ ~ w₁·ψs + (1-w₁)·ψf
ArneBab σs = τ(h) + q(h)
ArneBab σf = q(h)
ArneBab σ ~ w₂·σs + (1-w₂)·σf; w₂ ~ 15%
ArneBab num(ψ) / num(σ) ~ 1
ArneBab → time ~ σ + ψ
ArneBab q(h) depends on timeouts, as do w₁ and w₂
ArneBab time =  w₁·ψs + (1-w₁)·ψf +  w₂·ψs + (1-w₂)·ψf
ArneBab = w₁ · (τ(h) + q(h)) + (1-w₁)·q(h) + w₂ · (τ(h) + q(h)) + (1-
w₂)·q(h)
ArneBab = t(h) · (w₁+w₂) + 2·q(h) · (2-w₁-w₂)
ArneBab = τ(h) · (w₁+w₂) + 2·q(h) · (2-w₁-w₂)
ArneBab in the congestion case q(h) ~ timeout
ArneBab timeout = o
ArneBab timeout: o
ArneBab w depends on the timeout *somehow*, but inversely
ArneBab o=0 → w=0
ArneBab assumption: o = ∞ → w₂ ~ 20%, w₁ ~ 100%
ArneBab assumption: o = ∞ → w₂ ~ 0.2, w₁ ~ 1
ArneBab correction: in the congestion case: q(h) ~ min(timeout, τ(h))
ArneBab timeout matters for q(h) only when timeout < τ(h)
ArneBab I try to: I still need a dependency of w on timeout
ArneBab … lets call it t(w)
ArneBab better: w(o) :)
toad_ well, if there is a timeout, we have a fixed time, but we reduce the
hops ...
toad_ i thought w was success rate
ArneBab ah!
ArneBab and the success rates were in the NLM stats
ArneBab going mostly smoothly from 60% to 0%
ArneBab for the HTL
toad_ right, success rate peaks at 18 or sometimes 16
toad_ what are w1 vs w2?
toad_ chk vs ssk i guess
ArneBab yes
-*- toad_ thinks considering both is probably overambitious at this stage?
ArneBab should not be too bad: SSKs drop much more rapidly at decreasing
hops
ArneBab hops→HTL
toad_ ψs is time for a successful chk; ψf is time for a failed chk ... in
which case h in the first instance is low, and in the second instance is h0
ArneBab yes
toad_ okay, i don't follow this line: time =  w₁·ψs + (1-w₁)·ψf +  w₂·ψs +
(1-w₂)·ψf
toad_ i thought w2 related to SSKs?
ArneBab uh, yes…
ArneBab time =  w₁·ψs + (1-w₁)·ψf +  w₂·σs + (1-w₂)·σf
toad_ you have to appreciate i'm only just getting back into maths and
physics after 12 years ...
toad_ (retaking a-levels to get a degree)
ArneBab no probs, I’m also no expert in this. I try to get a relation
between the time and the timeout, so we can try to find a minimum
toad_ in any case, there are two different h's for the two uses of q(h) - h0
and h_avg
toad_ h_avg for success and h0 for failure
ArneBab hm, yes
ArneBab which makes this harder…
ArneBab it’s wrong anyway… the q(h_avg) was missing
toad_ h_avg is somewhere between 5 and 10 imho
toad_ at least it is if everything is working well and the input load isn't
all really popular stuff (in which case it's answered quickly and can be
ignored)
ArneBab = τ(h) · (w₁+w₂) + q(h) · (2-w₁-w₂) + q(h_avg) · (w₁+w₂)
ArneBab would have been too easy :)
toad_ okay so here q(h) means q(h0) i.e. h = h0, max hops?
ArneBab jepp, and max hops sinks with falling timeout
toad_ hmm?
ArneBab the max actual hops
toad_ on the upside, q() is linear
toad_ hopefully
ArneBab yes: q(h) = h·o
ArneBab (in the congestion case)
toad_ the problem i have is it looks like q(1) ~= time [ for a full request
], unless load is very low
ArneBab so τ « q?
ArneBab τ much smaller than q?
toad_ of course it's bounded by timeouts, but i'd expect a runaway feedback
loop until it reaches heavy timeouts and effectively cuts the htl
toad_ well, with OLM, success time for a CHK is 1m25s, unsuccessful is
19sec, so transfer time is at least 1 minute
toad_ and less than 1m25; but with NLM, unsuccessful is 3 min+
ArneBab 

[freenet-dev] Fundraising

2011-08-29 Thread Steve Dougherty
The project's current state is reflected in a dollar amount of remaining
project funds, along with an estimate of how much additional development time
it corresponds to. This amount will decrease suddenly and significantly when a
payment is made. It's not - or at least not supposed to be - a secret that the Freenet
project is set to run out of funds mid-September. I've mentioned to people - on
Sone, in #freenet - that the project is running out of funds and they had no
idea. This is a problem. I propose (in decreasing importance) that:

1) The donation page mention _when_ the last ~2-month payment was made.
- Or ideally, that the time remaining estimate include remaining
funded development time. This need not be exact: it would allow the
estimate to decrease gradually instead of in sudden chunks.

2) Fundraising be presented as drives for a month's worth of funding.
- This would allow donating to have goals, and be framed as increasing
toward something tangible (a month of development) rather than building up
funds against sudden, quiet decrease.
- The estimate ceases to be meaningful if it sits at 8 days for two
months.

3) Release announcements include statements on the project's current financial
state and how to donate.
- Fred could have integrated release announcements for greater visibility.
These could of course be disabled by the user.

4) Release announcements also be displayed on the freenetproject.org homepage
in something akin to the Latest Releases/News sidebar on winehq.org.
- Having only news posts that are several months old makes the project
appear inactive, which it is not.

It should go without saying that a 0.8 release would probably make good press,
assuming it can be made stable before funding runs out.


Re: [freenet-dev] Beyond New Load Management: A proposal

2011-08-29 Thread Matthew Toseland
Okay, I don't understand most of that; I might be able to check the math if it 
were written properly, but it looks difficult. However, as far as I can see:
- The most obvious way to increase bandwidth usage would be to increase the 
timeout time for output bandwidth liability (and at the same time increase the 
relevant block transfer timeouts).
- This would increase the number of slots but it would also increase the number 
of requests seeking them; I don't see why it would help matters.
- Running an excessive number of requests without adjusting the block transfer 
timeouts would result in some of the transfers timing out.
- It would also, as you mention, make SSK requests (especially failed SSK 
requests) even slower.
- I am quite confident that Moore's Law DOES NOT hold for upload bandwidth.
- As far as I can see, all the benefits of NLM re attackers are achieved by 
fair sharing.
- A system which is more resistant to attacks but slower probably isn't all 
that interesting if the attacks in question are relatively expensive anyway.
- Smaller block sizes would have a significant efficiency cost, and would 
probably make load management more difficult.

I apologise if you see this as rather a broadside after I encouraged you to 
analyse the problem, but it is not yet a convincing demonstration that queueing 
is a viable strategy and not the spawn of satan! :)

On Monday 29 Aug 2011 21:27:01 Arne Babenhauserheide wrote:
 On Monday, 29 August 2011, at 14:32:13, Ian Clarke wrote:
  Yes, small tweaks have worked so well for us for the last decade, leaving us
  pretty-much where we were in 2003.  No, we don't understand how the current
  system works, there is no point in trying to fix something when we don't
  even know what is broken.
 
 I’d like to present a clue about what is broken in NLM. Before I kill you with 
 the log, here’s the result:
 
   With NLM the latency of a request is a function of the raw bandwidth 
   (not so with OLM), and NLM used only half my bandwidth after it had 
   been deployed for 2 days (at the start much more).
 
 τ ~ bandwidth. q_olm ~ 16s, q_nlm ~ τ! ; with τ: transfer time, q: queue time 
 (time to find the node), nlm: new load management, olm: old load management.
 
 So first step: make sure all bandwidth gets used - maybe by allocating more 
 slots till we use all allowed bandwidth. Better having to throttle a transfer 
 than not using bandwidth.
 
 *With the current network, NLM should be slower than OLM by 23%. But in 18 
 months it should actually be faster by ~8% — given Moore’s Law holds for 
 upload bandwidth — because the routes are shorter.*
 
 The main advantage of NLM is that it should be much more resilient against 
 attackers (DoS).
 
 Now to the log - it’s math and not cleaned up; you have been warned :)
 
 ArneBab SSK-time: σ, CHK-time: ψ, success: Xs, fail: Xf. 
 ArneBab queue-time: q, transfer-time: τ, hops remaining: h, total hops: h₀, 
 w: success rate
 ArneBab ψs = τ(h) + q(h)
 ArneBab ψf = q(h)
 ArneBab ψ ~ w₁·ψs + (1-w₁)·ψf
 ArneBab σs = τ(h) + q(h)
 ArneBab σf = q(h)
 ArneBab σ ~ w₂·σs + (1-w₂)·σf; w₂ ~ 15%
 ArneBab num(ψ) / num(σ) ~ 1
 ArneBab → time ~ σ + ψ
 ArneBab q(h) depends on timeouts, as do w₁ and w₂
 ArneBab time =  w₁·ψs + (1-w₁)·ψf +  w₂·ψs + (1-w₂)·ψf
 ArneBab = w₁ · (τ(h) + q(h)) + (1-w₁)·q(h) + w₂ · (τ(h) + q(h)) + (1-
 w₂)·q(h)
 ArneBab = t(h) · (w₁+w₂) + 2·q(h) · (2-w₁-w₂)
 ArneBab = τ(h) · (w₁+w₂) + 2·q(h) · (2-w₁-w₂)
 ArneBab in the congestion case q(h) ~ timeout
 ArneBab timeout = o
 ArneBab timeout: o
 ArneBab w depends on the timeout *somehow*, but inversely
 ArneBab o=0 → w=0
 ArneBab assumption: o = ∞ → w₂ ~ 20%, w₁ ~ 100%
 ArneBab assumption: o = ∞ → w₂ ~ 0.2, w₁ ~ 1
 ArneBab correction: in the congestion case: q(h) ~ min(timeout, τ(h))
 ArneBab timeout matters for q(h) only when timeout < τ(h)
 ArneBab I try to: I still need a dependency of w on timeout
 ArneBab … lets call it t(w)
 ArneBab better: w(o) :)
 toad_ well, if there is a timeout, we have a fixed time, but we reduce the 
 hops ...
 toad_ i thought w was success rate
 ArneBab ah! 
 ArneBab and the success rates were in the NLM stats
 ArneBab going mostly smoothly from 60% to 0%
 ArneBab for the HTL
 toad_ right, success rate peaks at 18 or sometimes 16
 toad_ what are w1 vs w2?
 toad_ chk vs ssk i guess
 ArneBab yes
 -*- toad_ thinks considering both is probably overambitious at this stage?
 ArneBab should not be too bad: SSKs drop much more rapidly at decreasing 
 hops
 ArneBab hops→HTL
 toad_ ψs is time for a successful chk; ψf is time for a failed chk ... in 
 which case h in the first instance is low, and in the second instance is h0
 ArneBab yes
 toad_ okay, i don't follow this line: time =  w₁·ψs + (1-w₁)·ψf +  w₂·ψs + 
 (1-w₂)·ψf
 toad_ i thought w2 related to SSKs?
 ArneBab uh, yes…
 ArneBab time =  w₁·ψs + (1-w₁)·ψf +  w₂·σs + (1-w₂)·σf
 toad_ you have to appreciate i'm only just getting back into maths and 
 physics after 12 years ...
 toad_ 

Re: [freenet-dev] Beyond New Load Management: A proposal

2011-08-29 Thread Matthew Toseland
On Monday 29 Aug 2011 20:32:13 Ian Clarke wrote:
 On Mon, Aug 29, 2011 at 2:09 PM, Matthew Toseland t...@amphibian.dyndns.org
  wrote:
 
  On Monday 29 Aug 2011 19:28:50 Ian Clarke wrote:
   Ok, thinking about it, here is a proposal, or  rather, the beginning of a
   proposal.  I'm assuming that we get rid of NLM, fair sharing, and
  anything
   else intended to control load, and replace it with this.  We will
  absolutely
   need to simulate this before we write a single line of code to deploy it.
 
  I have justified fair sharing in my other reply. However alternate
  mechanisms might replace it.
 
  I agree we should simulate, although this might be significant work.
 
 
 I doubt it would be more work than trying it without simulation and failing
 for a decade, which was our previous approach.
 
 
   The core idea is that a node will include a floating point number in
   response to any kind of request showing how close that node is to being
   overloaded.  0.0 would mean it's doing nothing, 1.0 would mean that it is
   completely saturated and must reject requests.  Clearly the goal is to
  avoid
   getting anywhere near to 1.0.
  
   A node tracks several things:
 
  All of these apply to both locally originated requests and remotely
  originated requests? And are they for only our direct peers?
 
 
 Every peer along the reply path will take note of the load value, so it will
 be a mixture of the loads of close-by peers, and more distant peers.
  Obviously, since we are more likely to receive messages from peers that are
 closer, our average will automatically be biased towards closer peers.

So it is from the data source? Or every node along the way?
 
 - What is the average load reported by responses this node forwarded,
  per
  remote node
 
  Ahhh, this one could be interesting - you could use it to penalise nodes
  which spam excessively.
 
 Exactly.
 
   I think, given these metrics, we should be able to do the following:
  
  - Limit our overall rate of initiating local requests based on the
  global
  average reported load
 
  We can already do this on the basis of AIMD's, based purely on behaviour of
  local requests. However, taking more data into account might result in more
  accurate results - although it might also dilute the feedback.
 
 Right, AIMD tends to control the rate at which a specific event occurs, like
 a dropped IP packet with TCP, or a dropped request as we currently use it.
  This is a bit like signaling that you have a headache by banging your head
 against the wall.  It is also a rather coarse way to deliver feedback.

Fair point.
 
 A floating point load number conveys more information in a non-destructive
 way.
 
 In any case, this part is safe: it's not going to cause feedback problems,
  on its own. Conversely, it doesn't deal with problems such as network
  bottlenecks.
 
 Well, it depends.  If we are tracking load on a per-outbound-connection
 basis then we can potentially take action to manage load per-connection in
 addition to overall.

Not sure I follow here. Routing by load is bad, unless it is severely limited.
 
  However, while AIMD's are based on the *global* average load (or rather the
  load of all the nodes along the chain), it sounds like you are proposing to
  only report the *local* load of our direct peers? How are you going to
  compute the global average load?
 
 No, it's not the local load of our direct peers.  It's the load reported in
 all reply messages that we forward, some of those will be from our direct
 peers, but most will be from remote peers.
 
  - Limit our rate of local requests based on the average load of the
  connection to the peer it would need to be forwarded to
 
  Not sure I follow here. Are you proposing that we initiate more requests in
  keyspace areas where there is lower load?
 
 Well, kinda, except the opposite.  We might reduce requests in keyspaces
 where there is higher load.
 
 
  - Detect when remote peers are abusing the system by disregarding load
  -
  as evidenced by a significantly higher average load in replies
  forwarded to
  them
 
  Okay, that is useful - although I doubt it would be an unambiguous
  detection, and what would we do about it? We would have to reject requests,
  and if we're not sure, or if it could be a downstream troublemaker, couldn't
  that result in misrouting and make things worse? On the other hand, any
  malicious node would be most easily detected by the nodes directly connected
  to it, so this could work much the same way as fair sharing - provided that
  it can reject or terminate the requests.
 
 No, we probably wouldn't start rejecting requests - more likely we would
 disconnect from that node.  Obviously we'd need to be fairly sure of abusive
 behavior before doing that - but abuse should be rare.

Well then we'd need to be very sure, and this means you could get away with a 
lot before triggering it. Weird topologies for instance could cause 

Re: [freenet-dev] Fundraising

2011-08-29 Thread Ian Clarke
These are good ideas Steve, I must confess that I've been a little
distracted lately - and we do definitely need to get our act together on
funding.  Matthew, what is our current drop-dead date as best as you can
estimate it?

Ian.

On Mon, Aug 29, 2011 at 4:18 PM, Steve Dougherty st...@asksteved.com wrote:

 The project's current state is reflected in a dollar amount of remaining
 project funds, along with an estimate of how much additional development time
 it corresponds to. This amount will decrease suddenly and significantly when a
 payment is made. It's not - or at least not supposed to be - a secret that the
 Freenet project is set to run out of funds mid-September. I've mentioned to
 people - on Sone, in #freenet - that the project is running out of funds and
 they had no idea. This is a problem. I propose (in decreasing importance) that:

 1) The donation page mention _when_ the last ~2-month payment was made.
- Or ideally, that the time remaining estimate include remaining
funded development time. This need not be exact: it would allow the
estimate to decrease gradually instead of in sudden chunks.

 2) Fundraising be presented as drives for a month's worth of funding.
 - This would allow donating to have goals, and be framed as increasing
 toward something tangible (a month of development) rather than building up
 funds against sudden, quiet decrease.
 - The estimate ceases to be meaningful if it sits at 8 days for two
 months.

 3) Release announcements include statements on the project's current
 financial state and how to donate.
 - Fred could have integrated release announcements for greater visibility.
 These could of course be disabled by the user.

 4) Release announcements also be displayed on the freenetproject.org homepage
 in something akin to the Latest Releases/News sidebar on winehq.org.
- Having only news posts that are several months old makes the project
appear inactive, which it is not.

 It should go without saying that a 0.8 release would probably make good press,
 assuming it can be made stable before funding runs out.




-- 
Ian Clarke
Personal blog: http://blog.locut.us/

Re: [freenet-dev] Beyond New Load Management: A proposal

2011-08-29 Thread Arne Babenhauserheide
After discussing and going deeper into this, it became apparent that the
problem is not overload of queues. I’ll repeat the result I came to here (and
not in chatlog form :) ):

The problem is in the load limiter:


1) we need to use all bandwidth, because latency depends on the number of
finishing transfers.

2) SSKs fail 85% of the time.

3) the load limiter assumes that all SSKs succeed and reserves bandwidth for
them.

4) this now bites us, because NLM has shorter transfer times and longer wait
times. So the over-reservation of bandwidth for SSKs lasts longer.

5) solution: count each SSK as only
average_SSK_success_rate * data_to_transfer_on_success.


To (1): When we have less bandwidth, then fewer CHKs succeed per second, so
fewer slots get freed and the queueing time is bigger.
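
In pseudo-Java the proposed change is roughly this (the names and the payload
figure are illustrative):

    // Current assumption: every SSK is charged against the bandwidth
    // liability as if it will succeed and transfer its full payload.
    double reservedPerSSK = SSK_PAYLOAD_BYTES;                  // e.g. ~1 KiB

    // Proposed: charge only the *expected* transfer, since ~85% of SSKs
    // fail and transfer (almost) nothing.
    double expectedPerSSK = sskSuccessRate * SSK_PAYLOAD_BYTES; // ~0.15 * payload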

Best wishes,
Arne



On Monday, 29 August 2011, at 22:27:47, Matthew Toseland wrote:
 Okay, I don't understand most of that; I might be able to check the math if
 it were written properly, but it looks difficult. However, as far as I can see:
 - The most obvious way to increase bandwidth usage would be to increase the
 timeout time for output bandwidth liability (and at the same time increase
 the relevant block transfer timeouts).
 - This would increase the number of slots but it would also increase the
 number of requests seeking them; I don't see why it would help matters.
 - Running an excessive number of requests without adjusting the block
 transfer timeouts would result in some of the transfers timing out.
 - It would also, as you mention, make SSK requests (especially failed SSK
 requests) even slower.
 - I am quite confident that Moore's Law DOES NOT hold for upload bandwidth.
 - As far as I can see, all the benefits of NLM re attackers are achieved by
 fair sharing.
 - A system which is more resistant to attacks but slower probably isn't all
 that interesting if the attacks in question are relatively expensive anyway.
 - Smaller block sizes would have a significant efficiency cost, and would
 probably make load management more difficult.

 I apologise if you see this as rather a broadside after I encouraged you to
 analyse the problem, but it is not yet a convincing demonstration that
 queueing is a viable strategy and not the spawn of satan! :)
 On Monday 29 Aug 2011 21:27:01 Arne Babenhauserheide wrote:
  On Monday, 29 August 2011, at 14:32:13, Ian Clarke wrote:
   Yes, small tweaks have worked so well for us for the last decade,
   leaving us pretty-much where we were in 2003.  No, we don't
   understand how the current system works, there is no point in
   trying to fix something when we don't even know what is broken.
 
  I’d like to present a clue about what is broken in NLM. Before I kill you
  with the log, here’s the result:
  With NLM the latency of a request is a function of the raw bandwidth
  (not so with OLM), and NLM used only half my bandwidth after it had
  been deployed for 2 days (at the start much more).
 
  τ ~ bandwidth. q_olm ~ 16s, q_nlm ~ τ! ; with τ: transfer time, q: queue
  time (time to find the node), nlm: new load management, olm: old load
  management.
 
  So first step: make sure all bandwidth gets used - maybe by allocating
  more slots till we use all allowed bandwidth. Better having to throttle
  a transfer than not using bandwidth.
 
  *With the current network, NLM should be slower than OLM by 23%. But in 18
  months it should actually be faster by ~8% — given Moore’s Law holds for
  upload bandwidth — because the routes are shorter.*
 
  The main advantage of NLM is that it should be much more resilient
  against attackers (DoS).
 
  Now to the log - it’s math and not cleaned up; you have been warned :)
 
  ArneBab SSK-time: σ, CHK-time: ψ, success: Xs, fail: Xf.
  ArneBab queue-time: q, transfer-time: τ, hops remaining: h, total
  hops: h₀, w: success rate
  ArneBab ψs = τ(h) + q(h)
  ArneBab ψf = q(h)
  ArneBab ψ ~ w₁·ψs + (1-w₁)·ψf
  ArneBab σs = τ(h) + q(h)
  ArneBab σf = q(h)
  ArneBab σ ~ w₂·σs + (1-w₂)·σf; w₂ ~ 15%
  ArneBab num(ψ) / num(σ) ~ 1
  ArneBab → time ~ σ + ψ
  ArneBab q(h) depends on timeouts, as do w₁ and w₂
  ArneBab time =  w₁·ψs + (1-w₁)·ψf +  w₂·ψs + (1-w₂)·ψf
  ArneBab = w₁ · (τ(h) + q(h)) + (1-w₁)·q(h) + w₂ · (τ(h) + q(h)) + (1-
  w₂)·q(h)
  ArneBab = t(h) · (w₁+w₂) + 2·q(h) · (2-w₁-w₂)
  ArneBab = τ(h) · (w₁+w₂) + 2·q(h) · (2-w₁-w₂)
  ArneBab in the congestion case q(h) ~ timeout
  ArneBab timeout = o
  ArneBab timeout: o
  ArneBab w depends on the timeout *somehow*, but inversely
  ArneBab o=0 → w=0
  ArneBab assumption: o = ∞ → w₂ ~ 20%, w₁ ~ 100%
  ArneBab assumption: o = ∞ → w₂ ~ 0.2, w₁ ~ 1
  ArneBab correction: in the congestion case: q(h) ~ min(timeout, τ(h))
  ArneBab timeout matters for q(h) only when timeout < τ(h)
  ArneBab I try to: I still need a dependency of w on timeout
  ArneBab … lets call it t(w)
  ArneBab better: w(o) :)
  toad_ well, if there is a timeout, we have a fixed time, but we reduce
  

Re: [freenet-dev] Beyond New Load Management: A proposal

2011-08-29 Thread Matthew Toseland
On Tuesday 30 Aug 2011 00:08:16 Arne Babenhauserheide wrote:
 After discussing and going deeper into this, it became apparent that the 
 problem is not overload of queues. I’ll repeat the result I came to here (and 
 not in chatlog form :) ): 
 
 The problem is in the load limiter: 
 
 
 1) we need to use all bandwidth, because latency depends on the number of  
   finishing transfers.
 
 2) SSKs fail 85% of the time.
 
 3) the load limiter assumes that all SSKs succeed and reserves bandwidth for 
   them.
 
 4) this now bites us, because NLM has shorter transfer times and longer wait 
   times. So the over-reservation of bandwidth for SSKs lasts longer. 
 
 5) solution: count each SSK as only 
   average_SSK_success_rate * data_to_transfer_on_success.
 
 
 To (1): When we have less bandwidth, then fewer CHKs succeed per second, so 
 fewer slots get freed and the queueing time is bigger.

Another way to look at this:

Both Ian and I have been assuming that the resource is some number 
representing a property of the node, for example, how many requests are 
running. But the real resource is the external bandwidth usage, which is rather 
different.

An alternative solution - if we are worried about bursts of requests succeeding 
and therefore timing out - is to allocate a fixed chunk of bandwidth for SSKs, 
and not let them interfere with CHKs. Then CHKs can have long transfer times, 
and therefore long queueing times when they fail (but this is only ~ 40% of the 
time with NLM's accurate routing), and SSKs can have short transfer times, and 
therefore short queueing times. This may be less disruptive and should also be 
relatively easy to implement.
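
A sketch of such a split (the 20% figure is a placeholder, not a
recommendation; the names are invented):

    // Hypothetical: divide the output bandwidth liability into two pools so
    // that long CHK transfers can no longer inflate the queueing time of SSKs.
    double liability = outputBytesPerSecond * LIABILITY_WINDOW_SECONDS;
    double sskPool = 0.2 * liability; // placeholder fraction
    double chkPool = liability - sskPool;

    boolean acceptSSK(double sskBytesReserved) { return sskBytesReserved < sskPool; }
    boolean acceptCHK(double chkBytesReserved) { return chkBytesReserved < chkPool; }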

But the other question is, can queueing ever be helpful? It can if it allows us 
to route more accurately (which NLM clearly does), and/or to run enough 
requests in parallel that the longer time taken for the request to reach its 
destination is offset. Is this condition met?
 
 Best wishes, 
 Arne
 
 
 
 On Monday, 29 August 2011, at 22:27:47, Matthew Toseland wrote:
  Okay, I don't understand most of that; I might be able to check the math if
  it were written properly, but it looks difficult. However, as far as I can see:
  - The most obvious way to increase bandwidth usage would be to increase the
  timeout time for output bandwidth liability (and at the same time increase
  the relevant block transfer timeouts).
  - This would increase the number of slots but it would also increase the
  number of requests seeking them; I don't see why it would help matters.
  - Running an excessive number of requests without adjusting the block
  transfer timeouts would result in some of the transfers timing out.
  - It would also, as you mention, make SSK requests (especially failed SSK
  requests) even slower.
  - I am quite confident that Moore's Law DOES NOT hold for upload bandwidth.
  - As far as I can see, all the benefits of NLM re attackers are achieved by
  fair sharing.
  - A system which is more resistant to attacks but slower probably isn't all
  that interesting if the attacks in question are relatively expensive anyway.
  - Smaller block sizes would have a significant efficiency cost, and would
  probably make load management more difficult.
  
  I apologise if you see this as rather a broadside after I encouraged you to
  analyse the problem, but it is not yet a convincing demonstration that
  queueing is a viable strategy and not the spawn of satan! :)
  On Monday 29 Aug 2011 21:27:01 Arne Babenhauserheide wrote:
   On Monday, 29 August 2011, at 14:32:13, Ian Clarke wrote:
Yes, small tweaks have worked so well for us for the last decade,
leaving us pretty-much where we were in 2003.  No, we don't
understand how the current system works, there is no point in
trying to fix something when we don't even know what is broken.
   
   I’d like to present a clue about what is broken in NLM. Before I kill you
   with the log, here’s the result:
 With NLM the latency of a request is a function of the raw bandwidth
 (not so with OLM), and NLM used only half my bandwidth after it had
 been deployed for 2 days (at the start much more).
   
   τ ~ bandwidth. q_olm ~ 16s, q_nlm ~ τ! ; with τ: transfer time, q: queue
   time (time to find the node), nlm: new load management, olm: old load
   management.
   
   So first step: make sure all bandwidth gets used - maybe by allocating
   more slots till we use all allowed bandwidth. Better having to throttle
   a transfer than not using bandwidth.
   
   *With the current network, NLM should be slower than OLM by 23%. But in 18
   months it should actually be faster by ~8% — given Moore’s Law holds for
   upload bandwidth — because the routes are shorter.*
   
   The main advantage of NLM is that it should be much more resilient
   against attackers (DoS).
   
   Now to the log - it’s math and not cleaned up; you have been warned :)
   
   ArneBab SSK-time: σ, CHK-time: ψ, success: Xs, 

Re: [freenet-dev] Beyond New Load Management: A proposal

2011-08-29 Thread Matthew Toseland
On Tuesday 30 Aug 2011 00:18:35 Matthew Toseland wrote:
 On Tuesday 30 Aug 2011 00:08:16 Arne Babenhauserheide wrote:
  After discussing and going deeper into this, it became apparent that the 
  problem is not overload of queues. I’ll repeat the result I came to here 
  (and 
  not in chatlog form :) ): 
  
  The problem is in the load limiter: 
  
  
  1) we need to use all bandwidth, because latency depends on the number of  
  finishing transfers.
  
  2) SSKs fail 85% of the time.
  
  3) the load limiter assumes that all SSKs succeed and reserves bandwidth 
  for 
  them.
  
  4) this now bites us, because NLM has shorter transfer times and longer 
  wait 
  times. So the over-reservation of bandwidth for SSKs lasts longer. 
  
  5) solution: count each SSK as only 
  average_SSK_success_rate * data_to_transfer_on_success.
  
  
  To (1): When we have less bandwidth, then fewer CHKs succeed per second, so 
  fewer slots get freed and the queueing time is bigger.
 
 Another way to look at this:
 
 Both Ian and I have been assuming that the resource is some number 
 representing a property of the node, for example, how many requests are 
 running. But the real resource is the external bandwidth usage, which is 
 rather different.
 
 An alternative solution - if we are worried about bursts of requests 
 succeeding and therefore timing out - is to allocate a fixed chunk of 
 bandwidth for SSKs, and not let them interfere with CHKs. Then CHKs can have 
 long transfer times, and therefore long queueing times when they fail (but 
 this is only ~ 40% of the time with NLM's accurate routing), and SSKs can 
 have short transfer times, and therefore short queueing times. This may be 
 less disruptive and should also be relatively easy to implement.
 
 But the other question is, can queueing ever be helpful? It can if it allows 
 us to route more accurately (which NLM clearly does), and/or to run enough 
 requests in parallel that the longer time taken for the request to reach its 
 destination is offset. Is this condition met?

Yes it is!

If a request takes P seconds without queueing, and P+Q seconds with queueing, 
then if we have the same number of requests running at once, we have a slowdown 
of (P+Q)/P. However, if we start (P+Q)/P times more requests, and the queueing 
time remains constant, we can compensate for this. 
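
For example: if P = 10s and Q = 5s, each individual request is (10+5)/10 = 1.5 
times slower; but if we also keep 1.5 times as many requests in flight, the 
number of completions per second is unchanged - provided Q really does stay at 
5 seconds under the extra load.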

Does it? 

There are two main components:
1) The queueing time. This is roughly proportional to (number of requests 
running) / (number of slots available). If we increase the capacity of the 
node, we multiply both the top and bottom by the same factor - so the queueing 
time is unchanged, unless the data transfer time is changed.
2) The data transfer time. In the above case with a slowdown, the data transfer 
time is the same, or a little shorter, because most of our bandwidth isn't 
being used. If we increase it by running more requests at once, it should reach 
where it was, and if we increase it further, transfers will start to take 
longer again.

ArneBab has analysed this in more detail and shown that the largest component 
of the queueing time is proportional to the data transfer time, hence, 
separating SSKs (which have a very short transfer time) from CHKs (which have a 
much longer transfer time) would help significantly.

The key advantage of NLM, of course, is that it routes accurately, resulting in 
10%-30% improvements in success rates, and transferring data over significantly 
fewer hops (= more effective bandwidth).
  
  Best wishes, 
  Arne
  
  
  
   On Monday, 29 August 2011, at 22:27:47, Matthew Toseland wrote:
   Okay, I don't understand most of that; I might be able to check the math
   if it were written properly, but it looks difficult. However, as far as I
   can see:
   - The most obvious way to increase bandwidth usage would be to increase
   the timeout time for output bandwidth liability (and at the same time
   increase the relevant block transfer timeouts).
   - This would increase the number of slots but it would also increase the
   number of requests seeking them; I don't see why it would help matters.
   - Running an excessive number of requests without adjusting the block
   transfer timeouts would result in some of the transfers timing out.
   - It would also, as you mention, make SSK requests (especially failed SSK
   requests) even slower.
   - I am quite confident that Moore's Law DOES NOT hold for upload bandwidth.
   - As far as I can see, all the benefits of NLM re attackers are achieved
   by fair sharing.
   - A system which is more resistant to attacks but slower probably isn't
   all that interesting if the attacks in question are relatively expensive
   anyway.
   - Smaller block sizes would have a significant efficiency cost, and would
   probably make load management more difficult.
   
   I apologise if you see this as rather a broadside after I encouraged you 
   to
   analyse the problem, but it is not yet a 

Re: [freenet-dev] Beyond New Load Management: A proposal

2011-08-29 Thread Arne Bab.
From: Matthew Toseland t...@amphibian.dyndns.org
But the other question is, can queueing ever be helpful? It can if it allows 
us to route more accurately (which NLM clearly does), and/or to run enough 
requests in parallel that the longer time taken for the request to reach its 
destination is offset. Is this condition met?

Experience with the deployed NLM showed that even in the fully congested case 
it had success rates of 60% for HTL 18, 17 and 16, compared to less than 40% for 
OLM. This means that the requests are sent over fewer hops on average, because 
they find the content fewer hops away from the requester.

A download of 1MiB which is sent over 2 hops needs 2 MiB in total network 
bandwidth.
If it is sent over only 1.5 hops on average, then it needs only 1.5 MiB total 
network bandwidth.
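
(Concretely: 2 MiB / 1.5 MiB ≈ 1.33, so the same upload bandwidth moves roughly 
a third more payload.)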

So essentially NLM can distribute 30% more content with the same network 
resources¹. And these numbers are actual observations. The only reason why this 
did not result in increased performance is that the nodes used less than 50% of 
their allocated bandwidth² - which is a problem with the bandwidth scheduler 
and not with queueing.

Best wishes,
Arne

¹: The relevant network resource is upload bandwidth.
²: Source: observations from me and two other freenet users.

PS: How exactly the bandwidth limiter is fixed is an implementation detail. I 
think you are actually the only person who can judge how to do this most 
efficiently.
