On Thu, Aug 10, 2006 at 11:43:32AM -0700, Ian Clarke wrote:
> Why use a median, rather than a mean?
>
> I'm concerned about this, it increases (yet again) the complexity of
> our load balancing, it hasn't been simulated, and I think there will
> inevitably be unpredictable results. For example, what happens if
> *all* nodes have an AcceptedFrac below the median? This mechanism
> will just spiral down until all nodes are hardly accepting anything.
The current mechanism is spiraling down, and nodes are accepting hardly
anything. The median is defined as the middle value when you sort the
values, so half the nodes will be above it and half below it. So we'll
always be able to send as many requests as we want to half the nodes
(see the sketch below).
The reason for the current lack of performance is overcompensation for
load, as evidenced by the low bandwidth usage. To be precise, it's
because backoff is used as a load limiting mechanism and not just a load
balancing mechanism.
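To make the median concrete, here is a minimal sketch of the calculation.
It is illustrative only, not actual code: PeerNode and acceptedFrac are
the names from my original mail (quoted below), medianAcceptedFrac() and
the use of a plain list are made up.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch only: the median acceptance fraction over our peers.
// Assumes at least one peer.
double medianAcceptedFrac(List<PeerNode> peers) {
    List<Double> fracs = new ArrayList<Double>();
    for (PeerNode p : peers)
        fracs.add(p.acceptedFrac);
    Collections.sort(fracs);
    int n = fracs.size();
    // Middle value; for an even count, average the two middle values.
    return (n % 2 == 1) ? fracs.get(n / 2)
                        : (fracs.get(n / 2 - 1) + fracs.get(n / 2)) / 2.0;
}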
>
> We can *not* afford to fall back into the trap of adding more and
> more complexity to the point that nobody understands what is going
> on, especially in the area of load balancing which is one of the
> areas that is most difficult to get our heads around.
The theory is simple: We have propagation of overload back to the
original senders. They will slow down if they get lots of timeouts and
forwarded RejectedOverloads. That's load *limiting*. The job of load
*balancing* is to prevent all our traffic being routed to slow nodes
which then generate lots of RejectedOverloads and slow down the whole
network. We could take out load balancing completely, but that might
result in a significant slowdown due to slow nodes. What's even better
is that this scheme has fewer arbitrary constants than backoff.
>
> I really feel that we lack a "theory" of load balancing, a simple
> conceptual model through which we can think about the problem and
> find a solution. Rather, we come up with idea after idea based on
> vague and invariably flawed notions and assumptions, layering one on
> top of the other until nobody understands what is happening.
Load limiting: Sender side, determining how fast to send requests into
the network. If there are too many RejectedOverloads, the sender slows
down. This will ensure the network is not overloaded, full stop. It
works for TCP/IP, and it should work for us.
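To be concrete, the sender-side adjustment I have in mind is roughly
TCP-style additive-increase / multiplicative-decrease on the number of
requests we allow in flight. This is only a sketch; the class name, the
constants and the window arithmetic are illustrative, not something we'd
necessarily deploy as-is:

// Sketch: AIMD on the number of requests we allow in flight, driven by
// RejectedOverloads and timeouts propagated back to us.
class RequestLimiter {
    private double window = 10.0;                 // requests allowed in flight
    private static final double MIN_WINDOW = 1.0;

    synchronized void onRequestCompleted() {
        // Additive increase: speed up slowly while the network copes.
        window += 1.0 / window;
    }

    synchronized void onRejectedOverloadOrTimeout() {
        // Multiplicative decrease: slow down sharply when overload
        // propagates back to us.
        window = Math.max(MIN_WINDOW, window / 2.0);
    }

    synchronized boolean canSendAnother(int requestsInFlight) {
        return requestsInFlight < (int) window;
    }
}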
Load limiting without load balancing: If there are slow nodes near the
sender, and we send them the share of our incoming requests that routing
by location would normally give them, then most of those requests will
be rejected, and the result is an overall slowdown.
Load balancing: Routing side. The purpose is to determine which nodes to
send requests to, in order that a) our requests are reasonably fast, and
b) we don't send too many requests to really slow nodes, causing an
overall slowdown via load limiting. Load balancing can legitimately be
"selfish", i.e. it can send a request even if all the available nodes are
slow, because load LIMITING will reduce the overall send rate if the
overall network is slow; that's not load balancing's problem. Not being
"selfish" in this sense is a threat to anonymity (all nodes backed off
except one), to usability (all nodes backed off), and to performance, and
it is not justified. TCP/IP runs over more than Ethernet, and I don't
think Ethernet collision detection is a viable model for backoff.
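In code terms, "selfish" routing just means we always pick the peer with
the smallest (possibly penalised) distance and never refuse to route
because everyone looks slow. Something like the following sketch, where
distance() is the function from my original mail (quoted below) and
selectPeer() is a made-up name:

import java.util.List;

// Sketch: pick the peer with the smallest biased distance to the target.
// We never return "nobody" just because all peers are slow; if the whole
// network is slow, load *limiting* at the senders deals with it.
PeerNode selectPeer(List<PeerNode> peers, double target) {
    PeerNode best = null;
    double bestDistance = Double.MAX_VALUE;
    for (PeerNode p : peers) {
        double d = distance(p, target);   // distance() as in the quoted code below
        if (d < bestDistance) {
            bestDistance = d;
            best = p;
        }
    }
    return best;
}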
Conceptually this isn't a difficult model. And I don't think it could
possibly be any worse than current performance: right now we are plainly
overcompensating for load, and the reason is backoff, which is being used
both to limit and to balance load when it should be solely a balancing
mechanism; we already have a better limiting mechanism.
>
> We must place a moratorium on all but the most trivial "improvements"
> to load balancing unless they have been simulated.
That could be another month. Michael has rightly insisted on doing a
fairly low-level simulation, which has resulted in us rearchitecting a
lot of the lower layers of Freenet; this is a good thing, but it takes
time. Token passing is a substantial change, which we should not deploy
without simulation. It should solve our current problems, but if it
takes another month for it to be ready, given the current situation of
massive overcompensation for load resulting in universally low bandwidth
usage, lots of backoff, and really slow inserts, it seems reasonable to
me to try the above. Actually I don't think that load limiting with no
backoff would do too badly, but it would be rather sensitive to modem
nodes etc.
>
> Ian.
>
> On 9 Aug 2006, at 07:30, Matthew Toseland wrote:
>
> >What is backoff for? It's not to prevent overloading nodes; nodes
> >reject requests pre-emptively. What it is for is this:
> >
> >If we don't back off slow nodes, then these slow nodes will have to
> >reject (or occasionally time out) the requests that we do forward to
> >them. Those failures will be propagated back to the request senders,
> >who will have to reduce their send rate. In other words, if we don't
> >avoid sending requests to the slowest nodes, the requests we do send
> >to them will be rejected, and those rejections will cause a universal
> >slowdown in request rates.
> >
> >Thus, I don't think that the current mechanism is sensible. Nodes can
> >be expected to take care of themselves, now that we have pre-emptive
> >rejection. I don't think the Ethernet analogy works 100%; we don't
> >really have a shared medium.
> >
> >What we are looking at here is simply a way to ensure that requests
> >are not forwarded to nodes who will inevitably reject them and thus
> >cause an overall slowdown at the requester end. This is completely
> >different to the case in 0.5, where requesters never slowed down.
> >
> >So what am I proposing?
> >
> >For each node, we track the proportion of requests which are accepted.
> >We calculate a median. For any node above the median, all requests are
> >forwarded to it: if most of our nodes are slow, that's not our problem,
> >it's our peers who want to forward requests through us who have to deal
> >with it (by not sending requests to us). For nodes below the median, we
> >artificially narrow their specializations:
> >
> >While routing:
> >
> >public double distance(PeerNode node, double target) {
> >    double distance;
> >    if(node.acceptedFrac >= medianAcceptedFrac) {
> >        distance = rawDistance(node.location, target);
> >    } else {
> >        // Below-median acceptors appear further from the target than
> >        // they are, so their effective specialization narrows.
> >        distance = rawDistance(node.location, target)
> >            * medianAcceptedFrac / node.acceptedFrac;
> >    }
> >    return distance;
> >}
> >
> >
> >Now, to some degree, this is alchemy, in the sense that it hasn't been
> >simulated. However it will be some time before the new rate limiting
> >mechanism is available, and it appears a solid argument to me: If all
> >our peers are slow, that DOES NOT mean we shouldn't send requests to
> >them. Because we have load propagation, it is entirely legitimate to
> >send requests ANYWAY, and let upstream nodes deal with the slowness -
> >either the entire network is slow, in which case we reduce all the
> >send rates, or this particular node is slow, in which case we reduce
> >the number of requests sent to it! Note also that slavishly following
> >backoffs reduces our anonymity quite noticeably.
> >
> >Comments?
> >--
> >Matthew J Toseland - toad at amphibian.dyndns.org
> >Freenet Project Official Codemonkey - http://freenetproject.org/
> >ICTHUS - Nothing is impossible. Our Boss says so.
>
> Ian Clarke: Co-Founder & Chief Scientist Revver, Inc.
> phone: 323.871.2828 | personal blog - http://locut.us/blog
>
--
Matthew J Toseland - toad at amphibian.dyndns.org
Freenet Project Official Codemonkey - http://freenetproject.org/
ICTHUS - Nothing is impossible. Our Boss says so.