On Tuesday 15 July 2008 14:04, Matthew Toseland wrote:
> On Tuesday 15 July 2008 11:12, Theodore Hong wrote:
> > I think that there are actually two distinct though related issues
> > going on here, which are downtime and polling.
> > 
> > Downtime means that the data you want is in the network, but you can't
> > get to it right now because one of the nodes that the request needs to
> > go through isn't up right now (e.g. a gateway out of a darknet).
> > 
> > Polling means that the data you want isn't in the network yet, but
> > you'd like to get it sometime in the future when it is inserted.
> > 
> > 
> > DOWNTIME
> > 
> > The downtime problem is related to delay-tolerant networking -
> > http://en.wikipedia.org/wiki/Delay_Tolerant_Networking
> > http://www.dtnrg.org/wiki
> > 
> > One way to think about it is to think about routing in time as well as
> > space.  Suppose you want to send a message along the route A-B-C, but
> > A-B is only up on Monday and B-C is only up on Tuesday.  You can
> > complete the route by holding the message at B for a day - effectively
> > 'routing' between B_on_Monday and B_on_Tuesday.
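The store-and-forward idea above can be sketched in a few lines. This is a toy model, not anything from Freenet's codebase; the node names, the day-based schedule, and the `deliver` helper are all hypothetical:

```python
# Hypothetical link schedule: (node, node) -> set of days the link is up.
SCHEDULE = {
    ("A", "B"): {"Mon"},
    ("B", "C"): {"Tue"},
}

def link_up(a, b, day):
    """True if the link between a and b is up on the given day."""
    return day in SCHEDULE.get((a, b), set()) or day in SCHEDULE.get((b, a), set())

def deliver(message, route, days):
    """Walk the route, holding the message at intermediate nodes until the
    next hop's link comes up. Returns the day of final delivery, or None
    if the message never gets through within the given days."""
    position = 0  # message currently held at route[position]
    for day in days:
        # Forward as many hops as today's links allow.
        while position < len(route) - 1 and link_up(route[position], route[position + 1], day):
            position += 1
        if position == len(route) - 1:
            return day
    return None

# A-B is only up on Monday, B-C only on Tuesday: the message is held at B
# overnight, effectively routing between B_on_Monday and B_on_Tuesday.
print(deliver("hello", ["A", "B", "C"], ["Mon", "Tue"]))  # -> Tue
```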
> > 
> > The practical implication is that if your first-choice destination is
> > down, sometimes it may be better to hold the request for a while and
> > wait for it to come back up, rather than immediately rerouting to your
> > second-choice destination.  Or, you route to the second-choice
> > destination, but if it fails, you hold onto it.  When the first-choice
> > destination comes back up, it triggers a retry - a sort of "active"
> > passive request.  (A terrible term - perhaps "persistent request" or
> > "delay-tolerant request"?)
> > 
> > 
> > POLLING
> > 
> > Polling, on the other hand, is related to publish-subscribe:
> > http://en.wikipedia.org/wiki/Publish/subscribe
> > 
> > You set up a subscription that registers your interest in some key(s),
> > together with a route showing where to find you later on.  When the
> > data is later inserted, it gets pushed to you.
> > 
> > Here, the trigger event is data arriving, rather than a node coming
> > up.  There is no further forward routing, only sending data back along
> > an already-existing route.  If the route to the subscriber no longer
> > exists, that's another problem, but in this case I think we should
> > just drop it rather than try to find the subscriber.  If s/he really
> > wants it, s/he can just request it again.
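The subscription model above, including the "just drop it if the route is gone" policy, can be sketched as a per-node table. Class and method names here are invented for illustration, assuming a best-effort `send` callback that represents the already-existing route back to the subscriber:

```python
from collections import defaultdict

class SubscriptionTable:
    """Sketch of per-node pub/sub state: which peers registered interest
    in which keys. When data for a key is inserted, it is pushed back
    along the recorded routes; subscribers whose route no longer exists
    are simply dropped, and can re-request if they still care."""

    def __init__(self):
        self.subscribers = defaultdict(set)  # key -> set of peer ids

    def subscribe(self, key, peer):
        self.subscribers[key].add(peer)

    def on_insert(self, key, data, send):
        """Push data to every recorded subscriber. send() is best-effort:
        failures are not retried and the subscriber is not searched for."""
        for peer in self.subscribers.pop(key, set()):
            send(peer, key, data)

table = SubscriptionTable()
table.subscribe("KSK@example", "peer-1")
delivered = []
table.on_insert("KSK@example", b"payload", lambda p, k, d: delivered.append(p))
print(delivered)  # -> ['peer-1']
```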
> > 
> > Thoughts?
> > theo
> 
> Okay, there are two problems, but there may be some overlap in the
> solutions. First I will explain why I think we need persistent requests
> for polling; I would very much appreciate some input on this.
> 
> CURRENT SITUATION:
> 
> We already have a form of unreliable polling. Ultra-lightweight persistent 
> requests last for 1 hour, are dropped if the connection is lost, are 
> established automatically after any request, and propagate the data back to 
> the originator and any other subscribers when it is found. This was 
> implemented shortly before 0.7. The main motivation was increasing worry 
> about the load that FMS is likely to put on the network. FMS is a 
> usenet-style chat system which uses outbox polling to prevent spam. An 
> earlier system, Frost, uses a global outbox and has been thoroughly DoS'ed.
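The ULPR lifecycle described above (one-hour lifetime, dropped outright on disconnect, no notification to the originator) can be sketched as follows; the class and field names are hypothetical, not Freenet's actual implementation:

```python
import time

ULPR_LIFETIME = 3600  # seconds; ULPRs last for one hour

class UltraLightweightPR:
    """Sketch of a ULPR entry: established automatically after a request,
    expires after an hour, and is dropped outright if the subscribing
    connection is lost - with no rerouting and no notice to the client."""

    def __init__(self, key, peer, now=None):
        self.key = key
        self.peer = peer  # who to propagate the data back to
        self.created = now if now is not None else time.time()
        self.alive = True

    def expired(self, now):
        return now - self.created > ULPR_LIFETIME

    def on_disconnect(self):
        self.alive = False  # no reroute attempt; the subscriber is not told

r = UltraLightweightPR("SSK@board-outbox", "peer-7", now=0)
print(r.expired(now=1800))  # -> False (within the hour)
print(r.expired(now=4000))  # -> True  (past the one-hour lifetime)
```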
> 
> An important side-issue is that of many nodes constantly polling for big
> files that have fallen out of the network. At the same time as implementing
> ULPRs, we (I) also implemented per-node failure tables, so that every
> request for the same key (within some period) will be routed slightly
> differently. Before this, we were constantly sending requests for data we
> couldn't find; now we send 3 requests every half hour, in the hope that 1)
> the 3 requests will go to slightly different paths, and 2) if the data is
> found during the half-hour pause, it *may* be propagated back to us via
> ULPRs. Both big file downloads and FMS's polling use the same mechanism.
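The throttling behaviour described above (3 requests, then a half-hour pause per key) can be sketched as a simple failure-table check. This is a minimal model with invented names, not the real per-node failure table:

```python
RETRY_BURST = 3      # requests sent per polling round
COOLDOWN = 30 * 60   # half-hour pause between rounds, in seconds

class FailureTable:
    """Sketch of the per-node failure-table throttle: after a key fails,
    further local requests for it are suppressed for the cooldown period,
    after which a small burst of retries is allowed (each hopefully taking
    a slightly different path)."""

    def __init__(self):
        self.last_failure = {}  # key -> timestamp of last failed round

    def may_request(self, key, now):
        last = self.last_failure.get(key)
        return last is None or now - last >= COOLDOWN

    def record_failure(self, key, now):
        self.last_failure[key] = now

ft = FailureTable()
ft.record_failure("CHK@block-1", now=0)
print(ft.may_request("CHK@block-1", now=600))   # -> False (still cooling down)
print(ft.may_request("CHK@block-1", now=1800))  # -> True  (half hour elapsed)
```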
> 
> Other polling apps are under development: batosai's web of trust plugin
> works on the same principles as FMS but is incompatible, and may see wider
> use for filesharing apps and so on. And there is a real-time chat client
> which seems to work (for small numbers so far), and which presumably polls
> one or more queues constantly.
> 
> The second phase for ULPRs was going to be, and still may be, to limit the 
> number of requests going through any single node for any given key, on the 
> basis that the data will be propagated back to the requestors via ULPRs when 
> it is found. This should have a significant impact on network 
> load/efficiency.
> 
> PASSIVE REQUESTS:
> 
> Polling is inefficient and causes a lot of network load, especially if a
> large proportion of the network is polling for a large, overlapping set of
> keys (as with FMS). Also, many applications will not be happy when a ULPR
> fails because a connection is lost, they aren't told about it (because
> ULPRs don't do that), and their data is delayed by half an hour. This is
> almost tolerable for FMS, but even there it's annoying. For audio streaming,
> and certainly for real-time chat, it's not tolerable (admittedly real-time
> chat probably isn't something we want to encourage, since it will likely be
> very inefficient in any case, as even SSKs have only a 1KB payload).
> 
> Full passive requests would enable a node to subscribe to a key and not
> have to worry about polling it. The network would be responsible for
> polling, and it could do it far more efficiently than the client could, at
> least for keys being requested by many nodes. Passive requests would not
> arbitrarily fail without telling the originator: they would be
> automatically rerouted when a connection is lost or a better route is
> found. This could be reasonably efficient bandwidth-wise: since we would
> have a large number of passive requests effectively queued, we could send
> a whole bunch of them in a single packet when something changes. Passive
> requests would include some form of coalescing, of course, forming a big
> tree if a key is popular. When the data is inserted, it can be propagated
> to all subscribers very fast. All this would of course require different
> load management: since a passive request has ongoing overhead, each node's
> subscriptions would have to be limited, probably in relation to its
> connectedness to the rest of the network, to prevent an attacker from
> mounting DoS attacks. There would probably also be network-level
> priorities, which again would present some load management issues. Also,
> if the capacity is available, we would be able to have a lot more passive
> requests than we can currently have ULPRs, possibly persisting them on
> disk.
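The coalescing idea above (one upstream subscription per key, however many downstream subscribers, forming a tree for popular keys) can be sketched like this; the node structure and names are hypothetical:

```python
from collections import defaultdict

class PassiveRequestNode:
    """Sketch of passive-request coalescing: each node forwards at most one
    subscription per key toward the key's location, however many local
    subscribers it has, so popular keys form a tree rooted near the key."""

    def __init__(self, name):
        self.name = name
        self.subscribers = defaultdict(set)  # key -> downstream subscriber names
        self.upstream = None                 # hypothetical next hop toward the key
        self.forwarded = set()               # keys already forwarded upstream

    def subscribe(self, key, who):
        self.subscribers[key].add(who)
        if self.upstream is not None and key not in self.forwarded:
            self.forwarded.add(key)          # coalesce: only the first goes up
            self.upstream.subscribe(key, self.name)

    def on_data(self, key, data, deliver):
        """When the data is inserted, fan it out down the tree."""
        for who in self.subscribers.pop(key, set()):
            deliver(who, key, data)

root = PassiveRequestNode("root")
mid = PassiveRequestNode("mid")
mid.upstream = root
mid.subscribe("SSK@popular", "client-a")
mid.subscribe("SSK@popular", "client-b")
# Two local subscribers, but only one subscription reached the root:
print(root.subscribers["SSK@popular"])  # -> {'mid'}
```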
> 
> The end-user impact of all of this is that apps such as FMS should have
> both much lower latency and much less impact on the network, even as it
> scales up, and applications such as audio streaming will be feasible
> without a 30-60 minute delay.
> 
> Some open questions:
> 
> 1. How many passive requests can we afford? Will churn make them very costly 
> for non-popular keys? Maybe...
> 
> 2. Can we mitigate this, perhaps by not always rerouting immediately? What 
> would the performance cost be? It could be significant... maybe make it 
> related to the priority/the number of subscribers/etc?
> 
> 3. What is the impact on the other main source of network
> load/inefficiency: clients polling big files that have mostly fallen out
> of the network? ULPRs throttle this to some degree, although for big files
> ULPRs won't be much help, as a single block will be rerequested less than
> once an hour; per-node failure tables mean that every retry will go
> somewhere different, improving the chances of finding data that isn't
> where it should be. Clients may want to subscribe to the keys, but at a
> lower priority than their FMS traffic, and explicitly poll to try to do an
> exhaustive search as well - it's a question of how much we limit both of
> these strategies. If we could assume that routing works, that data is
> always where it should be, and that passive requests are low overhead,
> we'd simply subscribe to the key (at least for popular keys) and block all
> further requests for it, on the grounds that when we find it we'll return
> it to all requestors. But this may not be a feasible assumption, unless we
> automatically migrate data when locations are swapped, which might cause a
> lot of load...

Okay, now, let's look at downtime in this context:

At first glance, passive requests would seem to closely match your "delay 
tolerant requests". However, for them to really fly, we would probably want 
the ability to subscribe through a node whose connection then goes offline, 
and have it, and the rest of the network, do the work of keeping the 
subscription up and transferring the data, so that when we come back up we 
can get it fast. (Obviously the first implementation of passive requests 
shouldn't do this, and it shouldn't happen on opennet.) The problem with 
this is: what happens if a new connection comes up which is a better target 
than our existing subscription? We can't abandon our old subscription unless 
we can get a message to the involved nodes to that effect. We could attempt 
a provisional rerouting, which is only completed if we manage to hook onto 
the old chain and cancel it, whether or not some nodes on the chain are 
offline... What are the advantages of having non-persistent delay-tolerant 
requests, which seem to be the other option? Lower network resource usage in 
exchange for having to poll, I suppose, is the answer...
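The provisional-rerouting idea above can be sketched as a two-phase commit on the subscription's route: the new chain is only adopted once the old one has been reached and cancelled. Everything here is hypothetical design illustration, not implemented behaviour:

```python
class Subscription:
    """Sketch of provisional rerouting: a better route is proposed but
    only committed once we manage to hook onto the old chain and cancel
    it; until then the old subscription stays in force."""

    def __init__(self, key, route):
        self.key = key
        self.route = route        # committed chain of node ids
        self.provisional = None   # candidate replacement chain

    def propose_reroute(self, new_route):
        self.provisional = new_route  # do not abandon the old chain yet

    def try_commit(self, cancel_old_chain):
        """cancel_old_chain(old_route) returns True if the old chain was
        reached and cancelled; only then does the reroute complete."""
        if self.provisional is not None and cancel_old_chain(self.route):
            self.route = self.provisional
            self.provisional = None
            return True
        return False

sub = Subscription("SSK@feed", ["A", "B", "C"])
sub.propose_reroute(["A", "D"])
print(sub.try_commit(lambda old: True))  # -> True: old chain cancelled
print(sub.route)                         # -> ['A', 'D']
```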

Two other big questions:

Inserts:

Passive inserts would probably be a bad idea. We need an insert to complete 
at some stage; requests complete when they find the data, or when the 
originator gets bored of them and wants to reuse the quota, or disconnects 
(either permanently or for some period). But inserts are only limited by 
HTL. Also, a passive insert would leave a trail of breadcrumbs which a 
hideously powerful attacker (able to compromise nodes at will) would be 
able to follow; passive requests do this too to some degree, of course, but 
it's more important for inserts.

Delay-tolerant inserts? Might be an option... we know where we want to go, 
and we know when it's likely to be online again... but once we've inserted, 
the insert is finished.

Darknet merging:

Not all connection issues are caused by existing connections going down and 
coming back up regularly. Adding new connections to connect two darknets 
together also has many issues. Passive requests would be perfectly happy with 
such a situation, but inserts might not be; and there are likely to be 
opportunities for attack: if you know that you are the only link to the other 
darknet, you can probe all their datastores on connection for any keys you 
are interested in...
> > 
> > 
> > 2008/7/14 Matthew Toseland <toad at amphibian.dyndns.org>:
> > > 1. Many people have proposed over the years that we have a "bulk"
> > > flag which can be set when the timing of a request is less important
> > > (e.g. for splitfile fetches), or a priority class for a request which
> > > is visible at the network layer. I have always opposed this, mostly
> > > because it makes traffic profiling slightly easier, and any sort of
> > > priority scheme would need careful regulation to prevent a
> > > race-to-the-top.
> > >
> > > 2. Long-term, and in particularly nasty places, Freenet will have to
> > > be mostly darknet, because it is much easier to attack opennet nodes,
> > > or to block them in bulk. One of the biggest practical problems with
> > > a pure darknet is the 24/7 issue: more people have laptops than real
> > > PCs nowadays, and this trend is likely to continue and accelerate;
> > > but even if people have a desktop PC, many users won't run it 24x7,
> > > for various reasons: power consumption, noise, security (with an
> > > encrypted disk, do you want to leave it unattended?), etc. Fanless
> > > home server appliances might be able to run 24x7, but that means
> > > additional expenditure to buy them.
> > >
> > > 3. FMS, even more than Frost, makes heavy use of SSK polling, and
> > > this is likely to expand as the network grows and FMS becomes more
> > > newbie-friendly. Also, various innovative applications require fast
> > > propagation of data once inserted (although there are frequently
> > > security issues with this). And widely-wanted data which is hard to
> > > find can be effectively polled by much of the network, causing
> > > excessive load.
> > >
> > > 4. The solution to SSK polling etc. is some form of passive requests.
> > > In 0.7, we have ultra-lightweight passive requests, which are a very
> > > limited and unreliable mechanism but nonetheless should help
> > > significantly. The basic principle of ULPRs is that once a request
> > > completes, each node on the network remembers who wants the data and
> > > who it has asked for it, for a short time, without making any effort
> > > to reroute if connections are lost; if the data is found, it is
> > > propagated quickly to everyone who wants it.
> > >
> > > 5. True passive requests (0.9) would be a mechanism whereby a node
> > > could send out a request which, once it failed, would be remembered
> > > permanently, subject to a (long) timeout and/or periodic renewal from
> > > the originator. It would be automatically rerouted if the network
> > > topology changes. Passive requests would introduce a number of new
> > > technical challenges, such as load management for persistent
> > > requests, evaluating a peer's competence in performing them, and so
> > > on, but they could greatly reduce the cost of SSK polling and of
> > > rerequesting common but absent data, and enable such things as
> > > medium-bandwidth, high-latency publish/subscribe for, for example,
> > > audio streams. Passive requests would probably have to have a
> > > priority level setting. It's a big job, but a big prize...
> > >
> > > 6. Passive requests would go a long way to solving the uptime
> > > problem. Say you have a small darknet of 5 nodes. Its nodes are only
> > > online during evenings local time. Its only connection to the outside
> > > world is through one node, connected to two of the small darknet,
> > > which is only online on Thursdays. Right now, except on Thursdays,
> > > the network would be essentially a leaf network: our real-time
> > > routing assumes that the network is fully connected. Most data will
> > > be very difficult to obtain. Real-time routing requires real-time
> > > load balancing, which means that all the nodes would request whatever
> > > it is they want constantly, generating load to no good purpose,
> > > except on Thursdays, when the requests would get through, but
> > > severely limited by load management, and by the fact that more than
> > > one of the small darknet may be asking for the same file. So on
> > > Thursdays some progress would be made, but often not very much.
> > >
> > > Now, with true passive requests, things can be very different. From
> > > the user's point of view the semantics are essentially the same: they
> > > click a link, it gets a DNF (fairly quickly), and they click the
> > > button to queue it to the global queue; some time later, they get a
> > > notification that the content is available. But performance could be
> > > much higher. If a node requests a block while the network is
> > > "offline", the request will propagate to all 5 nodes and then sit
> > > there waiting for something to happen. When we connect to the wider
> > > network, the request is immediately rerouted to the node that just
> > > connected (either because it's a better route, or because there are
> > > spare hops). It propagates, fails, and is stored as a passive request
> > > on the wider network, hopefully reaching somewhere near the optimal
> > > node for the key. When the link is lost, both sides remember the
> > > other, so when/if the data is found on the wider network, it is
> > > propagated back to the originator. Furthermore, the load management
> > > would be optimised for passive requests: when the small network
> > > connects, it can immediately send a large number of passive requests
> > > for different blocks of the same file, or for different files. These
> > > are not real-time requests, because they have already failed and
> > > turned into passive requests, so they can be trickled out at whatever
> > > rate the recipient sees fit. Also, they are not subject to the
> > > anti-polling measures we have introduced: polling a key in 0.7 means
> > > requesting it 3 times, sleeping for half an hour, and repeating ad
> > > infinitum. Further similar measures may need to be introduced at the
> > > node level to try to deal with increasing load caused by FMS, but
> > > because we reroute on getting a connection, we can immediately route
> > > the requests. When we reconnect, hopefully our peer will have found
> > > most of the data we requested and can transfer it at link speed (or
> > > whatever limit may be imposed for security reasons). The transfer
> > > might take longer than the intersection, but I expect the whole
> > > system would be significantly faster than it is now. It's even better
> > > if you have more than two network fragments: on a large darknet you
> > > might have subnetworks coming online and going offline constantly, so
> > > that you never actually have a fully online network. Passive requests
> > > would happily search out every relevant nook and cranny of the
> > > network.
> > >
> > > Note that much of this is only feasible on darknet, because of the
> > > trust connection: on opennet, passive requests will probably have to
> > > last only as long as the connection is open, and bulk transfer of
> > > passive requests is certainly not feasible on opennet.
> > >
> > > With regards to security, it may be possible to determine whether an
> > > FMS poster (for example) is on the local network, if you know when
> > > his posts come in. This is of course feasible now on such a topology,
> > > but on the other hand if nobody uses it because it's unusable,
> > > there's no threat. Passive requests would probably make it a little
> > > easier. Some form of tunneling, preferably with long,
> > > client-controlled delays for inserts, might help to solve it, but we
> > > would have to have a way of determining that the network is too small
> > > to provide useful anonymity.
> > >
> > > 7. Even longer term, many ISPs and countries may deploy traffic flow
> > > analysis hardware to identify and block all (unlicensed?)
> > > peer-to-peer networking. The only way to beat traffic flow analysis
> > > is to not send data continually over the internet. The obvious ways
> > > to do this are:
> > > 1. Parasitic transports: steal the video stream in a VoIP call to a
> > > friend. Note that if VoIP calls are rare this won't work well, and if
> > > they are artificially common, to speed downloads up, that will
> > > probably be detectable.
> > > 2. Fake timing: make the transports look like e.g. a private gaming
> > > server, and fake the timing based on statistical models. This is
> > > classic stego. It's a race between you and your opponent for whoever
> > > has the better model. Given that your opponent may store traffic data
> > > indefinitely and not act immediately, this is very dangerous...
> > > 3. Wifi etc.: non-internet constant data transfer.
> > > 4. Sneakernet and physical-rendezvous-based protocols (the latter
> > > working on the Freenet threat model assumptions, so still a form of
> > > darknet, rather than Haggle's free-for-all networking system, which
> > > is also interesting but IMHO dangerous in the long term, and
> > > certainly isn't Freenet).
> > >
> > > Passive requests are again the right tool to deal with this, IMHO.
> > > They lend themselves to efficient stream subscriptions, and also
> > > enable long-term downloads without assuming a fully connected network
> > > at all times. Of course, for the very high latency options
> > > (sneakernet) there are other challenges, such as how to assign
> > > locations without being able to swap continually. But for medium
> > > latency caused by sparse darknets not often having many nodes online
> > > simultaneously, and for transports which have the same effect despite
> > > nodes' host computers actually being switched on, it should work
> > > well.
> > >
> > > Comments? Am I spouting nonsense? :) Apologies for the length of
> > > this mail; it's a somewhat complex subject!
> > >
> > > Given the enormous implications, maybe we should postpone passive
> > > requests to post-1.0... but I'm worried that FMS, and polling in
> > > general, may force our hand. It's likely that there will be different
> > > parts of it implemented over different versions of Freenet in any
> > > case...
> > >
> > > _______________________________________________
> > > Tech mailing list
> > > Tech at freenetproject.org
> > > http://emu.freenetproject.org/cgi-bin/mailman/listinfo/tech
> > >