> -----Original Message-----
> From: Bruce Lowekamp [mailto:[email protected]]
> Sent: Thursday, December 18, 2008 8:50 PM
> To: Narayanan, Vidya
> Cc: Salman Abdul Baset; [email protected]
> Subject: Re: [P2PSIP] RELOAD fragmentation
>
> I think it's enough that the buffer space is close.
The point of my email is that it isn't. But I agree that it isn't
significant either way.

> The exact details of the amount of state stored for the return path
> are fairly variable depending on specifics of the deployment. Plus,
> in all cases the amount of state required doesn't appear to be
> terribly significant compared to the space required to store the
> messages waiting for the hop-by-hop ACK.
>
> On Thu, Dec 18, 2008 at 8:05 PM, Narayanan, Vidya <[email protected]> wrote:
> > I think the math on this needs to be revisited. Here's a shot at it:
> >
> > Assuming uniform distribution of objects and queries, an effective
> > storage of X * N is not valid for a functional overlay, simply
> > because not every node will route through a particular node. Rather,
> > O(log N) is the right order of messages routed through a given node.
> > If we think about X * (E+G), then G is O(log N). However, when
> > E >> O(log N), the overlay is not going to be functional anyway, and
> > that is not the case under discussion.
>
> Yes, X * N is the worst case, though none of the rest of my numbers
> depended on that. It's an interesting question how large E can get
> here before other aspects of the overlay break down.

When E gets large enough that overlay operation is disrupted, we have
bigger problems than buffer space exhaustion. So that is a question
with several dimensions to it.

> > Sticking to your calculation model, the total amount of memory for
> > storing routing state would be transaction ID + return connection +
> > source Node-ID + timeout. For the source Node-ID field, you don't
> > actually need to store all 128 bits, since you only maintain log N
> > connections; log2(2500) ~ 12 bits ~ 2 bytes of information would do.
> > This alone reduces the total storage cost to 18 bytes per entry.
> > Accounting for a 15-second timer (which, btw, I think is too long),
> > this gives a total of 27K for message state and around 8K for the
> > Node-ID counter. That leads us to a total of 35K, and even with 50%
> > overhead each on the hash and data structure (which also seems too
> > high), to around 70K (compared to the 220K you indicate below).
>
> You need the source Node-ID when you're counting the number of
> transactions you're storing per source, which is Salman's proposal.

I don't believe a node needs to count the number of transactions per
source. That is not the only way a node can be attacked anyway - a
misbehaving finger may route everything it gets through a given node
irrespective of the destination, in which case the source of the
message tells you nothing. The only thing a node can realistically do
is determine when it is serving beyond its capacity and then kick in
some mitigation or backoff techniques.

> > Along the same lines, for the hop-by-hop reassembly approach, we
> > have 5K + 5*1500 = 12500 (12.5K) per connection, and with log N
> > connections (~12 with N = 2500), we get a total requirement of 150K.
> > This is at least twice as bad. Also, I am not sure I understand the
> > reasoning as to why there is 50% overhead in the first case compared
> > to no overhead for the hop-by-hop reassembly case. If no overhead is
> > taken into consideration, then hop-by-hop reassembly is four times
> > worse. We also have to include the via list in the calculation for
> > the hop-by-hop reassembly approach, which makes it worse still.
>
> Hash tables have relatively high overhead compared to the overhead
> for the reassembly buffers.
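(For anyone who wants to check the arithmetic being traded here, a
small Python sketch of both estimates follows. Every constant in it is
an assumption taken from this thread - N = 2500, 100 msgs/s, a
15-second timer, 5K max message, MTU 1500 - not anything the draft
mandates; the ~3 bytes per counter entry is my guess at how the 8K
figure arises.)

  import math

  N = 2500              # peers in the overlay (assumed in this thread)
  RATE = 100            # messages per second routed through this peer
  TIMEOUT = 15          # seconds a transaction is remembered
  in_flight = RATE * TIMEOUT          # ~1500 transactions at any time

  # Return-path state with the compressed source ID suggested above:
  # txn ID (8) + return-connection index (4) + ceil(log2(N)) bits of
  # source ID (~2 bytes) + timeout (4) = 18 bytes per entry.
  entry = 8 + 4 + math.ceil(math.log2(N) / 8) + 4
  state = in_flight * entry           # ~27K of message state
  counters = N * 3                    # ~8K of per-source counters
                                      # (guess: 2-byte ID + 1-byte count)
  total = 2 * (state + counters)      # 50% overhead on each -> ~70K

  # Hop-by-hop reassembly, with the parameters from Bruce's comparison:
  MAX_MSG, OOO_FRAGS, MTU = 5000, 5, 1500
  per_conn = MAX_MSG + OOO_FRAGS * MTU    # 12.5K per connection
  conns = math.ceil(math.log2(N))         # ~12 connections at N = 2500
  reassembly = conns * per_conn           # ~150K

  print(f"return-path state ~{total // 1000}K, "
        f"hop-by-hop reassembly ~{reassembly // 1000}K")

This prints roughly 69K vs. 150K, in line with the 70K/150K comparison
above.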
> > > With this, I'm not seeing a case for per-hop reassembly of
> > > messages. In fact, I don't even think there is a need to put an
> > > upper bound on the number of in-progress messages a node can have
> > > at any given time. Also, if anything, the buffer exhaustion attack
> > > that you seem to be worried about is more feasible with per-hop
> > > reassembly.
>
> I'm sorry, but I don't see anything above that leads to any of these
> conclusions. Why is it not necessary to bound the number of messages
> a peer sends through you? Wasn't that the method Salman proposed for
> detecting when you're under attack?

I should have caught that before and clarified it - but, as I mentioned
above, it is not an effective way of detecting an attack and I don't
think it buys anything.

> How can a buffer exhaustion attack possibly apply to a technique with
> statically sized buffers?

I assume you mean buffers that don't grow with messages - but this
comes at the cost of preventing message interleaving, which is steep.
Even then, you have to accommodate a variable buffer size, given that
you don't know how large the via list may become. You also need to
account for a malicious node continuing to send more and more
fragments - which brings timers back into the picture, upon expiry of
which the fragments must be purged. And if you want to allow
interleaving, the problem gets worse. (A rough sketch of such a buffer
appears at the end of this note.)

> Ultimately, the motivation for hop-by-hop reassembly is that it is
> provably immune to this particular attack, with a slight latency
> penalty.

I don't think we can conclude that the impact on latency is "slight".
I actually think it would be significant enough to be undesirable.

> Storing return path state makes the size of the state harder to bound
> under "normal" conditions and requires probabilistic detection of
> attacks. False positives degrade overlay performance significantly.
> (Open to suggestions of other ways of detecting and mitigating
> attacks.)
>
> For an internet-scale protocol, I really prefer "provably immune" to
> probabilistic arguments with poor behavior under unanticipated
> non-attack scenarios.

I'm not sure I understand the poor behavior in non-attack scenarios. A
little more explanation around that sentence would help me.

> I was actually one of the strong proponents of the original text for
> end-to-end reassembly only, but the need to store the return path
> somewhere (in the message or on the peers) makes this a fundamentally
> different problem than internet routing.

Not really. It isn't fundamentally different from LSRR in IPv4 or the
source routing header in IPv6. Of course, there are plenty of other
issues with the IPv6 Type 0 RH that are not relevant here :)

> In most scenarios, the return state storage will work fine and have
> better performance. I'm just not convinced it has good failure
> behavior. If performance is the focus, the best performance that I
> can see is still to store return path state and fall back to
> reassembly under attack.

I don't think the failure behaviors are that different, and I really
think we are trying to find a solution for an attack in one particular
scenario, while the attack can be launched in several other ways. The
performance differences, in my mind, make end-to-end reassembly the
best option.
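To illustrate the point about "static" buffers, here is a rough Python
sketch of a one-slot-per-connection reassembly buffer (i.e. no
interleaving). Everything in it - the slot size, the timeout, the field
names - is my illustration of the idea, not something taken from the
draft; note that even this fixed-size design needs a purge timer, and
the via list makes the real upper bound on the buffer fuzzy.

  import time

  MAX_MSG_SIZE = 5000   # overlay's max message size (assumed)
  FRAG_TIMEOUT = 15.0   # purge incomplete messages after this (assumed)

  class ReassemblySlot:
      def __init__(self):
          self.clear()

      def clear(self):
          self.fragments = {}    # offset -> bytes
          self.total_len = None  # learned from the final fragment
          self.started = None

      def add(self, offset, data, is_last):
          now = time.monotonic()
          # A malicious peer can trickle fragments forever, so an
          # incomplete message must eventually be purged on a timer.
          if self.started and now - self.started > FRAG_TIMEOUT:
              self.clear()
          if self.started is None:
              self.started = now
          # The via list grows in flight, so the true bound is really
          # MAX_MSG_SIZE plus whatever via-list growth is permitted.
          if offset + len(data) > MAX_MSG_SIZE:
              self.clear()
              return None        # oversized; drop and reset the slot
          self.fragments[offset] = data
          if is_last:
              self.total_len = offset + len(data)
          if self.total_len is not None:
              have = sum(len(d) for d in self.fragments.values())
              if have >= self.total_len:
                  msg = b"".join(d for _, d in
                                 sorted(self.fragments.items()))
                  self.clear()
                  return msg     # complete message; forward it
          return None            # still waiting for fragments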
Vidya

> Bruce
>
> > Vidya
> >
> >> -----Original Message-----
> >> From: Bruce Lowekamp [mailto:[email protected]]
> >> Sent: Wednesday, December 17, 2008 7:28 PM
> >> To: Salman Abdul Baset
> >> Cc: Narayanan, Vidya; [email protected]
> >> Subject: Re: [P2PSIP] RELOAD fragmentation
> >>
> >> On Wed, Dec 17, 2008 at 7:48 PM, Salman Abdul Baset
> >> <[email protected]> wrote:
> >> >
> >> >> FWIW, storing forwarding state vs appending to the via list is a
> >> >> long-standing option in the protocol. The only thing that is new
> >> >> is whether to do hop-by-hop reassembly.
> >> >>
> >> >> I'd be interested in seeing a proposal for a DoS mitigation
> >> >> technique on this type of attack that has nearly the
> >> >> effectiveness or simplicity of hop-by-hop reassembly. For that
> >> >> matter, I'd be interested in a DoS mitigation approach for
> >> >> storing return path state that is simpler than using reassembly
> >> >> as a fall-back strategy.
> >> >
> >> > I mentioned the solution in an earlier post:
> >> > http://www.ietf.org/mail-archive/web/p2psip/current/msg04541.html
> >> >
> >> > For return path state, one solution is to limit the in-progress
> >> > messages from a certain peer to a configured number. If the peer
> >> > still sends more messages, it can be told to do a ROUTE-QUERY,
> >> > thereby preventing further consumption of memory on the
> >> > intermediary nodes.
> >>
> >> Looking at that idea, it provides a much simpler approach to
> >> dealing with excessive state than always reassembling. But it has a
> >> pretty substantial cost, too. Let the max in-progress messages from
> >> a peer be X. Since any peer in the system can send messages through
> >> this peer, it effectively requires storage of X * N, where N is the
> >> number of peers in the system. You can argue that it should be
> >> X * (E+G), where E is the number of evil peers and G is the number
> >> of good peers that have a legitimate need to route messages through
> >> this peer, but it's really hard to bound either of those numbers.
> >>
> >> There's also a problem that I think this will break exponentially
> >> spaced finger tables, such as those used by Chord. The first finger
> >> table entry will be used for 50% of a peer's originated requests
> >> (assuming even distribution). I think it's going to be hard to set
> >> X high enough that you don't force a peer to open additional
> >> connections beyond its first finger table entry, but low enough to
> >> defend against DoS attacks.
> >>
> >> Going back to the size issue, I'm going to handwave at some numbers
> >> here. Let's say that X is 10 and we route 100 messages per second
> >> from 2500 peers (over the course of a few minutes). The current
> >> message timeout in the draft (which needs to be changed) is 15
> >> seconds. So you're storing state for 1500 messages at a typical
> >> time. For each message you're storing
> >>
> >> transaction id -> {return connection (could be an index in a table
> >> or a pointer to a DTLS structure), source Node-ID, timeout}
> >>
> >> so that's 64 bits + 32/64 bits + 128 bits + 32 bits (on some
> >> systems); let's round it to 40 bytes and allow 50% overhead for
> >> hashtable and data structure inefficiency, so 80 bytes per entry.
> >>
> >> That gives 120K dedicated to message state.
> >>
> >> Plus, we have to store counters for the number of messages we're
> >> storing for each peer.
> >> For this, you have a hashtable of
> >>
> >> Node-ID -> counter
> >>
> >> so 128 + 32 bits, i.e. 20 bytes, goes to 40 bytes with overhead.
> >> That's 100K for this data structure.
> >>
> >> So essentially, 220K for the state and counters. Oh, and you need
> >> to sweep your first hashtable occasionally to prune out expired
> >> transactions, or introduce a queue for that. It seems
> >> implementation-dependent whether memory or CPU/memory bandwidth is
> >> cheaper, though I would lean toward a separate queue structure, in
> >> which case we'll just round up and say we're at 300K of data
> >> structures.
> >>
> >> Now keep in mind that X=10 is pretty low and will certainly cause
> >> problems with a peer that has us as its first finger table entry,
> >> or a peer with a lot of (non-RELOAD) clients behind it.
> >>
> >> Now, for comparison with hop-by-hop reassembly, let's say a peer
> >> has 20 active connections, the max message size for the overlay is
> >> 5K, and the max out-of-order is 5 fragments with MTU=1500. So we
> >> need per-connection buffers of 5K + 5*1500 = 12500. Times 20
> >> connections gives us 250K.
> >>
> >> So the data structure sizes are similar. I'll call it dead even,
> >> since these numbers can obviously be swung either way with slight
> >> tweaks of the parameters.
> >>
> >> A pro of the max-per-source approach is that it never slows
> >> messages for buffering. A con is that it essentially breaks
> >> exponentially spaced finger-table routing for peers that have a
> >> legitimate need to exceed whatever bound X is used.
> >>
> >> The pro of the always-reassemble approach is that it's rather
> >> simple to implement. It also only needs to be done when a smallish
> >> return path state table is full. The con is that it slows down
> >> messages by however long it takes to assemble the entire message.
> >>
> >> I prefer the always-reassemble approach because it has fewer magic
> >> numbers and it doesn't impact the rest of the system as much. But
> >> we may need a more thorough comparison/writeup of the various
> >> approaches to decide.
> >>
> >> Bruce
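For concreteness, here is a rough Python sketch of the return-path
state table Bruce sizes above, combined with the per-source cap from
Salman's proposal (over-limit senders would be pointed at ROUTE-QUERY).
X, the timeout, and all names are illustrative values assumed in this
thread, not anything the draft specifies.

  import time

  X = 10               # max in-progress transactions per source (assumed)
  TXN_TIMEOUT = 15.0   # transaction lifetime (assumed; arguably too long)

  class ReturnPathTable:
      def __init__(self):
          self.state = {}        # txn_id -> (connection, source_id, expiry)
          self.in_progress = {}  # source_id -> count (the second hashtable)

      def add(self, txn_id, connection, source_id):
          self._expire()
          if self.in_progress.get(source_id, 0) >= X:
              return False       # over the cap: reply with a ROUTE-QUERY hint
          self.state[txn_id] = (connection, source_id,
                                time.monotonic() + TXN_TIMEOUT)
          self.in_progress[source_id] = \
              self.in_progress.get(source_id, 0) + 1
          return True

      def pop(self, txn_id):
          # Called when the response comes back; frees the entry.
          entry = self.state.pop(txn_id, None)
          if entry:
              _, source_id, _ = entry
              self._decrement(source_id)
          return entry

      def _decrement(self, source_id):
          n = self.in_progress.get(source_id, 0) - 1
          if n <= 0:
              self.in_progress.pop(source_id, None)
          else:
              self.in_progress[source_id] = n

      def _expire(self):
          # The occasional sweep Bruce mentions: prune expired
          # transactions (a real implementation might keep a separate
          # timeout queue instead of scanning).
          now = time.monotonic()
          for txn_id in [t for t, (_, _, exp) in self.state.items()
                         if exp < now]:
              self.pop(txn_id)

The in_progress table here is exactly the Node-ID -> counter structure
in the 220K estimate, and _expire() is the pruning pass traded against
a separate queue above.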
