I think it's enough that the buffer space estimates are close. The exact amount of state stored for the return path varies considerably with the specifics of the deployment. Plus, in all cases the amount of state required doesn't appear to be significant compared to the space required to store the messages waiting for the hop-by-hop ACK.
On Thu, Dec 18, 2008 at 8:05 PM, Narayanan, Vidya <[email protected]> wrote:
> I think the math on this needs to be revisited. Here's a shot at it:
>
> Assuming uniform distribution of objects and queries, an effective storage of
> X*N is not valid for a functional overlay, simply because not every node will
> route through a particular node. Rather, O(log N) would be the right order of
> messages routed through a given node. If we think about X * (E+G), then G is
> O(log N). However, at E >> O(log N), the overlay is not going to be
> functional anyway, and that is not the case under discussion.
>

Yes, X * N is the worst case, though none of the rest of my numbers depended on that. It's an interesting question how large E can get before other aspects of the overlay break down.

> Sticking to your calculation model, the total amount of memory for storing
> routing state would be Transaction ID + return connection + source Node-ID +
> timeout. For the source Node-ID field, you don't actually need to store all
> 128 bits, since you only maintain log N connections. You could just do with
> log(N = 2500) bits ~ 12 bits ~ 2 bytes of information. This alone would
> reduce the total storage cost to 18 bytes per entry. Accounting for a
> 15-second timer (which, btw, I think is too long), this gives a total of 27K
> for message state and around 8K for the Node-ID counter. This leads us to a
> total of 35K and, even with 50% overhead each on the hash and data structure
> (which also seems too high), to around 70K (compared to the 220K you indicate
> below).
>

You need the source Node-ID when you're counting the number of transactions you're storing per source, which is Salman's proposal.

> Along the same lines, for the hop-by-hop reassembly approach, we have 5K +
> 5*1500 = 12500 (12.5K), and with log N connections (~12 with N = 2500), we
> get a total requirement of 150K. This is at least twice as bad. Also, I am
> not sure I understand the reasoning as to why there is 50% overhead in the
> first case compared to no overhead for the hop-by-hop reassembly case. If no
> overhead is taken into consideration, then the hop-by-hop reassembly would be
> four times worse. Also, we have to include the via list in the calculation
> for the hop-by-hop reassembly approach, and this makes it worse.
>

Hash tables have relatively high overhead compared to the overhead for the reassembly buffers.

> With this, I'm not seeing a case for per-hop reassembly of messages. In
> fact, I don't even think there is a need to put an upper bound on the number
> of in-progress messages a node can have at any given time. Also, if
> anything, the buffer exhaustion attack that you seem to be worried about is
> more feasible with per-hop reassembly.
>

I'm sorry, but I don't see anything above that leads to any of these conclusions. Why is it not necessary to bound the number of messages a peer sends through you? Wasn't that the method Salman proposed for detecting when you're under attack? And how could a buffer exhaustion attack possibly apply to a technique with statically sized buffers?

Ultimately, the motivation for hop-by-hop reassembly is that it is provably immune to this particular attack, at the cost of a slight latency penalty. With stored return-path state, it is harder to bound the size of the state under "normal" conditions, and attacks can only be detected probabilistically; false positives degrade overlay performance significantly. (I'm open to suggestions of other ways of detecting and mitigating attacks.)
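For concreteness, here is a rough sketch of the two tables being argued about: the per-transaction return-path state and the per-source counter used to enforce the cap X. This is only an illustration in Python with made-up names; the cap and timeout values are the ones from this thread, not anything from the draft.

    import time

    MAX_INPROGRESS_PER_SOURCE = 10   # "X" in this discussion
    TRANSACTION_TIMEOUT = 15.0       # seconds; the draft's current timeout

    class ReturnPathState:
        def __init__(self):
            # transaction ID -> (return connection, source Node-ID, expiry time)
            self.transactions = {}
            # source Node-ID -> number of in-progress transactions
            self.inprogress = {}

        def add(self, transaction_id, return_connection, source_node_id):
            """Store return-path state, or refuse because the source is over
            its cap (the caller would then tell it to use ROUTE-QUERY)."""
            self._expire()
            if self.inprogress.get(source_node_id, 0) >= MAX_INPROGRESS_PER_SOURCE:
                return False
            self.transactions[transaction_id] = (
                return_connection, source_node_id,
                time.time() + TRANSACTION_TIMEOUT)
            self.inprogress[source_node_id] = self.inprogress.get(source_node_id, 0) + 1
            return True

        def pop(self, transaction_id):
            """Look up and drop the state when the response comes back."""
            conn, src, _ = self.transactions.pop(transaction_id)
            self._decrement(src)
            return conn

        def _expire(self):
            # the occasional sweep for expired transactions mentioned below
            now = time.time()
            for tid, (_, src, expiry) in list(self.transactions.items()):
                if expiry <= now:
                    del self.transactions[tid]
                    self._decrement(src)

        def _decrement(self, src):
            self.inprogress[src] -= 1
            if self.inprogress[src] == 0:
                del self.inprogress[src]

The interesting part isn't the code, it's what the caller does when add() refuses an entry -- that's where the ROUTE-QUERY fallback and the attack-detection question come in.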
For an internet-scale protocol, I really prefer provable immunity to probabilistic arguments with poor behavior under unanticipated non-attack scenarios. I was actually one of the strong proponents of the original text for end-to-end reassembly only, but the need to store the return path somewhere (in the message or on the peers) makes this a fundamentally different problem than internet routing. In most scenarios, storing return-path state will work fine and have better performance; I'm just not convinced it has good failure behavior. If performance is the focus, the best approach I can see is still to store return-path state and fall back to reassembly under attack.

Bruce

> Vidya
>
>> -----Original Message-----
>> From: Bruce Lowekamp [mailto:[email protected]]
>> Sent: Wednesday, December 17, 2008 7:28 PM
>> To: Salman Abdul Baset
>> Cc: Narayanan, Vidya; [email protected]
>> Subject: Re: [P2PSIP] RELOAD fragmentation
>>
>> On Wed, Dec 17, 2008 at 7:48 PM, Salman Abdul Baset
>> <[email protected]> wrote:
>> >
>> >> FWIW, storing forwarding state vs appending to the via list is a
>> >> long-standing option in the protocol. The only thing that is new is
>> >> whether to do hop-by-hop reassembly.
>> >>
>> >> I'd be interested in seeing a proposal for a DoS mitigation technique
>> >> for this type of attack that has nearly the effectiveness or simplicity
>> >> of hop-by-hop reassembly. For that matter, I'd be interested in a DoS
>> >> mitigation approach for storing return path state that is simpler than
>> >> using reassembly as a fall-back strategy.
>> >
>> > I mentioned the solution in an earlier post:
>> > http://www.ietf.org/mail-archive/web/p2psip/current/msg04541.html
>> >
>> > For return path state, one solution is to limit the in-progress messages
>> > from a certain peer to a configured number. If the peer still sends more
>> > messages, it can be informed to do ROUTE-QUERY, thereby preventing
>> > further consumption of memory on the intermediary nodes.
>>
>> Looking at that idea, it provides a much simpler approach to dealing
>> with excessive state than always reassembling. But it has a pretty
>> substantial cost, too. Let the max in-progress messages from a peer be
>> X. Since any peer in the system can send messages through it, it
>> effectively requires storage of X * N, where N is the number of peers
>> in the system. You can argue that it should be X * (E+G), where E is
>> the number of evil peers and G is the number of good peers that have a
>> legitimate need to route messages through this peer, but it's really
>> hard to bound either of those numbers.
>>
>> There's also a problem that I think this will break exponentially
>> spaced finger tables, such as those used by Chord. The first finger table
>> entry will be used for 50% of a peer's originated requests (assuming
>> even distribution). I think it's going to be hard to set X high
>> enough that you don't force a peer to have additional connections beyond
>> its first finger table entry, but low enough to defend against DoS
>> attacks.
>>
>> Going back to the size issue, I'm going to handwave at some numbers
>> here. Let's say that X is 10 and we route 100 messages per second
>> from 2500 peers (over the course of a few minutes). The current
>> message timeout in the draft (which needs to be changed) is 15 seconds.
>> So you're storing state for 1500 messages at a typical time.
>> For each message you're storing
>>
>> transaction ID -> {return connection (could be an index into a table or
>> a pointer to a DTLS structure), source Node-ID, timeout}
>>
>> so that's 64 bits + 32/64 bits + 128 bits + 32 bits (depending on the
>> system); let's round it to 40 bytes and allow 50% overhead each for the
>> hashtable and data structure inefficiency, so 80 bytes per entry.
>>
>> That gives 120K dedicated to message state.
>>
>> Plus, we have to store counters for the number of messages we're
>> storing for each peer. For this, you have a hashtable of
>>
>> Node-ID -> counter
>>
>> so 128 + 32 bits, i.e. 20 bytes, which goes to 40 bytes with overhead.
>> That's 100K for this data structure.
>>
>> So essentially, 220K for the state and counters. Oh, and you need to
>> sweep your first hashtable occasionally to prune out expired
>> transactions, or introduce a queue for that. It seems implementation
>> dependent whether memory or CPU/memory bandwidth is cheaper, though I
>> would lean toward a separate queue structure, in which case we'll just
>> round up and say we're at 300K of data structures.
>>
>> Now keep in mind that X=10 is pretty low and will certainly cause
>> problems with a peer that has us as its first finger table entry or
>> a peer with a lot of (non-RELOAD) clients behind it.
>>
>> Now, for comparison with hop-by-hop reassembly, let's say a peer has
>> 20 active connections, the max message size for the overlay is 5K,
>> and the max out-of-order is 5 fragments, with MTU=1500. So we need
>> per-connection buffers of 5K + 5*1500 = 12500. Multiplied by 20
>> connections, that gives us 250K.
>>
>> So the data structure sizes are similar. I'll call it dead even, since
>> these numbers can obviously be swung either way with slight tweaks of
>> the parameters.
>>
>> A pro of the max-per-source approach is that it never delays messages
>> for buffering. A con is that it essentially breaks exponentially
>> spaced finger table routing for peers that have a legitimate need to
>> exceed whatever bound X is used.
>>
>> The pro of the always-reassemble approach is that it's rather simple
>> to implement. It also only needs to be done when a smallish return
>> path state table is full. The con is that it slows down messages by
>> however long it takes to reassemble the entire message.
>>
>> I prefer the always-reassemble approach because it has fewer magic
>> numbers and it doesn't impact the rest of the system so much. But we
>> may need a more thorough comparison/writeup of the various approaches
>> to decide.
>>
>> Bruce
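To keep the two estimates side by side, here is the same back-of-the-envelope arithmetic written out as a small Python sketch. All parameter values are the ones used in this thread; nothing here is normative, and the variable names are just for illustration.

    # Return-path state approach (numbers from the message above)
    msgs_per_sec = 100
    timeout_sec = 15
    entries = msgs_per_sec * timeout_sec             # 1500 in-progress transactions
    entry_bytes = 40 * 2                             # ~40 bytes, doubled for hash/structure overhead
    state_bytes = entries * entry_bytes              # ~120K

    peers = 2500
    counter_bytes = 20 * 2                           # Node-ID -> counter, doubled for overhead
    counter_table_bytes = peers * counter_bytes      # ~100K

    print("return-path state:", state_bytes + counter_table_bytes)   # ~220K

    # Hop-by-hop reassembly approach
    connections = 20
    max_message = 5000
    out_of_order_fragments = 5
    mtu = 1500
    per_connection = max_message + out_of_order_fragments * mtu      # 12.5K
    print("reassembly buffers:", connections * per_connection)       # ~250K

    # Vidya's variant: 18-byte entries, ~8K of counters, ~12 connections
    print("compact state:", entries * 18 + 8000)                     # ~35K before overhead
    print("reassembly, 12 connections:", 12 * per_connection)        # ~150K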
