Hi,
Is it right that preventing RELOAD fragmentation is actually meant to prevent IP
fragmentation?
If the via-list is so long that the RELOAD message is bigger than the MTU, is it
possible to have a short identifier represent the via-list after the message has
been routed across the overlay?
Regards,
Gengyu
----- Original Message -----
From: "Bruce Lowekamp" <[email protected]>
To: "Narayanan, Vidya" <[email protected]>
Cc: "Salman Abdul Baset" <[email protected]>; <[email protected]>
Sent: Friday, December 19, 2008 12:50 PM
Subject: Re: [P2PSIP] RELOAD fragmentation
I think it's enough that the buffer space is close. The exact details
of the amount of state stored for the return path are fairly variable
depending on the specifics of the deployment. Plus, in all cases the
amount of state required doesn't appear to be terribly significant
compared to the space required to store the messages waiting for the
hop-by-hop ACK.
On Thu, Dec 18, 2008 at 8:05 PM, Narayanan, Vidya <[email protected]>
wrote:
I think the math on this needs to be revisited. Here's a shot at it:
Assuming uniform distribution of objects and queries, an effective
storage of X*N is not valid for a functional overlay, simply because not
every node will route through a particular node. Rather, O(log N) would
be the right order of messages routed through a given node. If we think
about X * (E+G), then G is O(log N). However, at E >> O(log N), the
overlay is not going to be functional anyway and that is not the case
under discussion.
Yes, X * N is worst case, though none of the rest of my numbers
depended on that. It's an interesting question how large E can get here
before other aspects of the overlay break down.
Sticking to your calculation model, the total amount of memory for
storing routing state would be Transaction ID + return connection +
source Node ID + timeout. For the source node ID field, you don't
actually need to store all the 128 bits, since you only maintain log N
connections. You could just do with log (N = 2500) bits ~ 12 ~ 2 bytes
of information. This alone would reduce the total storage cost to 18
bytes. Accounting for a 15second timer (which, btw, I think is too long),
this gives a total of 27K for message state and around 8K for the NodeID
counter. This leads us to a total of 35K and even with 50% overhead each
on the hash and data structure (which also seems too high), to around 70K
(compared to the 220K you indicate below).
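For concreteness, a small Python sketch of the arithmetic above; every constant
here is an assumption quoted in this thread (the 1500 in-flight messages come
from Bruce's 100 msg/s over a 15 s timer, below), not a value from the draft:

```python
# Rough reproduction of the reduced-state estimate above; all byte sizes are
# the assumptions stated in this thread, not protocol-mandated values.
TXN_ID = 8             # 64-bit transaction ID
RETURN_CONN = 4        # index into a connection table
SRC_NODE_ID = 2        # ~log2(2500) bits, rounded up to 2 bytes
TIMEOUT = 4            # 32-bit timeout
per_entry = TXN_ID + RETURN_CONN + SRC_NODE_ID + TIMEOUT   # 18 bytes

in_flight = 100 * 15                      # 100 msg/s over a 15 s timer = 1500
message_state = in_flight * per_entry     # 27,000 bytes (~27K)
counter_state = 8_000                     # the ~8K quoted for the NodeID counters

total = message_state + counter_state     # ~35K
with_overhead = total * 2                 # 50% each for hash and structure -> ~70K
print(message_state, counter_state, with_overhead)
```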
You need the source Node-ID when you're counting the number of
transactions you're storing per source, which is Salman's proposal.
Along the same lines, for the hop-by-hop reassembly approach, we have 5K
+ 5*1500 = 12500 (12.5K) and with log N connections (~12 with N = 2500),
we get a total requirement of 150K. This is at least twice as bad. Also,
I am not sure I understand the reasoning as to why there is 50% overhead
in the first case compared to no overhead for the hop-by-hop reassembly
case. If no overhead is taken into consideration, then the hop-by-hop
reassembly would be four times worse. Also, we have to include the via
list in the calculation for the hop-by-hop reassembly approach and this
makes it worse.
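A corresponding sketch for the hop-by-hop reassembly figure, under the same
assumptions quoted above (5K max message, 5 out-of-order fragments at MTU
size, ~log N connections):

```python
import math

# Vidya's per-hop reassembly estimate, using the numbers quoted above.
max_message = 5_000                       # assumed overlay maximum message size
frag_buffer = 5 * 1_500                   # 5 out-of-order fragments at MTU 1500
per_connection = max_message + frag_buffer             # 12,500 bytes (12.5K)

connections = math.ceil(math.log2(2_500))               # ~12 for N = 2500
total = per_connection * connections                    # 150,000 bytes (~150K)
print(per_connection, connections, total)
```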
Hash tables have relatively high overhead compared to the overhead for
the reassembly buffers.
With this, I'm not seeing a case for per-hop reassembly of messages. In
fact, I don't even think there is a need to put an upper bound on the
number of in-progress messages a node can have at any given time. Also,
if anything, the buffer exhaustion attack that you seem to be worried
about is more feasible with per-hop reassembly.
I'm sorry, but I don't see anything above that leads to any of these
conclusions. Why is it not necessary to bound the number of messages
a peer sends through you? Wasn't that the method Salman proposed for
detecting when you're under attack? How can a buffer exhaustion
attack possibly apply to a technique with statically sized buffers?
Ultimately, the motivation for hop-by-hop reassembly is that it is
provably immune to this particular attack, with a slight latency
penalty. With return-path state storage, it is harder to bound the size of
the state under "normal" conditions, and it requires probabilistic detection
of attacks. False positives degrade overlay performance
significantly. (Open to suggestions of other ways of detecting and
mitigating attacks.)
For an Internet-scale protocol, I really prefer provable immunity to
probabilistic arguments with poor behavior under unanticipated
non-attack scenarios. I was actually one of the strong proponents of
the original text for end-to-end reassembly only, but the need to
store return path somewhere (in the message or on the peers) makes
this a fundamentally different problem than internet routing.
In most scenarios, the return state storage will work fine and have
better performance. I'm just not convinced it has good failure
behavior. If performance is the focus, the best performance that I
can see is still to store return path state and fall back to
reassembly under attack.
Bruce
Vidya
-----Original Message-----
From: Bruce Lowekamp [mailto:[email protected]]
Sent: Wednesday, December 17, 2008 7:28 PM
To: Salman Abdul Baset
Cc: Narayanan, Vidya; [email protected]
Subject: Re: [P2PSIP] RELOAD fragmentation
On Wed, Dec 17, 2008 at 7:48 PM, Salman Abdul Baset
<[email protected]> wrote:
>
>> FWIW, storing forwarding state vs appending to the via list is a
>> long-standing option in the protocol. The only thing that is new is
>> whether to do hop-by-hop reassembly.
>>
>> I'd be interested in seeing a proposal for a DoS mitigation technique
>> on this type of attack that has nearly the effectiveness or simplicity
>> of hop-by-hop reassembly. For that matter, I'd be interested in a DoS
>> mitigation approach for storing return path state that is simpler than
>> using reassembly as a fall-back strategy.
>
> I mentioned the solution in an earlier post:
> http://www.ietf.org/mail-archive/web/p2psip/current/msg04541.html
>
> For return path state, one solution is to limit the in-progress messages
> from a certain peer to a configured number. If the peer still sends more
> messages, it can be informed to do ROUTE-QUERY, thereby preventing further
> consumption of memory on the intermediary nodes.
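A minimal sketch of the per-source limit being described above; the names and
the exact ROUTE-QUERY fallback signalling are illustrative assumptions, not
from the draft:

```python
# Hypothetical per-source limit on in-progress routed messages, as proposed
# above: once a peer exceeds X outstanding transactions through this node,
# further messages are refused and the sender is told to fall back to a
# ROUTE-QUERY style mechanism instead of consuming more forwarding state.
from collections import defaultdict

MAX_IN_PROGRESS = 10                      # the bound "X" from the discussion

in_progress = defaultdict(int)            # source Node-ID -> outstanding count

def admit(source_node_id: bytes) -> bool:
    """Return True if we accept (and store state for) this transaction."""
    if in_progress[source_node_id] >= MAX_IN_PROGRESS:
        return False                      # caller signals a ROUTE-QUERY fallback
    in_progress[source_node_id] += 1
    return True

def complete(source_node_id: bytes) -> None:
    """Call when the response is forwarded back or the transaction times out."""
    if in_progress[source_node_id] > 0:
        in_progress[source_node_id] -= 1
```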
Looking at that idea, it provides a much simpler approach to dealing
with excessive state than always reassembling. But it has a pretty
substantial cost, too. Let the max in-progress messages from a peer be
X. Since any peer in the system can send messages through it, it
effectively requires storage of X * N entries, where N is the number of
peers in the system. You can argue that it should be X * (E+G), where E
is the number of evil peers and G is the number of good peers that have
a legitimate need to route messages through this peer, but it's really
hard to bound either of those numbers.
There's also a problem in that I think this will break exponentially
spaced finger tables, such as those used by Chord. The first finger table
entry will be used for 50% of a peer's originated requests (assuming
even distribution). I think it's going to be hard to set X high
enough so you don't force a peer to have additional connections beyond
its first finger table entry, but low enough to defend against DoS
attacks.
Going back to the size issue, I'm going to handwave at some numbers
here. Let's say that X is 10 and we route 100 messages per second
from 2500 peers (over the course of a few minutes). The current
message timeout in the draft (needs to be changed) is 15 seconds. So
you're storing state for 1500 messages at a typical time. For each
message you're storing
transaction ID -> {return connection (could be an index in a table or
a pointer to a DTLS structure), source Node-ID, timeout}
so that's 64 bits + 32/64 bits + 128 bits + 32 bits (for some
systems), let's round it to 40 bytes and allow 50% overhead for the
hashtable and data structure inefficiency, so 80 bytes per entry.
That gives 120K dedicated to message state.
Plus, we have to store counters for the number of messages we're
storing for each peer. For this, you have a hashtable of
Node-ID -> counter
so that's 128 + 32 bits, i.e., 20 bytes, which goes to 40 bytes with
overhead. That's 100K for this data
structure.
So essentially, 220K for the state and counters. Oh, and you need to
sweep your first hashtable occasionally to prune out expired
transactions, or introduce a queue for that. It seems implementation-dependent
whether memory or CPU/memory bandwidth is cheaper, though I
would lean toward a separate queue structure, in which case we'll just
round and say we're at 300K of data structures.
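Putting those rounded numbers together in a short sketch (again, all figures
are the illustrative assumptions above, not draft values):

```python
# Bruce's back-of-envelope estimate for the return-path-state approach,
# reproducing the rounded figures from the paragraphs above.
in_flight = 100 * 15           # 100 msg/s over the 15 s timeout -> 1500 messages
per_entry = 80                 # ~40 bytes of fields, rounded to 80 with overhead
message_state = in_flight * per_entry            # 120,000 bytes (~120K)

peers = 2_500
per_counter = 40               # Node-ID -> counter, ~20 bytes rounded to 40
counter_state = peers * per_counter              # 100,000 bytes (~100K)

total = message_state + counter_state            # ~220K; ~300K with an expiry queue
print(message_state, counter_state, total)
```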
Now keep in mind that X=10 is pretty low and will certainly cause
problems with a peer that has us as their first finger table entry or
a peer with a lot of (non-RELOAD) clients behind it.
Now, for comparison with hop-by-hop reassembly, let's say a peer has
20 active connections, the max message size for the overlay is 5K,
and the max out-of-order is 5 fragments, with MTU=1500. So we need
per-connection buffers of 5K + 5*1500 = 12500 bytes; times 20
connections, that gives us 250K.
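And the same back-of-envelope treatment for the reassembly side, with the
illustrative parameters chosen above:

```python
# Per-connection reassembly buffer estimate from the paragraph above.
max_message = 5_000            # assumed overlay maximum message size
out_of_order = 5               # fragments buffered out of order
mtu = 1_500
per_connection = max_message + out_of_order * mtu      # 12,500 bytes

connections = 20
total = per_connection * connections                   # 250,000 bytes (~250K)
print(per_connection, total)
```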
So the data structure sizes are similar. I'll call it dead even since
these numbers can obviously be swung either way with slight tweaks of
the parameters.
A pro of the max-per-source approach is that it never slows messages
for buffering. A con is that it essentially breaks exponentially spaced
finger-table routing for peers that have a legitimate need to exceed
whatever bound X is used.
The pro of the always-reassemble approach is that it's rather simple
to implement. It also only needs to be done when a smallish return-path
state table is full. The con is that it slows down messages by whatever
time it takes to reassemble the entire message.
I prefer the always-reassemble approach because it has fewer magic
numbers and it doesn't impact the rest of the system so much. But we
may need a more thorough comparison/writeup of the various approaches
to decide.
Bruce
_______________________________________________
P2PSIP mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/p2psip