> -----Original Message-----
> From: Bruce Lowekamp [mailto:[email protected]]
> Sent: Thursday, December 18, 2008 8:50 PM
> To: Narayanan, Vidya
> Cc: Salman Abdul Baset; [email protected]
> Subject: Re: [P2PSIP] RELOAD fragmentation
>
> I think it's enough that the buffer space is close.
The point of my email is that it isn't. But I agree that it isn't
significant either way.

> The exact details of the amount of state stored for the return path
> are fairly variable depending on specifics of the deployment. Plus,
> in all cases the amount of state required doesn't appear to be
> terribly significant compared to the space required to store the
> messages waiting for the hop-by-hop ACK.
>
> On Thu, Dec 18, 2008 at 8:05 PM, Narayanan, Vidya <[email protected]> wrote:
> > I think the math on this needs to be revisited. Here's a shot at it:
> >
> > Assuming uniform distribution of objects and queries, an effective
> > storage of X * N is not valid for a functional overlay, simply
> > because not every node will route through a particular node. Rather,
> > O(log N) is the right order of messages routed through a given node.
> > If we think about X * (E+G), then G is O(log N). However, when
> > E >> O(log N), the overlay is not going to be functional anyway, and
> > that is not the case under discussion.
>
> Yes, X * N is the worst case, though none of the rest of my numbers
> depended on that. It's an interesting question how large E can get
> here before other aspects of the overlay break down.

When E gets large enough that overlay operation is disrupted, we have
bigger problems than buffer space exhaustion. So that is a question
with several dimensions to it.

> > Sticking to your calculation model, the total amount of memory for
> > storing routing state would be transaction ID + return connection +
> > source Node-ID + timeout. For the source Node-ID field, you don't
> > actually need to store all 128 bits, since you only maintain log N
> > connections; log2(2500) ~ 12 bits ~ 2 bytes of information would do.
> > This alone reduces the total storage cost to 18 bytes per entry.
> > Accounting for a 15-second timer (which, btw, I think is too long),
> > this gives a total of 27K for message state and around 8K for the
> > Node-ID counter. That leads us to a total of 35K, and even with 50%
> > overhead each on the hash and data structure (which also seems too
> > high), to around 70K (compared to the 220K you indicate below).
>
> You need the source Node-ID when you're counting the number of
> transactions you're storing per source, which is Salman's proposal.

I don't believe a node needs to count the number of transactions per
source. That is not the only way a node can be attacked anyway - a
misbehaving finger may route everything it gets through a given node
irrespective of the destination, in which case the source of the
message tells you nothing. The only thing a node can realistically do
is determine when it is serving beyond its capacity and then kick in
some mitigation or backoff techniques.

> > Along the same lines, for the hop-by-hop reassembly approach, we
> > have 5K + 5*1500 = 12500 (12.5K) per connection, and with log N
> > connections (~12 with N = 2500), we get a total requirement of 150K.
> > This is at least twice as bad. Also, I am not sure I understand the
> > reasoning as to why there is 50% overhead in the first case compared
> > to no overhead for the hop-by-hop reassembly case. If no overhead is
> > taken into consideration, then hop-by-hop reassembly is four times
> > worse. We also have to include the via list in the calculation for
> > the hop-by-hop reassembly approach, which makes it worse still.
>
> Hash tables have relatively high overhead compared to the overhead
> for the reassembly buffers.
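(For anyone who wants to check the arithmetic being traded here, a
small Python sketch of both estimates follows. Every constant in it is
an assumption taken from this thread - N = 2500, 100 msgs/s, a
15-second timer, 5K max message, MTU 1500 - not anything the draft
mandates; the ~3 bytes per counter entry is my guess at how the 8K
figure arises.)

  import math

  N = 2500              # peers in the overlay (assumed in this thread)
  RATE = 100            # messages per second routed through this peer
  TIMEOUT = 15          # seconds a transaction is remembered
  in_flight = RATE * TIMEOUT          # ~1500 transactions at any time

  # Return-path state with the compressed source ID suggested above:
  # txn ID (8) + return-connection index (4) + ceil(log2(N)) bits of
  # source ID (~2 bytes) + timeout (4) = 18 bytes per entry.
  entry = 8 + 4 + math.ceil(math.log2(N) / 8) + 4
  state = in_flight * entry           # ~27K of message state
  counters = N * 3                    # ~8K of per-source counters
                                      # (guess: 2-byte ID + 1-byte count)
  total = 2 * (state + counters)      # 50% overhead on each -> ~70K

  # Hop-by-hop reassembly, with the parameters from Bruce's comparison:
  MAX_MSG, OOO_FRAGS, MTU = 5000, 5, 1500
  per_conn = MAX_MSG + OOO_FRAGS * MTU    # 12.5K per connection
  conns = math.ceil(math.log2(N))         # ~12 connections at N = 2500
  reassembly = conns * per_conn           # ~150K

  print(f"return-path state ~{total // 1000}K, "
        f"hop-by-hop reassembly ~{reassembly // 1000}K")

This prints roughly 69K vs. 150K, in line with the 70K/150K comparison
above.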
> > > With this, I'm not seeing a case for per-hop reassembly of
> > > messages. In fact, I don't even think there is a need to put an
> > > upper bound on the number of in-progress messages a node can have
> > > at any given time. Also, if anything, the buffer exhaustion attack
> > > that you seem to be worried about is more feasible with per-hop
> > > reassembly.
>
> I'm sorry, but I don't see anything above that leads to any of these
> conclusions. Why is it not necessary to bound the number of messages
> a peer sends through you? Wasn't that the method Salman proposed for
> detecting when you're under attack?

I should have caught that before and clarified it - but, as I mentioned
above, it is not an effective way of detecting an attack and I don't
think it buys anything.

> How can a buffer exhaustion attack possibly apply to a technique with
> statically sized buffers?

I assume you mean buffers that don't grow with messages - but this
comes at the cost of preventing message interleaving, which is steep.
Even then, you have to accommodate a variable buffer size, given that
you don't know how large the via list may become. You also need to
account for a malicious node continuing to send more and more
fragments - which brings timers back into the picture, upon expiry of
which the fragments must be purged. And if you want to allow
interleaving, the problem gets worse. (A rough sketch of such a buffer
appears at the end of this note.)

> Ultimately, the motivation for hop-by-hop reassembly is that it is
> provably immune to this particular attack, with a slight latency
> penalty.

I don't think we can conclude that the impact on latency is "slight".
I actually think it would be significant enough to be undesirable.

> Storing return path state makes the size of the state harder to bound
> under "normal" conditions and requires probabilistic detection of
> attacks. False positives degrade overlay performance significantly.
> (Open to suggestions of other ways of detecting and mitigating
> attacks.)
>
> For an internet-scale protocol, I really prefer "provably immune" to
> probabilistic arguments with poor behavior under unanticipated
> non-attack scenarios.

I'm not sure I understand the poor behavior in non-attack scenarios. A
little more explanation around that sentence would help me.

> I was actually one of the strong proponents of the original text for
> end-to-end reassembly only, but the need to store the return path
> somewhere (in the message or on the peers) makes this a fundamentally
> different problem than internet routing.

Not really. It isn't fundamentally different from LSRR in IPv4 or the
source routing header in IPv6. Of course, there are plenty of other
issues with the IPv6 Type 0 RH that are not relevant here :)

> In most scenarios, the return state storage will work fine and have
> better performance. I'm just not convinced it has good failure
> behavior. If performance is the focus, the best performance that I
> can see is still to store return path state and fall back to
> reassembly under attack.

I don't think the failure behaviors are that different, and I really
think we are trying to find a solution for an attack in one particular
scenario, while the attack can be launched in several other ways. The
performance differences, in my mind, make end-to-end reassembly the
best option.
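To illustrate the point about "static" buffers, here is a rough Python
sketch of a one-slot-per-connection reassembly buffer (i.e. no
interleaving). Everything in it - the slot size, the timeout, the field
names - is my illustration of the idea, not something taken from the
draft; note that even this fixed-size design needs a purge timer, and
the via list makes the real upper bound on the buffer fuzzy.

  import time

  MAX_MSG_SIZE = 5000   # overlay's max message size (assumed)
  FRAG_TIMEOUT = 15.0   # purge incomplete messages after this (assumed)

  class ReassemblySlot:
      def __init__(self):
          self.clear()

      def clear(self):
          self.fragments = {}    # offset -> bytes
          self.total_len = None  # learned from the final fragment
          self.started = None

      def add(self, offset, data, is_last):
          now = time.monotonic()
          # A malicious peer can trickle fragments forever, so an
          # incomplete message must eventually be purged on a timer.
          if self.started and now - self.started > FRAG_TIMEOUT:
              self.clear()
          if self.started is None:
              self.started = now
          # The via list grows in flight, so the true bound is really
          # MAX_MSG_SIZE plus whatever via-list growth is permitted.
          if offset + len(data) > MAX_MSG_SIZE:
              self.clear()
              return None        # oversized; drop and reset the slot
          self.fragments[offset] = data
          if is_last:
              self.total_len = offset + len(data)
          if self.total_len is not None:
              have = sum(len(d) for d in self.fragments.values())
              if have >= self.total_len:
                  msg = b"".join(d for _, d in
                                 sorted(self.fragments.items()))
                  self.clear()
                  return msg     # complete message; forward it
          return None            # still waiting for fragments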
Vidya

> Bruce
>
> > Vidya
> >
> >> -----Original Message-----
> >> From: Bruce Lowekamp [mailto:[email protected]]
> >> Sent: Wednesday, December 17, 2008 7:28 PM
> >> To: Salman Abdul Baset
> >> Cc: Narayanan, Vidya; [email protected]
> >> Subject: Re: [P2PSIP] RELOAD fragmentation
> >>
> >> On Wed, Dec 17, 2008 at 7:48 PM, Salman Abdul Baset
> >> <[email protected]> wrote:
> >> >
> >> >> FWIW, storing forwarding state vs appending to the via list is a
> >> >> long-standing option in the protocol. The only thing that is new
> >> >> is whether to do hop-by-hop reassembly.
> >> >>
> >> >> I'd be interested in seeing a proposal for a DoS mitigation
> >> >> technique on this type of attack that has nearly the
> >> >> effectiveness or simplicity of hop-by-hop reassembly. For that
> >> >> matter, I'd be interested in a DoS mitigation approach for
> >> >> storing return path state that is simpler than using reassembly
> >> >> as a fall-back strategy.
> >> >
> >> > I mentioned the solution in an earlier post:
> >> > http://www.ietf.org/mail-archive/web/p2psip/current/msg04541.html
> >> >
> >> > For return path state, one solution is to limit the in-progress
> >> > messages from a certain peer to a configured number. If the peer
> >> > still sends more messages, it can be told to do a ROUTE-QUERY,
> >> > thereby preventing further consumption of memory on the
> >> > intermediary nodes.
> >>
> >> Looking at that idea, it provides a much simpler approach to
> >> dealing with excessive state than always reassembling. But it has a
> >> pretty substantial cost, too. Let the max in-progress messages from
> >> a peer be X. Since any peer in the system can send messages through
> >> this peer, it effectively requires storage of X * N, where N is the
> >> number of peers in the system. You can argue that it should be
> >> X * (E+G), where E is the number of evil peers and G is the number
> >> of good peers that have a legitimate need to route messages through
> >> this peer, but it's really hard to bound either of those numbers.
> >>
> >> There's also a problem that I think this will break exponentially
> >> spaced finger tables, such as those used by Chord. The first finger
> >> table entry will be used for 50% of a peer's originated requests
> >> (assuming even distribution). I think it's going to be hard to set
> >> X high enough that you don't force a peer to open additional
> >> connections beyond its first finger table entry, but low enough to
> >> defend against DoS attacks.
> >>
> >> Going back to the size issue, I'm going to handwave at some numbers
> >> here. Let's say that X is 10 and we route 100 messages per second
> >> from 2500 peers (over the course of a few minutes). The current
> >> message timeout in the draft (which needs to be changed) is 15
> >> seconds. So you're storing state for 1500 messages at a typical
> >> time. For each message you're storing
> >>
> >> transaction id -> {return connection (could be an index in a table
> >> or a pointer to a DTLS structure), source Node-ID, timeout}
> >>
> >> so that's 64 bits + 32/64 bits + 128 bits + 32 bits (on some
> >> systems); let's round it to 40 bytes and allow 50% overhead for
> >> hashtable and data structure inefficiency, so 80 bytes per entry.
> >>
> >> That gives 120K dedicated to message state.
> >>
> >> Plus, we have to store counters for the number of messages we're
> >> storing for each peer.
> >> For this, you have a hashtable of
> >>
> >> Node-ID -> counter
> >>
> >> so 128 + 32 bits, i.e. 20 bytes, goes to 40 bytes with overhead.
> >> That's 100K for this data structure.
> >>
> >> So essentially, 220K for the state and counters. Oh, and you need
> >> to sweep your first hashtable occasionally to prune out expired
> >> transactions, or introduce a queue for that. It seems
> >> implementation-dependent whether memory or CPU/memory bandwidth is
> >> cheaper, though I would lean toward a separate queue structure, in
> >> which case we'll just round up and say we're at 300K of data
> >> structures.
> >>
> >> Now keep in mind that X=10 is pretty low and will certainly cause
> >> problems with a peer that has us as its first finger table entry,
> >> or a peer with a lot of (non-RELOAD) clients behind it.
> >>
> >> Now, for comparison with hop-by-hop reassembly, let's say a peer
> >> has 20 active connections, the max message size for the overlay is
> >> 5K, and the max out-of-order is 5 fragments with MTU=1500. So we
> >> need per-connection buffers of 5K + 5*1500 = 12500. Times 20
> >> connections gives us 250K.
> >>
> >> So the data structure sizes are similar. I'll call it dead even,
> >> since these numbers can obviously be swung either way with slight
> >> tweaks of the parameters.
> >>
> >> A pro of the max-per-source approach is that it never slows
> >> messages for buffering. A con is that it essentially breaks
> >> exponentially spaced finger-table routing for peers that have a
> >> legitimate need to exceed whatever bound X is used.
> >>
> >> The pro of the always-reassemble approach is that it's rather
> >> simple to implement. It also only needs to be done when a smallish
> >> return path state table is full. The con is that it slows down
> >> messages by however long it takes to assemble the entire message.
> >>
> >> I prefer the always-reassemble approach because it has fewer magic
> >> numbers and it doesn't impact the rest of the system as much. But
> >> we may need a more thorough comparison/writeup of the various
> >> approaches to decide.
> >>
> >> Bruce
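For concreteness, here is a rough Python sketch of the return-path
state table Bruce sizes above, combined with the per-source cap from
Salman's proposal (over-limit senders would be pointed at ROUTE-QUERY).
X, the timeout, and all names are illustrative values assumed in this
thread, not anything the draft specifies.

  import time

  X = 10               # max in-progress transactions per source (assumed)
  TXN_TIMEOUT = 15.0   # transaction lifetime (assumed; arguably too long)

  class ReturnPathTable:
      def __init__(self):
          self.state = {}        # txn_id -> (connection, source_id, expiry)
          self.in_progress = {}  # source_id -> count (the second hashtable)

      def add(self, txn_id, connection, source_id):
          self._expire()
          if self.in_progress.get(source_id, 0) >= X:
              return False       # over the cap: reply with a ROUTE-QUERY hint
          self.state[txn_id] = (connection, source_id,
                                time.monotonic() + TXN_TIMEOUT)
          self.in_progress[source_id] = \
              self.in_progress.get(source_id, 0) + 1
          return True

      def pop(self, txn_id):
          # Called when the response comes back; frees the entry.
          entry = self.state.pop(txn_id, None)
          if entry:
              _, source_id, _ = entry
              self._decrement(source_id)
          return entry

      def _decrement(self, source_id):
          n = self.in_progress.get(source_id, 0) - 1
          if n <= 0:
              self.in_progress.pop(source_id, None)
          else:
              self.in_progress[source_id] = n

      def _expire(self):
          # The occasional sweep Bruce mentions: prune expired
          # transactions (a real implementation might keep a separate
          # timeout queue instead of scanning).
          now = time.monotonic()
          for txn_id in [t for t, (_, _, exp) in self.state.items()
                         if exp < now]:
              self.pop(txn_id)

The in_progress table here is exactly the Node-ID -> counter structure
in the 220K estimate, and _expire() is the pruning pass traded against
a separate queue above.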
