Re: [ovs-dev] [PATCH ovn v2] OVN: Fix learning of neighbors from ARP/ND packets.

Dumitru Ceara Wed, 14 Aug 2019 23:12:59 -0700

On Thu, Aug 8, 2019 at 1:52 AM Han Zhou <zhou...@gmail.com> wrote:
>
>
>
> On Wed, Aug 7, 2019 at 8:12 AM Dumitru Ceara <dce...@redhat.com> wrote:
>>
>> On Mon, Aug 5, 2019 at 5:34 PM Han Zhou <zhou...@gmail.com> wrote:
>> >
>> >
>> >
>> > On Mon, Aug 5, 2019 at 1:06 AM Dumitru Ceara <dce...@redhat.com> wrote:
>> > >
>> > > On Mon, Aug 5, 2019 at 1:41 AM Han Zhou <zhou...@gmail.com> wrote:
>> > > >
>> > > >
>> > > >
>> > > > On Mon, Jul 29, 2019 at 2:30 AM Dumitru Ceara <dce...@redhat.com> 
>> > > > wrote:
>> > > > >
>> > > > > Add a restriction on the target protocol addresses to match the
>> > > > > configured subnets. All other ARP/ND packets are invalid in this
>> > > > > context.
>> > > > >
>> > > > > One exception is for ARP replies that are received for route 
>> > > > > next-hops
>> > > > > that are only reachable via a port but can't be directly resolved
>> > > > > through route lookups. Such support was introduced by commit:
>> > > > >
>> > > > > 6b785fd8fe29 ("ovn-util: Allow /32 IP addresses for router ports.")
>> > > > >
>> > > > > Reported-at: https://bugzilla.redhat.com/1729846
>> > > > > Reported-by: Haidong Li <ha...@redhat.com>
>> > > > > CC: Han Zhou <zhou...@gmail.com>
>> > > > > CC: Guru Shetty <g...@ovn.org>
>> > > > > Fixes: b068454082f5 ("ovn-northd: Support learning neighbor from ARP 
>> > > > > request.")
>> > > > > Signed-off-by: Dumitru Ceara <dce...@redhat.com>
>> > > >
>> > > > Hi Dumitru,
>> > >
>> > > Hi Han,
>> > >
>> > > Thanks for reviewing this.
>> > >
>> > > >
>> > > > Sorry for my slow response, and thanks a lot for revising the patch 
>> > > > for a bigger scope of validations. However, the exception of /32 
>> > > > address makes me thinking more about this patch. If ARP replies is 
>> > > > allowed from any nexthop for a LR port with /32, at least ARP request 
>> > > > for GARP purpose should also be allowed. But before asking for a v3, 
>> > > > I'd hold on to rethink the purpose of this patch.
>> > >
>> > > Right, we should allow GARP requests too. If we decide to go ahead
>> > > with this patch I'll add a function in v3 to handle all types of ARPs
>> > > and call it both for unreachable static route next-hops and attached
>> > > subnets.
>> > >
>> > > >
>> > > > The nexthop specific flows are now from static routes. What if OVN 
>> > > > support dynamical routing protocols in the future? Of course we can 
>> > > > then take those dynamic nexthops into allowed peers. But then I am 
>> > > > thinking what is the real benefit of all these restrictions? Why can't 
>> > > > we just have simpler logic to handle all these situations without 
>> > > > validation? I think the major benefit of the validation is to avoid 
>> > > > handling the noise ARP/NDs when multiple subnets shares same L2, but 
>> > > > most cases it is really not a big deal, right? For the CPU problem 
>> > > > caused by ARP flooding as mentioned by the bug report, it is a real 
>> > > > problem, but this patch seems not really helpful for that, because the 
>> > > > attacker can just trigger the same CPU problem with *valid* packets. 
>> > > > So I am not sure if the benefit of the change is worth the complexity 
>> > > > it introduced. Please share your thought and correct me if I missed 
>> > > > something.
>> > > >
>> > > > Thanks,
>> > > > Han
>> > >
>> > > I assume the simpler logic to handle all these situations without
>> > > validation is to add rate limiting for ARP packets that get punted to
>> > > the controller. I agree that this should be implemented too.
>> > >
>> > > But I think rate limiting all ARP packets regardless of IP addresses
>> > > is not enough. In the following scenario and if we would have a way to
>> > > rate limit ARP packets:
>> > > - Subnet 42.42.42.0/24 configured on the OVN
>> > > - "Invalid" ARP packets are injected at high rate in the network for 
>> > > 41.41.41.41
>> > > - Host 42.42.42.42 sends GARP.
>> > > - Rate limiting of ARP packets towards controller at 100pps
>> > >
>> > > With the current code, ARP packets for 41.41.41.41 will be punted to
>> > > controller at a rate of at most 100 per second. But the chances that
>> > > the valid 42.42.42.42 GARP is dropped is really high.
>> > >
>> > > If we instead use the following approach:
>> > > a. Punt only useful ARPs to controller.
>> > > b. Rate limit ARPs that are sent to controller.
>> > >
>> > > Then ARP packets outside 42.42.42./24 are never punted to the
>> > > controller and don't consume any rate limiting tokens. For the second
>> > > case, when an attacker would flood with valid ARP packets we would
>> > > have the rate limit in place to protect the controller CPU.
>> > >
>> > > My commit addresses point "a" above as I think point "b" should be
>> > > done in a generic way for all protocol packets that need to reach the
>> > > controller.
>> > >
>> > > For dynamic routing protocols on the other hand I think we should not
>> > > install routes with next-hops that are unreachable through other
>> > > direct or indirect routes.
>> > >
>> > > Thanks,
>> > > Dumitru
>> >
>> > I agree that blindly ARP rate limit is not helpful, but a) is not really 
>> > helpful in this case either. In your example, the attacker can just use 
>> > any valid IP in 42.42.42.0/24 to send GARP flooding, which would result in 
>> > exactly same result that a useful GARP from 42.42.42.42 is dropped because 
>> > of blindly rate limiting all ARPs. To solve the problem properly, the ARP 
>> > rate limiting must be done per IP.
>>
>> Ok, ideally ARP rate limiting should be done per IP but it would take
>> quite a lot of resources to keep that information per host.
>>
>> Any idea how to implement that in an efficient way? There are
>> scenarios when we don't know beforehand the IPs of the hosts running
>> in the network so that we can whitelist them. Also, from what I've
>> seen physical routers usually have a single queue for control plane
>> protection for ARPs.
>>
>> Thanks,
>> Dumitru
>
>
> Yes, I agree that system resource is a challenge for per IP rate limiting. 
> And yes there is no way to whitelist (because otherwise ARP is not needed). 
> We may make some trade-off between the accuracy and efficiency. For example, 
> we can have separate meter groups for each logical router port for ARP 
> ratelimiting. Each meter group may have e.g. 64 meters, and then having a 
> stage to do hash for the IPs, and in the next stage using the hash result 
> (between 0 - 63) as index to use corresponding meter to do ratelimiting. This 
> way, flooding to a specific IP will impact only the other IPs that fall into 
> the same hash bucket on the same router port. But this is just a rough idea 
> and I believe many more details still need to be figured out for the hashing 
> part. As a first step maybe we can just do ratelimiting per router port. It 
> is definitely better than nothing. What do you think?


Hi Han,

It sounds like a good idea to have a hashtable for rate limiting
groups of ARPs. I will start like you suggested with a single bucket
per router port. The question is if we still want to punt only ARPs
for configured networks and nexthops like this patch is trying to do.
I would add it and then implement rate limiting as a follow up
patch/series.

Thanks,
Dumitru
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH ovn v2] OVN: Fix learning of neighbors from ARP/ND packets.

Reply via email to