Re: [ovs-dev] [PATCH ovn] Do not reply on unicast arps for targets.

Vasyl Saienko Sun, 19 May 2024 22:47:36 -0700

Thanks for feedback Numan.

On Wed, May 15, 2024 at 12:04 AM Numan Siddique <[email protected]> wrote:


> On Mon, May 13, 2024 at 9:00 AM Dumitru Ceara <[email protected]> wrote:
> >
> > On 5/4/24 11:38, Vasyl Saienko wrote:
> > > Hello Numan
> > >
> > > Thanks for review and feedback.
> > >
> > > On Fri, May 3, 2024 at 7:14 PM Numan Siddique <[email protected]> wrote:
> > >
> > >> On Mon, Apr 22, 2024 at 2:54 AM Vasyl Saienko <[email protected]>
> > >> wrote:
> > >>>
> > >>> Reply only if target ethernet address is broadcast, if
> > >>> address is specified explicitly do noting to let target
> > >>> reply by itself. This technique allows to monitor target
> > >>> aliveness with arping.
> > >>
> > >> Thanks for the patch.
> > >>
> > >> This patch does change the behavior of OVN.  Having ARP responder
> > >> flows avoids tunnelling the request packet if the destination is on a
> > >> different compute node.
> > >> But I don't think its a big deal.
> > >>
> > >> You are totally correct that the ARP responder allows limiting ARP
> > > broadcast traffic between nodes. Normally ARP requests are sent to
> > > broadcast ff:ff:ff:ff:ff:ff ethernet, as the protocol is designed to
> > > identify ethernet addresses for known IP addresses. In this case since
> > > traffic is broadcast it will be flooded among all nodes if ARP
> responder is
> > > not applied. At the same time the client may specify the target
> > > ethernet address as unicast (in this case a broadcast storm will not
> > > happen, as the destination ethernet address is unicast).
> > >
> > > In OpenStack we use unicast ARP requests as a way to monitor VM
> aliveness
> > > from remote locations to make sure there are no issues with
> > > flows/underlying networking infra between nodes. We use ARPs due to
> several
> > > reasons:
> > > 1. This protocol is mandatory and can't be disabled on the target
> machine
> > > (which guarantees that the target VM will always reply to ARPs if it is
> > > alive)
> > > 2. This protocol is not filtered by firewalls/security groups as it is
> > > mandatory for network workability
> > > 3. We can't use upper layers protocols such as ICMP as they will
> require
> > > specific firewall rules inside VMs, and we do not control VMs in cloud
> > > cases, but still need to monitor infrastructure aliveness.
> > >
> > > If ARP responder replies to requests with broadcast and unicast target
> > > ethernet address, we can't use this technique for monitoring, as even
> if
> > > target VM is down (not necessary that ovn port is down, the VM may
> stuck or
> > > be in kernel panic for example, or datapath between monitoring tool
> and VM
> > > is broken) the ARP responder will do reply to our request so we can't
> > > identify if VM is really accessible or not as we will always got
> replies
> > > from local ARP responder. At the same time there is no need to set up
> ARP
> > > responder for requests with a unicast ethernet address, as they will
> not
> > > generate broadcast storms by design (they are unicasts).
> > >
> >
> > Right, this sounds acceptable to me too.
> >
> > >
> > >> 1.  Accept your patch
> > >> 2.  Use the NB Global option - "ignore_lsp_down" and set it to false,
> > >> so that ovn-northd will install the ARP responder flows only if the
> > >> logical port is up.
> > >>
> > > This option will not allow catching a situation when there is an issue
> in
> > > datapath between monitoring software and VM located on another node,
> for
> > > example  issues with flows or some underlying networking problems.
> > >
> > >      Have you considered this approach for your use case ?
> > >> 3.  Add another global option - "disable_arp_responder" which when set
> > >> to true, ovn-northd will not install ARP responder flows.
> > >>
> > >> Disabling ARP responder for ports completely will allow broadcast
> storms
> > > for ARP requests. Which we would like to avoid, but enable the
> possibility
> > > to monitor infrastructure aliveness.
> > >
> > >
> > >> Given that OpenStack neutron ml2ovs uses ARP responder flows only for
> > >> broad cast my preference would be (1).
> > >>
> > >> I'd like to know other's opinion on this.
> > >>
> >
> > I'd accept the patch too so I'd go for (1).
> >
> > Regards,
> > Dumitru
> >
> > >> Thanks
> > >> Numan
> > >>
> > >>>
> > >>> Closes  #239
> > >>>
> > >>> Signed-off-by: Vasyl Saienko <[email protected]>
> > >>> ---
> > >>>  northd/northd.c     | 11 +++++++++--
> > >>>  tests/ovn-northd.at |  4 ++--
> > >>>  2 files changed, 11 insertions(+), 4 deletions(-)
> > >>>
> > >>> diff --git a/northd/northd.c b/northd/northd.c
> > >>> index 37f443e70..e80e1885d 100644
> > >>> --- a/northd/northd.c
> > >>> +++ b/northd/northd.c
> > >>> @@ -8832,8 +8832,15 @@
> build_lswitch_arp_nd_responder_known_ips(struct
> > >> ovn_port *op,
> > >>>          for (size_t i = 0; i < op->n_lsp_addrs; i++) {
> > >>>              for (size_t j = 0; j < op->lsp_addrs[i].n_ipv4_addrs;
> j++) {
> > >>>                  ds_clear(match);
> > >>> -                ds_put_format(match, "arp.tpa == %s && arp.op == 1",
> > >>> -                            op->lsp_addrs[i].ipv4_addrs[j].addr_s);
> > >>> +                /* NOTE(vsaienko): Do not reply on unicast ARPs,
> forward
> > >>> +                 * them to the target to have ability to monitor
> target
> > >>> +                 * aliveness via ARPs.
> > >>> +                */
> > >>> +                ds_put_format(match,
> > >>> +                    "arp.tpa == %s && "
> > >>> +                    "arp.op == 1 && "
> > >>> +                    "eth.dst == ff:ff:ff:ff:ff:ff",
> > >>> +                    op->lsp_addrs[i].ipv4_addrs[j].addr_s);
> > >>>                  ds_clear(actions);
> > >>>                  ds_put_format(actions,
> > >>>                      "eth.dst = eth.src; "
> > >>> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
> > >>> index be006fb32..4162196f4 100644
> > >>> --- a/tests/ovn-northd.at
> > >>> +++ b/tests/ovn-northd.at
> > >>> @@ -9283,9 +9283,9 @@ AT_CAPTURE_FILE([S1flows])
> > >>>
> > >>>  AT_CHECK([grep -e "ls_in_arp_rsp" S1flows | ovn_strip_lflows], [0],
> [dnl
> > >>>    table=??(ls_in_arp_rsp      ), priority=0    , match=(1),
> > >> action=(next;)
> > >>> -  table=??(ls_in_arp_rsp      ), priority=100  , match=(arp.tpa ==
> > >> 192.168.0.10 && arp.op == 1 && inport == "S1-vm1"), action=(next;)
> > >>> +  table=??(ls_in_arp_rsp      ), priority=100  , match=(arp.tpa ==
> > >> 192.168.0.10 && arp.op == 1 && eth.dst == ff:ff:ff:ff:ff:ff && inport
> ==
> > >> "S1-vm1"), action=(next;)
> > >>>    table=??(ls_in_arp_rsp      ), priority=100  , match=(nd_ns &&
> > >> ip6.dst == {fd00::10, ff02::1:ff00:10} && nd.target == fd00::10 &&
> inport
> > >> == "S1-vm1"), action=(next;)
> > >>> -  table=??(ls_in_arp_rsp      ), priority=50   , match=(arp.tpa ==
> > >> 192.168.0.10 && arp.op == 1), action=(eth.dst = eth.src; eth.src =
> > >> 50:54:00:00:00:10; arp.op = 2; /* ARP reply */ arp.tha = arp.sha;
> arp.sha =
> > >> 50:54:00:00:00:10; arp.tpa = arp.spa; arp.spa = 192.168.0.10; outport
> =
> > >> inport; flags.loopback = 1; output;)
> > >>> +  table=??(ls_in_arp_rsp      ), priority=50   , match=(arp.tpa ==
> > >> 192.168.0.10 && arp.op == 1 && eth.dst == ff:ff:ff:ff:ff:ff),
> > >> action=(eth.dst = eth.src; eth.src = 50:54:00:00:00:10; arp.op = 2;
> /* ARP
> > >> reply */ arp.tha = arp.sha; arp.sha = 50:54:00:00:00:10; arp.tpa =
> arp.spa;
> > >> arp.spa = 192.168.0.10; outport = inport; flags.loopback = 1; output;)
> > >>>    table=??(ls_in_arp_rsp      ), priority=50   , match=(nd_ns &&
> > >> ip6.dst == {fd00::10, ff02::1:ff00:10} && nd.target == fd00::10),
> > >> action=(nd_na { eth.src = 50:54:00:00:00:10; ip6.src = fd00::10;
> nd.target
> > >> = fd00::10; nd.tll = 50:54:00:00:00:10; outport = inport;
> flags.loopback =
> > >> 1; output; };)
>
> I took a look at this patch.  There's one problem with this patch.  If
> you see the test case changes, this patch now
> replaces the below logical flow
>
> table=??(ls_in_arp_rsp      ), priority=100  , match=(arp.tpa ==
> 192.168.0.10 && arp.op == 1 && inport == "S1-vm1"), action=(next;)
>
> with
>
>  table=??(ls_in_arp_rsp      ), priority=100  , match=(arp.tpa ==
> 192.168.0.10 && arp.op == 1 && eth.dst == ff:ff:ff:ff:ff:ff && inport
> == "S1-vm1"), action=(next;)
>
> This is wrong.  If there is an ARP request packet entering from a
> router port to the switch,  we were advancing this packet to the next
> stage.
> But now, we only advance if it's a broadcast pkt.  Please correct this.
>
> This does not look wrong, the router is yet another client that uses ARP
to map IP address to MAC address.
If the MAC address is known, the router will not send ARP to the target at
all, in other cases it will send it to broadcast where
we will reply from the ARP responder.


> Also please update the logical flow documentation in ovn-northd.8.xml
> for the "ls_in_arp_rsp" stage.  I think its also worth mentioning this
> change in the
> NEWS file.
>
> Will update in next revision.


> Thanks
> Numan
>
>
>
>
>
> > >>>  ])
> > >>>
> > >>> --
> > >>> 2.40.1
> > >>>
> > >>> _______________________________________________
> > >>> dev mailing list
> > >>> [email protected]
> > >>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> > >>>
> > >>
> > >
> > >
> >
> > _______________________________________________
> > dev mailing list
> > [email protected]
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>


-- 
<https://www.mirantis.com/>
Vasyl Saienko

Principal DevOps Engineer
[email protected] <[email protected]>
+(380) 66 072 07 17  <++1+(650)+564+7038>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH ovn] Do not reply on unicast arps for targets.

Reply via email to