On Tue, Jun 27, 2017 at 10:42 AM, Han Zhou <zhou...@gmail.com> wrote:

>
>
> On Tue, Jun 27, 2017 at 10:40 AM, Han Zhou <zhou...@gmail.com> wrote:
> >
> >
> >
> > On Tue, Jun 27, 2017 at 10:12 AM, Mickey Spiegel <mickeys....@gmail.com>
> wrote:
> > >
> > >
> > > On Tue, Jun 27, 2017 at 1:02 AM, Han Zhou <zhou...@gmail.com> wrote:
> > >>
> > >> Localnet port was supposed to work for directly connected datapath
> > >> only. However, the recursive local_datapath filling introduced a
> > >> problem in below scenario:
> > >>
> > >> LS A <-> LR <-> LS B, port a@HV1 is on LS A, port b@HV2 is on LS B.
> > >> If B has localnet port, then ovn-controller on HV1 would think port
> > >> b can be reached from HV1 by localnet port on LS B, which is wrong.
> > >
> > >
> > > This scenario is flawed. Logical Routers should only be connected to
> Logical Switches with localnet ports through gateway routers or distributed
> gateway ports. Distributed gateway port logic will cause traffic to be
> emitted from an appropriate hypervisor. It is designed to work with logical
> switches with localnet ports, whereas normal router ports on distributed
> logical routers were not.
> > >
> > >>
> > >> This patch fixes it by adding hops information in local datapath
> > >> which can tell if a local-datapath is directly connected to the
> > >> hypervisor.
> > >
> > >
> > > I have not run any tests or tried it, but I think this patch breaks
> distributed gateway ports. We need to use the localnet port to get to the
> outside world from the distributed gateway port. There is no problem with
> the existing localnet port logic, when it is used as designed.
> > >
> >
> > Hi Mickey, thanks for your comments! I agree the scenario is not real
> use case, but I think it is still a bug which is revealed in such "flawed"
> scenario, and the result is misleading, regarding the discussion [1]. So I
> think it is still worth to be fixed.
> >
> It went out too fast: [1] https://mail.openvswitch.org/
> pipermail/ovs-dev/2017-June/334634.html
> >
> > This patch fixes only the "flaw" scenario, and I don't think it breaks
> the distributed gateway scenario.
>

I still believe that it breaks distributed gateway port functionality. See
more explanation below.


> The patch is just to make sure a remote port can be reached by "localnet"
> port that is directly connected to the current HV. The logic is in the
> context of how to reach a *remote port*.
>

I still feel that you are trying to fix an unsupported scenario. Moreover,
I don't follow what you are trying to do. It seems like you are making it
like OVN does not have any localnets at all, sending a Geneve tunneled
packet from one side to the other (since the sender HV1 does not see the
localnet for LS B). It is not using the non-OVN VXLAN, unless the
underlying transport between HV1 and HV2 uses non-OVN VXLAN, in which case
it would be Geneve over VXLAN.

And the distributed gateway test cases are passed:
> >
> > OVN end-to-end tests
> >
> > 2336: ovn -- 1 LR with distributed router gateway port ok
> > 2337: ovn -- send gratuitous arp for NAT rules on distributed router ok
> >
>
> Please let me know if I misunderstood anything, or the tests are not
> complete.
>

The tests are not complete. While there are some multi-node tests in tests/
ovn.at in the "1 LR with distributed router gateway port" test case, they
do not test the NAT functionality. In that existing test case, without NAT,
all north/south traffic goes through the redirect-chassis which will
trigger localnet even with your patch.

Where I think your patch will break things is for distributed NAT rules,
where the traffic can go out of the local instance of the distributed
gateway port even on a chassis that is not the redirect-chassis. As in the
"1 LR with distributed router gateway port" test, the logical topology is:
foo -- R1 -- distributed gateway port -- alice
The physical topology is:
hv1 hosts vif foo1 on LS foo.
hv2 is the redirect-chassis for the distributed gateway port.
There is a distributed NAT rule for foo1.
When foo1 sends traffic to alice1 on hv3, on hv1 it goes
foo -- R1 -- distributed gateway port -- alice -- localnet
The localnet will then get the traffic to hv3 on LS alice, so that it can
reach the destination alice1.
Similarly, traffic to the outside world will use
foo -- R1 -- distributed gateway port -- alice -- localnet
all on hv1.
In both cases the localnet is needed on hv1, but the depth of alice on hv1
is 2.
I was testing these scenarios manually while developing the feature.

When the distributed gateway port functionality was developed, NAT could
only be tested using the kernel module, i.e. "make check-kernel" or "make
check-kmod". However, this restricts the tests to be single node tests
which will not catch the issue introduced by your patch. Userspace NAT was
committed on June 2, but I do not know whether that feature allows for
multi-node NAT tests to be developed. If it does, then an automated version
of the scenario described above should be developed. It is pretty much the
topology of the "1 LR with distributed router gateway port" test case,
mixed with the "DNAT and SNAT on distributed router" test cases from tests/
system-ovn.at.

Mickey


> >
> >
> >
> > >
> > >>
> > >> Signed-off-by: Han Zhou <zhou...@gmail.com>
> > >> Reported-by: Qianyu Wang <wang.qia...@zte.com.cn>
> > >> ---
> > >>  ovn/controller/binding.c        | 1 +
> > >>  ovn/controller/ovn-controller.h | 3 +++
> > >>  ovn/controller/physical.c       | 2 +-
> > >>  3 files changed, 5 insertions(+), 1 deletion(-)
> > >>
> > >> diff --git a/ovn/controller/binding.c b/ovn/controller/binding.c
> > >> index bb76608..03c310d 100644
> > >> --- a/ovn/controller/binding.c
> > >> +++ b/ovn/controller/binding.c
> > >> @@ -129,6 +129,7 @@ add_local_datapath__(const struct ldatapath_index
> *ldatapaths,
> > >>      ovs_assert(ld->ldatapath);
> > >>      ld->localnet_port = NULL;
> > >>      ld->has_local_l3gateway = has_local_l3gateway;
> > >> +    ld->hops = depth;
> > >>
> > >>      if (depth >= 100) {
> > >>          static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1,
> 1);
> > >> diff --git a/ovn/controller/ovn-controller.h b/ovn/controller/ovn-
> controller.h
> > >> index 4bc0467..9b85087 100644
> > >> --- a/ovn/controller/ovn-controller.h
> > >> +++ b/ovn/controller/ovn-controller.h
> > >> @@ -66,6 +66,9 @@ struct local_datapath {
> > >>      /* True if this datapath contains an l3gateway port located on
> this
> > >>       * hypervisor. */
> > >>      bool has_local_l3gateway;
> > >> +
> > >> +    /* Number of logical hops the datapath is connected to this
> hypervisor. */
> > >> +    int hops;
> > >>  };
> > >>
> > >>  struct local_datapath *get_local_datapath(const struct hmap *,
> > >> diff --git a/ovn/controller/physical.c b/ovn/controller/physical.c
> > >> index f2d9676..7051906 100644
> > >> --- a/ovn/controller/physical.c
> > >> +++ b/ovn/controller/physical.c
> > >> @@ -151,7 +151,7 @@ get_localnet_port(struct hmap *local_datapaths,
> int64_t tunnel_key)
> > >>  {
> > >>      struct local_datapath *ld = get_local_datapath(local_datapaths,
> > >>                                                     tunnel_key);
> > >> -    return ld ? ld->localnet_port : NULL;
> > >> +    return ld && !ld->hops ? ld->localnet_port : NULL;
> > >>  }
> > >>
> > >>  /* Datapath zone IDs for connection tracking and NAT */
> > >> --
> > >> 2.1.0
> > >>
> > >> _______________________________________________
> > >> dev mailing list
> > >> d...@openvswitch.org
> > >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> > >
> > >
>
>
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to