Re: [ovs-dev] [PATCH] ovn-controller: use localnet port for directly connected datapath only.

2017-07-05 Thread Mickey Spiegel
On Wed, Jul 5, 2017 at 12:36 AM, Han Zhou <zhou...@gmail.com> wrote:

>
>
> On Tue, Jul 4, 2017 at 10:56 PM, Mickey Spiegel <mickeys@gmail.com>
> wrote:
> >
> >
> > On Tue, Jun 27, 2017 at 10:42 AM, Han Zhou <zhou...@gmail.com> wrote:
> >>
> >>
> >>
> >> On Tue, Jun 27, 2017 at 10:40 AM, Han Zhou <zhou...@gmail.com> wrote:
> >> >
> >> >
> >> >
> >> > On Tue, Jun 27, 2017 at 10:12 AM, Mickey Spiegel <
> mickeys@gmail.com> wrote:
> >> > >
> >> > >
> >> > > On Tue, Jun 27, 2017 at 1:02 AM, Han Zhou <zhou...@gmail.com>
> wrote:
> >> > >>
> >> > >> Localnet port was supposed to work for directly connected datapath
> >> > >> only. However, the recursive local_datapath filling introduced a
> >> > >> problem in below scenario:
> >> > >>
> >> > >> LS A <-> LR <-> LS B, port a@HV1 is on LS A, port b@HV2 is on LS
> B.
> >> > >> If B has localnet port, then ovn-controller on HV1 would think port
> >> > >> b can be reached from HV1 by localnet port on LS B, which is wrong.
> >> > >
> >> > >
> >> > > This scenario is flawed. Logical Routers should only be connected
> to Logical Switches with localnet ports through gateway routers or
> distributed gateway ports. Distributed gateway port logic will cause
> traffic to be emitted from an appropriate hypervisor. It is designed to
> work with logical switches with localnet ports, whereas normal router ports
> on distributed logical routers were not.
> >> > >
> >> > >>
> >> > >> This patch fixes it by adding hops information in local datapath
> >> > >> which can tell if a local-datapath is directly connected to the
> >> > >> hypervisor.
> >> > >
> >> > >
> >> > > I have not run any tests or tried it, but I think this patch breaks
> distributed gateway ports. We need to use the localnet port to get to the
> outside world from the distributed gateway port. There is no problem with
> the existing localnet port logic, when it is used as designed.
> >> > >
> >> >
> >> > Hi Mickey, thanks for your comments! I agree the scenario is not real
> use case, but I think it is still a bug which is revealed in such "flawed"
> scenario, and the result is misleading, regarding the discussion [1]. So I
> think it is still worth to be fixed.
> >> >
> >> It went out too fast: [1] https://mail.openvswitch.org/pipermail/ovs-dev/2017-June/334634.html
> >> >
> >> > This patch fixes only the "flaw" scenario, and I don't think it
> breaks the distributed gateway scenario.
> >
> >
> > I still believe that it breaks distributed gateway port functionality.
> See more explanation below.
> >
> >>
> >> The patch is just to make sure a remote port can be reached by
> "localnet" port that is directly connected to the current HV. The logic is
> in the context of how to reach a *remote port*.
> >
> >
> > I still feel that you are trying to fix an unsupported scenario.
> Moreover, I don't follow what you are trying to do.
>
> Yes, I am fixing an unsupported scenario, because when user configure it
> this way, the localnet will be abused, and the behavior could be confusing.
> This fix is just to make sure the localnet port works in the old and safe
> way. Maybe this is not the good way to fix, and a better way might be
> detecting such configuration (localnet port and non-gateway LR port on the
> same LS) in northd and write a warning log.
>

That would be my vote.


> I was also thinking about abandoning this patch because I think connecting
> logical switches with localnet ports by distributed LR could be useful in
> some special case, but since that case would require some more changes and
> there are open issues, I'd rather discuss it separately, independent of
> this patch.
>

If you want to continue to try to support this type of connectivity, I
agree that it should involve a more comprehensive discussion. You would
still have to make changes such as ARP responses. I wonder how far such a
solution would go down the path already travelled during development of the
distributed gateway port.

>
> > It seems like you are making it like OVN does not have any localnets at
> all, sending a Geneve tunneled packet from one side to the other (since the
> sender HV1 does not see the localnet fo

Re: [ovs-dev] [PATCH] ovn-controller: use localnet port for directly connected datapath only.

2017-07-04 Thread Mickey Spiegel
On Tue, Jun 27, 2017 at 10:42 AM, Han Zhou <zhou...@gmail.com> wrote:

>
>
> On Tue, Jun 27, 2017 at 10:40 AM, Han Zhou <zhou...@gmail.com> wrote:
> >
> >
> >
> > On Tue, Jun 27, 2017 at 10:12 AM, Mickey Spiegel <mickeys@gmail.com>
> wrote:
> > >
> > >
> > > On Tue, Jun 27, 2017 at 1:02 AM, Han Zhou <zhou...@gmail.com> wrote:
> > >>
> > >> Localnet port was supposed to work for directly connected datapath
> > >> only. However, the recursive local_datapath filling introduced a
> > >> problem in below scenario:
> > >>
> > >> LS A <-> LR <-> LS B, port a@HV1 is on LS A, port b@HV2 is on LS B.
> > >> If B has localnet port, then ovn-controller on HV1 would think port
> > >> b can be reached from HV1 by localnet port on LS B, which is wrong.
> > >
> > >
> > > This scenario is flawed. Logical Routers should only be connected to
> Logical Switches with localnet ports through gateway routers or distributed
> gateway ports. Distributed gateway port logic will cause traffic to be
> emitted from an appropriate hypervisor. It is designed to work with logical
> switches with localnet ports, whereas normal router ports on distributed
> logical routers were not.
> > >
> > >>
> > >> This patch fixes it by adding hops information in local datapath
> > >> which can tell if a local-datapath is directly connected to the
> > >> hypervisor.
> > >
> > >
> > > I have not run any tests or tried it, but I think this patch breaks
> distributed gateway ports. We need to use the localnet port to get to the
> outside world from the distributed gateway port. There is no problem with
> the existing localnet port logic, when it is used as designed.
> > >
> >
> > Hi Mickey, thanks for your comments! I agree the scenario is not real
> use case, but I think it is still a bug which is revealed in such "flawed"
> scenario, and the result is misleading, regarding the discussion [1]. So I
> think it is still worth to be fixed.
> >
> It went out too fast: [1] https://mail.openvswitch.org/pipermail/ovs-dev/2017-June/334634.html
> >
> > This patch fixes only the "flaw" scenario, and I don't think it breaks
> the distributed gateway scenario.
>

I still believe that it breaks distributed gateway port functionality. See
more explanation below.


> The patch is just to make sure a remote port can be reached by "localnet"
> port that is directly connected to the current HV. The logic is in the
> context of how to reach a *remote port*.
>

I still feel that you are trying to fix an unsupported scenario. Moreover,
I don't follow what you are trying to do. It seems like you are making it
like OVN does not have any localnets at all, sending a Geneve tunneled
packet from one side to the other (since the sender HV1 does not see the
localnet for LS B). It is not using the non-OVN VXLAN, unless the
underlying transport between HV1 and HV2 uses non-OVN VXLAN, in which case
it would be Geneve over VXLAN.

> > And the distributed gateway test cases are passed:
> >
> > OVN end-to-end tests
> >
> > 2336: ovn -- 1 LR with distributed router gateway port ok
> > 2337: ovn -- send gratuitous arp for NAT rules on distributed router ok
> >
>
> Please let me know if I misunderstood anything, or the tests are not
> complete.
>

The tests are not complete. While there are some multi-node tests in
tests/ovn.at in the "1 LR with distributed router gateway port" test case,
they do not test the NAT functionality. In that existing test case, without
NAT, all north/south traffic goes through the redirect-chassis, which will
trigger localnet even with your patch.

Where I think your patch will break things is for distributed NAT rules,
where the traffic can go out of the local instance of the distributed
gateway port even on a chassis that is not the redirect-chassis. As in the
"1 LR with distributed router gateway port" test, the logical topology is:

    foo -- R1 -- distributed gateway port -- alice

The physical topology is:

    hv1 hosts vif foo1 on LS foo.
    hv2 is the redirect-chassis for the distributed gateway port.
    There is a distributed NAT rule for foo1.

When foo1 sends traffic to alice1 on hv3, on hv1 it goes

    foo -- R1 -- distributed gateway port -- alice -- localnet

The localnet will then get the traffic to hv3 on LS alice, so that it can
reach the destination alice1.

Similarly, traffic to the outside world will use

    foo -- R1 -- distributed gateway port -- alice -- localnet

all on hv1.

In both cases the localnet is needed on hv1, but the depth of alice on hv1
is 2.
I was tes
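
The depth argument above can be illustrated with a minimal C sketch. It
assumes a simplified stand-in for struct local_datapath; the struct and
function names below are hypothetical and are not the ovn-controller code.
It only compares the existing lookup with the patched "!ld->hops" check for
a datapath at depth 2, such as alice on hv1.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct local_datapath. */
struct local_datapath_sketch {
    const char *localnet_port;  /* NULL if the datapath has no localnet port. */
    int hops;                   /* 0 means a port on this chassis is bound
                                 * directly to the datapath. */
};

/* Existing behavior: any local datapath may use its localnet port. */
const char *
localnet_port_existing(const struct local_datapath_sketch *ld)
{
    return ld ? ld->localnet_port : NULL;
}

/* Behavior with the proposed patch: only datapaths with hops == 0 may use
 * their localnet port. */
const char *
localnet_port_patched(const struct local_datapath_sketch *ld)
{
    return ld && !ld->hops ? ld->localnet_port : NULL;
}
```

With hops set to 2 for alice, the patched lookup returns NULL on hv1, so
under this sketch the distributed NAT path described above would have no
localnet port to send traffic out of.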

Re: [ovs-dev] Reply: Re: Reply: [suspected spam] Re: Reply: Re: [PATCH 2/2] ovn-northd: Fix ping failure of vlan networks.

2017-06-29 Thread Mickey Spiegel
On Thu, Jun 29, 2017 at 2:19 PM, Han Zhou  wrote:

> I learned that this use case is kind of Hierarchical scenario:
> https://specs.openstack.org/openstack/neutron-specs/specs/kilo/ml2-hierarchical-port-binding.html
>
> In such scenario, user wants to use OVN to manage vlan networks, and the
> vlan networks is connected via VTEP overlay, which is not managed by OVN
> itself. VTEP is needed to connect to BM/SRIOV VMs to the same L2 that OVS
> VIFs are connected to.
>
> User don't want to use OVN to manage VTEPs since there will be flooding to
> many VTEPs. (Mac-learning is not supported yet)
>
> So in this scenario, user want to utilize distributed logical router as a
> way to optimize the datapath. For VM to VM traffic between different vlans,
> instead of going to a centralized external L3 router, user wants the
> traffic to be tagged to the destination vlan directly and go straight from
> source HV to destination HV through the destination vlan.
>

L2 and L3 have different semantics and different ways of handling packets.
There is a big difference between:
1) bridging between different VLANs, going through a VTEP overlay
   that connects those VLANs, and
2) routing between different VLANs.
Trying to blur that boundary will lead to unexpected behavior and various
issues.

>
> In the vtep scenario, this is a valuable optimization. Even in a normal
> vlan setup without vtep, this can be an optimization too if src and dst VMs
> are on the same HV (so that the packet doesn't need to go to physical
> switch and come back).
>
> So, I agree with Qianyu that connecting VLAN networks with logical router
> is reasonable, which means, the transport of logical router can be not only
> tunnels, but also physical networks (localnet port).


Mixing routers and localnet is dangerous because of interactions with
MAC addresses and L2 learning. It is not a good idea to transmit packets
from a distributed logical router directly to a physical network through
localnet, because the router rewrites the source MAC to the router MAC.
The physical network will learn the location of the router MAC based on
the last chassis from which it saw the router send a packet to the
physical network. That may have low correlation with the next packet
from the physical network back to the router MAC, so north/south packets
may end up with an almost random distribution across the chassis on
which the distributed logical router resides, independent of where the
destination actually resides.

You are looking only at east west traffic, counting on the asymmetry using
the distributed router instance on the other side so that return traffic
never hits the physical network with the router MAC as the destination
MAC. However, there will be north/south traffic destined to the router
MAC, and this approach will make that bounce all over the place.

Perhaps you can make something work if you have per node per router
MAC addresses, but it still scares me.
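
The L2 learning problem can be shown with a small C sketch of a learning
table on a physical switch. This is only an illustration under assumptions:
the fdb structure, the function names, and the port numbers are hypothetical
and do not model any particular switch.

```c
#include <stdint.h>
#include <string.h>

#define FDB_MAX 16

/* One learned (MAC, port) association on a hypothetical physical switch. */
struct fdb_entry {
    uint8_t mac[6];
    int port;
};

struct fdb {
    struct fdb_entry entries[FDB_MAX];
    int n;
};

/* Records that 'mac' was seen as a source address on 'port'.  A later
 * sighting on another port overwrites the earlier location. */
void
fdb_learn(struct fdb *fdb, const uint8_t mac[6], int port)
{
    for (int i = 0; i < fdb->n; i++) {
        if (!memcmp(fdb->entries[i].mac, mac, 6)) {
            fdb->entries[i].port = port;
            return;
        }
    }
    if (fdb->n < FDB_MAX) {
        memcpy(fdb->entries[fdb->n].mac, mac, 6);
        fdb->entries[fdb->n].port = port;
        fdb->n++;
    }
}

/* Returns the port learned for 'mac', or -1 if unknown (flood). */
int
fdb_lookup(const struct fdb *fdb, const uint8_t mac[6])
{
    for (int i = 0; i < fdb->n; i++) {
        if (!memcmp(fdb->entries[i].mac, mac, 6)) {
            return fdb->entries[i].port;
        }
    }
    return -1;
}
```

If the same router MAC is learned first on the port facing one chassis and
then on the port facing another, fdb_lookup() returns only the most recent
port, so return traffic to the router MAC follows whichever chassis
transmitted last.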

> To fulfill this
> requirement, we need to solve some problems in current OVN code:
>
> 1) Since the data path is asymmetric, we need to solve the CT problem of
> the localnet port. I agree with the idea of the patch from Qianyu, which
> bypasses FW for localnet ports, since localnet port is to connect real
> endpoints, so maybe there is not much value to add ACL on localnet ports.
> Not sure if there is use case where ACL is really needed for localnet
> ports.
>
> 2) When there are ARP requests from vlan network (e.g. from a BM) to
> logical router interface IP, the ARP request will reach every HV through
> the localnet port and the distributed logical router port will respond from
> every HV. Shall we disable ARP response from logical router for requests
> from localnet port? In this scenario, I would expect the BM/SRIOV VMs on
> the same vlan to use a different GW rather than the logical router.
>

How will north/south traffic work?

Distributed gateway ports limit ARP responses so that only one instance
of the distributed gateway port on one chassis responds.

At the moment distributed gateway ports only allow direct traffic (without
going through the central node) for 1:1 NAT with logical_port and
external_mac specified. This is because of the implications of upstream
L2 learning, and lack of a routing protocol. If there are mechanisms that
allow things on the outside to know which chassis to forward traffic to,
then this can be relaxed. This is controlled by the gw_redirect table.
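
The condition for direct traffic can be written as a small predicate. The
following C sketch is an assumption-laden illustration: the struct and
function names are hypothetical, and it assumes that "1:1 NAT" corresponds
to a NAT rule of type "dnat_and_snat". It is not the ovn-northd code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of one NAT rule on a logical router. */
struct nat_rule_sketch {
    const char *type;          /* "snat", "dnat", or "dnat_and_snat". */
    const char *logical_port;  /* NULL if not specified. */
    const char *external_mac;  /* NULL if not specified. */
};

/* Returns true if traffic for this rule may leave from the local instance
 * of the distributed gateway port instead of the redirect-chassis.
 * Assumption: only 1:1 NAT with both logical_port and external_mac set
 * qualifies. */
bool
nat_can_bypass_redirect(const struct nat_rule_sketch *nat)
{
    return !strcmp(nat->type, "dnat_and_snat")
           && nat->logical_port != NULL
           && nat->external_mac != NULL;
}
```

Any rule that fails this test would, under the same assumption, keep sending
its north/south traffic through the redirect-chassis.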

Mickey


> 3) We need to add a restriction/validation so that the localnet connection
> is used only we are sure the 2 logical switches are on the same "physical
> network", e.g. different vlans under same physical bridge group, or a
> virtual L2 bridge group formed by vtep overlays.
>
> Thanks,
> Han
>
>
> On Tue, Jun 27, 2017 at 10:00 AM, Han Zhou  wrote:
>
> > It is not about limit but more about use case. Could you explain your use
> > case why using localnet ports here while the 

Re: [ovs-dev] [PATCH] ovn-controller: use localnet port for directly connected datapath only.

2017-06-27 Thread Mickey Spiegel
On Tue, Jun 27, 2017 at 1:02 AM, Han Zhou  wrote:

> Localnet port was supposed to work for directly connected datapath
> only. However, the recursive local_datapath filling introduced a
> problem in below scenario:
>
> LS A <-> LR <-> LS B, port a@HV1 is on LS A, port b@HV2 is on LS B.
> If B has localnet port, then ovn-controller on HV1 would think port
> b can be reached from HV1 by localnet port on LS B, which is wrong.
>

This scenario is flawed. Logical Routers should only be connected to
Logical Switches with localnet ports through gateway routers or distributed
gateway ports. Distributed gateway port logic will cause traffic to be
emitted from an appropriate hypervisor. It is designed to work with logical
switches with localnet ports, whereas normal router ports on distributed
logical routers were not.


> This patch fixes it by adding hops information in local datapath
> which can tell if a local-datapath is directly connected to the
> hypervisor.
>

I have not run any tests or tried it, but I think this patch breaks
distributed gateway ports. We need to use the localnet port to get to the
outside world from the distributed gateway port. There is no problem with
the existing localnet port logic, when it is used as designed.

Mickey


> Signed-off-by: Han Zhou 
> Reported-by: Qianyu Wang 
> ---
>  ovn/controller/binding.c| 1 +
>  ovn/controller/ovn-controller.h | 3 +++
>  ovn/controller/physical.c   | 2 +-
>  3 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/ovn/controller/binding.c b/ovn/controller/binding.c
> index bb76608..03c310d 100644
> --- a/ovn/controller/binding.c
> +++ b/ovn/controller/binding.c
> @@ -129,6 +129,7 @@ add_local_datapath__(const struct ldatapath_index *ldatapaths,
>  ovs_assert(ld->ldatapath);
>  ld->localnet_port = NULL;
>  ld->has_local_l3gateway = has_local_l3gateway;
> +ld->hops = depth;
>
>  if (depth >= 100) {
>  static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
> diff --git a/ovn/controller/ovn-controller.h b/ovn/controller/ovn-controller.h
> index 4bc0467..9b85087 100644
> --- a/ovn/controller/ovn-controller.h
> +++ b/ovn/controller/ovn-controller.h
> @@ -66,6 +66,9 @@ struct local_datapath {
>  /* True if this datapath contains an l3gateway port located on this
>   * hypervisor. */
>  bool has_local_l3gateway;
> +
> +/* Number of logical hops the datapath is connected to this hypervisor. */
> +int hops;
>  };
>
>  struct local_datapath *get_local_datapath(const struct hmap *,
> diff --git a/ovn/controller/physical.c b/ovn/controller/physical.c
> index f2d9676..7051906 100644
> --- a/ovn/controller/physical.c
> +++ b/ovn/controller/physical.c
> @@ -151,7 +151,7 @@ get_localnet_port(struct hmap *local_datapaths, int64_t tunnel_key)
>  {
>  struct local_datapath *ld = get_local_datapath(local_datapaths, tunnel_key);
> -return ld ? ld->localnet_port : NULL;
> +return ld && !ld->hops ? ld->localnet_port : NULL;
>  }
>
>  /* Datapath zone IDs for connection tracking and NAT */
> --
> 2.1.0
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>


Re: [ovs-dev] Reply: [suspected spam] Re: Reply: Re: [PATCH 2/2] ovn-northd: Fix ping failure of vlan networks.

2017-06-27 Thread Mickey Spiegel
On Thu, Jun 15, 2017 at 1:04 AM,  wrote:

> Hi Russell, I am sorry for the late reply.
> The route not bound to a chassis, and have no redirect-chassis. The dumped
> northbound db is as follow.
> Ip addresses of 100.0.0.148 and 200.0.0.2 locate on different chassis. The
> ping between them is not success before this patch.
>

I have a different opinion from Han.

This is an unsupported combination of features that can never be made to
work. The problem is the use of a normal logical router port on a
distributed router, connecting directly to a logical switch with a localnet
port. For many reasons, this is not a combination that OVN supports. Try to
issue an ARP from a physical device that gets flooded on the physical
network, and see how many ARP responses you get and what that does to L2
learning.

There are two methods to connect OVN routers to logical switches with
localnet ports:
1. Use a gateway router, with a transit logical switch connecting the
gateway router and the distributed router.
2. Use a distributed gateway port. This feature was designed to allow
distributed logical routers to connect directly to logical switches with
localnet ports. However, note that there is a limitation that there can be
only one distributed gateway port on a distributed logical router. While
there was an attempt to expand to multiple distributed gateway ports on a
distributed logical router a few months ago, that did not progress very
far. This discussion points out that such an expansion will be tricky, in
particular if trying to send east/west traffic using two different
distributed gateway ports with two different redirect-chassis.

Mickey



>
> [root@tecs159 ~]#
> [root@tecs159 ~]# ovsdb-client dump
> unix:/var/run/openvswitch/ovnnb_db.sock
> ACL table
> _uuid  action  direction  external_ids  log  match  priority
> -----  ------  ---------  ------------  ---  -----  --------
> ac2900f9-49fd-430a-b646-88d1f7c54ab8 allow from-lport
> {"neutron:lport"="1ef52eb4-1f0e-416d-8dc2-e2fc7557979c"} false "inport ==
> \"1ef52eb4-1f0e-416d-8dc2-e2fc7557979c\" && ip4 && ip4.dst ==
> {255.255.255.255, 100.0.0.0/24} && udp && udp.src == 68 && udp.dst == 67"
> 1002
> 784a55c3-05fd-4c4d-a51e-5b9ee5cc1e8e allow from-lport
> {"neutron:lport"="6c04e45e-ad83-4cf0-ae74-84f7720a5bc4"} false "inport ==
> \"6c04e45e-ad83-4cf0-ae74-84f7720a5bc4\" && ip4 && ip4.dst ==
> {255.255.255.255, 100.0.0.0/24} && udp && udp.src == 68 && udp.dst == 67"
> 1002
> 08be2532-f8ff-493f-83e3-085eede36e08 allow from-lport
> {"neutron:lport"="c5ff4f7b-bd0d-4757-ac18-636f9d62b94c"} false "inport ==
> \"c5ff4f7b-bd0d-4757-ac18-636f9d62b94c\" && ip4 && ip4.dst ==
> {255.255.255.255, 100.0.0.0/24} && udp && udp.src == 68 && udp.dst == 67"
> 1002
> bb263947-a436-4a0d-9218-5abd89546a69 allow from-lport
> {"neutron:lport"="f8de0603-f4ec-4546-a8f3-574640f270e8"} false "inport ==
> \"f8de0603-f4ec-4546-a8f3-574640f270e8\" && ip4 && ip4.dst ==
> {255.255.255.255, 200.0.0.0/24} && udp && udp.src == 68 && udp.dst == 67"
> 1002
> 092964cc-2ce5-4a34-b747-558006bb3de1 allow-related from-lport
> {"neutron:lport"="1ef52eb4-1f0e-416d-8dc2-e2fc7557979c"} false "inport ==
> \"1ef52eb4-1f0e-416d-8dc2-e2fc7557979c\" && ip4" 1002
> 5f2ebb8e-edbc-40aa-ada6-2fc90fc104af allow-related from-lport
> {"neutron:lport"="1ef52eb4-1f0e-416d-8dc2-e2fc7557979c"} false "inport ==
> \"1ef52eb4-1f0e-416d-8dc2-e2fc7557979c\" && ip6" 1002
> 13d32fab-0ed7-4472-97c2-1e3057eaca6e allow-related from-lport
> {"neutron:lport"="6c04e45e-ad83-4cf0-ae74-84f7720a5bc4"} false "inport ==
> \"6c04e45e-ad83-4cf0-ae74-84f7720a5bc4\" && ip4" 1002
> 7fa4e0b0-ffce-436f-a20a-07b0584c3285 allow-related from-lport
> {"neutron:lport"="6c04e45e-ad83-4cf0-ae74-84f7720a5bc4"} false "inport ==
> \"6c04e45e-ad83-4cf0-ae74-84f7720a5bc4\" && ip6" 1002
> b32cf462-a8e5-4597-9c6e-4dc02ae2e2c4 allow-related from-lport
> {"neutron:lport"="c5ff4f7b-bd0d-4757-ac18-636f9d62b94c"} false "inport ==
> \"c5ff4f7b-bd0d-4757-ac18-636f9d62b94c\" && ip4" 1002
> 4d003f24-f546-49fa-a33c-92384e4d3549 allow-related from-lport
> {"neutron:lport"="c5ff4f7b-bd0d-4757-ac18-636f9d62b94c"} false "inport ==
> \"c5ff4f7b-bd0d-4757-ac18-636f9d62b94c\" && ip6" 1002
> 7078873a-fa44-4c64-be7f-067d19361fb4 allow-related from-lport
> {"neutron:lport"="f8de0603-f4ec-4546-a8f3-574640f270e8"} false "inport ==
> \"f8de0603-f4ec-4546-a8f3-574640f270e8\" && ip4" 1002
> a15bd032-9755-45a5-b7ea-9687b9d14560 allow-related from-lport
> {"neutron:lport"="f8de0603-f4ec-4546-a8f3-574640f270e8"} false "inport ==
> 

Re: [ovs-dev] ovn: SFC Patch V3

2017-06-08 Thread Mickey Spiegel
A couple more issues that have not come up in a while or at all so far:

1. A port can have multiple mac addresses. Right now you are only using the
first mac address on a port (traffic_port->nbsp->addresses[0]). Rules need
to be installed for all of the mac addresses on the port.

2. The VNF ports are associated with the logical switch's datapath. Any
rules in non-SFC specific tables (e.g. ingress stages 1-9, egress stages
1-9) will be applied over and over again at each SFC hop. This affects
performance. Besides the overhead of additional redundant lookups, if there
are stateful ACL rules then conntrack recirc will occur. Possibly even
worse, if any of the VNFs change any of the fields in the ACL match
condition, then the packet could fail the ACL and be dropped. One option to
avoid all of this would be to insert a pipeline stage at the beginning of
the ingress pipeline and egress pipelines (we already have one for SFC in
the egress pipeline, which can be reused for this purpose as well),
skipping most if not all other pipeline stages on VNF ports. For
intermediate SFC hops in the ingress pipeline, I guess the rules could be
put in this first table rather than the current table 10. For the egress
pipeline, I guess just output directly?
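
For issue 1, the point is to walk every entry in the port's addresses
rather than only addresses[0]. The C sketch below is a minimal illustration
under assumptions: the struct and function names are hypothetical, each
address entry is assumed to be a string that starts with an Ethernet address
(optionally followed by IP addresses), and entries that do not start with
one (for example "unknown") are skipped.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the addresses of a logical switch port. */
struct port_sketch {
    const char **addresses;
    size_t n_addresses;
};

/* Parses the leading Ethernet address of an entry such as
 * "00:00:00:00:00:01 10.0.0.1".  Returns false if there is none. */
static bool
parse_leading_mac(const char *address, unsigned int mac[6])
{
    return sscanf(address, "%2x:%2x:%2x:%2x:%2x:%2x",
                  &mac[0], &mac[1], &mac[2],
                  &mac[3], &mac[4], &mac[5]) == 6;
}

/* Collects the MAC of every address entry on 'port' into 'macs', up to
 * 'max' entries, and returns how many were stored.  A caller would install
 * one rule per returned MAC. */
size_t
collect_port_macs(const struct port_sketch *port,
                  unsigned int macs[][6], size_t max)
{
    size_t n = 0;
    for (size_t i = 0; i < port->n_addresses && n < max; i++) {
        if (parse_leading_mac(port->addresses[i], macs[n])) {
            n++;
        }
    }
    return n;
}
```

The loop replaces a single lookup of the first address, so a port with
several MAC addresses yields one rule per address.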

Mickey


On Wed, May 10, 2017 at 3:49 PM, Mickey Spiegel <mickeys@gmail.com>
wrote:

> Three issues before diving in:
>
>
> 1. Placement of S_SWITCH_IN_CHAIN
>
> For some reason I thought S_SWITCH_IN_CHAIN was after all the stateful
> processing, but now I see that it is table 7. That breaks ACLs and other
> stateful processing, since REGBIT_CONNTRACK_COMMIT is set in
> S_SWITCH_IN_ACL and matched in S_SWITCH_IN_STATEFUL.
>
> S_SWITCH_IN_CHAIN should instead be table 10. The comments below are
> written assuming this change.
>
>
> 2. Ingress pipeline needs to be expanded to more than 16 tables
>
> DNS went in since the v3 patch and used up the last of the 16 ingress
> tables. If you rebase, you will see that there is no space in the ingress
> pipeline for the addition of S_SWITCH_IN_CHAIN. Someone (not me) needs to
> expand the ingress pipeline to more than 16 stages before you can proceed.
>
>
> 3. While writing this response, I paid a little more attention to the
> "exit-lport" direction and noticed a couple of significant issues.
>
> a. If a packet goes from VM A on port 1 to VM B on port 4, there is a
> logical port chain classifier on port 1 in the "entry-lport" direction, and
> there is a logical port chain classifier on port 4 in the "exit-lport"
> direction, you will only go down one of the service chains. Since the
> priorities are equal, I cannot even tell you which one of the service
> chains. Logically I would think that the packet should go down both service
> chains, first the port 1 "entry-lport" service chain and then the port 4
> "exit-lport" service chain.
>
> b. This is done in the ingress pipeline, not the egress pipeline, and is
> based on matching eth.dst. This assumes that the forwarding decision will
> be based on eth.dst, since you are immediately going down the service
> chain, skipping the other ingress pipeline stages, and at the end you go
> directly to the egress pipeline with outport based on eth.dst. That is
> quite restrictive for a generic forwarding architecture like OVN. I would
> think that the right thing to do would be to move the classifier to the
> egress pipeline stage, but then I do not know how to avoid loops. When a
> packet comes to the egress pipeline stage where the VM resides, there is no
> way to tell whether the packet has already gone down the service chain or
> not. I guess you could put a S_SWITCH_IN_EGRESS_CHAIN ingress pipeline
> stage right after L2_LKUP instead, and match on outport in addition to
> eth.dst, but it feels a bit unclean.
>
> On Tue, May 9, 2017 at 4:33 PM, John McDowall <jmcdowall@paloaltonetworks.com> wrote:
>
>> Mickey,
>>
>>
>>
>> Thanks for the review. I need some help understanding a couple of things:
>>
>>
>>
>> 1)   The proposed change, I could see the previous logic where we
>> inserted the flow back in the ingress pipeline just after the IN_CHAIN
>> stage. The changes you suggest seem to imply that the action is still
>> insert after the _*IN*_CHAIN stage but in the egress (OUT) pipeline. I
>> am missing something here – can you give me some more info?
>>
> Assume you have port 1 to a VM on HV1, port 2 as the input port to a VNF
> on HV2, and port 3 as the output port from that same VNF on HV2. The
> service chain is just that one VNF, with direction "entry-lport".
>
> The packet progresses as follows:
>
> HV1, ingress pipeline, inport 1
> Tables 

Re: [ovs-dev] Reply: [PATCH] ovn-northd: Add logical flows to reply ICMP echo requests for all the other router ports connected to one switch

2017-06-01 Thread Mickey Spiegel
On Thu, Jun 1, 2017 at 4:28 AM, 钢锁0918  wrote:

> That is for this problem: [ovs-dev] [ovs-discuss] ovn: unsnat handling error
> for Distributed Gateway
> https://mail.openvswitch.org/pipermail/ovs-dev/2017-April/330536.html


I don't understand why this workaround is a good thing. In theory, all
pings should succeed. The point of the ping is to test the datapath with an
actual packet, to make sure that things work. Replying to the ping when you
have gone less than half way to the endpoint seems to me to defeat the
purpose.

Mickey



>
> *RTFSC*
> --
> From: Ben Pfaff
> Sent: Thursday, June 1, 2017 07:32
> To: 钢锁0918
> Cc: dev
> Subject: Re: [ovs-dev] [PATCH] ovn-northd: Add logical flows to reply ICMP
> echo requests for all the other router ports connected to one switch
>
> I don't understand this yet.  The OVN logical router already has logic
> to reply to ICMP requests.  Can you add some more explanation to the
> commit message?


[ovs-dev] [PATCH v2] ovn: increase size of ingress and egress pipelines

2017-05-17 Thread Mickey Spiegel
The OVN ingress pipeline for a logical switch is maxed out at 16 stages.

This patch takes the simple approach of starting the ingress pipeline at
table 8 rather than table 16, and starting the egress pipeline at
table 40 rather than table 48.

v1->v2:
Bumped range of Logical_Flow.table_id column in ovn/ovn-sb.ovsschema
from 0 to 15, to 0 to 23.
Ran automated tests with an extra noop table, pushing S_SWITCH_IN_L2_LKUP
to 16.
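
The renumbering can be checked with a short C sketch. The three constants
are the values this patch sets in ovn/controller/lflow.h; the helper
function is a hypothetical illustration and is not part of the patch.

```c
#include <stdbool.h>

#define OFTABLE_LOG_INGRESS_PIPELINE  8  /* First ingress OpenFlow table. */
#define OFTABLE_LOG_EGRESS_PIPELINE  40  /* First egress OpenFlow table. */
#define LOG_PIPELINE_LEN             24  /* Tables per pipeline. */

/* Hypothetical helper: maps a Logical_Flow table_id to its OpenFlow table
 * number, or returns -1 if table_id is outside the pipeline. */
int
logical_to_oftable(bool ingress, int table_id)
{
    if (table_id < 0 || table_id >= LOG_PIPELINE_LEN) {
        return -1;
    }
    return (ingress ? OFTABLE_LOG_INGRESS_PIPELINE
                    : OFTABLE_LOG_EGRESS_PIPELINE) + table_id;
}
```

With these values, ingress table_ids 0 through 23 map to OpenFlow tables 8
through 31 and egress table_ids 0 through 23 map to 40 through 63, which
stops just short of OFTABLE_REMOTE_OUTPUT (32) and OFTABLE_SAVE_INPORT (64).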

Signed-off-by: Mickey Spiegel <mickeys@gmail.com>
---
 ovn/controller/lflow.h |  6 +++---
 ovn/ovn-architecture.7.xml | 27 ++-
 ovn/ovn-sb.ovsschema   |  6 +++---
 ovn/utilities/ovn-trace.c  |  2 +-
 tests/ovn.at   | 40 
 tests/test-ovn.c   |  6 +++---
 6 files changed, 44 insertions(+), 43 deletions(-)

diff --git a/ovn/controller/lflow.h b/ovn/controller/lflow.h
index 8761b1e..a23cde0 100644
--- a/ovn/controller/lflow.h
+++ b/ovn/controller/lflow.h
@@ -49,17 +49,17 @@ struct uuid;
  * These are heavily documented in ovn-architecture(7), please update it if
  * you make any changes. */
 #define OFTABLE_PHY_TO_LOG 0
-#define OFTABLE_LOG_INGRESS_PIPELINE 16 /* First of LOG_PIPELINE_LEN tables. */
+#define OFTABLE_LOG_INGRESS_PIPELINE  8 /* First of LOG_PIPELINE_LEN tables. */
 #define OFTABLE_REMOTE_OUTPUT 32
 #define OFTABLE_LOCAL_OUTPUT 33
 #define OFTABLE_CHECK_LOOPBACK   34
-#define OFTABLE_LOG_EGRESS_PIPELINE  48 /* First of LOG_PIPELINE_LEN tables. */
+#define OFTABLE_LOG_EGRESS_PIPELINE  40 /* First of LOG_PIPELINE_LEN tables. */
 #define OFTABLE_SAVE_INPORT  64
 #define OFTABLE_LOG_TO_PHY   65
 #define OFTABLE_MAC_BINDING  66
 
 /* The number of tables for the ingress and egress pipelines. */
-#define LOG_PIPELINE_LEN 16
+#define LOG_PIPELINE_LEN 24
 
 void lflow_init(void);
 void lflow_run(struct controller_ctx *,
diff --git a/ovn/ovn-architecture.7.xml b/ovn/ovn-architecture.7.xml
index d8114f1..eb6744b 100644
--- a/ovn/ovn-architecture.7.xml
+++ b/ovn/ovn-architecture.7.xml
@@ -774,7 +774,7 @@
 VXLAN tunnels do not transmit the logical output port field.
 Since VXLAN tunnels do not carry a logical output port field in
 the tunnel key, when a packet is received from VXLAN tunnel by
-an OVN hypervisor, the packet is resubmitted to table 16 to
+an OVN hypervisor, the packet is resubmitted to table 8 to
 determine the output port(s);  when the packet reaches table 32,
 these packets are resubmitted to table 33 for local delivery by
 checking a MLF_RCV_FROM_VXLAN flag, which is set when the packet
@@ -835,7 +835,7 @@
 the packet's ingress port.  Its actions annotate the packet with
 logical metadata, by setting the logical datapath field to identify the
 logical datapath that the packet is traversing and the logical input
-port field to identify the ingress port.  Then it resubmits to table 16
+port field to identify the ingress port.  Then it resubmits to table 8
 to enter the logical ingress pipeline.
   
 
@@ -864,13 +864,13 @@
 
 
   
-OpenFlow tables 16 through 31 execute the logical ingress pipeline from
+OpenFlow tables 8 through 31 execute the logical ingress pipeline from
 the Logical_Flow table in the OVN Southbound database.
 These tables are expressed entirely in terms of logical concepts like
 logical ports and logical datapaths.  A big part of
 ovn-controller's job is to translate them into equivalent
 OpenFlow (in particular it translates the table numbers:
-Logical_Flow tables 0 through 15 become OpenFlow tables 16
+Logical_Flow tables 0 through 23 become OpenFlow tables 8
 through 31).
   
 
@@ -999,7 +999,7 @@
 and resubmit these packets to table 33 for local delivery. Packets
 received from VXLAN tunnels reach here because of a lack of logical
 output port field in the tunnel key and thus these packets needed to
-be submitted to table 16 to determine the output port.
+be submitted to table 8 to determine the output port.
   
 
   
@@ -1024,13 +1024,13 @@
   
 Table 34 matches and drops packets for which the logical input and
 output ports are the same and the MLF_ALLOW_LOOPBACK flag is not
-set.  It resubmits other packets to table 48.
+set.  It resubmits other packets to table 40.
   
 
 
 
   
-OpenFlow tables 48 through 63 execute the logical egress pipeline from
+OpenFlow tables 40 through 63 execute the logical egress pipeline from
 the Logical_Flow table in the OVN Southbound database.
 The egress pipeline can perform a final stage of validation before
 packet delivery.  Eventually, it may execute an output
@@ -1110,27 +1110,28 @@
 
 
   In OVS ver

[ovs-dev] [PATCH] ovn: increase size of ingress and egress pipelines

2017-05-11 Thread Mickey Spiegel
The OVN ingress pipeline for a logical switch is maxed out at 16 stages.

This patch takes the simple approach of starting the ingress pipeline at
table 8 rather than table 16, and starting the egress pipeline at
table 40 rather than table 48.

Signed-off-by: Mickey Spiegel <mickeys@gmail.com>
---
 ovn/controller/lflow.h |  6 +++---
 ovn/ovn-architecture.7.xml | 27 ++-
 ovn/utilities/ovn-trace.c  |  2 +-
 tests/ovn.at   | 40 
 tests/test-ovn.c   |  6 +++---
 5 files changed, 41 insertions(+), 40 deletions(-)

diff --git a/ovn/controller/lflow.h b/ovn/controller/lflow.h
index 8761b1e..a23cde0 100644
--- a/ovn/controller/lflow.h
+++ b/ovn/controller/lflow.h
@@ -49,17 +49,17 @@ struct uuid;
  * These are heavily documented in ovn-architecture(7), please update it if
  * you make any changes. */
 #define OFTABLE_PHY_TO_LOG            0
-#define OFTABLE_LOG_INGRESS_PIPELINE 16 /* First of LOG_PIPELINE_LEN tables. */
+#define OFTABLE_LOG_INGRESS_PIPELINE  8 /* First of LOG_PIPELINE_LEN tables. */
 #define OFTABLE_REMOTE_OUTPUT        32
 #define OFTABLE_LOCAL_OUTPUT         33
 #define OFTABLE_CHECK_LOOPBACK       34
-#define OFTABLE_LOG_EGRESS_PIPELINE  48 /* First of LOG_PIPELINE_LEN tables. */
+#define OFTABLE_LOG_EGRESS_PIPELINE  40 /* First of LOG_PIPELINE_LEN tables. */
 #define OFTABLE_SAVE_INPORT          64
 #define OFTABLE_LOG_TO_PHY           65
 #define OFTABLE_MAC_BINDING          66
 
 /* The number of tables for the ingress and egress pipelines. */
-#define LOG_PIPELINE_LEN 16
+#define LOG_PIPELINE_LEN 24
 
 void lflow_init(void);
 void lflow_run(struct controller_ctx *,
diff --git a/ovn/ovn-architecture.7.xml b/ovn/ovn-architecture.7.xml
index d8114f1..eb6744b 100644
--- a/ovn/ovn-architecture.7.xml
+++ b/ovn/ovn-architecture.7.xml
@@ -774,7 +774,7 @@
 VXLAN tunnels do not transmit the logical output port field.
 Since VXLAN tunnels do not carry a logical output port field in
 the tunnel key, when a packet is received from VXLAN tunnel by
-an OVN hypervisor, the packet is resubmitted to table 16 to
+an OVN hypervisor, the packet is resubmitted to table 8 to
 determine the output port(s);  when the packet reaches table 32,
 these packets are resubmitted to table 33 for local delivery by
 checking a MLF_RCV_FROM_VXLAN flag, which is set when the packet
@@ -835,7 +835,7 @@
 the packet's ingress port.  Its actions annotate the packet with
 logical metadata, by setting the logical datapath field to identify the
 logical datapath that the packet is traversing and the logical input
-port field to identify the ingress port.  Then it resubmits to table 16
+port field to identify the ingress port.  Then it resubmits to table 8
 to enter the logical ingress pipeline.
   
 
@@ -864,13 +864,13 @@
 
 
   
-OpenFlow tables 16 through 31 execute the logical ingress pipeline from
+OpenFlow tables 8 through 31 execute the logical ingress pipeline from
 the Logical_Flow table in the OVN Southbound database.
 These tables are expressed entirely in terms of logical concepts like
 logical ports and logical datapaths.  A big part of
 ovn-controller's job is to translate them into equivalent
 OpenFlow (in particular it translates the table numbers:
-Logical_Flow tables 0 through 15 become OpenFlow tables 16
+Logical_Flow tables 0 through 23 become OpenFlow tables 8
 through 31).
   
 
@@ -999,7 +999,7 @@
 and resubmit these packets to table 33 for local delivery. Packets
 received from VXLAN tunnels reach here because of a lack of logical
 output port field in the tunnel key and thus these packets needed to
-be submitted to table 16 to determine the output port.
+be submitted to table 8 to determine the output port.
   
 
   
@@ -1024,13 +1024,13 @@
   
 Table 34 matches and drops packets for which the logical input and
 output ports are the same and the MLF_ALLOW_LOOPBACK flag is not
-set.  It resubmits other packets to table 48.
+set.  It resubmits other packets to table 40.
   
 
 
 
   
-OpenFlow tables 48 through 63 execute the logical egress pipeline from
+OpenFlow tables 40 through 63 execute the logical egress pipeline from
 the Logical_Flow table in the OVN Southbound database.
 The egress pipeline can perform a final stage of validation before
 packet delivery.  Eventually, it may execute an output
@@ -1110,27 +1110,28 @@
 
 
   In OVS versions 2.7 and later, the packet is cloned and resubmitted
-  directly to OpenFlow flow table 16, setting the logical ingress
-  port to the peer logical patch port, and using the peer logical
-  patch port's l

Re: [ovs-dev] OVN: Increasing size of Switch Ingress Pipeline Stage Table

2017-05-11 Thread Mickey Spiegel
On Thu, May 11, 2017 at 11:05 AM, John McDowall <
jmcdow...@paloaltonetworks.com> wrote:

> With the addition of the DNS stages there are no entries left in the
> PIPELINE_STAGE, SWITCH IN table. I need one for SFC.  As this is a core
> part of the infrastructure I do not want to make changes without advice
> from the core OVN team.
>
> What is the best approach?
>

I quickly wrote up a patch with a simple approach. I will put it out for
review as one option. No manual testing so far, so others may want to vet
it more thoroughly.
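
For anyone checking the arithmetic of the simple approach, here is a rough,
untested sketch of why the new bases still fit. The constants are the ones the
posted patch puts in ovn/controller/lflow.h; the helper pipeline_last_table()
is hypothetical and exists only for illustration.

```c
#include <assert.h>

/* Values from the posted patch to ovn/controller/lflow.h. */
#define OFTABLE_LOG_INGRESS_PIPELINE  8
#define OFTABLE_REMOTE_OUTPUT        32
#define OFTABLE_LOG_EGRESS_PIPELINE  40
#define OFTABLE_SAVE_INPORT          64
#define LOG_PIPELINE_LEN             24

/* Each pipeline must end before the next fixed-purpose table begins. */
_Static_assert(OFTABLE_LOG_INGRESS_PIPELINE + LOG_PIPELINE_LEN
               <= OFTABLE_REMOTE_OUTPUT,
               "ingress pipeline overlaps table 32");
_Static_assert(OFTABLE_LOG_EGRESS_PIPELINE + LOG_PIPELINE_LEN
               <= OFTABLE_SAVE_INPORT,
               "egress pipeline overlaps table 64");

/* Hypothetical helper: last OpenFlow table used by a pipeline that
 * starts at 'first_table'. */
static int
pipeline_last_table(int first_table)
{
    assert(first_table >= 0);
    return first_table + LOG_PIPELINE_LEN - 1;
}
```

With these values the ingress pipeline occupies tables 8 through 31 and the
egress pipeline tables 40 through 63, so tables 32 to 34 and 64 to 66 keep
their existing numbers.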

Mickey


>
> Regards
>
> John
>
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] ovn: SFC Patch V3

2017-05-10 Thread Mickey Spiegel
. With this alternative, all packets go
through Tables 1 to 9 on HV1 twice. We have to make a judgement call which
alternative is least ugly, or move to NSH and stuff the metadata in the NSH
header.

>
>
> +for (int ii = 0; ii < MFF_N_LOG_REGS; ii++) {
> +ds_put_format(, "reg%d = 0; ", ii);
> +}
> +ds_put_format(, "next(pipeline=ingress, table=%d); };",
> +ovn_stage_get_table(S_SWITCH_IN_CHAIN) + 1);
> +ovn_lflow_add(lflows, od, S_SWITCH_IN_CHAIN, ingress_inner_priority,
> +lcc_match, ds_cstr());
>
>
>
> Replace the line above by:
>
>
>
> ovn_lflow_add(lflows, od, S_SWITCH_OUT_SFC_LOOPBACK, 100,
> lcc_match, ds_cstr());
>
>
>
>
>
>
>
> 2)   I can try and put some checks in for loop avoidance. Can you
> think of scenarios that would cause this, a badly configured port-pair
> could perhaps cause it (if the eth egress of the port-pair was configured
> as the ingress eth.) Any other scenarios that come to mind ?
>

I cannot think of a good reason why a packet would end up on the egress
pipeline on HV1 with outport == 1.
After thinking about it more, I think it is OK if flags.loopback is left as
0. If this case is ever triggered and the second time around through the
ingress pipeline still sets outport = 1, then the Table 34 loopback check
will detect that outport == inport and drop the packet.
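
To make the argument concrete, the Table 34 check behaves roughly like the
sketch below. This is only an illustration: struct pkt_md and table34_drops()
are made-up names, and the boolean stands in for flags.loopback
(MLF_ALLOW_LOOPBACK). The real check is an OpenFlow flow matching on
registers, not C code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of the logical metadata that table 34
 * looks at.  In OVS these live in registers and a flags field. */
struct pkt_md {
    uint32_t inport;       /* Logical input port key. */
    uint32_t outport;      /* Logical output port key. */
    bool allow_loopback;   /* Stands in for flags.loopback. */
};

/* Returns true if the table 34 loopback check would drop the packet:
 * the ports match and loopback has not been explicitly allowed. */
static bool
table34_drops(const struct pkt_md *md)
{
    assert(md);
    return md->inport == md->outport && !md->allow_loopback;
}
```

With inport == outport == 1 and flags.loopback left at 0, the function
returns true, which is the drop described above.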

Mickey


>
> Regards
>
>
>
> John
>
> *From: *Mickey Spiegel <mickeys@gmail.com>
> *Date: *Monday, April 24, 2017 at 6:39 PM
> *To: *John McDowall <jmcdow...@paloaltonetworks.com>
> *Cc: *"ovs-dev@openvswitch.org" <ovs-dev@openvswitch.org>
> *Subject: *Re: [ovs-dev] ovn: SFC Patch V3
>
>
>
>
>
> On Mon, Apr 24, 2017 at 12:56 PM, <jmcdow...@paloaltonetworks.com> wrote:
>
> From: John McDowall <jmcdow...@paloaltonetworks.com>
>
>
> Fixed changes from Mickey's last review.
>
> Changes
>
> 1) Fixed re-circulation rules
>
>
>
> Still a few modifications required. See comments inline. I just typed some
> stuff out, have not run, built, or tested anything.
>
>
>
> 2) Fixed match statement - match is only applied to beginning of chain in
>    each direction.
> 3) Fixed array length of chain of VNFs. I have tested this up to three VNFs
>    in a chain and it looks like it works in both directions.
>
> Areas to review
>
> 1) The logic now supports hair-pinning the flow back to the original
>    source to ensure that the MAC learning problem is addressed. I tested
>    this using ovn-trace - any better testing that I should do?
>
> Current todo list
>
> 1) I have standalone tests; need to add tests to the ovs/ovn framework.
> 2) Load-balancing support for port-pair-groups
> 3) Publish more detailed examples.
> 4) Submit suggestions to change and shorten the CLI names.
>
> Simple example using ovn-trace
>
> #!/bin/sh
> #
> clear
> ovn-nbctl ls-add swt1
>
> ovn-nbctl lsp-add swt1 swt1-appc
> ovn-nbctl lsp-add swt1 swt1-apps
> ovn-nbctl lsp-add swt1 swt1-vnfp1
> ovn-nbctl lsp-add swt1 swt1-vnfp2
>
> ovn-nbctl lsp-set-addresses swt1-appc "00:00:00:00:00:01 192.168.33.1"
> ovn-nbctl lsp-set-addresses swt1-apps "00:00:00:00:00:02 192.168.33.2"
> ovn-nbctl lsp-set-addresses swt1-vnfp1 00:00:00:00:00:03
> ovn-nbctl lsp-set-addresses swt1-vnfp2 00:00:00:00:00:04
> #
> # Configure Service chain
> #
> ovn-nbctl lsp-pair-add swt1 swt1-vnfp1 swt1-vnfp2 pp1
> ovn-nbctl lsp-chain-add swt1 pc1
> ovn-nbctl lsp-pair-group-add pc1 ppg1
> ovn-nbctl lsp-pair-group-add-port-pair ppg1 pp1
> ovn-nbctl lsp-chain-classifier-add swt1 pc1 swt1-appc "entry-lport" "bi-directional" pcc1
> #
> ovn-sbctl dump-flows
> #
> # Run trace command
> printf "\n-Flow 1 -\n\n"
> ovn-trace --detailed  swt1 'inport == "swt1-appc" && eth.src ==
> 00:00:00:00:00:01 && eth.dst == 00:00:00:00:00:02'
> printf "\n-Flow 2 -\n\n"
> ovn-trace --detailed  swt1 'inport == "swt1-vnfp1" && eth.src ==
> 00:00:00:00:00:01 && eth.dst == 00:00:00:00:00:02'
> printf "\n-Flow 3 -\n\n"
> ovn-trace --detailed  swt1 'inport == "swt1-apps" && eth.dst ==
> 00:00:00:00:00:01 && eth.src == 00:00:00:00:00:02'
> printf "\n-Flow 4 -\n\n"
> ovn-trace --detailed  swt1 'inport == "swt1-vnfp2" && eth.dst ==
> 00:00:00:00:00:01 && eth.src == 00:00:00:00:00:02'
> #
> # Cleanup
> #
> ov

Re: [ovs-dev] 答复: Re: 答复: Re: [PATCH] ovn-controller: Support vxlan tunnel in ovn

2017-05-07 Thread Mickey Spiegel
Some of the assumptions that you are making need to be called out, because
they may not hold going forward. In fact, I refer below to two different
patches currently under review that break your assumptions.

On Fri, May 5, 2017 at 7:18 PM,  wrote:

> Hi, Russell
>
> We think VXLAN is the most commonly used tunnel encapsulation in
> OpenStack overlay networks, so OVN should consider supporting it.
>
> As my colleague Wang Qianyu said, we are considering compute nodes that
> connect to existing hardware switches, which work with SR-IOV as VTEPs.
>
> After discussion, we feel that the following changes for VXLAN tunnels
> in table 0 would be enough:
>
> 1.For local switch, move MFF_TUN_ID to MFF_LOG_DATAPATH, resubmit to
> OFTABLE_ETH_UCAST(table29)
>

It looks like you are overloading OFTABLE_ETH_UCAST that you define here
with S_SWITCH_IN_L2_LKUP in ovn/northd/ovn-northd.c. Hardcoding the value
to 29 in ovn/controller/lflow.h is not the way to do this. This pipeline
stage does move back as new features are added. In fact it is now table 31
due to the recent addition of 2 tables for DNS lookup.
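
To illustrate why a hardcoded 29 goes stale, here is a minimal sketch (not
the actual OVN code) that derives the OpenFlow table from the pipeline base
plus the stage's logical table number. The base of 16 and the pipeline length
of 16 match lflow.h as it stands in your diff; the stage numbers 13 and 15
are the old and new positions of the L2 lookup stage; ingress_oftable() is a
hypothetical helper name.

```c
#include <assert.h>

/* Values as they appear in ovn/controller/lflow.h in the quoted diff. */
#define OFTABLE_LOG_INGRESS_PIPELINE 16
#define LOG_PIPELINE_LEN             16

/* Hypothetical helper: map a logical ingress table number (the stage
 * number assigned in ovn-northd) to its OpenFlow table number. */
static int
ingress_oftable(int logical_table)
{
    assert(logical_table >= 0 && logical_table < LOG_PIPELINE_LEN);
    return OFTABLE_LOG_INGRESS_PIPELINE + logical_table;
}
```

When the L2 lookup stage was logical table 13 this gives 29; with the two DNS
stages in front of it the stage is logical table 15 and the result is 31. Any
constant in lflow.h would have to be updated by hand at every such change.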


> //---In table29, we can find out dst port based on dst mac
>

You are assuming that outport determination is only based on
S_SWITCH_IN_L2_LKUP with no impact from any other ingress pipeline stages.
This may not always be true, which I think is the point of Ben's complaint.
For example the SFC patch that is currently under review (
http://patchwork.ozlabs.org/patch/754427/) may set outport and then do
"output" in the ingress pipeline, in a pipeline stage other than
S_SWITCH_IN_L2_LKUP.

The alternative is to go through the entire ingress pipeline, but here you
have a problem since you do not know the inport. The current VTEP-centric
VXLAN code assumes that there is only one port binding per datapath from
the VTEP chassis. For the general case that you are trying to address, this
assumption does not hold, so you cannot properly determine the inport. The
inport may affect the ultimate decision on outport. This is certainly the
case for the SFC patch currently under review.

You are also assuming that inport does not affect anything in the egress
pipeline. This seems to be true at the moment, but this might not always be
true as features are added.

The existing VTEP functionality does not rely on the assumptions that you
made, but since you changed the logic to determine inport in case of VXLAN,
you are changing existing functionality.


> 2.For local chassisredirect port, move MFF_TUN_ID to MFF_LOG_DATAPATH, set
> port tunnel_key to MFF_LOG_OUTPORT and then resubmit to
> OFTABLE_LOCAL_OUTPUT.
> //---In table 33, we can find out dst local sw and sw patch port based on
> the local chassisredirect port, and then follow the existing flows.
>

At the moment, the only case where a tunnel is used for a datapath
representing a logical router is when the outport is a chassisredirect
port. Your code assumes that will always be the case. If we do what you are
suggesting, then that becomes a restriction for all logical router features
going forward.

This code also assumes that there can only be one chassisredirect port per
datapath per chassis. There is a patch that has not yet been reviewed (
http://patchwork.ozlabs.org/patch/732815/) that proposes multiple
distributed gateway ports (and correspondingly chassisredirect ports) on
one datapath. I am not sure what the use case is, but if that feature were
added and more than one distributed gateway port on one datapath specified
the same redirect-chassis, it would break this code.

This version of the code is better than the original version, which was
based on a hack that used table 29 on a datapath for a logical router
(!!!), adding contrived flows that matched VXLAN tunnel bit, metadata ==
datapath->tunnel_key, and MAC address from the SB MAC_Binding table in
order to set the outport to a chassisredirect port, based on the assumption
that tunnels are only used for logical routers when the outport is a
chassisredirect port.

Mickey



> Next step, we will consider how ovn-controller-hw manages SR-IOV as well.
>
> Waiting for your suggestions,Thanks.
>
> ---
>  ovn/controller/lflow.h|  1 +
>  ovn/controller/physical.c | 17 +
>  2 files changed, 14 insertions(+), 4 deletions(-)
>
> diff --git a/ovn/controller/lflow.h b/ovn/controller/lflow.h
> index 4f284a0..418f59e 100644
> --- a/ovn/controller/lflow.h
> +++ b/ovn/controller/lflow.h
> @@ -50,6 +50,7 @@ struct uuid;
>   * you make any changes. */
>  #define OFTABLE_PHY_TO_LOG            0
>  #define OFTABLE_LOG_INGRESS_PIPELINE 16 /* First of LOG_PIPELINE_LEN tables. */
> +#define OFTABLE_ETH_UCAST            29
>  #define OFTABLE_REMOTE_OUTPUT        32
>  #define OFTABLE_LOCAL_OUTPUT         33
>  #define OFTABLE_CHECK_LOOPBACK       34
> diff --git a/ovn/controller/physical.c b/ovn/controller/physical.c
> index 0f1aa63..d34140f 100644
> --- a/ovn/controller/physical.c
> 

Re: [ovs-dev] [PATCH v2 09/13] ovn-trace: Add some basic tracing for ct_snat and ct_dnat actions.

2017-05-03 Thread Mickey Spiegel
On Wed, May 3, 2017 at 8:45 AM, Ben Pfaff <b...@ovn.org> wrote:

> Without this support, ovn-trace is not very useful with OpenStack, which
> uses connection tracking extensively.
>
> Signed-off-by: Ben Pfaff <b...@ovn.org>
>

Acked-by: Mickey Spiegel <mickeys@gmail.com>

A couple of minor comments below.

---
>  ovn/utilities/ovn-trace.8.xml | 50 ++
> +++
>  ovn/utilities/ovn-trace.c | 52 ++
> ++---
>  2 files changed, 99 insertions(+), 3 deletions(-)
>
> diff --git a/ovn/utilities/ovn-trace.8.xml b/ovn/utilities/ovn-trace.8.xml
> index 8bb329bfbd71..b2d46ac3d50b 100644
> --- a/ovn/utilities/ovn-trace.8.xml
> +++ b/ovn/utilities/ovn-trace.8.xml
> @@ -166,6 +166,56 @@
>  output;
>
>
> +  Stateful Actions
>

The flow is a little funny. At the beginning of the output section:

ovn-trace supports the three different forms of output,
each
described in a separate section below.

Now there is another h2 section in between the three sections describing
the different forms of output.

Perhaps just use ?

+
> +  
> +Some OVN logical actions use or update state that is not available in
> the
> +southbound database.  ovn-trace handles these actions as
> +described below:
> +  
> +
> +  
> +ct_next
> +
> +  By default ovn-trace treats flows as ``tracked'' and
> +  ``established.''  The --ct option overrides this
> behavior;
> +  refer to its description for more information.
> +
> +
> +ct_commit
> +
> +  This action is treated as a no-op.
> +
> +
> +ct_dnat
> +ct_snat
> +
> +  
> +When one of these action is used without arguments, to ``un-NAT''
> a
>

s/action/actions


> +packet, ovn-trace assumes that no NAT state was
> available
> +and treats it as a no-op.
> +  
> +
> +  
> +With an argument, ovn-trace sets the IP destination
> or
> +source, as appropriate, to the given address. It also sets
> +ct.dnat or ct.snat to 1 to indicate
> that NAT
> +has taken place.
> +  
> +
> +
> +ct_lb
> +
> +  Not yet implemented; currently implemented as a no-op.
> +
> +
> +put_arp
> +put_nd
> +
> +  This action is treated as a no-op.
> +
> +  
> +
>Summary Output
>
>
> diff --git a/ovn/utilities/ovn-trace.c b/ovn/utilities/ovn-trace.c
> index 3a0780eb931e..860fd4b26be0 100644
> --- a/ovn/utilities/ovn-trace.c
> +++ b/ovn/utilities/ovn-trace.c
> @@ -346,6 +346,8 @@ struct ovntrace_datapath {
>  size_t n_flows, allocated_flows;
>
>  struct hmap mac_bindings;   /* Contains "struct
> ovntrace_mac_binding"s. */
> +
> +bool has_local_l3gateway;
>  };
>
>  struct ovntrace_port {
> @@ -570,6 +572,9 @@ read_ports(void)
>  port->peer->peer = port;
>  }
>  }
> +} else if (!strcmp(sbpb->type, "l3gateway")) {
> +/* Treat all gateways as local for our purposes. */
> +dp->has_local_l3gateway = true;
>  }
>  }
>
> @@ -1522,6 +1527,46 @@ execute_ct_next(const struct ovnact_ct_next
> *ct_next,
>  }
>
>  static void
> +execute_ct_nat(const struct ovnact_ct_nat *ct_nat,
> +   const struct ovntrace_datapath *dp, struct flow *uflow,
> +   enum ovnact_pipeline pipeline, struct ovs_list *super)
> +{
> +bool is_dst = ct_nat->ovnact.type == OVNACT_CT_DNAT;
> +if (!is_dst && dp->has_local_l3gateway && !ct_nat->ip) {
> +/* "ct_snat;" has no visible effect in a gateway router. */
> +return;
> +}
> +const char *direction = is_dst ? "dst" : "src";
> +
> +/* Make a sub-node for attaching the next table,
> + * and figure out the changes if any. */
> +struct flow ct_flow = *uflow;
> +struct ds s = DS_EMPTY_INITIALIZER;
> +ds_put_format(&s, "ct_%cnat", direction[0]);
> +if (ct_nat->ip) {
> +ds_put_format(&s, "(ip4.%s="IP_FMT")", direction,
> IP_ARGS(ct_nat->ip));
> +ovs_be32 *ip = is_dst ? &ct_flow.nw_dst : &ct_flow.nw_src;
> +*ip = ct_nat->ip;
> +
> +uint8_t state = is_dst ? CS_DST_NAT : CS_SRC_NAT;
> +ct_flow.ct_state |= state;
> +} else {
> +ds_put_format(&s, " /* assuming no un-%cnat entry, so no change
> */",
> +  direction[0]);
> +}
> +}
> +struct ovntrace_node *node = ovntrace_no

Re: [ovs-dev] [PATCH 23/27] ovn-trace: Add some basic tracing for ct_snat and ct_dnat actions.

2017-05-02 Thread Mickey Spiegel
One minor nit and one real comment below.

On Tue, May 2, 2017 at 11:07 AM, Ben Pfaff <b...@ovn.org> wrote:

> On Mon, May 01, 2017 at 05:50:57PM -0700, Mickey Spiegel wrote:
> > On Mon, May 1, 2017 at 5:12 PM, Ben Pfaff <b...@ovn.org> wrote:
> >
> > > On Mon, May 01, 2017 at 03:39:32PM -0700, Mickey Spiegel wrote:
> > > > On Sun, Apr 30, 2017 at 4:22 PM, Ben Pfaff <b...@ovn.org> wrote:
> > > >
> > > > > Without this support, ovn-trace is not very useful with OpenStack,
> > > which
> > > > > uses connection tracking extensively.
> > > > >
> > > >
> > > > I scanned the patch set briefly, not what I would call a full review
> but
> > > > quick sanity checking. The only issue that I saw is described inline
> > > below.
> > >
> > > Thanks!
> > >
> > > > > +struct ovntrace_node *node = ovntrace_node_append(
> > > > > +super, OVNTRACE_NODE_TRANSFORMATION, "%s", ds_cstr(&s));
> > > > > +ds_destroy(&s);
> > > > > +
> > > > > +/* Trace the actions in the next table. */
> > > > > +trace__(dp, &ct_flow, ct_nat->ltable, pipeline, &node->subs);
> > > > >
> > > >
> > > > Since OpenStack uses NAT on distributed routers, moving on to the
> next
> > > > table is the right thing to do.
> > > >
> > > > However, in case gateway routers are used, ct_snat without an IP
> address
> > > > does not do recirc.
> > > > Lines 832 to 842 of ovn/lib/actions.c:
> > > >
> > > > } else if (snat && ep->is_gateway_router) {
> > > > /* For performance reasons, we try to prevent additional
> > > >  * recirculations.  ct_snat which is used in a gateway router
> > > >  * does not need a recirculation.  ct_snat(IP) does need a
> > > >  * recirculation.  ct_snat in a distributed router needs
> > > >  * recirculation regardless of whether an IP address is
> > > >  * specified.
> > > >  * XXX Should we consider a method to let the actions specify
> > > >  * whether an action needs recirculation if there are more
> use
> > > >  * cases?. */
> > > > ct->recirc_table = NX_CT_RECIRC_NONE;
> > > >
> > > > Lines 4548, 4549 of ovn/northd/ovn-northd.c for a gateway router:
> > > >
> > > > ovn_lflow_add(lflows, od, S_ROUTER_IN_UNSNAT, 90,
> > > >   ds_cstr(), "ct_snat; next;");
> > > >
> > > > The corresponding lines 4565, 4566 of ovn/northd/ovn-northd.c for a
> > > > distributed router:
> > > >
> > > > ovn_lflow_add(lflows, od, S_ROUTER_IN_UNSNAT,
> 100,
> > > >   ds_cstr(), "ct_snat;");
> > > >
> > > > I think with this code you would be seeing double on a gateway
> router,
> > > > since both "ct_snat" and "next" would trace the actions in the next
> > > table.
> > >
> > > Oh, that's a good point.
> > >
> > > From lflow.c, a given router is a gateway router if its datapath is
> > > present on the local hypervisor and it has a local L3 gateway:
> > >
> > > static bool
> > > is_gateway_router(const struct sbrec_datapath_binding *ldp,
> > >   const struct hmap *local_datapaths)
> > > {
> > > struct local_datapath *ld =
> > > get_local_datapath(local_datapaths, ldp->tunnel_key);
> > > return ld ? ld->has_local_l3gateway : false;
> > > }
> > >
> > > Therefore, this is another bit of context that ovn-trace would need to
> > > be provided via command-line options.  I guess it would have to be
> > > something like "--gateway-router no,yes" to indicate, for example, that
> > > the first snat is not for a gateway router and that the second one is
> > > (or whatever).  And I'd tend to assume that the default is "no" since
> > > that makes the OpenStack case work OK.  Mickey and Guru, does this
> > > concept and syntax make sense?  If not, can you suggest a way?
> > >
> >
> > Two ways to figure out if a router is a gateway router or not:
> >
> > 1. If you have access to nb, if th

Re: [ovs-dev] [PATCH 23/27] ovn-trace: Add some basic tracing for ct_snat and ct_dnat actions.

2017-05-01 Thread Mickey Spiegel
On Mon, May 1, 2017 at 5:12 PM, Ben Pfaff <b...@ovn.org> wrote:

> On Mon, May 01, 2017 at 03:39:32PM -0700, Mickey Spiegel wrote:
> > On Sun, Apr 30, 2017 at 4:22 PM, Ben Pfaff <b...@ovn.org> wrote:
> >
> > > Without this support, ovn-trace is not very useful with OpenStack,
> which
> > > uses connection tracking extensively.
> > >
> >
> > I scanned the patch set briefly, not what I would call a full review but
> > quick sanity checking. The only issue that I saw is described inline
> below.
>
> Thanks!
>
> > > +struct ovntrace_node *node = ovntrace_node_append(
> > > +super, OVNTRACE_NODE_TRANSFORMATION, "%s", ds_cstr(&s));
> > > +ds_destroy(&s);
> > > +
> > > +/* Trace the actions in the next table. */
> > > +trace__(dp, &ct_flow, ct_nat->ltable, pipeline, &node->subs);
> > >
> >
> > Since OpenStack uses NAT on distributed routers, moving on to the next
> > table is the right thing to do.
> >
> > However, in case gateway routers are used, ct_snat without an IP address
> > does not do recirc.
> > Lines 832 to 842 of ovn/lib/actions.c:
> >
> > } else if (snat && ep->is_gateway_router) {
> > /* For performance reasons, we try to prevent additional
> >  * recirculations.  ct_snat which is used in a gateway router
> >  * does not need a recirculation.  ct_snat(IP) does need a
> >  * recirculation.  ct_snat in a distributed router needs
> >  * recirculation regardless of whether an IP address is
> >  * specified.
> >  * XXX Should we consider a method to let the actions specify
> >  * whether an action needs recirculation if there are more use
> >  * cases?. */
> > ct->recirc_table = NX_CT_RECIRC_NONE;
> >
> > Lines 4548, 4549 of ovn/northd/ovn-northd.c for a gateway router:
> >
> > ovn_lflow_add(lflows, od, S_ROUTER_IN_UNSNAT, 90,
> >   ds_cstr(), "ct_snat; next;");
> >
> > The corresponding lines 4565, 4566 of ovn/northd/ovn-northd.c for a
> > distributed router:
> >
> > ovn_lflow_add(lflows, od, S_ROUTER_IN_UNSNAT, 100,
> >   ds_cstr(), "ct_snat;");
> >
> > I think with this code you would be seeing double on a gateway router,
> > since both "ct_snat" and "next" would trace the actions in the next
> table.
>
> Oh, that's a good point.
>
> From lflow.c, a given router is a gateway router if its datapath is
> present on the local hypervisor and it has a local L3 gateway:
>
> static bool
> is_gateway_router(const struct sbrec_datapath_binding *ldp,
>   const struct hmap *local_datapaths)
> {
> struct local_datapath *ld =
> get_local_datapath(local_datapaths, ldp->tunnel_key);
> return ld ? ld->has_local_l3gateway : false;
> }
>
> Therefore, this is another bit of context that ovn-trace would need to
> be provided via command-line options.  I guess it would have to be
> something like "--gateway-router no,yes" to indicate, for example, that
> the first snat is not for a gateway router and that the second one is
> (or whatever).  And I'd tend to assume that the default is "no" since
> that makes the OpenStack case work OK.  Mickey and Guru, does this
> concept and syntax make sense?  If not, can you suggest a way?
>

Two ways to figure out if a router is a gateway router or not:

1. If you have access to nb, if the logical router has options:chassis then
it is a gateway router.
2. From sb, while processing read_ports in ovn/utilities/ovn-trace.c, any
ports with type "l3gateway" on a datapath representing a router indicate
that the router is a gateway router. That is more or less what
ovn-controller does in "add_local_datapath" in ovn/controller/binding.c to
set "has_local_l3gateway", which ends up triggering no recirc in
ovn/lib/actions.c.

The next question is whether a specific gateway router should be treated as
local. Since ovn-trace has no knowledge of topology and hypervisors, it
seems like the consistent approach would be to treat all gateway routers as
local for the purposes of ovn-trace.
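
A minimal sketch of option 2 and of the recirculation rule it feeds, assuming
a simplified stand-in for the southbound Port_Binding rows. struct
port_binding, datapath_is_gateway_router() and ct_snat_recirculates() are
hypothetical names, not existing ovn-trace functions, and I have not built or
run this against the tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a southbound Port_Binding row. */
struct port_binding {
    const char *type;   /* For example "", "patch" or "l3gateway". */
    int datapath_key;   /* Tunnel key of the datapath the port is on. */
};

/* Returns true if any port on 'datapath_key' has type "l3gateway".
 * Every such gateway is treated as local, since ovn-trace has no
 * notion of which hypervisor it is running on. */
static bool
datapath_is_gateway_router(const struct port_binding *pbs, size_t n,
                           int datapath_key)
{
    assert(pbs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (pbs[i].datapath_key == datapath_key
            && !strcmp(pbs[i].type, "l3gateway")) {
            return true;
        }
    }
    return false;
}

/* Mirrors the rule in the comment quoted from ovn/lib/actions.c: only
 * "ct_snat;" without an IP address on a gateway router skips
 * recirculation; every other combination recirculates. */
static bool
ct_snat_recirculates(bool is_gateway_router, bool has_ip)
{
    return !is_gateway_router || has_ip;
}
```

When ct_snat_recirculates() returns false, the trace should not descend into
the next table on its own, because the "next;" that follows in the logical
flow already does that.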

Mickey


> Thanks,
>
> Ben.
>


Re: [ovs-dev] ovn: SFC Patch V3

2017-04-24 Thread Mickey Spiegel
On Mon, Apr 24, 2017 at 12:56 PM, <jmcdow...@paloaltonetworks.com> wrote:

> From: John McDowall <jmcdow...@paloaltonetworks.com>
>
>
> Fixed changes from Mickey's last review.
>
> Changes
>
> 1) Fixed re-circulation rules
>

Still a few modifications required. See comments inline. I just typed some
stuff out, have not run, built, or tested anything.


> 2) Fixed match statement - match is only applied to beginning of chain in
>    each direction.
> 3) Fixed array length of chain of VNFs. I have tested this up to three VNFs
>    in a chain and it looks like it works in both directions.
>
> Areas to review
>
> 1) The logic now supports hair-pinning the flow back to the original
>    source to ensure that the MAC learning problem is addressed. I tested
>    this using ovn-trace - any better testing that I should do?
>
> Current todo list
>
> 1) I have standalone tests; need to add tests to the ovs/ovn framework.
> 2) Load-balancing support for port-pair-groups
> 3) Publish more detailed examples.
> 4) Submit suggestions to change and shorten the CLI names.
>
> Simple example using ovn-trace
>
> #!/bin/sh
> #
> clear
> ovn-nbctl ls-add swt1
>
> ovn-nbctl lsp-add swt1 swt1-appc
> ovn-nbctl lsp-add swt1 swt1-apps
> ovn-nbctl lsp-add swt1 swt1-vnfp1
> ovn-nbctl lsp-add swt1 swt1-vnfp2
>
> ovn-nbctl lsp-set-addresses swt1-appc "00:00:00:00:00:01 192.168.33.1"
> ovn-nbctl lsp-set-addresses swt1-apps "00:00:00:00:00:02 192.168.33.2"
> ovn-nbctl lsp-set-addresses swt1-vnfp1 00:00:00:00:00:03
> ovn-nbctl lsp-set-addresses swt1-vnfp2 00:00:00:00:00:04
> #
> # Configure Service chain
> #
> ovn-nbctl lsp-pair-add swt1 swt1-vnfp1 swt1-vnfp2 pp1
> ovn-nbctl lsp-chain-add swt1 pc1
> ovn-nbctl lsp-pair-group-add pc1 ppg1
> ovn-nbctl lsp-pair-group-add-port-pair ppg1 pp1
> ovn-nbctl lsp-chain-classifier-add swt1 pc1 swt1-appc "entry-lport" "bi-directional" pcc1
> #
> ovn-sbctl dump-flows
> #
> # Run trace command
> printf "\n-Flow 1 -\n\n"
> ovn-trace --detailed  swt1 'inport == "swt1-appc" && eth.src ==
> 00:00:00:00:00:01 && eth.dst == 00:00:00:00:00:02'
> printf "\n-Flow 2 -\n\n"
> ovn-trace --detailed  swt1 'inport == "swt1-vnfp1" && eth.src ==
> 00:00:00:00:00:01 && eth.dst == 00:00:00:00:00:02'
> printf "\n-Flow 3 -\n\n"
> ovn-trace --detailed  swt1 'inport == "swt1-apps" && eth.dst ==
> 00:00:00:00:00:01 && eth.src == 00:00:00:00:00:02'
> printf "\n-Flow 4 -\n\n"
> ovn-trace --detailed  swt1 'inport == "swt1-vnfp2" && eth.dst ==
> 00:00:00:00:00:01 && eth.src == 00:00:00:00:00:02'
> #
> # Cleanup
> #
> ovn-nbctl lsp-chain-classifier-del pcc1
> ovn-nbctl lsp-pair-group-del ppg1
> ovn-nbctl lsp-chain-del pc1
> ovn-nbctl lsp-pair-del pp1
> ovn-nbctl ls-del swt1
>
> Reported at: https://mail.openvswitch.org/pipermail/ovs-discuss/2016-March/040381.html
> Reported at: https://mail.openvswitch.org/pipermail/ovs-discuss/2016-May/041359.html
>
> Signed-off-by: John McDowall 
> Signed-off-by: Flavio Fernandes 
> Co-authored-by: Flavio Fernandes 
> ---
>  ovn/northd/ovn-northd.8.xml   |   69 ++-
>  ovn/northd/ovn-northd.c   |  382 -
>  ovn/ovn-architecture.7.xml|   91 
>  ovn/ovn-nb.ovsschema  |   87 ++-
>  ovn/ovn-nb.xml|  188 ++-
>  ovn/utilities/ovn-nbctl.8.xml |  231 
>  ovn/utilities/ovn-nbctl.c | 1208 ++
> +++
>  7 files changed, 2227 insertions(+), 29 deletions(-)
>
>



>
> diff --git ovn/northd/ovn-northd.c ovn/northd/ovn-northd.c
> index d0a5ba2..090f768 100644
> --- ovn/northd/ovn-northd.c
> +++ ovn/northd/ovn-northd.c
> @@ -106,13 +106,14 @@ enum ovn_stage {
>  PIPELINE_STAGE(SWITCH, IN,  PRE_LB,         4, "ls_in_pre_lb")         \
>  PIPELINE_STAGE(SWITCH, IN,  PRE_STATEFUL,   5, "ls_in_pre_stateful")   \
>  PIPELINE_STAGE(SWITCH, IN,  ACL,            6, "ls_in_acl")            \
> -PIPELINE_STAGE(SWITCH, IN,  QOS_MARK,       7, "ls_in_qos_mark")       \
> -PIPELINE_STAGE(SWITCH, IN,  LB,             8, "ls_in_lb")             \
> -PIPELINE_STAGE(SWITCH, IN,  STATEFUL,       9, "ls_in_stateful")       \
> -PIPELINE_STAGE(SWITCH, IN,  ARP_ND_RSP,    10, "ls_in_arp_rsp")        \
> -PIPELINE_STAGE(SWITCH, IN,  DHCP_OPTIONS,  11, "ls_in_dhcp_options")   \
> -PIPELINE_STAGE(SWITCH, IN,  DHCP_RESPONSE, 12, "ls_in_dhcp_response")  \
> -PIPELINE_STAGE(SWITCH, IN,  L2_LKUP,       13, "ls_in_l2_lkup")        \
> +PIPELINE_STAGE(SWITCH, IN,  CHAIN,          7, "ls_in_chain")          \
> +PIPELINE_STAGE(SWITCH, IN,  QOS_MARK,       8, "ls_in_qos_mark")       \
> +PIPELINE_STAGE(SWITCH, IN,  LB,             9, "ls_in_lb")             \
> +PIPELINE_STAGE(SWITCH, IN,  STATEFUL,      10, "ls_in_stateful")       \
> +PIPELINE_STAGE(SWITCH, IN,  ARP_ND_RSP,    11, "ls_in_arp_rsp")        \
> +PIPELINE_STAGE(SWITCH, IN,  DHCP_OPTIONS,  12, "ls_in_dhcp_options")   \
> +

Re: [ovs-dev] [PATCH] ovn.at: Fix "ovn -- 1 LR with distributed router gateway port" test

2017-04-23 Thread Mickey Spiegel
On Thu, Apr 20, 2017 at 6:32 PM, YAMAMOTO Takashi <yamam...@ovn.org> wrote:

> The NetBSD implementation of the wc command outputs extra whitespace,
> as shown below.  Tweak the test to succeed in such environments.
>
> % echo hoge|wc -l|hexdump -C
>   20 20 20 20 20 20 20 31  0a   |   1.|
> 0009
> %
>
> The failing test was introduced by
> commit 41a15b71ed1ef35aa612a1128082219fbfc3f327
> (ovn: Introduce distributed gateway port and "chassisredirect" port
> binding)
>
> Signed-off-by: YAMAMOTO Takashi <yamam...@ovn.org>
>

There are three more in one of the tests in system-ovn.at, affecting make
check-kernel.

Acked-by: Mickey Spiegel <mickeys@gmail.com>
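
For anyone following along, here is a minimal sketch (assuming a POSIX sh) of why the rewritten checks tolerate the padding. `padded_count` is a hypothetical stand-in that imitates the padded NetBSD output shown in the hexdump above; it is not part of ovn.at.

```shell
#!/bin/sh
# Hypothetical helper: print the number of input lines right-aligned in
# 8 columns, imitating the padded output of NetBSD's "wc -l".
padded_count() {
    printf '%8d\n' "$(wc -l | tr -d ' ')"
}

# Exact string comparison, similar in spirit to AT_CHECK comparing the
# expected stdout: the leading blanks are part of the value, so this
# fails on padded output.
string_check() {
    [ "$(printf 'hoge\n' | padded_count)" = "1" ]
}

# Numeric comparison as in the patch: the unquoted substitution is
# word-split, so the padding is gone before test sees the operand.
numeric_check() {
    test `printf 'hoge\n' | padded_count` -eq 1
}

if numeric_check; then echo "numeric check passes"; fi
if ! string_check; then echo "string check fails on padded output"; fi
```

The same reasoning should apply to the remaining `wc -l` checks in system-ovn.at.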



> ---
>  tests/ovn.at | 18 ++
>  1 file changed, 6 insertions(+), 12 deletions(-)
>
> diff --git a/tests/ovn.at b/tests/ovn.at
> index af77c19..1bffc4c 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -6567,20 +6567,14 @@ as hv3 ovs-ofctl dump-flows br-int
>  echo "--"
>
>  # Check that redirect mapping is programmed only on hv2
> -AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=33 | grep
> =0x3,metadata=0x1 | wc -l], [0], [0
> -])
> -AT_CHECK([as hv2 ovs-ofctl dump-flows br-int table=33 | grep
> =0x3,metadata=0x1 | grep load:0x2- | wc -l], [0], [1
> -])
> +AT_CHECK([test `as hv1 ovs-ofctl dump-flows br-int table=33 | grep
> =0x3,metadata=0x1 | wc -l` -eq 0])
> +AT_CHECK([test `as hv2 ovs-ofctl dump-flows br-int table=33 | grep
> =0x3,metadata=0x1 | grep load:0x2- | wc -l` -eq 1])
>  # Check that hv1 sends chassisredirect port traffic to hv2
> -AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=32 | grep
> =0x3,metadata=0x1 | grep output | wc -l], [0], [1
> -])
> -AT_CHECK([as hv2 ovs-ofctl dump-flows br-int table=32 | grep
> =0x3,metadata=0x1 | wc -l], [0], [0
> -])
> +AT_CHECK([test `as hv1 ovs-ofctl dump-flows br-int table=32 | grep
> =0x3,metadata=0x1 | grep output | wc -l` -eq 1])
> +AT_CHECK([test `as hv2 ovs-ofctl dump-flows br-int table=32 | grep
> =0x3,metadata=0x1 | wc -l` -eq 0])
>  # Check that arp reply on distributed gateway port is only programmed on
> hv2
> -AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep arp | grep
> =0x2,metadata=0x1 | wc -l], [0], [0
> -])
> -AT_CHECK([as hv2 ovs-ofctl dump-flows br-int | grep arp | grep
> =0x2,metadata=0x1 | wc -l], [0], [1
> -])
> +AT_CHECK([test `as hv1 ovs-ofctl dump-flows br-int | grep arp | grep
> =0x2,metadata=0x1 | wc -l` -eq 0])
> +AT_CHECK([test `as hv2 ovs-ofctl dump-flows br-int | grep arp | grep
> =0x2,metadata=0x1 | wc -l` -eq 1])
>
>
>  ip_to_hex() {
> --
> 2.5.4 (Apple Git-61)
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] system-ovn.at: Add test for ping other router's port on distributed router

2017-04-20 Thread Mickey Spiegel
I forgot one other comment.

On Thu, Apr 20, 2017 at 11:05 AM, Mickey Spiegel <mickeys@gmail.com>
wrote:

>
> On Tue, Apr 18, 2017 at 4:49 AM, Guoshuai Li <l...@dtdream.com> wrote:
>
>> Signed-off-by: Guoshuai Li <l...@dtdream.com>
>> ---
>>  tests/system-ovn.at | 101 ++
>> ++
>>  tests/system-traffic.at |  20 ++
>>  2 files changed, 121 insertions(+)
>>
>> diff --git a/tests/system-ovn.at b/tests/system-ovn.at
>> index dd62bd1..68da38a 100644
>> --- a/tests/system-ovn.at
>> +++ b/tests/system-ovn.at
>
>
>  ... I have not looked at the system-ovn test yet.
>
>>
>>
>> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
>> index c042773..295e606 100644
>> --- a/tests/system-traffic.at
>> +++ b/tests/system-traffic.at
>> @@ -3678,3 +3678,23 @@ NS_CHECK_EXEC([at_ns0], [ping -q -c 1 -w 3
>> 10.4.2.2], [1], [ignore])
>>
>>  OVS_TRAFFIC_VSWITCHD_STOP(["/dropping VLAN \(0\|300\) packet received
>> on dot1q-tunnel port/d"])
>>  AT_CLEANUP
>> +
>> +AT_SETUP([datapath - SNAT and UNSNAT])
>>
>
The name should be more specific. This does not just test SNAT and UNSNAT
in the datapath, it includes an action in between that forces processing to
userspace. Something like "datapath - SNAT, userspace action, UNSNAT"?

Mickey


>> +OVS_TRAFFIC_VSWITCHD_START()
>> +
>> +AT_CHECK([ovs-ofctl add-flow br0 "table=0, 
>> priority=100,in_port=1,ip,nw_dst=20.0.0.2
>> actions=dec_ttl(),mod_dl_src:00:00:02:01:02:01,mod_dl_dst:00
>> :00:02:01:02:02,resubmit(,1)"])
>> +AT_CHECK([ovs-ofctl add-flow br0 "table=1, 
>> priority=100,ip,nw_src=192.168.1.2
>> actions=ct(commit,table=2,zone=6,nat(src=20.0.0.1))"])
>>
>
> There should be another table added here with a flow that does the clone
> with nested ct_clear actions. The use of ct_clear changes how the unsnat in
> table 3 is processed.
>
> Mickey
>
>
>> +AT_CHECK([ovs-ofctl add-flow br0 "table=2, 
>> priority=100,icmp,nw_dst=20.0.0.2,icmp_type=8,icmp_code=0
>> actions=push:NXM_OF_IP_SRC[],push:NXM_OF_IP_DST[],pop:NXM_OF
>> _IP_SRC[],pop:NXM_OF_IP_DST[],load:0xff->NXM_NX_IP_TTL[],loa
>> d:0->NXM_OF_ICMP_TYPE[],dec_ttl(),mod_dl_src:00:00:02:01:
>> 02:02,mod_dl_dst:00:00:02:01:02:01,resubmit(,3)"])
>> +AT_CHECK([ovs-ofctl add-flow br0 "table=3, priority=100,ip,nw_dst=20.0.0.1
>> actions=ct(table=4,zone=6,nat)"])
>> +AT_CHECK([ovs-ofctl add-flow br0 "table=4, 
>> priority=100,ip,nw_dst=192.168.1.2
>> actions=dec_ttl(),mod_dl_src:00:00:01:01:02:01,mod_dl_dst:f0
>> :00:00:01:02:01,load:0->NXM_OF_IN_PORT[],output:1"])
>> +
>> +ADD_NAMESPACES(foo1)
>> +ADD_VETH(foo1, foo1, br0, "192.168.1.2/24", "f0:00:00:01:02:01",
>> "192.168.1.1")
>> +NS_CHECK_EXEC([foo1], [arp -s 192.168.1.1 00:00:01:01:02:01])
>> +
>> +NS_CHECK_EXEC([foo1], [ping -q -c 3 -i 0.3 -w 2 20.0.0.2 | FORMAT_PING],
>> [0], [dnl
>> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
>> +])
>> +
>> +OVS_TRAFFIC_VSWITCHD_STOP
>> +AT_CLEANUP
>> --
>> 2.10.1.windows.1
>>
>> This patch is used to analyze "ovn: unsnat handling error for Distributed
>> Gateway" problems:
>>
>> https://mail.openvswitch.org/pipermail/ovs-dev/2017-April/331033.html
>>
>>
>


Re: [ovs-dev] [PATCH] system-ovn.at: Add test for ping other router's port on distributed router

2017-04-20 Thread Mickey Spiegel
On Tue, Apr 18, 2017 at 4:49 AM, Guoshuai Li <l...@dtdream.com> wrote:

> Signed-off-by: Guoshuai Li <l...@dtdream.com>
> ---
>  tests/system-ovn.at | 101 ++
> ++
>  tests/system-traffic.at |  20 ++
>  2 files changed, 121 insertions(+)
>
> diff --git a/tests/system-ovn.at b/tests/system-ovn.at
> index dd62bd1..68da38a 100644
> --- a/tests/system-ovn.at
> +++ b/tests/system-ovn.at


 ... I have not looked at the system-ovn test yet.

>
>
> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
> index c042773..295e606 100644
> --- a/tests/system-traffic.at
> +++ b/tests/system-traffic.at
> @@ -3678,3 +3678,23 @@ NS_CHECK_EXEC([at_ns0], [ping -q -c 1 -w 3
> 10.4.2.2], [1], [ignore])
>
>  OVS_TRAFFIC_VSWITCHD_STOP(["/dropping VLAN \(0\|300\) packet received on
> dot1q-tunnel port/d"])
>  AT_CLEANUP
> +
> +AT_SETUP([datapath - SNAT and UNSNAT])
> +OVS_TRAFFIC_VSWITCHD_START()
> +
> +AT_CHECK([ovs-ofctl add-flow br0 "table=0, 
> priority=100,in_port=1,ip,nw_dst=20.0.0.2
> actions=dec_ttl(),mod_dl_src:00:00:02:01:02:01,mod_dl_dst:
> 00:00:02:01:02:02,resubmit(,1)"])
> +AT_CHECK([ovs-ofctl add-flow br0 "table=1, priority=100,ip,nw_src=192.168.1.2
> actions=ct(commit,table=2,zone=6,nat(src=20.0.0.1))"])
>

There should be another table added here with a flow that does the clone
with nested ct_clear actions. The use of ct_clear changes how the unsnat in
table 3 is processed.

Mickey
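
To make the suggestion concrete, here is a rough sketch of what such a flow could look like. The table numbers (10 and 3) and the helper name are only illustrative assumptions, not something taken from the patch; `clone` and `ct_clear` are the existing OpenFlow actions referred to above.

```shell
#!/bin/sh
# Build an OpenFlow rule that wraps ct_clear and a resubmit in clone().
# $1 = table holding the rule, $2 = table to resubmit to.  Both values
# are illustrative; the patch under review does not define them.
clone_ct_clear_flow() {
    printf 'table=%s, priority=100,ip actions=clone(ct_clear,resubmit(,%s))\n' \
        "$1" "$2"
}

flow=$(clone_ct_clear_flow 10 3)
echo "$flow"

# In the test this would presumably be installed along the lines of:
#   AT_CHECK([ovs-ofctl add-flow br0 "$flow"])
# with the table=2 flow resubmitting to the new table instead of table 3.
```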


> +AT_CHECK([ovs-ofctl add-flow br0 "table=2, 
> priority=100,icmp,nw_dst=20.0.0.2,icmp_type=8,icmp_code=0
> actions=push:NXM_OF_IP_SRC[],push:NXM_OF_IP_DST[],pop:NXM_
> OF_IP_SRC[],pop:NXM_OF_IP_DST[],load:0xff->NXM_NX_IP_TTL[],
> load:0->NXM_OF_ICMP_TYPE[],dec_ttl(),mod_dl_src:00:00:02:
> 01:02:02,mod_dl_dst:00:00:02:01:02:01,resubmit(,3)"])
> +AT_CHECK([ovs-ofctl add-flow br0 "table=3, priority=100,ip,nw_dst=20.0.0.1
> actions=ct(table=4,zone=6,nat)"])
> +AT_CHECK([ovs-ofctl add-flow br0 "table=4, priority=100,ip,nw_dst=192.168.1.2
> actions=dec_ttl(),mod_dl_src:00:00:01:01:02:01,mod_dl_dst:
> f0:00:00:01:02:01,load:0->NXM_OF_IN_PORT[],output:1"])
> +
> +ADD_NAMESPACES(foo1)
> +ADD_VETH(foo1, foo1, br0, "192.168.1.2/24", "f0:00:00:01:02:01",
> "192.168.1.1")
> +NS_CHECK_EXEC([foo1], [arp -s 192.168.1.1 00:00:01:01:02:01])
> +
> +NS_CHECK_EXEC([foo1], [ping -q -c 3 -i 0.3 -w 2 20.0.0.2 | FORMAT_PING],
> [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
> --
> 2.10.1.windows.1
>
> This patch is used to analyze "ovn: unsnat handling error for Distributed
> Gateway" problems:
>
> https://mail.openvswitch.org/pipermail/ovs-dev/2017-April/331033.html
>
>


Re: [ovs-dev] OVN: SFC Patch V2

2017-04-19 Thread Mickey Spiegel
On Thu, Apr 13, 2017 at 6:20 PM, John McDowall <
jmcdow...@paloaltonetworks.com> wrote:

> From: jmcdow...@paloaltonetworks.com
>
>
> I think I have covered all the current comments and have a first level
> of tests written and passing. The tests are not integrated with the ovs
> test framework - once we have agreed that all the issues are resolved I
> will do that. It would help if everyone could review the CLI commands
> before I add the test cases - less work.
>
> Changes
>
> 1) Now re-circulates the flows from the last VNF in the chain to the
>original entry point.
>

It does not look to me like this code actually does that. See comments
below.

> 2) Now supports non-IP traffic and hence also IPv6
> 3) Added support for "match statement"
> 4) Added check to limit the number of chains attached to a port is 1.
> 5) Added show command for lsp-chain-classifier.
>
> Areas to review
>
> 1) The logic now supports hair-pinning the flow back to the original
> source to
>ensure that the MAC learning issue is addressed.
> 2) Do the command names make sense - currently rather long and complex.
>
> Current todo list
>
> 1) I have standalone tests; I still need to add tests to the ovs/ovn framework.
> 2) Load-balancing support for port-pair-groups
> 3) Publish more detailed examples.
>
> Simple example using ovn-trace
>
> #!/bin/sh
> #
> clear
> ovn-nbctl ls-add swt1
>
> ovn-nbctl lsp-add swt1 swt1-appc
> ovn-nbctl lsp-add swt1 swt1-apps
> ovn-nbctl lsp-add swt1 swt1-vnfp1
> ovn-nbctl lsp-add swt1 swt1-vnfp2
>
> ovn-nbctl lsp-set-addresses swt1-appc "00:00:00:00:00:01 192.168.33.1"
> ovn-nbctl lsp-set-addresses swt1-apps "00:00:00:00:00:02 192.168.33.2"
> ovn-nbctl lsp-set-addresses swt1-vnfp1 00:00:00:00:00:03
> ovn-nbctl lsp-set-addresses swt1-vnfp2 00:00:00:00:00:04
> #
> # Configure Service chain
> #
> ovn-nbctl lsp-pair-add swt1 swt1-vnfp1 swt1-vnfp2 pp1
> ovn-nbctl lsp-chain-add swt1 pc1
> ovn-nbctl lsp-pair-group-add pc1 ppg1
> ovn-nbctl lsp-pair-group-add-port-pair ppg1 pp1
> ovn-nbctl lsp-chain-classifier-add swt1 pc1 swt1-appc "entry-lport"
> "bi-directional" pcc1
> #
> ovn-sbctl dump-flows
> #
> # Run trace command
> printf "\n-Flow 1 -\n\n"
> ovn-trace --detailed  swt1 'inport == "swt1-appc" && eth.src ==
> 00:00:00:00:00:01 && eth.dst == 00:00:00:00:00:02'
> printf "\n-Flow 2 -\n\n"
> ovn-trace --detailed  swt1 'inport == "swt1-vnfp1" && eth.src ==
> 00:00:00:00:00:01 && eth.dst == 00:00:00:00:00:02'
> printf "\n-Flow 3 -\n\n"
> ovn-trace --detailed  swt1 'inport == "swt1-apps" && eth.dst ==
> 00:00:00:00:00:01 && eth.src == 00:00:00:00:00:02'
> printf "\n-Flow 4 -\n\n"
> ovn-trace --detailed  swt1 'inport == "swt1-vnfp2" && eth.dst ==
> 00:00:00:00:00:01 && eth.src == 00:00:00:00:00:02'
> #
> # Cleanup
> #
> ovn-nbctl lsp-chain-classifier-del pcc1
> ovn-nbctl lsp-pair-group-del ppg1
> ovn-nbctl lsp-chain-del pc1
> ovn-nbctl lsp-pair-del pp1
> ovn-nbctl ls-del swt1
>
> Co-authored-by: Flavio Fernandes 
> Reported at: https://mail.openvswitch.org/pipermail/ovs-discuss/2016-
> March/040381.html
> Reported at: https://mail.openvswitch.org/pipermail/ovs-discuss/2016-
> May/041359.html
>
> Signed-off-by: John McDowall 
> ---
>  ovn/northd/ovn-northd.8.xml   |   68 ++-
>  ovn/northd/ovn-northd.c   |  348 +++-
>  ovn/ovn-architecture.7.xml|   91 
>  ovn/ovn-nb.ovsschema  |   87 ++-
>  ovn/ovn-nb.xml|  188 ++-
>  ovn/utilities/ovn-nbctl.8.xml |  231 
>  ovn/utilities/ovn-nbctl.c | 1208 ++
> +++
>  7 files changed, 2193 insertions(+), 28 deletions(-)
>
> diff --git ovn/northd/ovn-northd.8.xml ovn/northd/ovn-northd.8.xml
> index ab8fd88..61def9f 100644
> --- ovn/northd/ovn-northd.8.xml
> +++ ovn/northd/ovn-northd.8.xml
> @@ -362,7 +362,61 @@
>
>  
>
> -Ingress Table 7: from-lport QoS marking
> + Ingress Table 7: from-lport Port Chaining
> +
> +
> +  Logical flows in this table closely reproduce those in the
> +  QoS table in the OVN_Northbound database
> +  for the from-lport direction.
> +
> +
> +
> +  
> +For every port-chain a set of rules will be added to direct
> traffic
> +through the port pairs defined in the port-chain. A port chain
> +is composed of an ordered set of port-pair-groups that contain
> one or
> +more port-pairs. Traffic is directed into the port-chain by
> creating a
> +port-chain-classifier. A port-chain can be reused by different
> +port-chain-classifier instances allowing a port chain to be
> +applied to multiple traffic paths and application traffic types.
> +
> +The port-chain-classifier defines a starting port or ending port
> and
> +a direction for the traffic, either uni-directional or
> bi-directional.
> +In addition a match expression can be defined to further 

Re: [ovs-dev] [ovs-discuss] ovn: unsnat handling error for Distributed Gateway

2017-04-10 Thread Mickey Spiegel
On Sun, Apr 9, 2017 at 3:23 PM, Mickey Spiegel <mickeys@gmail.com>
wrote:

>
>
> On Thu, Apr 6, 2017 at 7:34 AM, Guoshuai Li <l...@dtdream.com> wrote:
>
>>
>> Revising my topology:
>>
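>>                 +------------------+
>>                 |  VM  172.16.1.7  |
>>                 +--------+---------+
>>                          |
>>                 +--------+---------+
>>                 |  Logical Switch  |
>>                 +--------+---------+
>>                          |172.16.1.254
>>    10.157.142.3 +--------+---------+
>>       +---------+ Logical Router 1 |
>>       |         +------------------+
>> +-----+------------+
>> |  Logical Switch  |
>> +-----+------------+
>>       |         +------------------+
>>       +---------+ Logical Router 2 |
>>    10.157.142.1 +------------------+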
>>
>>
>> Hi All, I am having a problem for ovn and need help, thanks.
>>>
>>>
>>> I created two logical routers and connected them through an external
>>> LogicalSwitch (connected to the external network).
>>>
>>> LogicalRouter-1 is then connected to the VM through the internal
>>> LogicalSwitch.
>>>
>>> my topology:
>>>
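>>>     10.157.142.3    172.16.1.254
>>>       +------------------+  +------------------+  +------------------+
>>>   +---+ Logical Router 1 +--+  Logical Switch  +--+  VM 172.16.1.7   |
>>>   |   +------------------+  +------------------+  +------------------+
>>> +-+----------------+
>>> |  Logical Switch  |
>>> +-+----------------+
>>>   |   +------------------+
>>>   +---+ Logical Router 2 |
>>>       +------------------+
>>>     10.157.142.1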
>>>
>>> I tested master and branch-2.7; traffic cannot be sent from the VM
>>> (172.16.1.7) to LogicalRouter-2's port (10.157.142.
>>>
>> Sorry, The destination address is 10.157.142.1, And The SNAT/unSNAT
>> address is 10.157.142.3.
>>
>>> ) via ping.
>>> My logical router is a distributed gateway, and the two logical router
>>> ports that connect external LogicalSwitch are on the same chassis.
>>> If the two logical router ports are not on the same chassis ping is also
>>> OK, And ping from VM (172.16.1.7) to external network is also OK.
>>>
>>> I looked at the OpenFlow tables on the gateway chassis and suspect an
>>> unSNAT handling error in Router1's ingress for the ICMP reply.
>>> I think it is necessary to replace the destination address 10.157.142.3
>>> with 172.16.1.7 in Table 19 and route 172.16.1.7 in Table 21, but now the
>>> route match is 10.157.142.0/24.
>>>
>>> cookie=0x92bd0055, duration=68.468s, table=16, n_packets=1, n_bytes=98,
>>> idle_age=36, priority=50,reg14=0x4,metadata=0x7,dl_dst=fa:16:3e:58:1c:8a
>>> actions=resubmit(,17)
>>> cookie=0x45765344, duration=68.467s, table=17, n_packets=1, n_bytes=98,
>>> idle_age=36, priority=0,metadata=0x7 actions=resubmit(,18)
>>> cookie=0xaeaaed29, duration=68.479s, table=18, n_packets=1, n_bytes=98,
>>> idle_age=36, priority=0,metadata=0x7 actions=resubmit(,19)
>>> cookie=0xce785d51, duration=68.479s, table=19, n_packets=1, n_bytes=98,
>>> idle_age=36, priority=100,ip,reg14=0x4,metadata=0x7,nw_dst=10.157.142.3
>>> actions=ct(table=20,zone=NXM_NX_REG12[0..15],nat)
>>> cookie=0xbd994421, duration=68.481s, table=20, n_packets=1, n_bytes=98,
>>> idle_age=36, priority=0,metadata=0x7 actions=resubmit(,21)
>>> cookie=0xaea3a6ae, duration=68.479s, table=21, n_packets=1, n_bytes=98,
>>> idle_age=36, priority=49,ip,metadata=0x7,nw_dst=10.157.142.0/24
>>> actions=dec_ttl(),move:NXM_OF_IP_DST[]->NXM_NX_XXREG0[96..12
>>> 7],load:0xa9d8e03->NXM_NX_XXREG0[64..95],mod_dl_src:fa:16:3e
>>> :58:1c:8a,load:0x4->NXM_NX_REG15[],load:0x1->NXM_NX_REG10[0]
>>> ,resubmit(,22)
>>> cookie=0xce6e8d4e, duration=68.482s, table=22, n_packets=1, n_bytes=98,
>>> idle_age=36, priority=0,ip,metadata=0x7 actions=push:NXM_NX_REG0[],pus
>>> h:NXM_NX_XXREG0[96.

Re: [ovs-dev] [ovs-discuss] ovn: unsnat handling error for Distributed Gateway

2017-04-09 Thread Mickey Spiegel
On Thu, Apr 6, 2017 at 7:34 AM, Guoshuai Li <l...@dtdream.com> wrote:

>
> Revising my topology:
>
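>                 +------------------+
>                 |  VM  172.16.1.7  |
>                 +--------+---------+
>                          |
>                 +--------+---------+
>                 |  Logical Switch  |
>                 +--------+---------+
>                          |172.16.1.254
>    10.157.142.3 +--------+---------+
>       +---------+ Logical Router 1 |
>       |         +------------------+
> +-----+------------+
> |  Logical Switch  |
> +-----+------------+
>       |         +------------------+
>       +---------+ Logical Router 2 |
>    10.157.142.1 +------------------+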
>
>
> Hi All, I am having a problem for ovn and need help, thanks.
>>
>>
>> I created two logical routers and connected them through an external
>> LogicalSwitch (connected to the external network).
>>
>> LogicalRouter-1 is then connected to the VM through the internal
>> LogicalSwitch.
>>
>> my topology:
>>
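>>     10.157.142.3    172.16.1.254
>>       +------------------+  +------------------+  +------------------+
>>   +---+ Logical Router 1 +--+  Logical Switch  +--+  VM 172.16.1.7   |
>>   |   +------------------+  +------------------+  +------------------+
>> +-+----------------+
>> |  Logical Switch  |
>> +-+----------------+
>>   |   +------------------+
>>   +---+ Logical Router 2 |
>>       +------------------+
>>     10.157.142.1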
>>
>> I tested master and branch-2.7; traffic cannot be sent from the VM
>> (172.16.1.7) to LogicalRouter-2's port (10.157.142.
>>
> Sorry, The destination address is 10.157.142.1, And The SNAT/unSNAT
> address is 10.157.142.3.
>
>> ) via ping.
>> My logical router is a distributed gateway, and the two logical router
>> ports that connect external LogicalSwitch are on the same chassis.
>> If the two logical router ports are not on the same chassis ping is also
>> OK, And ping from VM (172.16.1.7) to external network is also OK.
>>
>> I looked at the OpenFlow tables on the gateway chassis and suspect an
>> unSNAT handling error in Router1's ingress for the ICMP reply.
>> I think it is necessary to replace the destination address 10.157.142.3
>> with 172.16.1.7 in Table 19 and route 172.16.1.7 in Table 21, but now the
>> route match is 10.157.142.0/24.
>>
>> cookie=0x92bd0055, duration=68.468s, table=16, n_packets=1, n_bytes=98,
>> idle_age=36, priority=50,reg14=0x4,metadata=0x7,dl_dst=fa:16:3e:58:1c:8a
>> actions=resubmit(,17)
>> cookie=0x45765344, duration=68.467s, table=17, n_packets=1, n_bytes=98,
>> idle_age=36, priority=0,metadata=0x7 actions=resubmit(,18)
>> cookie=0xaeaaed29, duration=68.479s, table=18, n_packets=1, n_bytes=98,
>> idle_age=36, priority=0,metadata=0x7 actions=resubmit(,19)
>> cookie=0xce785d51, duration=68.479s, table=19, n_packets=1, n_bytes=98,
>> idle_age=36, priority=100,ip,reg14=0x4,metadata=0x7,nw_dst=10.157.142.3
>> actions=ct(table=20,zone=NXM_NX_REG12[0..15],nat)
>> cookie=0xbd994421, duration=68.481s, table=20, n_packets=1, n_bytes=98,
>> idle_age=36, priority=0,metadata=0x7 actions=resubmit(,21)
>> cookie=0xaea3a6ae, duration=68.479s, table=21, n_packets=1, n_bytes=98,
>> idle_age=36, priority=49,ip,metadata=0x7,nw_dst=10.157.142.0/24
>> actions=dec_ttl(),move:NXM_OF_IP_DST[]->NXM_NX_XXREG0[96..12
>> 7],load:0xa9d8e03->NXM_NX_XXREG0[64..95],mod_dl_src:fa:16:
>> 3e:58:1c:8a,load:0x4->NXM_NX_REG15[],load:0x1->NXM_NX_REG10
>> [0],resubmit(,22)
>> cookie=0xce6e8d4e, duration=68.482s, table=22, n_packets=1, n_bytes=98,
>> idle_age=36, priority=0,ip,metadata=0x7 actions=push:NXM_NX_REG0[],pus
>> h:NXM_NX_XXREG0[96..127],pop:NXM_NX_REG0[],mod_dl_dst:00:
>> 00:00:00:00:00,resubmit(,66),pop:NXM_NX_REG0[],resubmit(,23)
>> cookie=0xce89c4ed, duration=68.481s, table=23, n_packets=1, n_bytes=98,
>> idle_age=36, priority=150,reg15=0x4,metadata=0x7,dl_dst=00:00:00:00:00:00
>> actions=load:0x5->NXM_NX_REG15[],resubmit(,24)
>> cookie=0xb2d84350, duration=68.469s, table=24, n_packets=1, n_bytes=98,
>> idle_age=36, priority=100,ip,metadata=0x7,dl_dst=00:00:00:00:00:00
>>
>> I do not know why and need help, thanks.
>>
>
I was able to reproduce this. I agree with your analysis. Looking at
ovs-ofctl dump-flows, the packet counts indicate that the packet is subject
to ct(...,nat), but the routing table match is as if NAT never occurred.
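
To illustrate the symptom, here is a small sketch (plain POSIX sh, helper names are mine) of the prefix check the routing table is effectively doing. If unSNAT had taken effect, the destination would be 172.16.1.7 and the 10.157.142.0/24 route in table 21 should not match; the dump shows that it does match, as if the destination were still 10.157.142.3.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to an integer.  The function body
# runs in a subshell so the IFS change does not leak to the caller.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# Succeed if address $1 falls inside prefix $2 (written as a.b.c.d/len).
in_cidr() {
    ip=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    len=${2#*/}
    mask=$(( (0xffffffff << (32 - len)) & 0xffffffff ))
    [ $(( ip & mask )) -eq $(( net & mask )) ]
}

# Destination before unSNAT (10.157.142.3) and after unSNAT (172.16.1.7).
for dst in 10.157.142.3 172.16.1.7; do
    if in_cidr "$dst" 10.157.142.0/24; then
        echo "$dst matches 10.157.142.0/24"
    else
        echo "$dst does not match 10.157.142.0/24"
    fi
done
```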

I tried with gateway routers and it worked. There are some differences in
ovs-dpctl dump-flows.

For the case of gateway routers:

vagrant@compute2:~$ sudo ovs-dpctl dump-flows


Re: [ovs-dev] [ovs-dev, RFC] ovn: Revised support for service function chaining

2017-04-06 Thread Mickey Spiegel
John,

On Thu, Apr 6, 2017 at 9:15 AM, John McDowall <
jmcdow...@paloaltonetworks.com> wrote:

> Mickey,
>
>
>
> Thanks, here is what I propose, based on your comments:
>
>
>
> 1.   Support for non-ip traffic : I will make the agreed on changes.
>
> 2.   Match Filter: Will make the changes
>
Not sure that there are any changes to make other than documentation,
unless you found some magic solution :-). Are you referring to
documentation?

> 3.   Support for VNF’s that change src/dst: Will leave as an open
> issue for now and look for relevant use cases.
>
OK.

> 4.   Multiple classifiers: Will limit each port(application ) to a
> single chain for now, to prevent any errors. If there are use cases for
> multiple chains we can revisit
>
Would you actually check and limit?
Or just document?

> 5.   Mac Learning Issue: I looked at using other tunnels but as you
> correctly point out it adds a lot of complexity for not a huge amount of
> value. I think it is better to use the current Geneve overlay in OVN. I
> will make the changes to add a new register flag for chaining and have the
> flows exit the chain from the original src so the mac addresses are correct.
>
If you advance to an ingress table after S_SWITCH_IN_CHAIN, then you do not
need the flag. I saw the ability to start at a later table after I wrote
the flag comment, and forgot that that makes the flag unnecessary.

Mickey

>
>
> Regards
>
>
>
> John
>
>
>
>
>
>
>
> *From: *Mickey Spiegel <mickeys@gmail.com>
> *Date: *Wednesday, April 5, 2017 at 4:02 PM
> *To: *John McDowall <jmcdow...@paloaltonetworks.com>
> *Cc: *ovs dev <d...@openvswitch.org>
> *Subject: *Re: [ovs-dev] [ovs-dev, RFC] ovn: Revised support for service
> function chaining
>
>
>
> John,
>
>
>
> On Tue, Apr 4, 2017 at 10:03 AM, John McDowall <
> jmcdow...@paloaltonetworks.com> wrote:
>
> Mickey,
>
>
>
> Here are the proposed changes to address issues you brought up,  if these
> changes are acceptable I will add them and re-submit the patch.
>
>
>
>1. Support for Non-IP Traffic: The current code uses ipv4.src and
>ipv4.dst to identify flow source and destination. Change this to eth.src
>and eth.dst, there is no reason to require IP. This also addresses IPv6
>issues. If a user wants to add additional IP filtering they can add it to
>the “match” filter.
>
> This would resolve the issue with non-IP traffic.
>
>
>
> Do you want to support VNFs that modify the source or destination ethernet
> address?
>
> If so, that would require further investigation and additional logic.
>
>
>
>
>2. Match filter: Can cause conflicts but not really any different from
>ACL where user can construct conflicting filters. I will try and add some
>light checking (warnings) to catch any major conflicts.
>
> Are you referring to this issue that I had mentioned?
>
>
>
> a) If two different logical port chain classifiers with different "match"
> were specified on the same port, with different logical port chains that
> share at least one logical port pair, then the IP address does not
> distinguish between the two logical port chains.
>
>
>
> This seems like a tricky issue. I don't see any way to resolve it unless
> you use a tunneling format that specifies a chain ID such as NSH.
>
>
>
> If it is not resolved, then people have to think carefully whenever they
> specify a "match" condition. IMO this is less obvious than ACL conflicts,
> since it involves interactions between the "match" conditions and the
> logical port chain definitions.
>
>
>
>
>3. End of the service chain (MAC Learning Issue): I will add a new
>rule at the end of the chain to return the flow to the original src to
>ensure that the packet header is “correct” when leaving the host. I think
>I can set the recirculate flag at the end of the chain and then detect the
>flag and send it to the original destination. This will ensure that no
>packet leaves OVN with an “incorrect” header.
>
> When you say "original src", I assume you mean the location where eth.src
> resides?
>
>
>
> I am not sure what you mean by the "recirculate flag"?
>
> Are you referring to "flags.loopback"? Lots of things set that including
> DHCP and ARP responder, so you cannot infer much from the fact that the
> flag is set. In any case, that just lets the outport be the same as the
> inport. It does not determine what the outport is.
>
>
>
> To do item #3 I need some help composing the rule to recirculate the flow
> back t

[ovs-dev] [PATCH v6 2/2] ovn: Gratuitous ARP for distributed NAT rules

2017-03-30 Thread Mickey Spiegel
This patch extends gratuitous ARP support for NAT addresses so that it
applies to distributed NAT rules on a distributed logical router.
Distributed NAT rules have type "dnat_and_snat" and specify
'external_mac' and 'logical_port'.

Gratuitous ARP packets for distributed NAT rules are only generated on
the chassis where the 'logical_port' specified in the NAT rule resides.
Gratuitous ARPs are issued for the 'external_ip' address, resolving to
the 'external_mac'.

Since the MAC address varies for each distributed NAT rule, a separate
'nat_addresses' string must be generated for each distributed NAT rule.
For this reason, in the southbound 'Port_Binding',
'options:nat-addresses' is replaced by a 'nat_addresses' column that
can have an unlimited number of instances.  In order to allow for
upgrades, pinctrl in the ovn-controller can work off either the
'nat_addresses' column (if present), or 'options:nat-addresses'
otherwise.

Signed-off-by: Mickey Spiegel <mickeys@gmail.com>
---
 NEWS |   1 +
 ovn/controller/pinctrl.c | 108 +--
 ovn/northd/ovn-northd.c  |  92 
 ovn/ovn-sb.ovsschema |   9 ++--
 ovn/ovn-sb.xml   |  17 ++--
 tests/ovn.at |  64 +---
 6 files changed, 211 insertions(+), 80 deletions(-)
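
As an illustration of the upgrade behavior described above, a minimal sketch in shell (not the pinctrl code itself, which is C): prefer the entries of the new 'nat_addresses' column when any are present, otherwise fall back to 'options:nat-addresses'. The function name and the sample values are made up for the example.

```shell
#!/bin/sh
# $1 = entries from the 'nat_addresses' column, one per line (may be empty)
# $2 = legacy 'options:nat-addresses' string (may be empty)
# Prints the address strings that GARP generation would work from.
select_nat_addresses() {
    if [ -n "$1" ]; then
        printf '%s\n' "$1"
    elif [ -n "$2" ]; then
        printf '%s\n' "$2"
    fi
}

# New schema: one string per distributed NAT rule, each with its own MAC.
select_nat_addresses '00:00:00:00:00:0a 192.0.2.10
00:00:00:00:00:0b 192.0.2.11' ''

# Old schema: a single string taken from options:nat-addresses.
select_nat_addresses '' '00:00:00:00:00:01 192.0.2.1 192.0.2.2'
```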

diff --git a/NEWS b/NEWS
index 00c9106..ec8572a 100644
--- a/NEWS
+++ b/NEWS
@@ -15,6 +15,7 @@ Post-v2.7.0
  "dot1q-tunnel" port VLAN mode.
- OVN:
  * Make the DHCPv4 router setting optional.
+ * Gratuitous ARP for NAT addresses on a distributed logical router.
- Add the command 'ovs-appctl stp/show' (see ovs-vswitchd(8)).
 
 v2.7.0 - 21 Feb 2017
diff --git a/ovn/controller/pinctrl.c b/ovn/controller/pinctrl.c
index e564a30..01259e2 100644
--- a/ovn/controller/pinctrl.c
+++ b/ovn/controller/pinctrl.c
@@ -1056,21 +1056,23 @@ send_garp_update(const struct sbrec_port_binding 
*binding_rec,
 if (!strcmp(binding_rec->type, "l3gateway")
 || !strcmp(binding_rec->type, "patch")) {
 struct lport_addresses *laddrs = NULL;
-laddrs = shash_find_data(nat_addresses, binding_rec->logical_port);
-if (!laddrs) {
-return;
-}
-int i;
-for (i = 0; i < laddrs->n_ipv4_addrs; i++) {
-char *name = xasprintf("%s-%s", binding_rec->logical_port,
-laddrs->ipv4_addrs[i].addr_s);
-garp = shash_find_data(&send_garp_data, name);
-if (garp) {
-garp->ofport = ofport;
-} else {
-add_garp(name, ofport, laddrs->ea, laddrs->ipv4_addrs[i].addr);
+while ((laddrs = shash_find_and_delete(nat_addresses,
+   binding_rec->logical_port))) {
+int i;
+for (i = 0; i < laddrs->n_ipv4_addrs; i++) {
+char *name = xasprintf("%s-%s", binding_rec->logical_port,
+laddrs->ipv4_addrs[i].addr_s);
+garp = shash_find_data(&send_garp_data, name);
+if (garp) {
+garp->ofport = ofport;
+} else {
+add_garp(name, ofport, laddrs->ea,
+ laddrs->ipv4_addrs[i].addr);
+}
+free(name);
 }
-free(name);
+destroy_lport_addresses(laddrs);
+free(laddrs);
 }
 return;
 }
@@ -1304,6 +1306,42 @@ extract_addresses_with_port(const char *addresses,
 }
 
 static void
+consider_nat_address(const char *nat_address,
+ const struct sbrec_port_binding *pb,
+ struct sset *nat_address_keys,
+ const struct lport_index *lports,
+ const struct sbrec_chassis *chassis,
+ struct shash *nat_addresses)
+{
+struct lport_addresses *laddrs = xmalloc(sizeof *laddrs);
+char *lport = NULL;
+if (!extract_addresses_with_port(nat_address, laddrs, &lport)
+|| (!lport && !strcmp(pb->type, "patch"))) {
+free(laddrs);
+if (lport) {
+free(lport);
+}
+return;
+} else if (lport) {
+if (!pinctrl_is_chassis_resident(lports, chassis, lport)) {
+free(laddrs);
+free(lport);
+return;
+}
+free(lport);
+}
+
+int i;
+for (i = 0; i < laddrs->n_ipv4_addrs; i++) {
+char *name = xasprintf("%s-%s", pb->logical_port,
+laddrs->ipv4_addrs[i].addr_s);
+sset_add(nat_address_keys, name);
+free(name);
+}
+shash_add(nat_addresses, pb->logical_port, laddrs);
+}
+
+static void
 get

[ovs-dev] [PATCH v6 1/2] ovn: Gratuitous ARP for centralized NAT rules on a distributed router

2017-03-30 Thread Mickey Spiegel
This patch extends gratuitous ARP support for NAT addresses so that it
applies to centralized NAT rules on a distributed router, in addition to
the existing gratuitous ARP support for NAT addresses on gateway routers.
Centralized NAT rules have type other than "dnat_and_snat", or have type
"dnat_and_snat" but do not specify external_mac or logical_port.  These
NAT rules apply on the redirect-chassis.

Gratuitous ARP packets for centralized NAT rules on a distributed router
are only generated on the redirect-chassis.  This is achieved by extending
the syntax for "options:nat-addresses" in the southbound database,
allowing the condition 'is_chassis_resident("LPORT_NAME")' to be appended
after the MAC and IP addresses.  This condition is automatically inserted
by ovn-northd when the northbound "options:nat-addresses" is set to
"router" and the peer is a distributed gateway port.

A separate patch will be required to support gratuitous ARP for
distributed NAT rules that specify logical_port and external_mac.  Since
the MAC address differs and the logical port often resides on a different
chassis from the redirect-chassis, these addresses cannot be included in
the same "nat-addresses" string as for centralized NAT rules.

Signed-off-by: Mickey Spiegel <mickeys@gmail.com>
Acked-by: Gurucharan Shetty <g...@ovn.org>
---
 ovn/controller/pinctrl.c | 115 ---
 ovn/lib/ovn-util.c   |  38 +---
 ovn/lib/ovn-util.h   |   2 +
 ovn/northd/ovn-northd.c  |  52 +++--
 ovn/ovn-nb.xml   |  38 +---
 ovn/ovn-sb.xml   |  32 +
 tests/ovn.at |  75 +++
 7 files changed, 310 insertions(+), 42 deletions(-)
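
For reviewers who want to see the extended "nat-addresses" syntax in action, here is a rough sketch of splitting such a string into its parts. It is written in shell for illustration only (the real parsing in this patch is done in C in ovn-util.c); the function name, addresses, and port name below are made up.

```shell
#!/bin/sh
# Split 'MAC IP [IP...] [is_chassis_resident("LPORT")]' into its parts and
# print them as "mac=... ips=... lport=...".  lport is empty when the
# condition is absent.
parse_nat_addresses() {
    addrs=$1
    lport=
    case $addrs in
    *' is_chassis_resident("'*'")')
        lport=${addrs##*' is_chassis_resident("'}
        lport=${lport%'")'}
        addrs=${addrs%%' is_chassis_resident('*}
        ;;
    esac
    mac=${addrs%% *}     # first word is the MAC address
    ips=${addrs#* }      # the rest are the NAT IP addresses
    printf 'mac=%s ips=%s lport=%s\n' "$mac" "$ips" "$lport"
}

parse_nat_addresses \
    '00:00:00:00:00:01 192.0.2.10 192.0.2.11 is_chassis_resident("cr-lrp")'
parse_nat_addresses '00:00:00:00:00:01 192.0.2.10'
```

When lport is non-empty, gratuitous ARPs would only be sent on the chassis where that port is resident, which is how they end up limited to the redirect-chassis.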

diff --git a/ovn/controller/pinctrl.c b/ovn/controller/pinctrl.c
index b342189..e564a30 100644
--- a/ovn/controller/pinctrl.c
+++ b/ovn/controller/pinctrl.c
@@ -37,6 +37,7 @@
 #include "lib/dhcp.h"
 #include "ovn-controller.h"
 #include "ovn/actions.h"
+#include "ovn/lex.h"
 #include "ovn/lib/logical-fields.h"
 #include "ovn/lib/ovn-dhcp.h"
 #include "ovn/lib/ovn-util.h"
@@ -1048,8 +1049,12 @@ send_garp_update(const struct sbrec_port_binding 
*binding_rec,
  ld->localnet_port->logical_port));
 
 volatile struct garp_data *garp = NULL;
-/* Update GARP for NAT IP if it exists. */
-if (!strcmp(binding_rec->type, "l3gateway")) {
+/* Update GARP for NAT IP if it exists.  Consider port bindings with type
+ * "l3gateway" for logical switch ports attached to gateway routers, and
+ * port bindings with type "patch" for logical switch ports attached to
+ * distributed gateway ports. */
+if (!strcmp(binding_rec->type, "l3gateway")
+|| !strcmp(binding_rec->type, "patch")) {
 struct lport_addresses *laddrs = NULL;
 laddrs = shash_find_data(nat_addresses, binding_rec->logical_port);
 if (!laddrs) {
@@ -1173,6 +1178,7 @@ get_localnet_vifs_l3gwports(const struct ovsrec_bridge *br_int,
 if (!iface_rec->n_ofport) {
 continue;
 }
+/* Get localnet port with its ofport. */
 if (localnet) {
 int64_t ofport = iface_rec->ofport[0];
 if (ofport < 1 || ofport > ofp_to_u16(OFPP_MAX)) {
@@ -1181,6 +1187,7 @@ get_localnet_vifs_l3gwports(const struct ovsrec_bridge *br_int,
 simap_put(localnet_ofports, localnet, ofport);
 continue;
 }
+/* Get localnet vif. */
 const char *iface_id = smap_get(&iface_rec->external_ids,
 "iface-id");
 if (!iface_id) {
@@ -1202,24 +1209,105 @@ get_localnet_vifs_l3gwports(const struct ovsrec_bridge *br_int,
 
 const struct local_datapath *ld;
 HMAP_FOR_EACH (ld, hmap_node, local_datapaths) {
-if (!ld->has_local_l3gateway) {
+if (!ld->localnet_port) {
 continue;
 }
 
+/* Get l3gw ports.  Consider port bindings with type "l3gateway"
+ * that connect to gateway routers (if local), and consider port
+ * bindings of type "patch" since they might connect to
+ * distributed gateway ports with NAT addresses. */
 for (size_t i = 0; i < ld->ldatapath->n_lports; i++) {
 const struct sbrec_port_binding *pb = ld->ldatapath->lports[i];
-if (!strcmp(pb->type, "l3gateway")
-/* && it's on this chassis */) {
+if ((ld->has_local_l3gateway && !strcmp(pb->type, "l3gateway"))
+|| !strcmp(pb->

Re: [ovs-dev] [PATCH v5 3/3] ovn: Gratuitous ARP for distributed NAT rules

2017-03-29 Thread Mickey Spiegel
On Wed, Mar 29, 2017 at 10:16 AM, Guru Shetty <g...@ovn.org> wrote:

>
>
> On 27 March 2017 at 18:34, Mickey Spiegel <mickeys@gmail.com> wrote:
>
>> This patch extends gratuitous ARP support for NAT addresses so that it
>> applies to distributed NAT rules on a distributed logical router.
>> Distributed NAT rules have type "dnat_and_snat" and specify
>> 'external_mac' and 'logical_port'.
>>
>> Gratuitous ARP packets for distributed NAT rules are only generated on
>> the chassis where the 'logical_port' specified in the NAT rule resides.
>> Gratuitous ARPs are issued for the 'external_ip' address, resolving to
>> the 'external_mac'.
>>
>> Since the MAC address varies for each distributed NAT rule, a separate
>> 'nat_addresses' string must be generated for each distributed NAT rule.
>> For this reason, in the southbound 'Port_Binding',
>> 'options:nat-addresses' is replaced by a 'nat_addresses' column that
>> can have an unlimited number of instances.  In order to allow for
>> upgrades, pinctrl in the ovn-controller can work off either the
>> 'nat_addresses' column (if present), or 'options:nat-addresses'
>> otherwise.
>>
>> Signed-off-by: Mickey Spiegel <mickeys@gmail.com>
>>
>
> I get a simple warning when I compile this:
> ovn/controller/pinctrl.c: In function ‘send_garp_update’:
> ovn/controller/pinctrl.c:1060:47: warning: suggest parentheses around
> assignment used as truth value [-Wparentheses]
>binding_rec->logical_port))
> {
>

Added parentheses.
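
For anyone who has not hit this warning before, here is a minimal standalone
sketch of the idiom. It is not the pinctrl.c code; the `node` type and the
`pop()` helper are made up for illustration. The only point is the doubled
parentheses around an assignment that is used as a loop condition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical singly linked list, for illustration only. */
struct node {
    int value;
    struct node *next;
};

/* Detaches and returns the first node of '*head', or NULL if it is empty. */
static struct node *
pop(struct node **head)
{
    struct node *n = *head;
    if (n) {
        *head = n->next;
        n->next = NULL;
    }
    return n;
}

/* Sums and drains the list.  The extra parentheses tell the compiler that
 * the assignment inside the condition is intentional, which silences
 * -Wparentheses. */
static int
drain_sum(struct node **head)
{
    struct node *n;
    int sum = 0;
    while ((n = pop(head))) {
        sum += n->value;
    }
    return sum;
}
```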


>
> If I run the test with valgrind enabled, the test fails. e.g:
> make check-valgrind TESTSUITEFLAGS="2312"
>
> ./ovn.at:6767: sort packets
> --- expout  2017-03-28 23:57:05.117974381 -0700
> +++ /root/git/openvswitch/tests/testsuite.dir/at-groups/2312/stdout
> 2017-03-28 23:57:05.117974381 -0700
> @@ -1,4 +1,4 @@
>  f00108060001080006040001f001c0a8
> 0001c0a80001
> +f00108060001080006040001f001c0a8
> 0001c0a80001
> +f00108060001080006040001f001c0a8
> 0002c0a80002
>  f00108060001080006040001f001c0a8
> 0002c0a80002
> -f00308060001080006040001f003c0a8
> 0003c0a80003
> -f00408060001080006040001f004c0a8
> 0004c0a80004
>

Race conditions. I think I have to clean up completely and start again to
make the timing more deterministic.

I hit a race condition with make check-valgrind on the previous patch. It
is running so slowly that it catches the first GARP packet and runs the
check before the second GARP packet arrives. This is fixed by simply
bumping up the wait condition from 50 bytes to 100 bytes.
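
A rough arithmetic sketch of why 50 bytes can be too small a threshold. It
assumes the wait condition checks the size of a classic pcap capture file and
that each GARP is an unpadded 42-byte Ethernet ARP frame; both are
assumptions for illustration, not something stated above. A classic pcap
file has a 24-byte global header and a 16-byte header per record.

```c
#include <assert.h>
#include <stddef.h>

enum {
    PCAP_GLOBAL_HDR_LEN = 24,   /* Classic pcap file header. */
    PCAP_RECORD_HDR_LEN = 16,   /* Per-packet record header. */
    GARP_FRAME_LEN = 42         /* 14-byte Ethernet + 28-byte ARP (assumed). */
};

/* Size in bytes of a pcap file holding 'n' frames of 'frame_len' bytes. */
static size_t
pcap_file_size(size_t n, size_t frame_len)
{
    return PCAP_GLOBAL_HDR_LEN + n * (PCAP_RECORD_HDR_LEN + frame_len);
}

/* Smallest number of frames that makes the file at least 'threshold'
 * bytes long. */
static size_t
min_frames_for_threshold(size_t threshold, size_t frame_len)
{
    size_t n = 0;
    while (pcap_file_size(n, frame_len) < threshold) {
        n++;
    }
    return n;
}
```

Under these assumptions a 50-byte threshold is already met by one captured
frame (82 bytes), while a 100-byte threshold is only met once the second
frame has been written (140 bytes).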

Mickey
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v5 2/3] ovn: Gratuitous ARP for centralized NAT rules on a distributed router

2017-03-27 Thread Mickey Spiegel
This patch extends gratuitous ARP support for NAT addresses so that it
applies to centralized NAT rules on a distributed router, in addition to
the existing gratuitous ARP support for NAT addresses on gateway routers.
Centralized NAT rules have type other than "dnat_and_snat", or have type
"dnat_and_snat" but do not specify external_mac or logical_port.  These
NAT rules apply on the redirect-chassis.
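
The classification above can be sketched as a small predicate. This is a
standalone illustration, not the ovn-northd code: `struct nat_rule` is a
hypothetical, simplified stand-in for the northbound NAT row, and it reads
"do not specify external_mac or logical_port" as "at least one of the two
is missing".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of a NAT rule; not the IDL-generated
 * struct.  Unset optional columns are represented as NULL. */
struct nat_rule {
    const char *type;           /* "snat", "dnat", or "dnat_and_snat". */
    const char *external_mac;   /* NULL if not specified. */
    const char *logical_port;   /* NULL if not specified. */
};

/* A rule is distributed only if it is "dnat_and_snat" and specifies both
 * 'external_mac' and 'logical_port'. */
static bool
nat_rule_is_distributed(const struct nat_rule *nat)
{
    return !strcmp(nat->type, "dnat_and_snat")
           && nat->external_mac && nat->logical_port;
}

/* Everything else is centralized and applies on the redirect-chassis. */
static bool
nat_rule_is_centralized(const struct nat_rule *nat)
{
    return !nat_rule_is_distributed(nat);
}
```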

Gratuitous ARP packets for centralized NAT rules on a distributed router
are only generated on the redirect-chassis.  This is achieved by extending
the syntax for "options:nat-addresses" in the southbound database,
allowing the condition 'is_chassis_resident("LPORT_NAME")' to be appended
after the MAC and IP addresses.  This condition is automatically inserted
by ovn-northd when the northbound "options:nat-addresses" is set to
"router" and the peer is a distributed gateway port.

A separate patch will be required to support gratuitous ARP for
distributed NAT rules that specify logical_port and external_mac.  Since
the MAC address differs and the logical port often resides on a different
chassis from the redirect-chassis, these addresses cannot be included in
the same "nat-addresses" string as for centralized NAT rules.
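
To make the extended syntax concrete, a southbound string might look like
'f0:00:00:00:00:01 192.168.0.2 is_chassis_resident("cr-lrp0")', where the
port name "cr-lrp0" is only an illustrative placeholder. The sketch below
pulls the port name out of such a string using plain libc. It is a naive
illustration, not the lexer-based parser in the patch, and it does not
validate the MAC or IP addresses.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define LPORT_NAME_MAX 64

/* Looks for 'is_chassis_resident("NAME")' in 'addresses' and, if present,
 * copies NAME into 'lport'.  Returns true if the condition was found and
 * NAME is non-empty and fits, false otherwise. */
static bool
find_chassis_resident_port(const char *addresses, char lport[LPORT_NAME_MAX])
{
    static const char prefix[] = "is_chassis_resident(\"";
    const char *start = strstr(addresses, prefix);
    if (!start) {
        return false;
    }
    start += strlen(prefix);

    const char *end = strstr(start, "\")");
    if (!end || end == start || (size_t) (end - start) >= LPORT_NAME_MAX) {
        return false;
    }
    memcpy(lport, start, end - start);
    lport[end - start] = '\0';
    return true;
}
```

A caller would then compare the extracted name against the ports bound to
the local chassis and skip the GARP if the port is not resident.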

Signed-off-by: Mickey Spiegel <mickeys@gmail.com>
---
 ovn/controller/pinctrl.c | 115 ---
 ovn/lib/ovn-util.c   |  38 +---
 ovn/lib/ovn-util.h   |   2 +
 ovn/northd/ovn-northd.c  |  52 +++--
 ovn/ovn-nb.xml   |  38 +---
 ovn/ovn-sb.xml   |  32 +
 tests/ovn.at |  75 +++
 7 files changed, 310 insertions(+), 42 deletions(-)

diff --git a/ovn/controller/pinctrl.c b/ovn/controller/pinctrl.c
index b342189..e564a30 100644
--- a/ovn/controller/pinctrl.c
+++ b/ovn/controller/pinctrl.c
@@ -37,6 +37,7 @@
 #include "lib/dhcp.h"
 #include "ovn-controller.h"
 #include "ovn/actions.h"
+#include "ovn/lex.h"
 #include "ovn/lib/logical-fields.h"
 #include "ovn/lib/ovn-dhcp.h"
 #include "ovn/lib/ovn-util.h"
@@ -1048,8 +1049,12 @@ send_garp_update(const struct sbrec_port_binding *binding_rec,
  ld->localnet_port->logical_port));
 
 volatile struct garp_data *garp = NULL;
-/* Update GARP for NAT IP if it exists. */
-if (!strcmp(binding_rec->type, "l3gateway")) {
+/* Update GARP for NAT IP if it exists.  Consider port bindings with type
+ * "l3gateway" for logical switch ports attached to gateway routers, and
+ * port bindings with type "patch" for logical switch ports attached to
+ * distributed gateway ports. */
+if (!strcmp(binding_rec->type, "l3gateway")
+|| !strcmp(binding_rec->type, "patch")) {
 struct lport_addresses *laddrs = NULL;
 laddrs = shash_find_data(nat_addresses, binding_rec->logical_port);
 if (!laddrs) {
@@ -1173,6 +1178,7 @@ get_localnet_vifs_l3gwports(const struct ovsrec_bridge *br_int,
 if (!iface_rec->n_ofport) {
 continue;
 }
+/* Get localnet port with its ofport. */
 if (localnet) {
 int64_t ofport = iface_rec->ofport[0];
 if (ofport < 1 || ofport > ofp_to_u16(OFPP_MAX)) {
@@ -1181,6 +1187,7 @@ get_localnet_vifs_l3gwports(const struct ovsrec_bridge *br_int,
 simap_put(localnet_ofports, localnet, ofport);
 continue;
 }
+/* Get localnet vif. */
 const char *iface_id = smap_get(&iface_rec->external_ids,
 "iface-id");
 if (!iface_id) {
@@ -1202,24 +1209,105 @@ get_localnet_vifs_l3gwports(const struct ovsrec_bridge *br_int,
 
 const struct local_datapath *ld;
 HMAP_FOR_EACH (ld, hmap_node, local_datapaths) {
-if (!ld->has_local_l3gateway) {
+if (!ld->localnet_port) {
 continue;
 }
 
+/* Get l3gw ports.  Consider port bindings with type "l3gateway"
+ * that connect to gateway routers (if local), and consider port
+ * bindings of type "patch" since they might connect to
+ * distributed gateway ports with NAT addresses. */
 for (size_t i = 0; i < ld->ldatapath->n_lports; i++) {
 const struct sbrec_port_binding *pb = ld->ldatapath->lports[i];
-if (!strcmp(pb->type, "l3gateway")
-/* && it's on this chassis */) {
+if ((ld->has_local_l3gateway && !strcmp(pb->type, "l3gateway"))
+|| !strcmp(pb->type, "patch")) {

[ovs-dev] [PATCH v5 3/3] ovn: Gratuitous ARP for distributed NAT rules

2017-03-27 Thread Mickey Spiegel
This patch extends gratuitous ARP support for NAT addresses so that it
applies to distributed NAT rules on a distributed logical router.
Distributed NAT rules have type "dnat_and_snat" and specify
'external_mac' and 'logical_port'.

Gratuitous ARP packets for distributed NAT rules are only generated on
the chassis where the 'logical_port' specified in the NAT rule resides.
Gratuitous ARPs are issued for the 'external_ip' address, resolving to
the 'external_mac'.
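
As a concrete picture of what such a packet looks like on the wire, the
sketch below builds a 42-byte gratuitous ARP in its request form: broadcast
destination, sender and target IP both set to the announced address. The
function name and layout are illustrative; this is not the packet
composition code used by pinctrl.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { MAC_LEN = 6, IP4_LEN = 4, GARP_LEN = 42 };

/* Fills 'frame' with a gratuitous ARP request announcing that 'ip' is at
 * 'mac'.  Layout: 14-byte Ethernet header, then a 28-byte ARP payload. */
static void
build_garp(uint8_t frame[GARP_LEN], const uint8_t mac[MAC_LEN],
           const uint8_t ip[IP4_LEN])
{
    static const uint8_t arp_fixed[] = {
        0x08, 0x06,     /* Ethertype: ARP. */
        0x00, 0x01,     /* Hardware type: Ethernet. */
        0x08, 0x00,     /* Protocol type: IPv4. */
        0x06, 0x04,     /* Hardware and protocol address lengths. */
        0x00, 0x01,     /* Opcode: request. */
    };
    uint8_t *p = frame;

    memset(p, 0xff, MAC_LEN);       /* Ethernet destination: broadcast. */
    p += MAC_LEN;
    memcpy(p, mac, MAC_LEN);        /* Ethernet source. */
    p += MAC_LEN;
    memcpy(p, arp_fixed, sizeof arp_fixed);
    p += sizeof arp_fixed;
    memcpy(p, mac, MAC_LEN);        /* ARP sender MAC (external_mac). */
    p += MAC_LEN;
    memcpy(p, ip, IP4_LEN);         /* ARP sender IP (external_ip). */
    p += IP4_LEN;
    memset(p, 0, MAC_LEN);          /* ARP target MAC: all zeros. */
    p += MAC_LEN;
    memcpy(p, ip, IP4_LEN);         /* ARP target IP equals sender IP. */
    p += IP4_LEN;
    assert(p - frame == GARP_LEN);
}
```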

Since the MAC address varies for each distributed NAT rule, a separate
'nat_addresses' string must be generated for each distributed NAT rule.
For this reason, in the southbound 'Port_Binding',
'options:nat-addresses' is replaced by a 'nat_addresses' column that
can have an unlimited number of instances.  In order to allow for
upgrades, pinctrl in the ovn-controller can work off either the
'nat_addresses' column (if present), or 'options:nat-addresses'
otherwise.
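
The upgrade fallback described above can be sketched as follows. This is a
simplified illustration with a hypothetical `struct port_binding`, not the
IDL-generated `sbrec_port_binding`. It treats "column present" as "column
non-empty".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a southbound Port_Binding row. */
struct port_binding {
    const char **nat_addresses;         /* New column; may hold many strings. */
    size_t n_nat_addresses;
    const char *options_nat_addresses;  /* Old options:nat-addresses or NULL. */
};

typedef void nat_address_cb(const char *nat_address, void *aux);

/* Calls 'cb' once per address string, preferring the new column and falling
 * back to the old option only when the column is empty.  Returns the number
 * of strings visited. */
static size_t
for_each_nat_address(const struct port_binding *pb, nat_address_cb *cb,
                     void *aux)
{
    if (pb->n_nat_addresses) {
        for (size_t i = 0; i < pb->n_nat_addresses; i++) {
            cb(pb->nat_addresses[i], aux);
        }
        return pb->n_nat_addresses;
    } else if (pb->options_nat_addresses) {
        cb(pb->options_nat_addresses, aux);
        return 1;
    }
    return 0;
}

/* Example callback: counts the strings it is handed in '*aux'. */
static void
count_nat_address(const char *nat_address, void *aux)
{
    (void) nat_address;
    ++*(size_t *) aux;
}
```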

Signed-off-by: Mickey Spiegel <mickeys@gmail.com>
---
 NEWS |   1 +
 ovn/controller/pinctrl.c | 108 +--
 ovn/northd/ovn-northd.c  |  85 +
 ovn/ovn-sb.ovsschema |   9 ++--
 ovn/ovn-sb.xml   |  17 ++--
 tests/ovn.at |  45 +---
 6 files changed, 185 insertions(+), 80 deletions(-)

diff --git a/NEWS b/NEWS
index 00c9106..ec8572a 100644
--- a/NEWS
+++ b/NEWS
@@ -15,6 +15,7 @@ Post-v2.7.0
  "dot1q-tunnel" port VLAN mode.
- OVN:
  * Make the DHCPv4 router setting optional.
+ * Gratuitous ARP for NAT addresses on a distributed logical router.
- Add the command 'ovs-appctl stp/show' (see ovs-vswitchd(8)).
 
 v2.7.0 - 21 Feb 2017
diff --git a/ovn/controller/pinctrl.c b/ovn/controller/pinctrl.c
index e564a30..50b010a 100644
--- a/ovn/controller/pinctrl.c
+++ b/ovn/controller/pinctrl.c
@@ -1056,21 +1056,23 @@ send_garp_update(const struct sbrec_port_binding *binding_rec,
 if (!strcmp(binding_rec->type, "l3gateway")
 || !strcmp(binding_rec->type, "patch")) {
 struct lport_addresses *laddrs = NULL;
-laddrs = shash_find_data(nat_addresses, binding_rec->logical_port);
-if (!laddrs) {
-return;
-}
-int i;
-for (i = 0; i < laddrs->n_ipv4_addrs; i++) {
-char *name = xasprintf("%s-%s", binding_rec->logical_port,
-laddrs->ipv4_addrs[i].addr_s);
-garp = shash_find_data(&send_garp_data, name);
-if (garp) {
-garp->ofport = ofport;
-} else {
-add_garp(name, ofport, laddrs->ea, laddrs->ipv4_addrs[i].addr);
+while (laddrs = shash_find_and_delete(nat_addresses,
+  binding_rec->logical_port)) {
+int i;
+for (i = 0; i < laddrs->n_ipv4_addrs; i++) {
+char *name = xasprintf("%s-%s", binding_rec->logical_port,
+laddrs->ipv4_addrs[i].addr_s);
+garp = shash_find_data(&send_garp_data, name);
+if (garp) {
+garp->ofport = ofport;
+} else {
+add_garp(name, ofport, laddrs->ea,
+ laddrs->ipv4_addrs[i].addr);
+}
+free(name);
 }
-free(name);
+destroy_lport_addresses(laddrs);
+free(laddrs);
 }
 return;
 }
@@ -1304,6 +1306,42 @@ extract_addresses_with_port(const char *addresses,
 }
 
 static void
+consider_nat_address(const char *nat_address,
+ const struct sbrec_port_binding *pb,
+ struct sset *nat_address_keys,
+ const struct lport_index *lports,
+ const struct sbrec_chassis *chassis,
+ struct shash *nat_addresses)
+{
+struct lport_addresses *laddrs = xmalloc(sizeof *laddrs);
+char *lport = NULL;
+if (!extract_addresses_with_port(nat_address, laddrs, &lport)
+|| (!lport && !strcmp(pb->type, "patch"))) {
+free(laddrs);
+if (lport) {
+free(lport);
+}
+return;
+} else if (lport) {
+if (!pinctrl_is_chassis_resident(lports, chassis, lport)) {
+free(laddrs);
+free(lport);
+return;
+}
+free(lport);
+}
+
+int i;
+for (i = 0; i < laddrs->n_ipv4_addrs; i++) {
+char *name = xasprintf("%s-%s", pb->logical_port,
+laddrs->ipv4_addrs[i].addr_s);
+sset_add(nat_address_keys, name);
+free(name);
+}
+shash_add(nat_addresses, pb->logical_port, laddrs);
+}
+
+static void
 get_nat_add

[ovs-dev] [PATCH v5 1/3] ovn: Fix options:router-port in Gratuitous ARP tests

2017-03-27 Thread Mickey Spiegel
In two of the Gratuitous ARP tests, "options:router-port"
is not set correctly.  This does not currently affect
validity of the tests since the next line resets
"options:router-port" to the correct value.

Reported-by: Gurucharan Shetty <g...@ovn.org>
Signed-off-by: Mickey Spiegel <mickeys@gmail.com>
---
 tests/ovn.at | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tests/ovn.at b/tests/ovn.at
index bbbec90..8b7ba12 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -5266,7 +5266,7 @@ ovn-nbctl create Logical_Router name=lr0 options:chassis=hv1
 # Add router port to gateway router
 ovn-nbctl lrp-add lr0 lrp0 f0:00:00:00:00:01 192.168.0.1/24
 ovn-nbctl lsp-add ls0 lrp0-rp -- set Logical_Switch_Port lrp0-rp \
-type=router options:router-port=lrp0-rp addresses='"f0:00:00:00:00:01"'
+type=router options:router-port=lrp0 addresses='"f0:00:00:00:00:01"'
 # Add nat-address option
 ovn-nbctl lsp-set-options lrp0-rp router-port=lrp0 nat-addresses="f0:00:00:00:00:01 192.168.0.2"
 
@@ -5314,7 +5314,7 @@ ovn-nbctl create Logical_Router name=lr0 options:chassis=hv1
 # Add router port to gateway router
 ovn-nbctl lrp-add lr0 lrp0 f0:00:00:00:00:01 192.168.0.1/24
 ovn-nbctl lsp-add ls0 lrp0-rp -- set Logical_Switch_Port lrp0-rp \
-type=router options:router-port=lrp0-rp addresses='"f0:00:00:00:00:01"'
+type=router options:router-port=lrp0 addresses='"f0:00:00:00:00:01"'
 # Add nat-address option
 ovn-nbctl lsp-set-options lrp0-rp router-port=lrp0 nat-addresses="router"
 # Add NAT rules
-- 
1.9.1



Re: [ovs-dev] [PATCH v4 1/2] ovn: Gratuitous ARP for centralized NAT rules on a distributed router

2017-03-21 Thread Mickey Spiegel
On Tue, Mar 21, 2017 at 1:39 PM, Guru Shetty <g...@ovn.org> wrote:

>
>
> On 17 March 2017 at 15:30, Mickey Spiegel <mickeys@gmail.com> wrote:
>
>> This patch extends gratuitous ARP support for NAT addresses so that it
>> applies to centralized NAT rules on a distributed router, in addition to
>> the existing gratuitous ARP support for NAT addresses on gateway routers.
>>
>> Gratuitous ARP packets for centralized NAT rules on a distributed router
>> are only generated on the redirect-chassis.
>
>
> A comment here on what centralized NAT rules are will be useful when this
> is seen a couple of months from now.
>

I will add the explanation of centralized NAT rules.


>
>
>> This is achieved by extending
>> the syntax for "options:nat-addresses" in the southbound database,
>> allowing the condition 'is_chassis_resident("LPORT_NAME")' to be appended
>> after the MAC and IP addresses.  This condition is automatically inserted
>> by ovn-northd when the northbound "options:nat-addresses" is set to
>> "router" and the peer is a distributed gateway port.
>>
>> A separate patch will be required to support gratuitous ARP for
>> distributed NAT rules that specify logical_port and external_mac.  Since
>> the MAC address differs and the logical port often resides on a different
>> chassis from the redirect-chassis, these addresses cannot be included in
>> the same "nat-addresses" string as for centralized NAT rules.
>>
>> Signed-off-by: Mickey Spiegel <mickeys@gmail.com>
>> ---
>>  ovn/controller/pinctrl.c | 104 ++
>> ++---
>>  ovn/lib/ovn-util.c   |  38 ++---
>>  ovn/lib/ovn-util.h   |   2 +
>>  ovn/northd/ovn-northd.c  |  52 +---
>>  ovn/ovn-nb.xml   |  33 ---
>>  ovn/ovn-sb.xml   |  31 ++
>>  tests/ovn.at |  70 +++
>>  7 files changed, 289 insertions(+), 41 deletions(-)
>>
>> diff --git a/ovn/controller/pinctrl.c b/ovn/controller/pinctrl.c
>> index b342189..08af792 100644
>> --- a/ovn/controller/pinctrl.c
>> +++ b/ovn/controller/pinctrl.c
>> @@ -37,6 +37,7 @@
>>  #include "lib/dhcp.h"
>>  #include "ovn-controller.h"
>>  #include "ovn/actions.h"
>> +#include "ovn/lex.h"
>>  #include "ovn/lib/logical-fields.h"
>>  #include "ovn/lib/ovn-dhcp.h"
>>  #include "ovn/lib/ovn-util.h"
>> @@ -1049,7 +1050,8 @@ send_garp_update(const struct sbrec_port_binding
>> *binding_rec,
>>
>>  volatile struct garp_data *garp = NULL;
>>  /* Update GARP for NAT IP if it exists. */
>> -if (!strcmp(binding_rec->type, "l3gateway")) {
>> +if (!strcmp(binding_rec->type, "l3gateway")
>> +|| !strcmp(binding_rec->type, "patch")) {
>>
> A comment above on why we should also look at "patch" will be useful.
>

Replace the comment above with something along the following lines?
/* Update GARP for NAT IP if it exists. Consider port bindings with type
 * "l3gateway" for logical switch ports attached to gateway routers, and
 * port bindings with type "patch" for logical switch ports attached to
 * distributed gateway ports. */


>
>>  struct lport_addresses *laddrs = NULL;
>>  laddrs = shash_find_data(nat_addresses,
>> binding_rec->logical_port);
>>  if (!laddrs) {
>> @@ -1202,24 +1204,101 @@ get_localnet_vifs_l3gwports(const struct
>> ovsrec_bridge *br_int,
>>
>>  const struct local_datapath *ld;
>>  HMAP_FOR_EACH (ld, hmap_node, local_datapaths) {
>> -if (!ld->has_local_l3gateway) {
>> +if (!ld->localnet_port) {
>>  continue;
>>  }
>>
>>  for (size_t i = 0; i < ld->ldatapath->n_lports; i++) {
>>  const struct sbrec_port_binding *pb =
>> ld->ldatapath->lports[i];
>> -if (!strcmp(pb->type, "l3gateway")
>> -/* && it's on this chassis */) {
>> +if ((ld->has_local_l3gateway && !strcmp(pb->type,
>> "l3gateway"))
>> +|| !strcmp(pb->type, "patch")) {
>>
> A comment above on why we are considering "patch" will be useful.
>

Something along these lines?
/* Consider port bindings of type "l3gateway" that connect to gateway
routers,
 * and port 

[ovs-dev] [PATCH v4 2/2] ovn: Gratuitous ARP for distributed NAT rules

2017-03-17 Thread Mickey Spiegel
This patch extends gratuitous ARP support for NAT addresses so that it
applies to distributed NAT rules on a distributed logical router.

Gratuitous ARP packets for distributed NAT rules are only generated on
the chassis where the 'logical_port' specified in the NAT rule resides.
Gratuitous ARPs are issued for the 'external_ip' address, resolving to
the 'external_mac'.

Since the MAC address varies for each distributed NAT rule, a separate
'nat_addresses' string must be generated for each distributed NAT rule.
For this reason, in the southbound 'Port_Binding',
'options:nat-addresses' is replaced by a 'nat_addresses' column that
can have an unlimited number of instances.  In order to allow for
upgrades, pinctrl in the ovn-controller can work off either the
'nat_addresses' column (if present), or 'options:nat-addresses'
otherwise.

Signed-off-by: Mickey Spiegel <mickeys@gmail.com>
---
 NEWS |  1 +
 ovn/controller/pinctrl.c | 78 
 ovn/northd/ovn-northd.c  | 85 +---
 ovn/ovn-sb.ovsschema |  9 +++--
 ovn/ovn-sb.xml   | 17 --
 tests/ovn.at | 33 ---
 6 files changed, 158 insertions(+), 65 deletions(-)

diff --git a/NEWS b/NEWS
index e2e456a..42088be 100644
--- a/NEWS
+++ b/NEWS
@@ -15,6 +15,7 @@ Post-v2.7.0
  "dot1q-tunnel" port VLAN mode.
- OVN:
  * Make the DHCPv4 router setting optional.
+ * Gratuitous ARP for NAT addresses on a distributed logical router.
 
 v2.7.0 - 21 Feb 2017
 -
diff --git a/ovn/controller/pinctrl.c b/ovn/controller/pinctrl.c
index 08af792..acfbcb9 100644
--- a/ovn/controller/pinctrl.c
+++ b/ovn/controller/pinctrl.c
@@ -1295,6 +1295,42 @@ extract_addresses_with_port(const char *addresses,
 }
 
 static void
+consider_nat_address(const char *nat_address,
+ const struct sbrec_port_binding *pb,
+ struct sset *nat_address_keys,
+ const struct lport_index *lports,
+ const struct sbrec_chassis *chassis,
+ struct shash *nat_addresses)
+{
+struct lport_addresses *laddrs = xmalloc(sizeof *laddrs);
+char *lport = NULL;
+if (!extract_addresses_with_port(nat_address, laddrs, &lport)
+|| (!lport && !strcmp(pb->type, "patch"))) {
+free(laddrs);
+if (lport) {
+free(lport);
+}
+return;
+} else if (lport) {
+if (!pinctrl_is_chassis_resident(lports, chassis, lport)) {
+free(laddrs);
+free(lport);
+return;
+}
+free(lport);
+}
+
+int i;
+for (i = 0; i < laddrs->n_ipv4_addrs; i++) {
+char *name = xasprintf("%s-%s", pb->logical_port,
+laddrs->ipv4_addrs[i].addr_s);
+sset_add(nat_address_keys, name);
+free(name);
+}
+shash_add(nat_addresses, pb->logical_port, laddrs);
+}
+
+static void
 get_nat_addresses_and_keys(struct sset *nat_address_keys,
struct sset *local_l3gw_ports,
const struct lport_index *lports,
@@ -1308,38 +1344,24 @@ get_nat_addresses_and_keys(struct sset *nat_address_keys,
 if (!pb) {
 continue;
 }
-const char *nat_addresses_options = smap_get(&pb->options,
- "nat-addresses");
-if (!nat_addresses_options) {
-continue;
-}
 
-struct lport_addresses *laddrs = xmalloc(sizeof *laddrs);
-char *lport = NULL;
-if (!extract_addresses_with_port(nat_addresses_options, laddrs, &lport)
-|| (!lport && !strcmp(pb->type, "patch"))) {
-free(laddrs);
-if (lport) {
-free(lport);
+if (pb->n_nat_addresses) {
+for (int i = 0; i < pb->n_nat_addresses; i++) {
+consider_nat_address(pb->nat_addresses[i], pb,
+ nat_address_keys, lports, chassis,
+ nat_addresses);
 }
-continue;
-} else if (lport) {
-if (!pinctrl_is_chassis_resident(lports, chassis, lport)) {
-free(laddrs);
-free(lport);
-continue;
+} else {
+/* Continue to support options:nat-addresses for version
+ * upgrade. */
+const char *nat_addresses_options = smap_get(&pb->options,
+ "nat-addresses");
+if (nat_addresses_options) {
+consider_nat_address(nat_addresses_options, pb,
+ nat_address_keys, lports, chassis,
+ nat_addresses

[ovs-dev] [PATCH v4 1/2] ovn: Gratuitous ARP for centralized NAT rules on a distributed router

2017-03-17 Thread Mickey Spiegel
This patch extends gratuitous ARP support for NAT addresses so that it
applies to centralized NAT rules on a distributed router, in addition to
the existing gratuitous ARP support for NAT addresses on gateway routers.

Gratuitous ARP packets for centralized NAT rules on a distributed router
are only generated on the redirect-chassis.  This is achieved by extending
the syntax for "options:nat-addresses" in the southbound database,
allowing the condition 'is_chassis_resident("LPORT_NAME")' to be appended
after the MAC and IP addresses.  This condition is automatically inserted
by ovn-northd when the northbound "options:nat-addresses" is set to
"router" and the peer is a distributed gateway port.

A separate patch will be required to support gratuitous ARP for
distributed NAT rules that specify logical_port and external_mac.  Since
the MAC address differs and the logical port often resides on a different
chassis from the redirect-chassis, these addresses cannot be included in
the same "nat-addresses" string as for centralized NAT rules.

Signed-off-by: Mickey Spiegel <mickeys@gmail.com>
---
 ovn/controller/pinctrl.c | 104 ---
 ovn/lib/ovn-util.c   |  38 ++---
 ovn/lib/ovn-util.h   |   2 +
 ovn/northd/ovn-northd.c  |  52 +---
 ovn/ovn-nb.xml   |  33 ---
 ovn/ovn-sb.xml   |  31 ++
 tests/ovn.at |  70 +++
 7 files changed, 289 insertions(+), 41 deletions(-)

diff --git a/ovn/controller/pinctrl.c b/ovn/controller/pinctrl.c
index b342189..08af792 100644
--- a/ovn/controller/pinctrl.c
+++ b/ovn/controller/pinctrl.c
@@ -37,6 +37,7 @@
 #include "lib/dhcp.h"
 #include "ovn-controller.h"
 #include "ovn/actions.h"
+#include "ovn/lex.h"
 #include "ovn/lib/logical-fields.h"
 #include "ovn/lib/ovn-dhcp.h"
 #include "ovn/lib/ovn-util.h"
@@ -1049,7 +1050,8 @@ send_garp_update(const struct sbrec_port_binding *binding_rec,
 
 volatile struct garp_data *garp = NULL;
 /* Update GARP for NAT IP if it exists. */
-if (!strcmp(binding_rec->type, "l3gateway")) {
+if (!strcmp(binding_rec->type, "l3gateway")
+|| !strcmp(binding_rec->type, "patch")) {
 struct lport_addresses *laddrs = NULL;
 laddrs = shash_find_data(nat_addresses, binding_rec->logical_port);
 if (!laddrs) {
@@ -1202,24 +1204,101 @@ get_localnet_vifs_l3gwports(const struct ovsrec_bridge *br_int,
 
 const struct local_datapath *ld;
 HMAP_FOR_EACH (ld, hmap_node, local_datapaths) {
-if (!ld->has_local_l3gateway) {
+if (!ld->localnet_port) {
 continue;
 }
 
 for (size_t i = 0; i < ld->ldatapath->n_lports; i++) {
 const struct sbrec_port_binding *pb = ld->ldatapath->lports[i];
-if (!strcmp(pb->type, "l3gateway")
-/* && it's on this chassis */) {
+if ((ld->has_local_l3gateway && !strcmp(pb->type, "l3gateway"))
+|| !strcmp(pb->type, "patch")) {
 sset_add(local_l3gw_ports, pb->logical_port);
 }
 }
 }
 }
 
+static bool
+pinctrl_is_chassis_resident(const struct lport_index *lports,
+const struct sbrec_chassis *chassis,
+const char *port_name)
+{
+const struct sbrec_port_binding *pb
+= lport_lookup_by_name(lports, port_name);
+return pb && pb->chassis && pb->chassis == chassis;
+}
+
+/* Extracts the mac, IPv4 and IPv6 addresses, and logical port from
+ * 'addresses' which should be of the format 'MAC [IP1 IP2 ..]
+ * [is_chassis_resident("LPORT_NAME")]', where IPn should be a valid IPv4
+ * or IPv6 address, and stores them in the 'ipv4_addrs' and 'ipv6_addrs'
+ * fields of 'laddrs'.  The logical port name is stored in 'lport'.
+ *
+ * Returns true if at least 'MAC' is found in 'addresses', false otherwise.
+ *
+ * The caller must call destroy_lport_addresses() and free(*lport). */
+static bool
+extract_addresses_with_port(const char *addresses,
+struct lport_addresses *laddrs,
+char **lport)
+{
+int ofs;
+if (!extract_addresses(addresses, laddrs, &ofs)) {
+return false;
+} else if (ofs >= strlen(addresses)) {
+return true;
+}
+
+struct lexer lexer;
+lexer_init(&lexer, addresses + ofs);
+lexer_get(&lexer);
+
+if (lexer.error || lexer.token.type != LEX_T_ID
+|| !lexer_match_id(&lexer, "is_chassis_resident")) {
+static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
+VLOG_INFO_RL(&rl, "invalid syntax '%s' in address", addresses);
+  

Re: [ovs-dev] [PATCH v3 2/3] ovn: Gratuitous ARP for centralized NAT rules on a distributed router

2017-03-17 Thread Mickey Spiegel
On Fri, Mar 17, 2017 at 12:47 PM, Guru Shetty <g...@ovn.org> wrote:

>
>
> On 2 February 2017 at 20:48, Mickey Spiegel <mickeys@gmail.com> wrote:
>
>> This patch extends gratuitous ARP support for NAT addresses so that it
>> applies to centralized NAT rules on a distributed router, in addition to
>> the existing gratuitous ARP support for NAT addresses on gateway routers.
>>
>> Gratuitous ARP packets for centralized NAT rules on a distributed router
>> are only generated on the redirect-chassis.  This is achieved by extending
>> the syntax for "options:nat-addresses" in the southbound database,
>> allowing the condition 'is_chassis_resident("LPORT_NAME")' to be appended
>> after the MAC and IP addresses.  This condition is automatically inserted
>> by ovn-northd when the northbound "options:nat-addresses" is set to
>> "router" and the peer is a distributed gateway port.
>>
>> A separate patch will be required to support gratuitous ARP for
>> distributed NAT rules that specify logical_port and external_mac.  Since
>> the MAC address differs and the logical port often resides on a different
>> chassis from the redirect-chassis, these addresses cannot be included in
>> the same "nat-addresses" string as for centralized NAT rules.
>>
>> Signed-off-by: Mickey Spiegel <mickeys@gmail.com>
>>
>
> Would you please mind re-spinning this? It does not apply anymore.
>

In addition to the rebase, in the next patch for distributed NAT rules:
- updated ovn-sb.xml references to OVS version 2.8 rather than OVS version
2.7.
- added a NEWS item.
- streamlined code to remove an unnecessary malloc that I had added in
  ovn-northd.c ovn_port_update_sbrec.

Mickey


>
>> ---
>>  ovn/controller/pinctrl.c | 104 ++
>> ++---
>>  ovn/lib/ovn-util.c   |  38 ++---
>>  ovn/lib/ovn-util.h   |   2 +
>>  ovn/northd/ovn-northd.c  |  52 +---
>>  ovn/ovn-nb.xml   |  33 ---
>>  ovn/ovn-sb.xml   |  31 ++
>>  tests/ovn.at |  70 +++
>>  7 files changed, 289 insertions(+), 41 deletions(-)
>>
>


Re: [ovs-dev] [ovs-dev, RFC] ovn: Revised support for service function chaining

2017-03-15 Thread Mickey Spiegel
On Mon, Mar 13, 2017 at 1:28 PM, John McDowall <
jmcdow...@paloaltonetworks.com> wrote:

> This patch set is an alternative implementation of service function
> chaining (SFC) for OVS/OVN. The major change from the previous patch is
> that the overloading of the ACL stage in ovn-northd.c has been removed
> and replaced with additional logic in the CHAIN stage.
>
> This was done to improve modularity of the code as it was felt that
> overloading the ACL stage did not add a lot and made it hard to compose
> reusable SFCs.
>
> The new approach can still use the matching logic in OVN as a match
> argument has been added. This has not been fully implemented yet as it
> may need some error checking to ensure the match does not conflict
> with the lport in the chain and the bi-directional case.
>
> The other major change is the logic for the final destination of the
> flows exiting the service chain. This has been simplified such that the
> rules in the chain stage determine when the flow is exiting the port-chain
> and then the flow just follows the normal path to the src or dst of the
> flow.
>
> Areas to review
>
> 1) Logic for delivering flow after it leaves the chain. I think it is now
> general and should work across subnets etc.
>

This revision does address one of my concerns with the previous revision,
specifically the previous proposal to specify a 'last_hop_port' at the end
of the logical port chain.

However, it does not address my other major concern about the behavior at
the end of the service chain, the interaction with the way OVN forwards
traffic to the outside world.


Repeating my comments from the last revision:

In OVN without SFC, traffic is forwarded through a logical switch to a
logical switch port of type 'localnet' that effectively resides on all
hypervisors, that then forwards to the physical L2 network.  When a VM
residing on that same logical switch originates traffic destined for the
outside world, it is directed to the local instance of the type 'localnet'
logical switch port on the VM's hypervisor.  The physical L2 network learns
that the VM's MAC address resides on that hypervisor, and forwards all
traffic to that MAC address to that specific hypervisor.  Typically the
physical L2 network first learns the VM's MAC address due to a gratuitous
ARP packet sent by the ovn-controller on the VM's hypervisor.

In this SFC proposal, at the end of the logical port chain, traffic is
forwarded directly to the destination.  This might work for destinations
attached directly to OVN, but it would break forwarding to the physical L2
network:

1. If the last port pair group in the logical port chain contains more than
one port pair spread across more than one chassis, then this would
completely break upstream L2 learning.  Traffic would have to be forwarded
to a common point before being sent upstream.

2. What if the SFC classifier granularity is finer than per source VM, and
the multiple logical port chains (that one VM's traffic can be redirected
to) end at different chassis?

3. The gratuitous ARP packets generated by OVN for VIFs residing on the
same logical switch as the 'localnet' port are sent from each VIF's
chassis.  This may be different than the chassis at the end of the logical
port chain.

There are some topologies where this SFC proposal will work even for
traffic destined for the outside world: if there are no VIFs on the same
logical switch as the 'localnet' port, and either a gateway router is used
or a distributed router with only centralized NAT rules is used.  If a
distributed router is specified with distributed NAT rules, then point 3
above gets even worse, since ARP replies for a distributed NAT external IP
address are restricted to the chassis where the corresponding VIF resides.

My preference is to return to the originating hypervisor at the end of the
logical port chain.  With such an approach, L2 and L3 forwarding are
unaffected.
However, this raises the question of how to determine the originating
hypervisor from a packet at the end of the logical port chain.
The best answer is to use an encapsulation such as Geneve or NSH that can
carry context information.
Another alternative is to do a lookup of the source address when the chain
was entered in the "exit-lport" direction.


An example to clarify the problem:

Single Logical Switch LS1.
VM1 on HV1 with MAC1, IP1.
Logical Port Chain LPC1 with one Logical Port Pair Group LPPG1.
LPPG1 has two Logical Port Pairs LPP1 and LPP2, residing on HV2 and HV3
respectively.

When VM1 comes up, HV1 will send a gratuitous ARP packet out the localnet
interface on HV1 to the physical L2 network, using MAC1 and IP1.
The physical L2 network will learn that MAC1 is on HV1.

VM1 sends a packet to the outside world.
The packet gets redirected down LPC1 to LPP1 on HV2. Using the logic below,
the packet will go through a normal L2 destination lookup on HV2, so it
will be sent out the localnet interface on HV2 to the physical L2 network.
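
To make the effect on upstream learning concrete, here is a toy model of
the example above. It is only a sketch: PhysicalL2Switch and run_example
are hypothetical names, the physical network is reduced to a single MAC
learning table, and the third step assumes a later packet is load balanced
to LPP2 on HV3.

```python
class PhysicalL2Switch:
    """Toy MAC learning table; each 'port' stands for one hypervisor uplink."""

    def __init__(self):
        self.mac_table = {}  # MAC -> hypervisor that most recently sourced it
        self.moves = 0       # how often a known MAC changed location

    def receive(self, src_mac, from_hv):
        """Learn (or re-learn) where src_mac lives from an ingress frame."""
        previous = self.mac_table.get(src_mac)
        if previous is not None and previous != from_hv:
            self.moves += 1
        self.mac_table[src_mac] = from_hv

    def lookup(self, dst_mac):
        """Return the hypervisor that traffic destined to dst_mac is sent to."""
        return self.mac_table.get(dst_mac)


def run_example():
    fabric = PhysicalL2Switch()
    # VM1 comes up: HV1 sends a gratuitous ARP for MAC1 out its localnet port.
    fabric.receive("MAC1", "HV1")
    after_garp = fabric.lookup("MAC1")
    # VM1's packet is redirected down LPC1 to LPP1 on HV2 and leaves through
    # the localnet port on HV2, still carrying source MAC1.
    fabric.receive("MAC1", "HV2")
    after_lpp1 = fabric.lookup("MAC1")
    # Assumption: a later packet is load balanced to LPP2 on HV3.
    fabric.receive("MAC1", "HV3")
    after_lpp2 = fabric.lookup("MAC1")
    return after_garp, after_lpp1, after_lpp2, fabric.moves
```

In this model the learned location of MAC1 moves from HV1 to HV2 and then
to HV3, so return traffic for VM1 is delivered to whichever chassis last
terminated the chain rather than to HV1.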

Re: [ovs-dev] OVN: Compromised Chassis Mitigation

2017-03-15 Thread Mickey Spiegel
On Wed, Mar 15, 2017 at 7:18 AM, Lance Richardson <lrich...@redhat.com>
wrote:

> > From: "Mickey Spiegel" <mickeys@gmail.com>
> > To: "Lance Richardson" <lrich...@redhat.com>
> > Cc: "Russell Bryant" <russ...@ovn.org>, "devovs" <d...@openvswitch.org>
> > Sent: Tuesday, March 14, 2017 3:06:53 PM
> > Subject: Re: [ovs-dev] OVN: Compromised Chassis Mitigation
> >
>
> Hi Mickey,
>
> Thanks for the excellent feedback.  Here's the latest pass:
>

Thanks for driving this.

All of this looks good to me :-)

I hope others can provide some feedback on the question at the bottom.

Mickey


> 1) Add a new column, "role", of type "string" to the remote connection
>table. If set, role-based access control is applied to transactions
>on these connections using "role" as the index to the "RBAC_Roles"
>table. If not set, role-based access control is not applied (e.g.
>local unix: remotes between northd and ovsdb will not require RBAC
>and will therefore not set the "role" column).
>
>For connections having role-based ACLs enabled, a reliable client ID
>is required. This will require the use of SSL and client certificates
>with CN field containing the client ID.
>
> 2) Add a new table, "RBAC_Roles", which is indexed by a role name
>and contains two columns:
>   "name":Name of role, type string. Corresponds to "role"
>  column in remote connection table.
>   "permissions": A map of string (table name) to UUID (row in the
>  "RBAC_Permissions" table).
>
>The purpose of this table is to select a row in the RBAC_Permissions
>table based on the transaction client's "role" and the name of a
>table to be modified by an operation within a transaction. Having
>this level of indirection allows new roles and access controls to
>be created and managed dynamically, without having to update code
>or schemas.
>
> 3) Add a new table, "RBAC_Permissions" which is initialized to contain one
>row for each table in the schema that can be modified by ovn-controller.
>Each row contains:
>
>   - An "authorization" column containing a set of "string" type, where
>     each string is the name of a column (or column:key) whose contents
>     are to be compared against the ID of the client attempting the
>     transaction (CN field from client certificate). If this set is
>     empty, all IDs are considered to be authorized.  If this set
>     contains more than one string, at least one must contain the
>     client ID in order for the action to be considered authorized.
>
>   - An "insert_delete" column of type boolean. If true, insertions
> are allowed by any client and deletions are allowed for rows
> meeting the authorization requirement.
>
>   - An "update" column of type "set of strings". Each string is the
> name of a column (or column:key) for which modification is allowed
> in rows meeting the authorization requirement.
>
> For the current implementation of the OVN_Southbound schema, these tables
> would have the following contents:
>
> RBAC_Roles:
> name:"controller"
> permissions: "Chassis": ,
>  "Encap":   ,
>  "Port_Binding": RBAC_Permissions>,
>  "MAC_Binding":  RBAC_Permissions>
>
> RBAC_Permissions:
>Chassis row:
>   authorization: "chassis"
>   insert_delete: "true"
>   update:"nb_cfg", "external_ids", "encaps",
> "vtep_logical_switches"
>  Modification of these columns is allowed for rows
>  in which the authorization check passes.
>
>Encap row:
>   authorization: "chassis"  New column containing CN ID of row creator.
>   insert_delete: "true"
>   update:"type", "options", "ip"
>
>Port_Binding row:
>   authorization: "" All chassis are authorized. In a recent
> live migration proposal, this column would contain
> "options:chassis" and "options:migration-
> destination".
>   insert_delete: "false"
>   update:"chassis"
>
>MAC_Binding row:
>   au

Re: [ovs-dev] OVN: Compromised Chassis Mitigation

2017-03-14 Thread Mickey Spiegel
On Tue, Mar 14, 2017 at 12:01 PM, Lance Richardson <lrich...@redhat.com>
wrote:

>
>
> ----- Original Message -----
> > From: "Mickey Spiegel" <mickeys@gmail.com>
> > To: "Lance Richardson" <lrich...@redhat.com>
> > Cc: "Russell Bryant" <russ...@ovn.org>, "devovs" <d...@openvswitch.org>
> > Sent: Tuesday, March 14, 2017 2:27:30 PM
> > Subject: Re: [ovs-dev] OVN: Compromised Chassis Mitigation
> >
> > On Tue, Mar 14, 2017 at 11:14 AM, Lance Richardson <lrich...@redhat.com>
> > wrote:
> >
> > > > From: "Russell Bryant" <russ...@ovn.org>
> > > > To: "Mickey Spiegel" <mickeys@gmail.com>
> > > > Cc: "Lance Richardson" <lrich...@redhat.com>, "devovs" <
> > > d...@openvswitch.org>
> > > > Sent: Tuesday, March 14, 2017 1:48:55 PM
> > > > Subject: Re: [ovs-dev] OVN: Compromised Chassis Mitigation
> > > >
> > > > On Tue, Mar 14, 2017 at 5:08 AM, Mickey Spiegel <
> mickeys@gmail.com>
> > > > wrote:
> > > > > >>   - An "authorization" column containing a set of "string" type,
> > > > > >>     where each string is the name of a column (or column:key)
> > > > > >>     that must contain the ID of client attempting the
> > > > > >>     transaction (CN field from client certificate). If this set
> > > > > >>     is empty, all IDs are considered to be authorized.  If this
> > > > > >>     set contains more than one string, at least one must contain
> > > > > >>     the client ID in order to be considered authorized.
> > > > >
> > > > >
> > > > > > This is the "where" column in the RBAC approach, where the "CN = "
> > > > > > part of the logical expression is implied, and only the column
> > > > > > part is specified.
> > > > >
> > > > > When access is attempted from an ovn-controller, then the
> > > > > authorization rule applies. However, there are other things
> > > > > accessing OVN SB DB that are allowed to change these things
> > > > > even though they do not have a CN, or at least not one that
> > > > > looks like a chassis name. For example, ovn-northd.
> > > > > How is this controlled?
> > > > > This is exactly why the indirection through role is suggested, to
> > > > > allow access to various things without hardcoding specific logic
> > > > > to determine what is subject to the rules.
> > > >
> > > >
> > > > > I had imagined we'd deploy OVN such that ovn-northd would connect to
> > > > > a different ovsdb remote that didn't have ACLs turned on.
> > > >
> > >
> > > I think we do need a way to specify whether ACLs should be applied to
> > > transactions on a given client connection.
> > >
> > > We could have ACLs implicitly apply only to read-only connections, but
> > > I think it would be better to make this an explicit configuration item.
> > >
> > > > One way to do this would be to add a "role" attribute to each remote
> > > > and have a table to map "role" to a specific ACL table.
> > >
> > > We could also consider adding an ACL table reference column to the
> > > OVN_Southbound "Connection" table and make ACLs part of remote
> > > configuration; if an ACL table is configured for a particular remote,
> > > ACLs from that table would be used for transactions on that remote.
> > >
> >
> > I don't follow. What are the definitions of ACL table and the reference?
> >
> > I thought there is only one ACL / Permissions table. The reason for
> > the indirection through "role" is to identify the rows in the table that
> > apply. There may be multiple rows that apply.
> >
>
> I was suggesting a simplification that would eliminate the need for that
> indirection by having the remote refer directly to an ACL/Permissions
> table. The drawback to this simplification would be having to modify
> the db schema in order to add a new role (a new ACL/Permissions table
> would be needed for each added role), versus being able to define
> new roles and rule sets dynamically.  For OVN_Southbound, this doesn't

Re: [ovs-dev] OVN: Compromised Chassis Mitigation

2017-03-14 Thread Mickey Spiegel
On Tue, Mar 14, 2017 at 11:14 AM, Lance Richardson <lrich...@redhat.com>
wrote:

> > From: "Russell Bryant" <russ...@ovn.org>
> > To: "Mickey Spiegel" <mickeys@gmail.com>
> > Cc: "Lance Richardson" <lrich...@redhat.com>, "devovs" <
> d...@openvswitch.org>
> > Sent: Tuesday, March 14, 2017 1:48:55 PM
> > Subject: Re: [ovs-dev] OVN: Compromised Chassis Mitigation
> >
> > On Tue, Mar 14, 2017 at 5:08 AM, Mickey Spiegel <mickeys@gmail.com>
> > wrote:
> > > >>   - An "authorization" column containing a set of "string" type,
> > > >>     where each string is the name of a column (or column:key) that
> > > >>     must contain the ID of client attempting the transaction (CN
> > > >>     field from client certificate). If this set is empty, all IDs
> > > >>     are considered to be authorized.  If this set contains more
> > > >>     than one string, at least one must contain the client ID in
> > > >>     order to be considered authorized.
> > >
> > >
> > > This is the "where" column in the RBAC approach, where the "CN = "
> > > part of the logical expression is implied, and only the column part is
> > > specified.
> > >
> > > When access is attempted from an ovn-controller, then the
> > > authorization rule applies. However, there are other things
> > > accessing OVN SB DB that are allowed to change these things
> > > even though they do not have a CN, or at least not one that
> > > looks like a chassis name. For example, ovn-northd.
> > > How is this controlled?
> > > This is exactly why the indirection through role is suggested, to
> > > allow access to various things without hardcoding specific logic
> > > to determine what is subject to the rules.
> >
> >
> > I had imagined we'd deploy OVN such that ovn-northd would connect to a
> > different ovsdb remote that didn't have ACLs turned on.
> >
>
> I think we do need a way to specify whether ACLs should be applied to
> transactions on a given client connection.
>
> We could have ACLs implicitly apply only to read-only connections, but
> I think it would be better to make this an explicit configuration item.
>
> One way to do this would be to add a "role" attribute to each remote and
> have a table to map "role" to a specific ACL table.
>
> We could also consider adding an ACL table reference column to the
> OVN_Southbound "Connection" table and make ACLs part of remote
> configuration; if an ACL table is configured for a particular remote,
> ACLs from that table would be used for transactions on that remote.
>

I don't follow. What are the definitions of ACL table and the reference?

I thought there is only one ACL / Permissions table. The reason for
the indirection through "role" is to identify the rows in the table that
apply. There may be multiple rows that apply.

I guess another alternative is to have role just be a column in the
ACL / Permissions table.
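
A rough sketch of what I mean by a single permissions table with the role
selecting the applicable rows. All names here (Permission, PERMISSIONS,
update_allowed, the "name" authorization column for Chassis) are made up
for illustration and are not part of any schema. The sketch also assumes
that a connection with no role is not subject to the checks at all.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Permission:
    role: str
    table: str
    authorization: frozenset  # columns that must hold the client ID; empty = any client
    update: frozenset         # columns that may be modified


# Hypothetical contents, loosely following the tables discussed in this thread.
PERMISSIONS = [
    Permission("controller", "Chassis",
               frozenset({"name"}), frozenset({"nb_cfg", "encaps"})),
    Permission("controller", "Port_Binding",
               frozenset(), frozenset({"chassis"})),
]


def update_allowed(role, client_id, table, row, column, permissions=PERMISSIONS):
    """Return True if client_id, acting under role, may update row[column]."""
    if role is None:
        # Assumption: remotes without a role (e.g. ovn-northd) bypass the checks.
        return True
    for perm in permissions:
        if perm.role != role or perm.table != table:
            continue  # the role selects which rows of the table apply
        authorized = (not perm.authorization or
                      any(row.get(col) == client_id for col in perm.authorization))
        if authorized and column in perm.update:
            return True
    return False
```

With these rows, update_allowed("controller", "hv1", "Chassis",
{"name": "hv1"}, "nb_cfg") is allowed, the same call with client "hv2" is
refused, and any table without a matching row is read-only for that role.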

Mickey


> I think either approach would provide similar flexibility, the
> second seems a little more straightforward.
>
> Thoughts?
>
> Thanks,
>
>Lance
>
> > Currently, our OpenStack deployment always co-locates ovn-northd with
> > the OVN databases, so it just connects over a unix socket.
> >
> > --
> > Russell Bryant
> >
>
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] OVN: Compromised Chassis Mitigation

2017-03-14 Thread Mickey Spiegel
On Mon, Mar 13, 2017 at 1:20 PM, Lance Richardson <lrich...@redhat.com>
wrote:

> > From: "Mickey Spiegel" <mickeys@gmail.com>
> > To: "Lance Richardson" <lrich...@redhat.com>
> > Cc: "devovs" <d...@openvswitch.org>
> > Sent: Thursday, March 9, 2017 6:49:53 PM
> > Subject: Re: [ovs-dev] OVN: Compromised Chassis Mitigation
> >
> > On Thu, Mar 9, 2017 at 8:52 AM, Lance Richardson <lrich...@redhat.com>
> > wrote:
> >
> > > > From: "Mickey Spiegel" <mickeys@gmail.com>
> > > > To: "Lance Richardson" <lrich...@redhat.com>
> > > > Cc: "devovs" <d...@openvswitch.org>
> > > > Sent: Wednesday, March 8, 2017 10:41:01 PM
> > > > Subject: Re: [ovs-dev] OVN: Compromised Chassis Mitigation
> > > >
> > > > On Wed, Mar 8, 2017 at 1:28 PM, Lance Richardson <
> lrich...@redhat.com>
> > > > wrote:
> > > >
>
> 
>
> >
> > > > BTW, one other thought was to support expressions like those used for
> > > > "select" operations to specify ACLs.  This is probably too heavy, but I
> > > > do wonder if code for handling "select" expressions could be reused
> > > > here (would need to extend it to allow expressions involving the
> > > > per-session CN name).
> > >
> >
> > I like this idea. I do not think that it is too heavy.
> >
> > Lance and I discussed this during the OVN meeting, coming up with the
> > following observations and questions.
> >
> > The "RBAC_Permissions" table would have another column:
> > - "where", which specifies the conditions that must be satisfied in order
> >   for this permission to take effect. These conditions apply to all
> >   operations allowed by the permission, except for "insert".
> >
> > One difference from the "where" condition for "select" operations is that
> > the column value needs to be compared to the CN name rather than a
> > specified constant value.
> >
> > Another question is whether we need to support different "where"
> > conditions for different columns or different operations on the same
> > table?
> > If so, would we allow one role to refer to multiple "RBAC_Permissions"
> > with the same value of "table"?
> > This would require the addition of error handling code for the case
> > where the same operation or the same column is specified in different
> > permissions referred to by the same role.
> >
> > Looking at the syntax for the "where" condition for "select" operations
> > after the meeting, I noticed that it uses a JSON array. However,
> > scanning the code, it looks to me like the JSON array syntax is not
> > supported by the OVSDB IDL, so this syntax could not be used directly
> > by "RBAC_Permissions"?
> > Does that mean that the code in ovsdb/condition.c could not be reused
> > for this purpose?
> >
> > If we are OK with just one "where" clause in each permission, then we
> > could just add "where_column" and "where_function". However, if we
> > want to allow multiple clauses in a condition, then we would need
> > something more complex. One simple but somewhat ugly approach is
> > to put column name in the "key" and function in the "value".
> >
> > A more complex approach is to go with a full expression syntax. A few
> > design issues arise when thinking about how to define this syntax:
> > - Is a string specifying the column name good enough, or would we also
> >   need to support column_name:key?
> >   e.g. to evaluate "options:chassis" in "Port_Binding"?
> > - Is there anything other than CN name that we would compare to?
> >   Do we need a corresponding symbols table?
> > - Do we allow constants?
> >
> > Mickey
> >
> >
> >
> > >
> > > I think we're getting closer to a good solution... thanks again!
> > >
> > > Regards,
> > >
> > > Lance
> > >
> >
>
> In thinking about how to resolve some of the points above, (still planning
> to respond), I wondered if the original proposal could be improved on. How
> about this:
>
> Use a single transaction ACL table, implemented as follows:
>

I view this proposal as a variant of the RBAC approach.
I guess the RBAC approach and the root ACL table were not that
far apart.
This table i

Re: [ovs-dev] OVN: Compromised Chassis Mitigation

2017-03-09 Thread Mickey Spiegel
On Thu, Mar 9, 2017 at 8:52 AM, Lance Richardson <lrich...@redhat.com>
wrote:

> > From: "Mickey Spiegel" <mickeys@gmail.com>
> > To: "Lance Richardson" <lrich...@redhat.com>
> > Cc: "devovs" <d...@openvswitch.org>
> > Sent: Wednesday, March 8, 2017 10:41:01 PM
> > Subject: Re: [ovs-dev] OVN: Compromised Chassis Mitigation
> >
> > On Wed, Mar 8, 2017 at 1:28 PM, Lance Richardson <lrich...@redhat.com>
> > wrote:
> >
> > > This email (prompted by recent discussions in IRC on the subject)
> > > outlines some of the options that have been discussed for securing
> > > OVN_Southbound from a compromised chassis, and includes a strawman
> > > proposal for an ovsdb transaction ACL implementation.
> > >
> > > Feedback appreciated, hopefully we can discuss in IRC tomorrow.
> > >
> >
> > Thanks for the proposal. Some comments and an alternative
> > approach to option 3 near the bottom.
>
> Hi Mickey,
>
> Thanks for taking a look! More below...
>
> >
> > I should also note that in addition to the control plane issues
> > due to compromised chassis described here, there is also a
> > data plane issue with respect to tunnels, or at least there was
> > a couple of months ago. When a geneve tunnel packet is
> > received without the geneve options field, this results in a
> > cached flow that drops all packets on the tunnel, even
> > subsequent packets with a valid geneve options field. I peeked
> > at the OVS flow creation code for tunnels, but it was not
> > immediately obvious whether there is a problem in the OVS
> > code, or we just need to amend OVN coding of table 0 to
> > check for presence of the geneve options field.
> >
> >
> > > Regards,
> > >
> > >Lance Richardson
> > >
> > >
> > > Problem Description
> > > ---
> > > Each ovn-controller instance currently has full write access to the OVN
> > > southbound database.  This means that a single compromised chassis can
> > > potentially disrupt every chassis in an OVN network.
> > >
> > > Goals of Solution
> > > ---
> > > Limiting the potential damage that can be inflicted on an OVN network
> by
> > > a compromised chassis will mean restricting the set of objects in
> > > OVN_Southbound that can be modified by a chassis to a minimum.  In the
> > > current implementation, there are a number of tables that do not need
> > > to be modified by ovn-controller:
> > > SB_Global
> > > Address_Set
> > > Logical_Flow
> > > Multicast_Group
> > > Datapath_Binding
> > > DHCP_Options
> > > DHCPv6_Options
> > > Connection
> > > SSL
> > >
> > > Tables that do need to be modified by ovn-controller include:
> > > Chassis
> > >Rows in this table are added and updated by ovn-controller.
> > >While there have been proposals to make this table read-
> > >only for ovn-controller, note that the nb_cfg column still needs
> > >to be updated by the associated chassis in order for the
> > >"--wait=hv" option of ovn-nbctl to work.
> > >
> > > Encap
> > >Rows in this table are added/deleted/modified by ovn-controller.
> > >
> > > Port_Binding
> > >Rows in this table are added/deleted/modified by ovn-northd,
> with
> > >the exception of the "chassis" column, which is updated by
> > >ovn-controller.
> > >
> > > MAC_Binding
> > >Rows in this table are inserted/deleted/modified by all chassis.
> > >
> > > Possible Solutions
> > > ---
> > > Several possible implementations have been proposed on the ovs-dev
> mailing
> > > list and/or discussed in IRC, including:
> > >
> > >   1) Eliminate the need for writes to the southbound database by ovn-
> > >  controller, adding new mechanisms for managing tables that are
> > >  currently written to by ovn-controller.
> > >
> > >   2) Enhance ovsdb-server to support role-based (or id-based) access
> > >  control mechanisms, and use these mechanisms to restrict write
> > >  access to the southbound database by ovn-controller.
> > >
> > >   3) Eliminate all write access to the southbound database

Re: [ovs-dev] [PATCH] Support multiple logical routing port configuration "redirect-chassis" on a distributed router

2017-02-27 Thread Mickey Spiegel
This is a quick preliminary review. I will review this in more detail
tomorrow afternoon.

On Mon, Feb 27, 2017 at 5:12 AM, Guoshuai Li  wrote:

> The main application scenario of this patch is that the user flow wants to
> reach different destination addresses through different external networks.
> This scenario requires a distributed route to be associated with
> multiple external network logical switches.
>
> In a distributed router, the NAT logical flow table is generated based on
> the external IP lookup distributed router port, otherwise not generated.
>

I see your problem that you need some way to figure out which router
gateway port this NAT rule should be associated with, now that you have
multiple distributed gateway ports on the same logical router.

However, there is currently no restriction that NAT external IP addresses
need to match an existing subnet on a router port. I am uncomfortable with
the addition of such a restriction in this patch, since it will not support
scenarios that are valid in OVN and in OpenStack today.


> When the destination address of the packet is an external IP of the NAT
> rule,
> and the ingress port is not a gateway,
> it is necessary to route the actual outgoing port.
>

Are you suggesting that static routes need to be programmed to NAT external
addresses?
That would significantly complicate what the user needs to do in order to
make NAT on a distributed router work.
Doesn't this break existing tests?
My suggestion would be to set outport when you set REGBIT_NAT_REDIRECT, in
the DNAT and UNSNAT stages.

I would also like to hear back from Russell or others about the "routed
external networks" discussion at the OpenStack PTG in Atlanta.
Does a router still have only one external network?
Does a router still have only one external IPv4 address?
Is it limited to one segment?
If there can be multiple external IPv4 addresses, then there might be some
overlap with this proposal.

Mickey


>
> Signed-off-by: Guoshuai Li 
> Co-authored-by: Dong Jun 
> ---
>  ovn/northd/ovn-northd.8.xml |  22 ++---
>  ovn/northd/ovn-northd.c | 232 +++---
> --
>  ovn/ovn-nb.xml  |  12 ++-
>  3 files changed, 137 insertions(+), 129 deletions(-)
>


Re: [ovs-dev] [PATCH] ovn-controller: Assign ct_zone id to local datapaths instead of lports

2017-02-23 Thread Mickey Spiegel
On Thu, Feb 23, 2017 at 6:04 AM,  wrote:

> From: Numan Siddique 
>
> Having zone id per datapath is more than sufficient, because the
> CT tuple information will be unique anyway within the logical
> datapath.
>

This proposal conflicts with another proposal that is currently in flight (
https://mail.openvswitch.org/pipermail/ovs-dev/2017-February/328759.html),
where we were thinking of using ct_label in SFC for load balancing between
multiple port pairs in one port pair group. In that case, for each SFC hop
we would need to pick up a different value from ct_label, so for SFC ports
we would need a different ct_zone for each logical port in one logical
switch.

Another issue is that this breaks the current use of ct_label.blocked in
ACLs. If the ingress ACL allows a connection but the egress ACL blocks the
connection, then ingress will be clearing the bit while egress will be
setting the bit. Perhaps this could be resolved by replacing
ct_label.blocked with ct_label.blocked_ingress and ct_label.blocked_egress?
There might be other solutions, depending on the future patch that sends to
CT only once for both ingress and egress.
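
To illustrate the ct_label.blocked problem, here is a toy model of the
conntrack table keyed by (zone, flow). It is not OVN code: FLOW,
run_pipelines and the zone names are hypothetical, and the model only
records the value of the blocked bit that each ACL stage commits.

```python
FLOW = ("10.0.0.1", "10.0.0.2", "tcp", 1234, 80)  # hypothetical 5-tuple


def run_pipelines(ingress_zone, egress_zone, ingress_allows, egress_allows):
    """Return the 'blocked' bit per (zone, flow) after one packet."""
    conntrack = {}
    # The ingress ACL stage commits first: it clears the bit on allow and
    # sets it on drop.
    conntrack[(ingress_zone, FLOW)] = not ingress_allows
    # The egress ACL stage commits second; with the same zone it writes to
    # the same conntrack entry and overwrites the ingress verdict.
    conntrack[(egress_zone, FLOW)] = not egress_allows
    return conntrack


# Today: one zone per logical port, so each stage has its own entry.
per_port = run_pipelines("zone-lport-a", "zone-lport-b", True, False)
# With this patch: one zone per datapath, so both stages share one entry.
per_datapath = run_pipelines("zone-dp", "zone-dp", True, False)
```

In the per-port case there are two entries, one cleared and one set. In the
per-datapath case there is a single entry whose bit reflects only the last
stage that wrote it, and the ingress stage would clear it again on the next
packet.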

Mickey


>
> In our testing we have observed that, the packet between two ports of
> a datapath within the same chassis is sent to the CT twice (both in
> ingress and egress pipeline) with (2 different zone ids) resulting in
> some performance hit. With this patch, the packet will use the same
> zone id. This doesn't improve the performance, but a future patch may
> optimize this scenario by sending the packet to CT only once.
>
> Signed-off-by: Numan Siddique 
> ---
>  ovn/controller/ovn-controller.8.xml |  8 
>  ovn/controller/ovn-controller.c | 17 +++--
>  ovn/controller/physical.c   |  6 --
>  3 files changed, 15 insertions(+), 16 deletions(-)
>
> diff --git a/ovn/controller/ovn-controller.8.xml b/ovn/controller/ovn-
> controller.8.xml
> index c92fd55..b8bec6c 100644
> --- a/ovn/controller/ovn-controller.8.xml
> +++ b/ovn/controller/ovn-controller.8.xml
> @@ -209,14 +209,14 @@
>  external_ids:ct-zone-* in the Bridge
> table
>
>
> -Logical ports and gateway routers are assigned a connection
> +Logical switch and router datapaths are assigned a connection
>  tracking zone by ovn-controller for stateful
>  services.  To keep state across restarts of
>  ovn-controller, these keys are stored in the
>  integration bridge's Bridge table.  The name contains a prefix
>  of ct-zone- followed by the name of the logical
> -port or gateway router's zone key.  The value for this key
> -identifies the zone used for this port.
> +datapath's zone key.  The value for this key identifies the zone
> used
> +for the datapath.
>
>
>
> @@ -309,7 +309,7 @@
>
>ct-zone-list
>
> -Lists each local logical port and its connection tracking zone.
> +Lists each local logical datapath and its connection tracking
> zone.
>
>
>inject-pkt microflow
> diff --git a/ovn/controller/ovn-controller.c b/ovn/controller/ovn-
> controller.c
> index ea299da..696723d 100644
> --- a/ovn/controller/ovn-controller.c
> +++ b/ovn/controller/ovn-controller.c
> @@ -307,7 +307,7 @@ get_ovnsb_remote(struct ovsdb_idl *ovs_idl)
>  }
>
>  static void
> -update_ct_zones(struct sset *lports, const struct hmap *local_datapaths,
> +update_ct_zones(const struct hmap *local_datapaths,
>  struct simap *ct_zones, unsigned long *ct_zone_bitmap,
>  struct shash *pending_ct_zones)
>  {
> @@ -316,15 +316,15 @@ update_ct_zones(struct sset *lports, const struct hmap *local_datapaths,
>      const char *user;
>      struct sset all_users = SSET_INITIALIZER(&all_users);
>
> -    SSET_FOR_EACH(user, lports) {
> -        sset_add(&all_users, user);
> -    }
> -
>      /* Local patched datapath (gateway routers) need zones assigned. */
>      const struct local_datapath *ld;
>      HMAP_FOR_EACH (ld, hmap_node, local_datapaths) {
>          /* XXX Add method to limit zone assignment to logical router
>           * datapaths with NAT */
> +        char *dp_user = xasprintf(UUID_FMT,
> +                                  UUID_ARGS(&ld->datapath->header_.uuid));
> +        sset_add(&all_users, dp_user);
> +        free(dp_user);
>          char *dnat = alloc_nat_zone_key(&ld->datapath->header_.uuid, "dnat");
>          char *snat = alloc_nat_zone_key(&ld->datapath->header_.uuid, "snat");
>          sset_add(&all_users, dnat);
> @@ -350,10 +350,7 @@ update_ct_zones(struct sset *lports, const struct
> hmap *local_datapaths,
>  }
>  }
>
> -/* xxx This is wasteful to assign a zone to each port--even if no
> - * xxx security policy is applied. */
> -
> -/* Assign a unique zone id for each logical port and two zones
> +/* Assign a unique zone id for each 

Re: [ovs-dev] [ovs-dev, RFC] ovn: support for service function chaining

2017-02-10 Thread Mickey Spiegel
On Thu, Feb 2, 2017 at 3:22 PM,  wrote:

> From: John McDowall 
>
> This patchset is the first round at having Service Function Chaining
> functionality through OVN. The implementation is done entirely
> on the northbound side of OVN. It is a bump on the wire implementation,
> so no attempt is being made in keeping state while packets visit each
> hop of the chain. ACLs are used as the classifiers, with the augmentation
> of action SFC, as well as option column.
>
> The current implementation of traffic redirection to the service chain
> is implemented by adding an additional action 'sfc' to the ACL stage. This
> overloads the ACL stage and this might not be the best long term approach.
> Guidance on whether this is "good enough" for now would be appreciated.
>
> How to leverage load balancing is also an open issue. The current LB
> solution
> in OVN is L3 based. Suggestions on how to implement LB at L2 for SFC would
> also be appreciated.
>
> This is not yet ready to be merged, as it lacks unit tests and a rigorous
> code review. Nevertheless, it works fine if you take into account a
> number of limitations that include:
>
> * missing load balancer integration;
> * no ipv6 support;
> * chain spanning logical switches (not supported);
> * bidirectional chains (not implemented);
> * no test cases.
> * other suggestions?
>
> This is the code that was used for SFC demo and talk at OVSCon 2016.
>
> Changes:
>
>  * ovn-northd.xml: Added documentation for SFC ACL Action
>  * ovn-northd.c: Added new stage for SFC and modified ACL stage to include
> sfc action
>  * ovn-architecture.xml Included architecture of SFC in documentation
>  * ovn-nb.ovsschema: Extended schema to include port-chain,
> port-pair-groups, port-pairs
> and added ACL SFC action
>  * ovn-nb.xml: Added documentation for extensions to  ovn-nbctl for
> port-chain,
> port-pair-groups, port-pairs and ACL SFC action
>  * ovn-nbctl.c: Added code to extend ovn-nbctl for port-chain,
> port-pair-groups, port-pairs and ACL SFC action
>
> Current Limitations
>
> This is not yet ready to be merged, as it lacks unit tests and a rigorous
> code review. Nevertheless, it works fine if you take into account a
> few limitations:
>
> * missing load balancer integration.
> * no ipv6 support.
> * chain spanning logical switches (not supported).
> * bidirectional chains (not implemented).
> * no test cases.
> * documentation needs rework as there have been several changes as the
> code has progressed.
>
> Before I work on the limitations and start adding test cases I would like
> to make sure I am on the right track to get this approved for submission.
> Once it is approved for OVN/OVS I can add it to Openstack and also planning
> on using it for container service chaining.
>
> Questions:
>
> * Is the basic approach aligned with the direction of OVN/OVS?
>

The basic approach of SFC classification followed by a service chain
comprised of port pair groups seems reasonably well aligned with OVN/OVS.

I am concerned about the behavior at the end of the service chain, most
significantly the interaction with the way OVN forwards traffic to the
outside world.

In OVN without SFC, traffic is forwarded through a logical switch to a
logical switch port of type 'localnet' that effectively resides on all
hypervisors and then forwards it to the physical L2 network.  When a VM
residing on that same logical switch originates traffic destined for the
outside world, it is directed to the local instance of the type 'localnet'
logical switch port on the VM's hypervisor.  The physical L2 network learns
that the VM's MAC address resides on that hypervisor, and forwards all
traffic to that MAC address to that specific hypervisor.  Typically the
physical L2 network first learns the VM's MAC address due to a gratuitous
ARP packet sent by the ovn-controller on the VM's hypervisor.

In this SFC proposal, at the end of the logical port chain, traffic is
forwarded directly to the destination.  This might work for destinations
attached directly to OVN, but it would break forwarding to the physical L2
network:

1. If the last port pair group in the logical port chain contains more than
one port pair spread across more than one chassis, then this would
completely break upstream L2 learning.  Traffic would have to be forwarded
to a common point before being sent upstream.

2. What if the SFC classifier granularity is finer than per source VM, and
the multiple logical port chains (that one VM's traffic can be redirected
to) end at different chassis?

3. The gratuitous ARP packets generated by OVN for VIFs residing on the
same logical switch as the 'localnet' port are sent from each VIF's
chassis.  This may be different from the chassis at the end of the logical
port chain.
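
The learning problem behind these three points can be sketched with a toy
forwarding database.  This is only an illustration under simplifying
assumptions (a single upstream switch, one physical port per chassis, no
aging); struct fdb and its helpers are hypothetical names, not OVS or OVN
code.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define FDB_MAX 16

struct fdb_entry {
    uint8_t mac[6];
    int port;           /* Physical port, i.e. which chassis sent the frame. */
};

struct fdb {
    struct fdb_entry entries[FDB_MAX];
    int n;
    int moves;          /* Times a known MAC showed up on a different port. */
};

/* Learns 'mac' on 'port'.  Returns true if this moved an existing entry. */
static bool
fdb_learn(struct fdb *fdb, const uint8_t mac[6], int port)
{
    for (int i = 0; i < fdb->n; i++) {
        struct fdb_entry *e = &fdb->entries[i];
        if (!memcmp(e->mac, mac, 6)) {
            if (e->port == port) {
                return false;
            }
            e->port = port;
            fdb->moves++;
            return true;
        }
    }
    if (fdb->n < FDB_MAX) {
        memcpy(fdb->entries[fdb->n].mac, mac, 6);
        fdb->entries[fdb->n].port = port;
        fdb->n++;
    }
    return false;
}

/* Returns the port where 'mac' was last learned, or -1 to flood. */
static int
fdb_lookup(const struct fdb *fdb, const uint8_t mac[6])
{
    for (int i = 0; i < fdb->n; i++) {
        if (!memcmp(fdb->entries[i].mac, mac, 6)) {
            return fdb->entries[i].port;
        }
    }
    return -1;
}
```

If frames carrying the same VM MAC leave from two chassis in alternation,
each call to fdb_learn() reports a move, and return traffic follows
whichever chassis transmitted most recently.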

There are some topologies where this SFC proposal will work even for
traffic destined for the outside world: if there are no VIFs on the same
logical 

[ovs-dev] [PATCH] AUTHORS: Add Mickey Spiegel

2017-02-10 Thread Mickey Spiegel
Signed-off-by: Mickey Spiegel <mickeys@gmail.com>
---
 AUTHORS.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/AUTHORS.rst b/AUTHORS.rst
index b567fcc..3833041 100644
--- a/AUTHORS.rst
+++ b/AUTHORS.rst
@@ -207,6 +207,7 @@ Maxime Coquelin maxime.coque...@redhat.com
 Mehak Mahajan   mmaha...@nicira.com
 Michael Arnaldi arnaldimich...@gmail.com
 Michal Weglicki michalx.wegli...@intel.com
+Mickey Spiegel  mickeys@gmail.com
 Mijo Safradin   m...@linux.vnet.ibm.com
 Minoru TAKAHASHI takahashi.mino...@gmail.com
 Murphy McCauley murphy.mccau...@gmail.com
-- 
1.9.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH branch-2.7] ovn: Mention distributed NAT in NEWS

2017-02-10 Thread Mickey Spiegel
Signed-off-by: Mickey Spiegel <mickeys@gmail.com>
---
 NEWS | 10 ++
 debian/changelog | 10 ++
 2 files changed, 20 insertions(+)

diff --git a/NEWS b/NEWS
index 3006f77..89bb026 100644
--- a/NEWS
+++ b/NEWS
@@ -20,6 +20,16 @@ v2.7.0 - xx xxx 
information regarding remote connection configuration.
  * New appctl "inject-pkt" command in ovn-controller that allows
packets to be injected into the connected OVS instance.
+ * Distributed logical routers may now be connected directly to
+   logical switches with localnet ports, by specifying a
+   "redirect-chassis" on the distributed gateway port of the
+   logical router.  NAT rules may be specified directly on the
+   distributed logical router, and are handled either centrally on
+   the "redirect-chassis", or in many cases are handled locally on
+   the hypervisor where the corresponding logical port resides.
+   Gratuitous ARP for NAT addresses on a distributed logical
+   router is not yet supported, but will be added in a future
+   version.
- Fixed regression in table stats maintenance introduced in OVS
  2.3.0, wherein the number of OpenFlow table hits and misses was
  not accurate.
diff --git a/debian/changelog b/debian/changelog
index 9ea5f95..5290d33 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -21,6 +21,16 @@ openvswitch (2.7.0-1) unstable; urgency=low
information regarding remote connection configuration.
  * New appctl "inject-pkt" command in ovn-controller that allows
packets to be injected into the connected OVS instance.
+ * Distributed logical routers may now be connected directly to
+   logical switches with localnet ports, by specifying a
+   "redirect-chassis" on the distributed gateway port of the
+   logical router.  NAT rules may be specified directly on the
+   distributed logical router, and are handled either centrally on
+   the "redirect-chassis", or in many cases are handled locally on
+   the hypervisor where the corresponding logical port resides.
+   Gratuitous ARP for NAT addresses on a distributed logical
+   router is not yet supported, but will be added in a future
+   version.
- Fixed regression in table stats maintenance introduced in OVS
  2.3.0, wherein the number of OpenFlow table hits and misses was
  not accurate.
-- 
1.9.1



[ovs-dev] [PATCH v3 2/3] ovn: Gratuitous ARP for centralized NAT rules on a distributed router

2017-02-02 Thread Mickey Spiegel
This patch extends gratuitous ARP support for NAT addresses so that it
applies to centralized NAT rules on a distributed router, in addition to
the existing gratuitous ARP support for NAT addresses on gateway routers.

Gratuitous ARP packets for centralized NAT rules on a distributed router
are only generated on the redirect-chassis.  This is achieved by extending
the syntax for "options:nat-addresses" in the southbound database,
allowing the condition 'is_chassis_resident("LPORT_NAME")' to be appended
after the MAC and IP addresses.  This condition is automatically inserted
by ovn-northd when the northbound "options:nat-addresses" is set to
"router" and the peer is a distributed gateway port.
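
For illustration only, the extended string format can be split with
ordinary string functions.  The sketch below is not the lexer-based parser
in this patch; split_chassis_resident() is a hypothetical helper, and it
assumes the port name contains no embedded quotes or escapes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper.  If 'addresses' ends with
 * is_chassis_resident("LPORT_NAME"), copies LPORT_NAME into 'lport' and
 * stores the length of the preceding address list in '*addr_len'.
 * Returns false if there is no condition or it is malformed or too long. */
static bool
split_chassis_resident(const char *addresses, size_t *addr_len,
                       char *lport, size_t lport_size)
{
    static const char prefix[] = "is_chassis_resident(\"";
    const char *cond = strstr(addresses, prefix);
    if (!cond) {
        return false;
    }

    const char *name = cond + strlen(prefix);
    const char *end = strchr(name, '"');
    /* The closing quote must be followed by exactly ")" and nothing else. */
    if (!end || end == name || strcmp(end, "\")")) {
        return false;
    }

    size_t name_len = end - name;
    if (name_len >= lport_size) {
        return false;
    }
    memcpy(lport, name, name_len);
    lport[name_len] = '\0';
    *addr_len = cond - addresses;
    return true;
}
```

Given 'f0:00:00:00:00:01 192.168.1.1 is_chassis_resident("cr-lrp0")', where
the port name is a made-up example, the helper yields "cr-lrp0" and the
length of the address list that precedes the condition.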

A separate patch will be required to support gratuitous ARP for
distributed NAT rules that specify logical_port and external_mac.  Since
the MAC address differs and the logical port often resides on a different
chassis from the redirect-chassis, these addresses cannot be included in
the same "nat-addresses" string as for centralized NAT rules.

Signed-off-by: Mickey Spiegel <mickeys@gmail.com>
---
 ovn/controller/pinctrl.c | 104 ---
 ovn/lib/ovn-util.c   |  38 ++---
 ovn/lib/ovn-util.h   |   2 +
 ovn/northd/ovn-northd.c  |  52 +---
 ovn/ovn-nb.xml   |  33 ---
 ovn/ovn-sb.xml   |  31 ++
 tests/ovn.at |  70 +++
 7 files changed, 289 insertions(+), 41 deletions(-)

diff --git a/ovn/controller/pinctrl.c b/ovn/controller/pinctrl.c
index 0cdbf87..c75f753 100644
--- a/ovn/controller/pinctrl.c
+++ b/ovn/controller/pinctrl.c
@@ -37,6 +37,7 @@
 #include "lib/dhcp.h"
 #include "ovn-controller.h"
 #include "ovn/actions.h"
+#include "ovn/lex.h"
 #include "ovn/lib/logical-fields.h"
 #include "ovn/lib/ovn-dhcp.h"
 #include "ovn/lib/ovn-util.h"
@@ -1047,7 +1048,8 @@ send_garp_update(const struct sbrec_port_binding *binding_rec,
 
 volatile struct garp_data *garp = NULL;
 /* Update GARP for NAT IP if it exists. */
-if (!strcmp(binding_rec->type, "l3gateway")) {
+if (!strcmp(binding_rec->type, "l3gateway")
+|| !strcmp(binding_rec->type, "patch")) {
 struct lport_addresses *laddrs = NULL;
 laddrs = shash_find_data(nat_addresses, binding_rec->logical_port);
 if (!laddrs) {
@@ -1200,24 +1202,101 @@ get_localnet_vifs_l3gwports(const struct ovsrec_bridge *br_int,
 
 const struct local_datapath *ld;
 HMAP_FOR_EACH (ld, hmap_node, local_datapaths) {
-if (!ld->has_local_l3gateway) {
+if (!ld->localnet_port) {
 continue;
 }
 
 for (size_t i = 0; i < ld->ldatapath->n_lports; i++) {
 const struct sbrec_port_binding *pb = ld->ldatapath->lports[i];
-if (!strcmp(pb->type, "l3gateway")
-/* && it's on this chassis */) {
+if ((ld->has_local_l3gateway && !strcmp(pb->type, "l3gateway"))
+|| !strcmp(pb->type, "patch")) {
 sset_add(local_l3gw_ports, pb->logical_port);
 }
 }
 }
 }
 
+static bool
+pinctrl_is_chassis_resident(const struct lport_index *lports,
+const struct sbrec_chassis *chassis,
+const char *port_name)
+{
+const struct sbrec_port_binding *pb
+= lport_lookup_by_name(lports, port_name);
+return pb && pb->chassis && pb->chassis == chassis;
+}
+
+/* Extracts the mac, IPv4 and IPv6 addresses, and logical port from
+ * 'addresses' which should be of the format 'MAC [IP1 IP2 ..]
+ * [is_chassis_resident("LPORT_NAME")]', where IPn should be a valid IPv4
+ * or IPv6 address, and stores them in the 'ipv4_addrs' and 'ipv6_addrs'
+ * fields of 'laddrs'.  The logical port name is stored in 'lport'.
+ *
+ * Returns true if at least 'MAC' is found in 'address', false otherwise.
+ *
+ * The caller must call destroy_lport_addresses() and free(lport). */
+static bool
+extract_addresses_with_port(const char *addresses,
+struct lport_addresses *laddrs,
+char **lport)
+{
+int ofs;
+if (!extract_addresses(addresses, laddrs, &ofs)) {
+return false;
+} else if (ofs >= strlen(addresses)) {
+return true;
+}
+
+struct lexer lexer;
+lexer_init(&lexer, addresses + ofs);
+lexer_get(&lexer);
+
+if (lexer.error || lexer.token.type != LEX_T_ID
+|| !lexer_match_id(&lexer, "is_chassis_resident")) {
+static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
+VLOG_INFO_RL(&rl, "invalid syntax '%s' in address", addresses);
+  

[ovs-dev] [PATCH v2 2/2] ovn: Gratuitous ARP for centralized NAT rules on a distributed router

2017-02-01 Thread Mickey Spiegel
This patch extends gratuitous ARP support for NAT addresses so that it
applies to centralized NAT rules on a distributed router, in addition to
the existing gratuitous ARP support for NAT addresses on gateway routers.

Gratuitous ARP packets for centralized NAT rules on a distributed router
are only generated on the redirect-chassis.  This is achieved by extending
the syntax for "options:nat-addresses" in the southbound database,
allowing the condition 'is_chassis_resident("LPORT_NAME")' to be appended
after the MAC and IP addresses.  This condition is automatically inserted
by ovn-northd when the northbound "options:nat-addresses" is set to
"router" and the peer is a distributed gateway port.
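
The effect of the condition can be sketched as a simple filter.  This is an
illustration under assumptions, not the patch's code: the binding table is
reduced to name/chassis string pairs, and an entry without a condition is
treated as always eligible, on the assumption that the caller has already
restricted such entries to local gateway router ports.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct binding {
    const char *logical_port;
    const char *chassis;        /* NULL if the port is not bound anywhere. */
};

/* Returns true if 'port_name' is bound to 'local_chassis'. */
static bool
is_chassis_resident(const struct binding *bindings, size_t n,
                    const char *local_chassis, const char *port_name)
{
    for (size_t i = 0; i < n; i++) {
        if (!strcmp(bindings[i].logical_port, port_name)) {
            return bindings[i].chassis
                   && !strcmp(bindings[i].chassis, local_chassis);
        }
    }
    return false;
}

/* An entry without a condition ('cond_port' == NULL) is always announced;
 * an entry with one is announced only where the named port resides. */
static bool
should_send_garp(const struct binding *bindings, size_t n,
                 const char *local_chassis, const char *cond_port)
{
    return !cond_port
           || is_chassis_resident(bindings, n, local_chassis, cond_port);
}
```

With a port bound to one chassis, only that chassis answers true for the
conditioned entry, so only it would emit the gratuitous ARP.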

A separate patch will be required to support gratuitous ARP for
distributed NAT rules that specify logical_port and external_mac.  Since
the MAC address differs and the logical port often resides on a different
chassis from the redirect-chassis, these addresses cannot be included in
the same "nat-addresses" string as for centralized NAT rules.

Signed-off-by: Mickey Spiegel <mickeys@gmail.com>
---
 ovn/controller/pinctrl.c | 104 ---
 ovn/lib/ovn-util.c   |  38 ++---
 ovn/lib/ovn-util.h   |   2 +
 ovn/northd/ovn-northd.c  |  52 +---
 ovn/ovn-nb.xml   |  33 ---
 ovn/ovn-sb.xml   |  31 ++
 tests/ovn.at |  70 +++
 7 files changed, 289 insertions(+), 41 deletions(-)

diff --git a/ovn/controller/pinctrl.c b/ovn/controller/pinctrl.c
index 0cdbf87..c75f753 100644
--- a/ovn/controller/pinctrl.c
+++ b/ovn/controller/pinctrl.c
@@ -37,6 +37,7 @@
 #include "lib/dhcp.h"
 #include "ovn-controller.h"
 #include "ovn/actions.h"
+#include "ovn/lex.h"
 #include "ovn/lib/logical-fields.h"
 #include "ovn/lib/ovn-dhcp.h"
 #include "ovn/lib/ovn-util.h"
@@ -1047,7 +1048,8 @@ send_garp_update(const struct sbrec_port_binding *binding_rec,
 
 volatile struct garp_data *garp = NULL;
 /* Update GARP for NAT IP if it exists. */
-if (!strcmp(binding_rec->type, "l3gateway")) {
+if (!strcmp(binding_rec->type, "l3gateway")
+|| !strcmp(binding_rec->type, "patch")) {
 struct lport_addresses *laddrs = NULL;
 laddrs = shash_find_data(nat_addresses, binding_rec->logical_port);
 if (!laddrs) {
@@ -1200,24 +1202,101 @@ get_localnet_vifs_l3gwports(const struct ovsrec_bridge *br_int,
 
 const struct local_datapath *ld;
 HMAP_FOR_EACH (ld, hmap_node, local_datapaths) {
-if (!ld->has_local_l3gateway) {
+if (!ld->localnet_port) {
 continue;
 }
 
 for (size_t i = 0; i < ld->ldatapath->n_lports; i++) {
 const struct sbrec_port_binding *pb = ld->ldatapath->lports[i];
-if (!strcmp(pb->type, "l3gateway")
-/* && it's on this chassis */) {
+if ((ld->has_local_l3gateway && !strcmp(pb->type, "l3gateway"))
+|| !strcmp(pb->type, "patch")) {
 sset_add(local_l3gw_ports, pb->logical_port);
 }
 }
 }
 }
 
+static bool
+pinctrl_is_chassis_resident(const struct lport_index *lports,
+const struct sbrec_chassis *chassis,
+const char *port_name)
+{
+const struct sbrec_port_binding *pb
+= lport_lookup_by_name(lports, port_name);
+return pb && pb->chassis && pb->chassis == chassis;
+}
+
+/* Extracts the mac, IPv4 and IPv6 addresses, and logical port from
+ * 'addresses' which should be of the format 'MAC [IP1 IP2 ..]
+ * [is_chassis_resident("LPORT_NAME")]', where IPn should be a valid IPv4
+ * or IPv6 address, and stores them in the 'ipv4_addrs' and 'ipv6_addrs'
+ * fields of 'laddrs'.  The logical port name is stored in 'lport'.
+ *
+ * Returns true if at least 'MAC' is found in 'address', false otherwise.
+ *
+ * The caller must call destroy_lport_addresses() and free(lport). */
+static bool
+extract_addresses_with_port(const char *addresses,
+struct lport_addresses *laddrs,
+char **lport)
+{
+int ofs;
+if (!extract_addresses(addresses, laddrs, &ofs)) {
+return false;
+} else if (ofs >= strlen(addresses)) {
+return true;
+}
+
+struct lexer lexer;
+lexer_init(&lexer, addresses + ofs);
+lexer_get(&lexer);
+
+if (lexer.error || lexer.token.type != LEX_T_ID
+|| !lexer_match_id(&lexer, "is_chassis_resident")) {
+static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
+VLOG_INFO_RL(&rl, "invalid syntax '%s' in address", addresses);
+  

[ovs-dev] [PATCH v2 1/2] ovn: specify options:nat-addresses as "router"

2017-02-01 Thread Mickey Spiegel
Currently in OVN, the "nat-addresses" in the "options" column of a
logical switch port of type "router" must be specified manually.
Typically the user would specify as "nat-addresses" all of the NAT
external IP addresses and load balancer IP addresses that have
already been specified separately on the router.

This patch allows the logical switch port's "nat-addresses" to be
specified as the string "router".  When ovn-northd sees this string,
it automatically copies the following into the southbound
Port_Binding's "nat-addresses" in the "options" column:
  * The options:router-port's MAC address.
  * Each NAT external IP address (of any NAT type) specified on the
    logical router of options:router-port.
  * Each load balancer IP address specified on the logical router of
    options:router-port.
This will cause the controller where the gateway router resides to
issue gratuitous ARPs for each NAT external IP address and for each
load balancer IP address specified on the gateway router.
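
The shape of the generated string can be sketched as follows.  This is a
simplified stand-in for the patch's get_nat_addresses(), which uses OVS
dynamic strings and string sets; format_nat_addresses() is a hypothetical
helper, and unlike the patch it drops duplicates across all addresses
rather than only among load balancer VIPs.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "MAC IP1 IP2 ..." into 'buf', skipping any IP
 * that was already written.  Returns the string length, or -1 if 'buf' is
 * too small. */
static int
format_nat_addresses(const char *mac, const char *ips[], size_t n_ips,
                     char *buf, size_t size)
{
    int len = snprintf(buf, size, "%s", mac);
    if (len < 0 || (size_t) len >= size) {
        return -1;
    }

    for (size_t i = 0; i < n_ips; i++) {
        int dup = 0;
        for (size_t j = 0; j < i && !dup; j++) {
            dup = !strcmp(ips[i], ips[j]);
        }
        if (dup) {
            continue;
        }

        /* Append " IP", failing if the result would be truncated. */
        int n = snprintf(buf + len, size - len, " %s", ips[i]);
        if (n < 0 || (size_t) n >= size - len) {
            return -1;
        }
        len += n;
    }
    return len;
}
```

For a router port MAC followed by two distinct external addresses, the
result is a single space-separated string, which is the form the
controller later parses before sending gratuitous ARPs.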

This patch is written as if it will be included in OVS 2.7.  If it
is deferred to OVS 2.8, then the OVS version mentioned in ovn-nb.xml
will need to be updated from OVS 2.7 to OVS 2.8.

Signed-off-by: Mickey Spiegel <mickeys@gmail.com>
Acked-by: Gurucharan Shetty <g...@ovn.org>
---
 ovn/northd/ovn-northd.c | 116 ++--
 ovn/ovn-nb.xml  |  42 +++---
 tests/ovn.at|  60 +
 3 files changed, 187 insertions(+), 31 deletions(-)

diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index a4f76a9..79ebac4 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -1436,6 +1436,88 @@ join_logical_ports(struct northd_context *ctx,
 }
 
 static void
+ip_address_and_port_from_lb_key(const char *key, char **ip_address,
+uint16_t *port);
+
+static void
+get_router_load_balancer_ips(const struct ovn_datapath *od,
+ struct sset *all_ips)
+{
+if (!od->nbr) {
+return;
+}
+
+for (int i = 0; i < od->nbr->n_load_balancer; i++) {
+struct nbrec_load_balancer *lb = od->nbr->load_balancer[i];
+struct smap *vips = &lb->vips;
+struct smap_node *node;
+
+SMAP_FOR_EACH (node, vips) {
+/* node->key contains IP:port or just IP. */
+char *ip_address = NULL;
+uint16_t port;
+
+ip_address_and_port_from_lb_key(node->key, &ip_address, &port);
+if (!ip_address) {
+continue;
+}
+
+if (!sset_contains(all_ips, ip_address)) {
+sset_add(all_ips, ip_address);
+}
+
+free(ip_address);
+}
+}
+}
+
+/* Returns a string consisting of the port's MAC address followed by the
+ * external IP addresses of all NAT rules defined on that router and the
+ * VIPs of all load balancers defined on that router.
+ *
+ * The caller must free the returned string with free(). */
+static char *
+get_nat_addresses(const struct ovn_port *op)
+{
+struct eth_addr mac;
+if (!op->nbrp || !op->od || !op->od->nbr
+|| (!op->od->nbr->n_nat && !op->od->nbr->n_load_balancer)
+|| !eth_addr_from_string(op->nbrp->mac, &mac)) {
+return NULL;
+}
+
+struct ds addresses = DS_EMPTY_INITIALIZER;
+ds_put_format(&addresses, ETH_ADDR_FMT, ETH_ADDR_ARGS(mac));
+
+/* Get NAT IP addresses. */
+for (int i = 0; i < op->od->nbr->n_nat; i++) {
+const struct nbrec_nat *nat;
+nat = op->od->nbr->nat[i];
+
+ovs_be32 ip, mask;
+
+char *error = ip_parse_masked(nat->external_ip, &ip, &mask);
+if (error || mask != OVS_BE32_MAX) {
+free(error);
+continue;
+}
+ds_put_format(&addresses, " %s", nat->external_ip);
+}
+
+/* A set to hold all load-balancer vips. */
+struct sset all_ips = SSET_INITIALIZER(&all_ips);
+get_router_load_balancer_ips(op->od, &all_ips);
+
+const char *ip_address;
+SSET_FOR_EACH(ip_address, &all_ips) {
+ds_put_format(&addresses, " %s", ip_address);
+}
+sset_destroy(&all_ips);
+
+return ds_steal_cstr(&addresses);
+}
+
+static void
 ovn_port_update_sbrec(const struct ovn_port *op,
   struct hmap *chassis_qdisc_queues)
 {
@@ -1524,7 +1606,15 @@ ovn_port_update_sbrec(const struct ovn_port *op,
 
 const char *nat_addresses = smap_get(&op->nbsp->options,
"nat-addresses");
-if (nat_addresses) {
+if (nat_addresses && !strcmp(nat_addresses, "router")) {
+if (op->peer && op->peer->nbrp) {
+char *nats = get_nat_addresses(op->

Re: [ovs-dev] [PATCH v12 6/6] ovn: specify options:nat-addresses as "router"

2017-01-27 Thread Mickey Spiegel
On Fri, Jan 27, 2017 at 11:16 AM, Guru Shetty <g...@ovn.org> wrote:

>
>>
>> I should clarify that statement. It is a good thing if the chassis
>> changes, for example if doing simple high availability. The GARP
>> packet will fix L2 learning.
>>
>> As I think about it, if anyone uses logical routers without NAT
>> or load balancers, and the chassis changes, then GARP of the
>> router IP would still be useful. I guess a user could always
>> specify the router IP and MAC in "nat-addresses" to make that
>> happen. I am not sure if anyone is deploying OVN that way.
>>
>> Mickey
>>
>
> I applied the first 5 patches of the series to master. I couldn't apply it
> to 2.7 because of some conflicts. Would you mind reposting it just for 2.7
> with "branch-2.7" in the subject . For e.g: [PATCH branch-2.7]
>

Some previous patches made it into master but not yet
branch-2.7:

https://github.com/openvswitch/ovs/commit/ba8d3816e88f7a702635c09111f58352ecad6506
https://github.com/openvswitch/ovs/commit/41a15b71ed1ef35aa612a1128082219fbfc3f327

Next are a bunch of blp's patches that need to go in
before these 5 patches:

https://github.com/openvswitch/ovs/commit/80b6743d0ab3a39884fe873dd616cb49b6f55fab
https://github.com/openvswitch/ovs/commit/b3bd2c33e83e2039d75e830368a64d596f820aaa
https://github.com/openvswitch/ovs/commit/ebb467ff1c255813d6ba27d91ef6180e9a20fe0a
https://github.com/openvswitch/ovs/commit/8f5de08322673f4e60f44d599fa7ee4de65bc078
https://github.com/openvswitch/ovs/commit/c571f48c36223de360ba0fa4d89104a7da14dbca
https://github.com/openvswitch/ovs/commit/4c99cb181b6937efb3819cffc9765999fd7b7796
https://github.com/openvswitch/ovs/commit/db0e819be065c1474ceef232dcc1260c9a2e7c0e

blp said that he could help with the backport:
https://mail.openvswitch.org/pipermail/ovs-dev/2017-January/327884.html

Mickey

>
>
>>
>>
>>> Mickey
>>>
>>>

>>>> Acked-by: Gurucharan Shetty <g...@ovn.org>

>>>
>


Re: [ovs-dev] [PATCH v12 6/6] ovn: specify options:nat-addresses as "router"

2017-01-27 Thread Mickey Spiegel
On Fri, Jan 27, 2017 at 10:29 AM, Mickey Spiegel <mickeys@gmail.com>
wrote:

> Thanks for the review.
>
> On Fri, Jan 27, 2017 at 10:20 AM, Guru Shetty <g...@ovn.org> wrote:
>
>>
>>
>> On 26 January 2017 at 01:20, Mickey Spiegel <mickeys@gmail.com>
>> wrote:
>>
>>> Currently in OVN, the "nat-addresses" in the "options" column of a
>>> logical switch port of type "router" must be specified manually.
>>> Typically the user would specify as "nat-addresses" all of the NAT
>>> external IP addresses and load balancer IP addresses that have
>>> already been specified separately on the router.
>>>
>>> This patch allows the logical switch port's "nat-addresses" to be
>>> specified as the string "router".  When ovn-northd sees this string,
>>> it automatically copies the following into the southbound
>>> Port_Binding's "nat-addresses" in the "options" column:
>>>   * The options:router-port's MAC address.
>>>   * Each NAT external IP address (of any NAT type) specified on the
>>>     logical router of options:router-port.
>>>   * Each load balancer IP address specified on the logical router of
>>>     options:router-port.
>>> This will cause the controller where the gateway router resides to
>>> issue gratuitous ARPs for each NAT external IP address and for each
>>> load balancer IP address specified on the gateway router.
>>>
>>> This patch is written as if it will be included in OVS 2.7.  If it
>>> is deferred to OVS 2.8, then the OVS version mentioned in ovn-nb.xml
>>> will need to be updated from OVS 2.7 to OVS 2.8.
>>>
>>> Signed-off-by: Mickey Spiegel <mickeys@gmail.com>
>>> ---
>>>  ovn/northd/ovn-northd.c | 114 ++
>>> --
>>>  ovn/ovn-nb.xml  |  42 +++---
>>>  tests/ovn.at|  60 +
>>>  3 files changed, 185 insertions(+), 31 deletions(-)
>>>
>>> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
>>> index e24ff8f..efb8a74 100644
>>> --- a/ovn/northd/ovn-northd.c
>>> +++ b/ovn/northd/ovn-northd.c
>>> @@ -1436,6 +1436,86 @@ join_logical_ports(struct northd_context *ctx,
>>>  }
>>>
>>>  static void
>>> +ip_address_and_port_from_lb_key(const char *key, char **ip_address,
>>> +uint16_t *port);
>>> +
>>> +static void
>>> +get_router_load_balancer_ips(const struct ovn_datapath *od,
>>> + struct sset *all_ips)
>>> +{
>>> +if (!od->nbr) {
>>> +return;
>>> +}
>>> +
>>> +for (int i = 0; i < od->nbr->n_load_balancer; i++) {
>>> +struct nbrec_load_balancer *lb = od->nbr->load_balancer[i];
>>> +struct smap *vips = &lb->vips;
>>> +struct smap_node *node;
>>> +
>>> +SMAP_FOR_EACH (node, vips) {
>>> +/* node->key contains IP:port or just IP. */
>>> +char *ip_address = NULL;
>>> +uint16_t port;
>>> +
>>> +ip_address_and_port_from_lb_key(node->key, &ip_address,
>>> &port);
>>> +if (!ip_address) {
>>> +continue;
>>> +}
>>> +
>>> +if (!sset_contains(all_ips, ip_address)) {
>>> +sset_add(all_ips, ip_address);
>>> +}
>>> +
>>> +free(ip_address);
>>> +}
>>> +}
>>> +}
>>> +
>>> +/* Returns a string consisting of the port's MAC address followed by the
>>> + * external IP addresses of all NAT rules defined on that router and the
>>> + * VIPs of all load balancers defined on that router. */
>>>
>> A comment that the caller has to free the returned string is useful.
>>
>
> I can add that.
>
>
>>
>> Also, the load balancer IP address can also be the IP address of the
>> router itself. For e.g: when it is ROUTER_IP:port. I guess, that is not a
>> problem when a GARP is sent for the router IP too.
>>
>
> If the router IP is used as a load balancer IP address or as a SNAT
> address, then sending GARP for the router IP is a good thing.
>

I should clarify that statement. It is a good thing if the chassis
changes, for example if doing simple high availability. The GARP
packet will fix L2 learning.

As I think about it, if anyone uses logical routers without NAT
or load balancers, and the chassis changes, then GARP of the
router IP would still be useful. I guess a user could always
specify the router IP and MAC in "nat-addresses" to make that
happen. I am not sure if anyone is deploying OVN that way.

Mickey


> Mickey
>
>
>>
>> Acked-by: Gurucharan Shetty <g...@ovn.org>
>>
>


[ovs-dev] [PATCH v14 6/6] ovn: specify options:nat-addresses as "router"

2017-01-26 Thread Mickey Spiegel
Currently in OVN, the "nat-addresses" in the "options" column of a
logical switch port of type "router" must be specified manually.
Typically the user would specify as "nat-addresses" all of the NAT
external IP addresses and load balancer IP addresses that have
already been specified separately on the router.

This patch allows the logical switch port's "nat-addresses" to be
specified as the string "router".  When ovn-northd sees this string,
it automatically copies the following into the southbound
Port_Binding's "nat-addresses" in the "options" column:
  * The options:router-port's MAC address.
  * Each NAT external IP address (of any NAT type) specified on the
    logical router of options:router-port.
  * Each load balancer IP address specified on the logical router of
    options:router-port.
This will cause the controller where the gateway router resides to
issue gratuitous ARPs for each NAT external IP address and for each
load balancer IP address specified on the gateway router.
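
Load balancer VIP keys take the form "IP:port" or just "IP", so only the
address part belongs in "nat-addresses".  A minimal sketch of that
extraction follows, assuming IPv4 keys only; lb_key_to_ip() is a
hypothetical helper and not the ip_address_and_port_from_lb_key() function
that the patch calls.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper for IPv4 keys only: copies the address part of a load
 * balancer VIP key ("IP" or "IP:port") into 'ip'.  Returns false if the
 * address is empty or does not fit. */
static bool
lb_key_to_ip(const char *key, char *ip, size_t size)
{
    size_t len = strcspn(key, ":");     /* Length up to ':' or end of key. */
    if (!len || len >= size) {
        return false;
    }
    memcpy(ip, key, len);
    ip[len] = '\0';
    return true;
}
```

Two keys that differ only in port map to the same address, which is why
the patch collects the results in a set before appending them.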

This patch is written as if it will be included in OVS 2.7.  If it
is deferred to OVS 2.8, then the OVS version mentioned in ovn-nb.xml
will need to be updated from OVS 2.7 to OVS 2.8.

Signed-off-by: Mickey Spiegel <mickeys@gmail.com>
---
 ovn/northd/ovn-northd.c | 114 ++--
 ovn/ovn-nb.xml  |  42 +++---
 tests/ovn.at|  60 +
 3 files changed, 185 insertions(+), 31 deletions(-)

diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index a4f76a9..8031148 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -1436,6 +1436,86 @@ join_logical_ports(struct northd_context *ctx,
 }
 
 static void
+ip_address_and_port_from_lb_key(const char *key, char **ip_address,
+uint16_t *port);
+
+static void
+get_router_load_balancer_ips(const struct ovn_datapath *od,
+ struct sset *all_ips)
+{
+if (!od->nbr) {
+return;
+}
+
+for (int i = 0; i < od->nbr->n_load_balancer; i++) {
+struct nbrec_load_balancer *lb = od->nbr->load_balancer[i];
+struct smap *vips = &lb->vips;
+struct smap_node *node;
+
+SMAP_FOR_EACH (node, vips) {
+/* node->key contains IP:port or just IP. */
+char *ip_address = NULL;
+uint16_t port;
+
+ip_address_and_port_from_lb_key(node->key, &ip_address, &port);
+if (!ip_address) {
+continue;
+}
+
+if (!sset_contains(all_ips, ip_address)) {
+sset_add(all_ips, ip_address);
+}
+
+free(ip_address);
+}
+}
+}
+
+/* Returns a string consisting of the port's MAC address followed by the
+ * external IP addresses of all NAT rules defined on that router and the
+ * VIPs of all load balancers defined on that router. */
+static char *
+get_nat_addresses(const struct ovn_port *op)
+{
+struct eth_addr mac;
+if (!op->nbrp || !op->od || !op->od->nbr
+|| (!op->od->nbr->n_nat && !op->od->nbr->n_load_balancer)
+|| !eth_addr_from_string(op->nbrp->mac, &mac)) {
+return NULL;
+}
+
+struct ds addresses = DS_EMPTY_INITIALIZER;
+ds_put_format(&addresses, ETH_ADDR_FMT, ETH_ADDR_ARGS(mac));
+
+/* Get NAT IP addresses. */
+for (int i = 0; i < op->od->nbr->n_nat; i++) {
+const struct nbrec_nat *nat;
+nat = op->od->nbr->nat[i];
+
+ovs_be32 ip, mask;
+
+char *error = ip_parse_masked(nat->external_ip, &ip, &mask);
+if (error || mask != OVS_BE32_MAX) {
+free(error);
+continue;
+}
+ds_put_format(&addresses, " %s", nat->external_ip);
+}
+
+/* A set to hold all load-balancer vips. */
+struct sset all_ips = SSET_INITIALIZER(&all_ips);
+get_router_load_balancer_ips(op->od, &all_ips);
+
+const char *ip_address;
+SSET_FOR_EACH(ip_address, &all_ips) {
+ds_put_format(&addresses, " %s", ip_address);
+}
+sset_destroy(&all_ips);
+
+return ds_steal_cstr(&addresses);
+}
+
+static void
 ovn_port_update_sbrec(const struct ovn_port *op,
   struct hmap *chassis_qdisc_queues)
 {
@@ -1524,7 +1604,15 @@ ovn_port_update_sbrec(const struct ovn_port *op,
 
 const char *nat_addresses = smap_get(&op->nbsp->options,
"nat-addresses");
-if (nat_addresses) {
+if (nat_addresses && !strcmp(nat_addresses, "router")) {
+if (op->peer && op->peer->nbrp) {
+char *nats = get_nat_addresses(op->peer);
+if (nats) {
+smap_add(, "nat-addresses", nats);
+fre

[ovs-dev] [PATCH v14 3/6] ovn: distributed NAT flows

2017-01-26 Thread Mickey Spiegel
This patch implements the flows required in the ingress and egress
pipeline stages in order to support NAT on a distributed logical router.

NAT functionality is associated with the logical router gateway port.
The flows that carry out NAT functionality all have match conditions on
inport or outport equal to the logical router gateway port.  There are
additional flows that are used to redirect traffic when necessary,
using the tunnel key of a "chassisredirect" SB port binding in order to
redirect traffic to the instance of the logical router gateway port on
the centralized "redirect-chassis".

North/south traffic subject to one-to-one "dnat_and_snat" is handled
in a distributed manner, with south-to-north traffic going to the
local instance of the logical router gateway port.  North/south
traffic subject to (possibly one-to-many) "snat" is handled in a
centralized manner, with south-to-north traffic going to the instance
of the logical router gateway port on the "redirect-chassis".
North-to-south traffic is directed to the corresponding chassis by
limiting ARP responses to the appropriate instance of the logical
router gateway port on one chassis.  For centralized NAT rules, this
is the instance on the "redirect-chassis".  For distributed NAT rules,
this is the chassis where the corresponding logical port resides, using
an ethernet address specified in the NB NAT rule to trigger upstream
MAC learning.
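
The split between distributed and centralized handling of north/south
traffic can be summarized as a small decision function.  This is a sketch
of the rule as described above, not the flow-generation code in
ovn-northd; the type and function names are hypothetical, and falling back
to the redirect-chassis when the logical port is unbound is an assumption
made for the example.

```c
#include <stdbool.h>
#include <stddef.h>

enum nat_type { NAT_SNAT, NAT_DNAT, NAT_DNAT_AND_SNAT };

struct nat_rule {
    enum nat_type type;
    const char *logical_port;   /* NULL if not specified. */
    const char *external_mac;   /* NULL if not specified. */
};

/* A rule is handled in a distributed manner only if it is one-to-one
 * ("dnat_and_snat") and names both a logical port and an external MAC. */
static bool
nat_rule_is_distributed(const struct nat_rule *nat)
{
    return nat->type == NAT_DNAT_AND_SNAT
           && nat->logical_port && nat->external_mac;
}

/* Returns the chassis whose gateway port instance should answer ARP for the
 * rule's external IP.  'lport_chassis' is where nat->logical_port resides,
 * or NULL if it is unbound (assumed here to fall back to centralized). */
static const char *
nat_rule_arp_chassis(const struct nat_rule *nat, const char *lport_chassis,
                     const char *redirect_chassis)
{
    return (nat_rule_is_distributed(nat) && lport_chassis)
           ? lport_chassis : redirect_chassis;
}
```

A "snat" rule, or a "dnat_and_snat" rule without an external MAC, resolves
to the redirect-chassis; only fully specified one-to-one rules resolve to
the chassis hosting the logical port.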

East/west NAT traffic is all handled in a centralized manner.  While it
is certainly possible to handle some of this traffic in a distributed
manner, the centralized approach keeps the NAT flows simpler and
cleaner.  The expectation is that east/west NAT traffic is not as
important to optimize as north/south NAT traffic, with most east/west
traffic not requiring NAT.

Automated tests are currently limited to only a single node.  The
single node automated tests cover both north/south and east/west
traffic flows.

Signed-off-by: Mickey Spiegel <mickeys@gmail.com>
Acked-by: Gurucharan Shetty <g...@ovn.org>
---
 ovn/controller/ovn-controller.c |   6 +-
 ovn/northd/ovn-northd.8.xml | 400 +++--
 ovn/northd/ovn-northd.c | 426 +++-
 ovn/ovn-architecture.7.xml  |   7 +-
 ovn/ovn-nb.ovsschema|   6 +-
 ovn/ovn-nb.xml  |  56 +-
 tests/system-ovn.at | 338 +++
 7 files changed, 1159 insertions(+), 80 deletions(-)

diff --git a/ovn/controller/ovn-controller.c b/ovn/controller/ovn-controller.c
index 7cef3f8..ea299da 100644
--- a/ovn/controller/ovn-controller.c
+++ b/ovn/controller/ovn-controller.c
@@ -323,10 +323,8 @@ update_ct_zones(struct sset *lports, const struct hmap *local_datapaths,
 /* Local patched datapath (gateway routers) need zones assigned. */
 const struct local_datapath *ld;
 HMAP_FOR_EACH (ld, hmap_node, local_datapaths) {
-if (!ld->has_local_l3gateway) {
-continue;
-}
-
+/* XXX Add method to limit zone assignment to logical router
+ * datapaths with NAT */
 char *dnat = alloc_nat_zone_key(&ld->datapath->header_.uuid, "dnat");
 char *snat = alloc_nat_zone_key(&ld->datapath->header_.uuid, "snat");
 sset_add(&all_users, dnat);
diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
index 49e4291..ab8fd88 100644
--- a/ovn/northd/ovn-northd.8.xml
+++ b/ovn/northd/ovn-northd.8.xml
@@ -752,9 +752,25 @@ output;
column is set to router and
   the connected logical router port specifies a
-  redirect-chassis, the flow is only programmed on the
-  redirect-chassis.
+  redirect-chassis:
 
+
+
+  
+The flow for the connected logical router port's Ethernet
+address is only programmed on the redirect-chassis.
+  
+
+  
+If the logical router has rules specified in
+ with
+, then
+those addresses are also used to populate the switch's destination
+lookup on the chassis where
+ is
+resident.
+  
+
   
 
   
@@ -890,6 +906,23 @@ output;
   redirect-chassis.
 
   
+
+  
+
+  For each dnat_and_snat NAT rule on a distributed
+  router that specifies an external Ethernet address E,
+  a priority-50 flow that matches inport == GW
+   eth.dst == E, where GW
+  is the logical router gateway port, with action
+  next;.
+
+
+
+  This flow is only programmed on the gateway port instance on
+  the chassis where the logical_port specified in
+  the NAT rule resides.
+
+  
 
 
 
@@ -928,7 +961,9 @@ output;
   
   
   

[ovs-dev] [PATCH v14 5/6] ovn: rewrite redirect-chassis description in ovn-nb.xml

2017-01-26 Thread Mickey Spiegel
This optional patch addresses offline comments that the documentation
in ovn-nb.xml should not describe southbound constructs or flow
details, since it is user facing documentation.

Signed-off-by: Mickey Spiegel <mickeys@gmail.com>
Acked-by: Gurucharan Shetty <g...@ovn.org>
---
 ovn/ovn-nb.xml | 36 ++--
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/ovn/ovn-nb.xml b/ovn/ovn-nb.xml
index 20797a6..c5ebbea 100644
--- a/ovn/ovn-nb.xml
+++ b/ovn/ovn-nb.xml
@@ -,31 +,31 @@
   
 
   If set, this indicates that this logical router port represents
-  a distributed gateway port.  In addition to the southbound
-  database port representing this distributed gateway port, another
-  port will be created in the southbound database that represents a
-  particular instance, bound to a specific chassis, of this
-  otherwise distributed logical router port.  This additional port
-  can then be specified as an outport in some of the
-  ingress pipeline flows.  This will cause matching packets to be
-  directed to a specific chassis to carry out the egress pipeline,
-  allowing a subset of logical router functionality to be
-  implemented in a centralized manner.  At the beginning of the
-  egress pipeline, the outport will be reset to the
-  value of the distributed port.
+  a distributed gateway port that connects this router to a logical
+  switch with a localnet port.  There may be at most one such
+  logical router port on each logical router.
 
 
 
-  This option specifies the name of the chassis to which
-  the additional southbound port binding of type
-  chassisredirect will be bound.
+  Even when a redirect-chassis is specified, the
+  logical router port still effectively resides on each chassis.
+  However, due to the implications of the use of L2 learning in the
+  physical network, as well as the need to support advanced features
+  such as one-to-many NAT (aka IP masquerading), a subset of the
+  logical router processing is handled in a centralized manner on
+  the specified redirect-chassis.
 
 
 
   When this option is specified, the peer logical switch port's
-  addresses should be
-  set to router, so that the corresponding logical
-  switch destination lookup flow is only programmed on the
+  addresses must be
+  set to router.  With this setting, the external_macs specified in NAT rules are
+  automatically programmed in the peer logical switch's
+  destination lookup on the chassis where the logical_port resides.  In addition, the
+  logical router's MAC address is automatically programmed in the
+  peer logical switch's destination lookup flow on the
   redirect-chassis.
 
   
-- 
1.9.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v14 4/6] ovn: ovn-nbctl commands for distributed NAT

2017-01-26 Thread Mickey Spiegel
This patch adds the new optional arguments "logical_port" and
"external_mac" to lr-nat-add, and displays that information in
lr-nat-list.
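
As a rough illustration of the argument rules described above, the
standalone C sketch below checks the two optional arguments as the
documentation reads: they must be given together, they are only valid
when type is "dnat_and_snat", and external_mac must look like an Ethernet
address.  The names check_nat_optional_args() and looks_like_mac() are
hypothetical and are not part of ovn-nbctl.  The real command also looks up
logical_port in the northbound database, which is omitted here, and the
MAC check is deliberately loose.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns true if 's' has the general shape
 * xx:xx:xx:xx:xx:xx with nothing after the sixth byte.  This is a loose
 * check for illustration only. */
static bool
looks_like_mac(const char *s)
{
    unsigned int b[6];
    char trailing;

    return sscanf(s, "%2x:%2x:%2x:%2x:%2x:%2x%c",
                  &b[0], &b[1], &b[2], &b[3], &b[4], &b[5], &trailing) == 6;
}

/* Hypothetical helper: returns NULL if the optional arguments are
 * acceptable, otherwise a static error message.  'n_optional' is the number
 * of arguments given after logical_ip (0, 1 or 2). */
static const char *
check_nat_optional_args(const char *type, size_t n_optional,
                        const char *external_mac)
{
    assert(type != NULL);

    if (n_optional == 0) {
        return NULL;            /* Plain NAT rule, nothing more to check. */
    }
    if (n_optional == 1) {
        return "logical_port must be accompanied by external_mac";
    }
    if (strcmp(type, "dnat_and_snat")) {
        return "logical_port and external_mac require type dnat_and_snat";
    }
    if (!external_mac || !looks_like_mac(external_mac)) {
        return "invalid external_mac";
    }
    return NULL;
}
```

With this sketch, a "snat" rule that carries both optional arguments is
rejected, while the same arguments on a "dnat_and_snat" rule are accepted.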

Signed-off-by: Mickey Spiegel <mickeys@gmail.com>
Acked-by: Gurucharan Shetty <g...@ovn.org>
---
 ovn/utilities/ovn-nbctl.8.xml | 27 +++---
 ovn/utilities/ovn-nbctl.c     | 54 +--
 tests/ovn-nbctl.at            | 47 +
 tests/system-ovn.at           | 30 +---
 4 files changed, 119 insertions(+), 39 deletions(-)

diff --git a/ovn/utilities/ovn-nbctl.8.xml b/ovn/utilities/ovn-nbctl.8.xml
index f95b88d..d81e99f 100644
--- a/ovn/utilities/ovn-nbctl.8.xml
+++ b/ovn/utilities/ovn-nbctl.8.xml
@@ -444,7 +444,7 @@
 NAT Commands
 
 
-  [--may-exist] lr-nat-add router type external_ip logical_ip
+  [--may-exist] lr-nat-add router type external_ip logical_ip [logical_port external_mac]
   
 
   Adds the specified NAT to router.
@@ -453,6 +453,13 @@
   The external_ip is an IPv4 address.
   The logical_ip is an IPv4 network (e.g 192.168.1.0/24)
   or an IPv4 address.
+  The logical_port and external_mac are only
+  accepted when router is a distributed router (rather
+  than a gateway router) and type is
+  dnat_and_snat.
+  The logical_port is the name of an existing logical
+  switch port where the logical_ip resides.
+  The external_mac is an Ethernet address.
 
 
   When type is dnat, the externally
@@ -475,8 +482,22 @@
   the IP address in external_ip.
 
 
-  It is an error if a NAT already exists,
-  unless --may-exist is specified.
+  When the logical_port and external_mac
+  are specified, the NAT rule will be programmed on the chassis
+  where the logical_port resides.  This includes
+  ARP replies for the external_ip, which return the
+  value of external_mac.  All packets transmitted
+  with source IP address equal to external_ip will
+  be sent using the external_mac.
+
+
+  It is an error if a NAT already exists with the same values
+  of router, type, external_ip,
+  and logical_ip, unless --may-exist is
+  specified.  When --may-exist,
+  logical_port, and external_mac are all
+  specified, the existing values of logical_port and
+  external_mac are overwritten.
 
   
 
diff --git a/ovn/utilities/ovn-nbctl.c b/ovn/utilities/ovn-nbctl.c
index f0ff27a..3dac434 100644
--- a/ovn/utilities/ovn-nbctl.c
+++ b/ovn/utilities/ovn-nbctl.c
@@ -390,7 +390,7 @@ Route commands:\n\
   lr-route-list ROUTER  print routes for ROUTER\n\
 \n\
 NAT commands:\n\
-  lr-nat-add ROUTER TYPE EXTERNAL_IP LOGICAL_IP\n\
+  lr-nat-add ROUTER TYPE EXTERNAL_IP LOGICAL_IP [LOGICAL_PORT EXTERNAL_MAC]\n\
 add a NAT to ROUTER\n\
   lr-nat-del ROUTER [TYPE [IP]]\n\
 remove NATs from ROUTER\n\
@@ -2239,6 +2239,30 @@ nbctl_lr_nat_add(struct ctl_context *ctx)
 new_logical_ip = normalize_ipv4_prefix(ipv4, plen);
 }
 
+const char *logical_port;
+const char *external_mac;
+if (ctx->argc == 6) {
+ctl_fatal("lr-nat-add with logical_port "
+  "must also specify external_mac.");
+} else if (ctx->argc == 7) {
+if (strcmp(nat_type, "dnat_and_snat")) {
+ctl_fatal("logical_port and external_mac are only valid when "
+  "type is \"dnat_and_snat\".");
+}
+
+logical_port = ctx->argv[5];
+lsp_by_name_or_uuid(ctx, logical_port, true);
+
+external_mac = ctx->argv[6];
+struct eth_addr ea;
+if (!eth_addr_from_string(external_mac, &ea)) {
+ctl_fatal("invalid mac address %s.", external_mac);
+}
+} else {
+logical_port = NULL;
+external_mac = NULL;
+}
+
 bool may_exist = shash_find(&ctx->options, "--may-exist") != NULL;
 int is_snat = !strcmp("snat", nat_type);
 for (size_t i = 0; i < lr->n_nat; i++) {
@@ -2249,6 +2273,10 @@ nbctl_lr_nat_add(struct ctl_context *ctx)
 if (!strcmp(is_snat ? external_ip : new_logical_ip,
 is_snat ? nat->external_ip : nat->logical_ip)) {
 if (may_exist) {
+nbrec_nat_verify_logical_port(nat);
+nbrec_nat_verify_external_mac(nat);
+nbrec_nat_set_logical_port(nat, logical_port);
+nbrec_nat_set_external_mac(nat, external_mac);
 free(new_logical_ip);
 

[ovs-dev] [PATCH v14 1/6] ovn: move load balancing flows after NAT flows

2017-01-26 Thread Mickey Spiegel
This will make it easy for distributed NAT to reuse some of the
existing code for NAT flows, while leaving load balancing and defrag
as functionality specific to gateway routers.  There is no intent to
change any functionality in this patch.

Signed-off-by: Mickey Spiegel <mickeys@gmail.com>
Acked-by: Gurucharan Shetty <g...@ovn.org>
---
 ovn/northd/ovn-northd.c | 140 
 1 file changed, 70 insertions(+), 70 deletions(-)

diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index 3b05470..5c03b04 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -4128,76 +4128,6 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
 const char *lb_force_snat_ip = get_force_snat_ip(od, "lb",
   &snat_ip);
 
-/* A set to hold all ips that need defragmentation and tracking. */
-struct sset all_ips = SSET_INITIALIZER(&all_ips);
-
-for (int i = 0; i < od->nbr->n_load_balancer; i++) {
-struct nbrec_load_balancer *lb = od->nbr->load_balancer[i];
-struct smap *vips = &lb->vips;
-struct smap_node *node;
-
-SMAP_FOR_EACH (node, vips) {
-uint16_t port = 0;
-
-/* node->key contains IP:port or just IP. */
-char *ip_address = NULL;
-ip_address_and_port_from_lb_key(node->key, &ip_address, &port);
-if (!ip_address) {
-continue;
-}
-
-if (!sset_contains(&all_ips, ip_address)) {
-sset_add(&all_ips, ip_address);
-}
-
-/* Higher priority rules are added for load-balancing in DNAT
- * table.  For every match (on a VIP[:port]), we add two flows
- * via add_router_lb_flow().  One flow is for specific matching
- * on ct.new with an action of "ct_lb($targets);".  The other
- * flow is for ct.est with an action of "ct_dnat;". */
-ds_clear(&actions);
-ds_put_format(&actions, "ct_lb(%s);", node->value);
-
-ds_clear(&match);
-ds_put_format(&match, "ip && ip4.dst == %s",
-  ip_address);
-free(ip_address);
-
-if (port) {
-if (lb->protocol && !strcmp(lb->protocol, "udp")) {
-ds_put_format(&match, " && udp && udp.dst == %d",
-  port);
-} else {
-ds_put_format(&match, " && tcp && tcp.dst == %d",
-  port);
-}
-add_router_lb_flow(lflows, od, &match, &actions, 120,
-   lb_force_snat_ip);
-} else {
-add_router_lb_flow(lflows, od, &match, &actions, 110,
-   lb_force_snat_ip);
-}
-}
-}
-
-/* If there are any load balancing rules, we should send the
- * packet to conntrack for defragmentation and tracking.  This helps
- * with two things.
- *
- * 1. With tracking, we can send only new connections to pick a
- *DNAT ip address from a group.
- * 2. If there are L4 ports in load balancing rules, we need the
- *defragmentation to match on L4 ports. */
-const char *ip_address;
-SSET_FOR_EACH(ip_address, &all_ips) {
-ds_clear(&match);
-ds_put_format(&match, "ip && ip4.dst == %s", ip_address);
-ovn_lflow_add(lflows, od, S_ROUTER_IN_DEFRAG,
-  100, ds_cstr(&match), "ct_next;");
-}
-
-sset_destroy(&all_ips);
-
 for (int i = 0; i < od->nbr->n_nat; i++) {
 const struct nbrec_nat *nat;
 
@@ -4352,6 +4282,76 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
 * routing in the openflow pipeline. */
 ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, 50,
   "ip", "flags.loopback = 1; ct_dnat;");
+
+/* A set to hold all ips that need defragmentation and tracking. */
+struct sset all_ips = SSET_INITIALIZER(&all_ips);
+
+for (int i = 0; i < od->nbr->n_load_balancer; i++) {
+struct nbrec_load_balancer *lb = od->nbr->load_balancer[i];
+struct smap *vips = &lb->vips;
+struct smap_node *node;
+
+SMAP_FOR_EACH (node, vips) {
+uint16_t port = 0;
+
+/* node->key contains IP:port or just IP. */
+char *ip_address = NULL;
+ip_address_and_port_from_lb_key(node->key, &ip_address, &port);
+ 

[ovs-dev] [PATCH v14 0/6] ovn: add distributed NAT capability

2017-01-26 Thread Mickey Spiegel
set will follow
once a decision is made on v8 patch 4 versus v9 patch 3.
The peer address patch was accepted, so it is no longer included in the
patch set.
The suggested new port type of "chassisredirect" has been dropped and
replaced by MLF_FORCE_CHASSIS_REDIRECT flag.

PATCH v7 -> PATCH v8
Incorporated incremental changes to is_chassis_resident() from blp.
Added patch that describes logical routers and logical patches in
ovn-architecture.
Renamed chassisredirect patch to emphasize distributed gateway ports
as well.
Added description of distributed gateway ports to ovn-architecture,
in distributed gateway port / chassisredirect patch.
Rewrote commit message for distributed gateway port / chassisredirect.

PATCH v6 -> PATCH v7
Rebase.
Documentation improvements to lsp addresses "router" patch as
suggested by blp.  Also added to ovn-nbctl documentation.

PATCH v5 -> PATCH v6
Added patch to automatically add router addresses to the addresses of
type "router" lsps.
Restricted logical switch destination lookup flows for logical router
distributed gateway port's MAC to the redirect chassis.
Automatically add distributed NAT MAC addresses to logical switch
destination lookup flows on the chassis where the NAT logical port resides.
Added tests for reachability from VIFs on the same logical switch as
localnet, through the logical router's distributed gateway port, to
internal VIFs.

PATCH v4 -> PATCH v5
Limited router ingress table 0 flow matching router ethernet address
on distributed gateway to redirect chassis.
Limited router ingress table 0 flows matching NAT ethernet address to
chassis where the NAT rule's logical port resides.
Rolled back changes to ICMP since they are not necessary.

PATCH v3 -> PATCH v4
Rebase

PATCH v2 -> PATCH v3
Added table to set egress loopback flag in the egress pipeline stage,
fixing east-west NAT across multiple chassis.

PATCH v1 -> PATCH v2
Added ovn-trace logic for chassisredirect ports, including automated test.
Added ovn-trace logic for egress loopback.
Fixed some bugs in ovn-trace register handling from ingress to egress,
and across patch ports (should these be filed separately as well?).

RFC v4 -> PATCH v1
Added egress loopback capability
Added east/west NAT tests to system-ovn.at (make check-kernel)
Added REGBIT_NAT_REDIRECT flows to IN_IP_ROUTING and IN_ARP_RESOLVE,
resolving remaining issues with east/west NAT

RFC v3 -> RFC v4
Rebased to pick up recent changes to ovn-controller, including a fix
to the localnet issue where VIFs had to be added on a chassis in order
to cause the localnet port to be instantiated.
The chassisredirect port logic was rewritten to avoid creating an
ofport.  Besides streamlining the code significantly, this fixed the
problem when the distributed port name was longer than 12 characters.
Restricted IPv6 ND replies for the router IP address to the redirect
chassis, similar to IPv4 ARP restrictions.
Added specific gateway redirect flows for unresolved ethernet
destination, so that ARP requests generated by the router are sent
through the redirect chassis regardless of NAT rules.
Relaxed checks in chassisredirect tests so that they are independent
of register assignments.
Renamed ovn-northd.c "l3gateway_port" to "l3dgw_port" in order to
avoid overlaps with gateway router terminology.

RFC v2 -> RFC v3
Reordered the first two patches.
Moved non-NAT specific flows from patch 5 to patch 2.
Added automated tests for is_chassis_resident (which is ready for
review) and chassisredirect patches.
Added flows to limit ICMP echo replies for router IPs on the gateway
interface, so that they are only generated on the redirect-chassis.

Mickey Spiegel (6):
  ovn: move load balancing flows after NAT flows
  ovn: avoid snat recirc only on gateway routers
  ovn: distributed NAT flows
  ovn: ovn-nbctl commands for distributed NAT
  ovn: rewrite redirect-chassis description in ovn-nb.xml
  ovn: specify options:nat-addresses as "router"

 include/ovn/actions.h           |   3 +
 ovn/controller/lflow.c          |  10 +
 ovn/controller/ovn-controller.c |   6 +-
 ovn/lib/actions.c               |  15 +-
 ovn/northd/ovn-northd.8.xml     | 400 ++-
 ovn/northd/ovn-northd.c         | 680 +++-
 ovn/ovn-architecture.7.xml      |   7 +-
 ovn/ovn-nb.ovsschema            |   6 +-
 ovn/ovn-nb.xml                  | 134 ++--
 ovn/ovn-sb.xml                  |  23 +-
 ovn/utilities/ovn-nbctl.8.xml   |  27 +-
 ovn/utilities/ovn-nbctl.c       |  54 +++-
 tests/ovn-nbctl.at              |  47 ++-
 tests/ovn.at                    |  62 +++-
 tests/system-ovn.at             | 320 +++
 15 files changed, 1569 insertions(+), 225 deletions(-)

-- 
1.9.1


[ovs-dev] [PATCH v13 6/6] ovn: specify options:nat-addresses as "router"

2017-01-26 Thread Mickey Spiegel
Currently in OVN, the "nat-addresses" in the "options" column of a
logical switch port of type "router" must be specified manually.
Typically the user would specify as "nat-addresses" all of the NAT
external IP addresses and load balancer IP addresses that have
already been specified separately on the router.

This patch allows the logical switch port's "nat-addresses" to be
specified as the string "router".  When ovn-northd sees this string,
it automatically copies the following into the southbound
Port_Binding's "nat-addresses" in the "options" column:
  - The options:router-port's MAC address.
  - Each NAT external IP address (of any NAT type) specified on the
    logical router of options:router-port.
  - Each load balancer IP address specified on the logical router of
    options:router-port.
This will cause the controller where the gateway router resides to
issue gratuitous ARPs for each NAT external IP address and for each
load balancer IP address specified on the gateway router.
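
To make the resulting format concrete, here is a minimal standalone sketch
in C of how such a string could be assembled: the MAC address first,
followed by each IP address separated by a single space.  The function
format_nat_addresses() and its fixed-size buffer interface are assumptions
made for illustration.  The patch itself builds the string with the OVS
dynamic-string helpers, skips NAT external IPs that carry a mask, and
de-duplicates load balancer VIPs, none of which is modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "MAC IP1 IP2 ..." into 'buf', which holds
 * 'size' bytes.  Returns 0 on success, or -1 if the result does not fit. */
static int
format_nat_addresses(char *buf, size_t size, const char *mac,
                     const char *const *ips, size_t n_ips)
{
    assert(buf != NULL && mac != NULL);

    int n = snprintf(buf, size, "%s", mac);
    if (n < 0 || (size_t) n >= size) {
        return -1;
    }
    size_t used = (size_t) n;

    for (size_t i = 0; i < n_ips; i++) {
        /* Each address is appended with one leading space. */
        n = snprintf(buf + used, size - used, " %s", ips[i]);
        if (n < 0 || (size_t) n >= size - used) {
            return -1;
        }
        used += (size_t) n;
    }
    return 0;
}
```

Given the MAC f0:00:00:00:00:01 and the addresses 172.16.1.10 and
172.16.1.20 (made-up values), the sketch produces
"f0:00:00:00:00:01 172.16.1.10 172.16.1.20".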

This patch is written as if it will be included in OVS 2.7.  If it
is deferred to OVS 2.8, then the OVS version mentioned in ovn-nb.xml
will need to be updated from OVS 2.7 to OVS 2.8.

Signed-off-by: Mickey Spiegel <mickeys@gmail.com>
---
 ovn/northd/ovn-northd.c | 114 ++--
 ovn/ovn-nb.xml          |  42 +++---
 tests/ovn.at            |  60 +
 3 files changed, 185 insertions(+), 31 deletions(-)

diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index 8e5a8ce..5b0a235 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -1436,6 +1436,86 @@ join_logical_ports(struct northd_context *ctx,
 }
 
 static void
+ip_address_and_port_from_lb_key(const char *key, char **ip_address,
+uint16_t *port);
+
+static void
+get_router_load_balancer_ips(const struct ovn_datapath *od,
+ struct sset *all_ips)
+{
+if (!od->nbr) {
+return;
+}
+
+for (int i = 0; i < od->nbr->n_load_balancer; i++) {
+struct nbrec_load_balancer *lb = od->nbr->load_balancer[i];
+struct smap *vips = &lb->vips;
+struct smap_node *node;
+
+SMAP_FOR_EACH (node, vips) {
+/* node->key contains IP:port or just IP. */
+char *ip_address = NULL;
+uint16_t port;
+
+ip_address_and_port_from_lb_key(node->key, &ip_address, &port);
+if (!ip_address) {
+continue;
+}
+
+if (!sset_contains(all_ips, ip_address)) {
+sset_add(all_ips, ip_address);
+}
+
+free(ip_address);
+}
+}
+}
+
+/* Returns a string consisting of the port's MAC address followed by the
+ * external IP addresses of all NAT rules defined on that router and the
+ * VIPs of all load balancers defined on that router. */
+static char *
+get_nat_addresses(const struct ovn_port *op)
+{
+struct eth_addr mac;
+if (!op->nbrp || !op->od || !op->od->nbr
+|| (!op->od->nbr->n_nat && !op->od->nbr->n_load_balancer)
+|| !eth_addr_from_string(op->nbrp->mac, &mac)) {
+return NULL;
+}
+
+struct ds addresses = DS_EMPTY_INITIALIZER;
+ds_put_format(&addresses, ETH_ADDR_FMT, ETH_ADDR_ARGS(mac));
+
+/* Get NAT IP addresses. */
+for (int i = 0; i < op->od->nbr->n_nat; i++) {
+const struct nbrec_nat *nat;
+nat = op->od->nbr->nat[i];
+
+ovs_be32 ip, mask;
+
+char *error = ip_parse_masked(nat->external_ip, &ip, &mask);
+if (error || mask != OVS_BE32_MAX) {
+free(error);
+continue;
+}
+ds_put_format(&addresses, " %s", nat->external_ip);
+}
+
+/* A set to hold all load-balancer vips. */
+struct sset all_ips = SSET_INITIALIZER(&all_ips);
+get_router_load_balancer_ips(op->od, &all_ips);
+
+const char *ip_address;
+SSET_FOR_EACH(ip_address, &all_ips) {
+ds_put_format(&addresses, " %s", ip_address);
+}
+sset_destroy(&all_ips);
+
+return ds_steal_cstr(&addresses);
+}
+
+static void
 ovn_port_update_sbrec(const struct ovn_port *op,
   struct hmap *chassis_qdisc_queues)
 {
@@ -1524,7 +1604,15 @@ ovn_port_update_sbrec(const struct ovn_port *op,
 
 const char *nat_addresses = smap_get(&op->nbsp->options,
"nat-addresses");
-if (nat_addresses) {
+if (nat_addresses && !strcmp(nat_addresses, "router")) {
+if (op->peer && op->peer->nbrp) {
+char *nats = get_nat_addresses(op->peer);
+if (nats) {
+smap_add(, "nat-addresses", nats);
+fre

[ovs-dev] [PATCH v13 3/6] ovn: distributed NAT flows

2017-01-26 Thread Mickey Spiegel
This patch implements the flows required in the ingress and egress
pipeline stages in order to support NAT on a distributed logical router.

NAT functionality is associated with the logical router gateway port.
The flows that carry out NAT functionality all have match conditions on
inport or outport equal to the logical router gateway port.  There are
additional flows that are used to redirect traffic when necessary,
using the tunnel key of a "chassisredirect" SB port binding in order to
redirect traffic to the instance of the logical router gateway port on
the centralized "redirect-chassis".

North/south traffic subject to one-to-one "dnat_and_snat" is handled
in a distributed manner, with south-to-north traffic going to the
local instance of the logical router gateway port.  North/south
traffic subject to (possibly one-to-many) "snat" is handled in a
centralized manner, with south-to-north traffic going to the instance
of the logical router gateway port on the "redirect-chassis".
North-to-south traffic is directed to the corresponding chassis by
limiting ARP responses to the appropriate instance of the logical
router gateway port on one chassis.  For centralized NAT rules, this
is the instance on the "redirect-chassis".  For distributed NAT rules,
this is the chassis where the corresponding logical port resides, using
an ethernet address specified in the NB NAT rule to trigger upstream
MAC learning.
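
The north/south split described above can be summarised with a small
standalone C sketch.  It assumes, based on the description in this series,
that a rule is treated as distributed only when its type is
"dnat_and_snat" and both a logical port and an external Ethernet address
are set; every other rule is centralized.  The struct nat_rule type and the
functions classify_nat() and arp_reply_chassis() are hypothetical names
used for illustration and do not appear in ovn-northd.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum nat_handling { NAT_CENTRALIZED, NAT_DISTRIBUTED };

/* Hypothetical, simplified view of a northbound NAT rule. */
struct nat_rule {
    const char *type;          /* "snat", "dnat" or "dnat_and_snat". */
    const char *logical_port;  /* NULL if unset. */
    const char *external_mac;  /* NULL if unset. */
};

/* Assumed rule: only one-to-one NAT with a logical port and an external
 * MAC is handled on the chassis where that logical port resides. */
static enum nat_handling
classify_nat(const struct nat_rule *nat)
{
    assert(nat != NULL && nat->type != NULL);

    bool distributed = !strcmp(nat->type, "dnat_and_snat")
                       && nat->logical_port && nat->external_mac;
    return distributed ? NAT_DISTRIBUTED : NAT_CENTRALIZED;
}

/* Chassis expected to answer ARP requests for the rule's external IP,
 * which in turn decides where north-to-south traffic arrives. */
static const char *
arp_reply_chassis(const struct nat_rule *nat, const char *redirect_chassis,
                  const char *logical_port_chassis)
{
    return classify_nat(nat) == NAT_DISTRIBUTED
           ? logical_port_chassis : redirect_chassis;
}
```

The sketch only covers north/south traffic; east/west NAT traffic is
handled centrally regardless of the rule, as explained below.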

East/west NAT traffic is all handled in a centralized manner.  While it
is certainly possible to handle some of this traffic in a distributed
manner, the centralized approach keeps the NAT flows simpler and
cleaner.  The expectation is that east/west NAT traffic is not as
important to optimize as north/south NAT traffic, with most east/west
traffic not requiring NAT.

Automated tests are currently limited to only a single node.  The
single node automated tests cover both north/south and east/west
traffic flows.

Signed-off-by: Mickey Spiegel <mickeys@gmail.com>
Acked-by: Gurucharan Shetty <g...@ovn.org>
---
 ovn/controller/ovn-controller.c |   6 +-
 ovn/northd/ovn-northd.8.xml     | 400 +++--
 ovn/northd/ovn-northd.c         | 426 +++-
 ovn/ovn-architecture.7.xml      |   7 +-
 ovn/ovn-nb.ovsschema            |   6 +-
 ovn/ovn-nb.xml                  |  56 +-
 tests/system-ovn.at             | 338 +++
 7 files changed, 1159 insertions(+), 80 deletions(-)

diff --git a/ovn/controller/ovn-controller.c b/ovn/controller/ovn-controller.c
index 7cef3f8..ea299da 100644
--- a/ovn/controller/ovn-controller.c
+++ b/ovn/controller/ovn-controller.c
@@ -323,10 +323,8 @@ update_ct_zones(struct sset *lports, const struct hmap *local_datapaths,
 /* Local patched datapath (gateway routers) need zones assigned. */
 const struct local_datapath *ld;
 HMAP_FOR_EACH (ld, hmap_node, local_datapaths) {
-if (!ld->has_local_l3gateway) {
-continue;
-}
-
+/* XXX Add method to limit zone assignment to logical router
+ * datapaths with NAT */
 char *dnat = alloc_nat_zone_key(&ld->datapath->header_.uuid, "dnat");
 char *snat = alloc_nat_zone_key(&ld->datapath->header_.uuid, "snat");
 sset_add(&all_users, dnat);
diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
index 49e4291..ab8fd88 100644
--- a/ovn/northd/ovn-northd.8.xml
+++ b/ovn/northd/ovn-northd.8.xml
@@ -752,9 +752,25 @@ output;
column is set to router and
   the connected logical router port specifies a
-  redirect-chassis, the flow is only programmed on the
-  redirect-chassis.
+  redirect-chassis:
 
+
+
+  
+The flow for the connected logical router port's Ethernet
+address is only programmed on the redirect-chassis.
+  
+
+  
+If the logical router has rules specified in
+nat with
+external_mac, then
+those addresses are also used to populate the switch's destination
+lookup on the chassis where
+logical_port is
+resident.
+  
+
   
 
   
@@ -890,6 +906,23 @@ output;
   redirect-chassis.
 
   
+
+  
+
+  For each dnat_and_snat NAT rule on a distributed
+  router that specifies an external Ethernet address E,
+  a priority-50 flow that matches inport == GW
+  && eth.dst == E, where GW
+  is the logical router gateway port, with action
+  next;.
+
+
+
+  This flow is only programmed on the gateway port instance on
+  the chassis where the logical_port specified in
+  the NAT rule resides.
+
+  
 
 
 
@@ -928,7 +961,9 @@ output;
   
   
   

[ovs-dev] [PATCH v13 5/6] ovn: rewrite redirect-chassis description in ovn-nb.xml

2017-01-26 Thread Mickey Spiegel
This optional patch addresses offline comments that the documentation
in ovn-nb.xml should not describe southbound constructs or flow
details, since it is user-facing documentation.

Signed-off-by: Mickey Spiegel <mickeys@gmail.com>
Acked-by: Gurucharan Shetty <g...@ovn.org>
---
 ovn/ovn-nb.xml | 25 ++---
 1 file changed, 10 insertions(+), 15 deletions(-)

diff --git a/ovn/ovn-nb.xml b/ovn/ovn-nb.xml
index 20797a6..268354a 100644
--- a/ovn/ovn-nb.xml
+++ b/ovn/ovn-nb.xml
@@ -,24 +,19 @@
   
 
   If set, this indicates that this logical router port represents
-  a distributed gateway port.  In addition to the southbound
-  database port representing this distributed gateway port, another
-  port will be created in the southbound database that represents a
-  particular instance, bound to a specific chassis, of this
-  otherwise distributed logical router port.  This additional port
-  can then be specified as an outport in some of the
-  ingress pipeline flows.  This will cause matching packets to be
-  directed to a specific chassis to carry out the egress pipeline,
-  allowing a subset of logical router functionality to be
-  implemented in a centralized manner.  At the beginning of the
-  egress pipeline, the outport will be reset to the
-  value of the distributed port.
+  a distributed gateway port that connects this router to a logical
+  switch with a localnet port.  There may be at most one such
+  logical router port on each logical router.
 
 
 
-  This option specifies the name of the chassis to which
-  the additional southbound port binding of type
-  chassisredirect will be bound.
+  Even when a redirect-chassis is specified, the
+  logical router port still effectively resides on each chassis.
+  However, due to the implications of the use of L2 learning in the
+  physical network, as well as the need to support advanced features
+  such as one-to-many NAT (aka IP masquerading), a subset of the
+  logical router processing is handled in a centralized manner on
+  the specified redirect-chassis.
 
 
 
-- 
1.9.1


[ovs-dev] [PATCH v13 4/6] ovn: ovn-nbctl commands for distributed NAT

2017-01-26 Thread Mickey Spiegel
This patch adds the new optional arguments "logical_port" and
"external_mac" to lr-nat-add, and displays that information in
lr-nat-list.

Signed-off-by: Mickey Spiegel <mickeys@gmail.com>
Acked-by: Gurucharan Shetty <g...@ovn.org>
---
 ovn/utilities/ovn-nbctl.8.xml | 27 +++---
 ovn/utilities/ovn-nbctl.c     | 54 +--
 tests/ovn-nbctl.at            | 47 +
 tests/system-ovn.at           | 30 +---
 4 files changed, 119 insertions(+), 39 deletions(-)

diff --git a/ovn/utilities/ovn-nbctl.8.xml b/ovn/utilities/ovn-nbctl.8.xml
index f95b88d..d81e99f 100644
--- a/ovn/utilities/ovn-nbctl.8.xml
+++ b/ovn/utilities/ovn-nbctl.8.xml
@@ -444,7 +444,7 @@
 NAT Commands
 
 
-  [--may-exist] lr-nat-add router type external_ip logical_ip
+  [--may-exist] lr-nat-add router type external_ip logical_ip [logical_port external_mac]
   
 
   Adds the specified NAT to router.
@@ -453,6 +453,13 @@
   The external_ip is an IPv4 address.
   The logical_ip is an IPv4 network (e.g 192.168.1.0/24)
   or an IPv4 address.
+  The logical_port and external_mac are only
+  accepted when router is a distributed router (rather
+  than a gateway router) and type is
+  dnat_and_snat.
+  The logical_port is the name of an existing logical
+  switch port where the logical_ip resides.
+  The external_mac is an Ethernet address.
 
 
   When type is dnat, the externally
@@ -475,8 +482,22 @@
   the IP address in external_ip.
 
 
-  It is an error if a NAT already exists,
-  unless --may-exist is specified.
+  When the logical_port and external_mac
+  are specified, the NAT rule will be programmed on the chassis
+  where the logical_port resides.  This includes
+  ARP replies for the external_ip, which return the
+  value of external_mac.  All packets transmitted
+  with source IP address equal to external_ip will
+  be sent using the external_mac.
+
+
+  It is an error if a NAT already exists with the same values
+  of router, type, external_ip,
+  and logical_ip, unless --may-exist is
+  specified.  When --may-exist,
+  logical_port, and external_mac are all
+  specified, the existing values of logical_port and
+  external_mac are overwritten.
 
   
 
diff --git a/ovn/utilities/ovn-nbctl.c b/ovn/utilities/ovn-nbctl.c
index f0ff27a..3dac434 100644
--- a/ovn/utilities/ovn-nbctl.c
+++ b/ovn/utilities/ovn-nbctl.c
@@ -390,7 +390,7 @@ Route commands:\n\
   lr-route-list ROUTER  print routes for ROUTER\n\
 \n\
 NAT commands:\n\
-  lr-nat-add ROUTER TYPE EXTERNAL_IP LOGICAL_IP\n\
+  lr-nat-add ROUTER TYPE EXTERNAL_IP LOGICAL_IP [LOGICAL_PORT EXTERNAL_MAC]\n\
 add a NAT to ROUTER\n\
   lr-nat-del ROUTER [TYPE [IP]]\n\
 remove NATs from ROUTER\n\
@@ -2239,6 +2239,30 @@ nbctl_lr_nat_add(struct ctl_context *ctx)
 new_logical_ip = normalize_ipv4_prefix(ipv4, plen);
 }
 
+const char *logical_port;
+const char *external_mac;
+if (ctx->argc == 6) {
+ctl_fatal("lr-nat-add with logical_port "
+  "must also specify external_mac.");
+} else if (ctx->argc == 7) {
+if (strcmp(nat_type, "dnat_and_snat")) {
+ctl_fatal("logical_port and external_mac are only valid when "
+  "type is \"dnat_and_snat\".");
+}
+
+logical_port = ctx->argv[5];
+lsp_by_name_or_uuid(ctx, logical_port, true);
+
+external_mac = ctx->argv[6];
+struct eth_addr ea;
+if (!eth_addr_from_string(external_mac, &ea)) {
+ctl_fatal("invalid mac address %s.", external_mac);
+}
+} else {
+logical_port = NULL;
+external_mac = NULL;
+}
+
 bool may_exist = shash_find(&ctx->options, "--may-exist") != NULL;
 int is_snat = !strcmp("snat", nat_type);
 for (size_t i = 0; i < lr->n_nat; i++) {
@@ -2249,6 +2273,10 @@ nbctl_lr_nat_add(struct ctl_context *ctx)
 if (!strcmp(is_snat ? external_ip : new_logical_ip,
 is_snat ? nat->external_ip : nat->logical_ip)) {
 if (may_exist) {
+nbrec_nat_verify_logical_port(nat);
+nbrec_nat_verify_external_mac(nat);
+nbrec_nat_set_logical_port(nat, logical_port);
+nbrec_nat_set_external_mac(nat, external_mac);
 free(new_logical_ip);
 

[ovs-dev] [PATCH v13 2/6] ovn: avoid snat recirc only on gateway routers

2017-01-26 Thread Mickey Spiegel
Currently, for performance reasons on gateway routers, ct_snat
that does not specify an IP address does not immediately trigger
recirculation.  On gateway routers, ct_snat that does not specify
an IP address happens in the UNSNAT pipeline stage, which is
followed by the DNAT pipeline stage that triggers recirculation
for all packets.  This DNAT pipeline stage recirculation takes
care of the recirculation needs of UNSNAT as well as other cases
such as UNDNAT.

On distributed routers, UNDNAT is handled in the egress pipeline
stage, separately from DNAT in the ingress pipeline stages.  The
DNAT pipeline stage only triggers recirculation for some packets.
Due to this difference in design, UNSNAT needs to trigger its own
recirculation.

This patch restricts the logic that avoids recirculation for
ct_snat, so that it only applies to datapaths representing
gateway routers.
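
The decision this patch changes can be reduced to a small truth table,
sketched below as standalone C.  It follows the reading of encode_ct_nat()
given in this message: an action that specifies an IP address always
recirculates, ct_snat without an IP address skips recirculation only on a
gateway router, and everything else recirculates.  The enum ct_nat_kind and
the function ct_nat_recirculates() are hypothetical names, and the sketch
does not touch any OpenFlow structures.

```c
#include <assert.h>
#include <stdbool.h>

enum ct_nat_kind { CT_DNAT, CT_SNAT };

/* Hypothetical helper: returns true if the encoded ct_dnat/ct_snat action
 * should trigger recirculation, per the behavior described above. */
static bool
ct_nat_recirculates(enum ct_nat_kind kind, bool has_ip,
                    bool is_gateway_router)
{
    assert(kind == CT_DNAT || kind == CT_SNAT);

    if (has_ip) {
        /* ct_snat(IP) and ct_dnat(IP) commit and need recirculation. */
        return true;
    }
    if (kind == CT_SNAT && is_gateway_router) {
        /* On a gateway router the following DNAT stage recirculates all
         * packets, so the extra recirculation is avoided here. */
        return false;
    }
    /* ct_snat on a distributed router must recirculate by itself. */
    return true;
}
```

Before this patch, the second case applied to any router; the new
is_gateway_router condition is what restricts it.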

Signed-off-by: Mickey Spiegel <mickeys@gmail.com>
Acked-by: Gurucharan Shetty <g...@ovn.org>
---
 include/ovn/actions.h  |  3 +++
 ovn/controller/lflow.c | 10 ++
 ovn/lib/actions.c      | 15 +--
 ovn/ovn-sb.xml         | 23 +++
 tests/ovn.at           |  2 +-
 5 files changed, 42 insertions(+), 11 deletions(-)

diff --git a/include/ovn/actions.h b/include/ovn/actions.h
index 1d7bd69..d2510fd 100644
--- a/include/ovn/actions.h
+++ b/include/ovn/actions.h
@@ -445,6 +445,9 @@ struct ovnact_encode_params {
 /* 'true' if the flow is for a switch. */
 bool is_switch;
 
+/* 'true' if the flow is for a gateway router. */
+bool is_gateway_router;
+
 /* A map from a port name to its connection tracking zone. */
 const struct simap *ct_zones;
 
diff --git a/ovn/controller/lflow.c b/ovn/controller/lflow.c
index 2d9213b..fa00db2 100644
--- a/ovn/controller/lflow.c
+++ b/ovn/controller/lflow.c
@@ -107,6 +107,15 @@ is_switch(const struct sbrec_datapath_binding *ldp)
 
 }
 
+static bool
+is_gateway_router(const struct sbrec_datapath_binding *ldp,
+  const struct hmap *local_datapaths)
+{
+struct local_datapath *ld =
+get_local_datapath(local_datapaths, ldp->tunnel_key);
+return ld ? ld->has_local_l3gateway : false;
+}
+
 /* Adds the logical flows from the Logical_Flow table to flow tables. */
 static void
 add_logical_flows(struct controller_ctx *ctx, const struct lport_index *lports,
@@ -221,6 +230,7 @@ consider_logical_flow(const struct lport_index *lports,
 .lookup_port = lookup_port_cb,
 .aux = ,
 .is_switch = is_switch(ldp),
+.is_gateway_router = is_gateway_router(ldp, local_datapaths),
 .ct_zones = ct_zones,
 .group_table = group_table,
 
diff --git a/ovn/lib/actions.c b/ovn/lib/actions.c
index 90a2add..fff838b 100644
--- a/ovn/lib/actions.c
+++ b/ovn/lib/actions.c
@@ -829,12 +829,15 @@ encode_ct_nat(const struct ovnact_ct_nat *cn,
 ct = ofpacts->header;
 if (cn->ip) {
 ct->flags |= NX_CT_F_COMMIT;
-} else if (snat) {
-/* XXX: For performance reasons, we try to prevent additional
- * recirculations.  So far, ct_snat which is used in a gateway router
- * does not need a recirculation. ct_snat(IP) does need a
- * recirculation.  Should we consider a method to let the actions
- * specify whether an action needs recirculation if there more use
+} else if (snat && ep->is_gateway_router) {
+/* For performance reasons, we try to prevent additional
+ * recirculations.  ct_snat which is used in a gateway router
+ * does not need a recirculation.  ct_snat(IP) does need a
+ * recirculation.  ct_snat in a distributed router needs
+ * recirculation regardless of whether an IP address is
+ * specified.
+ * XXX Should we consider a method to let the actions specify
+ * whether an action needs recirculation if there are more use
  * cases?. */
 ct->recirc_table = NX_CT_RECIRC_NONE;
 }
diff --git a/ovn/ovn-sb.xml b/ovn/ovn-sb.xml
index f806af7..b33afd3 100644
--- a/ovn/ovn-sb.xml
+++ b/ovn/ovn-sb.xml
@@ -1128,11 +1128,26 @@
 
   
 ct_snat sends the packet through the SNAT zone to
-unSNAT any packet that was SNATed in the opposite direction.  If
-the packet needs to be sent to the next tables, then it should be
-followed by a next; action.  The next tables will not
-see the changes in the packet caused by the connection tracker.
+unSNAT any packet that was SNATed in the opposite direction.  The
+behavior on gateway routers differs from the behavior on a
+distributed router:
   
+  
+
+  On a gateway router, if the packet needs to be sent to the next
+  tables, then it should be followed by a next;
+  action.  The next tables will not see the changes in the packe

[ovs-dev] [PATCH v13 1/6] ovn: move load balancing flows after NAT flows

2017-01-26 Thread Mickey Spiegel
This will make it easy for distributed NAT to reuse some of the
existing code for NAT flows, while leaving load balancing and defrag
as functionality specific to gateway routers.  There is no intent to
change any functionality in this patch.

Signed-off-by: Mickey Spiegel <mickeys@gmail.com>
Acked-by: Gurucharan Shetty <g...@ovn.org>
---
 ovn/northd/ovn-northd.c | 140 
 1 file changed, 70 insertions(+), 70 deletions(-)

diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index 87c80d1..219a69c 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -4099,76 +4099,6 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap 
*ports,
 const char *lb_force_snat_ip = get_force_snat_ip(od, "lb",
  &snat_ip);
 
-/* A set to hold all ips that need defragmentation and tracking. */
-struct sset all_ips = SSET_INITIALIZER(&all_ips);
-
-for (int i = 0; i < od->nbr->n_load_balancer; i++) {
-struct nbrec_load_balancer *lb = od->nbr->load_balancer[i];
-struct smap *vips = &lb->vips;
-struct smap_node *node;
-
-SMAP_FOR_EACH (node, vips) {
-uint16_t port = 0;
-
-/* node->key contains IP:port or just IP. */
-char *ip_address = NULL;
-ip_address_and_port_from_lb_key(node->key, &ip_address, &port);
-if (!ip_address) {
-continue;
-}
-
-if (!sset_contains(&all_ips, ip_address)) {
-sset_add(&all_ips, ip_address);
-}
-
-/* Higher priority rules are added for load-balancing in DNAT
- * table.  For every match (on a VIP[:port]), we add two flows
- * via add_router_lb_flow().  One flow is for specific matching
- * on ct.new with an action of "ct_lb($targets);".  The other
- * flow is for ct.est with an action of "ct_dnat;". */
-ds_clear(&actions);
-ds_put_format(&actions, "ct_lb(%s);", node->value);
-
-ds_clear(&match);
-ds_put_format(&match, "ip && ip4.dst == %s",
-  ip_address);
-free(ip_address);
-
-if (port) {
-if (lb->protocol && !strcmp(lb->protocol, "udp")) {
-ds_put_format(&match, " && udp && udp.dst == %d",
-  port);
-} else {
-ds_put_format(&match, " && tcp && tcp.dst == %d",
-  port);
-}
-add_router_lb_flow(lflows, od, &match, &actions, 120,
-   lb_force_snat_ip);
-} else {
-add_router_lb_flow(lflows, od, &match, &actions, 110,
-   lb_force_snat_ip);
-}
-}
-}
-
-/* If there are any load balancing rules, we should send the
- * packet to conntrack for defragmentation and tracking.  This helps
- * with two things.
- *
- * 1. With tracking, we can send only new connections to pick a
- *DNAT ip address from a group.
- * 2. If there are L4 ports in load balancing rules, we need the
- *defragmentation to match on L4 ports. */
-const char *ip_address;
-SSET_FOR_EACH(ip_address, &all_ips) {
-ds_clear(&match);
-ds_put_format(&match, "ip && ip4.dst == %s", ip_address);
-ovn_lflow_add(lflows, od, S_ROUTER_IN_DEFRAG,
-  100, ds_cstr(&match), "ct_next;");
-}
-
-sset_destroy(&all_ips);
-
 for (int i = 0; i < od->nbr->n_nat; i++) {
 const struct nbrec_nat *nat;
 
@@ -4323,6 +4253,76 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap 
*ports,
 * routing in the openflow pipeline. */
 ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, 50,
   "ip", "flags.loopback = 1; ct_dnat;");
+
+/* A set to hold all ips that need defragmentation and tracking. */
+struct sset all_ips = SSET_INITIALIZER(&all_ips);
+
+for (int i = 0; i < od->nbr->n_load_balancer; i++) {
+struct nbrec_load_balancer *lb = od->nbr->load_balancer[i];
+struct smap *vips = &lb->vips;
+struct smap_node *node;
+
+SMAP_FOR_EACH (node, vips) {
+uint16_t port = 0;
+
+/* node->key contains IP:port or just IP. */
+char *ip_address = NULL;
+ip_address_and_port_from_lb_key(node->key, &ip_address, &port);
+ 

[ovs-dev] [PATCH v13 0/6] ovn: add distributed NAT capability

2017-01-26 Thread Mickey Spiegel
 flag.

PATCH v7 -> PATCH v8
Incorporated incremental changes to is_chassis_resident() from blp.
Added patch that describes logical routers and logical patches in
ovn-architecture.
Renamed chassisredirect patch to emphasize distributed gateway ports
as well.
Added description of distributed gateway ports to ovn-architecture,
in distributed gateway port / chassisredirect patch.
Rewrote commit message for distributed gateway port / chassisredirect.

PATCH v6 -> PATCH v7
Rebase.
Documentation improvements to lsp addresses "router" patch as
suggested by blp.  Also added to ovn-nbctl documentation.

PATCH v5 -> PATCH v6
Added patch to automatically add router addresses to the addresses of
type "router" lsps.
Restricted logical switch destination lookup flows for logical router
distributed gateway port's MAC to the redirect chassis.
Automatically add distributed NAT MAC addresses to logical switch
destination lookup flows on the chassis where the NAT logical port resides.
Added tests for reachability from VIFs on the same logical switch as
localnet, through the logical router's distributed gateway port, to
internal VIFs.

PATCH v4 -> PATCH v5
Limited router ingress table 0 flow matching router ethernet address
on distributed gateway to redirect chassis.
Limited router ingress table 0 flows matching NAT ethernet address to
chassis where the NAT rule's logical port resides.
Rolled back changes to ICMP since they are not necessary.

PATCH v3 -> PATCH v4
Rebase

PATCH v2 -> PATCH v3
Added table to set egress loopback flag in the egress pipeline stage,
fixing east-west NAT across multiple chassis.

PATCH v1 -> PATCH v2
Added ovn-trace logic for chassisredirect ports, including automated test.
Added ovn-trace logic for egress loopback.
Fixed some bugs in ovn-trace register handling from ingress to egress,
and across patch ports (should these be filed separately as well?).

RFC v4 -> PATCH v1
Added egress loopback capability
Added east/west NAT tests to system-ovn.at (make check-kernel)
Added REGBIT_NAT_REDIRECT flows to IN_IP_ROUTING and IN_ARP_RESOLVE,
resolving remaining issues with east/west NAT

RFC v3 -> RFC v4
Rebased to pick up recent changes to ovn-controller, including a fix
to the localnet issue where VIFs had to be added on a chassis in order
to cause the localnet port to be instantiated.
The chassisredirect port logic was rewritten to avoid creating an
ofport.  Besides streamlining the code significantly, this fixed the
problem when the distributed port name was longer than 12 characters.
Restricted IPv6 ND replies for the router IP address to the redirect
chassis, similar to IPv4 ARP restrictions.
Added specific gateway redirect flows for unresolved ethernet
destination, so that ARP requests generated by the router are sent
through the redirect chassis regardless of NAT rules.
Relaxed checks in chassisredirect tests so that they are independent
of register assignments.
Renamed ovn-northd.c "l3gateway_port" to "l3dgw_port" in order to
avoid overlaps with gateway router terminology.

RFC v2 -> RFC v3
Reordered the first two patches.
Moved non-NAT specific flows from patch 5 to patch 2.
Added automated tests for is_chassis_resident (which is ready for
review) and chassisredirect patches.
Added flows to limit ICMP echo replies for router IPs on the gateway
interface, so that they are only generated on the redirect-chassis.

Mickey Spiegel (6):
  ovn: move load balancing flows after NAT flows
  ovn: avoid snat recirc only on gateway routers
  ovn: distributed NAT flows
  ovn: ovn-nbctl commands for distributed NAT
  ovn: rewrite redirect-chassis description in ovn-nb.xml
  ovn: specify options:nat-addresses as "router"

 include/ovn/actions.h   |   3 +
 ovn/controller/lflow.c  |  10 +
 ovn/controller/ovn-controller.c |   6 +-
 ovn/lib/actions.c   |  15 +-
 ovn/northd/ovn-northd.8.xml | 400 ++-
 ovn/northd/ovn-northd.c | 680 +++-
 ovn/ovn-architecture.7.xml  |   7 +-
 ovn/ovn-nb.ovsschema|   6 +-
 ovn/ovn-nb.xml  | 123 ++--
 ovn/ovn-sb.xml  |  23 +-
 ovn/utilities/ovn-nbctl.8.xml   |  27 +-
 ovn/utilities/ovn-nbctl.c   |  54 +++-
 tests/ovn-nbctl.at  |  47 ++-
 tests/ovn.at|  62 +++-
 tests/system-ovn.at | 320 +++
 15 files changed, 1561 insertions(+), 222 deletions(-)

-- 
1.9.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v11 4/5] ovn: ovn-nbctl commands for distributed NAT

2017-01-26 Thread Mickey Spiegel
On Thu, Jan 26, 2017 at 9:20 AM, Guru Shetty <g...@ovn.org> wrote:

>
>
> On 21 January 2017 at 16:52, Mickey Spiegel <mickeys@gmail.com> wrote:
>
>> This patch adds the new optional arguments "logical_port" and
>> "external_mac" to lr-nat-add, and displays that information in
>> lr-nat-list.
>>
>> Signed-off-by: Mickey Spiegel <mickeys@gmail.com>
>>
>
>
> Acked-by: Gurucharan Shetty <g...@ovn.org>
>
> On a different note, can the external_mac be the same as the
> logical_port's mac? I guess, it does not matter. It may make sense to add
> that information in ovn-nb.
>

It just needs to be unique on the logical switch with the localnet port. If
the logical_port's mac is globally unique, then sure that can be used. I
can add that to ovn-nb.

Mickey


Re: [ovs-dev] [PATCH v11 3/5] ovn: distributed NAT flows

2017-01-26 Thread Mickey Spiegel
On Thu, Jan 26, 2017 at 8:53 AM, Guru Shetty <g...@ovn.org> wrote:

>
>
> On 21 January 2017 at 16:52, Mickey Spiegel <mickeys@gmail.com> wrote:
>
>> This patch implements the flows required in the ingress and egress
>> pipeline stages in order to support NAT on a distributed logical router.
>>
>> NAT functionality is associated with the logical router gateway port.
>> The flows that carry out NAT functionality all have match conditions on
>> inport or outport equal to the logical router gateway port.  There are
>> additional flows that are used to redirect traffic when necessary,
>> using the tunnel key of a "chassisredirect" SB port binding in order to
>> redirect traffic to the instance of the logical router gateway port on
>> the centralized "redirect-chassis".
>>
>> North/south traffic subject to one-to-one "dnat_and_snat" is handled
>> in a distributed manner, with south-to-north traffic going to the
>> local instance of the logical router gateway port.  North/south
>> traffic subject to (possibly one-to-many) "snat" is handled in a
>> centralized manner, with south-to-north traffic going to the instance
>> of the logical router gateway port on the "redirect-chassis".
>> North-to-south traffic is directed to the corresponding chassis by
>> limiting ARP responses to the appropriate instance of the logical
>> router gateway port on one chassis.  For centralized NAT rules, this
>> is the instance on the "redirect-chassis".  For distributed NAT rules,
>> this is the chassis where the corresponding logical port resides, using
>> an ethernet address specified in the NB NAT rule to trigger upstream
>> MAC learning.
>>
>> East/west NAT traffic is all handled in a centralized manner.  While it
>> is certainly possible to handle some of this traffic in a distributed
>> manner, the centralized approach keeps the NAT flows simpler and
>> cleaner.  The expectation is that east/west NAT traffic is not as
>> important to optimize as north/south NAT traffic, with most east/west
>> traffic not requiring NAT.
>>
>> Automated tests are currently limited to only a single node.  The
>> single node automated tests cover both north/south and east/west
>> traffic flows.
>>
>> Signed-off-by: Mickey Spiegel <mickeys@gmail.com>
>>
>  Acked-by: Gurucharan Shetty <g...@ovn.org>
>
> A few comments below...
>

Thanks for the review.


>
>
>> +/* For distributed router NAT, determine whether this NAT rule
>> + * satisfies the conditions for distributed NAT processing. */
>> +bool distributed = false;
>> +struct eth_addr mac;
>> +if (od->l3dgw_port && !strcmp(nat->type, "dnat_and_snat") &&
>> +nat->logical_port && nat->external_mac) {
>> +if (eth_addr_from_string(nat->external_mac, &mac)) {
>> +distributed = true;
>> +} else {
>> +static struct vlog_rate_limit rl =
>> +VLOG_RATE_LIMIT_INIT(5, 1);
>> +VLOG_WARN_RL(&rl, "bad mac %s for dnat in router "
>> +""UUID_FMT"", nat->external_mac, UUID_ARGS(&od->key));
>>
> Does bad mac need a "continue" ?
>

The way I have it right now, if the MAC is bad (or if there is an
external_mac but no logical_port, or there is a logical_port but no
external_mac) then the NAT rule is still installed, but only on the
centralized instance of the distributed gateway port.
Do you think it is better to "continue", where the NAT rule will not
work at all until the MAC is fixed?
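
To make the fallback concrete, here is a minimal standalone sketch of the
decision described above. It is not the ovn-northd code: struct nat_rule,
mac_is_valid() and nat_is_distributed() are hypothetical names used only for
illustration, and the MAC check is a simplified stand-in for
eth_addr_from_string().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a northbound NAT row. */
struct nat_rule {
    const char *type;          /* "snat", "dnat" or "dnat_and_snat". */
    const char *logical_port;  /* NULL if not configured. */
    const char *external_mac;  /* NULL if not configured. */
};

/* Simplified MAC syntax check (assumption: six colon-separated hex octets
 * and nothing after them). */
static bool
mac_is_valid(const char *s)
{
    unsigned int b[6];
    char trailing;

    return s && sscanf(s, "%2x:%2x:%2x:%2x:%2x:%2x%c",
                       &b[0], &b[1], &b[2], &b[3], &b[4], &b[5],
                       &trailing) == 6;
}

/* Returns true if the rule qualifies for distributed processing.  When this
 * returns false the rule is not dropped; it is simply handled on the
 * centralized instance of the distributed gateway port. */
static bool
nat_is_distributed(bool has_l3dgw_port, const struct nat_rule *nat)
{
    assert(nat != NULL);
    return has_l3dgw_port
           && !strcmp(nat->type, "dnat_and_snat")
           && nat->logical_port != NULL
           && nat->external_mac != NULL
           && mac_is_valid(nat->external_mac);
}
```

With this shape, a rule with a bad MAC, or with only one of logical_port and
external_mac set, falls back to centralized handling rather than being
skipped, which is the behavior a "continue" would change.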


>
>
>
>> +}
>> +}
>> +
>>  /* Ingress UNSNAT table: It is for already established connections'
>>   * reverse traffic. i.e., SNAT has already been done in egress
>>   * pipeline and now the packet has entered the ingress pipeline as
>>
>
> ...snip..
>
>
>> @@ -4161,21 +4290,87 @@ build_lrouter_flows(struct hmap *datapaths,
>> struct hmap *ports,
>>   * to a logical IP address. */
>>  if (!strcmp(nat->type, "dnat")
>>  || !strcmp(nat->type, "dnat_and_snat")) {
>> -/* Packet when it goes from the initiator to destination.
>> - * We need to zero the inport because the router can
>>

[ovs-dev] [PATCH v12 3/6] ovn: distributed NAT flows

2017-01-26 Thread Mickey Spiegel
This patch implements the flows required in the ingress and egress
pipeline stages in order to support NAT on a distributed logical router.

NAT functionality is associated with the logical router gateway port.
The flows that carry out NAT functionality all have match conditions on
inport or outport equal to the logical router gateway port.  There are
additional flows that are used to redirect traffic when necessary,
using the tunnel key of a "chassisredirect" SB port binding in order to
redirect traffic to the instance of the logical router gateway port on
the centralized "redirect-chassis".

North/south traffic subject to one-to-one "dnat_and_snat" is handled
in a distributed manner, with south-to-north traffic going to the
local instance of the logical router gateway port.  North/south
traffic subject to (possibly one-to-many) "snat" is handled in a
centralized manner, with south-to-north traffic going to the instance
of the logical router gateway port on the "redirect-chassis".
North-to-south traffic is directed to the corresponding chassis by
limiting ARP responses to the appropriate instance of the logical
router gateway port on one chassis.  For centralized NAT rules, this
is the instance on the "redirect-chassis".  For distributed NAT rules,
this is the chassis where the corresponding logical port resides, using
an ethernet address specified in the NB NAT rule to trigger upstream
MAC learning.
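
As a rough illustration of the north-to-south steering described above, the
following sketch picks the single chassis that should answer ARP for a NAT
external IP. It is only a model of the rule in this commit message, not code
from the patch; arp_responder_chassis(), should_reply_to_arp() and the
chassis name strings are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns the name of the one chassis that should answer ARP requests for
 * a NAT rule's external IP: the chassis hosting the rule's logical port for
 * distributed rules, the redirect-chassis for centralized rules. */
static const char *
arp_responder_chassis(bool distributed, const char *redirect_chassis,
                      const char *logical_port_chassis)
{
    assert(redirect_chassis != NULL);
    assert(!distributed || logical_port_chassis != NULL);
    return distributed ? logical_port_chassis : redirect_chassis;
}

/* Returns true if the local chassis is the one that should reply. */
static bool
should_reply_to_arp(const char *local_chassis, bool distributed,
                    const char *redirect_chassis,
                    const char *logical_port_chassis)
{
    return !strcmp(local_chassis,
                   arp_responder_chassis(distributed, redirect_chassis,
                                         logical_port_chassis));
}
```

Because exactly one chassis replies for each external IP, the upstream
network learns a single location for it, and north-to-south traffic arrives
at the chassis that holds the matching NAT state.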

East/west NAT traffic is all handled in a centralized manner.  While it
is certainly possible to handle some of this traffic in a distributed
manner, the centralized approach keeps the NAT flows simpler and
cleaner.  The expectation is that east/west NAT traffic is not as
important to optimize as north/south NAT traffic, with most east/west
traffic not requiring NAT.

Automated tests are currently limited to only a single node.  The
single node automated tests cover both north/south and east/west
traffic flows.

Signed-off-by: Mickey Spiegel <mickeys@gmail.com>
---
 ovn/controller/ovn-controller.c |   6 +-
 ovn/northd/ovn-northd.8.xml | 400 +++--
 ovn/northd/ovn-northd.c | 425 +++-
 ovn/ovn-architecture.7.xml  |   7 +-
 ovn/ovn-nb.ovsschema|   6 +-
 ovn/ovn-nb.xml  |  49 -
 tests/system-ovn.at | 338 
 7 files changed, 1151 insertions(+), 80 deletions(-)

diff --git a/ovn/controller/ovn-controller.c b/ovn/controller/ovn-controller.c
index 7cef3f8..ea299da 100644
--- a/ovn/controller/ovn-controller.c
+++ b/ovn/controller/ovn-controller.c
@@ -323,10 +323,8 @@ update_ct_zones(struct sset *lports, const struct hmap 
*local_datapaths,
 /* Local patched datapath (gateway routers) need zones assigned. */
 const struct local_datapath *ld;
 HMAP_FOR_EACH (ld, hmap_node, local_datapaths) {
-if (!ld->has_local_l3gateway) {
-continue;
-}
-
+/* XXX Add method to limit zone assignment to logical router
+ * datapaths with NAT */
 char *dnat = alloc_nat_zone_key(&ld->datapath->header_.uuid, "dnat");
 char *snat = alloc_nat_zone_key(&ld->datapath->header_.uuid, "snat");
 sset_add(&all_users, dnat);
diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
index 49e4291..ab8fd88 100644
--- a/ovn/northd/ovn-northd.8.xml
+++ b/ovn/northd/ovn-northd.8.xml
@@ -752,9 +752,25 @@ output;
column is set to router and
   the connected logical router port specifies a
-  redirect-chassis, the flow is only programmed on the
-  redirect-chassis.
+  redirect-chassis:
 
+
+
+  
+The flow for the connected logical router port's Ethernet
+address is only programmed on the redirect-chassis.
+  
+
+  
+If the logical router has rules specified in
+ with
+, then
+those addresses are also used to populate the switch's destination
+lookup on the chassis where
+ is
+resident.
+  
+
   
 
   
@@ -890,6 +906,23 @@ output;
   redirect-chassis.
 
   
+
+  
+
+  For each dnat_and_snat NAT rule on a distributed
+  router that specifies an external Ethernet address E,
+  a priority-50 flow that matches inport == GW
+  && eth.dst == E, where GW
+  is the logical router gateway port, with action
+  next;.
+
+
+
+  This flow is only programmed on the gateway port instance on
+  the chassis where the logical_port specified in
+  the NAT rule resides.
+
+  
 
 
 
@@ -928,7 +961,9 @@ output;
   
   
 ip4.src or ip6.src is any IP
-addr

[ovs-dev] [PATCH v12 4/6] ovn: ovn-nbctl commands for distributed NAT

2017-01-26 Thread Mickey Spiegel
This patch adds the new optional arguments "logical_port" and
"external_mac" to lr-nat-add, and displays that information in
lr-nat-list.

Signed-off-by: Mickey Spiegel <mickeys@gmail.com>
---
 ovn/utilities/ovn-nbctl.8.xml | 27 +++---
 ovn/utilities/ovn-nbctl.c | 54 +--
 tests/ovn-nbctl.at| 47 +
 tests/system-ovn.at   | 30 +---
 4 files changed, 119 insertions(+), 39 deletions(-)

diff --git a/ovn/utilities/ovn-nbctl.8.xml b/ovn/utilities/ovn-nbctl.8.xml
index f95b88d..d81e99f 100644
--- a/ovn/utilities/ovn-nbctl.8.xml
+++ b/ovn/utilities/ovn-nbctl.8.xml
@@ -444,7 +444,7 @@
 NAT Commands
 
 
-  [--may-exist] lr-nat-add router 
type external_ip logical_ip
+  [--may-exist] lr-nat-add router 
type external_ip logical_ip 
[logical_port external_mac]
   
 
   Adds the specified NAT to router.
@@ -453,6 +453,13 @@
   The external_ip is an IPv4 address.
   The logical_ip is an IPv4 network (e.g 192.168.1.0/24)
   or an IPv4 address.
+  The logical_port and external_mac are only
+  accepted when router is a distributed router (rather
+  than a gateway router) and type is
+  dnat_and_snat.
+  The logical_port is the name of an existing logical
+  switch port where the logical_ip resides.
+  The external_mac is an Ethernet address.
 
 
   When type is dnat, the externally
@@ -475,8 +482,22 @@
   the IP address in external_ip.
 
 
-  It is an error if a NAT already exists,
-  unless --may-exist is specified.
+  When the logical_port and external_mac
+  are specified, the NAT rule will be programmed on the chassis
+  where the logical_port resides.  This includes
+  ARP replies for the external_ip, which return the
+  value of external_mac.  All packets transmitted
+  with source IP address equal to external_ip will
+  be sent using the external_mac.
+
+
+  It is an error if a NAT already exists with the same values
+  of router, type, external_ip,
+  and logical_ip, unless --may-exist is
+  specified.  When --may-exist,
+  logical_port, and external_mac are all
+  specified, the existing values of logical_port and
+  external_mac are overwritten.
 
   
 
diff --git a/ovn/utilities/ovn-nbctl.c b/ovn/utilities/ovn-nbctl.c
index f0ff27a..3dac434 100644
--- a/ovn/utilities/ovn-nbctl.c
+++ b/ovn/utilities/ovn-nbctl.c
@@ -390,7 +390,7 @@ Route commands:\n\
   lr-route-list ROUTER  print routes for ROUTER\n\
 \n\
 NAT commands:\n\
-  lr-nat-add ROUTER TYPE EXTERNAL_IP LOGICAL_IP\n\
+  lr-nat-add ROUTER TYPE EXTERNAL_IP LOGICAL_IP [LOGICAL_PORT EXTERNAL_MAC]\n\
 add a NAT to ROUTER\n\
   lr-nat-del ROUTER [TYPE [IP]]\n\
 remove NATs from ROUTER\n\
@@ -2239,6 +2239,30 @@ nbctl_lr_nat_add(struct ctl_context *ctx)
 new_logical_ip = normalize_ipv4_prefix(ipv4, plen);
 }
 
+const char *logical_port;
+const char *external_mac;
+if (ctx->argc == 6) {
+ctl_fatal("lr-nat-add with logical_port "
+  "must also specify external_mac.");
+} else if (ctx->argc == 7) {
+if (strcmp(nat_type, "dnat_and_snat")) {
+ctl_fatal("logical_port and external_mac are only valid when "
+  "type is \"dnat_and_snat\".");
+}
+
+logical_port = ctx->argv[5];
+lsp_by_name_or_uuid(ctx, logical_port, true);
+
+external_mac = ctx->argv[6];
+struct eth_addr ea;
+if (!eth_addr_from_string(external_mac, &ea)) {
+ctl_fatal("invalid mac address %s.", external_mac);
+}
+} else {
+logical_port = NULL;
+external_mac = NULL;
+}
+
 bool may_exist = shash_find(&ctx->options, "--may-exist") != NULL;
 int is_snat = !strcmp("snat", nat_type);
 for (size_t i = 0; i < lr->n_nat; i++) {
@@ -2249,6 +2273,10 @@ nbctl_lr_nat_add(struct ctl_context *ctx)
 if (!strcmp(is_snat ? external_ip : new_logical_ip,
 is_snat ? nat->external_ip : nat->logical_ip)) {
 if (may_exist) {
+nbrec_nat_verify_logical_port(nat);
+nbrec_nat_verify_external_mac(nat);
+nbrec_nat_set_logical_port(nat, logical_port);
+nbrec_nat_set_external_mac(nat, external_mac);
 free(new_logical_ip);
 return;
 }
@@ -2271,

[ovs-dev] [PATCH v12 2/6] ovn: avoid snat recirc only on gateway routers

2017-01-26 Thread Mickey Spiegel
Currently, for performance reasons on gateway routers, ct_snat
that does not specify an IP address does not immediately trigger
recirculation.  On gateway routers, ct_snat that does not specify
an IP address happens in the UNSNAT pipeline stage, which is
followed by the DNAT pipeline stage that triggers recirculation
for all packets.  This DNAT pipeline stage recirculation takes
care of the recirculation needs of UNSNAT as well as other cases
such as UNDNAT.

On distributed routers, UNDNAT is handled in the egress pipeline
stage, separately from DNAT in the ingress pipeline stages.  The
DNAT pipeline stage only triggers recirculation for some packets.
Due to this difference in design, UNSNAT needs to trigger its own
recirculation.

This patch restricts the logic that avoids recirculation for
ct_snat, so that it only applies to datapaths representing
gateway routers.
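
The resulting rule can be summarized with a small standalone sketch. This is
an illustration of the decision only, not the code in ovn/lib/actions.c:
ct_nat_recirc_table(), RECIRC_NONE and the next_table parameter are
hypothetical stand-ins for the logic in encode_ct_nat() and for
NX_CT_RECIRC_NONE.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sentinel meaning "do not recirculate after conntrack". */
enum { RECIRC_NONE = 0xff };

/* Returns the table to recirculate to after the ct action, or RECIRC_NONE.
 *
 * has_ip:            the action is ct_snat(IP) or ct_dnat(IP).
 * snat:              the action is ct_snat rather than ct_dnat.
 * is_gateway_router: the flow belongs to a gateway router datapath.
 * next_table:        table that recirculation would resume in. */
static unsigned int
ct_nat_recirc_table(bool has_ip, bool snat, bool is_gateway_router,
                    unsigned int next_table)
{
    assert(next_table != RECIRC_NONE);
    if (!has_ip && snat && is_gateway_router) {
        /* Bare ct_snat on a gateway router: the later DNAT stage
         * recirculates every packet, so skip the extra pass here. */
        return RECIRC_NONE;
    }
    /* ct_snat(IP), ct_dnat, and any ct_snat on a distributed router. */
    return next_table;
}
```

Only one of the combinations skips recirculation, so distributed routers
always pay for the extra pass on UNSNAT while gateway routers keep the
existing optimization.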

Signed-off-by: Mickey Spiegel <mickeys@gmail.com>
Acked-by: Gurucharan Shetty <g...@ovn.org>
---
 include/ovn/actions.h  |  3 +++
 ovn/controller/lflow.c | 10 ++
 ovn/lib/actions.c  | 15 +--
 ovn/ovn-sb.xml | 23 +++
 tests/ovn.at   |  2 +-
 5 files changed, 42 insertions(+), 11 deletions(-)

diff --git a/include/ovn/actions.h b/include/ovn/actions.h
index 1d7bd69..d2510fd 100644
--- a/include/ovn/actions.h
+++ b/include/ovn/actions.h
@@ -445,6 +445,9 @@ struct ovnact_encode_params {
 /* 'true' if the flow is for a switch. */
 bool is_switch;
 
+/* 'true' if the flow is for a gateway router. */
+bool is_gateway_router;
+
 /* A map from a port name to its connection tracking zone. */
 const struct simap *ct_zones;
 
diff --git a/ovn/controller/lflow.c b/ovn/controller/lflow.c
index 2d9213b..fa00db2 100644
--- a/ovn/controller/lflow.c
+++ b/ovn/controller/lflow.c
@@ -107,6 +107,15 @@ is_switch(const struct sbrec_datapath_binding *ldp)
 
 }
 
+static bool
+is_gateway_router(const struct sbrec_datapath_binding *ldp,
+  const struct hmap *local_datapaths)
+{
+struct local_datapath *ld =
+get_local_datapath(local_datapaths, ldp->tunnel_key);
+return ld ? ld->has_local_l3gateway : false;
+}
+
 /* Adds the logical flows from the Logical_Flow table to flow tables. */
 static void
 add_logical_flows(struct controller_ctx *ctx, const struct lport_index *lports,
@@ -221,6 +230,7 @@ consider_logical_flow(const struct lport_index *lports,
 .lookup_port = lookup_port_cb,
 .aux = &aux,
 .is_switch = is_switch(ldp),
+.is_gateway_router = is_gateway_router(ldp, local_datapaths),
 .ct_zones = ct_zones,
 .group_table = group_table,
 
diff --git a/ovn/lib/actions.c b/ovn/lib/actions.c
index 90a2add..fff838b 100644
--- a/ovn/lib/actions.c
+++ b/ovn/lib/actions.c
@@ -829,12 +829,15 @@ encode_ct_nat(const struct ovnact_ct_nat *cn,
 ct = ofpacts->header;
 if (cn->ip) {
 ct->flags |= NX_CT_F_COMMIT;
-} else if (snat) {
-/* XXX: For performance reasons, we try to prevent additional
- * recirculations.  So far, ct_snat which is used in a gateway router
- * does not need a recirculation. ct_snat(IP) does need a
- * recirculation.  Should we consider a method to let the actions
- * specify whether an action needs recirculation if there more use
+} else if (snat && ep->is_gateway_router) {
+/* For performance reasons, we try to prevent additional
+ * recirculations.  ct_snat which is used in a gateway router
+ * does not need a recirculation.  ct_snat(IP) does need a
+ * recirculation.  ct_snat in a distributed router needs
+ * recirculation regardless of whether an IP address is
+ * specified.
+ * XXX Should we consider a method to let the actions specify
+ * whether an action needs recirculation if there are more use
  * cases?. */
 ct->recirc_table = NX_CT_RECIRC_NONE;
 }
diff --git a/ovn/ovn-sb.xml b/ovn/ovn-sb.xml
index f806af7..b33afd3 100644
--- a/ovn/ovn-sb.xml
+++ b/ovn/ovn-sb.xml
@@ -1128,11 +1128,26 @@
 
   
 ct_snat sends the packet through the SNAT zone to
-unSNAT any packet that was SNATed in the opposite direction.  If
-the packet needs to be sent to the next tables, then it should be
-followed by a next; action.  The next tables will not
-see the changes in the packet caused by the connection tracker.
+unSNAT any packet that was SNATed in the opposite direction.  The
+behavior on gateway routers differs from the behavior on a
+distributed router:
   
+  
+
+  On a gateway router, if the packet needs to be sent to the next
+  tables, then it should be followed by a next;
+  action.  The next tables will not see the changes in the packe

[ovs-dev] [PATCH v12 1/6] ovn: move load balancing flows after NAT flows

2017-01-26 Thread Mickey Spiegel
This will make it easy for distributed NAT to reuse some of the
existing code for NAT flows, while leaving load balancing and defrag
as functionality specific to gateway routers.  There is no intent to
change any functionality in this patch.

Signed-off-by: Mickey Spiegel <mickeys@gmail.com>
Acked-by: Gurucharan Shetty <g...@ovn.org>
---
 ovn/northd/ovn-northd.c | 140 
 1 file changed, 70 insertions(+), 70 deletions(-)

diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index 87c80d1..219a69c 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -4099,76 +4099,6 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap 
*ports,
 const char *lb_force_snat_ip = get_force_snat_ip(od, "lb",
  &snat_ip);
 
-/* A set to hold all ips that need defragmentation and tracking. */
-struct sset all_ips = SSET_INITIALIZER(&all_ips);
-
-for (int i = 0; i < od->nbr->n_load_balancer; i++) {
-struct nbrec_load_balancer *lb = od->nbr->load_balancer[i];
-struct smap *vips = &lb->vips;
-struct smap_node *node;
-
-SMAP_FOR_EACH (node, vips) {
-uint16_t port = 0;
-
-/* node->key contains IP:port or just IP. */
-char *ip_address = NULL;
-ip_address_and_port_from_lb_key(node->key, &ip_address, &port);
-if (!ip_address) {
-continue;
-}
-
-if (!sset_contains(&all_ips, ip_address)) {
-sset_add(&all_ips, ip_address);
-}
-
-/* Higher priority rules are added for load-balancing in DNAT
- * table.  For every match (on a VIP[:port]), we add two flows
- * via add_router_lb_flow().  One flow is for specific matching
- * on ct.new with an action of "ct_lb($targets);".  The other
- * flow is for ct.est with an action of "ct_dnat;". */
-ds_clear(&actions);
-ds_put_format(&actions, "ct_lb(%s);", node->value);
-
-ds_clear(&match);
-ds_put_format(&match, "ip && ip4.dst == %s",
-  ip_address);
-free(ip_address);
-
-if (port) {
-if (lb->protocol && !strcmp(lb->protocol, "udp")) {
-ds_put_format(&match, " && udp && udp.dst == %d",
-  port);
-} else {
-ds_put_format(&match, " && tcp && tcp.dst == %d",
-  port);
-}
-add_router_lb_flow(lflows, od, &match, &actions, 120,
-   lb_force_snat_ip);
-} else {
-add_router_lb_flow(lflows, od, &match, &actions, 110,
-   lb_force_snat_ip);
-}
-}
-}
-
-/* If there are any load balancing rules, we should send the
- * packet to conntrack for defragmentation and tracking.  This helps
- * with two things.
- *
- * 1. With tracking, we can send only new connections to pick a
- *DNAT ip address from a group.
- * 2. If there are L4 ports in load balancing rules, we need the
- *defragmentation to match on L4 ports. */
-const char *ip_address;
-SSET_FOR_EACH(ip_address, &all_ips) {
-ds_clear(&match);
-ds_put_format(&match, "ip && ip4.dst == %s", ip_address);
-ovn_lflow_add(lflows, od, S_ROUTER_IN_DEFRAG,
-  100, ds_cstr(&match), "ct_next;");
-}
-
-sset_destroy(&all_ips);
-
 for (int i = 0; i < od->nbr->n_nat; i++) {
 const struct nbrec_nat *nat;
 
@@ -4323,6 +4253,76 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap 
*ports,
 * routing in the openflow pipeline. */
 ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, 50,
   "ip", "flags.loopback = 1; ct_dnat;");
+
+/* A set to hold all ips that need defragmentation and tracking. */
+struct sset all_ips = SSET_INITIALIZER(&all_ips);
+
+for (int i = 0; i < od->nbr->n_load_balancer; i++) {
+struct nbrec_load_balancer *lb = od->nbr->load_balancer[i];
+struct smap *vips = &lb->vips;
+struct smap_node *node;
+
+SMAP_FOR_EACH (node, vips) {
+uint16_t port = 0;
+
+/* node->key contains IP:port or just IP. */
+char *ip_address = NULL;
+ip_address_and_port_from_lb_key(node->key, &ip_address, &port);
+ 

[ovs-dev] [PATCH v12 0/6] ovn: add distributed NAT capability

2017-01-26 Thread Mickey Spiegel
PATCH v7 -> PATCH v8
Incorporated incremental changes to is_chassis_resident() from blp.
Added patch that describes logical routers and logical patches in
ovn-architecture.
Renamed chassisredirect patch to emphasize distributed gateway ports
as well.
Added description of distributed gateway ports to ovn-architecture,
in distributed gateway port / chassisredirect patch.
Rewrote commit message for distributed gateway port / chassisredirect.

PATCH v6 -> PATCH v7
Rebase.
Documentation improvements to lsp addresses "router" patch as
suggested by blp.  Also added to ovn-nbctl documentation.

PATCH v5 -> PATCH v6
Added patch to automatically add router addresses to the addresses of
type "router" lsps.
Restricted logical switch destination lookup flows for logical router
distributed gateway port's MAC to the redirect chassis.
Automatically add distributed NAT MAC addresses to logical switch
destination lookup flows on the chassis where the NAT logical port resides.
Added tests for reachability from VIFs on the same logical switch as
localnet, through the logical router's distributed gateway port, to
internal VIFs.

PATCH v4 -> PATCH v5
Limited router ingress table 0 flow matching router ethernet address
on distributed gateway to redirect chassis.
Limited router ingress table 0 flows matching NAT ethernet address to
chassis where the NAT rule's logical port resides.
Rolled back changes to ICMP since they are not necessary.

PATCH v3 -> PATCH v4
Rebase

PATCH v2 -> PATCH v3
Added table to set egress loopback flag in the egress pipeline stage,
fixing east-west NAT across multiple chassis.

PATCH v1 -> PATCH v2
Added ovn-trace logic for chassisredirect ports, including automated test.
Added ovn-trace logic for egress loopback.
Fixed some bugs in ovn-trace register handling from ingress to egress,
and across patch ports (should these be filed separately as well?).

RFC v4 -> PATCH v1
Added egress loopback capability
Added east/west NAT tests to system-ovn.at (make check-kernel)
Added REGBIT_NAT_REDIRECT flows to IN_IP_ROUTING and IN_ARP_RESOLVE,
resolving remaining issues with east/west NAT

RFC v3 -> RFC v4
Rebased to pick up recent changes to ovn-controller, including a fix
to the localnet issue where VIFs had to be added on a chassis in order
to cause the localnet port to be instantiated.
The chassisredirect port logic was rewritten to avoid creating an
ofport.  Besides streamlining the code significantly, this fixed the
problem when the distributed port name was longer than 12 characters.
Restricted IPv6 ND replies for the router IP address to the redirect
chassis, similar to IPv4 ARP restrictions.
Added specific gateway redirect flows for unresolved ethernet
destination, so that ARP requests generated by the router are sent
through the redirect chassis regardless of NAT rules.
Relaxed checks in chassisredirect tests so that they are independent
of register assignments.
Renamed ovn-northd.c "l3gateway_port" to "l3dgw_port" in order to
avoid overlaps with gateway router terminology.

RFC v2 -> RFC v3
Reordered the first two patches.
Moved non-NAT specific flows from patch 5 to patch 2.
Added automated tests for is_chassis_resident (which is ready for
review) and chassisredirect patches.
Added flows to limit ICMP echo replies for router IPs on the gateway
interface, so that they are only generated on the redirect-chassis.

Mickey Spiegel (6):
  ovn: move load balancing flows after NAT flows
  ovn: avoid snat recirc only on gateway routers
  ovn: distributed NAT flows
  ovn: ovn-nbctl commands for distributed NAT
  ovn: rewrite redirect-chassis description in ovn-nb.xml
  ovn: specify options:nat-addresses as "router"

 include/ovn/actions.h   |   3 +
 ovn/controller/lflow.c  |  10 +
 ovn/controller/ovn-controller.c |   6 +-
 ovn/lib/actions.c   |  15 +-
 ovn/northd/ovn-northd.8.xml | 400 ++-
 ovn/northd/ovn-northd.c | 679 +++-
 ovn/ovn-architecture.7.xml  |   7 +-
 ovn/ovn-nb.ovsschema|   6 +-
 ovn/ovn-nb.xml  | 116 +--
 ovn/ovn-sb.xml  |  23 +-
 ovn/utilities/ovn-nbctl.8.xml   |  27 +-
 ovn/utilities/ovn-nbctl.c   |  54 +++-
 tests/ovn-nbctl.at  |  47 ++-
 tests/ovn.at|  62 +++-
 tests/system-ovn.at | 320 +++
 15 files changed, 1553 insertions(+), 222 deletions(-)

-- 
1.9.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [ovn] What's the manner of sending GARP for distributed dnat_and_snat?

2017-01-25 Thread Mickey Spiegel
On Wed, Jan 25, 2017 at 12:26 AM, Dong Jun  wrote:

> Hi
>
> I learned about the distributed dnat_and_snat. Now I don't see what's
> the manner of sending GARP for distributed dnat_and_snat IP. In the past,
> we set nat_addresses column in lsp that connected to gateway lrp. Now the
> type of lrp was changed from l3gateway to patch, does this affect GARP for
> distributed dnat_and_snat IP?
>

GARP is not supported in the current patch set for distributed NAT. I have
started working on a couple of patches for GARP with distributed NAT. The
first patch allows options:nat-addresses to be set to the string
"router", in which case northd will collect all the NAT external IP
addresses and load balancer IP addresses from VIPs and set the port_binding
nat-addresses correspondingly. The second patch will extend GARP
functionality for distributed NAT by replacing options:nat-addresses with
a column nat_addresses with max:unlimited, and by optionally adding
a port name to the end of the string. When the port name is present,
GARP will only be issued for the addresses in that column on the chassis
where the specified port is resident.
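
As a rough illustration only (this is not code from either planned patch), the per-chassis decision described above could be sketched as follows. The function name, its parameters, and the rule that an entry without a port name keeps today's behavior are all assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: decide whether this chassis should issue GARP for
 * one nat_addresses entry.
 *
 * 'required_port' is the optional port name at the end of the entry, or
 * NULL when the entry does not name a port.  'local_ports' lists the
 * logical ports resident on this chassis. */
bool
garp_should_send(const char *required_port,
                 const char *const *local_ports, size_t n_local)
{
    if (!required_port) {
        /* Assumption: entries without a port name are not restricted. */
        return true;
    }
    for (size_t i = 0; i < n_local; i++) {
        if (!strcmp(local_ports[i], required_port)) {
            return true;    /* The named port is resident here. */
        }
    }
    return false;           /* The named port lives on another chassis. */
}
```

With local ports "vm1" and "vm2", an entry naming "vm2" would be announced from this chassis, while an entry naming "vm3" would be left to whichever chassis hosts "vm3".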

Mickey
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v11 5/5] ovn: rewrite redirect-chassis description in ovn-nb.xml

2017-01-21 Thread Mickey Spiegel
This optional patch addresses offline comments that the documentation
in ovn-nb.xml should not describe southbound constructs or flow
details, since it is user facing documentation.

Signed-off-by: Mickey Spiegel <mickeys@gmail.com>
---
 ovn/ovn-nb.xml | 25 ++---
 1 file changed, 10 insertions(+), 15 deletions(-)

diff --git a/ovn/ovn-nb.xml b/ovn/ovn-nb.xml
index 6b193c4..2af46b6 100644
--- a/ovn/ovn-nb.xml
+++ b/ovn/ovn-nb.xml
@@ -,24 +,19 @@
   
 
   If set, this indicates that this logical router port represents
-  a distributed gateway port.  In addition to the southbound
-  database port representing this distributed gateway port, another
-  port will be created in the southbound database that represents a
-  particular instance, bound to a specific chassis, of this
-  otherwise distributed logical router port.  This additional port
-  can then be specified as an outport in some of the
-  ingress pipeline flows.  This will cause matching packets to be
-  directed to a specific chassis to carry out the egress pipeline,
-  allowing a subset of logical router functionality to be
-  implemented in a centralized manner.  At the beginning of the
-  egress pipeline, the outport will be reset to the
-  value of the distributed port.
+  a distributed gateway port that connects this router to a logical
+  switch with a localnet port.  There may be at most one such
+  logical router port on each logical router.
 
 
 
-  This option specifies the name of the chassis to which
-  the additional southbound port binding of type
-  chassisredirect will be bound.
+  Even when a redirect-chassis is specified, the
+  logical router port still effectively resides on each chassis.
+  However, due to the implications of the use of L2 learning in the
+  physical network, as well as the need to support advanced features
+  such as one-to-many NAT (aka IP masquerading), a subset of the
+  logical router processing is handled in a centralized manner on
+  the specified redirect-chassis.
 
 
 
-- 
1.9.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v11 3/5] ovn: distributed NAT flows

2017-01-21 Thread Mickey Spiegel
This patch implements the flows required in the ingress and egress
pipeline stages in order to support NAT on a distributed logical router.

NAT functionality is associated with the logical router gateway port.
The flows that carry out NAT functionality all have match conditions on
inport or outport equal to the logical router gateway port.  There are
additional flows that are used to redirect traffic when necessary,
using the tunnel key of a "chassisredirect" SB port binding in order to
redirect traffic to the instance of the logical router gateway port on
the centralized "redirect-chassis".

North/south traffic subject to one-to-one "dnat_and_snat" is handled
in a distributed manner, with south-to-north traffic going to the
local instance of the logical router gateway port.  North/south
traffic subject to (possibly one-to-many) "snat" is handled in a
centralized manner, with south-to-north traffic going to the instance
of the logical router gateway port on the "redirect-chassis".
North-to-south traffic is directed to the corresponding chassis by
limiting ARP responses to the appropriate instance of the logical
router gateway port on one chassis.  For centralized NAT rules, this
is the instance on the "redirect-chassis".  For distributed NAT rules,
this is the chassis where the corresponding logical port resides, using
an ethernet address specified in the NB NAT rule to trigger upstream
MAC learning.
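
The north/south split above can be summarized in a small sketch. It is not taken from the patch: the struct and function names are hypothetical, and it assumes that a rule counts as distributed exactly when it is of type "dnat_and_snat" and carries both a logical port and an external MAC.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of an NB NAT rule. */
struct nat_rule_sketch {
    const char *type;          /* "snat", "dnat", or "dnat_and_snat". */
    const char *logical_port;  /* NULL if not set. */
    const char *external_mac;  /* NULL if not set. */
};

/* Assumption: only one-to-one rules that name a logical port and an
 * external MAC are handled in a distributed manner. */
bool
nat_rule_is_distributed(const struct nat_rule_sketch *nat)
{
    return !strcmp(nat->type, "dnat_and_snat")
           && nat->logical_port != NULL
           && nat->external_mac != NULL;
}

/* Returns the chassis whose gateway port instance should answer ARP for
 * the rule's external IP, which in turn attracts north-to-south traffic. */
const char *
nat_arp_reply_chassis(const struct nat_rule_sketch *nat,
                      const char *redirect_chassis,
                      const char *logical_port_chassis)
{
    return nat_rule_is_distributed(nat) ? logical_port_chassis
                                        : redirect_chassis;
}
```

For a "snat" rule the result is always the redirect-chassis, which matches the centralized handling of one-to-many NAT described above.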

East/west NAT traffic is all handled in a centralized manner.  While it
is certainly possible to handle some of this traffic in a distributed
manner, the centralized approach keeps the NAT flows simpler and
cleaner.  The expectation is that east/west NAT traffic is not as
important to optimize as north/south NAT traffic, with most east/west
traffic not requiring NAT.

Automated tests are currently limited to only a single node.  The
single node automated tests cover both north/south and east/west
traffic flows.

Signed-off-by: Mickey Spiegel <mickeys@gmail.com>
---
 ovn/controller/ovn-controller.c |   6 +-
 ovn/northd/ovn-northd.8.xml | 400 +++--
 ovn/northd/ovn-northd.c | 425 +++-
 ovn/ovn-architecture.7.xml  |   7 +-
 ovn/ovn-nb.ovsschema|   6 +-
 ovn/ovn-nb.xml  |  49 -
 tests/system-ovn.at | 338 
 7 files changed, 1151 insertions(+), 80 deletions(-)

diff --git a/ovn/controller/ovn-controller.c b/ovn/controller/ovn-controller.c
index 7cef3f8..ea299da 100644
--- a/ovn/controller/ovn-controller.c
+++ b/ovn/controller/ovn-controller.c
@@ -323,10 +323,8 @@ update_ct_zones(struct sset *lports, const struct hmap *local_datapaths,
 /* Local patched datapath (gateway routers) need zones assigned. */
 const struct local_datapath *ld;
 HMAP_FOR_EACH (ld, hmap_node, local_datapaths) {
-if (!ld->has_local_l3gateway) {
-continue;
-}
-
+/* XXX Add method to limit zone assignment to logical router
+ * datapaths with NAT */
 char *dnat = alloc_nat_zone_key(&ld->datapath->header_.uuid, "dnat");
 char *snat = alloc_nat_zone_key(&ld->datapath->header_.uuid, "snat");
 sset_add(&all_users, dnat);
diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
index 49e4291..ab8fd88 100644
--- a/ovn/northd/ovn-northd.8.xml
+++ b/ovn/northd/ovn-northd.8.xml
@@ -752,9 +752,25 @@ output;
column is set to router and
   the connected logical router port specifies a
-  redirect-chassis, the flow is only programmed on the
-  redirect-chassis.
+  redirect-chassis:
 
+
+
+  
+The flow for the connected logical router port's Ethernet
+address is only programmed on the redirect-chassis.
+  
+
+  
+If the logical router has rules specified in
+ with
+, then
+those addresses are also used to populate the switch's destination
+lookup on the chassis where
+ is
+resident.
+  
+
   
 
   
@@ -890,6 +906,23 @@ output;
   redirect-chassis.
 
   
+
+  
+
+  For each dnat_and_snat NAT rule on a distributed
+  router that specifies an external Ethernet address E,
+  a priority-50 flow that matches inport == GW
+  && eth.dst == E, where GW
+  is the logical router gateway port, with action
+  next;.
+
+
+
+  This flow is only programmed on the gateway port instance on
+  the chassis where the logical_port specified in
+  the NAT rule resides.
+
+  
 
 
 
@@ -928,7 +961,9 @@ output;
   
   
 ip4.src or ip6.src is any IP
-addr

[ovs-dev] [PATCH v11 4/5] ovn: ovn-nbctl commands for distributed NAT

2017-01-21 Thread Mickey Spiegel
This patch adds the new optional arguments "logical_port" and
"external_mac" to lr-nat-add, and displays that information in
lr-nat-list.
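
A minimal sketch of the argument rules for the extended command follows. It is a standalone illustration rather than the ovn-nbctl code itself: the function name is hypothetical, and it assumes `argc` counts the command name, so that `lr-nat-add ROUTER TYPE EXTERNAL_IP LOGICAL_IP` has `argc == 5`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns NULL when the argument count and NAT type
 * are acceptable, otherwise a message describing the problem. */
const char *
check_lr_nat_add_args(int argc, const char *nat_type)
{
    if (argc == 6) {
        /* logical_port was given without external_mac. */
        return "lr-nat-add with logical_port must also specify "
               "external_mac.";
    }
    if (argc == 7 && strcmp(nat_type, "dnat_and_snat")) {
        /* Both optional arguments are present but the type is wrong. */
        return "logical_port and external_mac are only valid when "
               "type is \"dnat_and_snat\".";
    }
    return NULL;
}
```

The two optional arguments therefore travel as a pair, and only with type "dnat_and_snat".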

Signed-off-by: Mickey Spiegel <mickeys@gmail.com>
---
 ovn/utilities/ovn-nbctl.8.xml | 27 +++---
 ovn/utilities/ovn-nbctl.c | 54 +--
 tests/ovn-nbctl.at| 47 +
 tests/system-ovn.at   | 30 +---
 4 files changed, 119 insertions(+), 39 deletions(-)

diff --git a/ovn/utilities/ovn-nbctl.8.xml b/ovn/utilities/ovn-nbctl.8.xml
index f95b88d..d81e99f 100644
--- a/ovn/utilities/ovn-nbctl.8.xml
+++ b/ovn/utilities/ovn-nbctl.8.xml
@@ -444,7 +444,7 @@
 NAT Commands
 
 
-  [--may-exist] lr-nat-add router type external_ip logical_ip
+  [--may-exist] lr-nat-add router type external_ip logical_ip [logical_port external_mac]
   
 
   Adds the specified NAT to router.
@@ -453,6 +453,13 @@
   The external_ip is an IPv4 address.
   The logical_ip is an IPv4 network (e.g 192.168.1.0/24)
   or an IPv4 address.
+  The logical_port and external_mac are only
+  accepted when router is a distributed router (rather
+  than a gateway router) and type is
+  dnat_and_snat.
+  The logical_port is the name of an existing logical
+  switch port where the logical_ip resides.
+  The external_mac is an Ethernet address.
 
 
   When type is dnat, the externally
@@ -475,8 +482,22 @@
   the IP address in external_ip.
 
 
-  It is an error if a NAT already exists,
-  unless --may-exist is specified.
+  When the logical_port and external_mac
+  are specified, the NAT rule will be programmed on the chassis
+  where the logical_port resides.  This includes
+  ARP replies for the external_ip, which return the
+  value of external_mac.  All packets transmitted
+  with source IP address equal to external_ip will
+  be sent using the external_mac.
+
+
+  It is an error if a NAT already exists with the same values
+  of router, type, external_ip,
+  and logical_ip, unless --may-exist is
+  specified.  When --may-exist,
+  logical_port, and external_mac are all
+  specified, the existing values of logical_port and
+  external_mac are overwritten.
 
   
 
diff --git a/ovn/utilities/ovn-nbctl.c b/ovn/utilities/ovn-nbctl.c
index f0ff27a..3dac434 100644
--- a/ovn/utilities/ovn-nbctl.c
+++ b/ovn/utilities/ovn-nbctl.c
@@ -390,7 +390,7 @@ Route commands:\n\
   lr-route-list ROUTER  print routes for ROUTER\n\
 \n\
 NAT commands:\n\
-  lr-nat-add ROUTER TYPE EXTERNAL_IP LOGICAL_IP\n\
+  lr-nat-add ROUTER TYPE EXTERNAL_IP LOGICAL_IP [LOGICAL_PORT EXTERNAL_MAC]\n\
 add a NAT to ROUTER\n\
   lr-nat-del ROUTER [TYPE [IP]]\n\
 remove NATs from ROUTER\n\
@@ -2239,6 +2239,30 @@ nbctl_lr_nat_add(struct ctl_context *ctx)
 new_logical_ip = normalize_ipv4_prefix(ipv4, plen);
 }
 
+const char *logical_port;
+const char *external_mac;
+if (ctx->argc == 6) {
+ctl_fatal("lr-nat-add with logical_port "
+  "must also specify external_mac.");
+} else if (ctx->argc == 7) {
+if (strcmp(nat_type, "dnat_and_snat")) {
+ctl_fatal("logical_port and external_mac are only valid when "
+  "type is \"dnat_and_snat\".");
+}
+
+logical_port = ctx->argv[5];
+lsp_by_name_or_uuid(ctx, logical_port, true);
+
+external_mac = ctx->argv[6];
+struct eth_addr ea;
+if (!eth_addr_from_string(external_mac, &ea)) {
+ctl_fatal("invalid mac address %s.", external_mac);
+}
+} else {
+logical_port = NULL;
+external_mac = NULL;
+}
+
 bool may_exist = shash_find(&ctx->options, "--may-exist") != NULL;
 int is_snat = !strcmp("snat", nat_type);
 for (size_t i = 0; i < lr->n_nat; i++) {
@@ -2249,6 +2273,10 @@ nbctl_lr_nat_add(struct ctl_context *ctx)
 if (!strcmp(is_snat ? external_ip : new_logical_ip,
 is_snat ? nat->external_ip : nat->logical_ip)) {
 if (may_exist) {
+nbrec_nat_verify_logical_port(nat);
+nbrec_nat_verify_external_mac(nat);
+nbrec_nat_set_logical_port(nat, logical_port);
+nbrec_nat_set_external_mac(nat, external_mac);
 free(new_logical_ip);
 return;
 }
@@ -2271,

[ovs-dev] [PATCH v11 2/5] ovn: avoid snat recirc only on gateway routers

2017-01-21 Thread Mickey Spiegel
Currently, for performance reasons on gateway routers, ct_snat
that does not specify an IP address does not immediately trigger
recirculation.  On gateway routers, ct_snat that does not specify
an IP address happens in the UNSNAT pipeline stage, which is
followed by the DNAT pipeline stage that triggers recirculation
for all packets.  This DNAT pipeline stage recirculation takes
care of the recirculation needs of UNSNAT as well as other cases
such as UNDNAT.

On distributed routers, UNDNAT is handled in the egress pipeline
stage, separately from DNAT in the ingress pipeline stages.  The
DNAT pipeline stage only triggers recirculation for some packets.
Due to this difference in design, UNSNAT needs to trigger its own
recirculation.

This patch restricts the logic that avoids recirculation for
ct_snat, so that it only applies to datapaths representing
gateway routers.
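
The resulting rule can be written as a small truth function. This is only a sketch of the decision described above, with a hypothetical function name; the real change lives in encode_ct_nat() and works on OpenFlow action flags rather than booleans.

```c
#include <stdbool.h>

/* Hypothetical sketch: does a ct_snat action trigger recirculation?
 *
 * 'has_ip' is true for ct_snat(IP) and false for plain ct_snat.
 * 'is_gateway_router' is true when the datapath is a gateway router. */
bool
ct_snat_needs_recirc(bool has_ip, bool is_gateway_router)
{
    if (has_ip) {
        /* ct_snat(IP) needs recirculation on any router. */
        return true;
    }
    /* Plain ct_snat skips recirculation only on gateway routers, where
     * the following DNAT stage recirculates every packet anyway. */
    return !is_gateway_router;
}
```

Only the combination of no IP address and a gateway router avoids recirculation; on a distributed router, UNSNAT recirculates on its own.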

Signed-off-by: Mickey Spiegel <mickeys@gmail.com>
---
 include/ovn/actions.h  |  3 +++
 ovn/controller/lflow.c | 10 ++
 ovn/lib/actions.c  | 15 +--
 ovn/ovn-sb.xml | 23 +++
 tests/ovn.at   |  2 +-
 5 files changed, 42 insertions(+), 11 deletions(-)

diff --git a/include/ovn/actions.h b/include/ovn/actions.h
index 1d7bd69..d2510fd 100644
--- a/include/ovn/actions.h
+++ b/include/ovn/actions.h
@@ -445,6 +445,9 @@ struct ovnact_encode_params {
 /* 'true' if the flow is for a switch. */
 bool is_switch;
 
+/* 'true' if the flow is for a gateway router. */
+bool is_gateway_router;
+
 /* A map from a port name to its connection tracking zone. */
 const struct simap *ct_zones;
 
diff --git a/ovn/controller/lflow.c b/ovn/controller/lflow.c
index 2d9213b..fa00db2 100644
--- a/ovn/controller/lflow.c
+++ b/ovn/controller/lflow.c
@@ -107,6 +107,15 @@ is_switch(const struct sbrec_datapath_binding *ldp)
 
 }
 
+static bool
+is_gateway_router(const struct sbrec_datapath_binding *ldp,
+  const struct hmap *local_datapaths)
+{
+struct local_datapath *ld =
+get_local_datapath(local_datapaths, ldp->tunnel_key);
+return ld ? ld->has_local_l3gateway : false;
+}
+
 /* Adds the logical flows from the Logical_Flow table to flow tables. */
 static void
 add_logical_flows(struct controller_ctx *ctx, const struct lport_index *lports,
@@ -221,6 +230,7 @@ consider_logical_flow(const struct lport_index *lports,
 .lookup_port = lookup_port_cb,
 .aux = &aux,
 .is_switch = is_switch(ldp),
+.is_gateway_router = is_gateway_router(ldp, local_datapaths),
 .ct_zones = ct_zones,
 .group_table = group_table,
 
diff --git a/ovn/lib/actions.c b/ovn/lib/actions.c
index 90a2add..fff838b 100644
--- a/ovn/lib/actions.c
+++ b/ovn/lib/actions.c
@@ -829,12 +829,15 @@ encode_ct_nat(const struct ovnact_ct_nat *cn,
 ct = ofpacts->header;
 if (cn->ip) {
 ct->flags |= NX_CT_F_COMMIT;
-} else if (snat) {
-/* XXX: For performance reasons, we try to prevent additional
- * recirculations.  So far, ct_snat which is used in a gateway router
- * does not need a recirculation. ct_snat(IP) does need a
- * recirculation.  Should we consider a method to let the actions
- * specify whether an action needs recirculation if there more use
+} else if (snat && ep->is_gateway_router) {
+/* For performance reasons, we try to prevent additional
+ * recirculations.  ct_snat which is used in a gateway router
+ * does not need a recirculation.  ct_snat(IP) does need a
+ * recirculation.  ct_snat in a distributed router needs
+ * recirculation regardless of whether an IP address is
+ * specified.
+ * XXX Should we consider a method to let the actions specify
+ * whether an action needs recirculation if there are more use
  * cases?. */
 ct->recirc_table = NX_CT_RECIRC_NONE;
 }
diff --git a/ovn/ovn-sb.xml b/ovn/ovn-sb.xml
index f806af7..b33afd3 100644
--- a/ovn/ovn-sb.xml
+++ b/ovn/ovn-sb.xml
@@ -1128,11 +1128,26 @@
 
   
 ct_snat sends the packet through the SNAT zone to
-unSNAT any packet that was SNATed in the opposite direction.  If
-the packet needs to be sent to the next tables, then it should be
-followed by a next; action.  The next tables will not
-see the changes in the packet caused by the connection tracker.
+unSNAT any packet that was SNATed in the opposite direction.  The
+behavior on gateway routers differs from the behavior on a
+distributed router:
   
+  
+
+  On a gateway router, if the packet needs to be sent to the next
+  tables, then it should be followed by a next;
+  action.  The next tables will not see the changes in the packet
+   

[ovs-dev] [PATCH v11 1/5] ovn: move load balancing flows after NAT flows

2017-01-21 Thread Mickey Spiegel
This will make it easy for distributed NAT to reuse some of the
existing code for NAT flows, while leaving load balancing and defrag
as functionality specific to gateway routers.  There is no intent to
change any functionality in this patch.

Signed-off-by: Mickey Spiegel <mickeys@gmail.com>
Acked-by: Gurucharan Shetty <g...@ovn.org>
---
 ovn/northd/ovn-northd.c | 140 
 1 file changed, 70 insertions(+), 70 deletions(-)

diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index 87c80d1..219a69c 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -4099,76 +4099,6 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
 const char *lb_force_snat_ip = get_force_snat_ip(od, "lb",
  &snat_ip);
 
-/* A set to hold all ips that need defragmentation and tracking. */
-struct sset all_ips = SSET_INITIALIZER(&all_ips);
-
-for (int i = 0; i < od->nbr->n_load_balancer; i++) {
-struct nbrec_load_balancer *lb = od->nbr->load_balancer[i];
-struct smap *vips = &lb->vips;
-struct smap_node *node;
-
-SMAP_FOR_EACH (node, vips) {
-uint16_t port = 0;
-
-/* node->key contains IP:port or just IP. */
-char *ip_address = NULL;
-ip_address_and_port_from_lb_key(node->key, &ip_address, &port);
-if (!ip_address) {
-continue;
-}
-
-if (!sset_contains(&all_ips, ip_address)) {
-sset_add(&all_ips, ip_address);
-}
-
-/* Higher priority rules are added for load-balancing in DNAT
- * table.  For every match (on a VIP[:port]), we add two flows
- * via add_router_lb_flow().  One flow is for specific matching
- * on ct.new with an action of "ct_lb($targets);".  The other
- * flow is for ct.est with an action of "ct_dnat;". */
-ds_clear(&actions);
-ds_put_format(&actions, "ct_lb(%s);", node->value);
-
-ds_clear(&match);
-ds_put_format(&match, "ip && ip4.dst == %s",
-  ip_address);
-free(ip_address);
-
-if (port) {
-if (lb->protocol && !strcmp(lb->protocol, "udp")) {
-ds_put_format(&match, " && udp && udp.dst == %d",
-  port);
-} else {
-ds_put_format(&match, " && tcp && tcp.dst == %d",
-  port);
-}
-add_router_lb_flow(lflows, od, &match, &actions, 120,
-   lb_force_snat_ip);
-} else {
-add_router_lb_flow(lflows, od, &match, &actions, 110,
-   lb_force_snat_ip);
-}
-}
-}
-
-/* If there are any load balancing rules, we should send the
- * packet to conntrack for defragmentation and tracking.  This helps
- * with two things.
- *
- * 1. With tracking, we can send only new connections to pick a
- *DNAT ip address from a group.
- * 2. If there are L4 ports in load balancing rules, we need the
- *defragmentation to match on L4 ports. */
-const char *ip_address;
-SSET_FOR_EACH(ip_address, &all_ips) {
-ds_clear(&match);
-ds_put_format(&match, "ip && ip4.dst == %s", ip_address);
-ovn_lflow_add(lflows, od, S_ROUTER_IN_DEFRAG,
-  100, ds_cstr(&match), "ct_next;");
-}
-
-sset_destroy(&all_ips);
-
 for (int i = 0; i < od->nbr->n_nat; i++) {
 const struct nbrec_nat *nat;
 
@@ -4323,6 +4253,76 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
 * routing in the openflow pipeline. */
 ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, 50,
   "ip", "flags.loopback = 1; ct_dnat;");
+
+/* A set to hold all ips that need defragmentation and tracking. */
+struct sset all_ips = SSET_INITIALIZER(&all_ips);
+
+for (int i = 0; i < od->nbr->n_load_balancer; i++) {
+struct nbrec_load_balancer *lb = od->nbr->load_balancer[i];
+struct smap *vips = &lb->vips;
+struct smap_node *node;
+
+SMAP_FOR_EACH (node, vips) {
+uint16_t port = 0;
+
+/* node->key contains IP:port or just IP. */
+char *ip_address = NULL;
+ip_address_and_port_from_lb_key(node->key, &ip_address, &port);
+ 

Re: [ovs-dev] [PATCH v3 7/8] actions: Make "next" action able to jump from egress to ingress pipeline.

2017-01-21 Thread Mickey Spiegel
On Sat, Jan 21, 2017 at 12:32 PM, Ben Pfaff <b...@ovn.org> wrote:

> On Sat, Jan 21, 2017 at 12:18:59PM -0800, Mickey Spiegel wrote:
> > On Sat, Jan 21, 2017 at 11:13 AM, Ben Pfaff <b...@ovn.org> wrote:
> >
> > > This feature is useful for centralized gateways.
> > >
> > > Signed-off-by: Ben Pfaff <b...@ovn.org>
> > > Acked-by: Mickey Spiegel <mickeys@gmail.com>
> > >
> >
> > The ovn-trace.c changes look good to me. No more comments.
> >
> > Thank you very much for this patch set!
>
> Thanks a lot for all the reviews.  I applied these to master.
>
> Will you rebase the remainder of your patch series and adapt it to use
> this new mechanism?  Then we can get it into master, and I'll backport
> it and all of the dependencies to branch-2.7 (which should be easy).
>

Yes, starting now. Thanks again.

Mickey
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3 7/8] actions: Make "next" action able to jump from egress to ingress pipeline.

2017-01-21 Thread Mickey Spiegel
On Sat, Jan 21, 2017 at 11:13 AM, Ben Pfaff <b...@ovn.org> wrote:

> This feature is useful for centralized gateways.
>
> Signed-off-by: Ben Pfaff <b...@ovn.org>
> Acked-by: Mickey Spiegel <mickeys@gmail.com>
>

The ovn-trace.c changes look good to me. No more comments.

Thank you very much for this patch set!

Mickey

---
>  include/ovn/actions.h | 63 --
>  ovn/controller/lflow.c|  7 +++--
>  ovn/lib/actions.c | 70 ++++---
>  ovn/ovn-sb.xml| 12 ++--
>  ovn/utilities/ovn-trace.c | 35 ++--
>  tests/ovn.at  | 22 ++-
>  tests/test-ovn.c  |  6 ++--
>  7 files changed, 167 insertions(+), 48 deletions(-)
>
> diff --git a/include/ovn/actions.h b/include/ovn/actions.h
> index f392d03..6691116 100644
> --- a/include/ovn/actions.h
> +++ b/include/ovn/actions.h
> @@ -151,13 +151,15 @@ struct ovnact_next {
>  struct ovnact ovnact;
>
>  /* Arguments. */
> -uint8_t ltable; /* Logical table ID of next table. */
> +uint8_t ltable;/* Logical table ID of next table. */
> +enum ovnact_pipeline pipeline; /* Pipeline of next table. */
>
>  /* Information about the flow that the action is in.  This does not affect
>   * behavior, since the implementation of "next" doesn't depend on the
>   * source table or pipeline.  It does affect how ovnacts_format() prints
>   * the action. */
> -uint8_t src_ltable;/* Logical table ID of source table. */
> +uint8_t src_ltable;/* Logical table ID of source table. */
> +enum ovnact_pipeline src_pipeline; /* Pipeline of source table. */
>  };
>  };
>
>  /* OVNACT_LOAD. */
> @@ -402,22 +404,26 @@ struct ovnact_parse_params {
>  /* hmap of 'struct dhcp_opts_map'  to support 'put_dhcpv6_opts' action */
>  const struct hmap *dhcpv6_opts;
>
> -/* OVN maps each logical flow table (ltable), one-to-one, onto a physical
> - * OpenFlow flow table (ptable).  A number of parameters describe this
> - * mapping and data related to flow tables:
> +/* Each OVN flow exists in a logical table within a logical pipeline.
> + * These parameters express this context for a set of OVN actions being
> + * parsed:
>   *
> - * - 'first_ptable' and 'n_tables' define the range of OpenFlow tables
> - *to which the logical "next" action should be able to jump.
> - *Logical table 0 maps to OpenFlow table 'first_ptable', logical
> - *table 1 to 'first_ptable + 1', and so on.  If 'n_tables' is 0
> - *then "next" is disallowed entirely.
> + * - 'n_tables' is the number of tables in the logical ingress and
> + *egress pipelines, that is, "next" may specify a table less than
> + *or equal to 'n_tables'.  If 'n_tables' is 0 then "next" is
> + *disallowed entirely.
>   *
> - * - 'cur_ltable' is an offset from 'first_ptable' (e.g. 0 <=
> - *   cur_ltable < n_tables) of the logical flow that contains the
> - *   actions.  If cur_ltable + 1 < n_tables, then this defines the
> - *   default table that "next" will jump to. */
> -uint8_t n_tables;   /* Number of flow tables. */
> -uint8_t cur_ltable; /* 0 <= cur_ltable < n_tables. */
> + * - 'cur_ltable' is the logical table of the current flow, within
> + *   'pipeline'.  If cur_ltable + 1 < n_tables, then this defines the
> + *   default table that "next" will jump to.
> + *
> + * - 'pipeline' is the logical pipeline.  It is the default pipeline to
> + *   which 'next' will jump.  If 'pipeline' is OVNACT_P_EGRESS, then
> + *   'next' will also be able to jump into the ingress pipeline, but
> + *   the reverse is not true. */
> +enum ovnact_pipeline pipeline; /* Logical pipeline. */
> +uint8_t n_tables;  /* Number of logical flow tables. */
> +uint8_t cur_ltable;/* 0 <= cur_ltable < n_tables. */
>  };
>
>  bool ovnacts_parse(struct lexer *, const struct ovnact_parse_params *,
> @@ -448,20 +454,23 @@ struct ovnact_encode_params {
>   * OpenFlow flow table (ptable).  A number of parameters describe this
>   * mapping and data related to flow tables:
>   *
> - * - 'first_ptable' and 'n_tables' define the range of OpenFlow
> tables
> - *to which the logical "next" a

Re: [ovs-dev] [PATCH v3 1/8] ovn-trace: Fix selection of table that "next" jumps to.

2017-01-21 Thread Mickey Spiegel
On Sat, Jan 21, 2017 at 11:13 AM, Ben Pfaff <b...@ovn.org> wrote:

> The common case is that "next" advances to the next table, but it can
> jump to any table.
>
> Reported-by: Mickey Spiegel <mickeys@gmail.com>
> Signed-off-by: Ben Pfaff <b...@ovn.org>
>

Acked-by: Mickey Spiegel <mickeys@gmail.com>


> ---
>  ovn/utilities/ovn-trace.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/ovn/utilities/ovn-trace.c b/ovn/utilities/ovn-trace.c
> index 9487b1f..c15ea0b 100644
> --- a/ovn/utilities/ovn-trace.c
> +++ b/ovn/utilities/ovn-trace.c
> @@ -1,5 +1,5 @@
>  /*
> - * Copyright (c) 2016 Nicira, Inc.
> + * Copyright (c) 2016, 2017 Nicira, Inc.
>   *
>   * Licensed under the Apache License, Version 2.0 (the "License");
>   * you may not use this file except in compliance with the License.
> @@ -1376,7 +1376,7 @@ trace_actions(const struct ovnact *ovnacts, size_t ovnacts_len,
>  break;
>
>  case OVNACT_NEXT:
> -trace__(dp, uflow, table_id + 1, pipeline, super);
> +trace__(dp, uflow, ovnact_get_NEXT(a)->ltable, pipeline, super);
>  break;
>
>  case OVNACT_LOAD:
> --
> 2.10.2
>
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 00/10] Add actions for egress loopback

2017-01-21 Thread Mickey Spiegel
On Sat, Jan 21, 2017 at 11:23 AM, Ben Pfaff <b...@ovn.org> wrote:

> On Fri, Jan 20, 2017 at 04:00:34PM -0800, Mickey Spiegel wrote:
> > On Fri, Jan 20, 2017 at 3:33 PM, Ben Pfaff <b...@ovn.org> wrote:
> >
> > > On Fri, Jan 20, 2017 at 03:17:19PM -0800, Mickey Spiegel wrote:
> > > > On Fri, Jan 20, 2017 at 2:43 PM, Ben Pfaff <b...@ovn.org> wrote:
> > > >
> > > > > On Fri, Jan 20, 2017 at 12:29:49PM -0800, Mickey Spiegel wrote:
> > > > > > I would also need to add in_port to symtab in
> > > > > > ovn/lib/logical-fields.c so that I can clear it.
> > > > >
> > > > > Can you explain why in_port needs to be cleared?
> > > > >
> > > >
> > > > My thought was that the packet should look like it arrived on the
> > > > distributed gateway port.
> > > > One question is whether there can be any bad implications if
> > > > in_port and logical inport do not match?
> > > > It is also possible that the packet will return back to the original
> > > > port after SNAT and DNAT on the distributed gateway port, so
> > > > if I do not clear in_port then I would have to set flags.loopback.
> > >
> > > Once a packet has been translated from physical to logical, the only
> > > real use of in_port is to discard packets that would loop back on the
> > > physical input port.  If we want to disable that, one way to do it is to
> > > set flags.loopback to 1, although that also disables discarding packets
> > > that would loop back to the logical input port.
> > >
> >
> > I was thinking that egress loopback should be similar to traversing a
> > logical patch port. When traversing a logical patch port, in_port is
> > cleared.
> >
> > If you have a problem with adding in_port to symtab, then I can just
> > set flags.loopback to 1. I would rather have the loopback check, but
> > at the moment add_route in ovn-northd.c sets flags.loopback to 1 in
> > all cases, so there is no difference on a logical router. Otherwise, I
> > will probably throw in a small patch to add in_port to symtab.
>
> I don't want to add in_port to symtab because it breaks layering.  That
> is, in_port is a physical port, not a logical port, and the logical
> flows don't have a way to understand its value.  I guess that they can
> reasonably set it to "0", but that's a pretty limited use.
>
> I wouldn't want to name it in_port in any case because that would just
> cause confusion.
>
> If setting flags.loopback to 1 is a reasonable solution, let's use that.
>

I will go with flags.loopback = 1.  As I mentioned earlier, at the moment
flags.loopback is always set to 1 on logical router datapaths, in the
IP routing pipeline stage.  I don't think that is the right thing to do,
but I am not sure what will break if that is removed.  In any case, IP
routing is always protected by ttl, so even if there is a loop the packet
will eventually be dropped.

>
> Stepping back a bit, I am not certain that it ever makes sense to
> discard packets in OVN because the physical ingress and egress ports
> match.  It might be reasonable to always set the physical ingress port
> to 0 as part of physical to logical translation at the beginning of the
> pipeline.
>

The check does seem to be somewhat redundant with the logical port
loopback check in table 34.  The only way to pass the check in table 34
is if flags.loopback is set, in which case in_port will be cleared in table
64.
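
To make the redundancy concrete, here is a rough model of the two checks as
I understand them.  This is only a sketch: the struct and function names are
invented for illustration and are not ovn-controller code, and the table
numbers in the comments are the ones mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-packet metadata; not an actual OVN structure. */
struct toy_pkt_md {
    uint32_t in_port;        /* Physical (OpenFlow) ingress port. */
    uint32_t logical_inport; /* Logical ingress port key. */
    bool flags_loopback;     /* Models flags.loopback. */
};

/* Models the logical check (table 34 above): a packet headed back to its
 * logical ingress port passes only if flags.loopback is set. */
bool
toy_logical_check_passes(const struct toy_pkt_md *md, uint32_t logical_outport)
{
    assert(md);
    return md->flags_loopback || md->logical_inport != logical_outport;
}

/* Models table 64 above: in_port is cleared when flags.loopback is set. */
void
toy_prepare_physical_output(struct toy_pkt_md *md)
{
    assert(md);
    if (md->flags_loopback) {
        md->in_port = 0;
    }
}

/* Models the physical check: output to the ingress port is suppressed.
 * Assumes 'out_port' is a real, nonzero port number. */
bool
toy_physical_check_passes(const struct toy_pkt_md *md, uint32_t out_port)
{
    assert(md && out_port != 0);
    return md->in_port != out_port;
}
```

In this model, any packet that passes the logical check while returning to
its logical ingress port has flags.loopback set, so by the time it reaches
the physical check in_port is already 0 and that check cannot drop it.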

Mickey
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 6/7] actions: Make "next" action able to jump from egress to ingress pipeline.

2017-01-20 Thread Mickey Spiegel
On Fri, Jan 20, 2017 at 2:48 PM, Ben Pfaff <b...@ovn.org> wrote:

> This feature is useful for centralized gateways.
>
> Signed-off-by: Ben Pfaff <b...@ovn.org>
>

Acked-by: Mickey Spiegel <mickeys@gmail.com>

I think there is some missing functionality in ovn-trace.c.
It looks to me like ovn-trace.c assumes that "next" actions
always go to the next table, i.e. it ignores "(3)" or
"(pipeline=ingress, table=3)".  For my particular usage,
egress loopback will only happen after NAT, so the trace
will never reach the "next(pipeline=ingress, table=0)"
action.

Mickey


> ---
>  include/ovn/actions.h | 63 --
>  ovn/controller/lflow.c|  7 +++--
>  ovn/lib/actions.c | 70 ++
> ++---
>  ovn/ovn-sb.xml| 12 ++--
>  ovn/utilities/ovn-trace.c |  3 ++
>  tests/ovn.at  | 22 ++-
>  tests/test-ovn.c  |  6 ++--
>  7 files changed, 138 insertions(+), 45 deletions(-)


Re: [ovs-dev] [PATCH v2 4/7] actions: Omit table number when possible for formatting "next" action.

2017-01-20 Thread Mickey Spiegel
On Fri, Jan 20, 2017 at 2:48 PM, Ben Pfaff <b...@ovn.org> wrote:

> Until now, formatting the "next" action has always required including
> the table number, because the action struct didn't include enough context
> so that the formatter could decide whether the table number was the next
> table or some other table.  This is more or less OK, but an upcoming commit
> will add a "pipeline" field to the "next" action, which means that the same
> policy there would require that the pipeline always be printed.  That's a
> little obnoxious because 99+% of the time, the pipeline to be printed is
> the same pipeline that the flow is in and printing it would be distracting.
> So it's better to store some context to help with formatting.  This commit
> begins adopting that policy for the existing table number field.
>
> Signed-off-by: Ben Pfaff <b...@ovn.org>
>

Acked-by: Mickey Spiegel <mickeys@gmail.com>

One comment inline.


> ---
>  include/ovn/actions.h |  8 
>  ovn/lib/actions.c | 43 +--
>  tests/ovn.at  |  8 
>  3 files changed, 33 insertions(+), 26 deletions(-)
>
> diff --git a/include/ovn/actions.h b/include/ovn/actions.h
> index 92c..38764fe 100644
> --- a/include/ovn/actions.h
> +++ b/include/ovn/actions.h
> @@ -143,7 +143,15 @@ struct ovnact_null {
>  /* OVNACT_NEXT. */
>  struct ovnact_next {
>  struct ovnact ovnact;
> +
> +/* Arguments. */
>  uint8_t ltable; /* Logical table ID of next table. */
> +
> +/* Information about the flow that the action is in.  This does not affect
> + * behavior, since the implementation of "next" doesn't depend on the
> + * source table or pipeline.  It does affect how ovnacts_format() prints
> + * the action. */
> +uint8_t src_ltable;/* Logical table ID of source table. */
>  };
>
>  /* OVNACT_LOAD. */
> diff --git a/ovn/lib/actions.c b/ovn/lib/actions.c
> index 2162dad..bcc690f 100644
> --- a/ovn/lib/actions.c
> +++ b/ovn/lib/actions.c
> @@ -259,36 +259,35 @@ parse_NEXT(struct action_context *ctx)
>  {
>  if (!ctx->pp->n_tables) {
>  lexer_error(ctx->lexer, "\"next\" action not allowed here.");
> -} else if (lexer_match(ctx->lexer, LEX_T_LPAREN)) {
> -int ltable;
> -
> -if (!lexer_force_int(ctx->lexer, &ltable) ||
> -!lexer_force_match(ctx->lexer, LEX_T_RPAREN)) {
> -return;
> -}
> +return;
> +}
>
> -if (ltable >= ctx->pp->n_tables) {
> -lexer_error(ctx->lexer,
> -"\"next\" argument must be in range 0 to %d.",
> - ctx->pp->n_tables - 1);
> -return;
> -}
> +int table = ctx->pp->cur_ltable + 1;
> +if (lexer_match(ctx->lexer, LEX_T_LPAREN)
> +&& (!lexer_force_int(ctx->lexer, &table) ||
> +!lexer_force_match(ctx->lexer, LEX_T_RPAREN))) {
> +return;
> +}
>
> -ovnact_put_NEXT(ctx->ovnacts)->ltable = ltable;
> -} else {
> -if (ctx->pp->cur_ltable < ctx->pp->n_tables) {
> -ovnact_put_NEXT(ctx->ovnacts)->ltable = ctx->pp->cur_ltable + 1;
> -} else {
> -lexer_error(ctx->lexer,
> -"\"next\" action not allowed in last table.");
> -}
> +if (table >= ctx->pp->n_tables) {
> +lexer_error(ctx->lexer,
> +"\"next\" action cannot advance beyond table %d.",
> +ctx->pp->n_tables - 1);
>

Should there be a "return;" here?
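
To illustrate the concern with a stripped-down model (the types and names
below are made up for this sketch and are not the ones in
ovn/lib/actions.c): without the early return, the error is reported but the
action is still appended.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the parser state; not the OVN types. */
struct toy_ctx {
    int n_tables;        /* Number of tables in the pipeline. */
    const char *error;   /* Last error reported, or NULL. */
    int n_actions;       /* Number of "next" actions emitted so far. */
    int last_ltable;     /* Target table of the last emitted action. */
};

/* Models the tail of parse_NEXT().  'table' is the requested target table.
 * 'return_on_error' selects between the code as posted (false) and the
 * variant with a "return;" after the error (true). */
void
toy_parse_next(struct toy_ctx *ctx, int table, bool return_on_error)
{
    assert(ctx);
    if (table >= ctx->n_tables) {
        ctx->error = "\"next\" action cannot advance beyond last table.";
        if (return_on_error) {
            return;
        }
    }

    /* Reached even on error when there is no early return. */
    ctx->n_actions++;
    ctx->last_ltable = table;
}
```

Whether the stray action matters in practice depends on what the caller does
once the lexer is in an error state, which I have not checked.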

Mickey


>  }
> +
> +struct ovnact_next *next = ovnact_put_NEXT(ctx->ovnacts);
> +next->ltable = table;
> +next->src_ltable = ctx->pp->cur_ltable;
>  }
>
>  static void
>  format_NEXT(const struct ovnact_next *next, struct ds *s)
>  {
> -ds_put_format(s, "next(%d);", next->ltable);
> +if (next->ltable != next->src_ltable + 1) {
> +ds_put_format(s, "next(%d);", next->ltable);
> +} else {
> +ds_put_cstr(s, "next;");
> +}
>  }
>
>  static void
> diff --git a/tests/ovn.at b/tests/ovn.at
> index 67d73c5..f71a4af 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -643,9 +643,9 @@ output;
>
>  # next
>  next;
> -formats as next(11);
>  encodes as resubmit(,27)
>  next(11);
> +formats as next;
>  e

Re: [ovs-dev] [PATCH 00/10] Add actions for egress loopback

2017-01-20 Thread Mickey Spiegel
On Fri, Jan 20, 2017 at 3:33 PM, Ben Pfaff <b...@ovn.org> wrote:

> On Fri, Jan 20, 2017 at 03:17:19PM -0800, Mickey Spiegel wrote:
> > On Fri, Jan 20, 2017 at 2:43 PM, Ben Pfaff <b...@ovn.org> wrote:
> >
> > > On Fri, Jan 20, 2017 at 12:29:49PM -0800, Mickey Spiegel wrote:
> > > > On Fri, Jan 20, 2017 at 9:16 AM, Ben Pfaff <b...@ovn.org> wrote:
> > > >
> > > > > I believe that, with these patches, egress loopback as proposed by
> > > > > Mickey's patches can be implemented with:
> > > > > clone { inport = outport; outport = ""; flags.loopback = 0;
> > > > > reg0 = 0; reg1 = 0; ... regN = 0;
> > > > > next(pipeline=ingress, table=0); }
> > > > >
> > > >
> > > > My main concern is maintainability as new flags or registers are added.
> > > > Having one line of code buried deep inside ovn/northd/ovn-northd.c that
> > > > needs to be updated whenever a flag or register is added worries me.
> > > > Does it make sense to add "clear_regs" and "clear_flags" actions in
> > > > order to address that concern?
> > >
> > > > I would also need to add in_port to symtab in
> > > > ovn/lib/logical-fields.c so that I can clear it.
> > >
> > > Can you explain why in_port needs to be cleared?
> > >
> >
> > My thought was that the packet should look like it arrived on the
> > distributed gateway port.
> > One question is whether there can be any bad implications if
> > in_port and logical inport do not match?
> > It is also possible that the packet will return back to the original
> > port after SNAT and DNAT on the distributed gateway port, so
> > if I do not clear in_port then I would have to set flags.loopback.
>
> Once a packet has been translated from physical to logical, the only
> real use of in_port is to discard packets that would loop back on the
> physical input port.  If we want to disable that, one way to do it is to
> set flags.loopback to 1, although that also disables discarding packets
> that would loop back to the logical input port.
>

I was thinking that egress loopback should be similar to traversing a
logical patch port. When traversing a logical patch port, in_port is
cleared.

If you have a problem with adding in_port to symtab, then I can just
set flags.loopback to 1. I would rather have the loopback check, but
at the moment add_route in ovn-northd.c sets flags.loopback to 1 in
all cases, so there is no difference on a logical router. Otherwise, I
will probably throw in a small patch to add in_port to symtab.

Mickey


Re: [ovs-dev] [PATCH 00/10] Add actions for egress loopback

2017-01-20 Thread Mickey Spiegel
On Fri, Jan 20, 2017 at 9:16 AM, Ben Pfaff <b...@ovn.org> wrote:

> I believe that, with these patches, egress loopback as proposed by Mickey's
> patches can be implemented with:
> clone { inport = outport; outport = ""; flags.loopback = 0;
> reg0 = 0; reg1 = 0; ... regN = 0;
> next(pipeline=ingress, table=0); }
>

My main concern is maintainability as new flags or registers are added.
Having one line of code buried deep inside ovn/northd/ovn-northd.c that
needs to be updated whenever a flag or register is added worries me.
Does it make sense to add "clear_regs" and "clear_flags" actions in
order to address that concern?
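
Short of new actions, another way to limit the maintenance burden would be
to generate the register-clearing part of the action string from a register
count instead of spelling it out.  The following is a sketch only:
TOY_N_REGS and toy_format_clear_regs() are hypothetical names, and the count
of 10 is an assumption, not something taken from the tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define TOY_N_REGS 10   /* Assumed number of logical registers. */

/* Writes "reg0 = 0; reg1 = 0; ... " for every register into 'buf'.
 * Returns the number of characters written (excluding the terminating
 * null), or -1 if 'buf' is too small. */
int
toy_format_clear_regs(char *buf, size_t size)
{
    size_t used = 0;

    assert(buf && size > 0);
    buf[0] = '\0';
    for (int i = 0; i < TOY_N_REGS; i++) {
        int n = snprintf(buf + used, size - used, "reg%d = 0; ", i);
        if (n < 0 || (size_t) n >= size - used) {
            return -1;   /* Output was truncated or failed. */
        }
        used += (size_t) n;
    }
    return (int) used;
}
```

With something like this, adding a register only requires bumping the
count.  It does not help with flags, which is part of why dedicated
"clear_regs" and "clear_flags" actions still seem attractive to me.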

I would also need to add in_port to symtab in ovn/lib/logical-fields.c so
that I can clear it.


> Ben Pfaff (10):
>   actions: Fix "arp" and "nd_na" followed by another action.
>   lex: Make lexer_force_match() work for LEX_T_END.
>   actions: Make "arp { drop; };" acceptable.
>   actions: Make "free" functions per-struct, not per-action.
>   actions: Add new OVN action "clone".
>   actions: Separate action structures for "next" and "ct_next".
>   actions: Omit table number when possible for formatting "next" action.
>   actions: Introduce enum ovnact_pipeline.
>   actions: Make "next" action able to jump from egress to ingress pipeline.
>   actions: Add new "ct_clear" action.
>

For patches 1 through 4, 6, and 8:
Acked-by: Mickey Spiegel <mickeys@gmail.com>

I commented separately on patches 5 and 7.

I could not apply patches 9 and 10 since I manually fixed patch 7
and the indexes did not match.

Mickey


>  include/ovn/actions.h |  91 ++-
>  include/ovn/lex.h |   4 +-
>  ovn/controller/lflow.c|   7 +-
>  ovn/lib/actions.c | 284 +++---
> 
>  ovn/lib/lex.c |  13 ++-
>  ovn/ovn-sb.xml|  26 -
>  ovn/utilities/ovn-trace.c |  65 +++
>  tests/ovn.at  |  45 +++-
>  tests/test-ovn.c  |   6 +-
>  9 files changed, 352 insertions(+), 189 deletions(-)
>
> --
> 2.10.2
>


Re: [ovs-dev] [PATCH 07/10] actions: Omit table number when possible for formatting "next" action.

2017-01-20 Thread Mickey Spiegel
On Fri, Jan 20, 2017 at 9:16 AM, Ben Pfaff <b...@ovn.org> wrote:

> Until now, formatting the "next" action has always required including
> the table number, because the action struct didn't include enough context
> so that the formatter could decide whether the table number was the next
> table or some other table.  This is more or less OK, but an upcoming commit
> will add a "pipeline" field to the "next" action, which means that the same
> policy there would require that the pipeline always be printed.  That's a
> little obnoxious because 99+% of the time, the pipeline to be printed is
> the same pipeline that the flow is in and printing it would be distracting.
> So it's better to store some context to help with formatting.  This commit
> begins adopting that policy for the existing table number field.
>
> Signed-off-by: Ben Pfaff <b...@ovn.org>
> ---
>  include/ovn/actions.h |  8 
>  ovn/lib/actions.c | 29 +++--
>  tests/ovn.at  |  8 
>  3 files changed, 31 insertions(+), 14 deletions(-)
>
> diff --git a/include/ovn/actions.h b/include/ovn/actions.h
> index 92c..38764fe 100644
> --- a/include/ovn/actions.h
> +++ b/include/ovn/actions.h
> @@ -143,7 +143,15 @@ struct ovnact_null {
>  /* OVNACT_NEXT. */
>  struct ovnact_next {
>  struct ovnact ovnact;
> +
> +/* Arguments. */
>  uint8_t ltable; /* Logical table ID of next table. */
> +
> +/* Information about the flow that the action is in.  This does not affect
> + * behavior, since the implementation of "next" doesn't depend on the
> + * source table or pipeline.  It does affect how ovnacts_format() prints
> + * the action. */
> +uint8_t src_ltable;/* Logical table ID of source table. */
>  };
>
>  /* OVNACT_LOAD. */
> diff --git a/ovn/lib/actions.c b/ovn/lib/actions.c
> index 2162dad..5548234 100644
> --- a/ovn/lib/actions.c
> +++ b/ovn/lib/actions.c
> @@ -259,7 +259,11 @@ parse_NEXT(struct action_context *ctx)
>  {
>  if (!ctx->pp->n_tables) {
>  lexer_error(ctx->lexer, "\"next\" action not allowed here.");
> -} else if (lexer_match(ctx->lexer, LEX_T_LPAREN)) {
> +return;
> +}
> +
> +int table = ctx->pp->cur_ltable + 1;
> +if (lexer_match(ctx->lexer, LEX_T_LPAREN)) {
>  int ltable;
>
>  if (!lexer_force_int(ctx->lexer, &ltable) ||
> @@ -273,22 +277,27 @@ parse_NEXT(struct action_context *ctx)
>   ctx->pp->n_tables - 1);
>  return;
>  }
>

You never set "table = ltable;", which belongs here. As a result,
"next(11);" always encodes as "next;", i.e. the number is ignored.

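To show the effect in isolation, here is a small model of the table
selection.  It is only a sketch with invented function names.  The base
OpenFlow table of 16 and the current logical table of 10 are inferred from
the expectations in tests/ovn.at ("next(0);" encodes as resubmit(,16) and
"next;" encodes as resubmit(,27)), not read from the code.

```c
#include <assert.h>
#include <stdbool.h>

/* Models the patch as posted: the parsed argument never reaches 'table'. */
int
toy_next_target_as_posted(int cur_ltable, bool has_arg, int ltable)
{
    int table = cur_ltable + 1;
    (void) has_arg;
    (void) ltable;      /* Parsed and range checked, but then dropped. */
    return table;
}

/* Models the suggested fix: an explicit argument overrides the default. */
int
toy_next_target_fixed(int cur_ltable, bool has_arg, int ltable)
{
    int table = cur_ltable + 1;
    if (has_arg) {
        table = ltable;
    }
    return table;
}

/* Maps a logical table to the OpenFlow table used by resubmit. */
int
toy_encode_resubmit(int first_ptable, int ltable)
{
    assert(first_ptable >= 0 && ltable >= 0);
    return first_ptable + ltable;
}
```

With cur_ltable 10, the as-posted variant maps "next(0);" to resubmit(,27)
rather than resubmit(,16), which is how the problem shows up in the test
suite.
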

> +}
>
> -ovnact_put_NEXT(ctx->ovnacts)->ltable = ltable;
> -} else {
> -if (ctx->pp->cur_ltable < ctx->pp->n_tables) {
> -ovnact_put_NEXT(ctx->ovnacts)->ltable = ctx->pp->cur_ltable + 1;
> -} else {
> -lexer_error(ctx->lexer,
> -"\"next\" action not allowed in last table.");
> -}
> +if (table >= ctx->pp->n_tables) {
> +lexer_error(ctx->lexer,
> +"\"next\" action cannot advance beyond table %d.",
> +ctx->pp->n_tables - 1);
>  }
> +
> +struct ovnact_next *next = ovnact_put_NEXT(ctx->ovnacts);
> +next->ltable = table;
> +next->src_ltable = ctx->pp->cur_ltable;
>  }
>
>  static void
>  format_NEXT(const struct ovnact_next *next, struct ds *s)
>  {
> -ds_put_format(s, "next(%d);", next->ltable);
> +if (next->ltable != next->src_ltable + 1) {
> +ds_put_format(s, "next(%d);", next->ltable);
> +} else {
> +ds_put_cstr(s, "next;");
> +}
>  }
>
>  static void
> diff --git a/tests/ovn.at b/tests/ovn.at
> index 67d73c5..f71a4af 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -643,9 +643,9 @@ output;
>
>  # next
>  next;
> -formats as next(11);
>  encodes as resubmit(,27)
>  next(11);
> +formats as next;
>  encodes as resubmit(,27)
>  next(0);
>  encodes as resubmit(,16)
> @@ -657,7 +657,7 @@ next();
>  next(10;
>  Syntax error at `;' expecting `)'.
>  next(16);
> -"next" argument must be in range 0 to 15.
> +"next" action cannot advance beyond table 15.
>

Using this diff, there are two different responses
depending on whether the command is "next;" or
"next(11);".
The former gives the "cannot advance beyond" message.
The latter gives the "in range 0 to 15" message.
Unless the error messages are changed, this change
should be reverted.

Mickey


>  # Loading a constant value.
>  tcp.dst=80;
> @@ -678,7 +678,7 @@ ip.ttl=4;
>  encodes as set_field:4->nw_ttl
>  has prereqs eth.type == 0x800 || eth.type == 0x86dd
>  outport="eth0"; next; outport="LOCAL"; next;
> -formats as outport = "eth0"; next(11); outport = "LOCAL"; next(11);
> +formats as outport = "eth0"; next; outport = "LOCAL"; next;
>  encodes as set_field:0x5->reg15,resubmit(
> 

Re: [ovs-dev] [PATCH 05/10] actions: Add new OVN action "clone".

2017-01-20 Thread Mickey Spiegel
On Fri, Jan 20, 2017 at 9:16 AM, Ben Pfaff <b...@ovn.org> wrote:

> Signed-off-by: Ben Pfaff <b...@ovn.org>
>

Acked-by: Mickey Spiegel <mickeys@gmail.com>

One comment below, found a copy/paste error in ovn-sb.xml.

> ---
>  include/ovn/actions.h |  5 ++--
>  ovn/lib/actions.c | 61 ++
> ++---
>  ovn/ovn-sb.xml| 10 
>  ovn/utilities/ovn-trace.c | 21 +++-
>  tests/ovn.at  |  5 
>  5 files changed, 85 insertions(+), 17 deletions(-)
>




>
> diff --git a/ovn/ovn-sb.xml b/ovn/ovn-sb.xml
> index 5704f41..bca82e0 100644
> --- a/ovn/ovn-sb.xml
> +++ b/ovn/ovn-sb.xml
> @@ -1137,6 +1137,16 @@
>
>  
>
> +
> +clone { action; ...
> };
> +
> +  Makes a copy of the packet being processed and executes each
> +  action on the copy.  Actions following the
> +  arp action, if any, apply to the original, unmodified
>

s/arp/clone

Mickey

> +  packet.  This can be used as a way to ``save and restore'' the packet
> +  around a set of actions that may modify it and should not persist.
> +
> +
>  arp { action; ... };
>  
>
>




[ovs-dev] [PATCH v10 8/8] ovn: ovn-nbctl commands for distributed NAT

2017-01-17 Thread Mickey Spiegel
This patch adds the new optional arguments "logical_port" and
"external_mac" to lr-nat-add, and displays that information in
lr-nat-list.

Signed-off-by: Mickey Spiegel <mickeys@gmail.com>
---
 ovn/utilities/ovn-nbctl.8.xml | 27 +++---
 ovn/utilities/ovn-nbctl.c | 54 +--
 tests/ovn-nbctl.at| 47 +
 tests/system-ovn.at   | 30 +---
 4 files changed, 119 insertions(+), 39 deletions(-)

diff --git a/ovn/utilities/ovn-nbctl.8.xml b/ovn/utilities/ovn-nbctl.8.xml
index 4911c6a..c408484 100644
--- a/ovn/utilities/ovn-nbctl.8.xml
+++ b/ovn/utilities/ovn-nbctl.8.xml
@@ -444,7 +444,7 @@
 NAT Commands
 
 
-  [--may-exist] lr-nat-add router 
type external_ip logical_ip
+  [--may-exist] lr-nat-add router 
type external_ip logical_ip 
[logical_port external_mac]
   
 
   Adds the specified NAT to router.
@@ -453,6 +453,13 @@
   The external_ip is an IPv4 address.
   The logical_ip is an IPv4 network (e.g 192.168.1.0/24)
   or an IPv4 address.
+  The logical_port and external_mac are only
+  accepted when router is a distributed router (rather
+  than a gateway router) and type is
+  dnat_and_snat.
+  The logical_port is the name of an existing logical
+  switch port where the logical_ip resides.
+  The external_mac is an Ethernet address.
 
 
   When type is dnat, the externally
@@ -475,8 +482,22 @@
   the IP address in external_ip.
 
 
-  It is an error if a NAT already exists,
-  unless --may-exist is specified.
+  When the logical_port and external_mac
+  are specified, the NAT rule will be programmed on the chassis
+  where the logical_port resides.  This includes
+  ARP replies for the external_ip, which return the
+  value of external_mac.  All packets transmitted
+  with source IP address equal to external_ip will
+  be sent using the external_mac.
+
+
+  It is an error if a NAT already exists with the same values
+  of router, type, external_ip,
+  and logical_ip, unless --may-exist is
+  specified.  When --may-exist,
+  logical_port, and external_mac are all
+  specified, the existing values of logical_port and
+  external_mac are overwritten.
 
   
 
diff --git a/ovn/utilities/ovn-nbctl.c b/ovn/utilities/ovn-nbctl.c
index 4397daf..661f7de 100644
--- a/ovn/utilities/ovn-nbctl.c
+++ b/ovn/utilities/ovn-nbctl.c
@@ -384,7 +384,7 @@ Route commands:\n\
   lr-route-list ROUTER  print routes for ROUTER\n\
 \n\
 NAT commands:\n\
-  lr-nat-add ROUTER TYPE EXTERNAL_IP LOGICAL_IP\n\
+  lr-nat-add ROUTER TYPE EXTERNAL_IP LOGICAL_IP [LOGICAL_PORT EXTERNAL_MAC]\n\
 add a NAT to ROUTER\n\
   lr-nat-del ROUTER [TYPE [IP]]\n\
 remove NATs from ROUTER\n\
@@ -2233,6 +2233,30 @@ nbctl_lr_nat_add(struct ctl_context *ctx)
 new_logical_ip = normalize_ipv4_prefix(ipv4, plen);
 }
 
+const char *logical_port;
+const char *external_mac;
+if (ctx->argc == 6) {
+ctl_fatal("lr-nat-add with logical_port "
+  "must also specify external_mac.");
+} else if (ctx->argc == 7) {
+if (strcmp(nat_type, "dnat_and_snat")) {
+ctl_fatal("logical_port and external_mac are only valid when "
+  "type is \"dnat_and_snat\".");
+}
+
+logical_port = ctx->argv[5];
+lsp_by_name_or_uuid(ctx, logical_port, true);
+
+external_mac = ctx->argv[6];
+struct eth_addr ea;
+if (!eth_addr_from_string(external_mac, &ea)) {
+ctl_fatal("invalid mac address %s.", external_mac);
+}
+} else {
+logical_port = NULL;
+external_mac = NULL;
+}
+
 bool may_exist = shash_find(&ctx->options, "--may-exist") != NULL;
 int is_snat = !strcmp("snat", nat_type);
 for (size_t i = 0; i < lr->n_nat; i++) {
@@ -2243,6 +2267,10 @@ nbctl_lr_nat_add(struct ctl_context *ctx)
 if (!strcmp(is_snat ? external_ip : new_logical_ip,
 is_snat ? nat->external_ip : nat->logical_ip)) {
 if (may_exist) {
+nbrec_nat_verify_logical_port(nat);
+nbrec_nat_verify_external_mac(nat);
+nbrec_nat_set_logical_port(nat, logical_port);
+nbrec_nat_set_external_mac(nat, external_mac);
 free(new_logical_ip);
 return;
 }
@@ -2265,

[ovs-dev] [PATCH v10 6/8] ovn: avoid snat recirc only on gateway routers

2017-01-17 Thread Mickey Spiegel
Currently, for performance reasons on gateway routers, ct_snat
that does not specify an IP address does not immediately trigger
recirculation.  On gateway routers, ct_snat that does not specify
an IP address happens in the UNSNAT pipeline stage, which is
followed by the DNAT pipeline stage that triggers recirculation
for all packets.  This DNAT pipeline stage recirculation takes
care of the recirculation needs of UNSNAT as well as other cases
such as UNDNAT.

On distributed routers, UNDNAT is handled in the egress pipeline
stage, separately from DNAT in the ingress pipeline stages.  The
DNAT pipeline stage only triggers recirculation for some packets.
Due to this difference in design, UNSNAT needs to trigger its own
recirculation.

This patch restricts the logic that avoids recirculation for
ct_snat, so that it only applies to datapaths representing
gateway routers.
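
As a rough model of the decision this patch changes (illustrative only: the
function below is hypothetical and simply mirrors the condition in
encode_ct_nat() shown in the diff further down):

```c
#include <stdbool.h>

/* Hypothetical model.  Returns true when the encoded ct action suppresses
 * recirculation, false when it recirculates.
 *
 * 'has_ip' is true for ct_snat(IP)/ct_dnat(IP), 'snat' is true for the
 * ct_snat family, and 'is_gateway_router' is true when the flow belongs to
 * a datapath representing a gateway router. */
bool
toy_ct_nat_skips_recirc(bool has_ip, bool snat, bool is_gateway_router)
{
    if (has_ip) {
        return false;   /* An explicit IP commits and recirculates. */
    }
    /* Before this patch the condition was just 'snat'. */
    return snat && is_gateway_router;
}
```

Under this model, ct_snat without an IP address on a distributed router now
recirculates, while the gateway router case is unchanged.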

Signed-off-by: Mickey Spiegel <mickeys@gmail.com>
---
 include/ovn/actions.h  |  3 +++
 ovn/controller/lflow.c | 10 ++
 ovn/lib/actions.c  | 15 +--
 ovn/ovn-sb.xml | 23 +++
 tests/ovn.at   |  2 +-
 5 files changed, 42 insertions(+), 11 deletions(-)

diff --git a/include/ovn/actions.h b/include/ovn/actions.h
index 5f9d203..810b901 100644
--- a/include/ovn/actions.h
+++ b/include/ovn/actions.h
@@ -418,6 +418,9 @@ struct ovnact_encode_params {
 /* 'true' if the flow is for a switch. */
 bool is_switch;
 
+/* 'true' if the flow is for a gateway router. */
+bool is_gateway_router;
+
 /* A map from a port name to its connection tracking zone. */
 const struct simap *ct_zones;
 
diff --git a/ovn/controller/lflow.c b/ovn/controller/lflow.c
index c41368f..b011109 100644
--- a/ovn/controller/lflow.c
+++ b/ovn/controller/lflow.c
@@ -107,6 +107,15 @@ is_switch(const struct sbrec_datapath_binding *ldp)
 
 }
 
+static bool
+is_gateway_router(const struct sbrec_datapath_binding *ldp,
+  const struct hmap *local_datapaths)
+{
+struct local_datapath *ld =
+get_local_datapath(local_datapaths, ldp->tunnel_key);
+return ld ? ld->has_local_l3gateway : false;
+}
+
 /* Adds the logical flows from the Logical_Flow table to flow tables. */
 static void
 add_logical_flows(struct controller_ctx *ctx, const struct lport_index *lports,
@@ -220,6 +229,7 @@ consider_logical_flow(const struct lport_index *lports,
 .lookup_port = lookup_port_cb,
 .aux = ,
 .is_switch = is_switch(ldp),
+.is_gateway_router = is_gateway_router(ldp, local_datapaths),
 .ct_zones = ct_zones,
 .group_table = group_table,
 
diff --git a/ovn/lib/actions.c b/ovn/lib/actions.c
index dda675b..5c57ab7 100644
--- a/ovn/lib/actions.c
+++ b/ovn/lib/actions.c
@@ -803,12 +803,15 @@ encode_ct_nat(const struct ovnact_ct_nat *cn,
 ct = ofpacts->header;
 if (cn->ip) {
 ct->flags |= NX_CT_F_COMMIT;
-} else if (snat) {
-/* XXX: For performance reasons, we try to prevent additional
- * recirculations.  So far, ct_snat which is used in a gateway router
- * does not need a recirculation. ct_snat(IP) does need a
- * recirculation.  Should we consider a method to let the actions
- * specify whether an action needs recirculation if there more use
+} else if (snat && ep->is_gateway_router) {
+/* For performance reasons, we try to prevent additional
+ * recirculations.  ct_snat which is used in a gateway router
+ * does not need a recirculation.  ct_snat(IP) does need a
+ * recirculation.  ct_snat in a distributed router needs
+ * recirculation regardless of whether an IP address is
+ * specified.
+ * XXX Should we consider a method to let the actions specify
+ * whether an action needs recirculation if there are more use
  * cases?. */
 ct->recirc_table = NX_CT_RECIRC_NONE;
 }
diff --git a/ovn/ovn-sb.xml b/ovn/ovn-sb.xml
index a7c29c3..8fe0e2b 100644
--- a/ovn/ovn-sb.xml
+++ b/ovn/ovn-sb.xml
@@ -1122,11 +1122,26 @@
 
   
 ct_snat sends the packet through the SNAT zone to
-unSNAT any packet that was SNATed in the opposite direction.  If
-the packet needs to be sent to the next tables, then it should be
-followed by a next; action.  The next tables will not
-see the changes in the packet caused by the connection tracker.
+unSNAT any packet that was SNATed in the opposite direction.  The
+behavior on gateway routers differs from the behavior on a
+distributed router:
   
+  
+
+  On a gateway router, if the packet needs to be sent to the next
+  tables, then it should be followed by a next;
+  action.  The next tables will not see the changes in the packet
+   

[ovs-dev] [PATCH v10 5/8] ovn: move load balancing flows after NAT flows

2017-01-17 Thread Mickey Spiegel
This will make it easy for distributed NAT to reuse some of the
existing code for NAT flows, while leaving load balancing and defrag
as functionality specific to gateway routers.  There is no intent to
change any functionality in this patch.

Signed-off-by: Mickey Spiegel <mickeys@gmail.com>
Acked-by: Gurucharan Shetty <g...@ovn.org>
---
 ovn/northd/ovn-northd.c | 140 
 1 file changed, 70 insertions(+), 70 deletions(-)

diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index 87c80d1..219a69c 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -4099,76 +4099,6 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap 
*ports,
 const char *lb_force_snat_ip = get_force_snat_ip(od, "lb",
  _ip);
 
-/* A set to hold all ips that need defragmentation and tracking. */
-struct sset all_ips = SSET_INITIALIZER(&all_ips);
-
-for (int i = 0; i < od->nbr->n_load_balancer; i++) {
-struct nbrec_load_balancer *lb = od->nbr->load_balancer[i];
-struct smap *vips = &lb->vips;
-struct smap_node *node;
-
-SMAP_FOR_EACH (node, vips) {
-uint16_t port = 0;
-
-/* node->key contains IP:port or just IP. */
-char *ip_address = NULL;
-ip_address_and_port_from_lb_key(node->key, &ip_address, &port);
-if (!ip_address) {
-continue;
-}
-
-if (!sset_contains(&all_ips, ip_address)) {
-sset_add(&all_ips, ip_address);
-}
-
-/* Higher priority rules are added for load-balancing in DNAT
- * table.  For every match (on a VIP[:port]), we add two flows
- * via add_router_lb_flow().  One flow is for specific matching
- * on ct.new with an action of "ct_lb($targets);".  The other
- * flow is for ct.est with an action of "ct_dnat;". */
-ds_clear();
-ds_put_format(, "ct_lb(%s);", node->value);
-
-ds_clear();
-ds_put_format(, "ip && ip4.dst == %s",
-  ip_address);
-free(ip_address);
-
-if (port) {
-if (lb->protocol && !strcmp(lb->protocol, "udp")) {
-ds_put_format(, " && udp && udp.dst == %d",
-  port);
-} else {
-ds_put_format(, " && tcp && tcp.dst == %d",
-  port);
-}
-add_router_lb_flow(lflows, od, , , 120,
-   lb_force_snat_ip);
-} else {
-add_router_lb_flow(lflows, od, , , 110,
-   lb_force_snat_ip);
-}
-}
-}
-
-/* If there are any load balancing rules, we should send the
- * packet to conntrack for defragmentation and tracking.  This helps
- * with two things.
- *
- * 1. With tracking, we can send only new connections to pick a
- *DNAT ip address from a group.
- * 2. If there are L4 ports in load balancing rules, we need the
- *defragmentation to match on L4 ports. */
-const char *ip_address;
-SSET_FOR_EACH(ip_address, &all_ips) {
-ds_clear();
-ds_put_format(, "ip && ip4.dst == %s", ip_address);
-ovn_lflow_add(lflows, od, S_ROUTER_IN_DEFRAG,
-  100, ds_cstr(), "ct_next;");
-}
-
-sset_destroy(&all_ips);
-
 for (int i = 0; i < od->nbr->n_nat; i++) {
 const struct nbrec_nat *nat;
 
@@ -4323,6 +4253,76 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap 
*ports,
 * routing in the openflow pipeline. */
 ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, 50,
   "ip", "flags.loopback = 1; ct_dnat;");
+
+/* A set to hold all ips that need defragmentation and tracking. */
+struct sset all_ips = SSET_INITIALIZER(&all_ips);
+
+for (int i = 0; i < od->nbr->n_load_balancer; i++) {
+struct nbrec_load_balancer *lb = od->nbr->load_balancer[i];
+struct smap *vips = &lb->vips;
+struct smap_node *node;
+
+SMAP_FOR_EACH (node, vips) {
+uint16_t port = 0;
+
+/* node->key contains IP:port or just IP. */
+char *ip_address = NULL;
+ip_address_and_port_from_lb_key(node->key, &ip_address, &port);
+ 

[ovs-dev] [PATCH v10 4/8] ovn: add egress_loopback action

2017-01-17 Thread Mickey Spiegel
This patch adds an action that loops a clone of the packet back to the
beginning of the ingress pipeline with logical inport equal to the value
of the current logical outport.  The following actions are executed on
the clone:

clears the connection tracking state
in_port = 0
inport = outport
outport = 0
flags = 0
reg0 ... reg9 = 0
nested actions from inside "{ ... }"
for example "reg9[1] = 1" to indicate that egress loopback has
occurred
executes the ingress pipeline as a subroutine

This action is expected to be executed in the egress pipeline.  No
changes are made to the logical datapath or to the connection tracking
zones, which will continue to be correct when carrying out loopback
from the egress pipeline to the ingress pipeline.

This capability is needed in order to implement some of the east/west
distributed NAT flows.
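
The steps listed above can be modeled roughly as follows.  This is a sketch
for explanation, not the implementation in ovn/lib/actions.c: the struct,
the function names, and the register count of 10 (taken from "reg0 ... reg9"
above) are all stand-ins chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_N_REGS 10   /* reg0 ... reg9, as listed above. */

/* Hypothetical packet metadata; not an OVN or OVS structure. */
struct toy_md {
    uint32_t ct_state;          /* Connection tracking state. */
    uint32_t in_port;           /* Physical ingress port. */
    uint32_t inport;            /* Logical ingress port. */
    uint32_t outport;           /* Logical egress port. */
    uint32_t flags;             /* Logical flags. */
    uint32_t regs[TOY_N_REGS];  /* Logical registers. */
};

typedef void toy_nested_fn(struct toy_md *);

/* Returns the clone that would be handed to the ingress pipeline.  The
 * original metadata is left untouched. */
struct toy_md
toy_egress_loopback(const struct toy_md *orig, toy_nested_fn *nested)
{
    assert(orig);
    struct toy_md clone = *orig;

    clone.ct_state = 0;             /* Clear connection tracking state. */
    clone.in_port = 0;
    clone.inport = orig->outport;   /* inport = outport */
    clone.outport = 0;
    clone.flags = 0;
    memset(clone.regs, 0, sizeof clone.regs);
    if (nested) {
        nested(&clone);             /* Nested actions from "{ ... }". */
    }
    /* The real action would now run the ingress pipeline on the clone. */
    return clone;
}

/* Example nested action: reg9[1] = 1, marking that loopback occurred. */
void
toy_mark_looped(struct toy_md *md)
{
    md->regs[9] |= UINT32_C(1) << 1;
}
```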

Signed-off-by: Mickey Spiegel <mickeys@gmail.com>
---
 include/ovn/actions.h |  5 +++-
 ovn/controller/lflow.c|  1 +
 ovn/lib/actions.c | 71 +--
 ovn/ovn-sb.xml| 35 +++
 ovn/utilities/ovn-trace.c | 49 
 tests/ovn.at  |  4 +++
 tests/test-ovn.c  |  1 +
 7 files changed, 163 insertions(+), 3 deletions(-)

diff --git a/include/ovn/actions.h b/include/ovn/actions.h
index 0bf6145..5f9d203 100644
--- a/include/ovn/actions.h
+++ b/include/ovn/actions.h
@@ -68,7 +68,8 @@ struct simap;
 OVNACT(PUT_ND,ovnact_put_mac_bind)  \
 OVNACT(PUT_DHCPV4_OPTS, ovnact_put_dhcp_opts)   \
 OVNACT(PUT_DHCPV6_OPTS, ovnact_put_dhcp_opts)   \
-OVNACT(SET_QUEUE,   ovnact_set_queue)
+OVNACT(SET_QUEUE,   ovnact_set_queue)   \
+OVNACT(EGRESS_LOOPBACK, ovnact_nest)
 
 /* enum ovnact_type, with a member OVNACT_ for each action. */
 enum OVS_PACKED_ENUM ovnact_type {
@@ -444,6 +445,8 @@ struct ovnact_encode_params {
 uint8_t output_ptable;  /* OpenFlow table for 'output' to resubmit. */
 uint8_t mac_bind_ptable;/* OpenFlow table for 'get_arp'/'get_nd' to
resubmit. */
+uint8_t ingress_ptable; /* OpenFlow table for 'egress_loopback' to
+   resubmit. */
 };
 
 void ovnacts_encode(const struct ovnact[], size_t ovnacts_len,
diff --git a/ovn/controller/lflow.c b/ovn/controller/lflow.c
index 3d7633e..c41368f 100644
--- a/ovn/controller/lflow.c
+++ b/ovn/controller/lflow.c
@@ -226,6 +226,7 @@ consider_logical_flow(const struct lport_index *lports,
 .first_ptable = first_ptable,
 .output_ptable = output_ptable,
 .mac_bind_ptable = OFTABLE_MAC_BINDING,
+.ingress_ptable = OFTABLE_LOG_INGRESS_PIPELINE,
 };
 ovnacts_encode(ovnacts.data, ovnacts.size, , );
 ovnacts_free(ovnacts.data, ovnacts.size);
diff --git a/ovn/lib/actions.c b/ovn/lib/actions.c
index 686ecc5..dda675b 100644
--- a/ovn/lib/actions.c
+++ b/ovn/lib/actions.c
@@ -169,6 +169,21 @@ put_load(uint64_t value, enum mf_field_id dst, int ofs, int n_bits,
 bitwise_copy(&n_value, 8, 0, sf->value, sf->field->n_bytes, ofs, n_bits);
 bitwise_one(ofpact_set_field_mask(sf), sf->field->n_bytes, ofs, n_bits);
 }
+
+static void
+put_move(enum mf_field_id src, int src_ofs,
+ enum mf_field_id dst, int dst_ofs,
+ int n_bits,
+ struct ofpbuf *ofpacts)
+{
+struct ofpact_reg_move *move = ofpact_put_REG_MOVE(ofpacts);
+move->src.field = mf_from_id(src);
+move->src.ofs = src_ofs;
+move->src.n_bits = n_bits;
+move->dst.field = mf_from_id(dst);
+move->dst.ofs = dst_ofs;
+move->dst.n_bits = n_bits;
+}
 
 /* Context maintained during ovnacts_parse(). */
 struct action_context {
@@ -1021,7 +1036,10 @@ free_CT_LB(struct ovnact_ct_lb *ct_lb)
 }
 
 /* Implements the "arp" and "nd_na" actions, which execute nested actions on a
- * packet derived from the one being processed. */
+ * packet derived from the one being processed.  Also implements the
+ * "egress_loopback" action, which executes nested actions after clearing
+ * registers and connection state, then loops the packet back to the
+ * beginning of the ingress pipeline. */
 static void
 parse_nested_action(struct action_context *ctx, enum ovnact_type type,
 const char *prereq)
@@ -1055,7 +1073,9 @@ parse_nested_action(struct action_context *ctx, enum ovnact_type type,
 return;
 }
 
-add_prerequisite(ctx, prereq);
+if (prereq) {
+add_prerequisite(ctx, prereq);
+}
 
 struct ovnact_nest *on = ovnact_put(ctx->ovnacts, type, sizeof *on);
 on->nested_len = nested.size;
@@ -1075,6 +1095,12 @@ parse_ND_NA(struct action_context *ctx)
 }
 
 static void
+parse_EGRESS_LOOPBACK(struct action_context *ctx)
+{
+parse_nested_action(ctx, OVNACT_EGRESS_LOOPBACK

[ovs-dev] [PATCH v10 3/8] ovn: Introduce distributed gateway port and "chassisredirect" port binding

2017-01-17 Thread Mickey Spiegel
Currently OVN distributed logical routers achieve reachability to
physical networks by passing through a "join" logical switch to a
centralized gateway router, which then connects to another logical
switch that has a localnet port connecting to the physical network.

This patch adds logical port and port binding abstractions that allow
an OVN distributed logical router to connect directly to a logical
switch that has a localnet port connecting to the physical network.
In this patch, this logical router port is called a "distributed
gateway port".

The primary design goal of distributed gateway ports is to allow as
much traffic as possible to be handled locally on the hypervisor
where a VM or container resides.  Whenever possible, packets from
the VM or container to the outside world should be processed
completely on that VM's or container's hypervisor, eventually
traversing a localnet port instance on that hypervisor to the
physical network.  Whenever possible, packets from the outside
world to a VM or container should be directed through the physical
network directly to the VM's or container's hypervisor, where the
packet will enter the integration bridge through a localnet port.

However, due to the implications of the use of L2 learning in the
physical network, as well as the need to support advanced features
such as one-to-many NAT (aka IP masquerading), where multiple
logical IP addresses spread across multiple chassis are mapped to
one external IP address, it will be necessary to handle some of the
logical router processing on a specific chassis in a centralized
manner.  For this reason, the user must associate a chassis with
each distributed gateway port.

In order to allow for the distributed processing of some packets,
distributed gateway ports need to be logical patch ports that
effectively reside on every hypervisor, rather than "l3gateway"
ports that are bound to a particular chassis.  However, the flows
associated with distributed gateway ports often need to be
associated with physical locations.  This is implemented in this
patch (and subsequent patches) by adding "is_chassis_resident()"
match conditions to several logical router flows.

While most of the physical location dependent aspects of distributed
gateway ports can be handled by restricting some flows to specific
chassis, one additional mechanism is required.  When a packet
leaves the ingress pipeline and the logical egress port is the
distributed gateway port, one of two different sets of actions is
required at table 32:
- If the packet can be handled locally on the sender's hypervisor
  (e.g. one-to-one NAT traffic), then the packet should just be
  resubmitted locally to table 33, in the normal manner for
  distributed logical patch ports.
- However, if the packet needs to be handled on the chassis
  associated with the distributed gateway port (e.g. one-to-many
  SNAT traffic or non-NAT traffic), then table 32 must send the
  packet on a tunnel port to that chassis.
In order to trigger the second set of actions, the
"chassisredirect" type of southbound port_binding is introduced.
Setting the logical egress port to the type "chassisredirect"
logical port is simply a way to indicate that although the packet
is destined for the distributed gateway port, it needs to be
redirected to a different chassis.  At table 32, packets with this
logical egress port are sent to a specific chassis, in the same
way that table 32 directs packets whose logical egress port is a
VIF or a type "l3gateway" port to different chassis.  Once the
packet arrives at that chassis, table 33 resets the logical egress
port to the value representing the distributed gateway port.  For
each distributed gateway port, there is one type "chassisredirect"
port, in addition to the distributed logical patch port
representing the distributed gateway port.
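
A minimal sketch of the table 32 and table 33 behaviour described above may help. It is not code from this patch: the struct, the enum names, and the helper functions are hypothetical, and it assumes that a packet already on the bound chassis is simply resubmitted locally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of a southbound port binding. */
enum port_type { PORT_VIF, PORT_PATCH, PORT_L3GATEWAY, PORT_CHASSISREDIRECT };

struct port_binding {
    const char *name;
    enum port_type type;
    const char *chassis;    /* Bound chassis, or NULL for distributed ports. */
};

enum table32_result { SEND_ON_TUNNEL, RESUBMIT_TABLE_33 };

/* Table 32: ports bound to another chassis (VIF, "l3gateway",
 * "chassisredirect") are reached through a tunnel.  Distributed patch
 * ports, and ports bound to the local chassis (an assumption of this
 * sketch), fall through to table 33. */
static enum table32_result
table32_decide(const struct port_binding *outport, const char *local_chassis)
{
    assert(outport && local_chassis);
    if (outport->type != PORT_PATCH && outport->chassis
        && strcmp(outport->chassis, local_chassis)) {
        return SEND_ON_TUNNEL;
    }
    return RESUBMIT_TABLE_33;
}

/* Table 33 on the receiving chassis: a "chassisredirect" outport is reset
 * to the distributed gateway port that it stands in for. */
static const struct port_binding *
table33_effective_outport(const struct port_binding *outport,
                          const struct port_binding *gateway_port)
{
    return outport->type == PORT_CHASSISREDIRECT ? gateway_port : outport;
}
```

With a distributed gateway port and its "chassisredirect" counterpart bound to some gateway chassis, table32_decide() returns RESUBMIT_TABLE_33 for the former on any hypervisor and SEND_ON_TUNNEL for the latter on every hypervisor other than the gateway chassis.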

A "chassisredirect" port represents a particular instance, bound
to a specific chassis, of an otherwise distributed port.  A
"chassisredirect" port is associated with a chassis in the same
manner as an "l3gateway" port.  However, unlike "l3gateway" ports,
"chassisredirect" ports have no associated IP or MAC addresses,
and "chassisredirect" ports should never be used as the "inport".
Any pipeline stages that depend on port specific IP or MAC addresses
should be carried out in the context of the distributed gateway
port's logical patch port.

Although the abstraction represented by the "chassisredirect" port
binding is generalized, in this patch the "chassisredirect" port binding
is only created for NB logical router ports that specify the new
"redirect-chassis" option.  There is no explicit notion of a
"chassisredirect" port in the NB database.  The expectation is when
capabilities are implemented that take advantage of "

[ovs-dev] [PATCH v10 2/8] ovn: add is_chassis_resident match expression component

2017-01-17 Thread Mickey Spiegel
This patch introduces a new match expression component
is_chassis_resident().  Unlike match expression comparisons,
is_chassis_resident is not pushed down to OpenFlow.  It is a
conditional that is evaluated in the controller during expr_simplify(),
when it is replaced by a boolean expression.  The is_chassis_resident
conditional evaluates to "true" when the specified string identifies a
port name that is resident on this controller chassis, i.e., the
corresponding southbound database Port_Binding has a chassis column
that matches this chassis.  Otherwise it evaluates to "false".

This allows higher level features to specify flows that are only
installed on some chassis rather than on all chassis with the
corresponding datapath.
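
As a rough illustration of the evaluation described above, the following sketch replaces a condition with a boolean by consulting a callback. It is a simplified model, not the code in ovn/lib/expr.c: the binding table, struct cond, and simplify_condition() are hypothetical, and the "negated" field stands in for the "not" member of the real condition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a southbound Port_Binding row. */
struct binding {
    const char *port_name;
    const char *chassis;            /* NULL if the port is not bound. */
};

/* Hypothetical auxiliary data handed to the callback. */
struct condition_aux {
    const struct binding *bindings;
    size_t n_bindings;
    const char *chassis;            /* Name of this controller's chassis. */
};

/* True only if 'port_name' exists and is bound to this chassis. */
static bool
is_chassis_resident(const void *c_aux_, const char *port_name)
{
    const struct condition_aux *c_aux = c_aux_;

    for (size_t i = 0; i < c_aux->n_bindings; i++) {
        const struct binding *b = &c_aux->bindings[i];
        if (!strcmp(b->port_name, port_name)) {
            return b->chassis && !strcmp(b->chassis, c_aux->chassis);
        }
    }
    return false;
}

/* Simplified condition node: an optional negation and a port name. */
struct cond {
    bool negated;
    const char *string;
};

/* Computes the boolean that takes the place of the condition, so that
 * nothing about the condition needs to reach OpenFlow. */
static bool
simplify_condition(const struct cond *cond,
                   bool (*is_resident)(const void *c_aux,
                                       const char *port_name),
                   const void *c_aux)
{
    assert(cond && is_resident);
    bool result = is_resident(c_aux, cond->string);
    return cond->negated ? !result : result;
}
```

Because the result depends on the chassis in the auxiliary data, the same logical flow simplifies to "true" on one chassis and "false" on another, which is what restricts the flow to some chassis.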

Suggested-by: Ben Pfaff <b...@ovn.org>
Signed-off-by: Mickey Spiegel <mickeys@gmail.com>
Acked-by: Ben Pfaff <b...@ovn.org>
---
 include/ovn/expr.h  |  22 +-
 ovn/controller/lflow.c  |  31 ++--
 ovn/controller/lflow.h  |   5 +-
 ovn/controller/ovn-controller.c |   5 +-
 ovn/lib/expr.c  | 160 ++--
 ovn/ovn-sb.xml  |  14 
 ovn/utilities/ovn-trace.c   |  21 +-
 tests/ovn.at|  14 
 tests/test-ovn.c|  24 +-
 9 files changed, 279 insertions(+), 17 deletions(-)

diff --git a/include/ovn/expr.h b/include/ovn/expr.h
index 2169a8c..711713e 100644
--- a/include/ovn/expr.h
+++ b/include/ovn/expr.h
@@ -292,6 +292,15 @@ enum expr_type {
 EXPR_T_AND, /* Logical AND of 2 or more subexpressions. */
 EXPR_T_OR,  /* Logical OR of 2 or more subexpressions. */
 EXPR_T_BOOLEAN, /* True or false constant. */
+EXPR_T_CONDITION,   /* Conditional to be evaluated in the
+ * controller during expr_simplify(),
+ * prior to constructing OpenFlow matches. */
+};
+
+/* Expression condition type. */
+enum expr_cond_type {
+EXPR_COND_CHASSIS_RESIDENT, /* Check if specified logical port name is
+ * resident on the controller chassis. */
 };
 
 /* Relational operator. */
@@ -349,6 +358,14 @@ struct expr {
 
 /* EXPR_T_BOOLEAN. */
 bool boolean;
+
+/* EXPR_T_CONDITION. */
+struct {
+enum expr_cond_type type;
+bool not;
+/* XXX Should arguments for conditions be generic? */
+char *string;
+} cond;
 };
 };
 
@@ -375,7 +392,10 @@ void expr_destroy(struct expr *);
 
 struct expr *expr_annotate(struct expr *, const struct shash *symtab,
char **errorp);
-struct expr *expr_simplify(struct expr *);
+struct expr *expr_simplify(struct expr *,
+   bool (*is_chassis_resident)(const void *c_aux,
+   const char *port_name),
+   const void *c_aux);
 struct expr *expr_normalize(struct expr *);
 
 bool expr_honors_invariants(const struct expr *);
diff --git a/ovn/controller/lflow.c b/ovn/controller/lflow.c
index 71d8c59..3d7633e 100644
--- a/ovn/controller/lflow.c
+++ b/ovn/controller/lflow.c
@@ -50,12 +50,18 @@ struct lookup_port_aux {
 const struct sbrec_datapath_binding *dp;
 };
 
+struct condition_aux {
+const struct lport_index *lports;
+const struct sbrec_chassis *chassis;
+};
+
 static void consider_logical_flow(const struct lport_index *lports,
   const struct mcgroup_index *mcgroups,
   const struct sbrec_logical_flow *lflow,
   const struct hmap *local_datapaths,
   struct group_table *group_table,
   const struct simap *ct_zones,
+  const struct sbrec_chassis *chassis,
   struct hmap *dhcp_opts,
   struct hmap *dhcpv6_opts,
   uint32_t *conj_id_ofs,
@@ -85,6 +91,16 @@ lookup_port_cb(const void *aux_, const char *port_name, unsigned int *portp)
 }
 
 static bool
+is_chassis_resident_cb(const void *c_aux_, const char *port_name)
+{
+const struct condition_aux *c_aux = c_aux_;
+
+const struct sbrec_port_binding *pb
+= lport_lookup_by_name(c_aux->lports, port_name);
+return pb && pb->chassis && pb->chassis == c_aux->chassis;
+}
+
+static bool
 is_switch(const struct sbrec_datapath_binding *ldp)
 {
 return smap_get(&ldp->external_ids, "logical-switch") != NULL;
@@ -98,6 +114,7 @@ add_logical_flows(struct controller_ctx *ctx, const struct lport_index *lports,
   const struct hmap *local_datapaths,
   struct group_table *group_table,
   const

[ovs-dev] [PATCH v10 1/8] ovn: document logical routers and logical patch ports in ovn-architecture

2017-01-17 Thread Mickey Spiegel
This patch adds a description of logical routers and logical patch ports,
including gateway routers, to ovn/ovn-architecture.7.xml.

Signed-off-by: Mickey Spiegel <mickeys@gmail.com>
---
 ovn/ovn-architecture.7.xml | 148 ++---
 1 file changed, 140 insertions(+), 8 deletions(-)

diff --git a/ovn/ovn-architecture.7.xml b/ovn/ovn-architecture.7.xml
index b049e51..d92f878 100644
--- a/ovn/ovn-architecture.7.xml
+++ b/ovn/ovn-architecture.7.xml
@@ -381,6 +381,36 @@
   switch.  Logical switches and routers are both implemented as logical
   datapaths.
 
+
+
+  
+Logical ports represent the points of connectivity in and
+out of logical switches and logical routers.  Some common types of
+logical ports are:
+  
+
+  
+
+  Logical ports representing VIFs.
+
+
+
+  Localnet ports represent the points of connectivity
+  between logical switches and the physical network.  They are
+  implemented as OVS patch ports between the integration bridge
+  and the separate Open vSwitch bridge that underlay physical
+  ports attach to.
+
+
+
+  Logical patch ports represent the points of
+  connectivity between logical switches and logical routers, and
+  in some cases between peer logical routers.  There is a pair of
+  logical patch ports at each such point of connectivity, one on
+  each side.
+
+  
+
   
 
   Life Cycle of a VIF
@@ -1040,17 +1070,119 @@
 is a container nested with a VM, then before sending the packet the
 actions push on a VLAN header with an appropriate VLAN ID.
   
-
-  
-If the logical egress port is a logical patch port, then table 65
-outputs to an OVS patch port that represents the logical patch port.
-The packet re-enters the OpenFlow flow table from the OVS patch port's
-peer in table 0, which identifies the logical datapath and logical
-input port based on the OVS patch port's OpenFlow port number.
-  
 
   
 
+  Logical Routers and Logical Patch Ports
+
+  
+Typically logical routers and logical patch ports do not have a
+physical location and effectively reside on every hypervisor.  This is
+the case for logical patch ports between logical routers and logical
+switches behind those logical routers, to which VMs (and VIFs) attach.
+  
+
+  
+Consider a packet sent from one virtual machine or container to another
+VM or container that resides on a different subnet.  The packet will
+traverse tables 0 to 65 as described in the previous section
+Architectural Physical Life Cycle of a Packet, using the
+logical datapath representing the logical switch that the sender is
+attached to.  At table 32, the packet will use the fallback flow that
+resubmits locally to table 33 on the same hypervisor.  In this case,
+all of the processing from table 0 to table 65 occurs on the hypervisor
+where the sender resides.
+  
+
+  
+When the packet reaches table 65, the logical egress port is a logical
+patch port.  The behavior at table 65 differs depending on the OVS
+version:
+  
+
+  
+
+  In OVS versions 2.6 and earlier, table 65 outputs to an OVS patch
+  port that represents the logical patch port.  The packet re-enters
+  the OpenFlow flow table from the OVS patch port's peer in table 0,
+  which identifies the logical datapath and logical input port based
+  on the OVS patch port's OpenFlow port number.
+
+
+
+  In OVS versions 2.7 and later, the packet is cloned and resubmitted
+  directly to OpenFlow flow table 16, setting the logical ingress
+  port to the peer logical patch port, and using the peer logical
+  patch port's logical datapath (that represents the logical router).
+
+  
+
+  
+The packet re-enters the ingress pipeline in order to traverse tables
+16 to 65 again, this time using the logical datapath representing the
+logical router.  The processing continues as described in the previous
+section Architectural Physical Life Cycle of a Packet.
+When the packet reaches table 65, the logical egress port will once
+again be a logical patch port.  In the same manner as described above,
+this logical patch port will cause the packet to be resubmitted to
+OpenFlow tables 16 to 65, this time using the logical datapath
+representing the logical switch that the destination VM or container
+is attached to.
+  
+
+  
+The packet traverses tables 16 to 65 a third and final time.  If the
+destination VM or container resides on a remote hypervisor, then table
+32 will send the packet on a tunnel port from the sender's hypervisor
+to the remote hypervisor.  Finally table 65 will output the packet
+directly to the destinat

[ovs-dev] [PATCH v9 3/3] ovn: introduce distributed gateway port

2017-01-13 Thread Mickey Spiegel
Currently OVN distributed logical routers achieve reachability to
physical networks by passing through a "join" logical switch to a
centralized gateway router, which then connects to another logical
switch that has a localnet port connecting to the physical network.

This patch adds logical port and port binding abstractions that allow
an OVN distributed logical router to connect directly to a logical
switch that has a localnet port connecting to the physical network.
In this patch, this logical router port is called a "distributed
gateway port".

The primary design goal of distributed gateway ports is to allow as
much traffic as possible to be handled locally on the hypervisor
where a VM or container resides.  Whenever possible, packets from
the VM or container to the outside world should be processed
completely on that VM's or container's hypervisor, eventually
traversing a localnet port instance on that hypervisor to the
physical network.  Whenever possible, packets from the outside
world to a VM or container should be directed through the physical
network directly to the VM's or container's hypervisor, where the
packet will enter the integration bridge through a localnet port.

However, due to the implications of the use of L2 learning in the
physical network, as well as the need to support advanced features
such as one-to-many NAT (aka IP masquerading), where multiple
logical IP addresses spread across multiple chassis are mapped to
one external IP address, it will be necessary to handle some of the
logical router processing on a specific chassis in a centralized
manner.  For this reason, the user must associate a
"redirect-chassis" with each distributed gateway port.

In order to allow for the distributed processing of some packets,
distributed gateway ports need to be logical patch ports that
effectively reside on every hypervisor, rather than "l3gateway"
ports that are bound to a particular chassis.  However, the flows
associated with distributed gateway ports often need to be
associated with physical locations.  This is implemented in this
patch (and subsequent patches) by adding "is_chassis_resident()"
match conditions to several logical router flows.

While most of the physical location dependent aspects of distributed
gateway ports can be handled by restricting some flows to specific
chassis, one additional mechanism is required.  When a packet
leaves the ingress pipeline and the logical egress port is the
distributed gateway port, one of two different sets of actions is
required at table 32:
- If the packet can be handled locally on the sender's hypervisor
  (e.g. one-to-one NAT traffic), then the packet should just be
  resubmitted locally to table 33, in the normal manner for
  distributed logical patch ports.
- However, if the packet needs to be handled on the chassis
  associated with the distributed gateway port (e.g. one-to-many
  SNAT traffic or non-NAT traffic), then table 32 must send the
  packet on a tunnel port to that chassis.
In order to trigger the second set of actions, the
MLF_FORCE_CHASSIS_REDIRECT flag is added.  For port_bindings with
type "patch", when a "redirect-chassis" is specified, a flow is
added to table 32 that matches when the logical egress port is
the distributed gateway port and MLF_FORCE_CHASSIS_REDIRECT is
set.  This flow sends the packet through a tunnel to the
"redirect-chassis", in the same way that table 32 directs packets
whose logical egress port is a VIF or a type "l3gateway" port to
different chassis.  When the logical egress port is the
distributed gateway port and MLF_FORCE_CHASSIS_REDIRECT is
cleared, the packet will fall through to the table 32 priority 0
fallback flow and be resubmitted to table 33 locally.

A port_binding of type "patch" is associated with a chassis in a
manner similar to an "l3gateway" port.  However, unlike "l3gateway"
ports, "patch" ports are effectively resident on each hypervisor
(subject to conditional monitoring constraints) even when there
is a "redirect-chassis" specified.  The effect of associating a
"redirect-chassis" with a logical router port is to cause the
additional table 32 flow to be created (through the southbound
port_binding), and to restrict some flows to the
"redirect-chassis" through "is_chassis_resident()" match
conditions.
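
The flag-based table 32 behaviour described above can be sketched as follows. This is an illustration only: the bit position chosen for MLF_FORCE_CHASSIS_REDIRECT, the patch_port struct, and table32_for_patch_port() are hypothetical, and the sketch does not model what happens when the packet is already on the "redirect-chassis".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit position, chosen for this sketch only. */
#define MLF_FORCE_CHASSIS_REDIRECT (UINT32_C(1) << 0)

/* Hypothetical model of a port_binding of type "patch". */
struct patch_port {
    const char *name;
    const char *redirect_chassis;   /* NULL if none was specified. */
};

enum table32_action { TUNNEL_TO_REDIRECT_CHASSIS, RESUBMIT_LOCALLY };

/* The extra table 32 flow exists only for patch ports that have a
 * "redirect-chassis", and matches only when the flag is set.  Everything
 * else reaches the priority 0 fallback flow and is resubmitted to
 * table 33 locally. */
static enum table32_action
table32_for_patch_port(const struct patch_port *outport, uint32_t flags)
{
    assert(outport);
    if (outport->redirect_chassis && (flags & MLF_FORCE_CHASSIS_REDIRECT)) {
        return TUNNEL_TO_REDIRECT_CHASSIS;
    }
    return RESUBMIT_LOCALLY;
}
```

In this model a distributed gateway port with the flag cleared behaves like any other distributed patch port, which corresponds to the one-to-one NAT case handled on the sender's hypervisor.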

Signed-off-by: Mickey Spiegel <mickeys@gmail.com>
---
 ovn/controller/binding.c|   8 +
 ovn/controller/ovn-controller.c |   4 +
 ovn/controller/physical.c   |  34 +
 ovn/lib/logical-fields.c|   4 +
 ovn/lib/logical-fields.h|   6 +
 ovn/northd/ovn-northd.8.xml |  94 +++-
 ovn/northd/ovn-northd.c | 123 ++-
 ovn/ovn-architecture.7.xml  | 128 +++-
 ovn/ovn-nb.ovsschema|   9 +-
 ovn/ovn-nb.xml  |  33 +++

[ovs-dev] [PATCH v9 2/3] ovn: add is_chassis_resident match expression component

2017-01-13 Thread Mickey Spiegel
This patch introduces a new match expression component
is_chassis_resident().  Unlike match expression comparisons,
is_chassis_resident is not pushed down to OpenFlow.  It is a
conditional that is evaluated in the controller during expr_simplify(),
when it is replaced by a boolean expression.  The is_chassis_resident
conditional evaluates to "true" when the specified string identifies a
port name that is resident on this controller chassis, i.e., the
corresponding southbound database Port_Binding has a chassis column
that matches this chassis.  Otherwise it evaluates to "false".

This allows higher level features to specify flows that are only
installed on some chassis rather than on all chassis with the
corresponding datapath.

Suggested-by: Ben Pfaff <b...@ovn.org>
Signed-off-by: Mickey Spiegel <mickeys@gmail.com>
Acked-by: Ben Pfaff <b...@ovn.org>
---
 include/ovn/expr.h  |  22 +-
 ovn/controller/lflow.c  |  31 ++--
 ovn/controller/lflow.h  |   5 +-
 ovn/controller/ovn-controller.c |   5 +-
 ovn/lib/expr.c  | 160 ++--
 ovn/ovn-sb.xml  |  14 
 ovn/utilities/ovn-trace.c   |  21 +-
 tests/ovn.at|  14 
 tests/test-ovn.c|  24 +-
 9 files changed, 279 insertions(+), 17 deletions(-)

diff --git a/include/ovn/expr.h b/include/ovn/expr.h
index 2169a8c..711713e 100644
--- a/include/ovn/expr.h
+++ b/include/ovn/expr.h
@@ -292,6 +292,15 @@ enum expr_type {
 EXPR_T_AND, /* Logical AND of 2 or more subexpressions. */
 EXPR_T_OR,  /* Logical OR of 2 or more subexpressions. */
 EXPR_T_BOOLEAN, /* True or false constant. */
+EXPR_T_CONDITION,   /* Conditional to be evaluated in the
+ * controller during expr_simplify(),
+ * prior to constructing OpenFlow matches. */
+};
+
+/* Expression condition type. */
+enum expr_cond_type {
+EXPR_COND_CHASSIS_RESIDENT, /* Check if specified logical port name is
+ * resident on the controller chassis. */
 };
 
 /* Relational operator. */
@@ -349,6 +358,14 @@ struct expr {
 
 /* EXPR_T_BOOLEAN. */
 bool boolean;
+
+/* EXPR_T_CONDITION. */
+struct {
+enum expr_cond_type type;
+bool not;
+/* XXX Should arguments for conditions be generic? */
+char *string;
+} cond;
 };
 };
 
@@ -375,7 +392,10 @@ void expr_destroy(struct expr *);
 
 struct expr *expr_annotate(struct expr *, const struct shash *symtab,
char **errorp);
-struct expr *expr_simplify(struct expr *);
+struct expr *expr_simplify(struct expr *,
+   bool (*is_chassis_resident)(const void *c_aux,
+   const char *port_name),
+   const void *c_aux);
 struct expr *expr_normalize(struct expr *);
 
 bool expr_honors_invariants(const struct expr *);
diff --git a/ovn/controller/lflow.c b/ovn/controller/lflow.c
index 71d8c59..3d7633e 100644
--- a/ovn/controller/lflow.c
+++ b/ovn/controller/lflow.c
@@ -50,12 +50,18 @@ struct lookup_port_aux {
 const struct sbrec_datapath_binding *dp;
 };
 
+struct condition_aux {
+const struct lport_index *lports;
+const struct sbrec_chassis *chassis;
+};
+
 static void consider_logical_flow(const struct lport_index *lports,
   const struct mcgroup_index *mcgroups,
   const struct sbrec_logical_flow *lflow,
   const struct hmap *local_datapaths,
   struct group_table *group_table,
   const struct simap *ct_zones,
+  const struct sbrec_chassis *chassis,
   struct hmap *dhcp_opts,
   struct hmap *dhcpv6_opts,
   uint32_t *conj_id_ofs,
@@ -85,6 +91,16 @@ lookup_port_cb(const void *aux_, const char *port_name, unsigned int *portp)
 }
 
 static bool
+is_chassis_resident_cb(const void *c_aux_, const char *port_name)
+{
+const struct condition_aux *c_aux = c_aux_;
+
+const struct sbrec_port_binding *pb
+= lport_lookup_by_name(c_aux->lports, port_name);
+return pb && pb->chassis && pb->chassis == c_aux->chassis;
+}
+
+static bool
 is_switch(const struct sbrec_datapath_binding *ldp)
 {
 return smap_get(&ldp->external_ids, "logical-switch") != NULL;
@@ -98,6 +114,7 @@ add_logical_flows(struct controller_ctx *ctx, const struct lport_index *lports,
   const struct hmap *local_datapaths,
   struct group_table *group_table,
   const

[ovs-dev] [PATCH v9 1/3] ovn: document logical routers and logical patch ports in ovn-architecture

2017-01-13 Thread Mickey Spiegel
This patch adds a description of logical routers and logical patch ports,
including gateway routers, to ovn/ovn-architecture.7.xml.

Signed-off-by: Mickey Spiegel <mickeys@gmail.com>
---
 ovn/ovn-architecture.7.xml | 148 ++---
 1 file changed, 140 insertions(+), 8 deletions(-)

diff --git a/ovn/ovn-architecture.7.xml b/ovn/ovn-architecture.7.xml
index b049e51..d92f878 100644
--- a/ovn/ovn-architecture.7.xml
+++ b/ovn/ovn-architecture.7.xml
@@ -381,6 +381,36 @@
   switch.  Logical switches and routers are both implemented as logical
   datapaths.
 
+
+
+  
+Logical ports represent the points of connectivity in and
+out of logical switches and logical routers.  Some common types of
+logical ports are:
+  
+
+  
+
+  Logical ports representing VIFs.
+
+
+
+  Localnet ports represent the points of connectivity
+  between logical switches and the physical network.  They are
+  implemented as OVS patch ports between the integration bridge
+  and the separate Open vSwitch bridge that underlay physical
+  ports attach to.
+
+
+
+  Logical patch ports represent the points of
+  connectivity between logical switches and logical routers, and
+  in some cases between peer logical routers.  There is a pair of
+  logical patch ports at each such point of connectivity, one on
+  each side.
+
+  
+
   
 
   Life Cycle of a VIF
@@ -1040,17 +1070,119 @@
 is a container nested with a VM, then before sending the packet the
 actions push on a VLAN header with an appropriate VLAN ID.
   
-
-  
-If the logical egress port is a logical patch port, then table 65
-outputs to an OVS patch port that represents the logical patch port.
-The packet re-enters the OpenFlow flow table from the OVS patch port's
-peer in table 0, which identifies the logical datapath and logical
-input port based on the OVS patch port's OpenFlow port number.
-  
 
   
 
+  Logical Routers and Logical Patch Ports
+
+  
+Typically logical routers and logical patch ports do not have a
+physical location and effectively reside on every hypervisor.  This is
+the case for logical patch ports between logical routers and logical
+switches behind those logical routers, to which VMs (and VIFs) attach.
+  
+
+  
+Consider a packet sent from one virtual machine or container to another
+VM or container that resides on a different subnet.  The packet will
+traverse tables 0 to 65 as described in the previous section
+Architectural Physical Life Cycle of a Packet, using the
+logical datapath representing the logical switch that the sender is
+attached to.  At table 32, the packet will use the fallback flow that
+resubmits locally to table 33 on the same hypervisor.  In this case,
+all of the processing from table 0 to table 65 occurs on the hypervisor
+where the sender resides.
+  
+
+  
+When the packet reaches table 65, the logical egress port is a logical
+patch port.  The behavior at table 65 differs depending on the OVS
+version:
+  
+
+  
+
+  In OVS versions 2.6 and earlier, table 65 outputs to an OVS patch
+  port that represents the logical patch port.  The packet re-enters
+  the OpenFlow flow table from the OVS patch port's peer in table 0,
+  which identifies the logical datapath and logical input port based
+  on the OVS patch port's OpenFlow port number.
+
+
+
+  In OVS versions 2.7 and later, the packet is cloned and resubmitted
+  directly to OpenFlow flow table 16, setting the logical ingress
+  port to the peer logical patch port, and using the peer logical
+  patch port's logical datapath (that represents the logical router).
+
+  
+
+  
+The packet re-enters the ingress pipeline in order to traverse tables
+16 to 65 again, this time using the logical datapath representing the
+logical router.  The processing continues as described in the previous
+section Architectural Physical Life Cycle of a Packet.
+When the packet reaches table 65, the logical egress port will once
+again be a logical patch port.  In the same manner as described above,
+this logical patch port will cause the packet to be resubmitted to
+OpenFlow tables 16 to 65, this time using the logical datapath
+representing the logical switch that the destination VM or container
+is attached to.
+  
+
+  
+The packet traverses tables 16 to 65 a third and final time.  If the
+destination VM or container resides on a remote hypervisor, then table
+32 will send the packet on a tunnel port from the sender's hypervisor
+to the remote hypervisor.  Finally table 65 will output the packet
+directly to the destinat

Re: [ovs-dev] [PATCH v7 3/7] ovn: Introduce "chassisredirect" port binding

2017-01-13 Thread Mickey Spiegel
On Fri, Jan 13, 2017 at 4:21 PM, Ben Pfaff <b...@ovn.org> wrote:

> On Fri, Jan 13, 2017 at 02:19:21PM -0800, Mickey Spiegel wrote:
> > On Thu, Jan 12, 2017 at 5:12 PM, Mickey Spiegel <mickeys@gmail.com>
> > wrote:
> >
> > >
> > > On Sun, Jan 8, 2017 at 10:30 PM, Mickey Spiegel <mickeys@gmail.com>
> > > wrote:
> > >
> > >>
> > >> On Fri, Jan 6, 2017 at 8:31 PM, Mickey Spiegel <mickeys@gmail.com>
> > >> wrote:
> > >>
> > >>>
> > >>> On Fri, Jan 6, 2017 at 4:21 PM, Mickey Spiegel <mickeys@gmail.com>
> > >>> wrote:
> > >>>
> > >>>>
> > >>>> On Fri, Jan 6, 2017 at 4:11 PM, Ben Pfaff <b...@ovn.org> wrote:
> > >>>>
> > >>>>> On Fri, Jan 06, 2017 at 03:47:03PM -0800, Mickey Spiegel wrote:
> > >>>>> > On Fri, Jan 6, 2017 at 3:20 PM, Ben Pfaff <b...@ovn.org> wrote:
> > >>>>> >
> > >>>>> > > On Fri, Jan 06, 2017 at 12:00:30PM -0800, Mickey Spiegel wrote:
> > >>>>> > > > Currently OVN handles all logical router ports in a distributed manner,
> > >>>>> > > > creating instances on each chassis.  The logical router ingress and
> > >>>>> > > > egress pipelines are traversed locally on the source chassis.
> > >>>>> > > >
> > >>>>> > > > In order to support advanced features such as one-to-many NAT (aka IP
> > >>>>> > > > masquerading), where multiple private IP addresses spread across
> > >>>>> > > > multiple chassis are mapped to one public IP address, it will be
> > >>>>> > > > necessary to handle some of the logical router processing on a specific
> > >>>>> > > > chassis in a centralized manner.
> > >>>>> > > >
> > >>>>> > > > The goal of this patch is to develop abstractions that allow for a
> > >>>>> > > > subset of router gateway traffic to be handled in a centralized manner
> > >>>>> > > > (e.g. one-to-many NAT traffic), while allowing for other subsets of
> > >>>>> > > > router gateway traffic to be handled in a distributed manner (e.g.
> > >>>>> > > > floating IP traffic).
> > >>>>> > > >
> > >>>>> > > > This patch introduces a new type of SB port_binding called
> > >>>>> > > > "chassisredirect".  A "chassisredirect" port represents a particular
> > >>>>> > > > instance, bound to a specific chassis, of an otherwise distributed
> > >>>>> > > > port.  The ovn-controller on that chassis populates the "chassis"
> > >>>>> > > > column for this record as an indication for other ovn-controllers of
> > >>>>> > > > its physical location.  Other ovn-controllers do not treat this port
> > >>>>> > > > as a local port.
> > >>>>> > > >
> > >>>>> > > > A "chassisredirect" port should never be used as an "inport".  When an
> > >>>>> > > > ingress pipeline sets the "outport", it may set the value to a logical
> > >>>>> > > > port of type "chassisredirect".  This will cause the packet to be
> > >>>>> > > > directed to a specific chassis to carry out the egress logical router
> > >>>>> > > > pipeline, in the same way that a logical switch forwards egress traffic
> > >>>>> > > > to a VIF port residing on a specific chassis.  At the beginning of the
> > >>>>> > >

Re: [ovs-dev] [PATCH v7 3/7] ovn: Introduce "chassisredirect" port binding

2017-01-13 Thread Mickey Spiegel
On Thu, Jan 12, 2017 at 5:12 PM, Mickey Spiegel <mickeys@gmail.com>
wrote:

>
> On Sun, Jan 8, 2017 at 10:30 PM, Mickey Spiegel <mickeys@gmail.com>
> wrote:
>
>>
>> On Fri, Jan 6, 2017 at 8:31 PM, Mickey Spiegel <mickeys@gmail.com>
>> wrote:
>>
>>>
>>> On Fri, Jan 6, 2017 at 4:21 PM, Mickey Spiegel <mickeys@gmail.com>
>>> wrote:
>>>
>>>>
>>>> On Fri, Jan 6, 2017 at 4:11 PM, Ben Pfaff <b...@ovn.org> wrote:
>>>>
>>>>> On Fri, Jan 06, 2017 at 03:47:03PM -0800, Mickey Spiegel wrote:
>>>>> > On Fri, Jan 6, 2017 at 3:20 PM, Ben Pfaff <b...@ovn.org> wrote:
>>>>> >
>>>>> > > On Fri, Jan 06, 2017 at 12:00:30PM -0800, Mickey Spiegel wrote:
>>>>> > > > Currently OVN handles all logical router ports in a distributed
>>>>> > > > manner, creating instances on each chassis.  The logical router
>>>>> > > > ingress and egress pipelines are traversed locally on the source
>>>>> > > > chassis.
>>>>> > > >
>>>>> > > > In order to support advanced features such as one-to-many NAT
>>>>> > > > (aka IP masquerading), where multiple private IP addresses spread
>>>>> > > > across multiple chassis are mapped to one public IP address, it
>>>>> > > > will be necessary to handle some of the logical router processing
>>>>> > > > on a specific chassis in a centralized manner.
>>>>> > > >
>>>>> > > > The goal of this patch is to develop abstractions that allow for
>>>>> > > > a subset of router gateway traffic to be handled in a centralized
>>>>> > > > manner (e.g. one-to-many NAT traffic), while allowing for other
>>>>> > > > subsets of router gateway traffic to be handled in a distributed
>>>>> > > > manner (e.g. floating IP traffic).
>>>>> > > >
>>>>> > > > This patch introduces a new type of SB port_binding called
>>>>> > > > "chassisredirect".  A "chassisredirect" port represents a
>>>>> > > > particular instance, bound to a specific chassis, of an otherwise
>>>>> > > > distributed port.  The ovn-controller on that chassis populates
>>>>> > > > the "chassis" column for this record as an indication for other
>>>>> > > > ovn-controllers of its physical location.  Other ovn-controllers
>>>>> > > > do not treat this port as a local port.
>>>>> > > >
>>>>> > > > A "chassisredirect" port should never be used as an "inport".
>>>>> > > > When an ingress pipeline sets the "outport", it may set the value
>>>>> > > > to a logical port of type "chassisredirect".  This will cause the
>>>>> > > > packet to be directed to a specific chassis to carry out the
>>>>> > > > egress logical router pipeline, in the same way that a logical
>>>>> > > > switch forwards egress traffic to a VIF port residing on a
>>>>> > > > specific chassis.  At the beginning of the egress pipeline, the
>>>>> > > > "outport" will be reset to the value of the distributed port.
>>>>> > > >
>>>>> > > > For outbound traffic to be handled in a centralized manner, the
>>>>> > > > "outport" should be set to the "chassisredirect" port
>>>>> > > > representing centralized gateway functionality in the otherwise
>>>>> > > > distributed router.  For outbound traffic to be handled in a
>>>>> > > > distributed manner, locally on the source chassis, the "outport"
>>>>> > > > should be set to the existing "patch" port representing
>>>>> > > > distributed gateway functionality.
>>>>> > > >
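
For readers following the quoted description, the outport handling can be
sketched roughly as below. This is only an illustration in Python, not
OVN code: the function names and the "cr-" name prefix used to tell a
"chassisredirect" port from its distributed port are assumptions made up
for this sketch.

```python
CR_PREFIX = "cr-"  # hypothetical naming convention, used only in this sketch


def select_outport(distributed_port, centralized):
    """Ingress side: choose the outport for router gateway traffic.

    Centralized traffic targets the "chassisredirect" port; distributed
    traffic keeps the existing distributed ("patch") port.
    """
    return CR_PREFIX + distributed_port if centralized else distributed_port


def egress_entry_outport(outport):
    """Egress side: reset a "chassisredirect" outport to the distributed port."""
    if outport.startswith(CR_PREFIX):
        return outport[len(CR_PREFIX):]
    return outport
```

With a made-up port name "lrp-gw", select_outport("lrp-gw", True) gives
"cr-lrp-gw", and egress_entry_outport("cr-lrp-gw") gives back "lrp-gw".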

Re: [ovs-dev] [PATCH v7 4/7] ovn: add egress loopback capability

2017-01-09 Thread Mickey Spiegel
On Mon, Jan 9, 2017 at 2:44 PM, Ben Pfaff <b...@ovn.org> wrote:

> On Mon, Jan 09, 2017 at 02:30:54PM -0800, Mickey Spiegel wrote:
> > On Mon, Jan 9, 2017 at 2:22 PM, Ben Pfaff <b...@ovn.org> wrote:
> >
> > > On Fri, Jan 06, 2017 at 04:28:00PM -0800, Mickey Spiegel wrote:
> > > > On Fri, Jan 6, 2017 at 3:57 PM, Ben Pfaff <b...@ovn.org> wrote:
> > > >
> > > > > On Fri, Jan 06, 2017 at 12:00:31PM -0800, Mickey Spiegel wrote:
> > > > > > > This patch adds the capability to force loopback at the end of the
> > > > > > > egress pipeline.  A new flags.force_egress_loopback symbol is
> > > > > > > defined, along with corresponding flags bits.  When
> > > > > > > flags.force_egress_loopback is set, at OFTABLE_LOG_TO_PHY, instead
> > > > > > > of the packet being sent out to the peer patch port or out the
> > > > > > > outport, the packet is forced back to the beginning of the ingress
> > > > > > > pipeline with inport = outport.  All other registers are cleared,
> > > > > > > as if the packet just arrived on that inport.
> > > > > > >
> > > > > > > This capability is needed in order to implement some of the
> > > > > > > east/west distributed NAT flows.
> > > > > > >
> > > > > > > Note: The existing flags.loopback allows a packet to go from the
> > > > > > > end of the ingress pipeline to the beginning of the egress
> > > > > > > pipeline with outport = inport, which is different.
> > > > > > >
> > > > > > > Initially, there are no tests incorporated in this patch.  This
> > > > > > > functionality is tested in a subsequent distributed NAT flows
> > > > > > > patch.  Tests specific to egress loopback may be added once the
> > > > > > > capability to inject a packet with one of the flags bits set is
> > > > > > > added.
> > > > > >
> > > > > > Signed-off-by: Mickey Spiegel <mickeys@gmail.com>
> > > > >
> > > > > I don't really understand this yet.
> > > > >
> > > > > > Does this need to be a flag, or can it be an action, i.e. one that
> > > > > > immediately jumps back to the beginning of the ingress pipeline?
> > > > > > Then we don't need hard-coded flags; we can just have user-defined
> > > > > > register bits, etc.
> > > > >
> > > >
> > > > Since I am figuring out whether to do egress loopback at the end of
> > > > the egress pipeline, I could get rid of the FORCE_EGRESS_LOOPBACK
> > > > flag and use an action instead.
> > >
> > > OK.
> > >
> > > > I think I still need the EGRESS_LOOPBACK_OCCURRED bit to avoid
> > > > the packet getting dropped in table 1 because the logical router
> > > > receives a packet with its own IP address as source.
> > >
> > > I think that could be avoided, too, with a little more adjustment.
> > > First, instead of zeroing all the registers, maintain them (and then
> > > zero registers that should be zeroed using OVN logical actions).
> > > Second, use some designated bit in a register for this particular
> > > purpose.
> > >
> > > (In case it is not clear, my preference, overall, is to put policy, as
> > > much as possible, into the logical flow table instead of into the
> > > mechanism that surrounds it.)
> > >
> >
> > Probably the register bit setting should be within the clone.
> > Are you OK with setting a specific register bit in an OVN action
> > definition?
>
> What do you mean by "within the clone"?
>
> To clarify what I'm talking about, I was expecting something like this
> to appear in the OVN logical actions:
> reg0=0; reg1=0; reg2=0; ...; regX[Y] = 1; recirculate;
> Maybe we need to save and restore the registers around the
> recirculation, in which case we can define a new OVN logical action that
> does that, e.g. maybe an OVN clone action:
> clone { reg0=0; reg1=0; reg2=0; ...; regX[Y] = 1; recirculate; }
>

I was thinking a single egress_loopback action that took care of the
clone, setting inport = outport, clearing the registers, and resubmitting
to table 16 all under the covers.

Should ovn-northd really be coding "reg0=0; reg1=0; ...; reg9=0"?
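
To make the proposal concrete, here is a rough Python model of what a
single egress_loopback action would do "under the covers". It is a sketch
of the semantics only, not OVN code: the Packet class, its field names,
and the choice to leave outport empty on the clone are assumptions for
illustration. The table number 16 and the ten registers reg0..reg9 are
taken from the text above.

```python
from dataclasses import dataclass, field

NUM_REGS = 10             # reg0..reg9, as in the "reg0=0; ...; reg9=0" question
FIRST_INGRESS_TABLE = 16  # table number named in this message


@dataclass
class Packet:
    inport: str
    outport: str
    regs: list = field(default_factory=lambda: [0] * NUM_REGS)
    table: int = 0


def egress_loopback(pkt):
    """Return a looped-back clone of pkt; pkt itself is not modified."""
    return Packet(
        inport=pkt.outport,          # inport = outport
        outport="",                  # assumption: outport starts out unset
        regs=[0] * NUM_REGS,         # all registers cleared
        table=FIRST_INGRESS_TABLE,   # resubmit to the start of ingress
    )
```

The point of modeling it as one action is that the logical flow only says
"egress_loopback" and none of the per-register zeroing has to be written
out by ovn-northd.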

Mickey
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v7 4/7] ovn: add egress loopback capability

2017-01-09 Thread Mickey Spiegel
On Fri, Jan 6, 2017 at 4:28 PM, Mickey Spiegel <mickeys@gmail.com>
wrote:

>
> On Fri, Jan 6, 2017 at 3:57 PM, Ben Pfaff <b...@ovn.org> wrote:
>
>> On Fri, Jan 06, 2017 at 12:00:31PM -0800, Mickey Spiegel wrote:
>> > This patch adds the capability to force loopback at the end of the
>> > egress pipeline.  A new flags.force_egress_loopback symbol is defined,
>> > along with corresponding flags bits.  When flags.force_egress_loopback
>> > is set, at OFTABLE_LOG_TO_PHY, instead of the packet being sent out to
>> > the peer patch port or out the outport, the packet is forced back to
>> > the beginning of the ingress pipeline with inport = outport.  All
>> > other registers are cleared, as if the packet just arrived on that
>> > inport.
>> >
>> > This capability is needed in order to implement some of the east/west
>> > distributed NAT flows.
>> >
>> > Note: The existing flags.loopback allows a packet to go from the end
>> > of the ingress pipeline to the beginning of the egress pipeline with
>> > outport = inport, which is different.
>> >
>> > Initially, there are no tests incorporated in this patch.  This
>> > functionality is tested in a subsequent distributed NAT flows patch.
>> > Tests specific to egress loopback may be added once the capability
>> > to inject a packet with one of the flags bits set is added.
>> >
>> > Signed-off-by: Mickey Spiegel <mickeys@gmail.com>
>>
>> I don't really understand this yet.
>>
>> Does this need to be a flag, or can it be an action, i.e. one that
>> immediately jumps back to the beginning of the ingress pipeline?  Then
>> we don't need hard-coded flags; we can just have user-defined register
>> bits, etc.
>>
>
> Since I am figuring out whether to do egress loopback at the end of
> the egress pipeline, I could get rid of the FORCE_EGRESS_LOOPBACK
> flag and use an action instead.
>
> I think I still need the EGRESS_LOOPBACK_OCCURRED bit to avoid
> the packet getting dropped in table 1 because the logical router receives
> a packet with its own IP address as source.
>

The alternative to the EGRESS_LOOPBACK_OCCURRED bit is to avoid
programming router port IP addresses that match an SNAT address in the
ingress router table 1 "ip4.src == " flow.  There is already similar
logic for the ingress router table 1 "ip4.dst == " flow.
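
As a rough illustration of that alternative, the filtering could look
something like the Python below. This is a sketch only, not the ovn-northd
code: the function names and the example addresses are made up, and the
real flow generation is written in C.

```python
def drop_src_addresses(router_port_ips, snat_ips):
    """Router port IPs that would still be matched by the table 1
    "ip4.src == " drop flow once SNAT addresses are left out."""
    snat = set(snat_ips)
    return [ip for ip in router_port_ips if ip not in snat]


def drop_flow_match(router_port_ips, snat_ips):
    """Build the match string, or None if no address is left to match."""
    addrs = drop_src_addresses(router_port_ips, snat_ips)
    if not addrs:
        return None
    return "ip4.src == {%s}" % ", ".join(addrs)
```

For example, with router port addresses 10.0.0.1 and 172.16.0.1 and an
SNAT address of 172.16.0.1, only 10.0.0.1 would remain in the drop match.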

Some background on the use of egress loopback with NAT.
The intention is to use egress loopback only with NAT, for east/west
traffic destined to a NAT address. The packet flow is:
- router ingress pipeline on source hypervisor: the GW_REDIRECT
  pipeline stage near the end changes outport to the distributed
  gateway port and forces the traffic to the "redirect-chassis"
  (either replacing outport with a type "chassisredirect" port, or
  setting a force_chassis_redirect flag, whichever mechanism we
  decide on).
- router egress pipeline on the "redirect-chassis" applies SNAT or
  UNDNAT which changes ip4.src, then triggers egress loopback.
- router ingress pipeline on the "redirect-chassis" receives a packet
  with inport = distributed gateway port, and ip4.src = a SNAT or
  DNAT external IP address, which could be the same as that router
  port's IP address. Without some change (either the
  EGRESS_LOOPBACK_OCCURRED bit or relaxing which router port IP
  addresses are programmed in the drop flow), the packet would be
  dropped at ingress router table 1.
- router ingress pipeline on the "redirect-chassis" applies DNAT or
  UNSNAT which changes ip4.dst, which is then used for the IP
  Routing lookup.
- router egress pipeline on the "redirect-chassis" forwards to the
  outport in the normal manner to the destination logical switch.
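
The table 1 problem in the third step can be shown with a small Python
model. Again this is only a sketch under assumptions: the dict-based
packet, the function names, and the addresses are invented for the
example, and the SNAT address is deliberately chosen equal to the router
port IP.

```python
ROUTER_PORT_IPS = {"172.16.0.1"}  # hypothetical distributed gateway port IP
SNAT_EXTERNAL_IP = "172.16.0.1"   # SNAT address equal to that port IP


def egress_snat_and_loopback(pkt):
    """Egress on the "redirect-chassis": SNAT, then egress loopback."""
    looped = dict(pkt)
    looped["ip4_src"] = SNAT_EXTERNAL_IP      # SNAT changes ip4.src
    looped["inport"] = pkt["outport"]         # inport = outport
    looped["outport"] = None
    looped["egress_loopback_occurred"] = True
    return looped


def ingress_table1_drops(pkt, honor_bit=True):
    """True if the "own IP address as source" check would drop pkt."""
    if honor_bit and pkt.get("egress_loopback_occurred"):
        return False
    return pkt["ip4_src"] in ROUTER_PORT_IPS


pkt = {"ip4_src": "10.0.0.5", "inport": "lrp-a", "outport": "lrp-gw"}
looped = egress_snat_and_loopback(pkt)
```

Here ingress_table1_drops(looped) is False, while
ingress_table1_drops(looped, honor_bit=False) is True, which is the drop
described above when neither change is made.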

Mickey


>> This needs real documentation in ovn-sb.xml instead of just being added
>> to a list.
>>
>> Mickey
>
>


Re: [ovs-dev] [PATCH v7 3/7] ovn: Introduce "chassisredirect" port binding

2017-01-08 Thread Mickey Spiegel
On Fri, Jan 6, 2017 at 8:31 PM, Mickey Spiegel <mickeys@gmail.com>
wrote:

>
> On Fri, Jan 6, 2017 at 4:21 PM, Mickey Spiegel <mickeys@gmail.com>
> wrote:
>
>>
>> On Fri, Jan 6, 2017 at 4:11 PM, Ben Pfaff <b...@ovn.org> wrote:
>>
>>> On Fri, Jan 06, 2017 at 03:47:03PM -0800, Mickey Spiegel wrote:
>>> > On Fri, Jan 6, 2017 at 3:20 PM, Ben Pfaff <b...@ovn.org> wrote:
>>> >
>>> > > On Fri, Jan 06, 2017 at 12:00:30PM -0800, Mickey Spiegel wrote:
>>> > > > Currently OVN handles all logical router ports in a distributed
>>> > > > manner, creating instances on each chassis.  The logical router
>>> > > > ingress and egress pipelines are traversed locally on the source
>>> > > > chassis.
>>> > > >
>>> > > > In order to support advanced features such as one-to-many NAT
>>> > > > (aka IP masquerading), where multiple private IP addresses spread
>>> > > > across multiple chassis are mapped to one public IP address, it
>>> > > > will be necessary to handle some of the logical router processing
>>> > > > on a specific chassis in a centralized manner.
>>> > > >
>>> > > > The goal of this patch is to develop abstractions that allow for
>>> > > > a subset of router gateway traffic to be handled in a centralized
>>> > > > manner (e.g. one-to-many NAT traffic), while allowing for other
>>> > > > subsets of router gateway traffic to be handled in a distributed
>>> > > > manner (e.g. floating IP traffic).
>>> > > >
>>> > > > This patch introduces a new type of SB port_binding called
>>> > > > "chassisredirect".  A "chassisredirect" port represents a
>>> > > > particular instance, bound to a specific chassis, of an otherwise
>>> > > > distributed port.  The ovn-controller on that chassis populates
>>> > > > the "chassis" column for this record as an indication for other
>>> > > > ovn-controllers of its physical location.  Other ovn-controllers
>>> > > > do not treat this port as a local port.
>>> > > >
>>> > > > A "chassisredirect" port should never be used as an "inport".
>>> > > > When an ingress pipeline sets the "outport", it may set the value
>>> > > > to a logical port of type "chassisredirect".  This will cause the
>>> > > > packet to be directed to a specific chassis to carry out the
>>> > > > egress logical router pipeline, in the same way that a logical
>>> > > > switch forwards egress traffic to a VIF port residing on a
>>> > > > specific chassis.  At the beginning of the egress pipeline, the
>>> > > > "outport" will be reset to the value of the distributed port.
>>> > > >
>>> > > > For outbound traffic to be handled in a centralized manner, the
>>> > > > "outport" should be set to the "chassisredirect" port
>>> > > > representing centralized gateway functionality in the otherwise
>>> > > > distributed router.  For outbound traffic to be handled in a
>>> > > > distributed manner, locally on the source chassis, the "outport"
>>> > > > should be set to the existing "patch" port representing
>>> > > > distributed gateway functionality.
>>> > > >
>>> > > > Inbound traffic will be directed to the appropriate chassis by
>>> > > > restricting source MAC address usage and ARP responses to that
>>> > > > chassis, or by running dynamic routing protocols.
>>> > > >
>>> > > > Note that "chassisredirect" ports have no associated IP or MAC
>>> > > > addresses.  Any pipeline stages that depend on port specific IP
>>> > > > or MAC addresses should be carried out in the context of the
>>> > > > distributed port.
>>> > > >
>>> > > > Although the abstraction represented by the "chassisredirect"
>>> > > > port binding is generalized, in this patch the "chassisredirect"
>>> > > > port binding
>
