Re: [ovs-discuss] OVN 21.12 and external ports

2022-10-06 Thread Daniel Alvarez Sanchez
+Lucas Martins to this thread, who's been working on
this particular area.

On Thu, Oct 6, 2022 at 1:44 PM Numan Siddique  wrote:

> On Thu, Oct 6, 2022 at 4:27 AM Michał Nasiadka 
> wrote:
> >
> > Hello,
> >
> > I’m running OpenStack Wallaby and using Ironic for Bare Metal
> provisioning.
> > Neutron creates External ports for bare metal instances and uses
> ha_chassis_group.
> > Neutron normally defines a different priority for Routers LRP gateway
> chassis and ha_chassis_group.
> >
> > I have a router with two VLANs attached - external (used for internet
> connectivity - SNAT or DNAT/Floating IP) and internal VLAN network hosting
> bare metal servers (and some Geneve networks for VMs).
> >
> > If an External port’s HA chassis group active chassis is different than
> gateway chassis (external vlan network) active chassis - those bare metal
> servers have intermittent network connectivity for any traffic going
> through that router.
> >
> > Is that the desired effect in OVN?
>
> I think it would cause some issues.  I'm not able to recall exactly
> why, but I've seen similar behavior, and it is recommended that the
> same controller which is actively handling the gateway traffic also
> handles the external ports.  Maybe there is MAC flapping of the router
> port IPs.
> That could be the issue.  You can perhaps ARP for the router IP from
> your bare metal machine and see if you get two ARP replies - one from
> the controller which binds the external port and one from the gateway
> chassis controller.
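
A quick way to run that check from the bare metal host (a sketch; the
interface name and router IP below are placeholders):

# Send a few ARP requests for the router IP and look at the replying MACs.
arping -I eth0 -c 4 192.0.2.1
# Replies coming from two different MAC addresses would mean that both the
# chassis binding the external port and the gateway chassis answer for the
# router IP.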
>
> Thanks
> Numan
>
>
>
> >
> > Best regards,
> > Michal Nasiadka
> >
> >
> >
>
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] [ovn dvr] dvr for vlan is not completely supported

2022-05-18 Thread Daniel Alvarez Sanchez
Thanks Liu

On Wed, May 18, 2022 at 8:47 AM 刘勰  wrote:

> Hi Daniel,
> Thanks for your reply.
> Yeah, we already configure 'ovn-chassis-mac-mappings' for every chassis and
> the flooding still exists.  And I don't think 'ovn-chassis-mac-mappings' is the
> solution for this matter.
> I think the core of this problem is how to let the ToR learn the MAC entries of the VMs.
>

Sorry, yes, for some reason I was thinking you were referring to the router
MAC.

> I found a discussion related to this matter [1].
>

Yes! Actually I initiated that thread :)
+Ankur, who started to do some work in this area, in case he has any updates.
I believe that sending periodic RARPs for VIFs on localnet switches would
be an OK solution, but for large networks this might result in a lot of
broadcast traffic.
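
For reference, this is roughly how the chassis MAC mappings are configured on
each chassis (a sketch; the physnet name and MAC below are placeholders). It
avoids router MAC flapping on the ToR for routed traffic, but it does not by
itself make the ToR learn the VM MACs discussed here:

ovs-vsctl set Open_vSwitch . \
    external-ids:ovn-chassis-mac-mappings="physnet1:aa:bb:cc:dd:ee:01"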

>
> [1]
> https://mail.openvswitch.org/pipermail/ovs-discuss/2020-September/050678.html
>
>
>
> From: Daniel Alvarez 
> Date: 2022-05-18 14:11:17
> To: "刘勰" 
> Cc: ovs-discuss@openvswitch.org
> Subject: Re: [ovs-discuss] [ovn dvr] dvr for vlan is not completely supported
>
> Hi Liu,
>
> On 18 May 2022, at 05:08, 刘勰  wrote:
>
> 
> Hi folks, I have been working on DVR on VLAN for many days.
> I found that the ToR switch would always broadcast the traffic whose MAC
> was not learned by the ToR, because ovn-controller had already ARP-proxied for
> it.
> Someone had discussed this and put forward some solutions [1]. I
> wonder whether it is under development. Are there any plans?
>
>
> I believe this functionality is already available. In OpenStack we added
> support [0] to it some time back.
>
> Hope it helps.
>
> Daniel
>
> [0]
> https://opendev.org/openstack/puppet-ovn/commit/73a1d569220d2601d9446838caaacea4168d5ac4
>
>
>
> [1]https://www.openvswitch.org/support/ovscon2018/6/1440-sharma.pdf
>
>
>
>
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


[ovs-discuss] [ovn] TFTP server - next server missing?

2022-05-06 Thread Daniel Alvarez Sanchez
Hi folks,

While doing some tests with PXE booting and OVN, we noticed that even
though the tftp-server option was sent by ovn-controller, the bare metal
node wouldn't try to reach it to download the image. Comparing it to the
output sent by dnsmasq, it looks like we're missing the next-server option.

After this hardcoded (and dirty) patch [0], it seemed to work.

Is this something we should add to OVN? For example, when the tftp-*
options are set in the DHCP_Options table, have ovn-controller send the
next-server address in the DHCP offer?

Thanks!
daniel

[0]


diff --git a/controller/pinctrl.c b/controller/pinctrl.c
index ae3da332c..6c2c75a64 100644
--- a/controller/pinctrl.c
+++ b/controller/pinctrl.c
@@ -2259,6 +2259,7 @@ pinctrl_handle_put_dhcp_opts(

     if (*in_dhcp_msg_type != OVN_DHCP_MSG_INFORM) {
         dhcp_data->yiaddr = (msg_type == DHCP_MSG_NAK) ? 0 : *offer_ip;
+        dhcp_data->siaddr = (ovs_be32) inet_addr("172.27.7.29");
     } else {
         dhcp_data->yiaddr = 0;
     }
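
For context, the tftp-related options are set today in the NB DHCP_Options
table along these lines (a sketch; the CIDR, addresses, file name and port
name below are placeholders). The question above is whether having them set
should also make ovn-controller fill in the siaddr (next-server) field of the
DHCP header, which is what the patch hardcodes:

ovn-nbctl dhcp-options-create 10.0.0.0/24
uuid=$(ovn-nbctl --bare --columns=_uuid find DHCP_Options cidr=10.0.0.0/24)
ovn-nbctl dhcp-options-set-options "$uuid" \
    lease_time=3600 router=10.0.0.1 server_id=10.0.0.1 \
    server_mac=c0:ff:ee:00:00:01 tftp_server=172.27.7.29 \
    bootfile_name='"pxelinux.0"'
ovn-nbctl lsp-set-dhcpv4-options <port-name> "$uuid"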
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] Do entries in the MAC_Binding table age out? Seeing incorrect entry after a VIP moved

2022-01-26 Thread Daniel Alvarez Sanchez
Hey Brendan,

On Wed, Jan 26, 2022 at 12:52 PM Brendan Doyle 
wrote:

> Hi,
>
> So I have an underlay VIP that is reachable via a Gateway. The VIP moved
> to a new
> hypervisor after a simulated power failure (all hypervisors rebooted).
> When things came
> back OVN was resolving it to the wrong MAC address. When I looked in the
> MAC_binding
> table I saw that all entries for that address were wrong. I flushed the
> table and everything
> was fine (entries got updated with the correct MAC).
>
> So it seems to me that these entries must not be aged out? How is this
> supposed to work?
>

These entries do not age out. We have discussed the topic a couple of other
times (e.g. [0]) without a clear conclusion.
Also, we observed that in certain environments this table can grow to the
point where the SB database becomes huge (>1 GB), mostly due to its entries.
To avoid this, we recently landed a patch in OpenStack that could be useful
for you [1].

[0] https://mail.openvswitch.org/pipermail/ovs-discuss/2019-June/048936.html
[1] https://review.opendev.org/c/openstack/neutron/+/813610
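
For anyone hitting this, the stale entries can also be inspected and removed
selectively instead of flushing the whole table (a sketch; the IP below is a
placeholder):

ovn-sbctl --columns=_uuid,logical_port,ip,mac find MAC_Binding ip=192.0.2.50
ovn-sbctl destroy MAC_Binding <uuid-from-the-previous-command>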

>
> Thanks
>
> Brendan.
>
>
>
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] OVN LSP with a unknown in address will not build arp response lflows

2021-11-04 Thread Daniel Alvarez Sanchez
adding the list back

On Fri, Oct 29, 2021 at 10:04 AM 鲁 成  wrote:

> The way I see it, for an LSP with addresses "fa:16:3e:b3:c0:e5
> 192.168.111.42" and unknown,
> unknown means the port can send traffic with any MAC address.
>
> But for the address "fa:16:3e:b3:c0:e5", maybe we should still send an ARP reply for
> this address, don't you think?
>

This was indeed the former behavior, but we hit use cases where a VM could
send traffic from a particular port with that IP address (192.168.111.42 in
your example) but a different MAC.
An example of this use case is NIC teaming, where an IP fails over to a
different port whose MAC address is different.

The patch that changed this behavior is here:

https://patchwork.ozlabs.org/patch/1258152/

Hope it helps!
daniel
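
As a sketch, if replying to ARP for the known pair matters more than the
failover case in a given deployment, dropping "unknown" from the addresses
(using the port from the example above) brings the ARP responder back, at the
cost of traffic destined to MACs OVN does not know about no longer being
delivered to that port:

ovn-nbctl lsp-set-addresses 6a8064f9-f2cc-407d-b8da-345c6a216cb3 \
    "fa:16:3e:b3:c0:e5 192.168.111.42"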


>
> Thanks
>
> Sent from Mail <https://go.microsoft.com/fwlink/?LinkId=550986> for Windows
>
>
>
> *From: *Daniel Alvarez Sanchez 
> *Sent: *Friday, October 29, 2021 3:58 PM
> *To: *鲁 成 
> *Cc: *b...@openvswitch.org
> *Subject: *Re: [ovs-discuss] OVN LSP with a unknown in address will not build
> arp response lflows
>
>
>
> Hi,
>
>
>
> On Fri, Oct 29, 2021 at 5:50 AM 鲁 成  wrote:
>
> *Environment info:*
> OVN 21.06
>
> OVS 2.12.0
>
> *Reproduction:*
> 1. Create a port with neutronclient, assign it to a node, and disable the port
> security group
>
> 2. Create an OVS port, add it to br-int, and set the interface iface-id to the
> same value as the Neutron port UUID
>
> After that, Neutron will create an LSP in the OVN NB and append unknown to the
> LSP's addresses field
>
> Check it with the script [1]
>
>
>
> Port info:
> ()[root@ovn-tool-0 /]# ovn-nbctl find Logical_Switch_Port
> name=6a8064f9-f2cc-407d-b8da-345c6a216cb3
>
> _uuid   : 88fd1a84-8695-4cef-b916-45531edaf0db
>
> addresses   : ["fa:16:3e:b3:c0:e5 192.168.111.42", unknown]
>
> dhcpv4_options  : 1a8ca1af-519c-4aa2-b3a3-cc74955dee1f
>
> dhcpv6_options  : []
>
> dynamic_addresses   : []
>
> enabled : true
>
> external_ids: {"neutron:cidrs"="192.168.111.42/24",
> "neutron:device_id"="", "neutron:device_owner"="",
> "neutron:network_name"=neutron-6ac00688-422f-4a4f-99ae-b092b2d87f7b,
> "neutron:port_name"=lc-tap-2,
> "neutron:project_id"="498e2a96e4cc4edeb0c525a081dd6830",
> "neutron:revision_number"="4", "neutron:security_group_ids"=""}
>
> ha_chassis_group: []
>
> name: "6a8064f9-f2cc-407d-b8da-345c6a216cb3"
>
> options : {mcast_flood_reports="true",
> requested-chassis=node-1.domain.tld}
>
> parent_name : []
>
> port_security   : []
>
> tag : []
>
> tag_request : []
>
> type: ""
>
> up  : false
>
>
>
> *Results:*
> OVN will not build ARP responder lflows for this LSP
>
>
>
>
>
> I believe that this is the expected behavior as you disable port security,
> meaning that the traffic from that port can come from any MAC address (it's
> unknown to OVN). Hence, it is up to the VM/container/whatever to reply to
> ARP requests and OVN should not reply on its behalf.
>
>
>
> Hope this helps.
>
>
>
> Thanks!
>
> daniel
>
>
>
>
>
>
>
> *Script:*
>
> [1]:
>
> #!/usr/bin/bash
>
>
>
> # Create port
>
> # neutron port-create --name lucheng-tap
> --binding:host_id=node-3.domain.tld share_net
>
>
>
> HOST=""
>
> MAC=""
>
>
>
> get_port_info() {
>
> source openrc
>
> port_id="$1"
>
> HOST=$(neutron port-show -F binding:host_id -f value "$port_id")
>
> MAC=$(neutron port-show -F mac_address -f value "$port_id")
>
> ip_info=$(neutron port-show -F fixed_ips -f value "$port_id")
>
> echo Port "$port_id" Mac: "$MAC" HOST: "$HOST"
>
> echo IP Info: "$ip_info"
>
> }
>
>
>
> create_ns() {
>
> port_id="$1"
>
> iface_name="lc-tap-${port_id:0:8}"
>
> netns_name="lc-vm-${port_id:0:8}"
>
> ssh "$HOST" ovs-vsctl add-port br-int "$iface_name" \
>
>   -- set Interface "$iface_name" type=internal \
>
>   -- set Interface "$iface_name" external_ids:iface-id="$port_id" \
>
>   -- set Interface "$iface_name" external_ids:attached-mac="$MAC" \
>
>   -- set Int

Re: [ovs-discuss] OVN LSP with a unknown in address will not build arp response lflows

2021-10-29 Thread Daniel Alvarez Sanchez
Hi,

On Fri, Oct 29, 2021 at 5:50 AM 鲁 成  wrote:

> *Environment info:*
> OVN 21.06
>
> OVS 2.12.0
>
> *Reproduction:*
> 1. Create a port with neutronclient, assign it to a node, and disable the port
> security group
>
> 2. Create an OVS port, add it to br-int, and set the interface iface-id to the
> same value as the Neutron port UUID
>
> After that, Neutron will create an LSP in the OVN NB and append unknown to the
> LSP's addresses field
>
> Check it with the script [1]
>
>
>
> Port info:
> ()[root@ovn-tool-0 /]# ovn-nbctl find Logical_Switch_Port
> name=6a8064f9-f2cc-407d-b8da-345c6a216cb3
>
> _uuid   : 88fd1a84-8695-4cef-b916-45531edaf0db
>
> addresses   : ["fa:16:3e:b3:c0:e5 192.168.111.42", unknown]
>
> dhcpv4_options  : 1a8ca1af-519c-4aa2-b3a3-cc74955dee1f
>
> dhcpv6_options  : []
>
> dynamic_addresses   : []
>
> enabled : true
>
> external_ids: {"neutron:cidrs"="192.168.111.42/24",
> "neutron:device_id"="", "neutron:device_owner"="",
> "neutron:network_name"=neutron-6ac00688-422f-4a4f-99ae-b092b2d87f7b,
> "neutron:port_name"=lc-tap-2,
> "neutron:project_id"="498e2a96e4cc4edeb0c525a081dd6830",
> "neutron:revision_number"="4", "neutron:security_group_ids"=""}
>
> ha_chassis_group: []
>
> name: "6a8064f9-f2cc-407d-b8da-345c6a216cb3"
>
> options : {mcast_flood_reports="true",
> requested-chassis=node-1.domain.tld}
>
> parent_name : []
>
> port_security   : []
>
> tag : []
>
> tag_request : []
>
> type: ""
>
> up  : false
>
>
>
> *Results:*
> OVN will not build ARP responder lflows for this LSP
>


I believe that this is the expected behavior as you disable port security,
meaning that the traffic from that port can come from any MAC address (it's
unknown to OVN). Hence, it is up to the VM/container/whatever to reply to
ARP requests and OVN should not reply on its behalf.

Hope this helps.

Thanks!
daniel



>
> *Script:*
>
> [1]:
>
> #!/usr/bin/bash
>
>
>
> # Create port
>
> # neutron port-create --name lucheng-tap
> --binding:host_id=node-3.domain.tld share_net
>
>
>
> HOST=""
>
> MAC=""
>
>
>
> get_port_info() {
>
> source openrc
>
> port_id="$1"
>
> HOST=$(neutron port-show -F binding:host_id -f value "$port_id")
>
> MAC=$(neutron port-show -F mac_address -f value "$port_id")
>
> ip_info=$(neutron port-show -F fixed_ips -f value "$port_id")
>
> echo Port "$port_id" Mac: "$MAC" HOST: "$HOST"
>
> echo IP Info: "$ip_info"
>
> }
>
>
>
> create_ns() {
>
> port_id="$1"
>
> iface_name="lc-tap-${port_id:0:8}"
>
> netns_name="lc-vm-${port_id:0:8}"
>
> ssh "$HOST" ovs-vsctl add-port br-int "$iface_name" \
>
>   -- set Interface "$iface_name" type=internal \
>
>   -- set Interface "$iface_name" external_ids:iface-id="$port_id" \
>
>   -- set Interface "$iface_name" external_ids:attached-mac="$MAC" \
>
>   -- set Interface "$iface_name" external_ids:iface-status=active
>
>
>
> ssh "$HOST" ip netns add "$netns_name"
>
> ssh "$HOST" ip l set dev "$iface_name" address "$MAC"
>
> ssh "$HOST" ip l set "$iface_name" netns "$netns_name"
>
> ssh "$HOST" ip netns exec "$netns_name" ip l set lo up
>
> ssh "$HOST" ip netns exec "$netns_name" ip l set "$iface_name" up
>
> }
>
>
>
> main() {
>
> get_port_info "$1"
>
> create_ns "$1"
>
> }
>
>
>
> main $@
>
> neutron port-update --no-security-groups [port uuid]
>
> neutron port-update --port_security_enabled=false [port uuid]
>
>
>
> *What I found:*
>
> When ovn-northd tries to build_lswitch_arp_nd_responder_known_ips, it
> will skip any LSP which has the unknown flag.
>
> static void
>
> build_lswitch_arp_nd_responder_known_ips(struct ovn_port *op,
>
>  struct hmap *lflows,
>
>  struct hmap *ports,
>
>  struct ds *actions,
>
>  struct ds *match)
>
> {
>
> ...
>
> if (lsp_is_external(op->nbsp) || op->has_unknown) {
>
> return;
>
> }
>
>
>
> Sent from Mail for Windows
>
>
>
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] BGP EVPN support

2021-03-25 Thread Daniel Alvarez Sanchez
Thanks Krzysztof, all

Let me see if I understand the 'native' proposal. Please amend as necessary
:)

On Tue, Mar 16, 2021 at 9:28 PM Krzysztof Klimonda <
kklimo...@syntaxhighlighted.com> wrote:

>
>
> On Tue, Mar 16, 2021, at 19:15, Mark Gray wrote:
> > On 16/03/2021 15:41, Krzysztof Klimonda wrote:
> > > Yes, that seems to be prerequisite (or one of prerequisites) for
> keeping current DPDK / offload capabilities, as far as I understand. By
> Proxy ARP/NDP I think you mean responding to ARP and NDP on behalf of the
> system where FRR is running?
> > >
> > > As for whether to go ovn-kubernetes way and try to implement it with
> existing primitives, or add BGP support directly into OVN, I feel like this
> should be a core feature of OVN itself and not something that could be
> built on top of it by a careful placement of logical switches, routers and
> ports. This would also help with management (you would configure new BGP
> connection by modifying northbound DB) and simplify troubleshooting in case
> something is not working as expected.
> > >
> >
> > There would be quite a lot of effort to implement BGP support directly
> > into OVN as per all the relevant BGP RPCs .. and the effort to maintain.
> > Another option might be to make FRR Openflow-aware and enabling it to
> > program Openflow flows directly into an OVN bridge much like it does
> > into the kernel today. FRR does provide some flexibility to extend like
> > that through the use of something like FPM
> > (http://docs.frrouting.org/projects/dev-guide/en/latest/fpm.html)
>
> Indeed, when I wrote "adding BGP support directly to OVN" I didn't really
> mean implementing BGP protocol directly in OVN, but rather implementing
> integration with FRR directly in OVN, and not by reusing existing
> resources. Making ovn-controller into fully fledged BGP peer seems.. like a
> nice expansion of the initial idea, assuming that the protocol could be
> offloaded to some library, but it's probably not a hard requirement for the
> initial implementation, as long as OVS can be programmed to deliver BGP
> traffic to FRR.
>
> When you write that FRR would program flows on OVS bridge, do you have
> something specific in mind? I thought the discussion so far was mostly one
> way BGP announcement with FRR "simply" announcing specific prefixes from
> the chassis nodes. Do you have something more in mind, like programming
> routes received from BGP router into OVN?
>

That's what I also had in mind, i.e. "announcing specific prefixes from
the chassis nodes". I'd leave the route-importing part for a later stage
if that's really a requirement.

For the announcing part, let's say we try to remove the kernel as much as
we can based on the discussion on this thread, then we're left with:

- Route announcement:
  - Configure some loopback IP address in OVN per hypervisor which is going
to be the nexthop of all routes announced from that chassis
  - Configuring OVN to tell which prefixes to announce - CMS responsibility
and some knob added into OVN NB as Mark suggests
  - OVN to talk to FRR somehow (gRPC?) to advertise the loopback IP as
directly connected and the rest of IPs in the chassis via its loopback IP

- Extra resources/flows:

Similar to [0], we'd need the following (see the sketch after this list):
  - 1 localnet LS per node that will receive the traffic directed to the
OVN loopback address
  - 1 gateway LR per node responsible for routing the traffic within the
node
  - 1 transit LS per node that connects the previous gateway to the
infrastructure router
  - 1 infrastructure LR to which all LS that require to expose routes will
attach to (consuming one IP address from each subnet exposed)
WARNING: scale!!
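
A very rough per-node sketch of those resources (names and networks are
hypothetical); every item has to be repeated for each hypervisor, which is
where the scale concern comes from:

ovn-nbctl ls-add ls-local-node1        # localnet LS for node1
ovn-nbctl lsp-add ls-local-node1 ln-node1 \
    -- lsp-set-type ln-node1 localnet \
    -- lsp-set-addresses ln-node1 unknown \
    -- lsp-set-options ln-node1 network_name=physnet1
ovn-nbctl lr-add lr-gw-node1           # gateway LR for node1
ovn-nbctl ls-add ls-transit-node1      # transit LS for node1
# ...plus the router ports and "router"-type switch ports that stitch
# ls-local-node1 <-> lr-gw-node1 <-> ls-transit-node1 <-> the shared infra LR.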

My question earlier in this thread was who is responsible for creating all
these resources. If we don't want to put the burden of this on the CMS, are
you proposing that OVN do it 'under the hood'? What about the IP addresses
that we'd be consuming from the tenants? Maybe if it is done under the hood,
that's not required and we can do it in OpenFlow some other way. Is this
what you mean?

IMO, it is very complicated, but maybe it brings a lot of value to OVN. The
benefit of the PoC approach is that we can use BGP (almost) today without
any changes to OVN or Neutron, but I am certainly open to discussing the
'native' way more in depth!

[0]
https://raw.githubusercontent.com/ovn-org/ovn-kubernetes/master/docs/design/current_ovn_topology.svg

- Physical interfaces:
The PoC was under the assumption that all the compute nodes will have a
default ECMP route in the form of: 0.0.0.0 via nic1 via nic2
If we want to match this, we probably need to add OpenFlow rules to the
provider bridge and add both NICs to it.
If the NICs are used as well for control plane, management, storage or
whatever, we are creating a dependency on OVS for all that. I believe that
it is fine to lose the data plane if OVS goes down but not everything else.
Somebody may have a suggestion here though.
If not, we still need the kernel to do some steering 

Re: [ovs-discuss] BGP EVPN support

2021-03-16 Thread Daniel Alvarez Sanchez
On Tue, Mar 16, 2021 at 3:20 PM Krzysztof Klimonda <
kklimo...@syntaxhighlighted.com> wrote:

>
> On Tue, Mar 16, 2021, at 14:45, Luis Tomas Bolivar wrote:
>
> Of course we are fully open to redesign it if there is a better approach!
> And that was indeed the intention when linking to the current efforts,
> figure out if that was a "valid" way of doing it, and how it can be
> improved/redesigned. The main idea behind the current design was not to
> need modifications to core OVN as well as to minimize the complexity, i.e.,
> not having to implement another kind of controller for managing the extra
> OF flows.
>
> Regarding the metadata/localport, I have a couple of questions, mainly due
> to me not knowing enough about ovn/localport:
> 1) Isn't the metadata managed through a namespace? And the end of the day
> that is also visible from the hypervisor, as well as the OVS bridges
>
>
> Indeed, that's true - you can reach tenant's network from ovnmeta-
> namespace (where metadata proxy lives), however from what I remember while
> testing you can only establish connection to VMs running on the same
> hypervisor. Granted, this is less about "hardening" per se - any potential
> takeover of the hypervisor is probably giving the attacker enough tools to
> own entire overlay network anyway. Perhaps it's just giving me a bad
> feeling, where what should be an isolated public facing network can be
> reached from hypervisor without going through expected network path.
>
> 2) Another difference is that we are using BGP ECMP and therefore not
> associating any nic/bond to br-ex, and that is why we require some
> rules/routes to redirect the traffic to br-ex.
>
>
> That's an interesting problem  - I wonder if that can even be done in OVS
> today (for example with multipath action) and how would ovs handle incoming
> traffic (what flows are needed to handle that properly). I guess someone
> with OVS internals knowledge would have to chime in on this one.
>

OVN supports ECMP since 20.03 [0] and some enhancement for rerouting
policies has been added recently [1] so yeah it can be done in OVS as well
AFAIU.

[0] https://github.com/ovn-org/ovn/blob/master/NEWS#L113
[1] https://github.com/ovn-org/ovn/blob/master/NEWS#L12
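
A minimal sketch of what that looks like in OVN (the router name and next
hops are placeholders; assumes a recent ovn-nbctl with the --ecmp flag):

ovn-nbctl --ecmp lr-route-add lr0 0.0.0.0/0 172.16.0.1
ovn-nbctl --ecmp lr-route-add lr0 0.0.0.0/0 172.16.0.2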

>
> Thanks for your input! Really appreciated!
>
> Cheers,
> Luis
>
> On Tue, Mar 16, 2021 at 2:22 PM Krzysztof Klimonda <
> kklimo...@syntaxhighlighted.com> wrote:
>
>
> Would it make more sense to reverse this part of the design? I was
> thinking of having each chassis its own IPv4/IPv6 address used for next-hop
> in announcements and OF flows installed to direct BGP control packets over
> to the host system, in a similar way how localport is used today for
> neutron's metadata service (although I'll admit that I haven't looked into
> how this integrates with dpdk and offload).
>
>
> This way we can also simplify host's networking configuration as extra
> routing rules and arp entries are no longer needed (I think it would be
> preferable, from security perspective, for hypervisor to not have a direct
> access to overlay networks which seems to be the case when you use rules
> like that).
>
> --
>   Krzysztof Klimonda
>   kklimo...@syntaxhighlighted.com
>
>
>
> On Tue, Mar 16, 2021, at 13:56, Luis Tomas Bolivar wrote:
>
> Hi Krzysztof,
>
> On Tue, Mar 16, 2021 at 12:54 PM Krzysztof Klimonda <
> kklimo...@syntaxhighlighted.com> wrote:
>
>
> Hi Luis,
>
> I haven't yet had time to give it a try in our lab, but from reading your
> blog posts I have a quick question. How does it work when either DPDK or
> NIC offload is used for OVN traffic? It seems you are (de-)encapsulating
> traffic on chassis nodes by routing them through kernel - is this current
> design or just an artifact of PoC code?
>
>
> You are correct, that is a limitation as we are using kernel routing for
> N/S traffic, so DPDK/NIC offloading could not be used. That said, the E/W
> traffic still uses the OVN overlay and Geneve tunnels.
>
>
>
>
> --
>   Krzysztof Klimonda
>   kklimo...@syntaxhighlighted.com
>
>
>
> On Mon, Mar 15, 2021, at 11:29, Luis Tomas Bolivar wrote:
>
> Hi Sergey, all,
>
> In fact we are working on a solution based on FRR where a (python) agent
> reads from OVN SB DB (port binding events) and triggers FRR so that the
> needed routes gets advertised. It leverages kernel networking to redirect
> the traffic to the OVN overlay, and therefore does not require any
> modifications to ovn itself (at least for now). The PoC code can be found
> here: https://github.com/luis5tb/bgp-agent
>
> And there is a series of blog posts related to how to use it on OpenStack
> and how it works:
> - OVN-BGP agent introduction:
> https://ltomasbo.wordpress.com/2021/02/04/openstack-networking-with-bgp/
> - How to set ip up on DevStack Environment:
> https://ltomasbo.wordpress.com/2021/02/04/ovn-bgp-agent-testing-setup/
> - In-depth traffic flow inspection:
> https://ltomasbo.wordpress.com/2021/02/04/ovn-bgp-agent-in-depth-traffic-flow-inspection/

Re: [ovs-discuss] BGP EVPN support

2021-03-16 Thread Daniel Alvarez Sanchez
On Tue, Mar 16, 2021 at 2:45 PM Luis Tomas Bolivar 
wrote:

> Of course we are fully open to redesign it if there is a better approach!
> And that was indeed the intention when linking to the current efforts,
> figure out if that was a "valid" way of doing it, and how it can be
> improved/redesigned. The main idea behind the current design was not to
> need modifications to core OVN as well as to minimize the complexity, i.e.,
> not having to implement another kind of controller for managing the extra
> OF flows.
>
> Regarding the metadata/localport, I have a couple of questions, mainly due
> to me not knowing enough about ovn/localport:
> 1) Isn't the metadata managed through a namespace? And the end of the day
> that is also visible from the hypervisor, as well as the OVS bridges
> 2) Another difference is that we are using BGP ECMP and therefore not
> associating any nic/bond to br-ex, and that is why we require some
> rules/routes to redirect the traffic to br-ex.
>
> Thanks for your input! Really appreciated!
>
> Cheers,
> Luis
>
> On Tue, Mar 16, 2021 at 2:22 PM Krzysztof Klimonda <
> kklimo...@syntaxhighlighted.com> wrote:
>
>> Would it make more sense to reverse this part of the design? I was
>> thinking of having each chassis its own IPv4/IPv6 address used for next-hop
>> in announcements and OF flows installed to direct BGP control packets over
>> to the host system, in a similar way how localport is used today for
>> neutron's metadata service (although I'll admit that I haven't looked into
>> how this integrates with dpdk and offload).
>>
>
Hi Krzysztof, not sure I follow your suggestion but let me see if I do.
With this PoC, the kernel will do:

1) Routing to/from physical interface to OVN
2) Proxy ARP
3) Proxy NDP

Also FRR will advertise directly connected routes based on the IPs
configured on dummy interfaces.
All this comes with the benefit that no changes are required in the CMS or
OVN itself.
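
As a rough sketch of that kernel/FRR side (interface name, address and AS
number are made up here): the agent adds the address to a dummy device and
FRR redistributes connected routes to its BGP peers.

ip link add bgp-nic type dummy
ip addr add 172.24.100.1/32 dev bgp-nic
ip link set bgp-nic up
vtysh -c 'configure terminal' \
      -c 'router bgp 64999' \
      -c 'address-family ipv4 unicast' \
      -c 'redistribute connected'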

If I understand your proposal well, you would like to do 1), 2) and 3) in
OpenFlow, so an agent running on all compute nodes would be
responsible for this? Or do you propose adding extra OVN resources in a
similar way to what ovn-kubernetes does today [0], in which case:

- Create an OVN Gateway router and connect it to the provider Logical Switch
- Advertise host routes through the Gateway Router IP address for each
node. This would consume one IP address per provider network per node
- Some external entity to configure ECMP routing to the ToRs
- Who creates/manages the infra resources? Onboarding new hypervisors
requires IPAM and more
- OpenStack provides flexibility to its users to customize their own
networking (more than ovn-kubernetes I believe). Mixing user created
network resources with infra resources in the same OVN cluster is non
trivial (eg. maintenance tasks, migration to OVN, ...)
- Scaling issues due to the larger number of resources/flows?

[0]
https://raw.githubusercontent.com/ovn-org/ovn-kubernetes/master/docs/design/current_ovn_topology.svg

This way we can also simplify host's networking configuration as extra
>> routing rules and arp entries are no longer needed (I think it would be
>> preferable, from security perspective, for hypervisor to not have a direct
>> access to overlay networks which seems to be the case when you use rules
>> like that).
>>
>
I agree that it'd simplify the host networking, but it would
overcomplicate the rest (unless I'm missing something, which is more than
possible :)

Thanks a lot for the discussion,
Daniel


>
>> --
>>   Krzysztof Klimonda
>>   kklimo...@syntaxhighlighted.com
>>
>>
>>
>> On Tue, Mar 16, 2021, at 13:56, Luis Tomas Bolivar wrote:
>>
>> Hi Krzysztof,
>>
>> On Tue, Mar 16, 2021 at 12:54 PM Krzysztof Klimonda <
>> kklimo...@syntaxhighlighted.com> wrote:
>>
>>
>> Hi Luis,
>>
>> I haven't yet had time to give it a try in our lab, but from reading your
>> blog posts I have a quick question. How does it work when either DPDK or
>> NIC offload is used for OVN traffic? It seems you are (de-)encapsulating
>> traffic on chassis nodes by routing them through kernel - is this current
>> design or just an artifact of PoC code?
>>
>>
>> You are correct, that is a limitation as we are using kernel routing for
>> N/S traffic, so DPDK/NIC offloading could not be used. That said, the E/W
>> traffic still uses the OVN overlay and Geneve tunnels.
>>
>>
>>
>>
>> --
>>   Krzysztof Klimonda
>>   kklimo...@syntaxhighlighted.com
>>
>>
>>
>> On Mon, Mar 15, 2021, at 11:29, Luis Tomas Bolivar wrote:
>>
>> Hi Sergey, all,
>>
>> In fact we are working on a solution based on FRR where a (python) agent
>> reads from OVN SB DB (port binding events) and triggers FRR so that the
>> needed routes gets advertised. It leverages kernel networking to redirect
>> the traffic to the OVN overlay, and therefore does not require any
>> modifications to ovn itself (at least for now). The PoC code can be found
>> here: https://github.com/luis5tb/bgp-agent
>>
>> And 

[ovs-discuss] [OVN] Should we tunnel traffic on localnet switches?

2021-02-15 Thread Daniel Alvarez Sanchez
Hi folks,

Recently we found out that, due to a misconfiguration of the OVN bridge
mappings, traffic that should have been sent out to an external bridge was
tunneled to the destination. Since the traffic kept working, it took a while
to spot the misconfiguration.

While this can be OK as it keeps everything functional, it can have an
impact on throughput and overall performance. The intent of this
email is to gather feedback as to whether we should keep this behavior or
rather drop the traffic and log the misconfiguration issue (e.g. "patch
port is missing, review the bridge mappings configuration").
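
For reference, the configuration in question is the per-chassis mapping below
(a sketch; the bridge and physnet names are placeholders). When it is missing
or wrong, the localnet patch port is not created and, today, the traffic ends
up tunneled rather than dropped:

ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings="physnet1:br-ex"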

Looking forward to hearing from you.
Thanks a lot,
daniel
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] OVN Dynamic Routing

2021-01-18 Thread Daniel Alvarez Sanchez
Thanks a lot, Ankur, for your responses. Now it's much clearer to me :)

Is this patch [0] related to this effort in any way? I wonder if it would, for
example, allow having gateway ports bound to a particular chassis and, on
the same router, another gateway port which is not bound, so that N/S
traffic can be processed in a distributed fashion for this use case.

Thanks again!
daniel

[0] https://mail.openvswitch.org/pipermail/ovs-dev/2021-January/379446.html

On Sat, Jan 9, 2021 at 5:35 AM Ankur Sharma 
wrote:

> Hi Daniel,
>
> Glad to see your interest and queries. Please see responses below:
>
> Q1) Who is responsible for creating the VTEP endpoints on each hypervisor?
> Are they assumed to be created in advance or somehow this solution will
> take care of it? If the latter, how will it work and how will 'ovn-routing'
> know the addresses of the endpoints? OVN VTEP gateways?
>
> [ANKUR]: VTEP information is expected to be added out of band. For
> example, in our case the external gateway VTEP endpoint is added as a
> chassis by the management plane.
>
> 2) In the diagram at [0], what's the 'MAC ROUTER'? Is this OVN Logical
> Router connected to a Logical Switch with a localnet port and this MAC
> address corresponds to such port in the router? Or it would be the MAC
> address of '10.0.0.1'. What if two VMs in the same LS reside on different
> hypervisors, would you still advertise the same MAC but use a different VNI?
>
> With OVN routers being distributed we'd have the same MAC address
> advertised on multiple HVs and we need to use different VNIs to distinguish
> them, right?
> [ANKUR]: MAC ROUTER is the former, i.e OVN logical router connected to
> transit logical switch (please note that for NS connectivity in this case
> logical switch need not have localnet port, since we are not converting the
> packet to a VLAN packet). For all the VMs behind the logical router the
> advertised VNI is same and it is that of transit logical switch.
>
> Transit logical switch is the switch that connects OVN logical router with
> external router.
>
>
>
> 3) If two OVN VMs want to reach each other, it will still use the Geneve
> overlay right? This whole solution is mainly for incoming traffic or I'm
> missing something?
> [ANKUR]: Yes, now the workflow is that for EW traffic it is regular geneve
> encap and for NS instead of converting the packet to VLAN we forward it to
> external gateway using VXLAN and hence remove the requirement of a gateway
> chassis for NO NAT Cases.
>
>
> Please feel free to let us know, if you have further queries.
>
>
> Regards,
> Ankur
> --
> *From:* Greg Smith 
> *Sent:* Thursday, January 7, 2021 8:38 AM
> *To:* Daniel Alvarez Sanchez ; Ankur Sharma <
> ankur.sha...@nutanix.com>; Greg A. Smith 
> *Cc:* Frode Nordahl ; ovs-discuss <
> ovs-discuss@openvswitch.org>
> *Subject:* Re: [ovs-discuss] OVN Dynamic Routing
>
>
> + Greg A Smith
>
>
>
> *From: *Daniel Alvarez Sanchez 
> *Date: *Thursday, January 7, 2021 at 4:17 AM
> *To: *Ankur Sharma 
> *Cc: *Frode Nordahl , Greg Smith <
> g...@nutanix.com>, ovs-discuss 
> *Subject: *Re: [ovs-discuss] OVN Dynamic Routing
>
>
>
> Thanks Ankur, all for the presentation and slides.
>
>
>
> If I may, I have some questions regarding the proposed solution:
>
>
>
> 1) Who is responsible for creating the VTEP endpoints on each hypervisor?
> Are they assumed to be created in advance or somehow this solution will
> take care of it? If the latter, how will it work and how will 'ovn-routing'
> know the addresses of the endpoints? OVN VTEP gateways?
>
>
>
> 2) In the diagram at [0], what's the 'MAC ROUTER'? Is this OVN Logical
> Router connected to a Logical Switch with a localnet port and this MAC
> address corresponds to such port in the router? Or it would be the MAC
> address of '10.0.0.1'. What if two VMs in the same LS reside on different
> hypervisors, would you still advertise the same MAC but use a different VNI?
>
> With OVN routers being distributed we'd have the same MAC address
> advertised on multiple HVs and we need to use different VNIs to distinguish
> them, right?
>
>
>
> 3) If two OVN VMs want to reach each other, it will still use the Geneve
> overlay right? This whole solution is mainly for incoming traffic or I'm
> missing something?
>
>
>
> I'm sorry if the questions are a bit blurry but I guess that after
> reviewing the slides and recording I didn't quite grasp it :)
>
>
>
> Thanks a lot in advance!
>
> daniel
>
>
>
> [0] https://youtu.be/9DL8M1d4xLY?t=330

Re: [ovs-discuss] OVN Dynamic Routing

2021-01-07 Thread Daniel Alvarez Sanchez
Thanks Ankur, all for the presentation and slides.

If I may, I have some questions regarding the proposed solution:

1) Who is responsible for creating the VTEP endpoints on each hypervisor?
Are they assumed to be created in advance or somehow this solution will
take care of it? If the latter, how will it work and how will 'ovn-routing'
know the addresses of the endpoints? OVN VTEP gateways?

2) In the diagram at [0], what's the 'MAC ROUTER'? Is this OVN Logical
Router connected to a Logical Switch with a localnet port and this MAC
address corresponds to such port in the router? Or it would be the MAC
address of '10.0.0.1'. What if two VMs in the same LS reside on different
hypervisors, would you still advertise the same MAC but use a different VNI?
With OVN routers being distributed we'd have the same MAC address
advertised on multiple HVs and we need to use different VNIs to distinguish
them, right?

3) If two OVN VMs want to reach each other, it will still use the Geneve
overlay right? This whole solution is mainly for incoming traffic or I'm
missing something?

I'm sorry if the questions are a bit blurry but I guess that after
reviewing the slides and recording I didn't quite grasp it :)

Thanks a lot in advance!
daniel

[0] https://youtu.be/9DL8M1d4xLY?t=330


On Mon, Dec 14, 2020 at 8:25 PM Ankur Sharma 
wrote:

> Hi Frode,
>
> Glad to see your message.
> Yes, while we started with EVPN as our main use case, we agree that it is
> more of a generic dynamic routing capability in OVN.
>
> Sure, we will kickstart the discussions around this on mailing list as
> well.
>
>
> Thanks
>
> Regards,
> Ankur
> --
> *From:* Frode Nordahl 
> *Sent:* Thursday, December 10, 2020 1:10 AM
> *To:* Ankur Sharma 
> *Cc:* Greg Smith ; ovs-discuss <
> ovs-discuss@openvswitch.org>
> *Subject:* OVN Dynamic Routing
>
> Hello, Ankur, Greg, All,
>
> Thank you for sharing your view on dynamic routing support for OVN
> during OVSCON 2020 [0].
>
> I believe this is a topic that interests multiple parties in the
> community, and it applies to multiple topologies/use cases, not just
> EVPN.
>
> Would you be interested in presenting and discussing the proposed
> design on the mailing list?
>
> 0:
> https://www.openvswitch.org/support/ovscon2020/
>
> --
> Frode Nordahl
>
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] [ovn] Broken ovs localport flow for ovnmeta namespaces created by neutron

2020-12-17 Thread Daniel Alvarez Sanchez
On Tue, Dec 15, 2020 at 11:39 AM Krzysztof Klimonda <
kklimo...@syntaxhighlighted.com> wrote:

> Hi,
>
> Just as a quick update - I've updated our ovn version to 20.12.0 snapshot
> (d8bc0377c) and so far the problem hasn't yet reoccurred after over 24
> hours of tempest testing.
>

We could reproduce the issue with 20.12 and master. Also, this is not
exclusively related to localports but potentially to any port.
Dumitru posted a fix for this:

http://patchwork.ozlabs.org/project/ovn/patch/1608197000-637-1-git-send-email-dce...@redhat.com/

Thanks!
daniel

>
> Best Regards,
> -Chris
>
>
> On Tue, Dec 15, 2020, at 11:13, Daniel Alvarez Sanchez wrote:
>
> Hey Krzysztof,
>
> On Fri, Nov 20, 2020 at 1:17 PM Krzysztof Klimonda <
> kklimo...@syntaxhighlighted.com> wrote:
>
> Hi,
>
> Doing some tempest runs on our pre-prod environment (stable/ussuri with
> ovn 20.06.2 release) I've noticed that some network connectivity tests were
> failing randomly. I've reproduced that by conitnously rescuing and
> unrescuing instance - network connectivity from and to VM works in general
> (dhcp is fine, access from outside is fine), however VM has no access to
> its metadata server (via 169.254.169.254 ip address). Tracing packet from
> VM to metadata via:
>
> 8<8<8<
> ovs-appctl ofproto/trace br-int
> in_port=tapa489d406-91,dl_src=fa:16:3e:2c:b0:fd,dl_dst=fa:16:3e:8b:b5:39
> 8<8<8<
>
> ends with
>
> 8<8<8<
> 65. reg15=0x1,metadata=0x97e, priority 100, cookie 0x15ec4875
> output:1187
>  >> Nonexistent output port
> 8<8<8<
>
> And I can verify that there is no flow for the actual ovnmeta tap
> interface (tap67731b0a-c0):
>
> 8<8<8<
> # docker exec -it openvswitch_vswitchd ovs-ofctl dump-flows br-int |grep
> -E output:'("tap67731b0a-c0"|1187)'
>  cookie=0x15ec4875, duration=1868.378s, table=65, n_packets=524,
> n_bytes=40856, priority=100,reg15=0x1,metadata=0x97e actions=output:1187
> #
> 8<8<8<
>
> From ovs-vswitchd.log it seems the interface tap67731b0a-c0 was added with
> index 1187, then deleted, and re-added with index 1189 - that's probably
> due to the fact that that is the only VM in that network and I'm constantly
> hard rebooting it via rescue/unrescue:
>
> 8<8<8<
> 2020-11-20T11:41:18.347Z|08043|bridge|INFO|bridge br-int: added interface
> tap67731b0a-c0 on port 1187
> 2020-11-20T11:41:30.813Z|08044|bridge|INFO|bridge br-int: deleted
> interface tapa489d406-91 on port 1186
> 2020-11-20T11:41:30.816Z|08045|bridge|WARN|could not open network device
> tapa489d406-91 (No such device)
> 2020-11-20T11:41:31.040Z|08046|bridge|INFO|bridge br-int: deleted
> interface tap67731b0a-c0 on port 1187
> 2020-11-20T11:41:31.044Z|08047|bridge|WARN|could not open network device
> tapa489d406-91 (No such device)
> 2020-11-20T11:41:31.050Z|08048|bridge|WARN|could not open network device
> tapa489d406-91 (No such device)
> 2020-11-20T11:41:31.235Z|08049|connmgr|INFO|br-int<->unix#31: 2069
> flow_mods in the last 43 s (858 adds, 814 deletes, 397 modifications)
> 2020-11-20T11:41:33.057Z|08050|bridge|INFO|bridge br-int: added interface
> tapa489d406-91 on port 1188
> 2020-11-20T11:41:33.582Z|08051|bridge|INFO|bridge br-int: added interface
> tap67731b0a-c0 on port 1189
> 2020-11-20T11:42:31.235Z|08052|connmgr|INFO|br-int<->unix#31: 168
> flow_mods in the 2 s starting 59 s ago (114 adds, 10 deletes, 44
> modifications)
> 8<8<8<
>
> Once I restart ovn-controller it recalculates local ovs flows and the
> problem is fixed so I'm assuming it's a local problem and not related to NB
> and SB databases.
>
>
> I have seen exactly the same with 20.09: for the same port, the input and
> output ofports do not match:
>
> bash-4.4# ovs-ofctl dump-flows br-int table=0 | grep 745
>  cookie=0x38937d8e, duration=40387.372s, table=0, n_packets=1863,
> n_bytes=111678, idle_age=1, priority=100,in_port=745
> actions=load:0x4b->NXM_NX_REG13[],load:0x6a->NXM_NX_REG11[],load:0x69->NXM_NX_REG12[],load:0x18d->OXM_OF_METADATA[],load:0x1->NXM_NX_REG14[],resubmit(,8)
>
>
> bash-4.4# ovs-ofctl dump-flows br-int table=65 | grep 8937d8e
>  cookie=0x38937d8e, duration=40593.699s, table=65, n_packets=1848,
> n_bytes=98960, idle_age=2599, priority=100,reg15=0x1,metadata=0x18d
> actions=output:737
>
>
> In table=0, the ofport is fine (745) but in the output stage it is using a
> different one (737).
>
> By checking the OVS database transaction history, that port, at some
> point, had the 

Re: [ovs-discuss] [ovn] Broken ovs localport flow for ovnmeta namespaces created by neutron

2020-12-15 Thread Daniel Alvarez Sanchez
Hey Krzysztof,

On Fri, Nov 20, 2020 at 1:17 PM Krzysztof Klimonda <
kklimo...@syntaxhighlighted.com> wrote:

> Hi,
>
> Doing some tempest runs on our pre-prod environment (stable/ussuri with
> ovn 20.06.2 release) I've noticed that some network connectivity tests were
> failing randomly. I've reproduced that by conitnously rescuing and
> unrescuing instance - network connectivity from and to VM works in general
> (dhcp is fine, access from outside is fine), however VM has no access to
> its metadata server (via 169.254.169.254 ip address). Tracing packet from
> VM to metadata via:
>
> 8<8<8<
> ovs-appctl ofproto/trace br-int
> in_port=tapa489d406-91,dl_src=fa:16:3e:2c:b0:fd,dl_dst=fa:16:3e:8b:b5:39
> 8<8<8<
>
> ends with
>
> 8<8<8<
> 65. reg15=0x1,metadata=0x97e, priority 100, cookie 0x15ec4875
> output:1187
>  >> Nonexistent output port
> 8<8<8<
>
> And I can verify that there is no flow for the actual ovnmeta tap
> interface (tap67731b0a-c0):
>
> 8<8<8<
> # docker exec -it openvswitch_vswitchd ovs-ofctl dump-flows br-int |grep
> -E output:'("tap67731b0a-c0"|1187)'
>  cookie=0x15ec4875, duration=1868.378s, table=65, n_packets=524,
> n_bytes=40856, priority=100,reg15=0x1,metadata=0x97e actions=output:1187
> #
> 8<8<8<
>
> From ovs-vswitchd.log it seems the interface tap67731b0a-c0 was added with
> index 1187, then deleted, and re-added with index 1189 - that's probably
> due to the fact that that is the only VM in that network and I'm constantly
> hard rebooting it via rescue/unrescue:
>
> 8<8<8<
> 2020-11-20T11:41:18.347Z|08043|bridge|INFO|bridge br-int: added interface
> tap67731b0a-c0 on port 1187
> 2020-11-20T11:41:30.813Z|08044|bridge|INFO|bridge br-int: deleted
> interface tapa489d406-91 on port 1186
> 2020-11-20T11:41:30.816Z|08045|bridge|WARN|could not open network device
> tapa489d406-91 (No such device)
> 2020-11-20T11:41:31.040Z|08046|bridge|INFO|bridge br-int: deleted
> interface tap67731b0a-c0 on port 1187
> 2020-11-20T11:41:31.044Z|08047|bridge|WARN|could not open network device
> tapa489d406-91 (No such device)
> 2020-11-20T11:41:31.050Z|08048|bridge|WARN|could not open network device
> tapa489d406-91 (No such device)
> 2020-11-20T11:41:31.235Z|08049|connmgr|INFO|br-int<->unix#31: 2069
> flow_mods in the last 43 s (858 adds, 814 deletes, 397 modifications)
> 2020-11-20T11:41:33.057Z|08050|bridge|INFO|bridge br-int: added interface
> tapa489d406-91 on port 1188
> 2020-11-20T11:41:33.582Z|08051|bridge|INFO|bridge br-int: added interface
> tap67731b0a-c0 on port 1189
> 2020-11-20T11:42:31.235Z|08052|connmgr|INFO|br-int<->unix#31: 168
> flow_mods in the 2 s starting 59 s ago (114 adds, 10 deletes, 44
> modifications)
> 8<8<8<
>
> Once I restart ovn-controller it recalculates local ovs flows and the
> problem is fixed so I'm assuming it's a local problem and not related to NB
> and SB databases.
>
>
I have seen exactly the same with 20.09: for the same port, the input and
output ofports do not match:

bash-4.4# ovs-ofctl dump-flows br-int table=0 | grep 745
 cookie=0x38937d8e, duration=40387.372s, table=0, n_packets=1863,
n_bytes=111678, idle_age=1, priority=100,in_port=745
actions=load:0x4b->NXM_NX_REG13[],load:0x6a->NXM_NX_REG11[],load:0x69->NXM_NX_REG12[],load:0x18d->OXM_OF_METADATA[],load:0x1->NXM_NX_REG14[],resubmit(,8)


bash-4.4# ovs-ofctl dump-flows br-int table=65 | grep 8937d8e
 cookie=0x38937d8e, duration=40593.699s, table=65, n_packets=1848,
n_bytes=98960, idle_age=2599, priority=100,reg15=0x1,metadata=0x18d
actions=output:737


In table=0, the ofport is fine (745) but in the output stage it is using a
different one (737).

By checking the OVS database transaction history, that port, at some point,
had the id 737:

record 6516: 2020-12-14 22:22:54.184

  table Interface row "tap71a5dfc1-10" (073801e2):
ofport=737
  table Open_vSwitch row 1d9566c8 (1d9566c8):
cur_cfg=2023

So it looks like ovn-controller is not updating the ofport in the physical
flows for the output stage.
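
A quick way to spot the mismatch on an affected chassis (a sketch, using the
interface and cookie from the example above):

ovs-vsctl get Interface tap71a5dfc1-10 ofport
ovs-ofctl dump-flows br-int table=65 | grep 0x38937d8e
# If the ofport reported by ovs-vsctl differs from the port in the "output:"
# action of the table 65 flow, the chassis is hitting this problem.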

We'll try to figure out if this happens also in master.

Thanks,
daniel


> --
>   Krzysztof Klimonda
>   kklimo...@syntaxhighlighted.com
>
>
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] [OVN] Too many resubmits for packets coming from "external" network

2020-09-29 Thread Daniel Alvarez Sanchez
On Tue, Sep 29, 2020 at 1:14 PM Dumitru Ceara  wrote:

> On 9/29/20 1:07 PM, Krzysztof Klimonda wrote:
> > On Tue, Sep 29, 2020, at 12:40, Dumitru Ceara wrote:
> >> On 9/29/20 12:14 PM, Daniel Alvarez Sanchez wrote:
> >>>
> >>>
> >>> On Tue, Sep 29, 2020 at 11:14 AM Krzysztof Klimonda
> >>>  >>> <mailto:kklimo...@syntaxhighlighted.com>> wrote:
> >>>
> >>> On Tue, Sep 29, 2020, at 10:40, Dumitru Ceara wrote:
> >>> > On 9/29/20 12:42 AM, Krzysztof Klimonda wrote:
> >>> > > Hi Dumitru,
> >>> > >
> >>> > > This cluster is IPv4-only for now - there are no IPv6 networks
> >>> defined at all - overlay or underlay.
> >>> > >
> >>> > > However, once I increase a number of routers to ~250, a similar
> >>> behavior can be observed when I send ARP packets for non-existing
> >>> IPv4 addresses. The following warnings will flood ovs-vswitchd.log
> >>> for every address not known to OVN when I run `fping -g
> >>> 192.168.0.0/16` <http://192.168.0.0/16> <http://192.168.0.0/16>:
> >>> > >
> >>> > > ---8<---8<---8<---
> >>> > >
> >>>
>  2020-09-28T22:26:40.967Z|21996|ofproto_dpif_xlate(handler6)|WARN|over 4096
> >>> resubmit actions on bridge br-int while processing
> >>>
>  
> arp,in_port=1,vlan_tci=0x,dl_src=fa:16:3e:75:38:be,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=192.168.0.1,arp_tpa=192.168.0.35,arp_op=1,arp_sha=fa:16:3e:75:38:be,arp_tha=00:00:00:00:00:00
> >>> > > ---8<---8<---8<---
> >>> > >
> >>> > > This is even a larger concern for me, as some of our clusters
> >>> would be exposed to the internet where we can't easily prevent
> >>> scanning of an entire IP range.
> >>> > >
> >>> > > Perhaps this is something that should be handled differently
> for
> >>> traffic coming from external network? Is there any reason why OVN
> is
> >>> not dropping ARP requests and IPv6 ND for IP addresses it knows
> >>> nothing about? Or maybe OVN should drop most of BUM traffic on
> >>> external network in general? I think all this network is used for
> is
> >>> SNAT and/or SNAT+DNAT for overlay networks.
> >>> > >
> >>> >
> >>> > Ok, so I guess we need a combination of the existing broadcast
> domain
> >>> > limiting options:
> >>> >
> >>> > 1. send ARP/NS packets only to router ports that own the target
> IP
> >>> address.
> >>> > 2. flood IPv6 ND RS packets only to router ports with IPv6
> addresses
> >>> > configured and ipv6_ra_configs.address_mode set.
> >>> > 3. according to the logical switch multicast configuration either
> >>> flood
> >>> > unkown IP multicast or forward it only to hosts that registered
> >>> for the
> >>> > IP multicast group.
> >>> > 4. drop all other BUM traffic.
> >>> >
> >>> > From the above, 1 and 3 are already implemented. 2 is what I
> suggested
> >>> > earlier. 4 would probably turn out to be configuration option
> that
> >>> needs
> >>> > to be explicitly enabled on the logical switch connected to the
> >>> external
> >>> > network.
> >>> >
> >>> > Would this work for you?
> >>>
> >>> I believe it would work for me, although it may be a good idea to
> >>> consult with neutron developers and see if they have any input on
> that.
> >>>
> >>>
> >>> I think that's a good plan. Implementing 4) via a configuration option
> >>> sounds smart. From an OpenStack point of view, I think that as all the
> >>> ports are known, we can just have it on by default.
> >>> We need to make sure it works for 'edge' cases like virtual ports, load
> >>> balancers and subports (ports with a parent port and a tag) but the
> idea
> >>> sounds great to me.
> >>>
> >>> Thanks folks for the discussion!
> >>
> >> Thinking more about it it's probably not OK to drop all other BUM
> >> traffic. Instead

Re: [ovs-discuss] [OVN] Too many resubmits for packets coming from "external" network

2020-09-29 Thread Daniel Alvarez Sanchez
On Tue, Sep 29, 2020 at 11:14 AM Krzysztof Klimonda <
kklimo...@syntaxhighlighted.com> wrote:

> On Tue, Sep 29, 2020, at 10:40, Dumitru Ceara wrote:
> > On 9/29/20 12:42 AM, Krzysztof Klimonda wrote:
> > > Hi Dumitru,
> > >
> > > This cluster is IPv4-only for now - there are no IPv6 networks defined
> at all - overlay or underlay.
> > >
> > > However, once I increase a number of routers to ~250, a similar
> behavior can be observed when I send ARP packets for non-existing IPv4
> addresses. The following warnings will flood ovs-vswitchd.log for every
> address not known to OVN when I run `fping -g 192.168.0.0/16`
> :
> > >
> > > ---8<---8<---8<---
> > > 2020-09-28T22:26:40.967Z|21996|ofproto_dpif_xlate(handler6)|WARN|over
> 4096 resubmit actions on bridge br-int while processing
> arp,in_port=1,vlan_tci=0x,dl_src=fa:16:3e:75:38:be,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=192.168.0.1,arp_tpa=192.168.0.35,arp_op=1,arp_sha=fa:16:3e:75:38:be,arp_tha=00:00:00:00:00:00
> > > ---8<---8<---8<---
> > >
> > > This is even a larger concern for me, as some of our clusters would be
> exposed to the internet where we can't easily prevent scanning of an entire
> IP range.
> > >
> > > Perhaps this is something that should be handled differently for
> traffic coming from external network? Is there any reason why OVN is not
> dropping ARP requests and IPv6 ND for IP addresses it knows nothing about?
> Or maybe OVN should drop most of BUM traffic on external network in
> general? I think all this network is used for is SNAT and/or SNAT+DNAT for
> overlay networks.
> > >
> >
> > Ok, so I guess we need a combination of the existing broadcast domain
> > limiting options:
> >
> > 1. send ARP/NS packets only to router ports that own the target IP
> address.
> > 2. flood IPv6 ND RS packets only to router ports with IPv6 addresses
> > configured and ipv6_ra_configs.address_mode set.
> > 3. according to the logical switch multicast configuration either flood
> > unkown IP multicast or forward it only to hosts that registered for the
> > IP multicast group.
> > 4. drop all other BUM traffic.
> >
> > From the above, 1 and 3 are already implemented. 2 is what I suggested
> > earlier. 4 would probably turn out to be configuration option that needs
> > to be explicitly enabled on the logical switch connected to the external
> > network.
> >
> > Would this work for you?
>
> I believe it would work for me, although it may be a good idea to consult
> with neutron developers and see if they have any input on that.
>

I think that's a good plan. Implementing 4) via a configuration option
sounds smart. From an OpenStack point of view, I think that as all the
ports are known, we can just have it on by default.
We need to make sure it works for 'edge' cases like virtual ports, load
balancers and subports (ports with a parent port and a tag) but the idea
sounds great to me.

Thanks folks for the discussion!

>
> >
> > Thanks,
> > Dumitru
> >
> > > -- Krzysztof Klimonda kklimo...@syntaxhighlighted.com On Mon, Sep 28,
> > > 2020, at 21:14, Dumitru Ceara wrote:
> > >> On 9/28/20 5:33 PM, Krzysztof Klimonda wrote:
> > >>> Hi,
> > >>>
> > >> Hi Krzysztof,
> > >>
> > >>> We're still doing some scale tests of OpenStack ussuri with ml2/ovn
> driver. We've deployed 140 virtualized compute nodes, and started creating
> routers that share single external network between them. Additionally, each
> router is connected to a private network.
> > >>> Previously[1] we hit a problem of too many logical flows being
> generated per router connected to the same "external" network - this put
> too much stress on ovn-controller and ovs-vswitchd on compute nodes, and
> we've applied a patch[2] to limit a number of logical flows created per
> router.
> > >>> After we dealt with that we've done more testing and created 200
> routers connected to single external network. After that we've noticed the
> following logs in ovs-vswitchd.log:
> > >>>
> > >>> ---8<---8<---8<---
> > >>>
> 2020-09-28T11:10:18.938Z|18401|ofproto_dpif_xlate(handler9)|WARN|over 4096
> resubmit actions on bridge br-int while processing
> icmp6,in_port=1,vlan_tci=0x,dl_src=fa:16:3e:9b:77:c3,dl_dst=33:33:00:00:00:02,ipv6_src=fe80::f816:3eff:fe9b:77c3,ipv6_dst=ff02::2,ipv6_label=0x2564e,nw_tos=0,nw_ecn=0,nw_ttl=255,icmp_type=133,icmp_code=0
> > >>> ---8<---8<---8<---
> > >>>
> > >>> That starts happening after I create ~178 routers connected to the
> same external network.
> > >>>
> > >>> IPv6 RS ICMP packets are coming from the external network - that's
> due to the fact that all virtual compute nodes have IPv6 address on their
> interface used for the external network and are trying to discover a
> gateway. That's by accident, and we can remove IPv6 address from that
> interface, however I'm worried that it would just hide some bigger issue
> with flows generated by OVN.
> > >>>
> > >> Is this an IPv4 cluster; are there IPv6 addresses configured on the
> > >> logical router ports 

Re: [ovs-discuss] [OVN] Packets flooded when using VLAN backed networks

2020-09-17 Thread Daniel Alvarez Sanchez
Thanks Ankur for your reply!

On Sat, Sep 12, 2020 at 4:27 AM Ankur Sharma 
wrote:

> Hi Daniel,
>
> Thanks a lot for starting the thread.
> Yes, you have a valid observation.
>
> "The reason is that, as we translate the eth.src to that of the
> "ovn-chassis-mac-mappings", the ToR will never see a packet whose eth.src
> is either vm1 or vm3 so it'll never learn their addresses and flood the
> traffic to all ports."
>
> The main reason is not chassis-mac-mappings, but the way OVN LR works.
>

Yes, that's right! Sorry for the confusion. Even with the
'reside-on-redirect-chassis' option we'd be observing this as well.

>
> TOPOLOGY:
> LS1  LR LS2
>
> VM1--LS1---CHASSIS1
> VM2--LS2---CHASSIS2
>
>
> Let us say we are pinging from VM1 --> VM2.
>
> a. For routed traffic, we will never have the source mac as the VM mac, i.e.
> once traffic is routed, the source mac will be replaced.
> b. The reason flooding is observed is because of the following:
> i. A typical router will send ARP for the destination endpoint (VM2 in
> the example above) and the corresponding reply will cause initial learning
> of VM MAC (VM2 mac in the example above).
>ii. Similarly, after initial ARP resolution, it will do periodic ARP
> refresh by generating an ARP request (could be a unicast ARP) and again the
> reply will ensure that MAC entry in TOR is refreshed.
> c. Now, OVN LR may not send out ARP request on the wire (because of ARP
> suppression) and even if it does (let us say port just has mac and not the
> IP), then ARP response is not sent on the wire.
> d. Similarly, since there is no periodic ARP refresh, even if
> somehow initial learning happens, the MAC will eventually age out.
>
> We can do either of following:
> a. Assign a chassis to do periodic ARP refreshes.
> i. I started some work around this, but it looked tricky, i.e. besides
> sending the ARP refresh, we should also remove the ARP cache entry for
> which a certain threshold of ARP refreshes keeps failing.
> b. Periodic RARP advertisement of endpoints on localnet logical switches.
>

>
> And like you said, we can use this thread to converge.
>
Thanks Ankur!  I don't have a strong opinion of either of the above, but
it looks like we want to do something about this (maybe configurable) to
avoid situations where we flood large amounts of traffic (for example in
long lived connections or bulk transfers or ...)
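
FWIW, in the vagrant environment I've been using, where the 'ToR' is just a
plain OVS bridge (the bridge name below is specific to that setup), the lack
of learning is easy to confirm by looking at the bridge FDB while the ping
is running, e.g.:

  ovs-appctl fdb/show br-tenant | grep -i '40:44'

If the VM MACs (40:44:...) never show up there, every routed packet towards
them keeps getting flooded.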

>
> Thanks
>
> Regards,
> Ankur
>
> --
> *From:* Daniel Alvarez Sanchez 
> *Sent:* Friday, September 11, 2020 8:57 AM
> *To:* ovs-discuss 
> *Cc:* Ankur Sharma 
> *Subject:* [OVN] Packets flooded when using VLAN backed networks
>
> Hi folks,
>
> This is probably not a bug and not sure if much can be done about it but
> thought of raising it here for discussion.
>
> I have deployed a simple topology with two logical switches (VLAN backed
> network), a logical router and a couple of VMs. When pinging between the
> logical switches, all the traffic is flooded in the upstream switch.
>
> Example using this logical [0] and physical [1] topologies, when pinging
> from vm3 (worker2) to vm1 (worker1) and capturing traffic on host1:
>
> 15:50:18.790323 1e:02:ad:bb:aa:dd > 40:44:00:00:00:01, ethertype 802.1Q
> (0x8100), length 102: vlan 190, p 0, ethertype IPv4, (tos 0x0, ttl 63, id
> 47366, offset 0, flags [DF], proto ICMP (1), length 84)
> 192.168.1.13 > 192.168.0.11: ICMP echo request, id 1671, seq 11, length 64
> 15:50:18.790428 1e:02:ad:bb:aa:77 > 40:44:33:00:00:03, ethertype 802.1Q
> (0x8100), length 102: vlan 170, p 0, ethertype IPv4, (tos 0x0, ttl 63, id
> 44948, offset 0, flags [none], proto ICMP (1), length 84)
> 192.168.0.11 > 192.168.1.13: ICMP echo reply, id 1671, seq 11, length 64
>
> The reason is that, as we translate the eth.src to that of the
> "ovn-chassis-mac-mappings", the ToR will never see a packet whose eth.src
> is either vm1 or vm3 so it'll never learn their addresses and flood the
> traffic to all ports.
>
> In the example above:
>
> [root@worker1 ~]# ovs-vsctl get open .
> external_ids:ovn-chassis-mac-mappings
> "tenant:1e:02:ad:bb:aa:77"
>
> [root@worker

[ovs-discuss] [OVN] Packets flooded when using VLAN backed networks

2020-09-11 Thread Daniel Alvarez Sanchez
Hi folks,

This is probably not a bug and not sure if much can be done about it but
thought of raising it here for discussion.

I have deployed a simple topology with two logical switches (VLAN backed
network), a logical router and a couple of VMs. When pinging between the
logical switches, all the traffic is flooded in the upstream switch.

Example using this logical [0] and physical [1] topologies, when pinging
from vm3 (worker2) to vm1 (worker1) and capturing traffic on host1:

15:50:18.790323 1e:02:ad:bb:aa:dd > 40:44:00:00:00:01, ethertype 802.1Q
(0x8100), length 102: vlan 190, p 0, ethertype IPv4, (tos 0x0, ttl 63, id
47366, offset 0, flags [DF], proto ICMP (1), length 84)
192.168.1.13 > 192.168.0.11: ICMP echo request, id 1671, seq 11, length
64
15:50:18.790428 1e:02:ad:bb:aa:77 > 40:44:33:00:00:03, ethertype 802.1Q
(0x8100), length 102: vlan 170, p 0, ethertype IPv4, (tos 0x0, ttl 63, id
44948, offset 0, flags [none], proto ICMP (1), length 84)
192.168.0.11 > 192.168.1.13: ICMP echo reply, id 1671, seq 11, length 64

The reason is that, as we translate the eth.src to that of the
"ovn-chassis-mac-mappings", the ToR will never see a packet whose eth.src
is either vm1 or vm3 so it'll never learn their addresses and flood the
traffic to all ports.

In the example above:

[root@worker1 ~]# ovs-vsctl get open . external_ids:ovn-chassis-mac-mappings
"tenant:1e:02:ad:bb:aa:77"

[root@worker2 vagrant]# ovs-vsctl get open .
external_ids:ovn-chassis-mac-mappings
"tenant:1e:02:ad:bb:aa:dd"

I understand that the benefit of using the ovn-chassis-mac-mappings is the
distributed routing capabilities but I wonder if we could come up with a
way of avoiding the flood.
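
For reference, that mapping is simply set locally on each chassis, e.g. on
worker1 (same value as shown above):

  ovs-vsctl set open . external_ids:ovn-chassis-mac-mappings="tenant:1e:02:ad:bb:aa:77"

and any routed packet leaving worker1 on the 'tenant' network then carries
that MAC as eth.src.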

In case somebody's interested in replicating this scenario, you can find a
vagrant setup here [2].

Thanks!
daniel

[0] http://dani.foroselectronica.es/wp-content/uploads/2020/09/extp_log.png
[1] http://dani.foroselectronica.es/wp-content/uploads/2020/09/expt_phy.png
[2] https://github.com/danalsan/vagrants/tree/master/ovn-external-ports
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] Vlan transparency in OVN

2020-06-04 Thread Daniel Alvarez Sanchez
Hi Slawek

On Thu, Jun 4, 2020 at 9:53 AM Slawek Kaplonski  wrote:

> Hi,
>
> On Tue, Jun 02, 2020 at 05:37:37PM -0700, Ben Pfaff wrote:
> > On Tue, Jun 02, 2020 at 01:25:05PM +0200, Slawek Kaplonski wrote:
> > > Hi,
> > >
> > > I work in OpenStack Neutron mostly. We have there extension called
> > > "vlan_transparent". See [1] for details.
> > > Basically it allows to send traffic with vlan tags directly to the VMs.
> > >
> > > Recently I was testing if that extension will work with OVN backend
> used in
> > > Neutron. And it seems that we have work to do to make it working.
> > > From my test I found out that for each port I had rule like:
> > >
> > > cookie=0x0, duration=17.580s, table=8, n_packets=6, n_bytes=444,
> idle_age=2, priority=100,metadata=0x2,vlan_tci=0x1000/0x1000 actions=drop
> > >
> > > which was dropping those tagged packets. After removal of this rule
> traffic was
> > > fine.
> > > So we need to have some way to tell northd that it shouldn't match on
> vlan_tci
> > > at all in case when neutron network has got vlan_transparency set to
> True.
> > >
> > > From the discussion with Daniel Alvarez he told me that somehow we can
> try to
> > > leverage such columns to request transparency (for example:
> parent_name=none
> > > and tag_request=0). With this, northd can enforce transparency per
> port.
> > >
> > > Another option could be to create an option in the “other_config”
> column in the
> > > logical switch to have the setting per Neutron network
> > > (other_config:vlan_transparent) While this seems more natural, it may
> break the
> > > trunk/subport current feature.
> > >
> > > What do You, as ovn developers thinks about that?
> > > Is that maybe possible somehow to do currently in northd? Or is one of
> the
> > > options given above doable and acceptable for You?
> >
> > This might be a place to consider using QinQ (at least, until Neutron
> > introduces QinQ transparency).
>
> I'm not sure if I understand. For now Neutron doesn't support QinQ - the
> old RFE is currently postponed [1].
> And my original use case is related to the Neutron tenant networks, which
> are Geneve type. How can QinQ help with that?
>

I think that Ben's suggestion could possibly allow you to achieve your goal
while at the same time having QinQ in OVN:

On one hand, not matching on the CFI bit (vlan_tci=0x1000/0x1000), based on
some configuration of the Logical Switch, would achieve VLAN transparency by
allowing tagged traffic on a logical switch port that is originally
untagged.

While this is probably enough for your use case, it'll break the
trunk/subport use case where we expect the traffic to be tagged. In this
case we'll need to pop one VLAN tag (the one to achieve VLAN transparency)
and match on the second tag to determine the logical port (subport). This
could be solved in the general case by QinQ but looks more complex I
believe.
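
Just to make the per-network idea concrete, from the CMS side it could end
up looking something like this (the option name is only illustrative,
nothing like it exists in ovn-nb today):

  ovn-nbctl set Logical_Switch neutron-<net-uuid> other_config:vlan_transparent=true

with northd then no longer generating the vlan_tci=0x1000/0x1000 drop match
you mention for that datapath.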

Thoughts?

>
>
> [1] https://bugs.launchpad.net/neutron/+bug/1705719
>
> --
> Slawek Kaplonski
> Senior software engineer
> Red Hat
>
> ___
> discuss mailing list
> disc...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] [OVN] flow explosion in lr_in_arp_resolve table

2020-05-28 Thread Daniel Alvarez Sanchez
Hi all

Sorry for top posting. I want to thank you all for the discussion and
give also some feedback from OpenStack perspective which is affected
by the problem described here.

In OpenStack, it's kind of common to have a shared external network
(logical switch with a localnet port) across many tenants. Each tenant
user may create their own router where their instances will be
connected to access the external network.

In such a scenario, we are hitting the issue described here. In
particular, in our tests we exercise 3K VIFs (1 FIP each) spread across
300 LSes; each LS connected to a LR (i.e. 300 LRs) and each of those routers
connected to the public LS. This is creating a huge problem in terms
of performance and tons of events due to the MAC_Binding entries
generated as a consequence of the GARPs sent for the floating IPs.
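
As a data point, the MAC_Binding churn is easy to watch while the test runs
with a simple count, e.g.:

  ovn-sbctl --columns=_uuid list MAC_Binding | grep -c _uuid

and that number keeps growing as the GARPs sent for the floating IPs are
processed.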

Thanks,
Daniel


On Thu, May 28, 2020 at 10:51 AM Dumitru Ceara  wrote:
>
> On 5/28/20 8:34 AM, Han Zhou wrote:
> >
> >
> > On Wed, May 27, 2020 at 1:10 AM Dumitru Ceara  > > wrote:
> >>
> >> Hi Girish, Han,
> >>
> >> On 5/26/20 11:51 PM, Han Zhou wrote:
> >> >
> >> >
> >> > On Tue, May 26, 2020 at 1:07 PM Girish Moodalbail
> > mailto:gmoodalb...@gmail.com>
> >> > >> wrote:
> >> >>
> >> >>
> >> >>
> >> >> On Tue, May 26, 2020 at 12:42 PM Han Zhou  > 
> >> > >> wrote:
> >> >>>
> >> >>> Hi Girish,
> >> >>>
> >> >>> Thanks for the summary. I agree with you that GARP request v.s. reply
> >> > is irrelavent to the problem here.
> >>
> >> Well, actually I think GARP request vs reply is relevant (at least for
> >> case 1 below) because if OVN would be generating GARP replies we
> >> wouldn't need the priority 80 flow to determine if an ARP request packet
> >> is actually an OVN self originated GARP that needs to be flooded in the
> >> L2 broadcast domain.
> >>
> >> On the other hand, router3 would be learning mac_binding IP2,M2 from the
> >> GARP reply originated by router2 and vice versa so we'd have to restrict
> >> flooding of GARP replies to non-patch ports.
> >>
> >
> > Hi Dumitru, the point was that, on the external LS, the GRs will have to
> > send ARP requests to resolve unknown IPs (at least for the external GW),
> > and it has to be broadcasted, which will cause all the GRs learn all
> > MACs of other GRs. This is regardless of the GARP behavior. You are
> > right that if we only consider the Join switch then the GARP request
> > v.s. reply does make a difference. However, GARP request/reply may be
> > really needed only on the external LS.
> >
>
> Ok, but do you see an easy way to determine if we need to add the
> logical flows that flood self originated GARP packets on a given logical
> switch? Right now we add them on all switches.
>
> >> >>> Please see my comment inline below.
> >> >>>
> >> >>> On Tue, May 26, 2020 at 12:09 PM Girish Moodalbail
> >> > mailto:gmoodalb...@gmail.com>
> > >> wrote:
> >> >>> >
> >> >>> > Hello Dumitru,
> >> >>> >
> >> >>> > There are several things that are being discussed on this thread.
> >> > Let me see if I can tease them out for clarity.
> >> >>> >
> >> >>> > 1. All the router IPs are known to OVN (the join switch case)
> >> >>> > 2. Some IPs are known and some are not known (the external logical
> >> > switch that connects to physical network case).
> >> >>> >
> >> >>> > Let us look at each of the case above:
> >> >>> >
> >> >>> > 1. Join Switch Case
> >> >>> >
> >> >>> > ++++
> >> >>> > |   l3gateway||   l3gateway|
> >> >>> > |router2 ||router3 |
> >> >>> > +-+--++-+--+
> >> >>> > IP2,M2 IP3,M3
> >> >>> >   | |
> >> >>> >+--+-+---+
> >> >>> >|join switch |
> >> >>> >+-+--+
> >> >>> >  |
> >> >>> >   IP1,M1
> >> >>> >  +---++
> >> >>> >  |  distributed   |
> >> >>> >  | router |
> >> >>> >  ++
> >> >>> >
> >> >>> >
> >> >>> > Say, GR router2 wants to send the packet out to DR and that we
> >> > don't have static mappings of MAC to IP in lr_in_arp_resolve table on GR
> >> > router2 (with Han's patch of dynamic_neigh_routes=true for all the
> >> > Gateway Routers). With this in mind, when an ARP request is sent out by
> >> > router2's hypervisor the packet should be directly sent to the
> >> > distributed router alone. Your commit 32f5ebb0622 (ovn-northd: Limit
> >> > ARP/ND broadcast domain whenever possible) should have allowed only
> >> > unicast. However, in ls_in_l2_lkup table we have
> >> >>> >
> >> >>> >   table=19(ls_in_l2_lkup  ), priority=80   , match=(eth.src ==
> >> > { M2 } && 

Re: [ovs-discuss] [OVN] OVN Load balancing algorithm

2020-04-21 Thread Daniel Alvarez Sanchez
Thanks Numan for the investigation and the great explanation!

On Tue, Apr 21, 2020 at 9:38 AM Numan Siddique  wrote:

> On Fri, Apr 17, 2020 at 12:56 PM Han Zhou  wrote:
> >
> >
> >
> > On Tue, Apr 7, 2020 at 7:03 AM Maciej Jozefczyk 
> wrote:
> > >
> > > Hello!
> > >
> > > I would like to ask you to clarify how the OVN Load balancing
> algorithm works.
> > >
> > > Based on the action [1]:
> > > 1) If connection is alive the same 'backend' will be chosen,
> > >
> > > 2) If it is a new connection the backend will be chosen based on
> selection_method=dp_hash [2].
> > > Based on changelog the dp_hash uses '5 tuple hash' [3].
> > > The hash is calculated based on values: source and destination IP,
> source port, protocol and arbitrary value - 42. [4]
> > > Based on that information we could name it SOURCE_IP_PORT.
> > >
> > > Unfortunately we recently got a bug report in OVN Octavia provider
> driver project, that the Load Balancing in OVN
> > > works differently [5]. The report shows even when the test uses the
> same source ip and port, but new TCP connection,
> > > traffic is randomly distributed, but based on [2] it shouldn't?
> > >
> > > Is it a bug?  Is something else taken to account while creating a
> hash? Can it be fixed in OVS/OVN?
> > >
> > >
> > >
> > > Thanks,
> > > Maciej
> > >
> > >
> > > [1]
> https://github.com/ovn-org/ovn/blob/branch-20.03/lib/actions.c#L1017
> > > [2]
> https://github.com/ovn-org/ovn/blob/branch-20.03/lib/actions.c#L1059
> > > [3]
> https://github.com/openvswitch/ovs/blob/d58b59c17c70137aebdde37d3c01c26a26b28519/NEWS#L364-L371
> > > [4]
> https://github.com/openvswitch/ovs/blob/74286173f4d7f51f78e9db09b07a6d4d65263252/lib/flow.c#L2217
> > > [5] https://bugs.launchpad.net/neutron/+bug/1871239
> > >
> > > --
> > > Best regards,
> > > Maciej Józefczyk
> > > ___
> > > discuss mailing list
> > > disc...@openvswitch.org
> > > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
> >
> > Hi Maciej,
> >
> > Thanks for reporting. It is definitely strange that the same 5-tuple flow
> resulted in hitting different backends. I didn't observe such behavior
> before (maybe I should try again myself to confirm). Can you make sure
> during the testing the group bucket didn't change? You can do so by:
> > # ovs-ofctl dump-groups br-int
> > and also check the group stats and see if multiple buckets have their
> counters increased during the test
> > # ovs-ofctl dump-group-stats br-int [group]
> >
> > For the 5-tuple hash, the function you are seeing is flow_hash_5tuple(); it
> is using all of the 5-tuple fields. It adds both ports at once:
> >/* Add both ports at once. */
> > hash = hash_add(hash,
> > ((const uint32_t *)flow)[offsetof(struct flow,
> tp_src)
> >  / sizeof(uint32_t)]);
> >
> > The tp_src is the start of the offset, and the size is 32, meaning both
> src and dst, each is 16 bits. (Although I am not sure if dp_hash method is
> using this function or not. Need to check more code)
> >
> > BTW, I am not sure why Neutron gives it the name SOURCE_IP_PORT. Should it
> be called just 5-TUPLE, since protocol, destination IP and PORT are also
> considered in the hash?
> >
>
>
> Hi Maciej and Han,
>
> I did some testing and I can confirm what you're saying: OVN is not
> choosing the same backend even with the src ip and src port fixed.
>
> I think there is an issue with OVN on how it is programming the group
> flows.  OVN is setting the selection_method as dp_hash.
> But when ovs-vswitchd receives the  GROUP_MOD openflow message, I
> noticed that the selection_method is not set.
> From the code I see that selection_method will be encoded only if
> ovn-controller uses openflow version 1.5 [1]
>
> Since selection_method is NULL, vswitchd uses the dp_hash method [2].
> dp_hash means it uses the hash calculated by
> the datapath. In the case of kernel datapath, from what I understand
> it uses skb_get_hash().
>
> I modified the vswitchd code to use the selection_method "hash" if
> selection_method is not set. In this case the load balancer
> works as expected. For a fixed src ip, src port, dst ip and dst port,
> the group action is selecting the same bucket always. [3]
>
> I think we need to fix a few issues in OVN
>   - Use openflow 1.5 so that ovn can set selection_method
>  -  Use "hash" method if dp_hash is not choosing the same bucket for
> 5-tuple hash.
>   - Maybe provide an option for the CMS to choose an algorithm, i.e.
> to use dp_hash or hash.
>
I'd rather not expose this to the CMS as it depends on the datapath
implementation as per [0], but maybe it makes sense to eventually abstract
it to the CMS in a more LB-ish way (using common load-balancing algorithm
names) in case the LB feature is at some point enhanced to
support more algorithms.

I believe that for OVN LB users, using OF 1.5 to force the use of 'hash'
would be the best solution now.
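
FWIW, whatever we end up doing, the actual behaviour on a chassis can be
double checked by dumping the group with an OpenFlow 1.5 decoder, e.g.:

  ovs-ofctl -O OpenFlow15 dump-groups br-int

which prints the selection_method property (if any) of the select groups;
with the fix above I'd expect to see selection_method=hash there.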

My 2 

Re: [ovs-discuss] No connectivity due to missing ARP reply

2020-04-03 Thread Daniel Alvarez Sanchez
Thanks Michael for reporting this and Dumitru for fixing it!

@Michael, I guess that this bug is only relevant for the non-DVR case
right? I.e., if the NAT entry for the FIP has both the external_mac and
logical_port fields, then the ARP reply will happen on the compute node
hosting the target VM and that'd be fine as it doesn't need to traverse the
gw node. Am I right?
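
(For anyone wanting to check which case they are in, something like

  ovn-nbctl find NAT type=dnat_and_snat external_ip=10.176.2.19

should show whether external_mac and logical_port are set for that FIP; if
they are empty, that's the centralized case I'm referring to.)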

On Tue, Mar 24, 2020 at 12:17 PM Dumitru Ceara  wrote:

> On 3/24/20 8:50 AM, Plato, Michael wrote:
> > Hi Dumitru,
> >
> > thank you very much for the patch. I tried it and it works. VM1 can now
> reach VM2.
> >
> > Best regards!
> >
> > Michael
>
> Hi Michael,
>
> The fix is now merged in OVN master and branch 20.03:
>
>
> https://github.com/ovn-org/ovn/commit/d2ab98463f299e67a9f9a31e8b7c42680b8645cf
>
> Regards,
> Dumitru
>
> >
> >
> > -Ursprüngliche Nachricht-
> > Von: Dumitru Ceara 
> > Gesendet: Montag, 23. März 2020 13:28
> > An: Plato, Michael ;
> ovs-discuss@openvswitch.org
> > Betreff: Re: [ovs-discuss] No connectivity due to missing ARP reply
> >
> > On 3/21/20 7:04 PM, Plato, Michael wrote:
> >>
> >> Hi all,
> >>
> >> we use OVN with Openstack and have a problem with the following setup:
> >>
> >>
> >>   VM1 --- Outside (VLAN, 10.176.0.0/16) --- R1 (test) --- test (GENEVE, 192.168.0.0/24) --- VM2
> >>
> >>   VM1: 10.176.3.123, GW 10.176.0.1
> >>   R1:  10.176.0.156 (Outside) / 192.168.0.1 (test)
> >>   VM2: 192.168.0.201, GW 192.168.0.1, FIP 10.176.2.19
> >>
> >>
> >> Versions:
> >> - OVN (20.03)
> >> - OVS (2.13)
> >> - networking-ovn (7.1.0)
> >>
> >> Problem:
> >> - no connectivity due to missing ARP reply for FIP 10.176.2.19 from
> >> VM1 (if VM1 is not on GW Chassis for R1 -> is_chassis_resident rules
> >> not applied)
> >> - after moving VM1 to chassis hosting R1 ARP reply appears (due to
> >> local "is_chassis_resident" ARP responder rules)
> >> - temporarily removing priority 75 rules (inserted by commit [0])
> >> restores functionality (even on non gateway chassis), because ARP
> >> requests were flooded to complete L2 domain (but this creates a
> >> scaling issue)
> >>
> >>
> >> Analysis:
> >> - according to ovs-detrace the ARP requests were dropped instead of
> >> being forwarded to remote chassis hosting R1 (as intended by [0])
> >>
> >>
> >> Flow:
> >> arp,in_port=61,vlan_tci=0x,dl_src=fa:16:3e:5e:79:d9,dl_dst=ff:ff:f
> >> f:ff:ff:ff,arp_spa=10.176.3.123,arp_tpa=10.176.2.19,arp_op=1,arp_sha=f
> >> a:16:3e:5e:79:d9,arp_tha=00:00:00:00:00:00
> >>
> >>
> >> bridge("br-int")
> >> 
> >> 0. in_port=61, priority 100, cookie 0x862b95fc
> >> set_field:0x1->reg13
> >> set_field:0x7->reg11
> >> set_field:0x5->reg12
> >> set_field:0x1a->metadata
> >> set_field:0x4->reg14
> >> resubmit(,8)
> >>   *  Logical datapath: "neutron-c2a82a31-632b-4d24-8f35-8a79e2a207a7"
> >> (d516056b-19a6-4613-9838-8c62452fe31d)
> >>   *  Port Binding: logical_port "b19ceab1-c7fe-4c3b-8733-d88cabaa0a23",
> tunnel_key 4, chassis-name "383eb44a-de85-485a-9606-2fc649a9cbb9",
> chassis-str "os-compute-01"
> >> 8. reg14=0x4,metadata=0x1a,dl_src=fa:16:3e:5e:79:d9, priority 50,
> >> cookie 0x9a357820
> >> resubmit(,9)
> >>   *  Logical datapath: "neutron-c2a82a31-632b-4d24-8f35-8a79e2a207a7"
> >> (d516056b-19a6-4613-9838-8c62452fe31d) [ingress]
> >>   *  Logical flow: table=0 (ls_in_port_sec_l2), priority=50,
> >> match=(inport == "b19ceab1-c7fe-4c3b-8733-d88cabaa0a23" && eth.src ==
> >> {fa:16:3e:5e:79:d9}), actions=(next;)
> >>*  Logical Switch Port: b19ceab1-c7fe-4c3b-8733-d88cabaa0a23 type
> >> (addresses ['fa:16:3e:5e:79:d9 10.176.3.123'], dynamic addresses [],
> >> security ['fa:16:3e:5e:79:d9 10.176.3.123'] 9. metadata=0x1a, priority
> >> 0, cookie 0x1a478ee1
> >> resubmit(,10)
> >>   *  Logical datapath: "neutron-c2a82a31-632b-4d24-8f35-8a79e2a207a7"
> >> (d516056b-19a6-4613-9838-8c62452fe31d) [ingress]
> >>   *  Logical flow: table=1 (ls_in_port_sec_ip), priority=0, match=(1),
> >> actions=(next;) 10.
> >> arp,reg14=0x4,metadata=0x1a,dl_src=fa:16:3e:5e:79:d9,arp_spa=10.176.3.
> >> 123,arp_sha=fa:16:3e:5e:79:d9, priority 90, cookie 0x8c5af8ff
> >> resubmit(,11)
> >>   *  Logical datapath: "neutron-c2a82a31-632b-4d24-8f35-8a79e2a207a7"
> >> (d516056b-19a6-4613-9838-8c62452fe31d) [ingress]
> >>   *  Logical flow: table=2 (ls_in_port_sec_nd), priority=90,
> >> match=(inport == "b19ceab1-c7fe-4c3b-8733-d88cabaa0a23" && eth.src ==
> >> fa:16:3e:5e:79:d9 && arp.sha == fa:16:3e:5e:79:d9 && arp.spa ==
> >> {10.176.3.123}), 

Re: [ovs-discuss] [ovn]OVN weekly meeting logs

2020-03-27 Thread Daniel Alvarez Sanchez
On Fri, Mar 27, 2020 at 4:22 PM Ben Pfaff  wrote:

> On Fri, Mar 27, 2020 at 02:23:13PM +0530, Numan Siddique wrote:
> > You can find yesterday's OVN weekly meeting logs here -
> >
> http://eavesdrop.openstack.org/meetings/ovn_community_development_discussion/2020/ovn_community_development_discussion.2020-03-26-17.13.log.txt
>
> Thanks for passing that along.  I think it's good to get in a habit of
> posting the links here.
>

++ This is great, thanks folks for using the bot and sending out the minutes
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] [OVN] Logging IRC meetings

2020-03-20 Thread Daniel Alvarez Sanchez
Hi folks,

17:58:31--> | openstack (~openstack@openstack/openstack) has joined
#openvswitch

We have the bot now ready to log the meetings :)!!!
Ideally this should be just for OVN meetings. Please, refer to its usage
here:

https://wiki.openstack.org/wiki/Meetings/ChairaMeeting

After the meetings, it'd be nice to send an email to the ML with the
minutes from [0] and work out a mechanism to link them in the ovn.org
website.

Thanks a lot!
Daniel

[0] http://eavesdrop.openstack.org/irclogs/%23openvswitch/

On Fri, Mar 6, 2020 at 9:04 PM Daniel Alvarez Sanchez 
wrote:

> Thanks a lot Ben
>
> On Fri, Mar 6, 2020 at 8:48 PM Ben Pfaff  wrote:
>
>> On Fri, Mar 06, 2020 at 04:09:41PM +0100, Daniel Alvarez Sanchez wrote:
>> > In OpenStack we use Meetbot [0] to log the IRC meetings and it'll
>> generate
>> > the minutes afterwards that we can send after the meeting to the ML
>> index
>> > them in the website (coming up!).
>> >
>> > @Ben, as founder of the #openvswitch channel we can start straight away
>> > using OpenStack Meetbot by following steps at [1]. I checked with
>> > #openstack-infra folks and they're fine with #openvswitch logging the
>> > meetings using the bot.
>>
>> OK, I did the steps there, that is, I sent the 3 messages to chanserv
>> that it mentioned.  It didn't make meetbot join the channel, which I
>> think is part of the goal.  I am not sure what I need to do to make that
>> happen.  Do you know what I missed?
>>
>
> I think that we missed this patch that I just proposed [0].
> Once approved I believe we'll see the bot joining.
>
> Thanks a lot!
> Daniel
>
> [0] https://review.opendev.org/#/c/711756/
>
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] [OVN] newbie to start

2020-03-20 Thread Daniel Alvarez Sanchez
Hi Dan,

This is a question that should've gone to the OpenStack mailing list [0]
instead.
Let me reply inline anyways.

[0] http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-discuss


On Fri, Mar 20, 2020 at 1:44 PM Dan Porokh via discuss <
ovs-discuss@openvswitch.org> wrote:

> Dear colleagues,
>
> I'm new to the OVN and, after reading comparison with classical OVS
> (https://docs.openstack.org/networking-ovn/train/faq/index.html),
> decided to try it in my next Openstack installation.
>
> There is a long list of useful links at OVN information page at
> https://docs.openstack.org/networking-ovn/train/admin/ovn.html but links
> to tutorials (#3 and #4 parts of the list) are broken, while many other
> links looks pretty old (latest are 2016-2017).
>

Please check the latest doc [1], which should be more up to date.


Thanks for reporting those, I just sent a patch [2] to fix the broken links
as we just moved the docs to docs.ovn.org (today actually).

The one in stable/train branch (which you linked) is also broken and needs
to be addressed. If you want to fix it you can always send a patch [3] to
networking-ovn on the stable/train branch [4].

Hope it helps,
Daniel

[1] https://docs.openstack.org/neutron/latest/admin/ovn/index.html
[2] https://review.opendev.org/#/c/714133/
[3] https://docs.openstack.org/infra/manual/developers.html
[4] https://github.com/openstack/networking-ovn/tree/stable/train


>
> So, my question is the try to save time, asking community for the recent
> info on how to start with OVN with Openstack :-) I will appreciate if
> anybody can either give some useful links like missing ones from
> Openstacks' "OVN information" page which can help or confirm that
> available links on the page are still a right point to start.
>
> Thank you,
>
> /Dan
>
>
> ___
> discuss mailing list
> disc...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>
>
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] [OVN] New OVN website!

2020-03-18 Thread Daniel Alvarez Sanchez
Hi folks,

I created 'ovn-org.readthedocs.org' and linked it to my own OVN fork.
Similarly to what happens with OVS, every time there's a change it will
trigger the doc build and update it. You can see a rendered version [0]
that looks pretty good.

Talking to Miguel, he thinks that maybe we should have just this as the OVN
website if we don't plan to add any other content anytime soon. I see the
benefits of having the current site with a doc section that links to the
rendered doc, similarly to the OVS website where talks, news, etc. are added.

I don't have any strong opinion but thought that it may be worth raising it
here.

Thanks,
Daniel

[0] https://ovn-org.readthedocs.io/

On Mon, Mar 16, 2020 at 12:41 PM Daniel Alvarez Sanchez 
wrote:

> Hi folks,
>
> www.ovn.org is alive! Not really full of contents yet but PRs are welcome
> [0].
> Thanks Miguel (CC'ed) and everybody who helped.
>
> As a second step it'd be awesome to have the manpages rendered there with
> updated versions. IIUC from Ilya (CC'ed as well) we need to have
> docs.ovn.org created and hosted at readthedocs.org. I'm not really
> familiar with this process but I guess we can try to mimic what OVS was
> doing so far?
>
> Thanks a lot,
> Daniel
>
> [0] https://github.com/ovn-org/ovn-website/pulls
>
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] VXLAN support for OVN

2020-03-17 Thread Daniel Alvarez Sanchez
Hi Leonid, all

On Fri, Mar 13, 2020 at 11:54 PM Leonid Grossman 
wrote:

> Ben/all,
> We actually moved to Geneve a while ago... The original hurdle with Geneve
> was the lack of hw support, but it got solved (at least for our
> environments).
>

This is great. I'm not sure if you can share some perf numbers from using
VXLAN offloading back then versus now, with the HW support that
you're using in the NICs. It'd help a lot to decide if the effort is really
worth it.

Thanks a lot!
Daniel


> Thanks, Leonid
>
> > -Original Message-
> > From: Ben Pfaff 
> > Sent: Friday, March 13, 2020 3:36 PM
> > To: Ihar Hrachyshka ; Leonid Grossman
> > 
> > Cc: ovs-discuss@openvswitch.org
> > Subject: Re: [ovs-discuss] VXLAN support for OVN
> >
> > External email: Use caution opening links or attachments
> >
> >
> > Hi!  I'm adding Leonid Grossman to this thread because I believe that his
> > team at nVidia has an internal fork of OVN that supports VXLAN.
> > I've discussed the tradeoffs that you mentioned below about splitting up
> bits
> > with him, too.
> >
> > On Mon, Mar 09, 2020 at 09:22:24PM -0400, Ihar Hrachyshka wrote:
> > > Good day,
> > >
> > > at Red Hat, once in a while we hear from customers, both internal and
> > > external, that they would like to see VXLAN support in OVN for them to
> > > consider switching to the technology. This email is a notice that I
> > > plan to work on this feature in the next weeks and months and hope to
> > > post patches for you to consider. Below is an attempt to explain why
> > > we may want it, how we could achieve it, potential limitations. This
> > > is also an attempt to collect early feedback for the whole idea.
> > >
> > > Reasons for the customer requests are multiple; some of more merit,
> > > some are more about perception. One technical reason is that there are
> > > times when a SDN / cloud deployment team doesn't have direct influence
> > > on protocols allowed in the underlying network; and when it's hard,
> > > due to politics or other reasons, to make policy changes to allow
> > > Geneve traffic while VXLAN is already available to use. Coming from
> > > OpenStack background, usually you have interested customers already
> > > using ML2-OVS implementation of Neutron that already relies on VXLAN.
> > >
> > > Another reason is that some potential users may believe that VXLAN
> > > would bring specific benefits in their environment compared to Geneve
> > > tunnelling (these gains are largely expected in performance, not
> > > functionality because of objective limitations of VXLAN protocol
> > > definition).  While Geneve vs. VXLAN performance is indeed quite an
> > > old debate with no clear answers, and while there were experiments set
> > > in the past that apparently demonstrated that potential performance
> > > gains from VXLAN may not be as prominent or present as one may
> > > believe*, nevertheless the belief that VXLAN would be beneficial at
> > > least in some environments on some hardware never dies out; and so
> > > regardless of proven merit of such belief, OVN adoption suffers
> > > because of its lack of VXLAN support.
> > >
> > > *
> > > https://blog.russellbryant.net/2017/05/30/ovn-geneve-vs-vxlan-does-it-
> > > matter/
> > >
> > > So our plan is to satisfy such requests by introducing support for the
> > > new tunnelling type into OVN and by doing that allow interested
> > > parties to try it in their specific environments and see if it makes
> > > the expected difference.
> > >
> > > Obviously, there is a cost to introduce additional protocol to support
> > > matrix (especially considering limitations it would introduce, as
> > > discussed below). We will probably have to consider the complexity of
> > > the final implementation once it's available for review.
> > >
> > > =
> > >
> > > For implementation, the base problem to solve here is the fact that
> > > VXLAN doesn't carry as many bits available to use for encoding
> > > datapath as Geneve does. (Geneve occupies both the 24-bit VNI field as
> > > well as 32 more bits of metadata to carry logical source and
> > > destination ports.) VXLAN ID is just 24 bits long, and there are no
> > > additional fields available for OVN to pass port information.  (This
> > > would be different if one would consider protocol extensions like
> > > VXLAN-GPE, but relying on them makes both reasons to consider VXLAN
> > > listed above somewhat moot.)
> > >
> > > To satisfy OVN while also working with VXLAN, the limited 24 bit VNI
> > > space would have to be split between three components - network ID,
> > > logical source and destination ports. The split necessarily limits the
> > > maximum number of networks or ports per network, depending on where
> > > the split is cut.
> > >
> > > Splitting the same 24 bit space between all three components equally
> > > would result in limitations that would probably not satisfy most real
> > > life deployments (we are talking about max 256 networks 

[ovs-discuss] [OVN] New OVN website!

2020-03-16 Thread Daniel Alvarez Sanchez
Hi folks,

www.ovn.org is alive! Not really full of content yet, but PRs are welcome
[0].
Thanks Miguel (CC'ed) and everybody who helped.

As a second step it'd be awesome to have the manpages rendered there with
updated versions. IIUC from Ilya (CC'ed as well) we need to have
docs.ovn.org created and hosted at readthedocs.org. I'm not really familiar
with this process but I guess we can try to mimic what OVS was doing so far?

Thanks a lot,
Daniel

[0] https://github.com/ovn-org/ovn-website/pulls
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] [OVN] Logging IRC meetings

2020-03-06 Thread Daniel Alvarez Sanchez
Thanks a lot Ben

On Fri, Mar 6, 2020 at 8:48 PM Ben Pfaff  wrote:

> On Fri, Mar 06, 2020 at 04:09:41PM +0100, Daniel Alvarez Sanchez wrote:
> > In OpenStack we use Meetbot [0] to log the IRC meetings and it'll
> generate
> > the minutes afterwards that we can send after the meeting to the ML index
> > them in the website (coming up!).
> >
> > @Ben, as founder of the #openvswitch channel we can start straight away
> > using OpenStack Meetbot by following steps at [1]. I checked with
> > #openstack-infra folks and they're fine with #openvswitch logging the
> > meetings using the bot.
>
> OK, I did the steps there, that is, I sent the 3 messages to chanserv
> that it mentioned.  It didn't make meetbot join the channel, which I
> think is part of the goal.  I am not sure what I need to do to make that
> happen.  Do you know what I missed?
>

I think that we missed this patch that I just proposed [0].
Once approved I believe we'll see the bot joining.

Thanks a lot!
Daniel

[0] https://review.opendev.org/#/c/711756/
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


[ovs-discuss] [OVN] Logging IRC meetings

2020-03-06 Thread Daniel Alvarez Sanchez
Hi folks,

In OpenStack we use Meetbot [0] to log the IRC meetings, and it'll generate
the minutes afterwards that we can send to the ML after the meeting and index
on the website (coming up!).

@Ben, as founder of the #openvswitch channel we can start straight away
using OpenStack Meetbot by following steps at [1]. I checked with
#openstack-infra folks and they're fine with #openvswitch logging the
meetings using the bot.

However, as OVS is an LF-supported project we could as well use [2]. I
don't really know how to tell the LF meetbot to join #ovn-meeting (or
#openvswitch), but it looks like we need to open a ticket, judging from what
OPNFV folks mention here [3]. In this case, I believe Ben/Justin might need
to do it?

Thanks a lot!
Daniel

[0] https://docs.openstack.org/infra/system-config/irc.html#meetbot
[1] https://docs.openstack.org/infra/system-config/irc.html#access
[2] https://docs.releng.linuxfoundation.org/en/latest/meetbot.html
[3]
https://lists.linuxfoundation.org/pipermail/opnfv-tech-discuss/2016-July/011759.html
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] [OVN] Multiple localnet ports on a single Logical Switch

2020-03-06 Thread Daniel Alvarez Sanchez
Thanks a lot for your answer, Ben.

On Thu, Mar 5, 2020 at 9:21 PM Ben Pfaff  wrote:

> On Wed, Mar 04, 2020 at 04:01:22PM +0100, Daniel Alvarez Sanchez wrote:
> > As a possible alternative, we could support multiple localnet ports on
> the
> > same Logical Switch. In the first place, we can assume that on a
> particular
> > hypervisor, we're not going to have ports bound to multiple segments (ie.
> > on hv1 only ports on segment1 will be present, on hv2 only ports on
> > segment2 will be present and so on...). This way, ovn-controller can
> create
> > the patch-port to the provider bridge based on the local bridge-mappings
> > configuration on each hypervisor and the rest of the localnet ports will
> > have no effect.
>
> I don't see a big problem with this.
>
> If you implement it, be sure to update the documentation, since there
> are multiple places that talk about LSes with localnet ports having only
> two LSPs total.
>

Right, I noticed this and it's a great point.

I drew this little diagram [0] to show graphically the idea behind my
suggestion.
The idea is not to have multiple mappings for the same LS on a given
hypervisor as that'd make things trickier I believe. In the diagram, we
would not expect to have hv1 having both 'segment1:br-ex, segment2:br-ex2'.
As long as CMS ensures that only one localnet port is 'active' on a single
hv, I believe it should not be much trouble but perhaps I'm too optimistic
:)

[0] https://imgur.com/a/0Tt9nvI
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


[ovs-discuss] [OVN] Multiple localnet ports on a single Logical Switch

2020-03-04 Thread Daniel Alvarez Sanchez
Hi all,

I wanted to raise this topic and explain the use case for $subject to see
if it makes sense to implement such a feature or if anybody comes up with a
better idea.

When the localnet implementation was first introduced in OVN [0], the main
use case referred to was OpenStack provider networks, where an admin wants
to create ports on a pre-existing physical network while still providing
control plane management and other features such as Security Groups (ACLs)
via Neutron.

There's another concept in Neutron called Routed Provider Networks [1]
where an admin can define multiple segments (layer-2 domains) within the
same provider network. Each segment could, for example, represent an edge
site and each site will be mapped to a different physical network. E.g

network1 = { segment1:physnet1, segment2:physnet2, ... }

A possible implementation could be to represent each Neutron segment with a
separate OVN Logical Switch and each of them with a localnet port that is
mapped to its physnet. However, this increases complexity as we break the
1:1 mapping between a Neutron network and an OVN Logical Switch.

As a possible alternative, we could support multiple localnet ports on the
same Logical Switch. In the first place, we can assume that on a particular
hypervisor, we're not going to have ports bound to multiple segments (ie.
on hv1 only ports on segment1 will be present, on hv2 only ports on
segment2 will be present and so on...). This way, ovn-controller can create
the patch-port to the provider bridge based on the local bridge-mappings
configuration on each hypervisor and the rest of the localnet ports will
have no effect.
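
Just to make the proposal a bit more concrete, the NB side could look
roughly like this (names are only illustrative, and having two localnet
ports on the same switch is exactly the part that isn't supported today):

  ovn-nbctl lsp-add network1 ln-segment1 -- lsp-set-type ln-segment1 localnet \
      -- lsp-set-addresses ln-segment1 unknown \
      -- lsp-set-options ln-segment1 network_name=physnet1
  ovn-nbctl lsp-add network1 ln-segment2 -- lsp-set-type ln-segment2 localnet \
      -- lsp-set-addresses ln-segment2 unknown \
      -- lsp-set-options ln-segment2 network_name=physnet2

while each hypervisor only maps the physnet it is actually attached to, e.g.
on hv1:

  ovs-vsctl set open . external_ids:ovn-bridge-mappings=physnet1:br-provider

so ovn-controller there would only create the patch port for the segment1
localnet port and simply ignore the other one.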

I think that there are some parts in the code that assume that no more than
one localnet port per logical switch will be present, but I don't know the
complexity and/or implications of supporting this use case.

Any feedback is very much appreciated :)

Thanks,
daniel

[0] https://patchwork.ozlabs.org/patch/514209/
[1]
https://docs.openstack.org/neutron/train/admin/config-routed-networks.html
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] OVN: Delay in handling unixctl commands in ovsdb-server

2020-02-13 Thread Daniel Alvarez Sanchez
Hi all,

On Thu, Feb 13, 2020 at 8:09 AM Han Zhou  wrote:

>
>
> On Wed, Feb 12, 2020 at 9:57 AM Numan Siddique 
> wrote:
> >
> > Hi Ben/All,
> >
> > In an OVN deployment - with OVN dbs deployed as active/standby using
> > pacemaker, we are seeing delays in response to unixctl command -
> > ovsdb-server/sync-status.
> >
> > Pacemaker periodically calls the OVN pacemaker OCF script to get the
> > status and this script internally invokes - ovs-appctl -t
> > /var/run/openvswitch/ovnsb_db.ctl ovsdb-server/sync-status. In a large
> > deployment with lots of OVN resources we see that ovsdb-server takes a
> > lot of time (sometimes > 60 seconds) to respond to this command. This
> > causes pacemaker to stop the service in that node and move the master
> > to another node. This causes a lot of disruption.
> >
> > One approach of solving this issue is to handle unixctl commands in a
> > separate thread. The commands like sync-status, get-** etc can be
> > easily handled in the thread. Still, there are many commands like
> > ovsdb-server/set-active-ovsdb-server, ovsdb-server/compact etc (which
> > changes the state) which needs to be synchronized between the main
> > ovsdb-server thread and the newly added thread using a mutex.
> >
> > Does this approach makes sense ? I started working on it. But I wanted
> > to check with the community before putting into more efforts.
> >
> > Are there better ways to solve this issue ?
> >
> > Thanks
> > Numan
> >
> Hi Numan,
>
> It seems reasonable to me. Multi-threading would add a little complexity,
> but in this case it should be straightforward. It merely requires mutexes
> to synchronize between the threads for *writes*, and also for *reads* of
> non-atomic data.
> The only side effect is that *if* the thread that does the DB job really
> stucked because of a bug and not handling jobs at all, the unixctl thread
> ovsdb-server/sync-status command wouldn't detect it, so it could result in
> pacemaker reporting *happy* status without detecting problems. First for
> all this is unlikely to happen. But if we really think it is a problem we
> can still solve it by incrementing a counter in main loop and have a new
> command (readonly, without mutex) to check if this counter is increasing,
> to tell if the server if really working.
>

I'd be more inclined to do what Han suggests here, i.e. that every thread
contributes to the health status with a read-only counter.

Whatever gets implemented here perhaps can be re-used in ovn-controller to
monitor the main & pinctrl threads.
A similar scenario, but with maybe worse consequences as it affects the
dataplane, would be the "health" thread reporting good status while the
pinctrl thread is stuck, so the DHCP service is down and instances can't
fetch an IP.
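
(For anyone wanting to see the symptom, timing the exact probe that
pacemaker runs is enough, e.g.:

  time ovs-appctl -t /var/run/openvswitch/ovnsb_db.ctl ovsdb-server/sync-status

which in the large deployments described above can take well over a minute
to come back.)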


> Thanks,
> Han
> ___
> discuss mailing list
> disc...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] CentOS 8 openvswitch RPM

2020-02-04 Thread Daniel Alvarez Sanchez
FWIW, this is the centos8 build that we got in RDO:

https://cbs.centos.org/koji/buildinfo?buildID=28034

It's building directly from the Fedora spec, with the exception that we
disabled the test execution, but you should be able to trigger it the same
way.
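
If editing the spec by hand gets annoying, passing the macro on the command
line should also work; assuming the same RPMBUILD_OPT hook that the Fedora
build docs use for --without check, something like:

make rpm-fedora RPMBUILD_OPT='--define "__python /usr/bin/python3" --without check'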

Thanks,
Daniel

On Mon, Feb 3, 2020 at 5:12 PM Arvin  wrote:

> Hello Numan,
>
> Thank you for your reply. I have tried running 'make rpm-fedora' with
> @PYTHON3@ and it fixed the dependency problem. But following error keeps
> coming again even after I define "%define __python /usr/bin/python3" in
> 'rhel/openvswitch-fedora.spec'. I have noticed that "%define __python
> /usr/bin/python3" is vanishing from 'rhel/openvswitch-fedora.spec'.
>
> ===
> warning: Macro expanded in comment on line 25: %define kernel
> 2.6.40.4-5.fc15.x86_64
>
> error: attempt to use unversioned python, define %__python to
> /usr/bin/python2 or /usr/bin/python3 explicitly
> error: line 209: PYTHON=%{__python}
>
> make: *** [Makefile:8568: rpm-fedora] Error 1
> ===
>
> Can you help again to fix this?
>
> ---
> Best Regards,
> Arvin
>
> SubHosting.net - Managed dedicated servers & VPS solutions.
> www.subhosting.net
>
>
> On Mon, Feb 3, 2020 at 2:03 AM Arvin  wrote:
> >>
> >> Hello Guys,
> >>
> >> I was trying to build a RPM of openvswitch on CentOS 8 server using
> >> https://www.openvswitch.org/releases/openvswitch-2.12.0.tar.gz.
> >>
> >> I got the following error while running 'make rpm-fedora'.
> >>
> >> 
> >> warning: Macro expanded in comment on line 25: %define kernel
> >> 2.6.40.4-5.fc15.x86_64
> >>
> >> error: attempt to use unversioned python, define %__python to
> >> /usr/bin/python2 or /usr/bin/python3 explicitly
> >> error: line 209: PYTHON=%{__python}
> >>
> >> make: *** [Makefile:8574: rpm-fedora] Error 1
> >> 
> >>
> >>
> >> So I have defined __python to /usr/bin/pyhton3 in file
> >> './rhel/openvswitch-fedora.spec' and ran 'make rpm-fedora' again.
> > Now
> >> I'm getting following error.
> >>
> >> 
> >> warning: Macro expanded in comment on line 25: %define kernel
> >> 2.6.40.4-5.fc15.x86_64
> >>
> >> error: Failed build dependencies:
> >> python-devel is needed by openvswitch-2.12.0-1.el8.x86_64
> >> python-six is needed by openvswitch-2.12.0-1.el8.x86_64
> >> python-twisted-core is needed by
> > openvswitch-2.12.0-1.el8.x86_64
> >> python-zope-interface is needed by
> > openvswitch-2.12.0-1.el8.x86_64
> >> make: *** [Makefile:8574: rpm-fedora] Error 1
> >> 
> >>
> >> Some of the python applications are named 'python3-' in
> > CentOS8
> >> rather than starting with 'python-' which is why dependency
> >> failing.
> >>
> >> Is there any fix for this? Do you have any documentation for
> > building
> >> RPM of OpenvSwitch on a CentOS 8 server? Please help.
> >
> > Some of the OVS script files use @PYTHON@ (example -
> >
> https://github.com/openvswitch/ovs/blob/v2.12.0/utilities/ovs-tcpundump.in#L1
> )
> > Can you manually change this to @PYTHON2@ or @PYTHON3@ and try
> > building it ?
> >
> > If it builds, we need to fix this in openvswitch 2.12 branch. I had
> > plans to look into this sometime. But It slipped out of my mind.
> >
> > Let me know how it goes. You can submit a patch yourself if you are
> > fine with it :).
> >
> > Thanks
> > Numan
> >
> >> --
> >> Best Regards,
> >> Arvin
> >>
> >> SubHosting.net - Managed dedicated servers & VPS solutions.
> >> www.subhosting.net
> >> ___
> >> discuss mailing list
> >> disc...@openvswitch.org
> >> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
> >>
> ___
> discuss mailing list
> disc...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>
>
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] [ovn] lflows explosion when using a lot of FIPs (dnat_and_snat NAT entries)

2020-01-29 Thread Daniel Alvarez Sanchez
Not much of a surprise, given the nested loop that creates the flows for
all possible FIP pairs, but I plotted a graph showing the # of FIPs vs. the #
of logical flows in each of those two stages, which shows the expected
quadratic growth [0] and its magnitude on a system with just one router.

Those patches were written to address an issue where FIP to FIP traffic was
not distributed and it was sent via the tunnel to the gateway instead.

[0] https://imgur.com/KgRSPpz

On Tue, Jan 28, 2020 at 4:55 PM Daniel Alvarez Sanchez 
wrote:

> Hi all,
>
> Based on some problems that we've detected at scale, I've been doing an
> analysis of how logical flows are distributed on a system which makes heavy
> use of Floating IPs (dnat_and_snat NAT entries) and DVR.
>
> [root@central ~]# ovn-nbctl list NAT|grep dnat_and_snat -c
> 985
>
> With 985 Floating IPs (and ~1.2K ACLs), I can see that 680K logical flows
> are generated. This is creating a terribly stress everywhere (ovsdb-server,
> ovn-northd, ovn-controller) especially upon reconnection of ovn-controllers
> to the SB database which have to read ~0.7 million of logical flows and
> process them:
>
> [root@central ~]# time ovn-sbctl list logical_flow > logical_flows.txt
> real1m17.465s
> user0m41.916s
> sys 0m1.996s
> [root@central ~]# grep _uuid logical_flows.txt -c
> 680276
>
> The problem is even worse when a lot of clients are simultaneously reading
> the dump from the SB DB server (this could be certainly alleviated by using
> RAFT but we're not there yet) causing even OOM killers on
> ovsdb-server/ovn-northd and a severe delay of the control plane to be
> operational again.
>
> I have investigated a little bit the lflows generated and their
> distribution per stage finding that 62.2% are in the lr_out_egr_loop and
> 31.1% are in the lr_in_ip_routing stage:
>
> [root@central ~]# head -n 10 logical_flows_distribution_sorted.txt
> lr_out_egr_loop: 423414  62.24%
> lr_in_ip_routing: 212199  31.19%
> lr_in_ip_input: 10831  1.59%
> ls_out_acl: 4831  0.71%
> ls_in_port_sec_ip: 3471  0.51%
> ls_in_l2_lkup: 2360  0.34%
> 
>
> Tackling first the lflows in lr_out_egr_loop I can see that there are
> mainly two lflow types:
>
> 1)
>
> external_ids: {source="ovn-northd.c:8807",
> stage-name=lr_out_egr_loop}
> logical_datapath: 261206d2-72c5-4e79-ae5c-669e6ee4e71a
> match   : "ip4.src == 10.142.140.39 && ip4.dst ==
> 10.142.140.112"
> pipeline: egress
> priority: 200
> table_id: 2
> hash: 0
>
> 2)
> actions : "inport = outport; outport = \"\"; flags = 0;
> flags.loopback = 1; reg9[1] = 1; next(pipeline=ingress, table=0); "
> external_ids: {source="ovn-northd.c:8799",
> stage-name=lr_out_egr_loop}
> logical_datapath: 161206d2-72c5-4e79-ae5c-669e6ee4e71a
> match   :
> "is_chassis_resident(\"42f64a6c-a52d-4712-8c56-876e8fb30c03\") && ip4.src
> == 10.142.140.39 && ip4.dst == 10.142.141.19"
> pipeline: egress
> priority: 300
>
> Looks like these lflows are added by this commit:
>
> https://github.com/ovn-org/ovn/commit/551e3d989557bd2249d5bbe0978b44b775c5e619
>
>
> And each Floating IP contributes to ~1.2K lflows (of course this grows as
> the number of FIPs grow):
>
> [root@central ~]# grep 10.142.140.39  lr_out_egr_loop.txt |grep match  -c
> 1233
>
> Similarly, for the lr_in_ip_routing stage, we find the same pattern:
>
> 1)
> actions : "outport =
> \"lrp-d2d745f5-91f0-4626-81c0-715c63d35716\"; eth.src = fa:16:3e:22:02:29;
> eth.dst = fa:16:5e:6f:36:e4; reg0 = ip4.dst; reg1 = 10.142.143.147; reg9[2]
> = 1; reg9[0] = 0; next;"
> external_ids: {source="ovn-northd.c:6782",
> stage-name=lr_in_ip_routing}
> logical_datapath: 161206d2-72c5-4e79-ae5c-669e6ee4e71a
> match   : "inport ==
> \"lrp-09f7eba5-54b7-48f4-9820-80423b65c608\" && ip4.src == 10.1.0.170 &&
> ip4.dst == 10.142.140.39"
> pipeline: ingress
> priority: 400
>
> Looks like these last flows are added by this commit:
>
> https://github.com/ovn-org/ovn/commit/8244c6b6bd8802a018e4ec3d3665510ebb16a9c7
>
> Each FIP contributes to 599 LFlows in this stage:
>
> [root@central ~]# grep -c 10.142.140.39  lr_in_ip_routing.txt
> 599
> [root@central ~]# grep -c 10.142.140.185  lr_in_ip_routing.txt
> 599
>
> In order to figure out the relationship between the # of FIPs and the
> lflows, I removed a few of them and still the % of lflows in both stages
&

[ovs-discuss] [ovn] lflows explosion when using a lot of FIPs (dnat_and_snat NAT entries)

2020-01-28 Thread Daniel Alvarez Sanchez
Hi all,

Based on some problems that we've detected at scale, I've been doing an
analysis of how logical flows are distributed on a system which makes heavy
use of Floating IPs (dnat_and_snat NAT entries) and DVR.

[root@central ~]# ovn-nbctl list NAT|grep dnat_and_snat -c
985

With 985 Floating IPs (and ~1.2K ACLs), I can see that 680K logical flows
are generated. This is creating a terrible stress everywhere (ovsdb-server,
ovn-northd, ovn-controller), especially upon reconnection of ovn-controllers
to the SB database, which have to read ~0.7 million logical flows and
process them:

[root@central ~]# time ovn-sbctl list logical_flow > logical_flows.txt
real1m17.465s
user0m41.916s
sys 0m1.996s
[root@central ~]# grep _uuid logical_flows.txt -c
680276

The problem is even worse when a lot of clients are simultaneously reading
the dump from the SB DB server (this could certainly be alleviated by using
RAFT, but we're not there yet), causing even OOM kills on
ovsdb-server/ovn-northd and a severe delay before the control plane becomes
operational again.

I have investigated a little bit the lflows generated and their
distribution per stage finding that 62.2% are in the lr_out_egr_loop and
31.1% are in the lr_in_ip_routing stage:

[root@central ~]# head -n 10 logical_flows_distribution_sorted.txt
lr_out_egr_loop: 423414  62.24%
lr_in_ip_routing: 212199  31.19%
lr_in_ip_input: 10831  1.59%
ls_out_acl: 4831  0.71%
ls_in_port_sec_ip: 3471  0.51%
ls_in_l2_lkup: 2360  0.34%

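
(In case anyone wants to reproduce this kind of breakdown, a quick and dirty
way is something like:

ovn-sbctl --columns=external_ids list Logical_Flow | grep -o 'stage-name=[a-z0-9_]*' | sort | uniq -c | sort -rn

which groups the flows by the stage-name external_id shown below.)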

Tackling first the lflows in lr_out_egr_loop I can see that there are
mainly two lflow types:

1)

external_ids: {source="ovn-northd.c:8807",
stage-name=lr_out_egr_loop}
logical_datapath: 261206d2-72c5-4e79-ae5c-669e6ee4e71a
match   : "ip4.src == 10.142.140.39 && ip4.dst ==
10.142.140.112"
pipeline: egress
priority: 200
table_id: 2
hash: 0

2)
actions : "inport = outport; outport = \"\"; flags = 0;
flags.loopback = 1; reg9[1] = 1; next(pipeline=ingress, table=0); "
external_ids: {source="ovn-northd.c:8799",
stage-name=lr_out_egr_loop}
logical_datapath: 161206d2-72c5-4e79-ae5c-669e6ee4e71a
match   :
"is_chassis_resident(\"42f64a6c-a52d-4712-8c56-876e8fb30c03\") && ip4.src
== 10.142.140.39 && ip4.dst == 10.142.141.19"
pipeline: egress
priority: 300

Looks like these lflows are added by this commit:
https://github.com/ovn-org/ovn/commit/551e3d989557bd2249d5bbe0978b44b775c5e619


And each Floating IP contributes ~1.2K lflows (of course this grows as
the number of FIPs grows):

[root@central ~]# grep 10.142.140.39  lr_out_egr_loop.txt |grep match  -c
1233

Similarly, for the lr_in_ip_routing stage, we find the same pattern:

1)
actions : "outport =
\"lrp-d2d745f5-91f0-4626-81c0-715c63d35716\"; eth.src = fa:16:3e:22:02:29;
eth.dst = fa:16:5e:6f:36:e4; reg0 = ip4.dst; reg1 = 10.142.143.147; reg9[2]
= 1; reg9[0] = 0; next;"
external_ids: {source="ovn-northd.c:6782",
stage-name=lr_in_ip_routing}
logical_datapath: 161206d2-72c5-4e79-ae5c-669e6ee4e71a
match   : "inport ==
\"lrp-09f7eba5-54b7-48f4-9820-80423b65c608\" && ip4.src == 10.1.0.170 &&
ip4.dst == 10.142.140.39"
pipeline: ingress
priority: 400

Looks like these last flows are added by this commit:
https://github.com/ovn-org/ovn/commit/8244c6b6bd8802a018e4ec3d3665510ebb16a9c7

Each FIP contributes to 599 LFlows in this stage:

[root@central ~]# grep -c 10.142.140.39  lr_in_ip_routing.txt
599
[root@central ~]# grep -c 10.142.140.185  lr_in_ip_routing.txt
599

In order to figure out the relationship between the # of FIPs and the
lflows, I removed a few of them and the % of lflows in both stages still
remained constant.


[root@central ~]# ovn-nbctl find NAT type=dnat_and_snat | grep -c  _uuid
833

[root@central ~]# grep _uuid logical_flows_2.txt -c
611640

lr_out_egr_loop: 379740  62.08%
lr_in_ip_routing: 190295   31.11%


I'd like to gather feedback around the mentioned commits to see if there's
a way we can avoid inserting those lflows or somehow offload the
calculation to ovn-controller on the chassis where the logical port is
bound. This way we'd avoid the stress on ovsdb-server and ovn-northd.

Any thoughts?

Thanks,
Daniel
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


[ovs-discuss] [ovn] ovn-controller: packet buffering resumes the packet through the tunnel incorrectly

2020-01-27 Thread Daniel Alvarez Sanchez
Hi folks,

We found a problem related to the packet buffering feature introduced by
[0] when the destination address is unknown. In such a case, ovn-controller
sends an ARP request and, upon resolving the MAC address, the packet will
be resumed.

If the packet is coming from a Floating IP (dnat_and_snat NAT entry with
external_mac and logical_port fields filled in) it should be resumed
through a localnet port (DVR use case). However, it will instead be pushed
via the tunnel interface to the chassis that is hosting the gateway port.

This creates traffic disruption as the ToR switch will see the MAC address
of the Floating IP on the gateway port and subsequent packets will not be
sent to the compute node where the logical port is bound to.

When the next ARP request for the FIP is sent, as a MAC_Binding entry will
already be present, the packet buffering feature doesn't kick in and the
ARP reply will be sent through the localnet port and seen by the ToR switch
in the right port as expected. More details at [1].

An easy way to reproduce this is to ping a Floating IP from an external
node. In order to reproduce it 100% of the time, you can delete the
MAC_Binding entry corresponding to the source IP to force the packet
buffering on the compute node. I use a vagrant setup like [2].
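
Deleting the entry is just the standard SB DB commands, e.g. something like
the following (the IP is only an example, use the external node's source IP):

$ ovn-sbctl --bare --columns=_uuid find MAC_Binding ip=172.24.4.1
$ ovn-sbctl destroy MAC_Binding <uuid_printed_above>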

Thanks,
Daniel

[0]
https://github.com/openvswitch/ovs/commit/d7abfe39cfd234227bb6174b7f959a16dc803b83
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1788193
[2] https://github.com/danalsan/vagrants/tree/master/ovn-playground
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


[ovs-discuss] [OVN] Problem with HA Chassis failover

2019-10-17 Thread Daniel Alvarez Sanchez
Hi folks,

We detected that when ovn-controller doesn't die gracefully, leaving a
stale Chassis entry in the SB DB, the ports that were bound to that
chassis and belong to an HA Chassis group will not be failed over to
the next high prio chassis in the group.

Right now in OpenStack we're still using Gateway Chassis for router
ports but we're moving to HA Chassis groups for certain types of ports
like 'external' or 'virtual', and this is where we have hit the issue.
In the scenario that I described above, the router ports were failed
over correctly.

An easy way to reproduce is to kill -9 ovn-controller and see that the
Port_Bindings go away from the ports but they're not claimed on any
other chassis.
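
A rough reproduction sketch (commands are just illustrative; the last one
assumes 'external' ports):

$ kill -9 $(pidof ovn-controller)           # on the chassis hosting the port
$ ovn-sbctl list Chassis                    # the stale Chassis row is still there
$ ovn-sbctl find Port_Binding type=external # chassis column is cleared and never re-claimed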

Thanks,
Daniel

___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] [ovs-dev] ovn-controller is taking 100% CPU all the time in one deployment

2019-09-02 Thread Daniel Alvarez Sanchez
Hi Han,

On Fri, Aug 30, 2019 at 10:37 PM Han Zhou  wrote:
>
> On Fri, Aug 30, 2019 at 1:25 PM Numan Siddique  wrote:
> >
> > Hi Han,
> >
> > I am thinking of this approach to solve this problem. I still need to
> test it.
> > If you have any comments or concerns do let me know.
> >
> >
> > **
> > diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> > index 9a282..a83b56362 100644
> > --- a/northd/ovn-northd.c
> > +++ b/northd/ovn-northd.c
> > @@ -6552,6 +6552,41 @@ build_lrouter_flows(struct hmap *datapaths, struct
> hmap *ports,
> >
> >  }
> >
> > +/* Handle GARP reply packets received on a distributed router
> gateway
> > + * port. GARP reply broadcast packets could be sent by external
> > + * switches. We don't want them to be handled by all the
> > + * ovn-controllers if they receive it. So add a priority-92 flow
> to
> > + * apply the put_arp action on a redirect chassis and drop it on
> > + * other chassis.
> > + * Note that we are already adding a priority-90 logical flow in
> the
> > + * table S_ROUTER_IN_IP_INPUT to apply the put_arp action if
> > + * arp.op == 2.
> > + * */
> > +if (op->od->l3dgw_port && op == op->od->l3dgw_port
> > +&& op->od->l3redirect_port) {
> > +for (int i = 0; i < op->lrp_networks.n_ipv4_addrs; i++) {
> > +ds_clear();
> > +ds_put_format(,
> > +  "inport == %s && is_chassis_resident(%s)
> && "
> > +  "eth.bcast && arp.op == 2 && arp.spa ==
> %s/%u",
> > +  op->json_key,
> op->od->l3redirect_port->json_key,
> > +  op->lrp_networks.ipv4_addrs[i].network_s,
> > +  op->lrp_networks.ipv4_addrs[i].plen);
> > +ovn_lflow_add(lflows, op->od, S_ROUTER_IN_IP_INPUT, 92,
> > +  ds_cstr(),
> > +  "put_arp(inport, arp.spa, arp.sha);");
> > +ds_clear();
> > +ds_put_format(,
> > +  "inport == %s && !is_chassis_resident(%s)
> && "
> > +  "eth.bcast && arp.op == 2 && arp.spa ==
> %s/%u",
> > +  op->json_key,
> op->od->l3redirect_port->json_key,
> > +  op->lrp_networks.ipv4_addrs[i].network_s,
> > +  op->lrp_networks.ipv4_addrs[i].plen);
> > +ovn_lflow_add(lflows, op->od, S_ROUTER_IN_IP_INPUT, 92,
> > +  ds_cstr(), "drop;");
> > +}
> > +}
> > +
> >  /* A set to hold all load-balancer vips that need ARP responses.
> */
> >  struct sset all_ips = SSET_INITIALIZER(_ips);
> >  int addr_family;
> > *
> >
> > If a physical switch sends GARP request packets we have existing logical
> flows
> > which handle them only on the gateway chassis.
> >
> > But if the physical switch sends GARP reply packets, then these packets
> > are handled by ovn-controllers where bridge mappings are configured.
> > I think its good enough if the gateway chassis handles these packet.
> >
> > In the deployment where we are seeing this issue, the physical switch
> sends GARP reply
> > packets.
> >
> > Thanks
> > Numan
> >
> >
> Hi Numan,
>
> I think both GARP request and reply should be handled on all chassises. It
> should work not only for physical switch, but also for virtual workloads.
> At least our current use cases relies on that.

I believe that Numan's patch will not change the behavior for virtual
(OVN) workloads, will it?

Although I'm in favor of this patch, I still think that it's not
enough for non-Incremental Processing versions of OVS because, even
though it releases pressure on the compute nodes, on loaded systems
the gateway nodes are still going to be hogging the CPU. Plus, I
think there's value even from a security standpoint in having it on
stable branches, as the current behavior looks like a simple attack vector.
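
(Side note: confirming how many GARP replies actually hit a chassis is easy
with tcpdump on the bridge-mapping interface, e.g. something like the
following, where the interface name is just an example:)

$ tcpdump -enni eth1 'arp and arp[6:2] = 2'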

>
> Thanks,
> Han
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] [ovs-dev] ovn-controller is taking 100% CPU all the time in one deployment

2019-09-02 Thread Daniel Alvarez Sanchez
On Fri, Aug 30, 2019 at 8:18 PM Han Zhou  wrote:
>
>
>
> On Fri, Aug 30, 2019 at 6:46 AM Mark Michelson  wrote:
> >
> > On 8/30/19 5:39 AM, Daniel Alvarez Sanchez wrote:
> > > On Thu, Aug 29, 2019 at 10:01 PM Mark Michelson  
> > > wrote:
> > >>
> > >> On 8/29/19 2:39 PM, Numan Siddique wrote:
> > >>> Hello Everyone,
> > >>>
> > >>> In one of the OVN deployments, we are seeing 100% CPU usage by
> > >>> ovn-controllers all the time.
> > >>>
> > >>> After investigations we found the below
> > >>>
> > >>>- ovn-controller is taking more than 20 seconds to complete full loop
> > >>> (mainly in lflow_run() function)
> > >>>
> > >>>- The physical switch is sending GARPs periodically every 10 seconds.
> > >>>
> > >>>- There is ovn-bridge-mappings configured and these GARP packets
> > >>> reaches br-int via the patch port.
> > >>>
> > >>>- We have a flow in router pipeline which applies the action - 
> > >>> put_arp
> > >>> if it is arp packet.
> > >>>
> > >>>- ovn-controller pinctrl thread receives these garps, stores the
> > >>> learnt mac-ips in the 'put_mac_bindings' hmap and notifies the
> > >>> ovn-controller main thread by incrementing the seq no.
> > >>>
> > >>>- In the ovn-controller main thread, after lflow_run() finishes,
> > >>> pinctrl_wait() is called. This function calls - poll_immediate_wake() as
> > >>> 'put_mac_bindings' hmap is not empty.
> > >>>
> > >>> - This causes the ovn-controller poll_block() to not sleep at all and
> > >>> this repeats all the time resulting in 100% cpu usage.
> > >>>
> > >>> The deployment has OVS/OVN 2.9.  We have back ported the pinctrl_thread
> > >>> patch.
> > >>>
> > >>> Some time back I had reported an issue about lflow_run() taking lot of
> > >>> time - 
> > >>> https://mail.openvswitch.org/pipermail/ovs-dev/2019-July/360414.html
> > >>>
> > >>> I think we need to improve the logical processing sooner or later.
> > >>
> > >> I agree that this is very important. I know that logical flow processing
> > >> is the biggest bottleneck for ovn-controller, but 20 seconds is just
> > >> ridiculous. In your scale testing, you found that lflow_run() was taking
> > >> 10 seconds to complete.
> > > I support this statement 100% (20 seconds is just ridiculous). To be
> > > precise, in this deployment we see over 23 seconds for the main loop
> > > to process and I've seen even 30 seconds some times. I've been talking
> > > to Numan these days about this issue and I support profiling this
> > > actual deployment so that we can figure out how incremental processing
> > > would help.
> > >
> > >>
> > >> I'm curious if there are any factors in this particular deployment's
> > >> configuration that might contribute to this. For instance, does this
> > >> deployment have a glut of ACLs? Are they not using port groups?
> > > They're not using port groups because it's 2.9 and it is not there.
> > > However, I don't think port groups would make a big difference in
> > > terms of ovn-controller computation. I might be wrong but Port Groups
> > > help reduce the number of ACLs in the NB database while the # of
> > > Logical Flows would still remain the same. We'll try to get the
> > > contents of the NB database and figure out what's killing it.
> > >
> >
> > You're right that port groups won't reduce the number of logical flows.
>
> I think port-group reduces number of logical flows significantly, and also 
> reduces OVS flows when conjunctive matches are effective.

Right, definitely the number of lflows will be much lower. My bad as I
was directly involved in this! :) I was just thinking that the number
of OVS flows will remain the same so the computation for
ovn-controller would be similar but I missed the conjunctive matches
part in my statement.


> Please see my calculation here: 
> https://www.slideshare.net/hanzhou1978/large-scale-overlay-networks-with-ovn-problems-and-solutions/30
>
> > However, it can reduce the computation in ovn-controller. The reason is
> > that the logical flows generated by ACLs that use port groups may result
> > in conjunc

Re: [ovs-discuss] ovn-controller is taking 100% CPU all the time in one deployment

2019-08-30 Thread Daniel Alvarez Sanchez
On Thu, Aug 29, 2019 at 10:01 PM Mark Michelson  wrote:
>
> On 8/29/19 2:39 PM, Numan Siddique wrote:
> > Hello Everyone,
> >
> > In one of the OVN deployments, we are seeing 100% CPU usage by
> > ovn-controllers all the time.
> >
> > After investigations we found the below
> >
> >   - ovn-controller is taking more than 20 seconds to complete full loop
> > (mainly in lflow_run() function)
> >
> >   - The physical switch is sending GARPs periodically every 10 seconds.
> >
> >   - There is ovn-bridge-mappings configured and these GARP packets
> > reaches br-int via the patch port.
> >
> >   - We have a flow in router pipeline which applies the action - put_arp
> > if it is arp packet.
> >
> >   - ovn-controller pinctrl thread receives these garps, stores the
> > learnt mac-ips in the 'put_mac_bindings' hmap and notifies the
> > ovn-controller main thread by incrementing the seq no.
> >
> >   - In the ovn-controller main thread, after lflow_run() finishes,
> > pinctrl_wait() is called. This function calls - poll_immediate_wake() as
> > 'put_mac_bindings' hmap is not empty.
> >
> > - This causes the ovn-controller poll_block() to not sleep at all and
> > this repeats all the time resulting in 100% cpu usage.
> >
> > The deployment has OVS/OVN 2.9.  We have back ported the pinctrl_thread
> > patch.
> >
> > Some time back I had reported an issue about lflow_run() taking lot of
> > time - https://mail.openvswitch.org/pipermail/ovs-dev/2019-July/360414.html
> >
> > I think we need to improve the logical processing sooner or later.
>
> I agree that this is very important. I know that logical flow processing
> is the biggest bottleneck for ovn-controller, but 20 seconds is just
> ridiculous. In your scale testing, you found that lflow_run() was taking
> 10 seconds to complete.
I support this statement 100% (20 seconds is just ridiculous). To be
precise, in this deployment we see over 23 seconds for the main loop
to process, and I've even seen 30 seconds at times. I've been talking
to Numan these days about this issue and I support profiling this
actual deployment so that we can figure out how incremental processing
would help.

>
> I'm curious if there are any factors in this particular deployment's
> configuration that might contribute to this. For instance, does this
> deployment have a glut of ACLs? Are they not using port groups?
They're not using port groups because it's 2.9 and the feature is not there.
However, I don't think port groups would make a big difference in
terms of ovn-controller computation. I might be wrong, but Port Groups
help reduce the number of ACLs in the NB database while the # of
Logical Flows would still remain the same. We'll try to get the
contents of the NB database and figure out what's killing it.
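
For the record, the first sanity checks on that copy will just be row
counts, something like:

$ ovn-nbctl list ACL | grep -c _uuid
$ ovn-nbctl list Logical_Switch_Port | grep -c _uuid
$ ovn-sbctl list Logical_Flow | grep -c _uuid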

>
> This particular deployment's configuration may give us a good scenario
> for our testing to improve lflow processing time.
Absolutely!
>
> >
> > But to fix this issue urgently, we are thinking of the below approach.
> >
> >   - pinctrl_thread will locally cache the mac_binding entries (just like
> > it caches the dns entries). (Please note pinctrl_thread can not access
> > the SB DB IDL).
>
> >
> > - Upon receiving any arp packet (via the put_arp action), pinctrl_thread
> > will check the local mac_binding cache and will only wake up the main
> > ovn-controller thread only if the mac_binding update is required.
> >
> > This approach will solve the issue since the MAC sent by the physical
> > switches will not change. So there is no need to wake up ovn-controller
> > main thread.
>
> I think this can work well. We have a lot of what's needed already in
> pinctrl at this point. We have the hash table of mac bindings already.
> Currently, we flush this table after we write the data to the southbound
> database. Instead, we would keep the bindings in memory. We would need
> to ensure that the in-memory MAC bindings eventually get deleted if they
> become stale.
>
> >
> > In the present master/2.12 these GARPs will not cause this 100% cpu loop
> > issue because incremental processing will not recompute flows.
>
> Another mitigating factor for master is something I'm currently working
> on. I've got the beginnings of a patch series going where I am
> separating pinctrl into a separate process from ovn-controller:
> https://github.com/putnopvut/ovn/tree/pinctrl_process
>
> It's in the early stages right now, so please don't judge :)
>
> Separating pinctrl to its own process means that it cannot directly
> cause ovn-controller to wake up like it currently might.
>
> >
> > Even though the above approach is not really required for master/2.12, I
> > think it is still Ok to have this as there is no harm.
> >
> > I would like to know your comments and any concerns if any.
>
> Hm, I don't really understand why we'd want to put this in master/2.12
> if the problem doesn't exist there. The main concern I have is with
> regards to cache lifetime. I don't want to introduce potential memory
> growth concerns 

Re: [ovs-discuss] [ovn][clustered] Confusing to create ovsdb-server clustered databases

2019-08-28 Thread Daniel Alvarez Sanchez
On Wed, Aug 28, 2019 at 4:49 PM Zufar Dhiyaulhaq
 wrote:
>
> Hi Numan,
>
> Yes, it's working. I think the networking-ovn plugin in OpenStack has some 
> bugs. let me use a single IP first or maybe I can use pacemaker to create the 
> VIP.

Thanks Zufar, mind patching networking-ovn / reporting the bug on
launchpad / moving the discussion to openstack-discuss mailing list?

Thanks a lot!
Daniel

>
> [root@zu-ovn-controller0 ~]# ovn-nbctl 
> --db=tcp:10.101.101.100:6641,tcp:10.101.101.101:6641,tcp:10.101.101.102:6641  
> ls-add sw0
> [root@zu-ovn-controller0 ~]# ovn-nbctl 
> --db=tcp:10.101.101.100:6641,tcp:10.101.101.101:6641,tcp:10.101.101.102:6641  
> show
> switch 5d3ea060-f92f-41ab-8143-6a6534bbba98 (sw0)
> [root@zu-ovn-controller0 ~]#
>
> [root@zu-ovn-controller1 ~]#  ovn-nbctl 
> --db=tcp:10.101.101.100:6641,tcp:10.101.101.101:6641,tcp:10.101.101.102:6641  
> show
> switch 5d3ea060-f92f-41ab-8143-6a6534bbba98 (sw0)
>
> Thank you very much :)
>
> Best Regards,
> Zufar Dhiyaulhaq
>
>
> On Wed, Aug 28, 2019 at 7:17 PM Numan Siddique  wrote:
>>
>>
>>
>> On Wed, Aug 28, 2019 at 4:45 PM Zufar Dhiyaulhaq  
>> wrote:
>>>
>>> Hi Numan,
>>>
>>> I have tried the command but output nothing.
>>>
>>> [root@zu-ovn-controller0 ~]# ovn-nbctl 
>>> --db=tcp:10.101.101.100:6641,tcp:10.101.101.101:6641,tcp:10.101.101.102:6641
>>>  show
>>
>>
>> These commands seem to work. Try creating a logical switch like -
>> ovn-nbctl 
>> --db=tcp:10.101.101.100:6641,tcp:10.101.101.101:6641,tcp:10.101.101.102:6641 
>>  ls-add sw0
>> ovn-nbctl 
>> --db=tcp:10.101.101.100:6641,tcp:10.101.101.101:6641,tcp:10.101.101.102:6641 
>>  show
>>
>>
>>> [root@zu-ovn-controller0 ~]# ovn-sbctl 
>>> --db=tcp:10.101.101.100:6642,tcp:10.101.101.101:6642,tcp:10.101.101.102:6642
>>>  show
>>> Chassis "1ee48dd1-d520-476d-82d3-3d4651132f47"
>>> hostname: "zu-ovn-compute0"
>>> Encap geneve
>>> ip: "10.101.101.103"
>>> options: {csum="true"}
>>> Chassis "cd1a2535-522a-4571-8eac-8394681846a3"
>>> hostname: "zu-ovn-compute2"
>>> Encap geneve
>>> ip: "10.101.101.105"
>>> options: {csum="true"}
>>> Chassis "a5b59592-f511-4a7a-b37d-93f933c35ea5"
>>> hostname: "zu-ovn-compute1"
>>> Encap geneve
>>> ip: "10.101.101.104"
>>> options: {csum="true"}
>>> [root@zu-ovn-controller0 ~]# tail -f 
>>> /var/log/openvswitch/ovsdb-server-nb.log
>>> 2019-08-28T09:12:31.190Z|00031|reconnect|INFO|tcp:10.101.101.102:6643: 
>>> connection attempt failed (No route to host)
>>> 2019-08-28T09:12:31.190Z|00032|reconnect|INFO|tcp:10.101.101.102:6643: 
>>> waiting 2 seconds before reconnect
>>> 2019-08-28T09:12:33.191Z|00033|reconnect|INFO|tcp:10.101.101.102:6643: 
>>> connecting...
>>> 2019-08-28T09:12:33.192Z|00034|reconnect|INFO|tcp:10.101.101.102:6643: 
>>> connection attempt failed (No route to host)
>>> 2019-08-28T09:12:33.192Z|00035|reconnect|INFO|tcp:10.101.101.102:6643: 
>>> waiting 4 seconds before reconnect
>>> 2019-08-28T09:12:37.192Z|00036|reconnect|INFO|tcp:10.101.101.102:6643: 
>>> connecting...
>>> 2019-08-28T09:12:37.192Z|00037|reconnect|INFO|tcp:10.101.101.102:6643: 
>>> connection attempt failed (No route to host)
>>> 2019-08-28T09:12:37.192Z|00038|reconnect|INFO|tcp:10.101.101.102:6643: 
>>> continuing to reconnect in the background but suppressing further logging
>>> 2019-08-28T09:22:52.597Z|00039|reconnect|INFO|tcp:10.101.101.101:6643: 
>>> connected
>>> 2019-08-28T09:23:01.279Z|00040|reconnect|INFO|tcp:10.101.101.102:6643: 
>>> connected
>>>
>>> I have tried with ovn-ctl to create the clustered-databases, but the 
>>> problem is same, stuck when creating neutron resources. I think it because 
>>> the ovn-northd run in 3 nodes, but neutron only run on a single controller.
>>
>>
>> I don't think that's the issue.
>> The issue seems to be that networking-ovn is not tested with connecting to 
>> clustered db.
>> Try passing just one remote to neutron server and see if it works.
>>
>>
>> May be you can ask this question in the openstack ML to get more attention.
>>
>> Numan
>>
>>>
>>> this is the step: http://paste.openstack.org/show/766470/
>>> should I try the step first? but check with passing the remote URL to 
>>> command?
>>>
>>> Best Regards,
>>> Zufar Dhiyaulhaq
>>>
>>>
>>> On Wed, Aug 28, 2019 at 6:06 PM Numan Siddique  wrote:



 On Wed, Aug 28, 2019 at 4:04 PM Zufar Dhiyaulhaq 
  wrote:
>
> [ovn][clustered] Confusing to create ovsdb-server clustered databases
>
> Hi Everyone, I have successfully created OpenStack with OVN enabled. But 
> the problem comes when I try to cluster the ovsdb-server. My scenario is 
> trying to cluster the ovsdb-server databases but only using single 
> ovn-northd.
>
> My cluster:
> - controller0 : 10.100.100.100 / 10.101.101.100 (ovn-northd, 
> ovsdb-server, neutron server)
> - controller1 : 10.100.100.101 / 10.101.101.101 (ovsdb-server)
> - controller2 : 10.100.100.102 / 10.101.101.102 

Re: [ovs-discuss] [OVN] Aging mechanism for MAC_Binding table

2019-08-21 Thread Daniel Alvarez Sanchez
On Wed, Aug 21, 2019 at 3:11 AM Han Zhou  wrote:
>
>
>
> On Tue, Aug 20, 2019 at 4:57 PM Ben Pfaff  wrote:
> >
> > Let me see if I'm following this correctly.  This is what currently
> > happens:
> >
> > - HV1 needs a MAC address for an IP so it broadcasts an ARP request.
> >
> > - The port with the IP address, on HV2, causes the MAC_Binding to be
> >   inserted.
> >
> > - Every ovn-controller inserts an OF flow for the binding.  HV1 and
> >   perhaps other ovn-controllers use this flow to populate the MAC
> >   address for subsequent packets destined to the IP address in question.
> >
> > This proposal augments that with:
> >
> > - After a while, the binding goes idle and isn't used.  The
> >   ovn-controllers gradually notice this and delete their OF flows for
> >   it.
> >
> > - HV3 eventually needs the binding again.  It broadcasts an ARP request.
> >
> > - The port with the IP address causes the MAC_Binding to be inserted.
> >   This might still be on HV2 if the port hasn't moved, or it might be on
> >   HV4 if it has.
> >
> > Is that what you mean?  It might work OK.
Yes, that's it.
At some point we can look into enhancing this using the SB DB: if
all ovn-controllers decide to ignore a particular MAC_Binding entry,
then we can remove it from the DB via ovn-northd (or some other
mechanism).

> >
> > Please do update the lifetime description in ovn-sb(5) under the
> > MAC_Binding table regardless of what you implement.
> >
> > Thanks,
> >
> > Ben.
> >
> > On Tue, Aug 20, 2019 at 09:03:57AM +0200, Daniel Alvarez Sanchez wrote:
> > > Hi folks,
> > >
> > > Reviving this thread as we're seeing this more and more problematic.
> > > Combining the ideas mentioned up thread, Dumitru, Numan, Lucas and I
> > > had some internal discussion where we came up with a possible approach
> > > and we'd love to get feedback from you:
> > >
> > > - Local ovn-controller will always insert an OF rule per MAC_Binding
> > > entry to match on src_ip + src_mac that will be sampled with a meter
> > > to ovn-controller.
> > > - When ovn-controller sees that one entry has not been hit "for a
> > > while", it'll delete the OpenFlow rule in table 65 that fills the
> > > eth.dst field with the MAC_Binding info.
>
> I assume the rules in table 65 can be "extended" for this purpose, instead of 
> adding extra rules for this.
>
> > > - This will result in further ARP requests from the instance(s) that
> > > will refresh the MAC_Binding entries in the database.
> > >
> > > This could make troubleshooting a bit harder so at some point it'll be
> > > great to have a mechanism in OVS where we could disable a flow instead
> > > of deleting it. This way, one can tell that the flows in table 65 have
> > > been disabled due to the aging mechanism in the local node.
>
> Sorry that I didn't understand this. Why do you want the flow being disabled 
> instead of deleted? I think if we want to avoid stale entries, we do want to 
> delete them, so that the stale data doesn't occupy the space in flow table, 
> neither in SB DB. It may be ok to add debug log for deleting a aged entry in 
> ovn-controller, for trouble shooting purpose?

We can use traces as well, yes :)

>
> > >
> > > Thoughts? Is there any performance consideration regarding the extra
> > > flows and meters?
>
> Are you proposing shared meters or one meter per mac-binding? If it is per 
> mac-binding, I would be worried about the scalability considering that we may 
> have >10k of mac-bindings. Or should I be worried? Maybe Justin and Ben can 
> comment on the meter scalability. If it is a concern, I would suggest the 
> feature be configurable (i.e. enable/disable), so that it can be enabled in 
> environments where aging is required but number of mac-bindings are not very 
> high.

I was talking about one meter per mac-binding but I'll defer the
answer to others, as I don't know much about meters. I'm not a big fan
of configuration options but unless we have a clear view on this, it
makes sense to me to have a knob for the 'aging'.

>
> > >
> > > Thanks a lot!
> > > Daniel
> > >
> > >
> > > On Tue, Jul 9, 2019 at 7:19 AM Ben Pfaff  wrote:
> > > >
> > > > On Mon, Jul 08, 2019 at 06:19:23PM -0700, Han Zhou wrote:
> > > > > On Thu, Jun 27, 2019 at 6:44 AM Ben Pfaff  wrote:
> > > > > >
> > > > > > On Tue, Jun 25, 2019 at 01:05:21PM +0200, Daniel Alvare

Re: [ovs-discuss] [OVN] Aging mechanism for MAC_Binding table

2019-08-20 Thread Daniel Alvarez Sanchez
Hi folks,

Reviving this thread as we're finding this more and more problematic.
Combining the ideas mentioned up thread, Dumitru, Numan, Lucas and I
had some internal discussion where we came up with a possible approach
and we'd love to get feedback from you:

- Local ovn-controller will always insert an OF rule per MAC_Binding
entry to match on src_ip + src_mac that will be sampled with a meter
to ovn-controller.
- When ovn-controller sees that one entry has not been hit "for a
while", it'll delete the OpenFlow rule in table 65 that fills the
eth.dst field with the MAC_Binding info.
- This will result in further ARP requests from the instance(s) that
will refresh the MAC_Binding entries in the database.

This could make troubleshooting a bit harder so at some point it'll be
great to have a mechanism in OVS where we could disable a flow instead
of deleting it. This way, one can tell that the flows in table 65 have
been disabled due to the aging mechanism in the local node.
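
To make it concrete, the state involved per chassis can be inspected with
the usual commands (table number taken from the text above, the IP is just
an example):

$ ovn-sbctl list MAC_Binding                            # rows that today never expire
$ ovs-ofctl dump-flows br-int table=65 | grep 10.0.0.5  # flow the local node would remove when idle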

Thoughts? Is there any performance consideration regarding the extra
flows and meters?

Thanks a lot!
Daniel


On Tue, Jul 9, 2019 at 7:19 AM Ben Pfaff  wrote:
>
> On Mon, Jul 08, 2019 at 06:19:23PM -0700, Han Zhou wrote:
> > On Thu, Jun 27, 2019 at 6:44 AM Ben Pfaff  wrote:
> > >
> > > On Tue, Jun 25, 2019 at 01:05:21PM +0200, Daniel Alvarez Sanchez wrote:
> > > > Lately we've been trying to solve certain issues related to stale
> > > > entries in the MAC_Binding table (e.g. [0]). On the other hand, for
> > > > the OpenStack + Octavia (Load Balancing service) use case, we see that
> > > > a reused VIP can be as well affected by stale entries in this table
> > > > due to the fact that it's never bound to a VIF so ovn-controller won't
> > > > claim it and send the GARPs to update the neighbors.
> > > >
> > > > I'm not sure if other scenarios may suffer from this issue but seems
> > > > reasonable to have an aging mechanism (as we discussed at some point
> > > > in the past) that makes unused/old entries to expire. After talking to
> > > > Numan on IRC, since a new pinctrl thread has been introduced recently
> > > > [1], it'd be nice to implement this aging mechanism there.
> > > > At the same time we'd be also reducing the amount of entries for long
> > > > lived systems as it'd grow indefinitely.
> > > >
> > > > Any thoughts?
> > > >
> > > > Thanks!
> > > > Daniel
> > > >
> > > > PS. With regards to the 'unused' vs 'old' entries I think it has to be
> > > > 'old' rather than 'unused' as I don't see a way to reset the TTL of a
> > > > MAC_Binding entry when we see packets coming. The implication is that
> > > > we'll be seeing ARPs sent out more often when perhaps they're not
> > > > needed. This also leads to the discussion of making the cache timeout
> > > > configurable.
> > >
> > > I've always considered the MAC_Binding implementation incomplete because
> > > of this issue and others.  ovn/TODO.rst says:
> > >
> > > * Dynamic IP to MAC binding enhancements.
> > >
> > >   OVN has basic support for establishing IP to MAC bindings
> > dynamically, using
> > >   ARP.
> > >
> > >   * Ratelimiting.
> > >
> > > From casual observation, Linux appears to generate at most one
> > ARP per
> > > second per destination.
> > >
> > > This might be supported by adding a new OVN logical action for
> > > rate-limiting.
> > >
> > >   * Tracking queries
> > >
> > >  It's probably best to only record in the database responses to
> > queries
> > >  actually issued by an L3 logical router, so somehow they have to
> > be
> > >  tracked, probably by putting a tentative binding without a MAC
> > address
> > >  into the database.
> > >
> > >   * Renewal and expiration.
> > >
> > > Something needs to make sure that bindings remain valid and
> > expire those
> > > that become stale.
> > >
> > > One way to do this might be to add some support for time to the
> > database
> > > server itself.
> > >
> > >   * Table size limiting.
> > >
> > > The table of MAC bindings must not be allowed to grow
> > unreasonably large.
> > >
> > >   * MTU handling (fragmentation on output)
> > >
> > > So, what

Re: [ovs-discuss] [OVN] ovn-controller Incremental Processing scale testing

2019-07-22 Thread Daniel Alvarez Sanchez
Neat! Thanks folks :)
I'll try to get an OSP setup where we can patch this and re-run the
same tests as last time to confirm, but it looks promising.

On Fri, Jul 19, 2019 at 11:12 PM Han Zhou  wrote:
>
>
>
> On Fri, Jul 19, 2019 at 12:37 PM Numan Siddique  wrote:
>>
>>
>>
>> On Fri, Jul 19, 2019 at 6:19 PM Numan Siddique  wrote:
>>>
>>>
>>>
>>> On Fri, Jul 19, 2019 at 6:28 AM Han Zhou  wrote:
>>>>
>>>>
>>>>
>>>> On Tue, Jul 9, 2019 at 12:13 AM Numan Siddique  wrote:
>>>> >
>>>> >
>>>> >
>>>> > On Tue, Jul 9, 2019 at 12:25 PM Daniel Alvarez Sanchez 
>>>> >  wrote:
>>>> >>
>>>> >> Thanks Numan for running these tests outside OpenStack!
>>>> >>
>>>> >> On Tue, Jul 9, 2019 at 7:50 AM Numan Siddique  
>>>> >> wrote:
>>>> >> >
>>>> >> >
>>>> >> >
>>>> >> > On Tue, Jul 9, 2019 at 11:05 AM Han Zhou  wrote:
>>>> >> >>
>>>> >> >>
>>>> >> >>
>>>> >> >> On Fri, Jun 21, 2019 at 12:31 AM Han Zhou  wrote:
>>>> >> >> >
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > On Thu, Jun 20, 2019 at 11:42 PM Numan Siddique 
>>>> >> >> >  wrote:
>>>> >> >> > >
>>>> >> >> > >
>>>> >> >> > >
>>>> >> >> > > On Fri, Jun 21, 2019, 11:47 AM Han Zhou  
>>>> >> >> > > wrote:
>>>> >> >> > >>
>>>> >> >> > >>
>>>> >> >> > >>
>>>> >> >> > >> On Tue, Jun 11, 2019 at 9:16 AM Daniel Alvarez Sanchez 
>>>> >> >> > >>  wrote:
>>>> >> >> > >> >
>>>> >> >> > >> > Thanks a lot Han for the answer!
>>>> >> >> > >> >
>>>> >> >> > >> > On Tue, Jun 11, 2019 at 5:57 PM Han Zhou  
>>>> >> >> > >> > wrote:
>>>> >> >> > >> > >
>>>> >> >> > >> > >
>>>> >> >> > >> > >
>>>> >> >> > >> > >
>>>> >> >> > >> > > On Tue, Jun 11, 2019 at 5:12 AM Dumitru Ceara 
>>>> >> >> > >> > >  wrote:
>>>> >> >> > >> > > >
>>>> >> >> > >> > > > On Tue, Jun 11, 2019 at 10:40 AM Daniel Alvarez Sanchez
>>>> >> >> > >> > > >  wrote:
>>>> >> >> > >> > > > >
>>>> >> >> > >> > > > > Hi Han, all,
>>>> >> >> > >> > > > >
>>>> >> >> > >> > > > > Lucas, Numan and I have been doing some 'scale' testing 
>>>> >> >> > >> > > > > of OpenStack
>>>> >> >> > >> > > > > using OVN and wanted to present some results and issues 
>>>> >> >> > >> > > > > that we've
>>>> >> >> > >> > > > > found with the Incremental Processing feature in 
>>>> >> >> > >> > > > > ovn-controller. Below
>>>> >> >> > >> > > > > is the scenario that we executed:
>>>> >> >> > >> > > > >
>>>> >> >> > >> > > > > * 7 baremetal nodes setup: 3 controllers (running
>>>> >> >> > >> > > > > ovn-northd/ovsdb-servers in A/P with pacemaker) + 4 
>>>> >> >> > >> > > > > compute nodes. OVS
>>>> >> >> > >> > > > > 2.10.
>>>> >> >> > >> > > > > * The test consists on:
>>>> >> >> > >> > > > >   - Create openstack network (OVN LS), subnet and router
>>>> >> >> > >> > > > >   - Attach subn

Re: [ovs-discuss] [OVN] ovn-controller Incremental Processing scale testing

2019-07-09 Thread Daniel Alvarez Sanchez
Thanks Numan for running these tests outside OpenStack!

On Tue, Jul 9, 2019 at 7:50 AM Numan Siddique  wrote:
>
>
>
> On Tue, Jul 9, 2019 at 11:05 AM Han Zhou  wrote:
>>
>>
>>
>> On Fri, Jun 21, 2019 at 12:31 AM Han Zhou  wrote:
>> >
>> >
>> >
>> > On Thu, Jun 20, 2019 at 11:42 PM Numan Siddique  
>> > wrote:
>> > >
>> > >
>> > >
>> > > On Fri, Jun 21, 2019, 11:47 AM Han Zhou  wrote:
>> > >>
>> > >>
>> > >>
>> > >> On Tue, Jun 11, 2019 at 9:16 AM Daniel Alvarez Sanchez 
>> > >>  wrote:
>> > >> >
>> > >> > Thanks a lot Han for the answer!
>> > >> >
>> > >> > On Tue, Jun 11, 2019 at 5:57 PM Han Zhou  wrote:
>> > >> > >
>> > >> > >
>> > >> > >
>> > >> > >
>> > >> > > On Tue, Jun 11, 2019 at 5:12 AM Dumitru Ceara  
>> > >> > > wrote:
>> > >> > > >
>> > >> > > > On Tue, Jun 11, 2019 at 10:40 AM Daniel Alvarez Sanchez
>> > >> > > >  wrote:
>> > >> > > > >
>> > >> > > > > Hi Han, all,
>> > >> > > > >
>> > >> > > > > Lucas, Numan and I have been doing some 'scale' testing of 
>> > >> > > > > OpenStack
>> > >> > > > > using OVN and wanted to present some results and issues that 
>> > >> > > > > we've
>> > >> > > > > found with the Incremental Processing feature in 
>> > >> > > > > ovn-controller. Below
>> > >> > > > > is the scenario that we executed:
>> > >> > > > >
>> > >> > > > > * 7 baremetal nodes setup: 3 controllers (running
>> > >> > > > > ovn-northd/ovsdb-servers in A/P with pacemaker) + 4 compute 
>> > >> > > > > nodes. OVS
>> > >> > > > > 2.10.
>> > >> > > > > * The test consists on:
>> > >> > > > >   - Create openstack network (OVN LS), subnet and router
>> > >> > > > >   - Attach subnet to the router and set gw to the external 
>> > >> > > > > network
>> > >> > > > >   - Create an OpenStack port and apply a Security Group (ACLs 
>> > >> > > > > to allow
>> > >> > > > > UDP, SSH and ICMP).
>> > >> > > > >   - Bind the port to one of the 4 compute nodes (randomly) by
>> > >> > > > > attaching it to a network namespace.
>> > >> > > > >   - Wait for the port to be ACTIVE in Neutron ('up == True' in 
>> > >> > > > > NB)
>> > >> > > > >   - Wait until the test can ping the port
>> > >> > > > > * Running browbeat/rally with 16 simultaneous process to 
>> > >> > > > > execute the
>> > >> > > > > test above 150 times.
>> > >> > > > > * When all the 150 'fake VMs' are created, browbeat will delete 
>> > >> > > > > all
>> > >> > > > > the OpenStack/OVN resources.
>> > >> > > > >
>> > >> > > > > We first tried with OVS/OVN 2.10 and pulled some results which 
>> > >> > > > > showed
>> > >> > > > > 100% success but ovn-controller is quite loaded (as expected) 
>> > >> > > > > in all
>> > >> > > > > the nodes especially during the deletion phase:
>> > >> > > > >
>> > >> > > > > - Compute node: https://imgur.com/a/tzxfrIR
>> > >> > > > > - Controller node (ovn-northd and ovsdb-servers): 
>> > >> > > > > https://imgur.com/a/8ffKKYF
>> > >> > > > >
>> > >> > > > > After conducting the tests above, we replaced ovn-controller in 
>> > >> > > > > all 7
>> > >> > > > > nodes by the one with the current master branch (actually from 
>> > >> > > > > last
>> > >> > > > > week). We also replaced ovn-northd and ovsdb-servers but the
>> >

Re: [ovs-discuss] Issue with failover running ovsdb-server in A/P mode with Pacemaker

2019-07-08 Thread Daniel Alvarez Sanchez
On Mon, Jul 8, 2019 at 5:43 PM Ben Pfaff  wrote:
>
> Would you mind formally submitting this?  It seems like the best
> immediate solution.

Will do, thanks a lot Ben!
>
> On Mon, Jul 08, 2019 at 02:27:31PM +0200, Daniel Alvarez Sanchez wrote:
> > I tried a simple patch and it fixes the issue (see below). The
> > question now is, do we want to do this? I think it makes sense to drop
> > *all* the connections when the role changes but I'm curious to see
> > what other people think:
> >
> > diff --git a/ovsdb/jsonrpc-server.c b/ovsdb/jsonrpc-server.c
> > index 4dda63a..ddbbc2e 100644
> > --- a/ovsdb/jsonrpc-server.c
> > +++ b/ovsdb/jsonrpc-server.c
> > @@ -365,7 +365,7 @@ ovsdb_jsonrpc_server_set_read_only(struct
> > ovsdb_jsonrpc_server *svr,
> >  {
> >  if (svr->read_only != read_only) {
> >  svr->read_only = read_only;
> > -ovsdb_jsonrpc_server_reconnect(svr, false,
> > +ovsdb_jsonrpc_server_reconnect(svr, true,
> > xstrdup(read_only
> > ? "making server read-only"
> > : "making server 
> > read/write"));
> >
> >
> > $export OVN_NB_DAEMON=$(ovn-nbctl --pidfile --detach)
> > $ovn-nbctl ls-add sw0
> > $ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/sync-status
> > state: active
> > $ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/set-active-ovsdb-server
> > tcp:192.0.2.2:6641
> > $ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/connect-active-ovsdb-server
> > $ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/sync-status
> > state: backup
> > connecting: tcp:192.0.2.2:6641
> > $ ovn-nbctl ls-add sw1
> > ovn-nbctl: transaction error: {"details":"insert operation not allowed
> > when database server is in read only mode","error":"not allowed"}
> >
> > On Mon, Jul 8, 2019 at 1:25 PM Daniel Alvarez Sanchez
> >  wrote:
> > >
> > > I *think* that it may not a bug in ovsdb-server but a problem with
> > > ovn-controller as it doesn't seem to be a DB change aware client.
> > >
> > > When the role changes from master to backup or viceversa, connections
> > > are expected to be reestablished for all clients except those that are
> > > not aware of db changes [0] (note the 'false' argument). This flag is
> > > explained here [1] and looks like since ovn-controller is not
> > > monitoring the Database table in the _Server database, then the
> > > connection with it is not re-established. This is just a blind guess
> > > but  I can give it a shot :)
> > >
> > > [0] 
> > > https://github.com/openvswitch/ovs/blob/403a6a0cb003f1d48b0a3cbf11a2806c45e9d076/ovsdb/jsonrpc-server.c#L368
> > > [1] 
> > > https://github.com/openvswitch/ovs/blob/403a6a0cb003f1d48b0a3cbf11a2806c45e9d076/ovsdb/jsonrpc-server.c#L450-L456
> > >
> > > On Mon, Jul 8, 2019 at 12:45 PM Numan Siddique  
> > > wrote:
> > > >
> > > >
> > > >
> > > >
> > > > On Mon, Jul 8, 2019 at 3:52 PM Daniel Alvarez Sanchez 
> > > >  wrote:
> > > >>
> > > >> Hi folks,
> > > >>
> > > >> While working with an OpenStack environment running OVN and
> > > >> ovsdb-server in A/P configuration with Pacemaker we hit an issue that
> > > >> has been probably around for a long time. The bug itself seems to be
> > > >> related with ovsdb-server not updating the read-only flag properly.
> > > >>
> > > >> With a 3 nodes cluster running ovsdb-server in active/passive mode,
> > > >> when we restart the master-node, pacemaker promotes another node as
> > > >> master and moves the associated IPAddr2 resource to it.
> > > >> At this point, ovn-controller instances across the cloud reconnect to
> > > >> the new node but there's a window where ovsdb-server is still running
> > > >> as backup.
> > > >>
> > > >> For those ovn-controller instances that reconnect within that window,
> > > >> every attempt to write in the OVSDB will fail with "operation not
> > > >> allowed when database server is in read only mode". This state will
> > > >> remain forever unless a reconnection is forced. Restarting
> > > >> ovn-controller or killing the connection (for example with tcpkill)
> > > >> will m

Re: [ovs-discuss] Issue with failover running ovsdb-server in A/P mode with Pacemaker

2019-07-08 Thread Daniel Alvarez Sanchez
I tried a simple patch and it fixes the issue (see below). The
question now is, do we want to do this? I think it makes sense to drop
*all* the connections when the role changes but I'm curious to see
what other people think:

diff --git a/ovsdb/jsonrpc-server.c b/ovsdb/jsonrpc-server.c
index 4dda63a..ddbbc2e 100644
--- a/ovsdb/jsonrpc-server.c
+++ b/ovsdb/jsonrpc-server.c
@@ -365,7 +365,7 @@ ovsdb_jsonrpc_server_set_read_only(struct
ovsdb_jsonrpc_server *svr,
 {
 if (svr->read_only != read_only) {
 svr->read_only = read_only;
-ovsdb_jsonrpc_server_reconnect(svr, false,
+ovsdb_jsonrpc_server_reconnect(svr, true,
xstrdup(read_only
? "making server read-only"
: "making server read/write"));


$export OVN_NB_DAEMON=$(ovn-nbctl --pidfile --detach)
$ovn-nbctl ls-add sw0
$ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/sync-status
state: active
$ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/set-active-ovsdb-server
tcp:192.0.2.2:6641
$ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/connect-active-ovsdb-server
$ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/sync-status
state: backup
connecting: tcp:192.0.2.2:6641
$ ovn-nbctl ls-add sw1
ovn-nbctl: transaction error: {"details":"insert operation not allowed
when database server is in read only mode","error":"not allowed"}

On Mon, Jul 8, 2019 at 1:25 PM Daniel Alvarez Sanchez
 wrote:
>
> I *think* that it may not a bug in ovsdb-server but a problem with
> ovn-controller as it doesn't seem to be a DB change aware client.
>
> When the role changes from master to backup or viceversa, connections
> are expected to be reestablished for all clients except those that are
> not aware of db changes [0] (note the 'false' argument). This flag is
> explained here [1] and looks like since ovn-controller is not
> monitoring the Database table in the _Server database, then the
> connection with it is not re-established. This is just a blind guess
> but  I can give it a shot :)
>
> [0] 
> https://github.com/openvswitch/ovs/blob/403a6a0cb003f1d48b0a3cbf11a2806c45e9d076/ovsdb/jsonrpc-server.c#L368
> [1] 
> https://github.com/openvswitch/ovs/blob/403a6a0cb003f1d48b0a3cbf11a2806c45e9d076/ovsdb/jsonrpc-server.c#L450-L456
>
> On Mon, Jul 8, 2019 at 12:45 PM Numan Siddique  wrote:
> >
> >
> >
> >
> > On Mon, Jul 8, 2019 at 3:52 PM Daniel Alvarez Sanchez  
> > wrote:
> >>
> >> Hi folks,
> >>
> >> While working with an OpenStack environment running OVN and
> >> ovsdb-server in A/P configuration with Pacemaker we hit an issue that
> >> has been probably around for a long time. The bug itself seems to be
> >> related with ovsdb-server not updating the read-only flag properly.
> >>
> >> With a 3 nodes cluster running ovsdb-server in active/passive mode,
> >> when we restart the master-node, pacemaker promotes another node as
> >> master and moves the associated IPAddr2 resource to it.
> >> At this point, ovn-controller instances across the cloud reconnect to
> >> the new node but there's a window where ovsdb-server is still running
> >> as backup.
> >>
> >> For those ovn-controller instances that reconnect within that window,
> >> every attempt to write in the OVSDB will fail with "operation not
> >> allowed when database server is in read only mode". This state will
> >> remain forever unless a reconnection is forced. Restarting
> >> ovn-controller or killing the connection (for example with tcpkill)
> >> will make things work again.
> >>
> >> A workaround in OVN OCF script could be to wait for the
> >> ovsdb_server_promote function to wait until we get 'running/active' on
> >> that instance.
> >>
> >> Another open question is what should clients (in this case,
> >> ovn-controller) do in such situation? Shall they log an error and
> >> attempt a reconnection (rate limited)?
> >
> >
> > Thanks for reporting this issue Daniel.
> >
> > I can easily  reproduce the issue with the below commands.
> >
> > $  > $export OVN_NB_DAEMON=$(ovn-nbctl --pidfile --detach)
> > $ovn-nbctl ls-add sw0
> > $ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/sync-status
> > state: active
> > $ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/set-active-ovsdb-server 
> > tcp:192.0.2.2:6641
> > $ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/connect-active-ovsdb-server
> > $ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/sync-status
> > state: backup
> > connecting: tcp

Re: [ovs-discuss] Issue with failover running ovsdb-server in A/P mode with Pacemaker

2019-07-08 Thread Daniel Alvarez Sanchez
I *think* that it may not be a bug in ovsdb-server but a problem with
ovn-controller, as it doesn't seem to be a DB-change-aware client.

When the role changes from master to backup or vice versa, connections
are expected to be re-established for all clients except those that are
not aware of db changes [0] (note the 'false' argument). This flag is
explained here [1], and it looks like, since ovn-controller is not
monitoring the Database table in the _Server database, the
connection with it is not re-established. This is just a blind guess
but I can give it a shot :)

[0] 
https://github.com/openvswitch/ovs/blob/403a6a0cb003f1d48b0a3cbf11a2806c45e9d076/ovsdb/jsonrpc-server.c#L368
[1] 
https://github.com/openvswitch/ovs/blob/403a6a0cb003f1d48b0a3cbf11a2806c45e9d076/ovsdb/jsonrpc-server.c#L450-L456
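
(For reference, being "db change aware" essentially means monitoring that
table; this can be emulated from the CLI with something like the following,
address and port being just an example:)

$ ovsdb-client monitor tcp:127.0.0.1:6641 _Server Database name,connected,leader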

On Mon, Jul 8, 2019 at 12:45 PM Numan Siddique  wrote:
>
>
>
>
> On Mon, Jul 8, 2019 at 3:52 PM Daniel Alvarez Sanchez  
> wrote:
>>
>> Hi folks,
>>
>> While working with an OpenStack environment running OVN and
>> ovsdb-server in A/P configuration with Pacemaker we hit an issue that
>> has been probably around for a long time. The bug itself seems to be
>> related with ovsdb-server not updating the read-only flag properly.
>>
>> With a 3 nodes cluster running ovsdb-server in active/passive mode,
>> when we restart the master-node, pacemaker promotes another node as
>> master and moves the associated IPAddr2 resource to it.
>> At this point, ovn-controller instances across the cloud reconnect to
>> the new node but there's a window where ovsdb-server is still running
>> as backup.
>>
>> For those ovn-controller instances that reconnect within that window,
>> every attempt to write in the OVSDB will fail with "operation not
>> allowed when database server is in read only mode". This state will
>> remain forever unless a reconnection is forced. Restarting
>> ovn-controller or killing the connection (for example with tcpkill)
>> will make things work again.
>>
>> A workaround in OVN OCF script could be to wait for the
>> ovsdb_server_promote function to wait until we get 'running/active' on
>> that instance.
>>
>> Another open question is what should clients (in this case,
>> ovn-controller) do in such situation? Shall they log an error and
>> attempt a reconnection (rate limited)?
>
>
> Thanks for reporting this issue Daniel.
>
> I can easily  reproduce the issue with the below commands.
>
> $  $export OVN_NB_DAEMON=$(ovn-nbctl --pidfile --detach)
> $ovn-nbctl ls-add sw0
> $ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/sync-status
> state: active
> $ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/set-active-ovsdb-server 
> tcp:192.0.2.2:6641
> $ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/connect-active-ovsdb-server
> $ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/sync-status
> state: backup
> connecting: tcp:192.0.2.2:6641
> $ovn-nbctl ls-add sw1  --> This should have failed. Since OVN_NB_DAEMON is 
> set, ovn-nbctl talks to the
>ovn-nbctl daemon and it is able to 
> create a logical switch even though the db is in backup mode
> $unset OVN_NB_DAEMON
> $ovn-nbctl ls-add sw2
> ovn-nbctl: transaction error: {"details":"insert operation not allowed when 
> database server is in read only mode","error":"not allowed"}
>
>
> I looked into the ovsdb-server code, when the user changes the state of the 
> ovsdb-server, the read_only param of  active ovsdb_server_sessions
> are not updated.
>
> Thanks
> Numan
>
>>
>> Thoughts?
>>
>> Thanks a lot,
>> Daniel
>> ___
>> discuss mailing list
>> disc...@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


[ovs-discuss] Issue with failover running ovsdb-server in A/P mode with Pacemaker

2019-07-08 Thread Daniel Alvarez Sanchez
Hi folks,

While working with an OpenStack environment running OVN and
ovsdb-server in A/P configuration with Pacemaker, we hit an issue that
has probably been around for a long time. The bug itself seems to be
related to ovsdb-server not updating the read-only flag properly.

With a 3 nodes cluster running ovsdb-server in active/passive mode,
when we restart the master-node, pacemaker promotes another node as
master and moves the associated IPAddr2 resource to it.
At this point, ovn-controller instances across the cloud reconnect to
the new node but there's a window where ovsdb-server is still running
as backup.

For those ovn-controller instances that reconnect within that window,
every attempt to write to the OVSDB will fail with "operation not
allowed when database server is in read only mode". This state will
remain forever unless a reconnection is forced. Restarting
ovn-controller or killing the connection (for example with tcpkill)
will make things work again.

A workaround in the OVN OCF script could be to have the
ovsdb_server_promote function wait until we get 'running/active' on
that instance.
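
Roughly, the check the OCF agent would loop on until it reports
'state: active' is something like (the ctl socket path may differ per
deployment):

$ ovs-appctl -t /var/run/openvswitch/ovnnb_db.ctl ovsdb-server/sync-status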

Another open question is what should clients (in this case,
ovn-controller) do in such situation? Shall they log an error and
attempt a reconnection (rate limited)?

Thoughts?

Thanks a lot,
Daniel
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


[ovs-discuss] [OVN] Aging mechanism for MAC_Binding table

2019-06-25 Thread Daniel Alvarez Sanchez
Hi folks,

Lately we've been trying to solve certain issues related to stale
entries in the MAC_Binding table (e.g. [0]). On the other hand, for
the OpenStack + Octavia (Load Balancing service) use case, we see that
a reused VIP can be affected as well by stale entries in this table
because it's never bound to a VIF, so ovn-controller won't
claim it and send the GARPs to update the neighbors.

I'm not sure if other scenarios may suffer from this issue, but it seems
reasonable to have an aging mechanism (as we discussed at some point
in the past) that makes unused/old entries expire. After talking to
Numan on IRC, since a new pinctrl thread has been introduced recently
[1], it'd be nice to implement this aging mechanism there.
At the same time we'd also be reducing the number of entries on
long-lived systems, as the table would otherwise grow indefinitely.
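
(For scale context, a long-lived deployment can check how big the table has
already grown with a plain row count:)

$ ovn-sbctl list MAC_Binding | grep -c _uuid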

Any thoughts?

Thanks!
Daniel

PS. With regards to the 'unused' vs 'old' entries I think it has to be
'old' rather than 'unused' as I don't see a way to reset the TTL of a
MAC_Binding entry when we see packets coming. The implication is that
we'll be seeing ARPs sent out more often when perhaps they're not
needed. This also leads to the discussion of making the cache timeout
configurable.

[0] 
https://github.com/openvswitch/ovs/commit/81e928526b8a9393b90785fb0a9c82d79570ef84
[1] 
https://github.com/openvswitch/ovs/commit/3594ffab6b4b423aa635a313f6b304180d7dbaf7
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] [OVN] ovn-controller Incremental Processing scale testing

2019-06-11 Thread Daniel Alvarez Sanchez
Thanks a lot Han for the answer!

On Tue, Jun 11, 2019 at 5:57 PM Han Zhou  wrote:
>
>
>
>
> On Tue, Jun 11, 2019 at 5:12 AM Dumitru Ceara  wrote:
> >
> > On Tue, Jun 11, 2019 at 10:40 AM Daniel Alvarez Sanchez
> >  wrote:
> > >
> > > Hi Han, all,
> > >
> > > Lucas, Numan and I have been doing some 'scale' testing of OpenStack
> > > using OVN and wanted to present some results and issues that we've
> > > found with the Incremental Processing feature in ovn-controller. Below
> > > is the scenario that we executed:
> > >
> > > * 7 baremetal nodes setup: 3 controllers (running
> > > ovn-northd/ovsdb-servers in A/P with pacemaker) + 4 compute nodes. OVS
> > > 2.10.
> > > * The test consists on:
> > >   - Create openstack network (OVN LS), subnet and router
> > >   - Attach subnet to the router and set gw to the external network
> > >   - Create an OpenStack port and apply a Security Group (ACLs to allow
> > > UDP, SSH and ICMP).
> > >   - Bind the port to one of the 4 compute nodes (randomly) by
> > > attaching it to a network namespace.
> > >   - Wait for the port to be ACTIVE in Neutron ('up == True' in NB)
> > >   - Wait until the test can ping the port
> > > * Running browbeat/rally with 16 simultaneous process to execute the
> > > test above 150 times.
> > > * When all the 150 'fake VMs' are created, browbeat will delete all
> > > the OpenStack/OVN resources.
> > >
> > > We first tried with OVS/OVN 2.10 and pulled some results which showed
> > > 100% success but ovn-controller is quite loaded (as expected) in all
> > > the nodes especially during the deletion phase:
> > >
> > > - Compute node: https://imgur.com/a/tzxfrIR
> > > - Controller node (ovn-northd and ovsdb-servers): 
> > > https://imgur.com/a/8ffKKYF
> > >
> > > After conducting the tests above, we replaced ovn-controller in all 7
> > > nodes by the one with the current master branch (actually from last
> > > week). We also replaced ovn-northd and ovsdb-servers but the
> > > ovs-vswitchd has been left untouched (still on 2.10). The expected
> > > results were to get less ovn-controller CPU usage and also better
> > > times due to the Incremental Processing feature introduced recently.
> > > However, the results don't look very good:
> > >
> > > - Compute node: https://imgur.com/a/wuq87F1
> > > - Controller node (ovn-northd and ovsdb-servers): 
> > > https://imgur.com/a/99kiyDp
> > >
> > > One thing that we can tell from the ovs-vswitchd CPU consumption is
> > > that it's much less in the Incremental Processing (IP) case which
> > > apparently doesn't make much sense. This led us to think that perhaps
> > > ovn-controller was not installing the necessary flows in the switch
> > > and we confirmed this hypothesis by looking into the dataplane
> > > results. Out of the 150 VMs, 10% of them were unreachable via ping
> > > when using ovn-controller from master.
> > >
> > > @Han, others, do you have any ideas as of what could be happening
> > > here? We'll be able to use this setup for a few more days so let me
> > > know if you want us to pull some other data/traces, ...
> > >
> > > Some other interesting things:
> > > On each of the compute nodes, (with an almost evenly distributed
> > > number of logical ports bound to them), the max amount of logical
> > > flows in br-int is ~90K (by the end of the test, right before deleting
> > > the resources).
> > >
> > > It looks like with the IP version, ovn-controller leaks some memory:
> > > https://imgur.com/a/trQrhWd
> > > While with OVS 2.10, it remains pretty flat during the test:
> > > https://imgur.com/a/KCkIT4O
> >
> > Hi Daniel, Han,
> >
> > I just sent a small patch for the ovn-controller memory leak:
> > https://patchwork.ozlabs.org/patch/1113758/
> >
> > At least on my setup this is what valgrind was pointing at.
> >
> > Cheers,
> > Dumitru
> >
> > >
> > > Looking forward to hearing back :)
> > > Daniel
> > >
> > > PS. Sorry for my previous email, I sent it by mistake without the subject
> > > ___
> > > discuss mailing list
> > > disc...@openvswitch.org
> > > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>
> Thanks Daniel for the testing and reporting, and thanks D

[ovs-discuss] [OVN] ovn-controller Incremental Processing scale testing

2019-06-11 Thread Daniel Alvarez Sanchez
Hi Han, all,

Lucas, Numan and I have been doing some 'scale' testing of OpenStack
using OVN and wanted to present some results and issues that we've
found with the Incremental Processing feature in ovn-controller. Below
is the scenario that we executed:

* 7 baremetal nodes setup: 3 controllers (running
ovn-northd/ovsdb-servers in A/P with pacemaker) + 4 compute nodes. OVS
2.10.
* The test consists of:
  - Create openstack network (OVN LS), subnet and router
  - Attach subnet to the router and set gw to the external network
  - Create an OpenStack port and apply a Security Group (ACLs to allow
UDP, SSH and ICMP).
  - Bind the port to one of the 4 compute nodes (randomly) by
attaching it to a network namespace (see the sketch after this list).
  - Wait for the port to be ACTIVE in Neutron ('up == True' in NB)
  - Wait until the test can ping the port
* Running browbeat/rally with 16 simultaneous processes to execute the
test above 150 times.
* When all the 150 'fake VMs' are created, browbeat will delete all
the OpenStack/OVN resources.
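
For reference, binding each 'fake VM' boils down to something like the
following (a rough sketch; interface names, addresses and the logical
port id are placeholders, not the exact ones browbeat/rally used):

# Emulate a VM: move an OVS internal port into a namespace and point
# its iface-id at the OVN logical switch port created by Neutron.
ip netns add fake-vm-1
ovs-vsctl add-port br-int fake-vm-1 -- set Interface fake-vm-1 type=internal
ip link set fake-vm-1 netns fake-vm-1
ip netns exec fake-vm-1 ip link set fake-vm-1 address fa:16:3e:00:00:01
ip netns exec fake-vm-1 ip addr add 10.0.0.10/24 dev fake-vm-1
ip netns exec fake-vm-1 ip link set fake-vm-1 up
ovs-vsctl set Interface fake-vm-1 external_ids:iface-id=<neutron-port-id>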

We first tried with OVS/OVN 2.10 and pulled some results which showed
100% success, but ovn-controller was quite loaded (as expected) on all
the nodes, especially during the deletion phase:

- Compute node: https://imgur.com/a/tzxfrIR
- Controller node (ovn-northd and ovsdb-servers): https://imgur.com/a/8ffKKYF

After conducting the tests above, we replaced ovn-controller in all 7
nodes by the one with the current master branch (actually from last
week). We also replaced ovn-northd and ovsdb-servers but the
ovs-vswitchd has been left untouched (still on 2.10). The expected
results were to get less ovn-controller CPU usage and also better
times due to the Incremental Processing feature introduced recently.
However, the results don't look very good:

- Compute node: https://imgur.com/a/wuq87F1
- Controller node (ovn-northd and ovsdb-servers): https://imgur.com/a/99kiyDp

One thing that we can tell from the ovs-vswitchd CPU consumption is
that it's much less in the Incremental Processing (IP) case which
apparently doesn't make much sense. This led us to think that perhaps
ovn-controller was not installing the necessary flows in the switch
and we confirmed this hypothesis by looking into the dataplane
results. Out of the 150 VMs, 10% of them were unreachable via ping
when using ovn-controller from master.

@Han, others, do you have any ideas as to what could be happening
here? We'll be able to use this setup for a few more days so let me
know if you want us to pull some other data/traces, ...

Some other interesting things:
On each of the compute nodes (with an almost evenly distributed
number of logical ports bound to them), the max amount of flows in
br-int is ~90K (by the end of the test, right before deleting the
resources).
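
(For reference, this is roughly how that figure can be pulled on a
compute node; a sketch, not necessarily the exact commands we ran:)

# OpenFlow flows actually installed in br-int
ovs-ofctl dump-flows br-int | wc -l
# Logical flows in the southbound DB, for comparison
ovn-sbctl lflow-list | wc -l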

It looks like with the IP version, ovn-controller leaks some memory:
https://imgur.com/a/trQrhWd
While with OVS 2.10, it remains pretty flat during the test:
https://imgur.com/a/KCkIT4O

Looking forward to hearing back :)
Daniel

PS. Sorry for my previous email, I sent it by mistake without the subject
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] how to forward all traffic from vm with dst 169.254.169.254 to local compute node

2019-05-16 Thread Daniel Alvarez Sanchez
In OpenStack we do this via a DHCP static route [0]. Then we use an
OVN 'localport' in the hypervisor inside a namespace to handle the
requests.

[0] 
https://opendev.org/openstack/networking-ovn/src/branch/stable/stein/networking_ovn/common/ovn_client.py#L1524
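
In practice it is just a classless static route in the subnet's DHCP
options that points 169.254.169.254 at the IP of the metadata
'localport' on that subnet. Done by hand with ovn-nbctl it would look
roughly like this (a sketch; the subnet, addresses and port name are
placeholders):

# DHCP options for a 10.0.0.0/24 subnet whose metadata localport is 10.0.0.2
uuid=$(ovn-nbctl create DHCP_Options cidr=10.0.0.0/24)
ovn-nbctl dhcp-options-set-options $uuid \
    server_id=10.0.0.1 server_mac=fa:16:3e:00:00:02 \
    lease_time=3600 router=10.0.0.1 \
    classless_static_route='{169.254.169.254/32,10.0.0.2, 0.0.0.0/0,10.0.0.1}'
# Attach the options to the VM's logical switch port
ovn-nbctl lsp-set-dhcpv4-options <port-name> $uuid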

On Thu, May 16, 2019 at 1:13 PM Vasiliy Tolstov  wrote:
>
> Hi! I need to route all traffic (tcp) from vm to metadata ip address
> 169.254.169.254 to host server. Ideally i need to know what vm is
> going to this address.
> I know that via ovs i can create flow for this stuff, does it possible
> something like this with ovn?
>
> --
> Vasiliy Tolstov,
> e-mail: v.tols...@selfip.ru
> ___
> discuss mailing list
> disc...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


[ovs-discuss] [OVN] Incremental processing patches

2019-05-07 Thread Daniel Alvarez Sanchez
Hi folks,

After some conversations with Han (thanks for your time and great
talk!) at the Open Infrastructure Summit in Denver last week, here I
go with this - somewhat crazy - idea.

Since the DDlog approach for incremental processing is not going to
happen soon, and Han has reported his patches to be working quite well
and to be production ready (please correct me if I'm wrong, Han), would
it be possible to somehow enable those and then drop them once DDlog
is in good shape?

Han keeps rebasing them [0] and I know we could use them but I think
that the whole OVN project would get better adoption if we could have
them in place in the main repo. The main downside is its complexity
but I wonder if we can live with it until DDlog becomes a reality.

Apologies in advance if this is just a terrible idea since I'm not
fully aware of what this exactly involves in terms of technical
complexity and feasibility. I believe that it'll make DDlog harder as
it'll have to deal with both approaches at the same time, but it
looks like the performance benefits are huge, so it's worth at least
considering it?

Thanks a lot!
Daniel

[0] https://github.com/hzhou8/ovs/tree/ip12_rebased_mar29
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


[ovs-discuss] High CPU load on failvoer using HA_Chassis_Group

2019-04-24 Thread Daniel Alvarez Sanchez
Hi folks,

While working on a multinode setup, I created this logical topology
[0] and scheduled a router on two gateway chassis. I found out that
after bringing down ovn-controller on the chassis where the gw port is
master, the second chassis observes 100% CPU load on ovn-controller
and around 400Kbps of traffic from the SB ovsdb-server. The events I
receive are related to the HA_Chassis_Group table:

{"HA_Chassis_Group":{"8a1b0b28-69f9-431a-bc4a-2374c59860b6":{"ha_chassis":["set",[["uuid","540d8f3a-8556-43d2-851b-f04bf9429ecd"],["uuid","d6ec9d15-4122-4f14-b8a7-2d1074b8267d"]]]}},"_date":1556110328458,"HA_Chassis":{"49bd7408-6442-473e-9d5b-6913f5a1e681":null,"540d8f3a-8556-43d2-851b-f04bf9429ecd":{"priority":20},"de2f1638-9a0b-4608-98d0-9399eee8d471":null,"d6ec9d15-4122-4f14-b8a7-2d1074b8267d":{"chassis":["uuid","4f1ac159-9fd5-4856-9ae2-8e16a193463e"],"priority":10}}}
OVSDB JSON 476 d8cc6f694a0ad69c4701aebfaa9a5ae2398c172b
{"HA_Chassis_Group":{"8a1b0b28-69f9-431a-bc4a-2374c59860b6":{"ha_chassis":["set",[["uuid","90ba76f6-965d-49a9-89de-c38a989d375b"],["uuid","af977a93-a3d8-455f-b044-0f21c0f37dec"]]]}},"_date":1556110328461,"HA_Chassis":{"90ba76f6-965d-49a9-89de-c38a989d375b":{"chassis":["uuid","4f1ac159-9fd5-4856-9ae2-8e16a193463e"],"priority":10},"540d8f3a-8556-43d2-851b-f04bf9429ecd":null,"af977a93-a3d8-455f-b044-0f21c0f37dec":{"priority":20},"d6ec9d15-4122-4f14-b8a7-2d1074b8267d":null}}
OVSDB JSON 476 b4df3460600c0af2f3132cd7808ad5574eaa5323
{"HA_Chassis_Group":{"8a1b0b28-69f9-431a-bc4a-2374c59860b6":{"ha_chassis":["set",[["uuid","eb80ea97-bb4b-4deb-82ad-35a4d830a20f"],["uuid","f38514f2-bfcf-4ff4-9cc6-d19a76b868ea"]]]}},"_date":1556110328463,"HA_Chassis":{"90ba76f6-965d-49a9-89de-c38a989d375b":null,"f38514f2-bfcf-4ff4-9cc6-d19a76b868ea":{"priority":20},"eb80ea97-bb4b-4deb-82ad-35a4d830a20f":{"chassis":["uuid","4f1ac159-9fd5-4856-9ae2-8e16a193463e"],"priority":10},"af977a93-a3d8-455f-b044-0f21c0f37dec":null}}
OVSDB JSON 476 e3106ae2b527fc5000b8663b5df283f726d2ffec
{"HA_Chassis_Group":{"8a1b0b28-69f9-431a-bc4a-2374c59860b6":{"ha_chassis":["set",[["uuid","05365741-39fe-4a43-83ed-6f8e31322155"],["uuid","771fd1db-8b5c-4d5d-9e74-4ac628ae01ff"]]]}},"_date":1556110328465,"HA_Chassis":{"f38514f2-bfcf-4ff4-9cc6-d19a76b868ea":null,"05365741-39fe-4a43-83ed-6f8e31322155":{"chassis":["uuid","4f1ac159-9fd5-4856-9ae2-8e16a193463e"],"priority":10},"771fd1db-8b5c-4d5d-9e74-4ac628ae01ff":{"priority":20},"eb80ea97-bb4b-4deb-82ad-35a4d830a20f":null}}
OVSDB JSON 476 13ffdecbf19116a2ecd74d3c754d442d11e3dc14
{"HA_Chassis_Group":{"8a1b0b28-69f9-431a-bc4a-2374c59860b6":{"ha_chassis":["set",[["uuid","6b82fb26-b448-4e61-9543-30f21e7f9e9a"],["uuid","e7ac036f-bf94-4a75-847c-c3fd6a87269e"]]]}},"_date":1556110328467,"HA_Chassis":{"05365741-39fe-4a43-83ed-6f8e31322155":null,"e7ac036f-bf94-4a75-847c-c3fd6a87269e":{"chassis":["uuid","4f1ac159-9fd5-4856-9ae2-8e16a193463e"],"priority":10},"771fd1db-8b5c-4d5d-9e74-4ac628ae01ff":null,"6b82fb26-b448-4e61-9543-30f21e7f9e9a":{"priority":20}}}
OVSDB JSON 546 88601aae13e44a53c5a91bbb90ed08e7e480a9f5
{"Chassis":{"c88b7db8-82c1-4deb-803e-ebf7d4e9a077":{"name":"gw1","hostname":"gw1","encaps":["uuid","0d994cfd-bebf-4321-9fdb-81029003bea0"],"external_ids":["map",[["datapath-type",""],["iface-types","erspan,geneve,gre,internal,ip6erspan,ip6gre,lisp,patch,stt,system,tap,vxlan"],["ovn-bridge-mappings","external:br-ex"]]]}},"Encap":{"0d994cfd-bebf-4321-9fdb-81029003bea0":{"ip":"192.168.50.102","options":["map",[["csum","true"]]],"chassis_name":"gw1","type":"geneve"}},"_date":1556110328468,"_comment":"ovn-controller:
registering chassis 'gw1'"}
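
(In case anyone wants to reproduce this, the churn can be watched live
with something along these lines; a sketch, the SB socket path depends
on the installation:)

# Watch HA_Chassis_Group / HA_Chassis updates in the southbound DB
ovsdb-client monitor unix:/var/run/openvswitch/ovnsb_db.sock OVN_Southbound HA_Chassis_Group
ovsdb-client monitor unix:/var/run/openvswitch/ovnsb_db.sock OVN_Southbound HA_Chassis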


[0] 
https://github.com/danalsan/vagrants/blob/master/ovn-playground/create_ovn_resources.sh
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] OVS/OVN troubleshooting: where's my packet?

2019-03-15 Thread Daniel Alvarez Sanchez
Sounds like a great plan, Ben! Thanks for that. It'd be great if
people could chime in this thread to help identify those gaps.

As about the anecdotes, we had just been involved in a case where OVN
was used and packets were dropped at conntrack:

Two VMs on different Logical Switches (externally routed), running on
the same hypervisor were communicating between each other and packet
loss was observed. The packet loss was observed only on small (<64B)
packets. These packets were padded by the NIC before being put on the
wire and when they came back, due to the ACLs, they were put into
conntrack and dropped there. We determined this by inspecting DP flows
via 'ovs-dpctl dump-flows' and then we enabled logging on netfilter
which showed that there was an error with the checksum calculation. It
happened to be a bug on the OVS kernel side which was already fixed in
newer kernels, but it took quite a while to figure out and required a
good understanding of what was going on. In this scenario, traffic
worked if the OVN ACLs were removed, so OVN was the first to be
blamed. And sometimes the OVN user/engineer is not an OVS expert who
can effectively tell what happened to a packet.
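
(Roughly, the two things that eventually gave it away; a sketch, not
the exact invocations from the case:)

# Look at the datapath flows going through conntrack or being dropped
ovs-dpctl dump-flows | grep -E 'ct\(|drop'
# Make netfilter log packets that conntrack considers invalid (255 = all protocols)
sysctl -w net.netfilter.nf_conntrack_log_invalid=255
# ...and watch the kernel log for the reason (a bad checksum in our case)
dmesg -w | grep -i conntrack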

Maybe the example is not the best, as it was resolved using just the
'ovs-dpctl' tool and some logging, but support engineers may loop in
OVN engineers, who may loop in OVS engineers, who may loop in kernel
engineers. It'd be great to improve the experience somehow so that the
initial assessment doesn't always have to go all the way down.

I'm curious about other folks' experiences here as well with more pure
OVS experience.

Thanks a lot!
Daniel

On Thu, Mar 14, 2019 at 5:55 PM Ben Pfaff  wrote:
>
> On Thu, Mar 14, 2019 at 04:55:56PM +0100, Daniel Alvarez Sanchez wrote:
> > Hi folks,
> >
> > Lately I'm getting the question in the subject line more and more
> > frequently and facing it myself, especially in the context of
> > OpenStack.
> >
> > The shift to OVN in OpenStack involves a totally different approach
> > when it comes to tracing packet drops. Before OVN, there were a bunch
> > of network namespaces and devices where you could hook a tcpdump on
> > and inspect the traffic. People are used to those troubleshooting
> > techniques and OVS was merely used for normal action switches.
> >
> > It's clear that there's tools and techniques to analyze this (trace
> > tool, port mirroring, etc.), but often times requires quite high
> > knowledge and understanding of the pipeline and OVS itself to
> > effectively trace where a packet got dropped. Furthermore, there could
> > be some scenarios where the packet can be silently dropped.
> >
> > I came across this patch [0] and presentation about it [1] which aims
> > to tackle partly the problem described here (focusing in the DPDK
> > datapath).
> >
> > The intent of this email is to gather some feedback as how to provide
> > efficient tools and techniques to troubleshoot OVS/OVN issues and what
> > do you think is immediately missing in this context.
>
> I guess that there are multiple things to do here:
>
> - Better document the tools that are available.
>
> - Implement improvements, especially UX-wise, to the existing tools.
>
> - Identify gaps in the available tools (and then fill them).
>
> Do you have any good anecdotes about user/admin frustration?  They might
> be helpful for figuring out how to help.  A lot of us here designed and
> built this stuff and so the gaps are not always obvious to us.
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


[ovs-discuss] OVS/OVN troubleshooting: where's my packet?

2019-03-14 Thread Daniel Alvarez Sanchez
Hi folks,

Lately I'm getting the question in the subject line more and more
frequently and facing it myself, especially in the context of
OpenStack.

The shift to OVN in OpenStack involves a totally different approach
when it comes to tracing packet drops. Before OVN, there were a bunch
of network namespaces and devices where you could hook tcpdump on and
inspect the traffic. People are used to those troubleshooting
techniques, and OVS was merely used as a plain switch with the NORMAL action.

It's clear that there are tools and techniques to analyze this (the
trace tools, port mirroring, etc.), but it often requires deep
knowledge and understanding of the pipeline and of OVS itself to
effectively trace where a packet got dropped. Furthermore, there could
be some scenarios where the packet can be silently dropped.
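
(As an example of what those tools look like from a user's point of
view today; the bridge, port and addresses below are made up:)

# Trace a packet through the OpenFlow pipeline of the integration bridge
ovs-appctl ofproto/trace br-int \
    in_port=5,dl_src=fa:16:3e:00:00:01,dl_dst=fa:16:3e:00:00:02,dl_type=0x0800,nw_src=10.0.0.10,nw_dst=10.0.0.20,nw_proto=1
# Trace the same packet through the OVN logical pipeline
ovn-trace --detailed sw0 'inport == "sw0-port1" &&
    eth.src == fa:16:3e:00:00:01 && eth.dst == fa:16:3e:00:00:02 &&
    ip4.src == 10.0.0.10 && ip4.dst == 10.0.0.20'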

I came across this patch [0] and presentation about it [1] which aims
to tackle partly the problem described here (focusing in the DPDK
datapath).

The intent of this email is to gather some feedback as how to provide
efficient tools and techniques to troubleshoot OVS/OVN issues and what
do you think is immediately missing in this context.

Thanks a lot!
Daniel

[0] https://patchwork.ozlabs.org/patch/918934/
 which has been revived here: https://patchwork.ozlabs.org/patch/1048766/
[1] 
https://www.slideshare.net/LF_OpenvSwitch/lfovs17troubleshooting-the-data-plane-in-ovs-82280329
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] OVN: availability zones concept

2019-03-06 Thread Daniel Alvarez Sanchez
Thanks Dan for chiming in and others as well for your feedback!

I also thought of having separate OVN deployments but that introduces
the drawbacks that Han pointed out adding - maybe a lot of - burden to
the CMS. Separate zones in the same OVN deployment will add minimal
changes (at deployment size to define the zones and when selecting the
nodes to schedule a gateway are the only ones I can think of).

For the particular case of OpenStack, the overlapping TZs won't be of
much help I guess as there's no notion of overlapping failure domains.
However, if you see value for OVN itself or other CMS's, I think it's
not too expensive to have (I'm possibly being naive here).

That said, I think that Han's suggestion of establishing tunnels
dynamically is *great*! It will improve scalability and will be more
efficient generally (even if OpenStack is not using Availability
Zones), reducing traffic and processing and avoiding deployer/ops to
configure each node. The Transport/Availability zone concept will be
then implemented at the CMS layer which makes sense to me (in the
OpenStack case, just scheduling gateways).

The only concern, as Han said, would be the latency between the time
the port gets bound and the time it becomes reachable from other
hypervisors (especially in the case of encrypted tunnels). It'd be
great to have some figures before making the decision, but my vote
goes for this approach :) Han++

Thanks all,
Daniel


On Wed, Mar 6, 2019 at 11:27 AM Dan Sneddon  wrote:
>
>
>
> On Tue, Mar 5, 2019 at 9:40 PM Han Zhou  wrote:
>>
>> On Tue, Mar 5, 2019 at 7:24 PM Ben Pfaff  wrote:
>> > What's the effective difference between an OVN deployment with 3 zones,
>> > and a collection of 3 OVN deployments?  Is it simply that the 3-zone
>> > deployment shares databases?  Is that a significant advantage?
>>
>> Hi Ben, based on the discussions there are two cases:
>>
>> For completely separated zones (no overlapping) v.s. separate OVN
>> deployments, the difference is that separate OVN deployments requires
>> some sort of federation at a higher layer, so that a single CMS can
>> operate multiple OVN deployments. Of course separate zones in same OVN
>> still requires changes in CMS to operate but the change may be smaller
>> in some cases.
>>
>> For overlapping zones v.s. separate OVN deployments, the difference is
>> more obvious. Separate OVN deployments doesn't allow overlapping.
>> Overlapping zones allows sharing gateways between different groups of
>> hypervisors.
>>
>> If the purpose is only reducing tunnel mesh size, I think it may be
>> better to avoid the zone concept but instead create tunnels (and bfd
>> sessions) on-demand, as discussed here:
>> https://mail.openvswitch.org/pipermail/ovs-discuss/2019-March/048281.html
>>
>> Daniel or other folks please comment if there are other benefit of
>> creating zones.
>>
>> Thanks,
>> Han
>
>
> The original discussion came about when I was consulting with a very large 
> bank who were considering network designs for an application cloud. In that 
> case, all chassis were in a single site, and the desire was to be able to 
> separate groups of chassis into trust zones with no East-West communication 
> between zones. Of course this same result can be handled via network 
> segregation and firewalling, but zones would provide an additional layer of 
> security enforcement. In their case, the choice due to policy was to have 
> separate flow controllers and software routers in each zone rather than rely 
> on firewalls alone, but this increased the hardware footprint.
>
> When I discovered that there was no way to prevent tunnels from being formed 
> between all chassis, that became an obvious problem for edge scenarios. To me 
> that is the more pressing issue, which dynamic tunnels would solve. However, 
> the ability to have separate transit zones would also be a useful feature, in 
> my opinion.
>
> --
> Dan Sneddon
> ___
> discuss mailing list
> disc...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


[ovs-discuss] OVN: availability zones concept

2019-02-28 Thread Daniel Alvarez Sanchez
Hi folks,

Just wanted to throw an idea here about introducing an availability
zone (AZ) concept in OVN and get implementation ideas. From a CMS
perspective, it makes sense to be able to implement some sort of
logical division of resources into failure domains to maximize their
availability.

In this sense, establishing a full mesh of Geneve tunnels is not
needed (and possibly undesired when strict firewalls are used between
AZs) as L2 connectivity will be constrained to the AZ boundaries.

A possibility would be to let the deployer of the CMS set a key on
the Open_vSwitch table of the local OVS instance, e.g.
'external_ids:ovn_az=<az-name>', and if it's set, ovn-controller will
register itself as a Chassis with the same external ID and only
establish tunnels to those Chassis within the same AZ; otherwise it'll
keep the current behavior.
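
(i.e. something like the following on each hypervisor; note that
'ovn_az' is just the key proposed above, not an existing OVN option:)

# Tag this chassis as belonging to availability zone "az1"
ovs-vsctl set Open_vSwitch . external_ids:ovn_az=az1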

It'll be responsibility of the CMS to schedule gateway ports in the
right AZ as well to provide L3 AZ awareness.

Does that make sense? Thoughts?

Thanks a lot!!
Daniel
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] OVN: MAC_Binding entries not getting updated leads to unreachable destinations

2018-12-03 Thread Daniel Alvarez Sanchez
On Mon, Dec 3, 2018 at 3:48 PM Mark Michelson  wrote:
>
> On 12/01/2018 03:44 PM, Han Zhou wrote:
> >
> >
> > On Fri, Nov 30, 2018 at 7:29 AM Daniel Alvarez Sanchez
> > mailto:dalva...@redhat.com>> wrote:
> >  >
> >  > Thanks folks again for the discussion.
> >  > I sent an RFC patch here [0]. I tried it out with my reproducer and it
> >  > seems to work well. Instead of outputting the packet to the localnet
> >  > ofport, it will inject it to the public switch pipeline so it'll get
> >  > broadcasted to the rest of the ports resulting in other Logical
> >  > Routers connected to the external switch updating their neighbours. As
> >  > it's broadcasted, the GARP will also be sent out through the localnet
> >  > port as before.
> >  >
> >  > Looking forward to your comments before moving on and writing tests.
> >  >
> >  > Thanks Numan for your help!
> >  >
> >  > [0]
> > https://mail.openvswitch.org/pipermail/ovs-dev/2018-November/354220.html
> >  > On Wed, Nov 28, 2018 at 3:32 PM Daniel Alvarez Sanchez
> >  > mailto:dalva...@redhat.com>> wrote:
> >  > >
> >  > > Hi all,
> >  > >
> >  > > As this thread is getting big I'm summarizing the issue I see so far:
> >  > >
> >  > > * When a dnat_and_snat entry is added to a logical router (or port
> >  > > gets bound to a chassis), ovn-controller will send GARPs to announce
> >  > > the MAC address of the FIP(s) (either the gw port or of the actual FIP
> >  > > MAC address if distributed) only through localnet ports [0].
> >  > >
> >  > > * This means that gateway ports bound to that same chassis and
> >  > > connected to the public switch won't get the GARPs, so they won't
> >  > > update their MAC_Binding entries causing unreachability. In the
> >  > > diagram of this thread, LR0 won't get the GARP sent by ovn-controller
> >  > > if both gateway ports are bound to the same chassis.
> >  > >
> >  > > I tried out sending GARPs from the external network using master
> >  > > branch and MAC_Binding entries get updated. However, in order to cover
> >  > > missing cases, I think it would make sense to send the GARPs not only
> >  > > to localnet ports but to all ports of those logical switches that have
> >  > > a localnet port. What do you think?
> >  > >
> >  > > [0]
> > https://github.com/openvswitch/ovs/blob/master/ovn/controller/pinctrl.c#L2073
> >  > >
> >  > > [0]
> > https://github.com/openvswitch/ovs/blob/master/ovn/controller/pinctrl.c#L2073On
> >  > > Fri, Nov 23, 2018 at 5:28 PM Daniel Alvarez Sanchez
> >  > > mailto:dalva...@redhat.com>> wrote:
> >  > > >
> >  > > > On Wed, Nov 21, 2018 at 9:04 PM Han Zhou  > <mailto:zhou...@gmail.com>> wrote:
> >  > > > >
> >  > > > >
> >  > > > >
> >  > > > > On Tue, Nov 20, 2018 at 5:21 AM Mark Michelson
> > mailto:mmich...@redhat.com>> wrote:
> >  > > > > >
> >  > > > > > Hi Daniel,
> >  > > > > >
> >  > > > > > I agree with Numan that this seems like a good approach to take.
> >  > > > > >
> >  > > > > > On 11/16/2018 12:41 PM, Daniel Alvarez Sanchez wrote:
> >  > > > > > >
> >  > > > > > > On Sat, Nov 10, 2018 at 12:21 AM Ben Pfaff  > <mailto:b...@ovn.org>
> >  > > > > > > <mailto:b...@ovn.org <mailto:b...@ovn.org>>> wrote:
> >  > > > > > >  >
> >  > > > > > >  > On Mon, Oct 29, 2018 at 05:21:13PM +0530, Numan Siddique
> > wrote:
> >  > > > > > >  > > On Mon, Oct 29, 2018 at 5:00 PM Daniel Alvarez Sanchez
> >  > > > > > > mailto:dalva...@redhat.com>
> > <mailto:dalva...@redhat.com <mailto:dalva...@redhat.com>>>
> >  > > > > > >  > > wrote:
> >  > > > > > >  > >
> >  > > > > > >  > > > Hi,
> >  > > > > > >  > > >
> >  > > > > > >  > > > After digging further. The problem seems to be
> > reduced to reusing an
> >  > > > > > >  > > > old gateway IP address for a dnat_and_snat entry.
> >  > > > &g

Re: [ovs-discuss] OVN: MAC_Binding entries not getting updated leads to unreachable destinations

2018-11-30 Thread Daniel Alvarez Sanchez
Thanks folks again for the discussion.
I sent an RFC patch here [0]. I tried it out with my reproducer and it
seems to work well. Instead of outputting the packet to the localnet
ofport, it will inject it to the public switch pipeline so it'll get
broadcasted to the rest of the ports resulting in other Logical
Routers connected to the external switch updating their neighbours. As
it's broadcasted, the GARP will also be sent out through the localnet
port as before.

Looking forward to your comments before moving on and writing tests.

Thanks Numan for your help!

[0] https://mail.openvswitch.org/pipermail/ovs-dev/2018-November/354220.html
On Wed, Nov 28, 2018 at 3:32 PM Daniel Alvarez Sanchez
 wrote:
>
> Hi all,
>
> As this thread is getting big I'm summarizing the issue I see so far:
>
> * When a dnat_and_snat entry is added to a logical router (or port
> gets bound to a chassis), ovn-controller will send GARPs to announce
> the MAC address of the FIP(s) (either the gw port or of the actual FIP
> MAC address if distributed) only through localnet ports [0].
>
> * This means that gateway ports bound to that same chassis and
> connected to the public switch won't get the GARPs, so they won't
> update their MAC_Binding entries causing unreachability. In the
> diagram of this thread, LR0 won't get the GARP sent by ovn-controller
> if both gateway ports are bound to the same chassis.
>
> I tried out sending GARPs from the external network using master
> branch and MAC_Binding entries get updated. However, in order to cover
> missing cases, I think it would make sense to send the GARPs not only
> to localnet ports but to all ports of those logical switches that have
> a localnet port. What do you think?
>
> [0] 
> https://github.com/openvswitch/ovs/blob/master/ovn/controller/pinctrl.c#L2073
>
> [0] 
> https://github.com/openvswitch/ovs/blob/master/ovn/controller/pinctrl.c#L2073On
> Fri, Nov 23, 2018 at 5:28 PM Daniel Alvarez Sanchez
>  wrote:
> >
> > On Wed, Nov 21, 2018 at 9:04 PM Han Zhou  wrote:
> > >
> > >
> > >
> > > On Tue, Nov 20, 2018 at 5:21 AM Mark Michelson  
> > > wrote:
> > > >
> > > > Hi Daniel,
> > > >
> > > > I agree with Numan that this seems like a good approach to take.
> > > >
> > > > On 11/16/2018 12:41 PM, Daniel Alvarez Sanchez wrote:
> > > > >
> > > > > On Sat, Nov 10, 2018 at 12:21 AM Ben Pfaff  > > > > <mailto:b...@ovn.org>> wrote:
> > > > >  >
> > > > >  > On Mon, Oct 29, 2018 at 05:21:13PM +0530, Numan Siddique wrote:
> > > > >  > > On Mon, Oct 29, 2018 at 5:00 PM Daniel Alvarez Sanchez
> > > > > mailto:dalva...@redhat.com>>
> > > > >  > > wrote:
> > > > >  > >
> > > > >  > > > Hi,
> > > > >  > > >
> > > > >  > > > After digging further. The problem seems to be reduced to 
> > > > > reusing an
> > > > >  > > > old gateway IP address for a dnat_and_snat entry.
> > > > >  > > > When a gateway port is bound to a chassis, its entry will show 
> > > > > up in
> > > > >  > > > the MAC_Binding table (at least when that Logical Switch is 
> > > > > connected
> > > > >  > > > to more than one Logical Router). After deleting the Logical 
> > > > > Router
> > > > >  > > > and all its ports, this entry will remain there. If a new 
> > > > > Logical
> > > > >  > > > Router is created and a Floating IP (dnat_and_snat) is 
> > > > > assigned to a
> > > > >  > > > VM with the old gw IP address, it will become unreachable.
> > > > >  > > >
> > > > >  > > > A workaround now from networking-ovn (OpenStack integration) 
> > > > > is to
> > > > >  > > > delete MAC_Binding entries for that IP address upon a FIP 
> > > > > creation. I
> > > > >  > > > think that this however should be done from OVN, what do you 
> > > > > folks
> > > > >  > > > think?
> > > > >  > > >
> > > > >  > > >
> > > > >  > > Agree. Since the MAC_Binding table row is created by 
> > > > > ovn-controller, it
> > > > >  > > should
> > > > >  > > be handled properly within OVN.
> > > > >  >
> > > > >  > I 

Re: [ovs-discuss] OVN: MAC_Binding entries not getting updated leads to unreachable destinations

2018-11-28 Thread Daniel Alvarez Sanchez
Hi all,

As this thread is getting big I'm summarizing the issue I see so far:

* When a dnat_and_snat entry is added to a logical router (or port
gets bound to a chassis), ovn-controller will send GARPs to announce
the MAC address of the FIP(s) (either that of the gw port, or the
actual FIP MAC address if distributed) only through localnet ports [0].

* This means that gateway ports bound to that same chassis and
connected to the public switch won't get the GARPs, so they won't
update their MAC_Binding entries causing unreachability. In the
diagram of this thread, LR0 won't get the GARP sent by ovn-controller
if both gateway ports are bound to the same chassis.

I tried out sending GARPs from the external network using master
branch and MAC_Binding entries get updated. However, in order to cover
missing cases, I think it would make sense to send the GARPs not only
to localnet ports but to all ports of those logical switches that have
a localnet port. What do you think?

[0] 
https://github.com/openvswitch/ovs/blob/master/ovn/controller/pinctrl.c#L2073

[0] 
https://github.com/openvswitch/ovs/blob/master/ovn/controller/pinctrl.c#L2073On
Fri, Nov 23, 2018 at 5:28 PM Daniel Alvarez Sanchez
 wrote:
>
> On Wed, Nov 21, 2018 at 9:04 PM Han Zhou  wrote:
> >
> >
> >
> > On Tue, Nov 20, 2018 at 5:21 AM Mark Michelson  wrote:
> > >
> > > Hi Daniel,
> > >
> > > I agree with Numan that this seems like a good approach to take.
> > >
> > > On 11/16/2018 12:41 PM, Daniel Alvarez Sanchez wrote:
> > > >
> > > > On Sat, Nov 10, 2018 at 12:21 AM Ben Pfaff  > > > <mailto:b...@ovn.org>> wrote:
> > > >  >
> > > >  > On Mon, Oct 29, 2018 at 05:21:13PM +0530, Numan Siddique wrote:
> > > >  > > On Mon, Oct 29, 2018 at 5:00 PM Daniel Alvarez Sanchez
> > > > mailto:dalva...@redhat.com>>
> > > >  > > wrote:
> > > >  > >
> > > >  > > > Hi,
> > > >  > > >
> > > >  > > > After digging further. The problem seems to be reduced to 
> > > > reusing an
> > > >  > > > old gateway IP address for a dnat_and_snat entry.
> > > >  > > > When a gateway port is bound to a chassis, its entry will show 
> > > > up in
> > > >  > > > the MAC_Binding table (at least when that Logical Switch is 
> > > > connected
> > > >  > > > to more than one Logical Router). After deleting the Logical 
> > > > Router
> > > >  > > > and all its ports, this entry will remain there. If a new Logical
> > > >  > > > Router is created and a Floating IP (dnat_and_snat) is assigned 
> > > > to a
> > > >  > > > VM with the old gw IP address, it will become unreachable.
> > > >  > > >
> > > >  > > > A workaround now from networking-ovn (OpenStack integration) is 
> > > > to
> > > >  > > > delete MAC_Binding entries for that IP address upon a FIP 
> > > > creation. I
> > > >  > > > think that this however should be done from OVN, what do you 
> > > > folks
> > > >  > > > think?
> > > >  > > >
> > > >  > > >
> > > >  > > Agree. Since the MAC_Binding table row is created by 
> > > > ovn-controller, it
> > > >  > > should
> > > >  > > be handled properly within OVN.
> > > >  >
> > > >  > I see that this has been sitting here for a while.  The solution 
> > > > seems
> > > >  > reasonable to me.  Are either of you working on it?
> > > >
> > > > I started working on it. I came up with a solution (see patch below)
> > > > which works but I wanted to give you a bit more of context and get your
> > > > feedback:
> > > >
> > > >
> > > > ^ localnet
> > > > |
> > > > +---+---+
> > > > |   |
> > > >  +--+  pub  +--+
> > > >  |  |   |  |
> > > >  |  +---+  |
> > > >  | 172.24.4.0/24 <http://172.24.4.0/24>|
> > > >  | |
> > > > 172.24.4.220 | | 172.24.4.221
> > > >  +---+---+ +---+---+
> > > >   

Re: [ovs-discuss] OVN: MAC_Binding entries not getting updated leads to unreachable destinations

2018-11-28 Thread Daniel Alvarez Sanchez
On Wed, Nov 28, 2018 at 3:10 PM Ben Pfaff  wrote:
>
> On Wed, Nov 28, 2018 at 12:07:55PM +0100, Daniel Alvarez Sanchez wrote:
> > On Mon, Nov 26, 2018 at 9:30 PM Ben Pfaff  wrote:
> > >
> > > On Fri, Nov 16, 2018 at 06:41:33PM +0100, Daniel Alvarez Sanchez wrote:
> > > > +static void
> > > > +delete_mac_binding_by_ip(struct northd_context *ctx, const char *ip)
> > > > +{
> > > > +const struct sbrec_mac_binding *b, *n;
> > > > +SBREC_MAC_BINDING_FOR_EACH_SAFE (b, n, ctx->ovnsb_idl) {
> > > > +if (strstr(ip, b->ip)) {
> > > > +sbrec_mac_binding_delete(b);
> > > > +}
> > > > +}
> > > > +}
> > >
> > > I haven't read the whole thread properly yet, but: why does this use
> > > strstr()?
> >
> > I used it because b->ip could be like "50:57:00:00:00:02 20.0.0.10"
> > and wanted to check if the IP address was present there.
>
> Is the 'ip' column in the MAC_Binding table documented incorrectly?  It
> is currently documented as:
>
>ip: string
>   The bound IP address.
>
> which doesn't mention strings that also contain a MAC address.
Sorry for the confusion, the prototype is misleading. It's not the
'ip' col of the MAC_Binding table but the 'mac' column of the
Port_Binding table which is what's being passed to the
'delete_mac_binding_by_ip()' function.
+    for (int i = 0; i < op->sb->n_mac; i++) {
+        delete_mac_binding_by_ip(ctx, op->sb->mac[i]);

Thanks!
Daniel


>
> > I am sending another email to this thread with more details about the
> > current issue, to gather more feedback.  As Han says, the patch I sent
> > is not covering all situations and perhaps it's not the best way to
> > fix it but need to confirm few things before moving forward.
>
> Thanks.
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] OVN: MAC_Binding entries not getting updated leads to unreachable destinations

2018-11-28 Thread Daniel Alvarez Sanchez
On Mon, Nov 26, 2018 at 9:30 PM Ben Pfaff  wrote:
>
> On Fri, Nov 16, 2018 at 06:41:33PM +0100, Daniel Alvarez Sanchez wrote:
> > +static void
> > +delete_mac_binding_by_ip(struct northd_context *ctx, const char *ip)
> > +{
> > +const struct sbrec_mac_binding *b, *n;
> > +SBREC_MAC_BINDING_FOR_EACH_SAFE (b, n, ctx->ovnsb_idl) {
> > +if (strstr(ip, b->ip)) {
> > +sbrec_mac_binding_delete(b);
> > +}
> > +}
> > +}
>
> I haven't read the whole thread properly yet, but: why does this use
> strstr()?

I used it because b->ip could be like "50:57:00:00:00:02 20.0.0.10"
and wanted to check if the IP address was present there. I am sending
another email to this thread with more details about the current
issue, to gather more feedback.
As Han says, the patch I sent is not covering all situations and
perhaps it's not the best way to fix it but need to confirm few things
before moving forward.
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] OVN: MAC_Binding entries not getting updated leads to unreachable destinations

2018-11-23 Thread Daniel Alvarez Sanchez
On Wed, Nov 21, 2018 at 9:04 PM Han Zhou  wrote:
>
>
>
> On Tue, Nov 20, 2018 at 5:21 AM Mark Michelson  wrote:
> >
> > Hi Daniel,
> >
> > I agree with Numan that this seems like a good approach to take.
> >
> > On 11/16/2018 12:41 PM, Daniel Alvarez Sanchez wrote:
> > >
> > > On Sat, Nov 10, 2018 at 12:21 AM Ben Pfaff  > > <mailto:b...@ovn.org>> wrote:
> > >  >
> > >  > On Mon, Oct 29, 2018 at 05:21:13PM +0530, Numan Siddique wrote:
> > >  > > On Mon, Oct 29, 2018 at 5:00 PM Daniel Alvarez Sanchez
> > > mailto:dalva...@redhat.com>>
> > >  > > wrote:
> > >  > >
> > >  > > > Hi,
> > >  > > >
> > >  > > > After digging further. The problem seems to be reduced to reusing 
> > > an
> > >  > > > old gateway IP address for a dnat_and_snat entry.
> > >  > > > When a gateway port is bound to a chassis, its entry will show up 
> > > in
> > >  > > > the MAC_Binding table (at least when that Logical Switch is 
> > > connected
> > >  > > > to more than one Logical Router). After deleting the Logical Router
> > >  > > > and all its ports, this entry will remain there. If a new Logical
> > >  > > > Router is created and a Floating IP (dnat_and_snat) is assigned to 
> > > a
> > >  > > > VM with the old gw IP address, it will become unreachable.
> > >  > > >
> > >  > > > A workaround now from networking-ovn (OpenStack integration) is to
> > >  > > > delete MAC_Binding entries for that IP address upon a FIP 
> > > creation. I
> > >  > > > think that this however should be done from OVN, what do you folks
> > >  > > > think?
> > >  > > >
> > >  > > >
> > >  > > Agree. Since the MAC_Binding table row is created by ovn-controller, 
> > > it
> > >  > > should
> > >  > > be handled properly within OVN.
> > >  >
> > >  > I see that this has been sitting here for a while.  The solution seems
> > >  > reasonable to me.  Are either of you working on it?
> > >
> > > I started working on it. I came up with a solution (see patch below)
> > > which works but I wanted to give you a bit more of context and get your
> > > feedback:
> > >
> > >
> > > ^ localnet
> > > |
> > > +---+---+
> > > |   |
> > >  +--+  pub  +--+
> > >  |  |   |  |
> > >  |  +---+  |
> > >  | 172.24.4.0/24 <http://172.24.4.0/24>|
> > >  | |
> > > 172.24.4.220 | | 172.24.4.221
> > >  +---+---+ +---+---+
> > >  |   | |   |
> > >  |  LR0  | |  LR1  |
> > >  |   | |   |
> > >  +---+---+ +---+---+
> > >   10.0.0.254 | | 20.0.0.254
> > >  | |
> > >  +---+---+ +---+---+
> > >  |   | |   |
> > > 10.0.0.0/24 <http://10.0.0.0/24> |  SW0  | |  SW1  |
> > > 20.0.0.0/24 <http://20.0.0.0/24>
> > >  |   | |   |
> > >  +---+---+ +---+---+
> > >  | |
> > >  | |
> > >  +---+---+ +---+---+
> > >  |   | |   |
> > >  |  VM0  | |  VM1  |
> > >  |   | |   |
> > >  +---+ +---+
> > >  10.0.0.10 20.0.0.10
> > >172.24.4.100   172.24.4.200
> > >
> > >
> > > When I ping VM1 floating IP from the external network, a new entry for
> > > 172.24.4.221 in the LR0 datapath appears in the MAC_Binding table:
> > >
> > > _uuid   : 85e30e87-3c59-423e-8681-ec4cfd9205f9
> > > datapath: ac5984b9-0fea-485f-84d4-031bdeced29b
> > > ip

Re: [ovs-discuss] OVN: MAC_Binding entries not getting updated leads to unreachable destinations

2018-11-23 Thread Daniel Alvarez Sanchez
Hi Han,

Yes, I agree that the patch is not enough. I'll take a look at the
GARP thing because it's either not implemented or not working. Here's
a reproducer while I jump back into it.

When you ping 172.24.4.200 from namespace ns1 the first time, a
MAC_Binding entry gets created:

# ovn-sbctl list mac_binding | grep 200 -C2
_uuid   : 07967416-c89c-4233-8cc2-4dc929720838
datapath: 918a9363-fa6e-4086-98ee-8d073b924d29
ip  : "172.24.4.200"
logical_port: "lr0-public"
mac : "00:00:20:20:12:15"


After recreating lr1 and sw1 using a different MAC address,
172.24.4.200 becomes unreachable from sw0 as the MAC_Binding entry
never gets updated.


reproducer.sh

#!/bin/bash
for i in $(ovn-sbctl list mac_binding | grep uuid  | awk '{print
$3}'); do ovn-sbctl destroy mac_binding $i; done

ip net del ns1
ip net del ns2
ovs-vsctl del-port ns1
ovs-vsctl del-port ns2
ovn-nbctl lr-del lr0
ovn-nbctl lr-del lr1
ovn-nbctl ls-del sw0
ovn-nbctl ls-del sw1
ovn-nbctl ls-del public

chassis_name=`ovn-sbctl find chassis | grep ^name | awk '{print $3}'`
ovn-nbctl ls-add sw0
ovn-nbctl lsp-add sw0 sw0-port1
ovn-nbctl lsp-set-addresses sw0-port1 "50:54:00:00:00:01 10.0.0.10"


ovn-nbctl lr-add lr0
# Connect sw0 to lr0
ovn-nbctl lrp-add lr0 lr0-sw0 00:00:00:00:ff:01 10.0.0.254/24
ovn-nbctl lsp-add sw0 sw0-lr0
ovn-nbctl lsp-set-type sw0-lr0 router
ovn-nbctl lsp-set-addresses sw0-lr0 router
ovn-nbctl lsp-set-options sw0-lr0 router-port=lr0-sw0


ovn-nbctl ls-add public
ovn-nbctl lrp-add lr0  lr0-public 00:00:20:20:12:13 172.24.4.220/24
ovn-nbctl lsp-add public public-lr0
ovn-nbctl lsp-set-type public-lr0 router
ovn-nbctl lsp-set-addresses public-lr0 router
ovn-nbctl lsp-set-options public-lr0 router-port=lr0-public

# localnet port
ovn-nbctl lsp-add public ln-public
ovn-nbctl lsp-set-type ln-public localnet
ovn-nbctl lsp-set-addresses ln-public unknown
ovn-nbctl lsp-set-options ln-public network_name=public

ovn-nbctl ls-add sw1
ovn-nbctl lsp-add sw1 sw1-port1
ovn-nbctl lsp-set-addresses sw1-port1 "50:57:00:00:00:02 20.0.0.10"

ovn-nbctl lr-add lr1
# Connect sw1 to lr1
ovn-nbctl lrp-add lr1 lr1-sw1 00:00:00:00:ff:02 20.0.0.254/24
ovn-nbctl lsp-add sw1 sw1-lr1
ovn-nbctl lsp-set-type sw1-lr1 router
ovn-nbctl lsp-set-addresses sw1-lr1 router
ovn-nbctl lsp-set-options sw1-lr1 router-port=lr1-sw1

ovn-nbctl lrp-add lr1  lr1-public 00:00:20:20:12:15 172.24.4.221/24
ovn-nbctl lsp-add public public-lr1
ovn-nbctl lsp-set-type public-lr1 router
ovn-nbctl lsp-set-addresses public-lr1 router
ovn-nbctl lsp-set-options public-lr1 router-port=lr1-public


ovn-nbctl lr-nat-add lr0 snat 172.24.4.220 10.0.0.0/24
ovn-nbctl lr-nat-add lr1 snat 172.24.4.221  20.0.0.0/24

# Create the FIPs
ovn-nbctl lr-nat-add lr0 dnat_and_snat 172.24.4.100 10.0.0.10
ovn-nbctl lr-nat-add lr1 dnat_and_snat 172.24.4.200 20.0.0.10

# Schedule the gateways
ovn-nbctl lrp-set-gateway-chassis lr0-public $chassis_name 20
ovn-nbctl lrp-set-gateway-chassis lr1-public $chassis_name  20


add_phys_port() {
name=$1
mac=$2
ip=$3
mask=$4
gw=$5
iface_id=$6
ip netns add $name
ovs-vsctl add-port br-int $name -- set interface $name type=internal
ip link set $name netns $name
ip netns exec $name ip link set $name address $mac
ip netns exec $name ip addr add $ip/$mask dev $name
ip netns exec $name ip link set $name up
ip netns exec $name ip route add default via $gw
ovs-vsctl set Interface $name external_ids:iface-id=$iface_id
}


add_phys_port ns1 50:54:00:00:00:01 10.0.0.10  24 10.0.0.254 sw0-port1
add_phys_port ns2 50:57:00:00:00:02 20.0.0.10  24 20.0.0.254 sw1-port1

# Pinging from sw0
ip net e ns1 ping -c 4 172.24.4.200

ovn-nbctl lr-del lr1
ovn-nbctl ls-del sw1

ovn-nbctl ls-add sw1
ovn-nbctl lsp-add sw1 sw1-port1
ovn-nbctl lsp-set-addresses sw1-port1 "50:57:00:00:00:02 20.0.0.10"

ovn-nbctl lr-add lr1
# Connect sw1 to lr1
ovn-nbctl lrp-add lr1 lr1-sw1 00:00:00:00:ff:02 20.0.0.254/24
ovn-nbctl lsp-add sw1 sw1-lr1
ovn-nbctl lsp-set-type sw1-lr1 router
ovn-nbctl lsp-set-addresses sw1-lr1 router
ovn-nbctl lsp-set-options sw1-lr1 router-port=lr1-sw1


# Change the MAC address of the LRP
ovn-nbctl lrp-add lr1  lr1-public 00:00:20:20:12:95 172.24.4.221/24

ovn-nbctl lr-nat-add lr1 snat 172.24.4.221  20.0.0.0/24
ovn-nbctl lr-nat-add lr1 dnat_and_snat 172.24.4.200 20.0.0.10

ovn-nbctl lrp-set-gateway-chassis lr1-public $chassis_name 20

# Pinging from sw0 won't work now. For the outside it will.
ip net e ns1 ping -c 4 172.24.4.200
On Wed, Nov 21, 2018 at 9:04 PM Han Zhou  wrote:
>
>
>
> On Tue, Nov 20, 2018 at 5:21 AM Mark Michelson  wrote:
> >
> > Hi Daniel,
> >
> > I agree with Numan that this seems like a good approach to take.
> >
> > On 11/16/2018 12:41 PM, Daniel Alvarez Sanchez wrote:
> > >
> > > On Sat, Nov 10, 2018 at 12:21 AM

Re: [ovs-discuss] OVN: MAC_Binding entries not getting updated leads to unreachable destinations

2018-11-19 Thread Daniel Alvarez Sanchez
Having thought about this again, I'd rather merge the patch I proposed
in my previous email (I'd need to add tests and propose a formal patch
after your feedback), but in the long term I think it'd also make
sense to implement some sort of aging for the MAC_Binding entries so
that they eventually expire, especially for entries that come from
external networks.

On Fri, Nov 16, 2018 at 6:41 PM Daniel Alvarez Sanchez 
wrote:

>
> On Sat, Nov 10, 2018 at 12:21 AM Ben Pfaff  wrote:
> >
> > On Mon, Oct 29, 2018 at 05:21:13PM +0530, Numan Siddique wrote:
> > > On Mon, Oct 29, 2018 at 5:00 PM Daniel Alvarez Sanchez <
> dalva...@redhat.com>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > After digging further. The problem seems to be reduced to reusing an
> > > > old gateway IP address for a dnat_and_snat entry.
> > > > When a gateway port is bound to a chassis, its entry will show up in
> > > > the MAC_Binding table (at least when that Logical Switch is connected
> > > > to more than one Logical Router). After deleting the Logical Router
> > > > and all its ports, this entry will remain there. If a new Logical
> > > > Router is created and a Floating IP (dnat_and_snat) is assigned to a
> > > > VM with the old gw IP address, it will become unreachable.
> > > >
> > > > A workaround now from networking-ovn (OpenStack integration) is to
> > > > delete MAC_Binding entries for that IP address upon a FIP creation. I
> > > > think that this however should be done from OVN, what do you folks
> > > > think?
> > > >
> > > >
> > > Agree. Since the MAC_Binding table row is created by ovn-controller, it
> > > should
> > > be handled properly within OVN.
> >
> > I see that this has been sitting here for a while.  The solution seems
> > reasonable to me.  Are either of you working on it?
>
> I started working on it. I came up with a solution (see patch below) which
> works but I wanted to give you a bit more of context and get your feedback:
>
>
>^ localnet
>|
>+---+---+
>|   |
> +--+  pub  +--+
> |  |   |  |
> |  +---+  |
> |172.24.4.0/24|
> | |
>172.24.4.220 | | 172.24.4.221
> +---+---+ +---+---+
> |   | |   |
> |  LR0  | |  LR1  |
> |   | |   |
> +---+---+ +---+---+
>  10.0.0.254 | | 20.0.0.254
> | |
> +---+---+ +---+---+
> |   | |   |
> 10.0.0.0/24 |  SW0  | |  SW1  | 20.0.0.0/24
> |   | |   |
> +---+---+ +---+---+
> | |
> | |
> +---+---+ +---+---+
> |   | |   |
> |  VM0  | |  VM1  |
> |   | |   |
> +---+ +---+
> 10.0.0.10 20.0.0.10
>   172.24.4.100   172.24.4.200
>
>
> When I ping VM1 floating IP from the external network, a new entry for
> 172.24.4.221 in the LR0 datapath appears in the MAC_Binding table:
>
> _uuid   : 85e30e87-3c59-423e-8681-ec4cfd9205f9
> datapath: ac5984b9-0fea-485f-84d4-031bdeced29b
> ip  : "172.24.4.221"
> logical_port: "lrp02"
> mac : "00:00:02:01:02:04"
>
>
> Now, if LR1 gets removed and the old gateway IP (172.24.4.221) is reused
> for VM2 FIP with different MAC and new gateway IP is created (for example
> 172.24.4.222 00:00:02:01:02:99),  VM2 FIP becomes unreachable from VM1
> until the old MAC_Binding entry gets deleted as pinging 172.24.4.221 will
> use the wrong address ("00:00:02:01:02:04").
>
> With the patch below, removing LR1 results in deleting all MAC_Binding
> entries for every datapath where '172.24.4.221' appears in the 'ip' column
> so the problem goes away.
>
> Another solution would be implementing some kind of 'aging' for
> MAC_Binding entries but perhaps it's more complex.
> Looking forward for your comments :)
>
>
> diff --git a/ov

Re: [ovs-discuss] OVN: MAC_Binding entries not getting updated leads to unreachable destinations

2018-11-16 Thread Daniel Alvarez Sanchez
On Sat, Nov 10, 2018 at 12:21 AM Ben Pfaff  wrote:
>
> On Mon, Oct 29, 2018 at 05:21:13PM +0530, Numan Siddique wrote:
> > On Mon, Oct 29, 2018 at 5:00 PM Daniel Alvarez Sanchez <
dalva...@redhat.com>
> > wrote:
> >
> > > Hi,
> > >
> > > After digging further. The problem seems to be reduced to reusing an
> > > old gateway IP address for a dnat_and_snat entry.
> > > When a gateway port is bound to a chassis, its entry will show up in
> > > the MAC_Binding table (at least when that Logical Switch is connected
> > > to more than one Logical Router). After deleting the Logical Router
> > > and all its ports, this entry will remain there. If a new Logical
> > > Router is created and a Floating IP (dnat_and_snat) is assigned to a
> > > VM with the old gw IP address, it will become unreachable.
> > >
> > > A workaround now from networking-ovn (OpenStack integration) is to
> > > delete MAC_Binding entries for that IP address upon a FIP creation. I
> > > think that this however should be done from OVN, what do you folks
> > > think?
> > >
> > >
> > Agree. Since the MAC_Binding table row is created by ovn-controller, it
> > should
> > be handled properly within OVN.
>
> I see that this has been sitting here for a while.  The solution seems
> reasonable to me.  Are either of you working on it?

I started working on it. I came up with a solution (see patch below) which
works but I wanted to give you a bit more of context and get your feedback:


   ^ localnet
   |
   +---+---+
   |   |
+--+  pub  +--+
|  |   |  |
|  +---+  |
|172.24.4.0/24|
| |
   172.24.4.220 | | 172.24.4.221
+---+---+ +---+---+
|   | |   |
|  LR0  | |  LR1  |
|   | |   |
+---+---+ +---+---+
 10.0.0.254 | | 20.0.0.254
| |
+---+---+ +---+---+
|   | |   |
10.0.0.0/24 |  SW0  | |  SW1  | 20.0.0.0/24
|   | |   |
+---+---+ +---+---+
| |
| |
+---+---+ +---+---+
|   | |   |
|  VM0  | |  VM1  |
|   | |   |
+---+ +---+
10.0.0.10 20.0.0.10
  172.24.4.100   172.24.4.200


When I ping VM1 floating IP from the external network, a new entry for
172.24.4.221 in the LR0 datapath appears in the MAC_Binding table:

_uuid   : 85e30e87-3c59-423e-8681-ec4cfd9205f9
datapath: ac5984b9-0fea-485f-84d4-031bdeced29b
ip  : "172.24.4.221"
logical_port: "lrp02"
mac : "00:00:02:01:02:04"


Now, if LR1 gets removed and the old gateway IP (172.24.4.221) is reused
for VM2 FIP with different MAC and new gateway IP is created (for example
172.24.4.222 00:00:02:01:02:99),  VM2 FIP becomes unreachable from VM1
until the old MAC_Binding entry gets deleted as pinging 172.24.4.221 will
use the wrong address ("00:00:02:01:02:04").

With the patch below, removing LR1 results in deleting all MAC_Binding
entries for every datapath where '172.24.4.221' appears in the 'ip' column
so the problem goes away.

Another solution would be implementing some kind of 'aging' for MAC_Binding
entries but perhaps it's more complex.
Looking forward for your comments :)


diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index 58bef7d..a86733e 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -2324,6 +2324,18 @@ cleanup_mac_bindings(struct northd_context *ctx,
struct hmap *ports)
 }
 }

+static void
+delete_mac_binding_by_ip(struct northd_context *ctx, const char *ip)
+{
+    const struct sbrec_mac_binding *b, *n;
+    SBREC_MAC_BINDING_FOR_EACH_SAFE (b, n, ctx->ovnsb_idl) {
+        if (strstr(ip, b->ip)) {
+            sbrec_mac_binding_delete(b);
+        }
+    }
+}
+
+
 /* Updates the southbound Port_Binding table so that it contains the
logical
  * switch ports specified by the northbound database.
  *
@@ -2383,6 +2395,15 @@ build_ports(struct northd_context *ctx,
     /* Delete southbound records without northbound matches. */
     LIST_FOR_EACH_SAFE (op, next, list, &sb_only) {
         ovs_list_remove(&op->list);
+
+/* Delet

Re: [ovs-discuss] OVN: MAC_Binding entries not getting updated leads to unreachable destinations

2018-10-29 Thread Daniel Alvarez Sanchez
Hi,

After digging further. The problem seems to be reduced to reusing an
old gateway IP address for a dnat_and_snat entry.
When a gateway port is bound to a chassis, its entry will show up in
the MAC_Binding table (at least when that Logical Switch is connected
to more than one Logical Router). After deleting the Logical Router
and all its ports, this entry will remain there. If a new Logical
Router is created and a Floating IP (dnat_and_snat) is assigned to a
VM with the old gw IP address, it will become unreachable.

A workaround now from networking-ovn (OpenStack integration) is to
delete MAC_Binding entries for that IP address upon a FIP creation. I
think that this however should be done from OVN, what do you folks
think?
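
(The workaround is essentially the equivalent of the following, run
when the FIP is created; a sketch with an example address:)

# Drop any stale MAC_Binding rows for the address being reused as a FIP
for uuid in $(ovn-sbctl --bare --columns=_uuid find MAC_Binding ip=172.24.5.13); do
    ovn-sbctl destroy MAC_Binding "$uuid"
done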

Thanks,
Daniel
On Fri, Oct 26, 2018 at 11:39 AM Daniel Alvarez Sanchez
 wrote:
>
> Hi all,
>
> While analyzing a problem in OpenStack I think I have found out a
> severe bug in OVN when it comes to reuse floating IPs (which is a very
> common use case in OpenStack and Kubernetes). Let me explain the
> scenario, issue and possible solutions:
>
> * Three logical switches  (Neutron networks) LS1, LS2, LS3
> * LS3 has external connectivity (localnet port to a provider bridge).
> * Two logical routers LR1 and LR2.
> * LS1 and LS3 connected to LR1
> * LS2 and LS3 connected to LR2.
> * VM1 in LS1 with a FIP (dnat_and_snat NAT entry) in LS3 CIDR
> * VM2 in LS2 with a FIP (dnat_and_snat NAT entry) in LS3 CIDR
> * Ping from VM1 to VM2 FIP and viceversa works.
>
> Echo requests from VM1 reach to VM2 and VM2 responds to the FIP of VM1.
> First time, ovn-controller will insert the ARP responder and add a new
> entry to MAC_Binding table like:
>
> _uuid   : 447eaf43-119a-43b2-a821-0c79d8885d68
> datapath: 07a76c72-6896-464a-8683-3df145d02434
> ip  : "172.24.5.13"
> logical_port: "lrp-82af833f-f78b-4f45-9fc8-719db0f9e619"
> mac : "fa:16:3e:22:6c:0a"
>
> |binding|INFO|cr-lrp-198e5576-b654-4605-80c0-b9cf6d21ea2b: Claiming
> fa:16:3e:22:6c:0a 172.24.5.4/24
>
> The problem happens when VM1, LS1, LR1 entry are deleted and recreated
> again. If the FIP (172.24.5.13) is reused, the MAC_Binding entry won't
> get updated and VM2 will be now unable to respond to pings coming from
> VM1 as it'll attempt to do it to fa:16:3e:22:6c:0a.
>
> If I manually delete the MAC_Binding entry, a new one will then
> correctly be recreated by ovn-controller with the right MAC address
> (the one of the new cr-lrp).
>
> |00126|binding|INFO|cr-lrp-f09b2186-1cb2-4e50-99a5-587f680db8ad:
> Claiming fa:16:3e:14:48:20 172.24.5.6/24
>
> _uuid   : dae11bdb-47d3-471e-8826-9aefb8572700
> datapath: 07a76c72-6896-464a-8683-3df145d02434
> ip  : "172.24.5.13"
> logical_port: "lrp-82af833f-f78b-4f45-9fc8-719db0f9e619"
> mac : "fa:16:3e:14:48:20"
>
>
> Possible solutions:
>
> 1) Make ovn-controller (or ovn.-northd?) to update the MAC_Binding
> entries whenever a new NAT row is created.
>
> 2) Send GARPs (I guess we're not doing this yet) whenever a LRP gets
> bound to a chassis for all the nat_addresses that it has configured.
>
> For 2), I guess that it would make MAC_Binding entries getting updated
> automatically?
>
> How does this sound?
>
> Thanks a lot,
> Daniel Alvarez
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


[ovs-discuss] OVN: MAC_Binding entries not getting updated leads to unreachable destinations

2018-10-26 Thread Daniel Alvarez Sanchez
Hi all,

While analyzing a problem in OpenStack I think I have found out a
severe bug in OVN when it comes to reuse floating IPs (which is a very
common use case in OpenStack and Kubernetes). Let me explain the
scenario, issue and possible solutions:

* Three logical switches  (Neutron networks) LS1, LS2, LS3
* LS3 has external connectivity (localnet port to a provider bridge).
* Two logical routers LR1 and LR2.
* LS1 and LS3 connected to LR1
* LS2 and LS3 connected to LR2.
* VM1 in LS1 with a FIP (dnat_and_snat NAT entry) in LS3 CIDR
* VM2 in LS2 with a FIP (dnat_and_snat NAT entry) in LS3 CIDR
* Ping from VM1 to VM2 FIP and vice versa works.

Echo requests from VM1 reach to VM2 and VM2 responds to the FIP of VM1.
First time, ovn-controller will insert the ARP responder and add a new
entry to MAC_Binding table like:

_uuid   : 447eaf43-119a-43b2-a821-0c79d8885d68
datapath: 07a76c72-6896-464a-8683-3df145d02434
ip  : "172.24.5.13"
logical_port: "lrp-82af833f-f78b-4f45-9fc8-719db0f9e619"
mac : "fa:16:3e:22:6c:0a"

|binding|INFO|cr-lrp-198e5576-b654-4605-80c0-b9cf6d21ea2b: Claiming
fa:16:3e:22:6c:0a 172.24.5.4/24

The problem happens when the VM1, LS1 and LR1 entries are deleted and
recreated again. If the FIP (172.24.5.13) is reused, the MAC_Binding entry
won't get updated and VM2 will now be unable to respond to pings coming from
VM1, as it will attempt to reply to fa:16:3e:22:6c:0a.

If I manually delete the MAC_Binding entry, a new one will then
correctly be recreated by ovn-controller with the right MAC address
(the one of the new cr-lrp).

|00126|binding|INFO|cr-lrp-f09b2186-1cb2-4e50-99a5-587f680db8ad:
Claiming fa:16:3e:14:48:20 172.24.5.6/24

_uuid   : dae11bdb-47d3-471e-8826-9aefb8572700
datapath: 07a76c72-6896-464a-8683-3df145d02434
ip  : "172.24.5.13"
logical_port: "lrp-82af833f-f78b-4f45-9fc8-719db0f9e619"
mac : "fa:16:3e:14:48:20"


Possible solutions:

1) Make ovn-controller (or ovn-northd?) update the MAC_Binding
entries whenever a new NAT row is created.

2) Send GARPs (I guess we're not doing this yet) whenever a LRP gets
bound to a chassis for all the nat_addresses that it has configured.

For 2), I guess that it would make MAC_Binding entries get updated
automatically?
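
To make option 2) a bit more concrete, this is roughly what such a gratuitous
ARP would look like on the wire (a standalone Linux raw-socket sketch, just
for illustration and not OVN code; the interface, MAC and IP are example
values taken from above):

# Rough illustration only (not OVN code): a gratuitous ARP for a
# dnat_and_snat address, sent as a raw Ethernet frame on Linux (needs root).
import socket
import struct

def send_garp(iface, mac_str, ip_str):
    mac = bytes.fromhex(mac_str.replace(':', ''))
    ip = socket.inet_aton(ip_str)
    eth = b'\xff' * 6 + mac + struct.pack('!H', 0x0806)    # dst, src, ARP
    arp = struct.pack('!HHBBH', 1, 0x0800, 6, 4, 1)        # Ethernet/IPv4, request
    arp += mac + ip + b'\x00' * 6 + ip                      # sender IP == target IP
    sock = socket.socket(socket.AF_PACKET, socket.SOCK_RAW)
    sock.bind((iface, 0))
    sock.send(eth + arp)
    sock.close()

send_garp('eth0', 'fa:16:3e:14:48:20', '172.24.5.13')  # example values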

How does this sound?

Thanks a lot,
Daniel Alvarez
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] OVN - MTU path discovery

2018-09-24 Thread Daniel Alvarez Sanchez
Resending this email as I can't see it in [0] for some reason.
[0] https://mail.openvswitch.org/pipermail/ovs-dev/2018-September/




On Fri, Sep 21, 2018 at 2:36 PM Daniel Alvarez Sanchez 
wrote:

> Hi folks,
>
> After talking to Numan and reading log from IRC meeting yesterday,
> looks like there's some confusion around the issue.
>
> jpettit | I should look at the initial bug report again, but is it not
> sufficient to configure a smaller MTU within the VM?
>
> Imagine the case where some host from the external network (MTU 1500)
> sends 1000B UDP packets to the VM (MTU 200). When OVN attempts to deliver
> the packet to the VM it won't fit and the application running there will
> never
> get the packet.
>
> With reference implementation (or if namespaces were used as Han suggests
> that this is what NSX does), the packet would be handled by the IP stack on
> the gateway node. An ICMP need-to-frag would be sent back to the sender
> and - if they're not blocked by some firewall - the IP stack on the sender
> node
> will fragment this and subsequent packets to fit the MTU on the receiver.
>
> Also, generally we don't want to configure small MTUs on the VMs for
> performance as it would also impact on east/west traffic where
> Jumbo frames appear to work.
>
> Thanks a lot for bringing this up on the meeting!
> Daniel
>
> On Mon, Aug 13, 2018 at 5:23 PM Miguel Angel Ajo Pelayo <
> majop...@redhat.com> wrote:
> >
> > Yeah, later on we have found that it was, again, more important that we
> think.
> >
> > For example, there are still cases not covered by TCP MSS negotiation (or
> > for UDP/other protocols):
> >
> > Imagine you have two clouds, both with an internal MTU (let’s imagine
> > MTUb on cloud B, and MTUa on cloud A), and an external transit
> > network with a 1500 MTU (MTUc).
> >
> > MTUa > MTUc, and MTUb > MTUc.
> >
> > Also, imagine that VMa in cloud A, has a floating IP (DNAT_SNAT NAT),
> > and VMb in cloud B has also a floating IP.
> >
> > VMa tries to establish  connection to VMb FIP, and announces
> > MSSa = MTUa - (IP + TCP overhead), VMb ACKs the TCP SYN request
> > with  MSSb = MTUb - (IP - TCP overhead).
> >
> > So the agreement will be min(MSSa, MSSb), but… the transit network MSSc
> > will always be smaller: min(MSSa, MSSb) > MSSc.
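
(To make the arithmetic explicit, a tiny sketch with made-up MTU values:)

# Tiny numeric sketch of the mismatch (MTU values are made up).
IP_TCP_OVERHEAD = 40                     # 20B IPv4 + 20B TCP, no options

mtu_a, mtu_b, mtu_c = 9000, 8942, 1500   # cloud A, cloud B, transit
mss_a = mtu_a - IP_TCP_OVERHEAD          # advertised by VMa: 8960
mss_b = mtu_b - IP_TCP_OVERHEAD          # advertised by VMb: 8902
mss_c = mtu_c - IP_TCP_OVERHEAD          # what the transit network fits: 1460

agreed = min(mss_a, mss_b)               # 8902
assert agreed > mss_c                    # segments won't fit on the transit net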
> >
> > In ML2/OVS deployments, those big packets will get fragmented at the
> router
> > edge, and a notification ICMP will be sent to the sender of the packets
> to notify
> > fragmenting in source is necessary.
> >
> >
> > I guess we can also replicate this with 2 VMs on the same cloud with
> MSSa > MSSb
> > where they try to talk via floating IP to each other.
> >
> >
> > So going back to the thing, I guess we need to implement some OpenFlow
> extension
> > to match packets per size, redirecting those to an slow path
> (ovn-controller) so we can
> > Fragment/and icmp back the source for source fragmentation?
> >
> > Any advise on what’s the procedure here (OpenFlow land, kernel wise,
> even in terms
> > of our source code and design so we could implement this) ?
> >
> >
> > Best regards,
> > Miguel Ángel.
> >
> >
> > On 3 August 2018 at 17:41:05, Daniel Alvarez Sanchez (
> dalva...@redhat.com) wrote:
> >
> > Maybe ICMP is not that critical but seems like not having the ICMP 'need
> to frag' on UDP communications could break some applications that are aware
> of this to reduce the size of the packets? I wonder...
> >
> > Thanks!
> > Daniel
> >
> > On Fri, Aug 3, 2018 at 5:20 PM Miguel Angel Ajo Pelayo <
> majop...@redhat.com> wrote:
> >>
> >>
> >> We didn’t understand why a MTU missmatch in one direction worked (N/S),
> >> but in other direction (S/N) didn’t work… and we found that that it’s
> actually
> >> working (at least for TCP, via MSS negotiation), we had a
> missconfiguration
> >> In one of the physical interfaces.
> >>
> >> So, in the case of TCP we are fine. TCP is smart enough to negotiate
> properly.
> >>
> >> Other protocols like ICMP with the DF flag, or UDP… would not get the
> ICMP
> >> that notifies the sender about the MTU miss-match.
> >>
> >> I suspect that the most common cases are covered, and that it’s not
> worth
> >> pursuing what I was asking for at least with a high priority, but I’d
> like to hear
> >> opinions.
> >>
> >>
> >> Best regards,
> >> Miguel Ángel.
> >>
> >> On 3 August 

[ovs-discuss] Unnecessary sorting in production JSON code

2018-09-11 Thread Daniel Alvarez Sanchez
Hi all,

I noticed that we're doing a lot of sorting in the JSON code which is not
needed except for testing. Problem is that if I remove the sorting, then
most of the tests break. I spent a fair amount of time trying to fix them
but it's getting harder and harder.

Possibly, the best way to fix it would be to rewrite the tests in some
other way but I'd like to get your feedback.

At the bottom of the e-mail is the patch that I wrote and even though I
fixed a few tests (basically applying egrep to the output of test-ovsdb.py
to filter out some stuff) but there's still others that fail. For example:

1335. ovsdb-types.at:18: testing integer enum - Python2
+++ /home/centos/ovs-perf/ovs/tests/testsuite.dir/at-groups/1335/stdout
2018-09-11 22:50:45.577339922 +
@@ -1,2 +1,2 @@
-{"enum":["set",[-1,4,5]],"type":"integer"}
+{"enum":["set",[4,5,-1]],"type":"integer"}

[centos@centos python]$ python test-ovsdb.py -t 1 parse-base-type
'{"type": "integer", "enum": ["set", [-1, 4, 5]]}'
{"enum":["set",[4,5,-1]],"type":"integer"}

The test would expect the set to be ordered ([-1, 4, 5]) so it fails as I
removed the sorting.

For now I'm only focusing on the Python code, but this also happens in C. I
want to pull some numbers to figure out the performance hit of this sorting,
but in any case it doesn't make sense to do it just for the sake of testing.
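
As a first data point, even a standalone micro-benchmark (it doesn't use the
OVS code itself and the sizes are arbitrary) shows what sorting a large set on
every serialization costs:

# Standalone micro-benchmark (not the OVS code itself, arbitrary sizes):
# cost of sorting a large set on every serialization, as to_json() does now.
import timeit
import uuid

values = {uuid.uuid4(): None for _ in range(50000)}

sorted_ser = lambda: ["set", [str(k) for k in sorted(values)]]
plain_ser = lambda: ["set", [str(k) for k in values]]

print("sorted:", timeit.timeit(sorted_ser, number=10))
print("plain: ", timeit.timeit(plain_ser, number=10))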

What do you folks think?
Thanks!
Daniel

diff --git a/python/ovs/db/data.py b/python/ovs/db/data.py
index 9e57595..80665ba 100644
--- a/python/ovs/db/data.py
+++ b/python/ovs/db/data.py
@@ -379,12 +379,12 @@ class Datum(object):
 def to_json(self):
 if self.type.is_map():
 return ["map", [[k.to_json(), v.to_json()]
-for k, v in sorted(self.values.items())]]
+for k, v in self.values.items()]]
 elif len(self.values) == 1:
 key = next(six.iterkeys(self.values))
 return key.to_json()
 else:
-return ["set", [k.to_json() for k in
sorted(self.values.keys())]]
+return ["set", [k.to_json() for k in self.values.keys()]]

 def to_string(self):
 head = tail = None
@@ -400,7 +400,7 @@ class Datum(object):
 if head:
 s.append(head)

-for i, key in enumerate(sorted(self.values)):
+for i, key in enumerate(self.values):
 if i:
 s.append(", ")

@@ -499,7 +499,7 @@ class Datum(object):
 dk = uuid_to_row(k.value, self.type.key)
 if dk is not None:
 s.add(dk)
-return sorted(s)
+return list(s)

 @staticmethod
 def from_python(type_, value, row_to_uuid):
@@ -566,13 +566,13 @@ class Datum(object):
 return ["static struct ovsdb_datum %s = { .n = 0 };"]

 s = ["static union ovsdb_atom %s_keys[%d] = {" % (name, n)]
-for key in sorted(self.values):
+for key in self.values:
 s += ["{ %s }," % key.cInitAtom(key)]
 s += ["};"]

 if self.type.value:
 s = ["static union ovsdb_atom %s_values[%d] = {" % (name, n)]
-for k, v in sorted(self.values.items()):
+for k, v in self.values.items():
 s += ["{ %s }," % v.cInitAtom(v)]
 s += ["};"]
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] OVN - MTU path discovery

2018-07-11 Thread Daniel Alvarez Sanchez
On Wed, Jul 11, 2018 at 12:55 PM Daniel Alvarez Sanchez 
wrote:

> Hi all,
>
> Miguel Angel Ajo and I have been trying to setup Jumbo frames in OpenStack
> using OVN as a backend.
>
> The external network has an MTU of 1900 while we have created two tenant
> networks (Logical Switches) with an MTU of 8942.
>

s/1900/1500

>
> When pinging from one instance in one of the networks to the other
> instance on the other network, the routing takes place locally and
> everything is fine. We can ping with -s 3000 and with tcpdump we verify
> that the packets are not fragmented at all.
>
> However, when trying to reach the external network, we see that the
> packets are not tried to be fragmented and the traffic doesn't go through.
>
> In the ML2/OVS case (reference implementation for OpenStack networking),
> this works as we're seeing the following when attempting to reach a network
> with a lower MTU:
>

Just to clarify, in the reference implementation (ML2/OVS) the routing
takes place with iptables rules so we assume that it's the kernel
processing those ICMP packets.

>
> 10:38:03.807695 IP 192.168.20.14 > dell-virt-lab-01.mgmt.com: ICMP echo
> request, id 30977, seq 0, length 3008
>
> 10:38:03.807723 IP overcloud-controller-0 > 192.168.20.14: ICMP
> dell-virt-lab-01.mgmt.com unreachable - need to frag (mtu 1500), length
> 556
>
> As you can see, the router (overcloud-controller-0) is responding to the
> instance with an ICMP need to frag and after this, subsequent packets are
> going fragmented (while replies are not):
>
> 0:38:34.630437 IP 192.168.20.14 > dell-virt-lab-01.mgmt.com: ICMP echo
> request, id 31233, seq 0, length 1480
>
> 10:38:34.630458 IP 192.168.20.14 > dell-virt-lab-01.mgmt.com: icmp
>
> 10:38:34.630462 IP 192.168.20.14 > dell-virt-lab-01.mgmt.com: icmp
>
> 10:38:34.631334 IP dell-virt-lab-01.mgmt.com > 192.168.20.14: ICMP echo
> reply, id 31233, seq 0, length 3008
>
>
>
> Are we missing some configuration or we lack support for this in OVN?
>
> Any pointers are highly appreciated :)
>
>
> Thanks a lot.
>
> Daniel Alvarez
>
>
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


[ovs-discuss] OVN - MTU path discovery

2018-07-11 Thread Daniel Alvarez Sanchez
Hi all,

Miguel Angel Ajo and I have been trying to setup Jumbo frames in OpenStack
using OVN as a backend.

The external network has an MTU of 1900 while we have created two tenant
networks (Logical Switches) with an MTU of 8942.

When pinging from one instance in one of the networks to the other instance
on the other network, the routing takes place locally and everything is
fine. We can ping with -s 3000 and with tcpdump we verify that the packets
are not fragmented at all.

However, when trying to reach the external network, we see that no attempt
is made to fragment the packets and the traffic doesn't go through.

In the ML2/OVS case (reference implementation for OpenStack networking),
this works as we're seeing the following when attempting to reach a network
with a lower MTU:

10:38:03.807695 IP 192.168.20.14 > dell-virt-lab-01.mgmt.com: ICMP echo
request, id 30977, seq 0, length 3008

10:38:03.807723 IP overcloud-controller-0 > 192.168.20.14: ICMP
dell-virt-lab-01.mgmt.com unreachable - need to frag (mtu 1500), length 556

As you can see, the router (overcloud-controller-0) is responding to the
instance with an ICMP need to frag and after this, subsequent packets are
going fragmented (while replies are not):

0:38:34.630437 IP 192.168.20.14 > dell-virt-lab-01.mgmt.com: ICMP echo
request, id 31233, seq 0, length 1480

10:38:34.630458 IP 192.168.20.14 > dell-virt-lab-01.mgmt.com: icmp

10:38:34.630462 IP 192.168.20.14 > dell-virt-lab-01.mgmt.com: icmp

10:38:34.631334 IP dell-virt-lab-01.mgmt.com > 192.168.20.14: ICMP echo
reply, id 31233, seq 0, length 3008
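
For reference, the "need to frag" message above is just ICMP type 3 / code 4
carrying the next-hop MTU that the sender's stack uses to adjust. A small
sketch of that layout (plain struct packing, checksum omitted, not OVN code):

# Sketch of the ICMP "fragmentation needed" header (type 3, code 4, RFC 1191)
# that the kernel router sends back; checksum left at 0, not OVN code.
import struct

def icmp_frag_needed(next_hop_mtu, original_datagram):
    # type, code, checksum (0 here), unused, next-hop MTU, then the
    # original IP header plus the first 8 payload bytes.
    header = struct.pack('!BBHHH', 3, 4, 0, 0, next_hop_mtu)
    return header + original_datagram[:28]

pkt = icmp_frag_needed(1500, b'\x45' + b'\x00' * 27)       # dummy IP header
print(len(pkt), 'bytes, advertised MTU:', struct.unpack('!H', pkt[6:8])[0])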



Are we missing some configuration or we lack support for this in OVN?

Any pointers are highly appreciated :)


Thanks a lot.

Daniel Alvarez
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


[ovs-discuss] Port Groups and DHCP lflows

2018-07-05 Thread Daniel Alvarez Sanchez
Hi Han, all

While implementing Port Groups in OpenStack I have noticed that, with the
current code, we are now duplicating the lflows for DHCP. Seeking advice
here:

When we create a Neutron subnet, I'm creating a Port Group with the ACL for
the DHCP:

_uuid   : 7f2b64eb-090b-4bb4-85fd-09576329c21b
action  : allow
direction   : from-lport
external_ids: {}
log : false
match   : "inport == @pg_12070130_e7f0_47a7_aee2_cde2064e7a28
&& ip4 && ip4.dst == {255.255.255.255, 192.168.1.0/24} && udp && udp.src ==
68 && udp.dst == 67"
name: []
priority: 1002
severity: []


This generates the proper lflow in the Logical_Flow table:

_uuid   : a2a970ec-82ee-4474-bf0e-43f1cdedd7ed
actions : "next;"
external_ids: {source="ovn-northd.c:3192", stage-hint="7f2b64eb",
stage-name=ls_in_acl}
logical_datapath: e1bdb553-5bbf-4b76-a19d-cf385612a3ff
match   : "inport == @pg_12070130_e7f0_47a7_aee2_cde2064e7a28
&& ip4 && ip4.dst == {255.255.255.255, 192.168.1.0/24} && udp && udp.src ==
68 && udp.dst == 67"
pipeline: ingress
priority: 2002
table_id: 6
hash: 0


However, all the ports belonging to that subnet also have an lflow for DHCP
(in different stages, though):

_uuid   : f159803f-6b8d-4c8a-9339-b89ee267c2eb
actions : "next;"
external_ids: {source="ovn-northd.c:2579",
stage-name=ls_in_port_sec_ip}
logical_datapath: 2b3126db-74d4-48a1-9e81-192066748de6
match   : "inport == \"240edf21-5a9c-4edd-98b5-8dadc343b9de\"
&& eth.src == fa:16:3e:07:85:91 && ip4.src == 0.0.0.0 && ip4.dst ==
255.255.255.255 && udp.src == 68 && udp.dst == 67"
pipeline: ingress
priority: 90
table_id: 1
hash: 0


My questions are:

1) Do I really need to create the Port Group for every subnet just to take
care of the DHCP?
2) We have per-port DHCP lflows; is it worth implementing port groups
around them too?

Thanks!
Daniel
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] [OVN] egress ACLs on Port Groups seem broken

2018-06-19 Thread Daniel Alvarez Sanchez
On Tue, Jun 19, 2018 at 10:37 PM, Daniel Alvarez Sanchez <
dalva...@redhat.com> wrote:

> Sorry, the problem seems to be that this ACL is not added in the Port
> Groups case for some reason (I checked wrong lflows log I had):
>
s/ACL/Logical Flow

>
> _uuid   : 5a1bce6c-e4ed-4a1f-8150-cb855bbac037
> actions : "reg0[0] = 1; next;"
> external_ids: {source="ovn-northd.c:2931",
> stage-name=ls_in_pre_acl}
> logical_datapath: 0cf12eb0-fdb3-4087-98b0-9c52cafd0bdf
> match   : ip
> pipeline: ingress
> priority: 100
>
>
> Apparently, this code is not getting triggered for the Port Group case:
> https://github.com/openvswitch/ovs/blob/master/ovn/northd/
> ovn-northd.c#L2930
>
>
>
> The problem is that the build_pre_acls() [0] function checks whether the
Logical Switch has stateful ACLs, but since we're now applying ACLs on Port
Groups, it always returns false and the pre-ACLs for conntrack never get
installed.

[0]
https://github.com/openvswitch/ovs/blob/master/ovn/northd/ovn-northd.c#L2852
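
In case it helps the discussion, the direction of the fix would be something
like this (Python pseudocode only; the real logic is C in build_pre_acls()
and all the names below are made up):

# Python pseudocode only -- the real logic is C in ovn-northd's
# build_pre_acls(); every name below is made up for illustration.
def has_stateful_acls(ls, port_groups):
    acls = list(ls.acls)
    ls_ports = set(ls.ports)
    for pg in port_groups:
        # Also consider ACLs applied through a port group that references
        # at least one lport of this logical switch, not just ls.acls.
        if ls_ports & set(pg.ports):
            acls.extend(pg.acls)
    return any(acl.action == 'allow-related' for acl in acls)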


>
> On Tue, Jun 19, 2018 at 10:09 PM, Daniel Alvarez Sanchez <
> dalva...@redhat.com> wrote:
>
>> Hi folks,
>>
>> Sorry for not being clear enough. In the tcpdump we can see the SYN
>> packets being sent by port1 but retransmitted as it looks like the response
>> to that SYN never reaches its destination. This is confirmed through the DP
>> flows:
>>
>> $ sudo ovs-dpctl dump-flows
>>
>> recirc_id(0),in_port(3),eth(src=fa:16:3e:78:a2:cf,dst=fa:16:
>> 3e:bf:6f:51),eth_type(0x0800),ipv4(src=10.0.0.6,dst=168.0.0.
>> 0/252.0.0.0,proto=6,frag=no), packets:4, bytes:296, used:0.514s,
>> flags:S, actions:4
>>
>> recirc_id(0),in_port(4),eth(src=fa:16:3e:bf:6f:51,dst=fa:16:
>> 3e:78:a2:cf),eth_type(0x0800),ipv4(src=128.0.0.0/128.0.0.0,d
>> st=10.0.0.0/255.255.255.192,proto=6,frag=no),tcp(dst=32768/0x8000),
>> packets:7, bytes:518, used:0.514s, flags:S., actions:drop
>>
>>
>> $ sudo ovs-appctl ofproto/trace br-int in_port=20,tcp,dl_src=fa:16:3e
>> :78:a2:cf,dl_dst=fa:16:3e:bf:6f:51,nw_src=10.0.0.6,nw_dst=169.254.169.254,tcp_dst=80
>> | ovn-detrace
>>
>> Flow: tcp,in_port=20,vlan_tci=0x,dl_src=fa:16:3e:78:a2:cf,dl_d
>> st=fa:16:3e:bf:6f:51,nw_src=10.0.0.6,nw_dst=169.254.169.254,
>> nw_tos=0,nw_ecn=0,nw_ttl=0,tp_sr
>> c=0,tp_dst=80,tcp_flags=0
>>
>> bridge("br-int")
>> 
>> 0. in_port=20, priority 100
>> set_field:0x8->reg13
>> set_field:0x5->reg11
>> set_field:0x1->reg12
>> set_field:0x1->metadata
>> set_field:0x4->reg14
>> resubmit(,8)
>> 8. reg14=0x4,metadata=0x1,dl_src=fa:16:3e:78:a2:cf, priority 50, cookie
>> 0xe299b701
>> resubmit(,9)
>> 9. ip,reg14=0x4,metadata=0x1,dl_src=fa:16:3e:78:a2:cf,nw_src=10.0.0.6,
>> priority 90, cookie 0x6581e351
>> resubmit(,10)
>> * Logical datapath: "neutron-9d5615df-a7ba-4649-82f9-961a76fe6f64"
>> (0cf12eb0-fdb3-4087-98b0-9c52cafd0bdf) [ingress]
>> * Logical flow: table=1 (ls_in_port_sec_ip), priority=90,
>> match=(inport == "8ea9d963-7e55-49a6-8be7-cc294278180a" && eth.src ==
>> fa:16:3e:78:a2:cf && i
>> p4.src == {10.0.0.6}), actions=(next;)
>> 10. metadata=0x1, priority 0, cookie 0x1c3ddeef
>> resubmit(,11)
>> * Logical datapath: "neutron-9d5615df-a7ba-4649-82f9-961a76fe6f64"
>> (0cf12eb0-fdb3-4087-98b0-9c52cafd0bdf) [ingress]
>> * Logical flow: table=2 (ls_in_port_sec_nd), priority=0,
>> match=(1), actions=(next;)
>>
>> ...
>>
>> 47. metadata=0x1, priority 0, cookie 0xf35c5784
>> resubmit(,48)
>> * Logical datapath: "neutron-9d5615df-a7ba-4649-82f9-961a76fe6f64"
>> (0cf12eb0-fdb3-4087-98b0-9c52cafd0bdf) [egress]
>> * Logical flow: table=7 (ls_out_stateful), priority=0, match=(1),
>> actions=(next;)
>> 48. metadata=0x1, priority 0, cookie 0x9546c56e
>> resubmit(,49)
>> * Logical datapath: "neutron-9d5615df-a7ba-4649-82f9-961a76fe6f64"
>> (0cf12eb0-fdb3-4087-98b0-9c52cafd0bdf) [egress]
>> * Logical flow: table=8 (ls_out_port_sec_ip), priority=0,
>> match=(1), actions=(next;)
>> 49. reg15=0x1,metadata=0x1, priority 50, cookie 0x58af7841
>> resubmit(,64)
>> * Logical datapath: "neutron-9d5615df-a7ba-4649-82f9-961a76fe6f64"
>> (0cf12eb0-fdb3-4087-98b0-9c52cafd0bdf) [egress]
>> * Logical flow: table=9 (ls_out_port_sec_l2), priority=50,
>> match=(outport == "74db766c-2600-40f1-9f

Re: [ovs-discuss] [OVN] egress ACLs on Port Groups seem broken

2018-06-19 Thread Daniel Alvarez Sanchez
Sorry, the problem seems to be that this ACL is not added in the Port
Groups case for some reason (I had checked the wrong lflows log):

_uuid   : 5a1bce6c-e4ed-4a1f-8150-cb855bbac037
actions : "reg0[0] = 1; next;"
external_ids: {source="ovn-northd.c:2931", stage-name=ls_in_pre_acl}
logical_datapath: 0cf12eb0-fdb3-4087-98b0-9c52cafd0bdf
match   : ip
pipeline: ingress
priority: 100


Apparently, this code is not getting triggered for the Port Group case:
https://github.com/openvswitch/ovs/blob/master/ovn/northd/ovn-northd.c#L2930




On Tue, Jun 19, 2018 at 10:09 PM, Daniel Alvarez Sanchez <
dalva...@redhat.com> wrote:

> Hi folks,
>
> Sorry for not being clear enough. In the tcpdump we can see the SYN
> packets being sent by port1 but retransmitted as it looks like the response
> to that SYN never reaches its destination. This is confirmed through the DP
> flows:
>
> $ sudo ovs-dpctl dump-flows
>
> recirc_id(0),in_port(3),eth(src=fa:16:3e:78:a2:cf,dst=fa:
> 16:3e:bf:6f:51),eth_type(0x0800),ipv4(src=10.0.0.6,dst=
> 168.0.0.0/252.0.0.0,proto=6,frag=no), packets:4, bytes:296, used:0.514s,
> flags:S, actions:4
>
> recirc_id(0),in_port(4),eth(src=fa:16:3e:bf:6f:51,dst=fa:
> 16:3e:78:a2:cf),eth_type(0x0800),ipv4(src=128.0.0.0/
> 128.0.0.0,dst=10.0.0.0/255.255.255.192,proto=6,frag=no),tcp(dst=32768/0x8000),
> packets:7, bytes:518, used:0.514s, flags:S., actions:drop
>
>
> $ sudo ovs-appctl ofproto/trace br-int in_port=20,tcp,dl_src=fa:16:
> 3e:78:a2:cf,dl_dst=fa:16:3e:bf:6f:51,nw_src=10.0.0.6,nw_dst=169.254.169.254,tcp_dst=80
> | ovn-detrace
>
> Flow: tcp,in_port=20,vlan_tci=0x,dl_src=fa:16:3e:78:a2:
> cf,dl_dst=fa:16:3e:bf:6f:51,nw_src=10.0.0.6,nw_dst=169.
> 254.169.254,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_sr
> c=0,tp_dst=80,tcp_flags=0
>
> bridge("br-int")
> 
> 0. in_port=20, priority 100
> set_field:0x8->reg13
> set_field:0x5->reg11
> set_field:0x1->reg12
> set_field:0x1->metadata
> set_field:0x4->reg14
> resubmit(,8)
> 8. reg14=0x4,metadata=0x1,dl_src=fa:16:3e:78:a2:cf, priority 50, cookie
> 0xe299b701
> resubmit(,9)
> 9. ip,reg14=0x4,metadata=0x1,dl_src=fa:16:3e:78:a2:cf,nw_src=10.0.0.6,
> priority 90, cookie 0x6581e351
> resubmit(,10)
> * Logical datapath: "neutron-9d5615df-a7ba-4649-82f9-961a76fe6f64"
> (0cf12eb0-fdb3-4087-98b0-9c52cafd0bdf) [ingress]
> * Logical flow: table=1 (ls_in_port_sec_ip), priority=90,
> match=(inport == "8ea9d963-7e55-49a6-8be7-cc294278180a" && eth.src ==
> fa:16:3e:78:a2:cf && i
> p4.src == {10.0.0.6}), actions=(next;)
> 10. metadata=0x1, priority 0, cookie 0x1c3ddeef
> resubmit(,11)
> * Logical datapath: "neutron-9d5615df-a7ba-4649-82f9-961a76fe6f64"
> (0cf12eb0-fdb3-4087-98b0-9c52cafd0bdf) [ingress]
> * Logical flow: table=2 (ls_in_port_sec_nd), priority=0,
> match=(1), actions=(next;)
>
> ...
>
> 47. metadata=0x1, priority 0, cookie 0xf35c5784
> resubmit(,48)
> * Logical datapath: "neutron-9d5615df-a7ba-4649-82f9-961a76fe6f64"
> (0cf12eb0-fdb3-4087-98b0-9c52cafd0bdf) [egress]
> * Logical flow: table=7 (ls_out_stateful), priority=0, match=(1),
> actions=(next;)
> 48. metadata=0x1, priority 0, cookie 0x9546c56e
> resubmit(,49)
> * Logical datapath: "neutron-9d5615df-a7ba-4649-82f9-961a76fe6f64"
> (0cf12eb0-fdb3-4087-98b0-9c52cafd0bdf) [egress]
> * Logical flow: table=8 (ls_out_port_sec_ip), priority=0,
> match=(1), actions=(next;)
> 49. reg15=0x1,metadata=0x1, priority 50, cookie 0x58af7841
> resubmit(,64)
> * Logical datapath: "neutron-9d5615df-a7ba-4649-82f9-961a76fe6f64"
> (0cf12eb0-fdb3-4087-98b0-9c52cafd0bdf) [egress]
> * Logical flow: table=9 (ls_out_port_sec_l2), priority=50,
> match=(outport == "74db766c-2600-40f1-9ffa-255dc147d8a5),
> actions=(output;)
> 64. priority 0
> resubmit(,65)
> 65. reg15=0x1,metadata=0x1, priority 100
> output:21
>
> Final flow: tcp,reg11=0x5,reg12=0x1,reg13=0x9,reg14=0x4,reg15=0x1,
> metadata=0x1,in_port=20,vlan_tci=0x,dl_src=fa:16:3e:78:
> a2:cf,dl_dst=fa:16:3e:bf:6f:51,nw_src=10.0.0.6,nw_dst=169.
> 254.169.254,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=0,tp_dst=80,tcp_flags=0
> Megaflow: recirc_id=0,eth,tcp,in_port=20,vlan_tci=0x/0x1000,dl_
> src=fa:16:3e:78:a2:cf,dl_dst=fa:16:3e:bf:6f:51,nw_src=10.0.0.6,nw_dst=
> 168.0.0.0/6,nw_frag=no
> Datapath actions: 4
>
>
>
> At this point I would've expected the connection to be in conntrack (but
> if i'm not mistaken this is not supported in ovn-trace :?) so the return
> packet would be dropped:
>
> $ sudo ovs-appctl

Re: [ovs-discuss] [OVN] egress ACLs on Port Groups seem broken

2018-06-19 Thread Daniel Alvarez Sanchez
apath: "neutron-9d5615df-a7ba-4649-82f9-961a76fe6f64"
(0cf12eb0-fdb3-4087-98b0-9c52cafd0bdf) [ingress]
* Logical flow: table=5 (ls_in_pre_stateful), priority=0,
match=(1), actions=(next;)


OpenFlow:

Non port groups:

 cookie=0x5da9a3af, duration=798.461s, table=11, n_packets=59,
n_bytes=9014, idle_age=132, priority=100,ip,metadata=0x1
actions=load:0x1->NXM_NX_XXREG0[96],resubmit(,12)
 cookie=0x5da9a3af, duration=798.461s, table=11, n_packets=0, n_bytes=0,
idle_age=798, priority=100,ipv6,metadata=0x1
actions=load:0x1->NXM_NX_XXREG0[96],resubmit(,12)
 cookie=0x145522b1, duration=234138.077s, table=11, n_packets=4687,
n_bytes=455491, idle_age=135, hard_age=65534, priority=0,metadata=0x1
actions=resubmit(,12)


Port groups:
cookie=0x145522b1, duration=234247.781s, table=11, n_packets=4746,
n_bytes=461470, idle_age=0, hard_age=65534, priority=0,metadata=0x1
actions=resubmit(,12)


From the ovn-northd man page:

Ingress Table 3: from-lport Pre-ACLs

This  table  prepares  flows for possible stateful ACL processing in
ingress table ACLs. It contains a priority-0 flow that simply moves traffic
to the next table. If stateful ACLs are used in the logical datapath, a
priority 100 flow is added that sets a hint (with reg0[0] = 1; next;) for
table  Pre-stateful  to  send  IP packets to the connection tracker before
eventually advancing to ingress table ACLs. If special ports such as route
ports or localnet ports can’t use ct(), a priority 110 flow is added to
skip over stateful ACLs.


So, for some reason, in both cases I see this Logical_Flow:

_uuid   : 5a1bce6c-e4ed-4a1f-8150-cb855bbac037
actions : "reg0[0] = 1; next;"
external_ids: {source="ovn-northd.c:2931", stage-name=ls_in_pre_acl}
logical_datapath: 0cf12eb0-fdb3-4087-98b0-9c52cafd0bdf
match   : ip
pipeline: ingress
priority: 100


Which apparently is responsible for adding the hint and putting the packet
into conntrack but I can't see the physical flow in the Port Groups case.

I'm still investigating but if the lflow is there it must be something with
ovn-controller.
Thanks,

Daniel



On Tue, Jun 19, 2018 at 1:07 AM, Han Zhou  wrote:

> On Mon, Jun 18, 2018 at 1:43 PM, Daniel Alvarez Sanchez <
> dalva...@redhat.com> wrote:
> >
> > Hi all,
> >
> > I'm writing the code to implement the port groups in networking-ovn (the
> OpenStack integration project with OVN). I found out that when a boot a VM,
> looks like the egress traffic (from VM) is not working properly. The VM
> port belongs to 3 Port Groups:
> >
> > 1. Default drop port group with the following ACLs:
> >
> > _uuid   : 0b092bb2-e97b-463b-a678-8a28085e3d68
> > action  : drop
> > direction   : from-lport
> > external_ids: {}
> > log : false
> > match   : "inport == @neutron_pg_drop && ip"
> > name: []
> > priority: 1001
> > severity: []
> >
> > _uuid   : 849ee2e0-f86e-4715-a949-cb5d93437847
> > action  : drop
> > direction   : to-lport
> > external_ids: {}
> > log : false
> > match   : "outport == @neutron_pg_drop && ip"
> > name: []
> > priority: 1001
> > severity: []
> >
> >
> > 2. Subnet port group to allow DHCP traffic on that subnet:
> >
> > _uuid   : 8360a415-b7e1-412b-95ff-15cc95059ef0
> > action  : allow
> > direction   : from-lport
> > external_ids: {}
> > log : false
> > match   : "inport == @pg_b1a572c6_2331_4cfb_a892_3d9d7b0af70c
> && ip4 && ip4.dst == {255.255.255.255, 10.0.0.0/26} && udp && udp.src ==
> 68 && udp.dst == 67"
> > name: []
> > priority: 1002
> > severity: []
> >
> >
> > 3. Security group port group which the following rules:
> >
> > 3.1 Allow ICMP traffic:
> >
> > _uuid   : d12a749f-0f75-4634-aa20-6116e1d5d26d
> > action  : allow-related
> > direction   : to-lport
> > external_ids: {"neutron:security_group_rule_
> id"="9675d6df-56a1-4640-9a0f-1f88e49ed2b5"}
> > log : false
> > match   : "outport == @pg_d237185f_733f_4a09_8832_bcee773722ef
> && ip4 && ip4.src == 0.0.0.0/0 && icmp4"
> > name: []
> > priority: 1002
> > severity: []
> >
> > 3.2

[ovs-discuss] [OVN] egress ACLs on Port Groups seem broken

2018-06-18 Thread Daniel Alvarez Sanchez
Hi all,

I'm writing the code to implement the port groups in networking-ovn (the
OpenStack integration project with OVN). I found out that when I boot a VM,
it looks like the egress traffic (from the VM) is not working properly. The VM
port belongs to 3 Port Groups:

1. Default drop port group with the following ACLs:

_uuid   : 0b092bb2-e97b-463b-a678-8a28085e3d68
action  : drop
direction   : from-lport
external_ids: {}
log : false
match   : "inport == @neutron_pg_drop && ip"
name: []
priority: 1001
severity: []

_uuid   : 849ee2e0-f86e-4715-a949-cb5d93437847
action  : drop
direction   : to-lport
external_ids: {}
log : false
match   : "outport == @neutron_pg_drop && ip"
name: []
priority: 1001
severity: []


2. Subnet port group to allow DHCP traffic on that subnet:

_uuid   : 8360a415-b7e1-412b-95ff-15cc95059ef0
action  : allow
direction   : from-lport
external_ids: {}
log : false
match   : "inport == @pg_b1a572c6_2331_4cfb_a892_3d9d7b0af70c
&& ip4 && ip4.dst == {255.255.255.255, 10.0.0.0/26} && udp && udp.src == 68
&& udp.dst == 67"
name: []
priority: 1002
severity: []


3. Security group port group with the following rules:

3.1 Allow ICMP traffic:

_uuid   : d12a749f-0f75-4634-aa20-6116e1d5d26d
action  : allow-related
direction   : to-lport
external_ids:
{"neutron:security_group_rule_id"="9675d6df-56a1-4640-9a0f-1f88e49ed2b5"}
log : false
match   : "outport == @pg_d237185f_733f_4a09_8832_bcee773722ef
&& ip4 && ip4.src == 0.0.0.0/0 && icmp4"
name: []
priority: 1002
severity: []

3.2 Allow SSH traffic:

_uuid   : 05100729-816f-4a09-b15c-4759128019d4
action  : allow-related
direction   : to-lport
external_ids:
{"neutron:security_group_rule_id"="2a48979f-8209-4fb7-b24b-fff8d82a2ae9"}
log : false
match   : "outport == @pg_d237185f_733f_4a09_8832_bcee773722ef
&& ip4 && ip4.src == 0.0.0.0/0 && tcp && tcp.dst == 22"
name: []
priority: 1002
severity: []


3.3 Allow IPv4/IPv6 traffic from this same port group


_uuid   : b56ce66e-da6b-48be-a66e-77c8cfd6ab92
action  : allow-related
direction   : to-lport
external_ids:
{"neutron:security_group_rule_id"="5b0a47ee-8114-4b13-8d5b-b16d31586b3b"}
log : false
match   : "outport == @pg_d237185f_733f_4a09_8832_bcee773722ef
&& ip6 && ip6.src == $pg_d237185f_733f_4a09_8832_bcee773722ef_ip6"
name: []
priority: 1002
severity: []


_uuid   : 7b68f430-41b5-414d-a2ed-6c548be53dce
action  : allow-related
direction   : to-lport
external_ids:
{"neutron:security_group_rule_id"="299bd9ca-89fb-4767-8ae9-a738e98603fb"}
log : false
match   : "outport == @pg_d237185f_733f_4a09_8832_bcee773722ef
&& ip4 && ip4.src == $pg_d237185f_733f_4a09_8832_bcee773722ef_ip4"
name: []
priority: 1002
severity: []


3.4 Allow all egress (VM point of view) IPv4 traffic

_uuid   : c5fbf0b7-6461-4f27-802e-b0d743be59e5
action  : allow-related
direction   : from-lport
external_ids:
{"neutron:security_group_rule_id"="a4ffe40a-f773-41d6-bc04-40500d158f51"}
log : false
match   : "inport == @pg_d237185f_733f_4a09_8832_bcee773722ef
&& ip4"
name: []
priority: 1002
severity: []



So, I boot a VM using this port and I can verify that ICMP and SSH traffic
works fine while the egress traffic doesn't. From the VM I curl an IP living
in a network namespace and this is what I see with tcpdump there:

On the VM:
$ ip r get 169.254.254.169
169.254.254.169 via 10.0.0.1 dev eth0  src 10.0.0.6
$ curl 169.254.169.254

On the hypervisor (haproxy listening on 169.254.169.254:80):

$ sudo ip net e ovnmeta-0cf12eb0-fdb3-4087-98b0-9c52cafd0bdf tcpdump -i any
po
rt 80 -vvn
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size
262144 bytes
21:59:47.106883 IP (tos 0x0, ttl 64, id 61543, offset 0, flags [DF], proto
TCP (6), length 60)
10.0.0.6.34553 > 169.254.169.254.http: Flags [S], cksum 0x851c
(correct), seq 2571046510, win 14020, options [mss 1402,sackOK,TS val
22740490 ecr 0,nop,wscale 2], length 0
21:59:47.106935 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP
(6), length 60)
169.254.169.254.http > 10.0.0.6.34553: Flags [S.], cksum 0x5e31
(incorrect -> 0x34c0), seq 3215869181, ack 2571046511, win 28960, options
[mss 1460,sackOK,TS val 

Re: [ovs-discuss] OVN Database sizes - Auto compact feature

2018-04-27 Thread Daniel Alvarez Sanchez
Hi Ben,
After [0] got merged, the compaction code is not there anymore. How is this
being done now? Also, can we get [1][2] backported to the 2.9 branch?

[0]
https://github.com/openvswitch/ovs/commit/1b1d2e6daa563cc91f974ffdc082fb3a8b424801
[1]
https://github.com/openvswitch/ovs/commit/1cfdc175ab1ecbc8f5d22f78d8e5f4344d55c5dc#diff-62fba9ea73e44f70aa9f56228bd4658c
[2]
https://github.com/openvswitch/ovs/commit/69f453713459c60e5619174186f94a0975891580

On Thu, Mar 8, 2018 at 11:21 PM, Daniel Alvarez Sanchez <dalva...@redhat.com
> wrote:

> Ok, I've just sent a patch and if you're not convinced we can
> just do the 2x change. Thanks a lot!
> Daniel
>
> On Thu, Mar 8, 2018 at 10:19 PM, Ben Pfaff <b...@ovn.org> wrote:
>
>> I guess I wouldn't object.
>>
>> On Thu, Mar 08, 2018 at 10:11:11PM +0100, Daniel Alvarez Sanchez wrote:
>> > Thanks Ben and Mark. I'd be okay with 2x.
>> > Don't you think that apart from that it can still be good to compact
>> after
>> > a
>> > certain amount of time (like 1 day) if the number of transactions is > 0
>> > regardless of the size?
>> >
>> > On Thu, Mar 8, 2018 at 10:00 PM, Ben Pfaff <b...@ovn.org> wrote:
>> >
>> > > It would be trivial to change 4x to 2x.  4x was just the suggestion in
>> > > the Raft thesis.  If 2x would make everyone a little more comfortable,
>> > > let's make that change.
>> > >
>>
>
>
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] OVN Database sizes - Auto compact feature

2018-03-08 Thread Daniel Alvarez Sanchez
Ok, I've just sent a patch and if you're not convinced we can
just do the 2x change. Thanks a lot!
Daniel

On Thu, Mar 8, 2018 at 10:19 PM, Ben Pfaff <b...@ovn.org> wrote:

> I guess I wouldn't object.
>
> On Thu, Mar 08, 2018 at 10:11:11PM +0100, Daniel Alvarez Sanchez wrote:
> > Thanks Ben and Mark. I'd be okay with 2x.
> > Don't you think that apart from that it can still be good to compact
> after
> > a
> > certain amount of time (like 1 day) if the number of transactions is > 0
> > regardless of the size?
> >
> > On Thu, Mar 8, 2018 at 10:00 PM, Ben Pfaff <b...@ovn.org> wrote:
> >
> > > It would be trivial to change 4x to 2x.  4x was just the suggestion in
> > > the Raft thesis.  If 2x would make everyone a little more comfortable,
> > > let's make that change.
> > >
>
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] OVN Database sizes - Auto compact feature

2018-03-08 Thread Daniel Alvarez Sanchez
Thanks Ben and Mark. I'd be okay with 2x.
Don't you think that apart from that it can still be good to compact after
a
certain amount of time (like 1 day) if the number of transactions is > 0
regardless of the size?

On Thu, Mar 8, 2018 at 10:00 PM, Ben Pfaff  wrote:

> It would be trivial to change 4x to 2x.  4x was just the suggestion in
> the Raft thesis.  If 2x would make everyone a little more comfortable,
> let's make that change.
>
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] OVN Database sizes - Auto compact feature

2018-03-08 Thread Daniel Alvarez Sanchez
I agree with you Mark. I tried to check how much it would shrink with 1800
ports in the system:

[stack@ovn ovs]$ sudo ovn-nbctl list Logical_Switch_Port | grep uuid | wc -l
1809
[stack@ovn ovs]$ sudo ovn-sbctl list Logical_Flow | grep uuid | wc -l


50780
[stack@ovn ovs]$ ls -alh ovn*.db


-rw-r--r--. 1 stack stack 15M Mar  8 15:56 ovnnb_db.db
-rw-r--r--. 1 stack stack 61M Mar  8 15:56 ovnsb_db.db
[stack@ovn ovs]$ sudo ovs-appctl -t
/usr/local/var/run/openvswitch/ovnsb_db.ctl ovsdb-server/compact
[stack@ovn ovs]$ sudo ovs-appctl -t
/usr/local/var/run/openvswitch/ovnnb_db.ctl ovsdb-server/compact


[stack@ovn ovs]$ ls -alh ovn*.db
-rw-r--r--. 1 stack stack 5.8M Mar  8 20:45 ovnnb_db.db
-rw-r--r--. 1 stack stack  23M Mar  8 20:45 ovnsb_db.db

As you can see, with ~50K lflows the database's minimum size would be ~23M,
while the NB database is much smaller. Still, I think we need to do something
so that the compaction task is not delayed this much unnecessarily. Or maybe
we want some sort of configuration (i.e. normal, aggressive, ...) for this,
since in some situations it may help to have the full log of the DB (although
this can be achieved through periodic backups :?). That said, I'm not a big
fan of such configs but...



On Thu, Mar 8, 2018 at 9:31 PM, Mark Michelson <mmich...@redhat.com> wrote:

> Most of the data in this thread has been pretty easily explainable based
> on what I've seen in the code compared with the nature of the data in the
> southbound database.
>
> The southbound database tends to have more data in it than other databases
> in OVS, due especially to the Logical_Flow table. The result is that auto
> shrinking of the database does not shrink it down by as much as other
> databases. You can see in Daniel's graphs that each time the southbound
> database is shrunk, its "base" size ends up noticeably larger than it
> previously was.
>
> Couple that with the fact that the database has to increase to 4x its
> previous snapshot size in order to be shrunk, and you can end up with a
> situation after a while where the "shrunk" southbound database is 750MB,
> and it won't shrink again until it exceeds 3GB.
>
> To fix this, I think there are a few things that can be done:
>
> * Somehow make the southbound database have less data in it. I don't have
> any real good ideas for how to do this, and doing this in a
> backwards-compatible way will be difficult.
>
> * Ease the requirements for shrinking a database. For instance, once the
> database reaches a certain size, maybe it doesn't need to grow by 4x in
> order to be a candidate for shrinking. Maybe it only needs to double in
> size. Or, there could be some time cutoff where the database always will be
> shrunk. So for instance, every hour, always shrink the database, no matter
> how much activity has occurred in it (okay, maybe not if there have been 0
> transactions).


Maybe we can just do the shrink if the last compaction took place >24h ago,
regardless of the other conditions.
I can send a patch for this if you guys like the idea. It's some sort of
"cleanup task" just in case and seems harmless.
What do you say?
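
To make the proposal concrete, something along these lines (pseudocode only;
the real check is C code in ovsdb-server, and the existing thresholds I put
below (10 MB, 10 minutes, 100 transactions) are from memory, so take them as
assumptions; the 24h clause is the new part):

# Pseudocode for the proposed trigger (the real check is C in ovsdb-server;
# the 10 MB / 10 min / 100 txn thresholds are assumptions from memory, the
# 24h clause is the proposed addition).
DAY_MSEC = 24 * 60 * 60 * 1000

def should_compact(now_msec, last_compact_msec, n_txns, size, snapshot_size):
    grew_a_lot = size > max(10 * 1024 * 1024, 4 * snapshot_size)
    if (n_txns >= 100 and now_msec >= last_compact_msec + 10 * 60 * 1000
            and grew_a_lot):
        return True
    # New: compact at least once a day as long as anything changed at all.
    return n_txns > 0 and now_msec >= last_compact_msec + DAY_MSEC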

>
>
> On 03/07/2018 02:50 PM, Ben Pfaff wrote:
>
>> OK.
>>
>> I guess we need to investigate this issue from the basics.
>>
>> On Wed, Mar 07, 2018 at 09:02:02PM +0100, Daniel Alvarez Sanchez wrote:
>>
>>> With OVS 2.8 branch it never shrank when I started to delete the ports
>>> since
>>> the DB sizes didn't grow, which makes sense to me. The conditions weren't
>>> met for further compaction.
>>> See attached image.
>>>
>>> NB:
>>> 2018-03-07T18:25:49.269Z|9|ovsdb_file|INFO|/opt/stack/
>>> data/ovs/ovnnb_db.db:
>>> compacting database online (647.317 seconds old, 436 transactions,
>>> 10505382
>>> bytes)
>>> 2018-03-07T18:35:51.414Z|00012|ovsdb_file|INFO|/opt/stack/
>>> data/ovs/ovnnb_db.db:
>>> compacting database online (602.089 seconds old, 431 transactions,
>>> 29551917
>>> bytes)
>>> 2018-03-07T18:45:52.263Z|00015|ovsdb_file|INFO|/opt/stack/
>>> data/ovs/ovnnb_db.db:
>>> compacting database online (600.563 seconds old, 463 transactions,
>>> 52843231
>>> bytes)
>>> 2018-03-07T18:55:53.810Z|00016|ovsdb_file|INFO|/opt/stack/
>>> data/ovs/ovnnb_db.db:
>>> compacting database online (601.128 seconds old, 365 transactions,
>>> 57618931
>>> bytes)
>>>
>>>
>>> SB:
>>> 2018-03-07T18:33:24.927Z|9|ovsdb_file|INFO|/opt/stack/
>>> data/ovs/ovnsb_db.db:
>>> compacting database online (1102.840 seconds old, 775 transactions,
>>> 10505486 bytes)
>>> 201

Re: [ovs-discuss] OVN Database sizes - Auto compact feature

2018-03-07 Thread Daniel Alvarez Sanchez
With OVS 2.8 branch it never shrank when I started to delete the ports since
the DB sizes didn't grow, which makes sense to me. The conditions weren't
met for further compaction.
See attached image.

NB:
2018-03-07T18:25:49.269Z|9|ovsdb_file|INFO|/opt/stack/data/ovs/ovnnb_db.db:
compacting database online (647.317 seconds old, 436 transactions, 10505382
bytes)
2018-03-07T18:35:51.414Z|00012|ovsdb_file|INFO|/opt/stack/data/ovs/ovnnb_db.db:
compacting database online (602.089 seconds old, 431 transactions, 29551917
bytes)
2018-03-07T18:45:52.263Z|00015|ovsdb_file|INFO|/opt/stack/data/ovs/ovnnb_db.db:
compacting database online (600.563 seconds old, 463 transactions, 52843231
bytes)
2018-03-07T18:55:53.810Z|00016|ovsdb_file|INFO|/opt/stack/data/ovs/ovnnb_db.db:
compacting database online (601.128 seconds old, 365 transactions, 57618931
bytes)


SB:
2018-03-07T18:33:24.927Z|9|ovsdb_file|INFO|/opt/stack/data/ovs/ovnsb_db.db:
compacting database online (1102.840 seconds old, 775 transactions,
10505486 bytes)
2018-03-07T18:43:27.569Z|00012|ovsdb_file|INFO|/opt/stack/data/ovs/ovnsb_db.db:
compacting database online (602.394 seconds old, 445 transactions, 15293972
bytes)
2018-03-07T18:53:31.664Z|00015|ovsdb_file|INFO|/opt/stack/data/ovs/ovnsb_db.db:
compacting database online (603.605 seconds old, 385 transactions, 19282371
bytes)
2018-03-07T19:03:42.116Z|00031|ovsdb_file|INFO|/opt/stack/data/ovs/ovnsb_db.db:
compacting database online (607.542 seconds old, 371 transactions, 23538784
bytes)




On Wed, Mar 7, 2018 at 7:18 PM, Daniel Alvarez Sanchez <dalva...@redhat.com>
wrote:

> No worries, I just triggered the test now running OVS compiled out of
> 2.8 branch (2.8.3). I'll post the results and investigate too.
>
> I have just sent a patch to fix the timing issue we can see in the traces I
> posted. I applied it and it works, I believe it's good to fix as it gives
> us
> an idea of how frequent the compact is, and also to backport if you
> agree with it.
>
> Thanks!
>
> On Wed, Mar 7, 2018 at 7:13 PM, Ben Pfaff <b...@ovn.org> wrote:
>
>> OK, thanks.
>>
>> If this is a lot of trouble, let me know and I'll investigate directly
>> instead of on the basis of a suspected regression.
>>
>> On Wed, Mar 07, 2018 at 07:06:50PM +0100, Daniel Alvarez Sanchez wrote:
>> > All right, I'll repeat it with code in branch-2.8.
>> > Will post the results once the test finishes.
>> > Daniel
>> >
>> > On Wed, Mar 7, 2018 at 7:03 PM, Ben Pfaff <b...@ovn.org> wrote:
>> >
>> > > On Wed, Mar 07, 2018 at 05:53:15PM +0100, Daniel Alvarez Sanchez
>> wrote:
>> > > > Repeated the test with 1000 ports this time. See attached image.
>> > > > For some reason, the sizes grow while deleting the ports (the
>> > > > deletion task starts at around x=2500). The weird thing is why
>> > > > they keep growing and the online compact doesn't work as when
>> > > > I do it through ovs-appctl tool.
>> > > >
>> > > > I suspect this is a bug and eventually it will grow and grow unless
>> > > > we manually compact the db.
>> > >
>> > > Would you mind trying out an older ovsdb-server, for example the one
>> > > from OVS 2.8?  Some of the logic in ovsdb-server around compaction
>> > > changed in OVS 2.9, so it would be nice to know whether this was a
>> > > regression or an existing bug.
>> > >
>>
>
>
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] OVN Database sizes - Auto compact feature

2018-03-07 Thread Daniel Alvarez Sanchez
No worries, I just triggered the test now running OVS compiled out of
2.8 branch (2.8.3). I'll post the results and investigate too.

I have just sent a patch to fix the timing issue we can see in the traces I
posted. I applied it and it works, I believe it's good to fix as it gives us
an idea of how frequent the compact is, and also to backport if you
agree with it.

Thanks!

On Wed, Mar 7, 2018 at 7:13 PM, Ben Pfaff <b...@ovn.org> wrote:

> OK, thanks.
>
> If this is a lot of trouble, let me know and I'll investigate directly
> instead of on the basis of a suspected regression.
>
> On Wed, Mar 07, 2018 at 07:06:50PM +0100, Daniel Alvarez Sanchez wrote:
> > All right, I'll repeat it with code in branch-2.8.
> > Will post the results once the test finishes.
> > Daniel
> >
> > On Wed, Mar 7, 2018 at 7:03 PM, Ben Pfaff <b...@ovn.org> wrote:
> >
> > > On Wed, Mar 07, 2018 at 05:53:15PM +0100, Daniel Alvarez Sanchez wrote:
> > > > Repeated the test with 1000 ports this time. See attached image.
> > > > For some reason, the sizes grow while deleting the ports (the
> > > > deletion task starts at around x=2500). The weird thing is why
> > > > they keep growing and the online compact doesn't work as when
> > > > I do it through ovs-appctl tool.
> > > >
> > > > I suspect this is a bug and eventually it will grow and grow unless
> > > > we manually compact the db.
> > >
> > > Would you mind trying out an older ovsdb-server, for example the one
> > > from OVS 2.8?  Some of the logic in ovsdb-server around compaction
> > > changed in OVS 2.9, so it would be nice to know whether this was a
> > > regression or an existing bug.
> > >
>
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] OVN Database sizes - Auto compact feature

2018-03-07 Thread Daniel Alvarez Sanchez
All right, I'll repeat it with code in branch-2.8.
Will post the results once the test finishes.
Daniel

On Wed, Mar 7, 2018 at 7:03 PM, Ben Pfaff <b...@ovn.org> wrote:

> On Wed, Mar 07, 2018 at 05:53:15PM +0100, Daniel Alvarez Sanchez wrote:
> > Repeated the test with 1000 ports this time. See attached image.
> > For some reason, the sizes grow while deleting the ports (the
> > deletion task starts at around x=2500). The weird thing is why
> > they keep growing and the online compact doesn't work as when
> > I do it through ovs-appctl tool.
> >
> > I suspect this is a bug and eventually it will grow and grow unless
> > we manually compact the db.
>
> Would you mind trying out an older ovsdb-server, for example the one
> from OVS 2.8?  Some of the logic in ovsdb-server around compaction
> changed in OVS 2.9, so it would be nice to know whether this was a
> regression or an existing bug.
>
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] OVN Database sizes - Auto compact feature

2018-03-07 Thread Daniel Alvarez Sanchez
Right, thanks Mark! Good point about the 4x, I missed that one.
I'm repeating the test for 1K ports. Not sure if I'll be able to reproduce
the 2.5GB part but it is still weird that while deleting ports (actually
deleting stuff from the DB) it doesn't get to compact the DB further
(last time it shrank to 9MB), so maybe we have something odd going on here
in the online compaction. I'll post the results after this test.

On Wed, Mar 7, 2018 at 3:35 PM, Mark Michelson <mmich...@redhat.com> wrote:

> On 03/07/2018 07:40 AM, Daniel Alvarez Sanchez wrote:
>
>> Hi folks,
>>
>> During the performance tests I've been doing lately I noticed
>> that the size of the Southbound database was around 2.5GB
>> in one of my setups. I couldn't dig further then but now I
>> decided to explore a bit more and these are the results in
>> my all-in-one OpenStack setup using OVN as a backend:
>>
>> * Created 800 ports on the same network (logical switch).
>> * Deleted those 800 ports.
>> * I logged the DB sizes for both NB and SB databases every second.
>>
>> See attached image for the results.
>>
>> At around x=2000, the creation task finished and deletion starts.
>> As you can see, there's automatic compact happening in the
>> NB database across the whole test. However, while I was deleting
>> ports, the SB database stop shrinking and keeps growing.
>>
>> After the test finished, the DB sizes remaining the same
>> thus SB database being around 34MB. It was not until I
>> manually compacted it when it finally shrinked:
>>
>>
>> [stack@ovn ovs]$ ls -alh ovnsb_db.db
>> -rw-r--r--. 1 stack stack 34M Mar  7 12:04 ovnsb_db.db
>>
>> [stack@ovn ovs]$ sudo ovs-appctl -t 
>> /usr/local/var/run/openvswitch/ovnsb_db.ctl
>> ovsdb-server/compact
>>
>> [stack@ovn ovs]$ ls -alh ovnsb_db.db
>> -rw-r--r--. 1 stack stack 207K Mar  7 13:32 ovnsb_db.db
>>
>> I'll try to investigate further in the code.
>> Thanks,
>>
>> Daniel
>>
>
> Daniel and I discussed this offline and I think the attached result does
> not necessarily indicate a bug yet.
>
> One of the requirements for the DB to compact automatically is that it
> needs to grow to at least 4x the size of the previous snapshot. In the
> graph, we can see that the first time the SB DB compacts (around x==1000),
> it shrinks down to ~5MB. We would expect the DB to compact again when it
> gets to ~20MB. This happens at around x==1900. The DB shrinks to ~9MB, so
> we would expect it to shrink again when it reaches ~36MB. When the test
> ends, the SB DB is ~34 MB, so it is not quite large enough to compact yet.
> If the test had gone a bit longer, then presumably we might have seen the
> DB shrink again.
>
> It will be interesting to see the test that leads to a 2.5GB southbound
> database.
>
> Mark!
>
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] OVN Database sizes - Auto compact feature

2018-03-07 Thread Daniel Alvarez Sanchez
BTW, I didn't spot any of these messages in the log:

https://github.com/openvswitch/ovs/blob/4cc9d1f03f83e9fac90a77ddaca0af662b2758b1/ovsdb/file.c#L615

I'll add a few traces to figure out why the auto compact is not triggering.

Also, I could see the trace when I ran it manually:
2018-03-07T13:32:21.672Z|00021|ovsdb_server|INFO|compacting OVN_Southbound
database by user request
2018-03-07T13:32:21.672Z|00022|ovsdb_file|INFO|/opt/stack/data/ovs/ovnsb_db.db:
compacting database online (1519124364.908 seconds old, 951 transactions)


On Wed, Mar 7, 2018 at 2:40 PM, Daniel Alvarez Sanchez <dalva...@redhat.com>
wrote:

> Hi folks,
>
> During the performance tests I've been doing lately I noticed
> that the size of the Southbound database was around 2.5GB
> in one of my setups. I couldn't dig further then but now I
> decided to explore a bit more and these are the results in
> my all-in-one OpenStack setup using OVN as a backend:
>
> * Created 800 ports on the same network (logical switch).
> * Deleted those 800 ports.
> * I logged the DB sizes for both NB and SB databases every second.
>
> See attached image for the results.
>
> At around x=2000, the creation task finished and deletion starts.
> As you can see, there's automatic compact happening in the
> NB database across the whole test. However, while I was deleting
> ports, the SB database stop shrinking and keeps growing.
>
> After the test finished, the DB sizes remaining the same
> thus SB database being around 34MB. It was not until I
> manually compacted it when it finally shrinked:
>
>
> [stack@ovn ovs]$ ls -alh ovnsb_db.db
> -rw-r--r--. 1 stack stack 34M Mar  7 12:04 ovnsb_db.db
>
> [stack@ovn ovs]$ sudo ovs-appctl -t 
> /usr/local/var/run/openvswitch/ovnsb_db.ctl
> ovsdb-server/compact
>
> [stack@ovn ovs]$ ls -alh ovnsb_db.db
> -rw-r--r--. 1 stack stack 207K Mar  7 13:32 ovnsb_db.db
>
> I'll try to investigate further in the code.
> Thanks,
>
> Daniel
>
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


[ovs-discuss] OVN Database sizes - Auto compact feature

2018-03-07 Thread Daniel Alvarez Sanchez
Hi folks,

During the performance tests I've been doing lately I noticed
that the size of the Southbound database was around 2.5GB
in one of my setups. I couldn't dig further then but now I
decided to explore a bit more and these are the results in
my all-in-one OpenStack setup using OVN as a backend:

* Created 800 ports on the same network (logical switch).
* Deleted those 800 ports.
* I logged the DB sizes for both NB and SB databases every second.

See attached image for the results.

At around x=2000, the creation task finished and deletion starts.
As you can see, there's automatic compaction happening in the
NB database across the whole test. However, while I was deleting
ports, the SB database stopped shrinking and kept growing.

After the test finished, the DB sizes remained the same, with the
SB database at around 34MB. It was not until I manually compacted
it that it finally shrank:


[stack@ovn ovs]$ ls -alh ovnsb_db.db
-rw-r--r--. 1 stack stack 34M Mar  7 12:04 ovnsb_db.db

[stack@ovn ovs]$ sudo ovs-appctl -t
/usr/local/var/run/openvswitch/ovnsb_db.ctl ovsdb-server/compact

[stack@ovn ovs]$ ls -alh ovnsb_db.db
-rw-r--r--. 1 stack stack 207K Mar  7 13:32 ovnsb_db.db

I'll try to investigate further in the code.
Thanks,

Daniel
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] OpenStack profiling with networking-ovn - port creation is slow

2018-02-28 Thread Daniel Alvarez Sanchez
Thanks a lot Han! Great and swift work!
I'm testing them now, will let you know ASAP.

On Thu, Mar 1, 2018 at 8:39 AM, Han Zhou <zhou...@gmail.com> wrote:

>
>
> > On Mon, Feb 26, 2018 at 12:05 PM, Ben Pfaff <b...@ovn.org> wrote:
> >
> > On Fri, Feb 23, 2018 at 03:51:28PM -0800, Han Zhou wrote:
> > > > On Fri, Feb 23, 2018 at 2:17 PM, Ben Pfaff <b...@ovn.org> wrote:
> > > >
> > > > On Tue, Feb 20, 2018 at 08:56:42AM -0800, Han Zhou wrote:
> > > > > > On Tue, Feb 20, 2018 at 8:15 AM, Ben Pfaff <b...@ovn.org> wrote:
> > > > > >
> > > > > > On Mon, Feb 19, 2018 at 11:33:11AM +0100, Daniel Alvarez Sanchez
> > > wrote:
> > > > > > > @Han, I can try rebase the patch if you want but that was
> > > > > > > basically renaming the Address_Set table and from Ben's
> > > > > > > comment, it may be better to keep the name. Not sure,
> > > > > > > however, how we can proceed to address Lucas' points in
> > > > > > > this thread.
> > > > > >
> > > > > > I wouldn't rename the table.  It sounds like the priority should
> be to
> > > > > > add support for sets of port names.  I thought that there was
> already
> > > a
> > > > > > patch for that to be rebased, but maybe I misunderstood.
> > > > >
> > > > > I feel it is better to add a new table for port group explicitly,
> and
> > > the
> > > > > column type can be a set of weak reference to Logical_Switch_Port.
> > > > > The benefits are:
> > > > > - Better data integrity: deleting a lport automatically deletes
> from the
> > > > > port group
> > > > > - No confusion about the type of records in a single table
> > > > > - Existing Address_Set mechanism will continue to be supported
> without
> > > any
> > > > > change
> > > > > - Furthermore, the race condition issue brought up by Lucas can be
> > > solved
> > > > > by supporting port-group in IP address match condition in
> > > ovn-controller,
> > > > > so that all addresses in the lports are used just like how
> AddressSet is
> > > > > used today. And there is no need for Neutron networking-ovn to use
> > > > > AddressSet any more. Since addresses are deduced from lports, the
> > > ordering
> > > > > of deleting/adding doesn't matter any more.
> > > > >
> > > > > How does this sound?
> > > >
> > > > Will we want sets of Logical_Router_Ports later?
> > > At least I don't see any use case in Neutron for router ports since
> > > Security Group is only for VIF ports.
> > >
> > > There is another tricky point I see while working on implementation. In
> > > Neutron, SG can be applied to ports across different networks, but in
> OVN
> > > lports works only on its own datapath, so in ovn-controller we need to
> be
> > > able to pickup related ports from the port-group when translating
> lflows
> > > for each datapath. I hope this is not an issue. Otherwise, Neutron
> plugin
> > > will have to divide the group of ports into sub-groups according to the
> > > lswitch they belong to, which would be a pain.
> >
> > I think that we can make ovn-controller gracefully tolerate that.
> >
> > Let's try this implementation.  I'm not excited about having a new table
> > for this purpose, but it sounds like the advantages may be worthwhile.
>
> Here are the patches: https://patchwork.ozlabs.org/
> project/openvswitch/list/?series=31165
>
> Thanks,
> Han
>
>
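
For illustration, this is roughly how a CMS client could populate such a
table through the Python IDL once the series lands. The table and column
names ("Port_Group", "name", "ports") are my assumptions about the proposed
schema, not something taken from the patches themselves:

# Illustration only: populate the proposed port-group table via the Python
# IDL. Table/column names are assumptions about the series.
import ovs.db.idl

def add_port_group(idl, pg_name, lsp_rows):
    # 'idl' is a connected ovs.db.idl.Idl against the NB DB whose schema
    # registration includes the new table; 'lsp_rows' are
    # Logical_Switch_Port Row objects already known to the IDL.
    txn = ovs.db.idl.Transaction(idl)
    pg = txn.insert(idl.tables["Port_Group"])
    pg.name = pg_name
    pg.ports = list(lsp_rows)  # weak refs: entries drop out automatically
                               # when the referenced LSP row is deleted
    return txn.commit_block()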


Re: [ovs-discuss] OpenStack profiling with networking-ovn - port creation is slow

2018-02-19 Thread Daniel Alvarez Sanchez
Just for completeness, I did some tests with and without the
JSON parser C extension [0]. There's no significant gain (at
this point); maybe once we add the port groups it will be more
noticeable. The average time without it has been 2.09s,
while with the C extension it has been 2.03s.
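
(For anyone reproducing this, the way I double-checked that the C parser was
actually in use was simply trying to import the extension module -- assuming
it is built as ovs._json when OVS is compiled with shared libraries:)

# Quick sanity check: if the C JSON helper (assumed to be built as the
# ovs._json extension) cannot be imported, the Python library falls back
# to the pure Python parser and the comparison above is meaningless.
try:
    import ovs._json  # noqa: F401
    print("C JSON extension available")
except ImportError:
    print("pure Python JSON parser in use")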

@Han, I can try to rebase the patch if you want, but that was
basically renaming the Address_Set table and, from Ben's
comment, it may be better to keep the name. I'm not sure,
however, how we can proceed to address Lucas' points in
this thread.

Thanks,
Daniel

[0] https://imgur.com/a/etb5M

On Fri, Feb 16, 2018 at 6:33 PM, Han Zhou <zhou...@gmail.com> wrote:

> Hi Daniel,
>
> Thanks for the detailed profiling!
>
> On Fri, Feb 16, 2018 at 6:50 AM, Daniel Alvarez Sanchez <
> dalva...@redhat.com> wrote:
> >
> > About the duplicated processing of the update2 messages, I've verified
> that those are not always present. I've isolated the scenario further and
> did tcpdump and debugging on the exact process which is sending
> the 'transact' command and I see no update2 processing duplicates. Among the
> rest of the workers, one of them is always getting them duplicated while
> the rest don't; I don't know why.
> > However, the 'modify' in the LS table for updating the acls set is the
> one always taking 2-3 seconds on this load.
> >
> I think this explains why the time spent grows linearly with number of
> lports. Each lport has its own ACLs added to same LS, so the acl column in
> the row gets bigger and bigger. It is likely that the port group
> optimization would solve this problem.
>
> Thanks
> Han
>


Re: [ovs-discuss] OpenStack profiling with networking-ovn - port creation is slow

2018-02-16 Thread Daniel Alvarez Sanchez
On Fri, Feb 16, 2018 at 12:12 PM, Daniel Alvarez Sanchez <
dalva...@redhat.com> wrote:

> I've found out more about what is running slow in this scenario.
> I've profiled the processing of the update2 messages and here you can
> see the sequence of calls to __process_update2 (idl.py) when I'm
> creating a new port via OpenStack on a system loaded with 800 ports
> on the same Logical Switch:
>
>
> 1. Logical_Switch_Port 'insert'
> 2. Address_Set 'modify' (to add the new address; it takes around
> 0.2 seconds)
> 3. Logical_Switch_Port 'modify' (to add the new 8 ACLs)  <- This takes > 2
> seconds
>
Sorry this is ^ Logical_Switch table.

> 4. ACL 'insert' x8 (one per ACL)
> 5. Logical_Switch_Port 'modify' ('up' = False)
> 6. Logical_Switch_Port 'insert' (this is exactly the same as 1, so it'll
> be skipped)
> 7. Address_Set 'modify' (this is exactly the same as 2, so it'll be
> skipped) still takes ~0.05-0.01 s
> 8. Logical_Switch 'modify' (to add the new 8 ACLs, same as 3)  still takes
> ~0.5 seconds
>
This too ^

> 9. ACL 'insert' x8 (one per ACL, same as 4)
> 10. Logical_Switch_Port 'modify' ('up' = False) same as 5
> 11. Port_Binding (SB) 'insert'
> 12. Port_Binding (SB) 'insert' (same as 11)
>
>
> Half of those are dups and, even though they are no-ops, they consume time.
> The most expensive operation is adding the new 8 ACLs to the acls set in
> the LS table.
> (800 ports with 8 ACLs each makes that set grow to 6400 elements).
>
> NOTE: As you can see, we're trying to insert the address again into
> Address_Sets so we should
> bear this in mind if we go ahead with Lucas' suggestion about allowing
> dups here.
>
> It's obvious that we'll gain a lot by using Port Sets for ACLs, but we'll
> also need
> to invest some time in finding out why we're getting dups and also trying
> to optimize
> the process_update2 method and its callees to make it faster. With those
> last
> two things I guess we can improve the performance a lot.
> Also, creating a python C binding for this module could also help but that
> seems like
> a lot of work and still we would need to convert to/from C structures to
> Python
> objects. However, inserting or identifying dups on large sets would be way
> faster.
>
> I'm going to try out Ben's suggestions for optimizing the process_update*
> methods,
> and will also try to dig further about the dups. As process_update* seems
> a bit
> expensive, looks to me that calling it 26 times for a single port is a lot.
> 26 calls = 2*(1 (LS insert) + 1 (AS modify)  + 1(LSP modify) + 8 (ACL
> insert) + 1(LSP modify) + 1(PB insert))
>
 26 calls = 2*(1 (LS insert) + 1 (AS modify)  + 1(LS modify) + 8 (ACL
insert) + 1(LSP modify) + 1(PB insert))


> Thoughts?
> Thanks!
> Daniel
>
>
> On Thu, Feb 15, 2018 at 10:56 PM, Daniel Alvarez Sanchez <
> dalva...@redhat.com> wrote:
>
>>
>>
>> On Wed, Feb 14, 2018 at 9:34 PM, Han Zhou <zhou...@gmail.com> wrote:
>>
>>>
>>>
>>> On Wed, Feb 14, 2018 at 9:45 AM, Ben Pfaff <b...@ovn.org> wrote:
>>> >
>>> > On Wed, Feb 14, 2018 at 11:27:11AM +0100, Daniel Alvarez Sanchez wrote:
>>> > > Thanks for your inputs. I need to look more carefully into the patch
>>> you
>>> > > submitted but it looks like, at least, we'll be reducing the number
>>> of
>>> > > calls to Datum.__cmp__ which should be good.
>>> >
>>> > Thanks.  Please do take a look.  It's a micro-optimization but maybe
>>> > it'll help?
>>> >
>>> > > I probably didn't explain it very well. Right now we have N processes
>>> > > for Neutron server (in every node). Each of those opens a connection
>>> > > to NB db and they subscribe to updates from certain tables. Each time
>>> > > a change happens, ovsdb-server will send N update2 messages that has
>>> > > to be processed in this "expensive" way by each of those N
>>> > > processes. My proposal (yet to be refined) would be to now open N+1
>>> > > connections to ovsdb-server and only subscribe to notifications from
>>> 1
>>> > > of those. So every time a new change happens, ovsdb-server will send
>>> 1
>>> > > update2 message. This message will be processed (using Py IDL as we
>>> do
>>> > > now) and once processed, send it (mcast maybe?) to the rest N
>>> > > processes. This msg could be simply a Python object serialized and
>>> > > we'd be saving all this Datum, Atom, etc. processing by doing it just
>>> > > once.
>>

Re: [ovs-discuss] OpenStack profiling with networking-ovn - port creation is slow

2018-02-16 Thread Daniel Alvarez Sanchez
I've found out more about what is running slow in this scenario.
I've profiled the processing of the update2 messages and here you can
see the sequence of calls to __process_update2 (idl.py) when I'm
creating a new port via OpenStack on a system loaded with 800 ports
on the same Logical Switch:


1. Logical_Switch_Port 'insert'
2. Address_Set 'modify' (to add the new address; it takes around 0.2 seconds)
3. Logical_Switch_Port 'modify' (to add the new 8 ACLs)  <- This takes > 2
seconds
4. ACL 'insert' x8 (one per ACL)
5. Logical_Switch_Port 'modify' ('up' = False)
6. Logical_Switch_Port 'insert' (this is exactly the same as 1, so it'll be
skipped)
7. Address_Set 'modify' (this is exactly the same as 2, so it'll be
skipped) still takes ~0.05-0.01 s
8. Logical_Switch_Port 'modify' (to add the new 8 ACLs, same as 3)  still
takes ~0.5 seconds
9. ACL 'insert' x8 (one per ACL, same as 4)
10. Logical_Switch_Port 'modify' ('up' = False) same as 5
11. Port_Binding (SB) 'insert'
12. Port_Binding (SB) 'insert' (same as 11)


Half of those are dups and, even though they are no-ops, they consume time.
The most expensive operation is adding the new 8 ACLs to the acls set in the
LS table (800 ports with 8 ACLs each makes that set grow to 6400 elements).

NOTE: As you can see, we're trying to insert the address again into
Address_Sets, so we should bear this in mind if we go ahead with Lucas'
suggestion about allowing dups here.

It's obvious that we'll gain a lot by using Port Sets for ACLs, but we'll
also need to invest some time in finding out why we're getting dups and in
trying to optimize the process_update2 method and its callees to make them
faster. With those last two things I guess we can improve the performance a
lot.
Also, creating a Python C binding for this module could help, but that seems
like a lot of work, and we would still need to convert to/from C structures
to Python objects. However, inserting into or identifying dups in large sets
would be way faster.

I'm going to try out Ben's suggestions for optimizing the process_update*
methods, and will also try to dig further into the dups. As process_update*
seems a bit expensive, it looks to me that calling it 26 times for a single
port is a lot:
26 calls = 2*(1 (LS insert) + 1 (AS modify)  + 1(LSP modify) + 8 (ACL
insert) + 1(LSP modify) + 1(PB insert))
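
For reference, this is roughly the throwaway instrumentation I'm using to
count those calls. It wraps the private method through its name-mangled
attribute, so treat _Idl__process_update2 as an assumption about today's
idl.py internals rather than a stable API:

# Throwaway sketch: count calls to Idl.__process_update2 per table while
# the Neutron worker runs. The name-mangled attribute is an assumption
# about the current idl.py internals.
import collections
import ovs.db.idl

counts = collections.Counter()
_orig = ovs.db.idl.Idl._Idl__process_update2

def counting_process_update2(self, table, uuid, row_update):
    counts[table.name] += 1
    return _orig(self, table, uuid, row_update)

ovs.db.idl.Idl._Idl__process_update2 = counting_process_update2
# ... run the port-creation scenario, then inspect:
# print(counts.most_common())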

Thoughts?
Thanks!
Daniel


On Thu, Feb 15, 2018 at 10:56 PM, Daniel Alvarez Sanchez <
dalva...@redhat.com> wrote:

>
>
> On Wed, Feb 14, 2018 at 9:34 PM, Han Zhou <zhou...@gmail.com> wrote:
>
>>
>>
>> On Wed, Feb 14, 2018 at 9:45 AM, Ben Pfaff <b...@ovn.org> wrote:
>> >
>> > On Wed, Feb 14, 2018 at 11:27:11AM +0100, Daniel Alvarez Sanchez wrote:
>> > > Thanks for your inputs. I need to look more carefully into the patch
>> you
>> > > submitted but it looks like, at least, we'll be reducing the number of
>> > > calls to Datum.__cmp__ which should be good.
>> >
>> > Thanks.  Please do take a look.  It's a micro-optimization but maybe
>> > it'll help?
>> >
>> > > I probably didn't explain it very well. Right now we have N processes
>> > > for Neutron server (in every node). Each of those opens a connection
>> > > to NB db and they subscribe to updates from certain tables. Each time
>> > > a change happens, ovsdb-server will send N update2 messages that has
>> > > to be processed in this "expensive" way by each of those N
>> > > processes. My proposal (yet to be refined) would be to now open N+1
>> > > connections to ovsdb-server and only subscribe to notifications from 1
>> > > of those. So every time a new change happens, ovsdb-server will send 1
>> > > update2 message. This message will be processed (using Py IDL as we do
>> > > now) and once processed, send it (mcast maybe?) to the rest N
>> > > processes. This msg could be simply a Python object serialized and
>> > > we'd be saving all this Datum, Atom, etc. processing by doing it just
>> > > once.
>> >
>> Daniel, I understand that sending the update2 messages would consume NB
>> ovsdb-server CPU and processing those updates would consume neutron server
>> process CPU. However, are we sure it is the bottleneck for port creation?
>>
>> From ovsdb-server point of view, sending updates to tens of clients
>> should not be the bottleneck, considering that we have a lot more clients
>> on HVs for SB ovsdb-server.
>>
>> From clients point of view, I think it is more of memory overhead than
>> CPU, and it also depends on how many neutron processes are running on the
>> same node. I didn't find neutron process CPU in your charts. I am hesitant
>> about such a big change before we are clear about the bottleneck.

Re: [ovs-discuss] OpenStack profiling with networking-ovn - port creation is slow

2018-02-15 Thread Daniel Alvarez Sanchez
On Wed, Feb 14, 2018 at 9:34 PM, Han Zhou <zhou...@gmail.com> wrote:

>
>
> On Wed, Feb 14, 2018 at 9:45 AM, Ben Pfaff <b...@ovn.org> wrote:
> >
> > On Wed, Feb 14, 2018 at 11:27:11AM +0100, Daniel Alvarez Sanchez wrote:
> > > Thanks for your inputs. I need to look more carefully into the patch
> you
> > > submitted but it looks like, at least, we'll be reducing the number of
> > > calls to Datum.__cmp__ which should be good.
> >
> > Thanks.  Please do take a look.  It's a micro-optimization but maybe
> > it'll help?
> >
> > > I probably didn't explain it very well. Right now we have N processes
> > > for Neutron server (in every node). Each of those opens a connection
> > > to NB db and they subscribe to updates from certain tables. Each time
> > > a change happens, ovsdb-server will send N update2 messages that has
> > > to be processed in this "expensive" way by each of those N
> > > processes. My proposal (yet to be refined) would be to now open N+1
> > > connections to ovsdb-server and only subscribe to notifications from 1
> > > of those. So every time a new change happens, ovsdb-server will send 1
> > > update2 message. This message will be processed (using Py IDL as we do
> > > now) and once processed, send it (mcast maybe?) to the rest N
> > > processes. This msg could be simply a Python object serialized and
> > > we'd be saving all this Datum, Atom, etc. processing by doing it just
> > > once.
> >
> Daniel, I understand that sending the update2 messages would consume NB
> ovsdb-server CPU and processing those updates would consume neutron server
> process CPU. However, are we sure it is the bottleneck for port creation?
>
> From ovsdb-server point of view, sending updates to tens of clients should
> not be the bottleneck, considering that we have a lot more clients on HVs
> for SB ovsdb-server.
>
> From clients point of view, I think it is more of memory overhead than
> CPU, and it also depends on how many neutron processes are running on the
> same node. I didn't find neutron process CPU in your charts. I am hesitant
> about such a big change before we are clear about the bottleneck. The chart of
> port creation time is very nice, but do we know which part of code
> contributed to the linear growth? Do we have profiling for the time spent
> in ovn_client.add_acls()?
>

Here we are [0]. We see some spikes which get larger as the number of ports
increases, but it looks like the actual bottleneck is going to be when we're
actually committing the transaction [1]. I'll dig further though.

[0] https://imgur.com/a/TmwbC
[1]
https://github.com/openvswitch/ovs/blob/master/python/ovs/db/idl.py#L1158

>
> > OK.  It's an optimization that does the work in one place rather than N
> > places, so definitely a win from a CPU cost point of view, but it trades
> > performance for increased complexity.  It sounds like performance is
> > really important so maybe the increased complexity is a fair trade.
> >
> > We might also be able to improve performance by using native code for
> > some of the work.  Were these tests done with the native code JSON
> > parser that comes with OVS?  It is dramatically faster than the Python
> > code.
> >
> > > On Tue, Feb 13, 2018 at 8:32 PM, Ben Pfaff <b...@ovn.org> wrote:
> > >
> > > > Can you sketch the rows that are being inserted or modified when a
> port
> > > > is added?  I would expect something like this as a minimum:
> > > >
> > > > * Insert one Logical_Switch_Port row.
> > > >
> > > > * Add pointer to Logical_Switch_Port to ports column in one
> row
> > > >   in Logical_Switch.
> > > >
> > > > In addition it sounds like currently we're seeing:
> > > >
> > > > * Add one ACL row per security group rule.
> > > >
> > > > * Add pointers to ACL rows to acls column in one row in
> > > >   Logical_Switch.
> > > >
> > > This is what happens when we create a port in OpenStack (without
> > > binding it) which belongs to a SG which allows ICMP and SSH traffic
> > > and drops the rest [0]
> > >
> > > Basically, you were right and only thing missing was adding the new
> > > address to the Address_Set table.
> >
> > OK.
> >
> > It sounds like the real scaling problem here is that for R security
> > group rules and P ports, we have R*P rows in the ACL table.  Is that
> > correct?  Should we aim to solve that problem?
>
> I think this might be the most valuable point to optimize for the
> create_port scenario from Neutron.
> I remember there was a patch for ACL group in OVN, so that instead of R*P
> rows we will have only R + P rows, but I didn't see it go through.
> Is this also a good use case for conjunctive match?
>


Re: [ovs-discuss] OpenStack profiling with networking-ovn - port creation is slow

2018-02-14 Thread Daniel Alvarez Sanchez
Hi Ben,

Thanks for your inputs. I need to look more carefully into the patch you
submitted but it looks like, at least, we'll be reducing the number of
calls to Datum.__cmp__ which should be good.

For the rest of the things, let me answer inline.


On Tue, Feb 13, 2018 at 8:32 PM, Ben Pfaff <b...@ovn.org> wrote:

> On Tue, Feb 13, 2018 at 12:39:56PM +0100, Daniel Alvarez Sanchez wrote:
> > Hi folks,
> >
> > As we're doing some performance tests in OpenStack using OVN,
> > we noticed that as we keep creating ports, the time for creating a
> > single port increases. Also, ovn-northd CPU consumption is quite
> > high (see [0] which shows the CPU consumption when creating
> > 1000 ports and deleting them. Last part where CPU is at 100% is
> > when all the ports get deleted).
> >
> > With 500 ports in the same Logical Switch, I did some profiling
> > of OpenStack neutron-server adding 10 more ports to that Logical
> > Switch. Currently, neutron-server spawns different API workers
> > (separate processes) which open connections to OVN NB so every
> > time an update message is sent from ovsdb-server it'll be processed
> > by all of them.
> >
> > In my profiling, I used GreenletProfiler in all those processes to
> produce
> > a trace file and then merged all of them together to aggregate the
> > results. In those tests I used OVS master branch compiled it with
> > shared libraries to make use of the JSON C parser. Still, I've seen
> > that most of the time's been spent in the following two modules:
> >
> > - python/ovs/db/data.py:  33%
> > - uuid.py:  21%
> >
> > For the data.py module, this is the usage (self time):
> >
> > Atom.__lt__   16.25% 8283 calls
> > from_json:118  6.18%   406935 calls
> > Atom.__hash__  3.48%  1623832 calls
> > from_json:328  2.01% 5040 calls
> >
> > While for the uuid module:
> >
> > UUID.__cmp__   12.84%  3570975 calls
> > UUID.__init__   4.06%   362541 calls
> > UUID.__hash__   2.96% 1800 calls
> > UUID.__str__1.03%   355016 calls
> >
> > Most of the calls to Atom.__lt__ come from
> > BaseOvnIdl.__process_update2(idl.py)
> > -> BaseOvnIdl.__row_update(idl.py) -> Datum.__cmp__(data.py) ->
> > Atom.__cmp__(data.py).
>
> I don't know Python well enough to know whether these are "real" or correct
> optimizations, but the following strike me as possible low-hanging fruit
> micro-optimizations:
>
> diff --git a/python/ovs/db/data.py b/python/ovs/db/data.py
> index 9e57595f7513..dc816f64708f 100644
> --- a/python/ovs/db/data.py
> +++ b/python/ovs/db/data.py
> @@ -76,12 +76,12 @@ class Atom(object):
>  def __eq__(self, other):
>  if not isinstance(other, Atom) or self.type != other.type:
>  return NotImplemented
> -return True if self.value == other.value else False
> +return self.value == other.value
>
>  def __lt__(self, other):
>  if not isinstance(other, Atom) or self.type != other.type:
>  return NotImplemented
> -return True if self.value < other.value else False
> +return self.value < other.value
>
>  def __cmp__(self, other):
>  if not isinstance(other, Atom) or self.type != other.type:
>
> We could also reduce the number of calls to Datum.__cmp__ by recognizing
> that if we've already found one change we don't have to do any more
> comparisons, something like this:
>
> diff --git a/python/ovs/db/idl.py b/python/ovs/db/idl.py
> index 60548bcf50b6..2e550adfdf1c 100644
> --- a/python/ovs/db/idl.py
> +++ b/python/ovs/db/idl.py
> @@ -447,6 +447,7 @@ class Idl(object):
>  raise error.Error(" is not an object",
>table_updates)
>
> +changed = False
>  for table_name, table_update in six.iteritems(table_updates):
>  table = self.tables.get(table_name)
>  if not table:
> @@ -472,8 +473,8 @@ class Idl(object):
>% (table_name, uuid_string))
>
>  if version == OVSDB_UPDATE2:
> -if self.__process_update2(table, uuid, row_update):
> -self.change_seqno += 1
> +changed = self.__process_update2(table, uuid,
> row_update,
> + changed)
>  continue
>
>  parser = ovs.db.parser.Parser(row_update, "row-update")
> @@ -485,12 +486,12 @@ class Idl(object):
>  raise error.Error(' mi

[ovs-discuss] OpenStack profiling with networking-ovn - port creation is slow

2018-02-13 Thread Daniel Alvarez Sanchez
Hi folks,

As we're doing some performance tests in OpenStack using OVN,
we noticed that as we keep creating ports, the time for creating a
single port increases. Also, ovn-northd CPU consumption is quite
high (see [0] which shows the CPU consumption when creating
1000 ports and deleting them. Last part where CPU is at 100% is
when all the ports get deleted).

With 500 ports in the same Logical Switch, I did some profiling
of OpenStack neutron-server adding 10 more ports to that Logical
Switch. Currently, neutron-server spawns different API workers
(separate processes) which open connections to OVN NB so every
time an update message is sent from ovsdb-server it'll be processed
by all of them.

In my profiling, I used GreenletProfiler in all those processes to produce
a trace file and then merged all of them together to aggregate the
results. In those tests I used the OVS master branch compiled with
shared libraries to make use of the JSON C parser. Still, I've seen
that most of the time's been spent in the following two modules:

- python/ovs/db/data.py:  33%
- uuid.py:  21%

For the data.py module, this is the usage (self time):

Atom.__lt__   16.25% 8283 calls
from_json:118  6.18%   406935 calls
Atom.__hash__  3.48%  1623832 calls
from_json:328  2.01% 5040 calls

While for the uuid module:

UUID.__cmp__   12.84%  3570975 calls
UUID.__init__   4.06%   362541 calls
UUID.__hash__   2.96% 1800 calls
UUID.__str__1.03%   355016 calls

Most of the calls to Atom.__lt__ come from
BaseOvnIdl.__process_update2(idl.py)
-> BaseOvnIdl.__row_update(idl.py) -> Datum.__cmp__(data.py) ->
Atom.__cmp__(data.py).

The aggregated number of calls to BaseOvnIdl.__process_update2 is
1400 (and we're updating only 10 ports!!), while the total number of
connections opened to the NB database is 10:

# netstat -np | grep 6641 | grep python | wc -l
10

* Bear in mind that those results above were aggregated across all
processes.

It looks like the main culprit for this explosion could be the way we
handle ACLs: every time we create a port, it belongs to a Neutron
security group (OVN Address Set) and we add a new ACL for every
Neutron security group rule. If we patch the code to skip the ACL part,
the time for creating a port remains stable over time.

From the comparison tests against ML2/OVS (the reference implementation),
OVN outperforms it in most of the operations except for port creation,
where we can see it can become a bottleneck.

Before optimizing/redesigning the ACL part, we could do some other
changes to the way we handle notifications from OVSDB: e.g., instead of
having multiple processes receiving *all* notifications, we could have
one single process subscribed to those notifications and send a more
optimized (already parsed) multicast notification to all listening processes
to keep their own in-memory copies of the DB up to date. All processes
would still connect to the NB database in "write-only" mode to commit their
transactions.

Even though this last paragraph would best fit on the OpenStack ML, I want
to raise it here for feedback and see if someone can spot some "immediate"
optimization for the way we're processing notifications from OVSDB.
Maybe some Python binding to do it in C? :)
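
To make the idea a bit more concrete, here is a rough, untested sketch of
what that single notifier process could look like on top of the Python IDL.
The multicast group/port, the use of pickle, and the payload contents are
all arbitrary choices of mine, not a proposal of an actual format:

# Rough sketch, illustrative only: one notifier process runs the ovs Python
# IDL against the NB DB and republishes a pre-parsed, pickled summary of
# each change to the API workers over UDP multicast.
import pickle
import socket

import ovs.db.idl
import ovs.poller

MCAST_GRP, MCAST_PORT = "239.1.1.1", 9999          # arbitrary choice

def run_notifier(remote, schema_helper):
    # 'schema_helper' is assumed to be an ovs.db.idl.SchemaHelper with the
    # relevant NB tables registered.
    idl = ovs.db.idl.Idl(remote, schema_helper)
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
    last_seqno = idl.change_seqno
    while True:
        idl.run()
        if idl.change_seqno != last_seqno:
            last_seqno = idl.change_seqno
            # Ship something the workers can apply to their in-memory copy;
            # here just the seqno and row counts as a placeholder payload.
            payload = {"seqno": last_seqno,
                       "tables": {n: len(t.rows)
                                  for n, t in idl.tables.items()}}
            sock.sendto(pickle.dumps(payload), (MCAST_GRP, MCAST_PORT))
        poller = ovs.poller.Poller()
        idl.wait(poller)
        poller.block()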

Any feedback, comments or suggestions are highly appreciated :)

Best,
Daniel Alvarez

[0]
https://snapshot.raintank.io/dashboard/snapshot/dwbhn0Z1zVTh9kI5j6mCVySx8TvrP45m?orgId=2


Re: [ovs-discuss] ovs-vswitchd 100% CPU in OVN scale test

2018-02-09 Thread Daniel Alvarez Sanchez
Nice findings Han!
Looking back at the patch that Numan sent, I had answered this on the report:

"Yes, thanks Numan for the patch :)

Another option would be that ovn-controller sets explicitly the MTU to 1450.
Not sure which of the two is the best or would have less side effects.

Cheers,
Daniel
"

Would that be an option? We'd be closing the loop within OVN, but maybe
we still need it fixed more broadly at the OVS level. Thoughts?

Daniel

On Fri, Feb 9, 2018 at 2:55 AM, Han Zhou  wrote:

>
>
> On Wed, Feb 7, 2018 at 12:47 PM, Han Zhou  wrote:
> >
> > When doing scale testing for OVN (using https://github.com/
> openvswitch/ovn-scale-test), we had some interesting findings, and need
> some help here.
> >
> > We ran the test "create and bind lports" against branch 2.9 and branch
> 2.6, and we found that 2.6 was much faster. With some analysis, we found
> out the reason is not because of OVN gets slower in 2.9, but because the
> bottleneck of this test in branch 2.9 is ovs-vswitchd.
> >
> > The testing was run in an environment with 20 farm nodes, each has 50
> sandbox HVs (I will just mention them as HVs in short). Before the test,
> there are already 9500 lports bound in 950 HVs on 19 farm nodes. The test
> run against the last farm node to bind the lport on the 50 HVs there. The
> steps in the test scenario are:
> >
> > 1. Create 5 new LSs in NB (so that the LSs will not be shared with any
> of HVs on other farm nodes)
> > 2. create 100 lports in NB on a LS
> > 3. bind these lports on HVs, 2 for each HV. They are bound sequentially
> on each HV, and for each HV the 2 ports are bound using one command
> together: ovs-vsctl add-port  -- set Interface external-ids:...  --
> add-port  -- set Interface external-ids:... (the script didn't set
> type to internal, but I hope it is not an issue for this test).
> > 4. wait for the port state to change to up in NB for all the 100 lports (with
> a single ovn-nbctl command)
> >
> > These steps are repeated for 5 times, one for each LS. So in the end we
> got 500 more lports created and bound (the total scale is then 1k HVs and
> 10k lports).
> >
> > When running with 2.6, the ovn-controllers were taking most of the CPU
> time. However, with 2.9, the CPU of ovn-controllers spikes but there is
> always ovs-vswitchd on the top with 100% CPU. It means the ovs-vswitchd is
> the bottleneck in this testing. There is only one ovs-vswitchd with 100% at
> the same time and different ovs-vswitchd will spike one after another,
> since the ports are bound sequentially on each HV. From the rally log, each
> 2 ports binding takes around 4 - 5 seconds. This is just the ovs-vsctl
> command execution time. The 100% CPU of ovs-vswitchd explains the slowness.
> >
> > So, based on this result, we cannot use the total time to evaluate
> the efficiency of OVN; instead we can evaluate by the CPU cost of the
> ovn-controller processes. In fact, 2.9 ovn-controller costs around 70% less
> CPU than 2.6, which I think is due to some optimization we made earlier.
> (With my work-in-progress patch it saves much more, and I will post later
> as RFC).
> >
> > However, I cannot explain why ovs-vswitchd is getting slower than 2.6
> when doing port-binding. We need expert suggestions here, for what could be
> the possible reason of this slowness. We can do more testing with different
> versions between 2.6 and 2.9 to find out related change, but with some
> pointers it might save some effort. Below are some logs of ovs-vswitchd
> when port binding is happening:
> >
> > ==
> > 2018-02-07T00:12:54.558Z|01767|bridge|INFO|bridge br-int: added
> interface lport_bc65cd_QFOU3v on port 1028
> > 2018-02-07T00:12:55.629Z|01768|timeval|WARN|Unreasonably long 1112ms
> poll interval (1016ms user, 4ms system)
> > 2018-02-07T00:12:55.629Z|01769|timeval|WARN|faults: 336 minor, 0 major
> > 2018-02-07T00:12:55.629Z|01770|timeval|WARN|context switches: 0
> voluntary, 13 involuntary
> > 2018-02-07T00:12:55.629Z|01771|coverage|INFO|Event coverage, avg rate
> over last: 5 seconds, last minute, last hour,  hash=b256889c:
> > 2018-02-07T00:12:55.629Z|01772|coverage|INFO|bridge_reconfigure
> 0.0/sec 0.000/sec0.0056/sec   total: 29
> > 2018-02-07T00:12:55.629Z|01773|coverage|INFO|ofproto_flush
>  0.0/sec 0.000/sec0./sec   total: 1
> > 2018-02-07T00:12:55.629Z|01774|coverage|INFO|ofproto_packet_out
> 0.0/sec 0.000/sec0.0111/sec   total: 90
> > 2018-02-07T00:12:55.629Z|01775|coverage|INFO|ofproto_recv_openflow
>  0.2/sec 0.033/sec0.4858/sec   total: 6673
> > 2018-02-07T00:12:55.629Z|01776|coverage|INFO|ofproto_update_port
>  0.0/sec 0.000/sec5.5883/sec   total: 28258
> > 2018-02-07T00:12:55.629Z|01777|coverage|INFO|rev_reconfigure
>  0.0/sec 0.000/sec0.0056/sec   total: 32
> > 2018-02-07T00:12:55.629Z|01778|coverage|INFO|rev_port_toggled
> 0.0/sec 0.000/sec0.0011/sec   total: 6
> > 

Re: [ovs-discuss] [OVN][RFC] ovn-northd simple optimization converting uuid from string

2018-02-03 Thread Daniel Alvarez Sanchez
Thanks Ben,


On Sat, Feb 3, 2018 at 12:16 AM, Ben Pfaff <b...@ovn.org> wrote:

> Nice finding!
>
> I don't think it's necessary to inline this into the header file to get
> the speedup, since its caller along the uuid_from_string() call chain is
> in the same file as hexit_value().
>
I did this because hexit_value() is also called from scan_int(), which is
called by ovs_scan(). The former was called around > 20K times in my
run. But it's also in util.c so that's fine :)
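
Out of curiosity I also mocked up the same idea in pure Python just to show
the shape of the change (this says nothing about the C numbers above; the
table layout and the test string are arbitrary):

# Pure-Python illustration of the lookup-table idea: compare a branching
# hex-digit decoder against a 256-entry table, similar in spirit to the C
# patch below. Only meant to visualize the optimization.
import timeit

def hexit_branch(c):
    if '0' <= c <= '9':
        return ord(c) - ord('0')
    if 'a' <= c <= 'f':
        return ord(c) - ord('a') + 10
    if 'A' <= c <= 'F':
        return ord(c) - ord('A') + 10
    return -1

HEXTABLE = [-1] * 256
for i, ch in enumerate("0123456789abcdef"):
    HEXTABLE[ord(ch)] = i
    HEXTABLE[ord(ch.upper())] = i

def hexit_table(c):
    return HEXTABLE[ord(c)]

uuid_str = "79feb3b0-de83-932c-6cbf-761d70051330"
print(timeit.timeit(lambda: [hexit_branch(c) for c in uuid_str], number=100000))
print(timeit.timeit(lambda: [hexit_table(c) for c in uuid_str], number=100000))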


> I sent out a patch that does something similar:
> https://patchwork.ozlabs.org/patch/868826/
>
> On Fri, Feb 02, 2018 at 07:24:59PM +0100, Daniel Alvarez Sanchez wrote:
> > Hi folks,
> >
> > While running rally in OpenStack we found out that ovn-northd was
> > at 100% CPU most of the time. It doesn't have to be necessarily
> > a problem but I wanted to do a simple profiling by running a rally task
> > which creates a network (Logical Switch) and creates 6 ports on it,
> > repeating the whole operation 1000 times. The ports and networks
> > are also deleted.
> >
> > I used master branch and compiled it with -O1:
> >
> > CFLAGS="-O1 -g" ./configure --prefix=/usr/local/
> > --with-linux=/usr/lib/modules/`ls /usr/lib/modules/ | tail -n 1`/build
> >
> > gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC)
> >
> > What I saw is that ~ 15% of the execution time was spent in
> > uuid_from_string() function in util/uuid.c module which calls
> hexits_value()
> > which ends up calling hexit_value(). This last function gets called >1M
> > times.
> >
> > I thought that it was worth inlining hexit_value() and using a lookup
> > table instead of the switch/case [0], so I did it (find the patch below),
> > and the gain is that instead of 14%, uuid_from_string() now takes 9% of
> > the total execution time. See [1].
> >
> > [0]
> > https://github.com/openvswitch/ovs/blob/79feb3b0de83932c6cbf761d700513
> 30db4d07f7/lib/util.c#L844
> > [1] https://imgur.com/a/3gzDF
> >
> > Patch:
> > Note that we could make the table smaller to optimize the data cache
> usage
> > but then we would have to accommodate the argument and include extra
> > checks.
> >
> >
> > ---
> >
> > diff --git a/lib/util.c b/lib/util.c
> > index a4d22df0c..a24472690 100644
> > --- a/lib/util.c
> > +++ b/lib/util.c
> > @@ -839,38 +839,6 @@ str_to_double(const char *s, double *d)
> >  }
> >  }
> >
> > -/* Returns the value of 'c' as a hexadecimal digit. */
> > -int
> > -hexit_value(int c)
> > -{
> > -switch (c) {
> > -case '0': case '1': case '2': case '3': case '4':
> > -case '5': case '6': case '7': case '8': case '9':
> > -return c - '0';
> > -
> > -case 'a': case 'A':
> > -return 0xa;
> > -
> > -case 'b': case 'B':
> > -return 0xb;
> > -
> > -case 'c': case 'C':
> > -return 0xc;
> > -
> > -case 'd': case 'D':
> > -return 0xd;
> > -
> > -case 'e': case 'E':
> > -return 0xe;
> > -
> > -case 'f': case 'F':
> > -return 0xf;
> > -
> > -default:
> > -return -1;
> > -}
> > -}
> > -
> >  /* Returns the integer value of the 'n' hexadecimal digits starting at
> > 's', or
> >   * UINTMAX_MAX if one of those "digits" is not really a hex digit.  Sets
> > '*ok'
> >   * to true if the conversion succeeds or to false if a non-hex digit is
> > diff --git a/lib/util.h b/lib/util.h
> > index b6639b8b8..f41e2a030 100644
> > --- a/lib/util.h
> > +++ b/lib/util.h
> > @@ -217,7 +217,28 @@ bool ovs_scan_len(const char *s, int *n, const char
> > *format, ...);
> >
> >  bool str_to_double(const char *, double *);
> >
> > -int hexit_value(int c);
> > +
> > +static const char hextable[] = {
> > +-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
> -1,-1,-1,
> > +-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
> -1,-1,-1,
> > +-1,-1, 0,1,2,3,4,5,6,7,8,9,-1,-1,-1,-1,-1,-1,-1,10,11,12,13,14,15,-
> 1,
> > +-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
> -1,-1,-1,
> > +-1,-1,10,11,12,13,14,15,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
> -1,-1,-1,
> > +-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
> -1,-1,-1,
> > +-1
