[ovs-discuss] [ovn] distributed traffic for routers without snat with traffic going through intermediate router

2021-12-06 Thread Krzysztof Klimonda
Hi,

I'm trying to create an OVN (21.06 checkout from 20210802) / OpenStack (ussuri) 
topology as in the attached file, where some traffic from a VM is routed through 
the tenant router to another, shared router which is then connected to the 
external network without SNAT. The idea is to allow some VMs access to the 
intranet (via another router with its own external network used just for the 
interconnect between OpenStack and the intranet), while still using standard 
OpenStack external connectivity via Floating IPs: I assign an IP address for a 
VM from an internal subnet allocated for openstack tenants, and then configure 
routing and ACLs on both routers to route traffic via the LAN router (with SNAT 
disabled), as opposed to routing via the tenant router and a FIP.

This lets us centralize ACLs for the intranet in one place, so that we can 
enforce them and prevent users from making any changes - the LAN router lives in 
another tenant, and users only have access to their own tenant routers, which 
are connected to the LAN router by a small interconnect network.

This setup works to some extent - I have connectivity in both directions via 
the intranet network - but the traffic is not distributed: instead, all traffic 
is centralized on the gateway chassis node that is assigned to the LAN router.

It feels like it should work, and it's either a bug in my setup or an omission 
in the ovn code, where it can't tell that the traffic could be decentralized.

-- 
  Krzysztof Klimonda
  kklimo...@syntaxhighlighted.com
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] [ovn] recommended upgrade/restart procedure for ovn components

2021-08-18 Thread Krzysztof Klimonda
Hi Numan,

On Wed, Aug 18, 2021, at 17:42, Numan Siddique wrote:
> On Wed, Aug 18, 2021 at 3:55 AM Krzysztof Klimonda
>  wrote:
> >
> > Hi,
> >
> > After reading OVN upgrade documentation[1], my understanding is that the 
> > order of upgrading components is pretty important to ensure controlplane & 
> > dataplane stability. As I understand those are the upgrade steps:
> 
> >
> > 1. upgrade and restart ovn-controller on every chassis
> > 2. upgrade ovn-nb-db and ovn-sb-db and migrate database schema
> > 3. upgrade ovn-northd as the last component
> 
> Even though this is the recommended procedure, I know that Openstack
> tripleo deployments and Openshift upgrade the ovn-northd and
> ovsdb-servers first
> 
> 
> >
> > First, is the schema upgrade done by ovn-ctl somehow? It didn't upgrade the 
> > schema for me and I had to run the "ovsdb-client migrate" command on both 
> > northbound and southbound databases.
> 
> I think ovn-ctl should take care of upgrading the database to the
> updated schema.  Before restarting the ovsdb-servers, the ovn packages
> were upgraded to the desired schema files right ?
> If so, I think ovn-ctl should upgrade the database.

Yeah, those are kolla containers and after the restart we use a new image with 
new ovn packages. This is how kolla starts the northbound db: 
"/usr/share/ovn/scripts/ovn-ctl run_nb_ovsdb --db-nb-addr=172.16.0.213 
--db-nb-cluster-local-addr=172.16.0.213  --db-nb-sock=/run/ovn/ovnnb_db.sock 
--db-nb-pid=/run/ovn/ovnnb_db.pid 
--db-nb-file=/var/lib/openvswitch/ovn-nb/ovnnb.db 
--ovn-nb-logfile=/var/log/kolla/openvswitch/ovn-nb-db.log" - I'll double check 
whether I can figure out why the schema wasn't upgraded.
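
For what it's worth, the manual check I have in mind is roughly the following, 
assuming the new image ships the schema under /usr/share/ovn/ (paths may differ):

  # does the running NB database still need conversion to the packaged schema?
  $ ovsdb-client needs-conversion unix:/run/ovn/ovnnb_db.sock /usr/share/ovn/ovn-nb.ovsschema
  # if it answers "yes", convert it
  $ ovsdb-client convert unix:/run/ovn/ovnnb_db.sock /usr/share/ovn/ovn-nb.ovsschema

(and the same against the southbound socket with ovn-sb.ovsschema)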

> 
> >
> > Second, in large deployments (250+ ovn-controllers) restarting ovn 
> > southbound cluster nodes leads to a complete failure of the southbound 
> > database in my environment - once all ovn-controllers (and 
> > neutron-ovn-metadata-agents) start reconnecting to the cluster, the load 
> > generated by them makes the cluster lose quorum, or even corrupts the 
> > database on some nodes.
> 
> If there are a lot of connections to ovsdb-servers, it would
> definitely slow down.  Maybe you can restart the ovn-controllers in a
> phased manner?  Or pause all ovn-controllers and then unpause them
> in a few groups so that the ovsdb-servers are not overloaded.
> I think in one of our production scale deployments we did something similar.

By pause do you mean "debug/pause"? Thanks, I'll check it out.
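
For the archives, these are the commands I'm assuming you mean - please correct 
me if there's a different mechanism:

  $ ovn-appctl -t ovn-controller debug/pause     # stop processing SB updates
  $ ovn-appctl -t ovn-controller debug/status    # check whether it's paused
  $ ovn-appctl -t ovn-controller debug/resume    # resume; I'd do this in small groups of chassis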

> 
> 
> > I'm running OVN 21.06 with ovsdb-server 2.14.0 - should I be upgrading to 
> > 2.15.x? I've also seen the new relay-based architecture introduced in the 
> > 2.16.0 release, but this seems to be a rather recent development and I'm 
> > worried about stability (I've seen some reports about crashes and high 
> > memory usage).
> >
> > When running scale tests for ovn with kubernetes with hundreds of nodes, 
> > how are cluster upgrades handled?
> 
> As I mentioned above, I think in the case of openshift,  the master
> nodes are upgraded first and then the worker nodes are upgraded.
> I think during the master node upgrades, the worker nodes are paused.
> My kubernetes/openshift knowledge is limited though.

Thanks, any thoughts on upgrading ovsdb-server to the 2.15.1 release? I see that 
there is a new database format - would that give any performance boost to the 
northbound and southbound clusters? Or should I just start looking into a 
relay-based southbound deployment to scale my cluster to 200+ nodes?
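
In case it helps frame the question, this is roughly how I understand a relay 
tier would look - untested on my side, and the addresses are placeholders:

  # one or more relay instances pointing at the SB RAFT cluster
  $ ovsdb-server --remote=ptcp:16642 \
        relay:OVN_Southbound:tcp:<sb1>:6642,tcp:<sb2>:6642,tcp:<sb3>:6642
  # chassis then connect to the relays instead of the cluster members
  $ ovs-vsctl set open_vswitch . external_ids:ovn-remote="tcp:<relay1>:16642,tcp:<relay2>:16642"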

Thanks
Krzysztof

> 
> Thanks
> Numan
> 
> >
> > Regards,
> > Krzysztof
> >
> > [1] https://docs.ovn.org/en/latest/intro/install/ovn-upgrades.html
> >
> > --
> >   Krzysztof Klimonda
> >   kklimo...@syntaxhighlighted.com
> > ___
> > discuss mailing list
> > disc...@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
> >
> 


-- 
  Krzysztof Klimonda
  kklimo...@syntaxhighlighted.com
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


[ovs-discuss] [ovn] recommended upgrade/restart procedure for ovn components

2021-08-18 Thread Krzysztof Klimonda
Hi,

After reading the OVN upgrade documentation[1], my understanding is that the 
order of upgrading components is pretty important for control-plane & data-plane 
stability. As I understand it, these are the upgrade steps:

1. upgrade and restart ovn-controller on every chassis
2. upgrade ovn-nb-db and ovn-sb-db and migrate database schema
3. upgrade ovn-northd as the last component

First, is the schema upgrade done by ovn-ctl somehow? It didn't upgrade the 
schema for me and I had to run the "ovsdb-client migrate" command on both the 
northbound and southbound databases.

Second, in large deployments (250+ ovn-controllers) restarting ovn southbound 
cluster nodes leads to a complete failure of the southbound database in my 
environment - once all ovn-controllers (and neutron-ovn-metadata-agents) start 
reconnecting to the cluster, the load generated by them makes the cluster lose 
quorum, or even corrupts the database on some nodes.
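
One thing I'm planning to try before the next restart - not sure yet how much it 
helps - is stretching the inactivity probes so that the reconnect storm doesn't 
compound with probe timeouts (assuming a single Connection row in the SB DB):

  # on each chassis: probe the SB connection less aggressively (value in ms)
  $ ovs-vsctl set open_vswitch . external_ids:ovn-remote-probe-interval=60000
  # on the southbound cluster: relax the server-side probe towards clients
  $ ovn-sbctl set connection . inactivity_probe=60000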

I'm running OVN 21.06 with ovsdb-server 2.14.0 - should I be upgrading to 
2.15.x? I've also seen the new relay-based architecture introduced in the 2.16.0 
release, but this seems to be a rather recent development and I'm worried about 
stability (I've seen some reports about crashes and high memory usage).

When running scale tests for ovn with kubernetes with hundreds of nodes, how 
are cluster upgrades handled?

Regards,
Krzysztof

[1] https://docs.ovn.org/en/latest/intro/install/ovn-upgrades.html

-- 
  Krzysztof Klimonda
  kklimo...@syntaxhighlighted.com
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


[ovs-discuss] [ovn] Constant "Claiming virtual lport [uuid] for this chassis with the virtual parent [uuid]

2021-08-17 Thread Krzysztof Klimonda
Hi,

In my deployment, based on OVN 21.06, for virtual ports that are claimed on a 
given chassis the following log entry is appended every 5 seconds to 
ovn-controller.log:

-
2021-08-17T07:27:34.214Z|05405|pinctrl|INFO|Claiming virtual lport 
e11296f8-eb7f-4eef-8f2a-a40360ea061d for this chassis with the virtual parent 
96895f2e-c623-4928-83c1-a4aa128a874b
2021-08-17T07:27:39.214Z|05532|pinctrl|INFO|Claiming virtual lport 
e11296f8-eb7f-4eef-8f2a-a40360ea061d for this chassis with the virtual parent 
96895f2e-c623-4928-83c1-a4aa128a874b
-

The time interval matches the GARP packets sent from the amphora VM, so my 
understanding is that this message is logged for every GARP packet received by 
the local ovn-controller - but is there a need to append this log entry at INFO 
level when the port is already claimed? Perhaps this should only be logged when 
lport ownership actually changes (on a vrrp failover, for example).
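
As a stopgap I'm considering just lowering the log level for the pinctrl module 
on the affected chassis, assuming the usual vlog controls apply here:

  $ ovn-appctl -t ovn-controller vlog/set pinctrl:file:warn
  $ ovn-appctl -t ovn-controller vlog/set pinctrl:syslog:warn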

Regards,
Krzysztof

-- 
  Krzysztof Klimonda
  kklimo...@syntaxhighlighted.com
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] OVN /OVS openvswitch: ovs-system: deferred action limit reached, drop recirc action

2021-08-04 Thread Krzysztof Klimonda
Hi Ammad,

(Re-adding ovs-discuss@openvswitch.org to CC to keep track of the discussion)

Thanks for testing it with SNAT enabled/disabled and verifying that it seems to 
be related.

As for the impact of this bug, I have to say I'm unsure. I have theorized that 
it could be the cause of (or at least connected to) BFD sessions being dropped 
between gateway chassis, but I couldn't really validate that.

My linked patch is pretty old and no longer applies cleanly on master, but I'd 
be interested in getting some feedback from developers on whether I'm even 
fixing the right thing.

Regards,
Krzysztof

On Wed, Aug 4, 2021, at 09:02, Ammad Syed wrote:
> I am able to reproduce this issue with snat enabled network and 
> accessing the snat IP from external network can reproduce this issue . 
> If I keep snat disable, then I didn't see these logs in syslog.
> 
> Ammad
> 
> On Tue, Aug 3, 2021 at 6:39 PM Ammad Syed  wrote:
> > Thanks. Let me try to reproduce it with this way.
> > 
> > Can you please advise if this will cause any trouble if we have this bug in 
> > production? Any workaround to avoid this issue?
> > 
> > Ammad
> > 
> > On Tue, Aug 3, 2021 at 5:56 PM Krzysztof Klimonda 
> >  wrote:
> >> Hi,
> >> 
> >> To reproduce it (on openstack. although the issue does not seem to be 
> >> openstack-specific) I've created a network with SNAT enabled (which is 
> >> default) and set its external gateway to my external network. Next, I've 
> >> tried establishing TCP session from the outside to IP address assigned to 
> >> the router and checked dmesg on the chassis that the port is assigned to 
> >> for "ovs-system: deferred action limit reached, drop recirc action" 
> >> messages.
> >> 
> >> Best Regards,
> >> Krzysztof
> >> 
> >> On Tue, Aug 3, 2021, at 09:05, Ammad Syed wrote:
> >> > Hi Krzysztof,
> >> > 
> >> > Yes I might be stuck in this issue. How can I check if there is any 
> >> > loop in lflow-list ?
> >> > 
> >> > Ammad
> >> > 
> >> > On Tue, Aug 3, 2021 at 2:14 AM Krzysztof Klimonda 
> >> >  wrote:
> >> > > Hi,
> >> > > 
> >> > > Not sure if it's related, but I've seen this bug in ovn 20.12 release, 
> >> > > where routing loop was related to flows created to handle SNAT, I've 
> >> > > sent an RFC patch few months back but didn't really have time to 
> >> > > follow up on it since then to get some feedback: 
> >> > > https://www.mail-archive.com/ovs-dev@openvswitch.org/msg53195.html
> >> > > I was planning on re-testing it with 21.06 release and follow up on 
> >> > > the patch.
> >> > > 
> >> > > On Mon, Aug 2, 2021, at 21:31, Han Zhou wrote:
> >> > > > 
> >> > > > 
> >> > > > On Mon, Aug 2, 2021 at 5:07 AM Ammad Syed  
> >> > > > wrote:
> >> > > > >
> >> > > > > Hello, 
> >> > > > >
> >> > > > > I am using openstack with OVN 20.12 and OVS 2.15.0 on ubuntu 
> >> > > > > 20.04. I am using geneve tenant network and vlan provider network. 
> >> > > > >
> >> > > > > I am continuously getting below messages in my dmesg logs 
> >> > > > > continuously on compute node 1 only the other two compute nodes 
> >> > > > > have no such messages. 
> >> > > > >
> >> > > > > [275612.826698] openvswitch: ovs-system: deferred action limit 
> >> > > > > reached, drop recirc action
> >> > > > > [275683.750343] openvswitch: ovs-system: deferred action limit 
> >> > > > > reached, drop recirc action
> >> > > > > [276102.200772] openvswitch: ovs-system: deferred action limit 
> >> > > > > reached, drop recirc action
> >> > > > > [276161.575494] openvswitch: ovs-system: deferred action limit 
> >> > > > > reached, drop recirc action
> >> > > > > [276210.262524] openvswitch: ovs-system: deferred action limit 
> >> > > > > reached, drop recirc action
> >> > > > >
> >> > > > > I have tried by reinstalling (OS everything) compute node 1 but 
> >> > > > > still having same errors.
> >> > > > >
> >> > > > > Need your advise.

Re: [ovs-discuss] OVN /OVS openvswitch: ovs-system: deferred action limit reached, drop recirc action

2021-08-02 Thread Krzysztof Klimonda
Hi,

Not sure if it's related, but I've seen this bug in the ovn 20.12 release, 
where a routing loop was related to the flows created to handle SNAT. I sent an 
RFC patch a few months back but didn't really have time to follow up on it 
since then to get some feedback: 
https://www.mail-archive.com/ovs-dev@openvswitch.org/msg53195.html
I was planning on re-testing it with the 21.06 release and following up on the 
patch.
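
In case someone wants to poke at this, the way I've been spotting the loop is to 
trace a packet aimed at the router's SNAT address and watch the recirculations 
pile up (the port, MACs and addresses below are placeholders):

  $ ovs-appctl ofproto/trace br-int \
        'in_port=<provider-port>,tcp,dl_src=<upstream-router-mac>,dl_dst=<gw-router-mac>,nw_src=198.51.100.5,nw_dst=203.0.113.10,tp_dst=22'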

On Mon, Aug 2, 2021, at 21:31, Han Zhou wrote:
> 
> 
> On Mon, Aug 2, 2021 at 5:07 AM Ammad Syed  wrote:
> >
> > Hello, 
> >
> > I am using openstack with OVN 20.12 and OVS 2.15.0 on ubuntu 20.04. I am 
> > using geneve tenant network and vlan provider network. 
> >
> > I am continuously getting below messages in my dmesg logs continuously on 
> > compute node 1 only the other two compute nodes have no such messages. 
> >
> > [275612.826698] openvswitch: ovs-system: deferred action limit reached, 
> > drop recirc action
> > [275683.750343] openvswitch: ovs-system: deferred action limit reached, 
> > drop recirc action
> > [276102.200772] openvswitch: ovs-system: deferred action limit reached, 
> > drop recirc action
> > [276161.575494] openvswitch: ovs-system: deferred action limit reached, 
> > drop recirc action
> > [276210.262524] openvswitch: ovs-system: deferred action limit reached, 
> > drop recirc action
> >
> > I have tried by reinstalling (OS everything) compute node 1 but still 
> > having same errors.
> >
> > Need your advise.
> >
> > --
> > Regards,
> >
> >
> > Syed Ammad Ali
> > ___
> > discuss mailing list
> > disc...@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
> 
> Hi Syed,
> 
> Could you check if you have routing loops (i.e. a packet being routed 
> back and forth between logical routers infinitely) in your logical 
> topology?
> 
> Thanks,
> Han
> ___
> discuss mailing list
> disc...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
> 


-- 
  Krzysztof Klimonda
  kklimo...@syntaxhighlighted.com
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] [ovs] Adding tagged interfaces to ovs-system managed interface in the system configuration.

2021-06-29 Thread Krzysztof Klimonda
Right, but it seems that's only the case when I create an internal interface in 
OVS for use in the host system. If instead I create a system-level bond and use 
it for both vswitchd and the vlan interfaces (as I've shown in my example), the 
resulting system interfaces, even though created on a bond that now has "master 
ovs-system", seem to keep working even when vswitchd is not running.

I understand this is not a "proper" configuration, but I'm trying to understand 
what's wrong with it and how to measure its "wrongness" - frankly, I don't even 
fully understand why it works at all, especially when vswitchd is turned off.

Best Regards,
Krzysztof

On Tue, Jun 29, 2021, at 18:03, Ben Pfaff wrote:
> Any use of OVS will bring down the network if vswitch fails.
> 
> On Mon, Jun 28, 2021 at 03:26:00PM +0200, Krzysztof Klimonda wrote:
> > Hi,
> > 
> > Could you elaborate on that? Is there some documentation on this 
> > interaction I could read? Is this a potential performance issue, or 
> > offloading issue? What would be a better way to configure bonding with ovs 
> > that does not bring down network in case of vswitchd failure?
> > 
> > Best Regards,
> > Krzysztof
> > 
> > On Fri, Jun 25, 2021, at 01:49, Ben Pfaff wrote:
> > > Linux bonds and OVS bridges don't necessarily mix well.
> > > 
> > > On Thu, Jun 24, 2021 at 10:25:58AM +0200, Krzysztof Klimonda wrote:
> > > > Hi,
> > > > 
> > > > I had a configuration like that in mind:
> > > > 
> > > > # ip link add bond0 type bond
> > > > # ip link set em1 master bond0
> > > > # ip link set em2 master bond0
> > > > # ip link add link bond0 name mgmt type vlan id 100
> > > > # ip link add link bond0 name ovs_tunnel type vlan id 200
> > > > 
> > > > # ovs-vsctl add-br br0
> > > > # ovs-vsctl add-port br0 bond0
> > > > 
> > > > # ip link |grep bond0
> > > > 6: bond0:  mtu 9000 qdisc 
> > > > noqueue master ovs-system state UP mode DEFAULT group default qlen 1000
> > > > 7: mgmt@bond0:  mtu 9000 qdisc noqueue 
> > > > state UP mode DEFAULT group default qlen 1000
> > > > #
> > > > 
> > > > On Wed, Jun 23, 2021, at 18:51, Ben Pfaff wrote:
> > > > > On Tue, Jun 22, 2021 at 09:58:49PM +0200, Krzysztof Klimonda wrote:
> > > > > > Hi,
> > > > > > 
> > > > > > I have tried the following configuration for the system-level 
> > > > > > network in the lab:
> > > > > > 
> > > > > >
> > > > > >   +--vlan10@bond0
> > > > > > ens1--+  |  
> > > > > >---bond0 (ovs-system)--+--vlan20@bond0
> > > > > > ens2--+  |  
> > > > > >   +--vlan30@bond0
> > > > > > 
> > > > > > The idea is to plug bond0 into openvswitch so that I can add 
> > > > > > specific VLANs to my virtual topology, but push some of those VLANs 
> > > > > > into system without doing any specific configuration on the ovs 
> > > > > > side (for example, to have access to the management interface even 
> > > > > > if vswitchd is down).
> > > > > > 
> > > > > > This seems to be working fine in my lab (there is access to the 
> > > > > > management interface - vlan10 - even when bond0 has ovs-system as 
> > > > > > master), but are there any drawbacks to such a configuration?
> > > > > 
> > > > > It's hard to guess how you're implementing this.  If you're doing it
> > > > > with something like this:
> > > > > 
> > > > > ovs-vsctl add-port br0 ens1
> > > > > ovs-vsctl add-port br0 ens2
> > > > > ovs-vsctl add-bond br0 bond0 ens1 ens2
> > > > > ovs-vsctl add-port br0 vlan1 tag=1 -- set interface vlan1 
> > > > > type=internal
> > > > > ovs-vsctl add-port br0 vlan2 tag=2 -- set interface vlan2 
> > > > > type=internal
> > > > > ovs-vsctl add-port br0 vlan3 tag=3 -- set interface vlan3 
> > > > > type=internal
> > > > > 
> > > > > then it ought to work fine.
> > > > > 
> > > > 
> > > > 
> > > > -- 
> > > >   Krzysztof Klimonda
> > > >   kklimo...@syntaxhighlighted.com
> > > 
> > 
> > 
> > -- 
> >   Krzysztof Klimonda
> >   kklimo...@syntaxhighlighted.com
> 


-- 
  Krzysztof Klimonda
  kklimo...@syntaxhighlighted.com
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] [ovs] Adding tagged interfaces to ovs-system managed interface in the system configuration.

2021-06-28 Thread Krzysztof Klimonda
Hi,

Could you elaborate on that? Is there some documentation on this interaction I 
could read? Is this a potential performance issue, or an offloading issue? What 
would be a better way to configure bonding with ovs that does not bring down the 
network in case of a vswitchd failure?

Best Regards,
Krzysztof

On Fri, Jun 25, 2021, at 01:49, Ben Pfaff wrote:
> Linux bonds and OVS bridges don't necessarily mix well.
> 
> On Thu, Jun 24, 2021 at 10:25:58AM +0200, Krzysztof Klimonda wrote:
> > Hi,
> > 
> > I had a configuration like that in mind:
> > 
> > # ip link add bond0 type bond
> > # ip link set em1 master bond0
> > # ip link set em2 master bond0
> > # ip link add link bond0 name mgmt type vlan id 100
> > # ip link add link bond0 name ovs_tunnel type vlan id 200
> > 
> > # ovs-vsctl add-br br0
> > # ovs-vsctl add-port br0 bond0
> > 
> > # ip link |grep bond0
> > 6: bond0:  mtu 9000 qdisc noqueue 
> > master ovs-system state UP mode DEFAULT group default qlen 1000
> > 7: mgmt@bond0:  mtu 9000 qdisc noqueue 
> > state UP mode DEFAULT group default qlen 1000
> > #
> > 
> > On Wed, Jun 23, 2021, at 18:51, Ben Pfaff wrote:
> > > On Tue, Jun 22, 2021 at 09:58:49PM +0200, Krzysztof Klimonda wrote:
> > > > Hi,
> > > > 
> > > > I have tried the following configuration for the system-level network 
> > > > in the lab:
> > > > 
> > > >
> > > >   +--vlan10@bond0
> > > > ens1--+  |  
> > > >---bond0 (ovs-system)--+--vlan20@bond0
> > > > ens2--+  |  
> > > >   +--vlan30@bond0
> > > > 
> > > > The idea is to plug bond0 into openvswitch so that I can add specific 
> > > > VLANs to my virtual topology, but push some of those VLANs into system 
> > > > without doing any specific configuration on the ovs side (for example, 
> > > > to have access to the management interface even if vswitchd is down).
> > > > 
> > > > This seems to be working fine in my lab (there is access to the 
> > > > management interface - vlan10 - even when bond0 has ovs-system as 
> > > > master), but are there any drawbacks to such a configuration?
> > > 
> > > It's hard to guess how you're implementing this.  If you're doing it
> > > with something like this:
> > > 
> > > ovs-vsctl add-port br0 ens1
> > > ovs-vsctl add-port br0 ens2
> > > ovs-vsctl add-bond br0 bond0 ens1 ens2
> > > ovs-vsctl add-port br0 vlan1 tag=1 -- set interface vlan1 
> > > type=internal
> > > ovs-vsctl add-port br0 vlan2 tag=2 -- set interface vlan2 
> > > type=internal
> > > ovs-vsctl add-port br0 vlan3 tag=3 -- set interface vlan3 
> > > type=internal
> > > 
> > > then it ought to work fine.
> > > 
> > 
> > 
> > -- 
> >   Krzysztof Klimonda
> >   kklimo...@syntaxhighlighted.com
> 


-- 
  Krzysztof Klimonda
  kklimo...@syntaxhighlighted.com
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] [ovs] Adding tagged interfaces to ovs-system managed interface in the system configuration.

2021-06-24 Thread Krzysztof Klimonda
Hi,

I had a configuration like that in mind:

# ip link add bond0 type bond
# ip link set em1 master bond0
# ip link set em2 master bond0
# ip link add link bond0 name mgmt type vlan id 100
# ip link add link bond0 name ovs_tunnel type vlan id 200

# ovs-vsctl add-br br0
# ovs-vsctl add-port br0 bond0

# ip link |grep bond0
6: bond0:  mtu 9000 qdisc noqueue 
master ovs-system state UP mode DEFAULT group default qlen 1000
7: mgmt@bond0:  mtu 9000 qdisc noqueue state 
UP mode DEFAULT group default qlen 1000
#

On Wed, Jun 23, 2021, at 18:51, Ben Pfaff wrote:
> On Tue, Jun 22, 2021 at 09:58:49PM +0200, Krzysztof Klimonda wrote:
> > Hi,
> > 
> > I have tried the following configuration for the system-level network in 
> > the lab:
> > 
> >
> >   +--vlan10@bond0
> > ens1--+  |  
> >---bond0 (ovs-system)--+--vlan20@bond0
> > ens2--+  |  
> >   +--vlan30@bond0
> > 
> > The idea is to plug bond0 into openvswitch so that I can add specific VLANs 
> > to my virtual topology, but push some of those VLANs into system without 
> > doing any specific configuration on the ovs side (for example, to have 
> > access to the management interface even if vswitchd is down).
> > 
> > This seems to be working fine in my lab (there is access to the management 
> > interface - vlan10 - even when bond0 has ovs-system as master), but are 
> > there any drawbacks to such a configuration?
> 
> It's hard to guess how you're implementing this.  If you're doing it
> with something like this:
> 
> ovs-vsctl add-port br0 ens1
> ovs-vsctl add-port br0 ens2
> ovs-vsctl add-bond br0 bond0 ens1 ens2
> ovs-vsctl add-port br0 vlan1 tag=1 -- set interface vlan1 type=internal
> ovs-vsctl add-port br0 vlan2 tag=2 -- set interface vlan2 type=internal
> ovs-vsctl add-port br0 vlan3 tag=3 -- set interface vlan3 type=internal
> 
> then it ought to work fine.
> 


-- 
  Krzysztof Klimonda
  kklimo...@syntaxhighlighted.com
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


[ovs-discuss] [ovs] Adding tagged interfaces to ovs-system managed interface in the system configuration.

2021-06-22 Thread Krzysztof Klimonda
Hi,

I have tried the following configuration for the system-level network in the 
lab:

   
ens1--+                       +--vlan10@bond0
      +---bond0 (ovs-system)--+--vlan20@bond0
ens2--+                       +--vlan30@bond0

The idea is to plug bond0 into openvswitch so that I can add specific VLANs to 
my virtual topology, but push some of those VLANs into the system without doing 
any specific configuration on the ovs side (for example, to have access to the 
management interface even if vswitchd is down).

This seems to be working fine in my lab (there is access to the management 
interface - vlan10 - even when bond0 has ovs-system as master), but are there 
any drawbacks to such a configuration?

-- 
  Krzysztof Klimonda
  kklimo...@syntaxhighlighted.com
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] BGP EVPN support

2021-03-16 Thread Krzysztof Klimonda



On Tue, Mar 16, 2021, at 19:15, Mark Gray wrote:
> On 16/03/2021 15:41, Krzysztof Klimonda wrote:
> > Yes, that seems to be prerequisite (or one of prerequisites) for keeping 
> > current DPDK / offload capabilities, as far as I understand. By Proxy 
> > ARP/NDP I think you mean responding to ARP and NDP on behalf of the system 
> > where FRR is running?  
> > 
> > As for whether to go ovn-kubernetes way and try to implement it with 
> > existing primitives, or add BGP support directly into OVN, I feel like this 
> > should be a core feature of OVN itself and not something that could be 
> > built on top of it by a careful placement of logical switches, routers and 
> > ports. This would also help with management (you would configure new BGP 
> > connection by modifying northbound DB) and simplify troubleshooting in case 
> > something is not working as expected.
> > 
> 
> There would be quite a lot of effort to implement BGP support directly
> into OVN as per all the relevant BGP RFCs ... and the effort to maintain it.
> Another option might be to make FRR OpenFlow-aware and enable it to
> program OpenFlow flows directly into an OVN bridge, much like it does
> into the kernel today. FRR does provide some flexibility to extend like
> that through the use of something like FPM
> (http://docs.frrouting.org/projects/dev-guide/en/latest/fpm.html)

Indeed, when I wrote "adding BGP support directly to OVN" I didn't really mean 
implementing the BGP protocol directly in OVN, but rather implementing the 
integration with FRR directly in OVN, instead of building it by reusing existing 
resources. Making ovn-controller into a fully fledged BGP peer seems like a nice 
expansion of the initial idea, assuming that the protocol could be offloaded to 
some library, but it's probably not a hard requirement for the initial 
implementation, as long as OVS can be programmed to deliver BGP traffic to FRR.

When you write that FRR would program flows on the OVS bridge, do you have 
something specific in mind? I thought the discussion so far was mostly about 
one-way BGP announcements, with FRR "simply" announcing specific prefixes from 
the chassis nodes. Do you have something more in mind, like programming routes 
received from the BGP router into OVN?

-- 
  Krzysztof Klimonda
  kklimo...@syntaxhighlighted.com
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] BGP EVPN support

2021-03-16 Thread Krzysztof Klimonda
Hi Daniel,

On Tue, Mar 16, 2021, at 15:19, Daniel Alvarez Sanchez wrote:
> 
> 
> On Tue, Mar 16, 2021 at 2:45 PM Luis Tomas Bolivar  
> wrote:
> > Of course we are fully open to redesign it if there is a better approach! 
> > And that was indeed the intention when linking to the current efforts, 
> > figure out if that was a "valid" way of doing it, and how it can be 
> > improved/redesigned. The main idea behind the current design was not to 
> > need modifications to core OVN as well as to minimize the complexity, i.e., 
> > not having to implement another kind of controller for managing the extra 
> > OF flows.
> > 
> > Regarding the metadata/localport, I have a couple of questions, mainly due 
> > to me not knowing enough about ovn/localport:
> > 1) Isn't the metadata managed through a namespace? And the end of the day 
> > that is also visible from the hypervisor, as well as the OVS bridges
> > 2) Another difference is that we are using BGP ECMP and therefore not 
> > associating any nic/bond to br-ex, and that is why we require some 
> > rules/routes to redirect the traffic to br-ex.
> > 
> > Thanks for your input! Really appreciated!
> > 
> > Cheers,
> > Luis
> > 
> > On Tue, Mar 16, 2021 at 2:22 PM Krzysztof Klimonda 
> >  wrote:
> >> __
> >> Would it make more sense to reverse this part of the design? I was 
> >> thinking of having each chassis its own IPv4/IPv6 address used for 
> >> next-hop in announcements and OF flows installed to direct BGP control 
> >> packets over to the host system, in a similar way how localport is used 
> >> today for neutron's metadata service (although I'll admit that I haven't 
> >> looked into how this integrates with dpdk and offload).
> 
> Hi Krzysztof, not sure I follow your suggestion but let me see if I do. 
> With this PoC, the kernel will do:
> 
> 1) Routing to/from physical interface to OVN
> 2) Proxy ARP
> 3) Proxy NDP
> 
> Also FRR will advertise directly connected routes based on the IPs 
> configured on dummy interfaces.
> All this comes with the benefit that no changes are required in the CMS 
> or OVN itself.
> 
> If I understand your proposal well, you would like to do 1), 2) and 3) 
> in OpenFlow so an agent running on all compute nodes is going to be 
> responsible for this? Or you propose adding extra OVN resources in a 
> similar way to what ovn-kubernetes does today [0] and in this case:

Yes, that seems to be a prerequisite (or one of the prerequisites) for keeping 
the current DPDK / offload capabilities, as far as I understand. By Proxy 
ARP/NDP I think you mean responding to ARP and NDP on behalf of the system where 
FRR is running?

As for whether to go the ovn-kubernetes way and try to implement it with 
existing primitives, or to add BGP support directly into OVN, I feel like this 
should be a core feature of OVN itself and not something built on top of it by a 
careful placement of logical switches, routers and ports. This would also help 
with management (you would configure a new BGP connection by modifying the 
northbound DB) and simplify troubleshooting when something is not working as 
expected.

> 
> - Create an OVN Gateway router and connect it to the provider Logical 
> Switch
> - Advertise host routes through the Gateway Router IP address for each 
> node. This would consume one IP address per provider network per node

That seems excessive - why would we need one IP address per provider network 
per node? Shouldn't a single IP per node be enough even if we go with your 
proposal of reusing existing OVN resources? If we do that, separate "service 
subnets" could be used per "external network" to provide connectivity between 
the BGP router and the OVN chassis (so that the next hop can be configured 
correctly). Burning IP addresses from all provider networks seems excessive, 
given that some of them are going to be public, and those are getting pretty 
expensive at the moment.

> - Some external entity to configure ECMP routing to the ToRs

(we're still talking about implementing it via the neutron CMS, right?)
This is probably out of scope for OVN or neutron anyway? I'd assume the ToRs 
are configured before the compute node is deployed.

> - Who creates/manages the infra resources? Onboarding new hypervisors 
> requires IPAM and more

Right, that seems to be another reason to do that "natively" in OVN.

> - OpenStack provides flexibility to its users to customize their own 
> networking (more than ovn-kubernetes I believe). Mixing user created 
> network resources with infra resources in the same OVN cluster is non 
> trivial (eg. maintenance tasks, migration to OVN, ...)

I'm not sure I follow, but if you

[ovs-discuss] [OVN] random BFD timeouts between chassis

2021-03-16 Thread Krzysztof Klimonda
Hi,

I'm trying to track down some issue resulting in BFD session timeouts in our 
deployment.

What I'm seeing is that, seemingly at random, one chassis stops sending BFD 
packets to some of its neighbors (apparently one at a time, and one chassis 
currently seems more prone to this behavior than the others). After the timeout 
is reached, the neighbor signals that the session is down, and they promptly 
re-establish it. I've captured BFD packets on both chassis, and one chassis 
indeed stops sending its BFD packets, or at least they are not showing up on 
the wire. At the same time I can see incoming BFD packets from the neighbor, so 
it doesn't look like an underlying networking issue is causing it.

There is nothing BFD-related in the logs until the session is torn down by the 
neighbor; the only correlated logs I can see right now are constant messages 
like this in syslog:

```
ovs-system: deferred action limit reached, drop recirc action
```

Those seem to be caused by a constant barrage of ARP requests (500-600/s) 
coming from the external network router for IP addresses that are not currently 
in use. That seems to be putting some extra load on the ovs-vswitchd process, 
but seemingly nowhere near enough to stop it from processing other packets 
(ovs-vswitchd logs don't report increased CPU usage).
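
For completeness, this is what I've been watching on both ends while a session 
flaps, without anything conclusive so far:

  # BFD state as seen by ovs-vswitchd on the tunnel interfaces
  $ ovs-appctl bfd/show
  $ ovs-vsctl --columns=name,bfd_status list interface
  # upcall/handler pressure, in case the datapath threads are the bottleneck
  $ ovs-appctl upcall/show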

openvswitch version: 2.11.0
ovn version: 20.09.90 (a build from 20.09 branch from 2020.12.07)

-- 
  Krzysztof Klimonda
  kklimo...@syntaxhighlighted.com
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] BGP EVPN support

2021-03-16 Thread Krzysztof Klimonda

On Tue, Mar 16, 2021, at 14:45, Luis Tomas Bolivar wrote:
> Of course we are fully open to redesign it if there is a better approach! And 
> that was indeed the intention when linking to the current efforts, figure out 
> if that was a "valid" way of doing it, and how it can be improved/redesigned. 
> The main idea behind the current design was not to need modifications to core 
> OVN as well as to minimize the complexity, i.e., not having to implement 
> another kind of controller for managing the extra OF flows.
> 
> Regarding the metadata/localport, I have a couple of questions, mainly due to 
> me not knowing enough about ovn/localport:
> 1) Isn't the metadata managed through a namespace? And the end of the day 
> that is also visible from the hypervisor, as well as the OVS bridges

Indeed, that's true - you can reach the tenant's network from the ovnmeta- 
namespace (where the metadata proxy lives), however from what I remember while 
testing, you can only establish a connection to VMs running on the same 
hypervisor. Granted, this is less about "hardening" per se - any potential 
takeover of the hypervisor probably gives the attacker enough tools to own the 
entire overlay network anyway. Perhaps it just gives me a bad feeling that what 
should be an isolated, public-facing network can be reached from the hypervisor 
without going through the expected network path.

> 2) Another difference is that we are using BGP ECMP and therefore not 
> associating any nic/bond to br-ex, and that is why we require some 
> rules/routes to redirect the traffic to br-ex.

That's an interesting problem - I wonder if that can even be done in OVS today 
(for example with the multipath action), and how ovs would handle incoming 
traffic (what flows are needed to handle that properly). I guess someone with 
OVS internals knowledge would have to chime in on this one.

> Thanks for your input! Really appreciated!
> 
> Cheers,
> Luis
> 
> On Tue, Mar 16, 2021 at 2:22 PM Krzysztof Klimonda 
>  wrote:
>> __
>> Would it make more sense to reverse this part of the design? I was thinking 
>> of having each chassis its own IPv4/IPv6 address used for next-hop in 
>> announcements and OF flows installed to direct BGP control packets over to 
>> the host system, in a similar way how localport is used today for neutron's 
>> metadata service (although I'll admit that I haven't looked into how this 
>> integrates with dpdk and offload).
>> 
>> This way we can also simplify host's networking configuration as extra 
>> routing rules and arp entries are no longer needed (I think it would be 
>> preferable, from security perspective, for hypervisor to not have a direct 
>> access to overlay networks which seems to be the case when you use rules 
>> like that).
>> 
>> --
>>   Krzysztof Klimonda
>>   kklimo...@syntaxhighlighted.com
>> 
>> 
>> 
>> On Tue, Mar 16, 2021, at 13:56, Luis Tomas Bolivar wrote:
>>> Hi Krzysztof,
>>> 
>>> On Tue, Mar 16, 2021 at 12:54 PM Krzysztof Klimonda 
>>>  wrote:
>>>> __
>>>> Hi Luis,
>>>> 
>>>> I haven't yet had time to give it a try in our lab, but from reading your 
>>>> blog posts I have a quick question. How does it work when either DPDK or 
>>>> NIC offload is used for OVN traffic? It seems you are (de-)encapsulating 
>>>> traffic on chassis nodes by routing them through kernel - is this current 
>>>> design or just an artifact of PoC code?
>>> 
>>> You are correct, that is a limitation as we are using kernel routing for 
>>> N/S traffic, so DPDK/NIC offloading could not be used. That said, the E/W 
>>> traffic still uses the OVN overlay and Geneve tunnels.
>>> 
>>> 
>>>> 
>>>> 
>>>> --
>>>>   Krzysztof Klimonda
>>>>   kklimo...@syntaxhighlighted.com
>>>> 
>>>> 
>>>> 
>>>> On Mon, Mar 15, 2021, at 11:29, Luis Tomas Bolivar wrote:
>>>>> Hi Sergey, all,
>>>>> 
>>>>> In fact we are working on a solution based on FRR where a (python) agent 
>>>>> reads from OVN SB DB (port binding events) and triggers FRR so that the 
>>>>> needed routes gets advertised. It leverages kernel networking to redirect 
>>>>> the traffic to the OVN overlay, and therefore does not require any 
>>>>> modifications to ovn itself (at least for now). The PoC code can be found 
>>>>> here: https://github.com/luis5tb/bgp-agent
>>>>> 
>>>>> And there is a series of blog posts related to how to use it on OpenStack

Re: [ovs-discuss] BGP EVPN support

2021-03-16 Thread Krzysztof Klimonda
Would it make more sense to reverse this part of the design? I was thinking of 
having each chassis use its own IPv4/IPv6 address as the next-hop in 
announcements, with OF flows installed to direct BGP control packets over to the 
host system, in a similar way to how a localport is used today for neutron's 
metadata service (although I'll admit that I haven't looked into how this 
integrates with dpdk and offload).

This way we can also simplify the host's networking configuration, as the extra 
routing rules and arp entries are no longer needed (I think it would be 
preferable, from a security perspective, for the hypervisor not to have direct 
access to the overlay networks, which seems to be the case when you use rules 
like that).

--
  Krzysztof Klimonda
  kklimo...@syntaxhighlighted.com



On Tue, Mar 16, 2021, at 13:56, Luis Tomas Bolivar wrote:
> Hi Krzysztof,
> 
> On Tue, Mar 16, 2021 at 12:54 PM Krzysztof Klimonda 
>  wrote:
>> __
>> Hi Luis,
>> 
>> I haven't yet had time to give it a try in our lab, but from reading your 
>> blog posts I have a quick question. How does it work when either DPDK or NIC 
>> offload is used for OVN traffic? It seems you are (de-)encapsulating traffic 
>> on chassis nodes by routing them through kernel - is this current design or 
>> just an artifact of PoC code?
> 
> You are correct, that is a limitation as we are using kernel routing for N/S 
> traffic, so DPDK/NIC offloading could not be used. That said, the E/W traffic 
> still uses the OVN overlay and Geneve tunnels.
> 
> 
>> 
>> 
>> --
>>   Krzysztof Klimonda
>>   kklimo...@syntaxhighlighted.com
>> 
>> 
>> 
>> On Mon, Mar 15, 2021, at 11:29, Luis Tomas Bolivar wrote:
>>> Hi Sergey, all,
>>> 
>>> In fact we are working on a solution based on FRR where a (python) agent 
>>> reads from OVN SB DB (port binding events) and triggers FRR so that the 
>>> needed routes gets advertised. It leverages kernel networking to redirect 
>>> the traffic to the OVN overlay, and therefore does not require any 
>>> modifications to ovn itself (at least for now). The PoC code can be found 
>>> here: https://github.com/luis5tb/bgp-agent
>>> 
>>> And there is a series of blog posts related to how to use it on OpenStack 
>>> and how it works:
>>> - OVN-BGP agent introduction: 
>>> https://ltomasbo.wordpress.com/2021/02/04/openstack-networking-with-bgp/
>>> - How to set ip up on DevStack Environment: 
>>> https://ltomasbo.wordpress.com/2021/02/04/ovn-bgp-agent-testing-setup/
>>> - In-depth traffic flow inspection: 
>>> https://ltomasbo.wordpress.com/2021/02/04/ovn-bgp-agent-in-depth-traffic-flow-inspection/
>>> 
>>> We are thinking that possible next steps if community is interested could 
>>> be related to adding multitenancy support (e.g., through EVPN), as well as 
>>> defining what could be the best API to decide what to expose through BGP. 
>>> It would be great to get some feedback on it!
>>> 
>>> Cheers,
>>> Luis
>>> 
>>> On Fri, Mar 12, 2021 at 8:09 PM Dan Sneddon  wrote:
>>>> 
>>>> 
>>>> On 3/10/21 2:09 PM, Sergey Chekanov wrote:
>>>> > I am looking to Gobgp (BGP implementation in Go) + go-openvswitch for 
>>>> > communicate with OVN Northbound Database right now, but not sure yet.
>>>> > FRR I think will be too heavy for it...
>>>> > 
>>>> > On 10.03.2021 05:05, Raymond Burkholder wrote:
>>>> >> You could look at it from a Free Range Routing perspective.  I've used 
>>>> >> it in combination with OVS for layer 2 and layer 3 handling.
>>>> >>
>>>> >> On 3/8/21 3:40 AM, Sergey Chekanov wrote:
>>>> >>> Hello!
>>>> >>>
>>>> >>> Is there are any plans for support BGP EVPN for extending virtual 
>>>> >>> networks to ToR hardware switches?
>>>> >>> Or why it is bad idea?
>>>> >>>
>>>> >>> ___
>>>> >>> discuss mailing list
>>>> >>> disc...@openvswitch.org
>>>> >>> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>>>> >>
>>>> > 
>>>> > ___
>>>> > discuss mailing list
>>>> > disc...@openvswitch.org
>>>> > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>>>> > 
>>>> 
>>>>

Re: [ovs-discuss] BGP EVPN support

2021-03-16 Thread Krzysztof Klimonda
Hi Luis,

I haven't yet had time to give it a try in our lab, but from reading your blog 
posts I have a quick question: how does it work when either DPDK or NIC offload 
is used for OVN traffic? It seems you are (de-)encapsulating traffic on chassis 
nodes by routing it through the kernel - is this the intended design or just an 
artifact of the PoC code?

--
  Krzysztof Klimonda
  kklimo...@syntaxhighlighted.com



On Mon, Mar 15, 2021, at 11:29, Luis Tomas Bolivar wrote:
> Hi Sergey, all,
> 
> In fact we are working on a solution based on FRR where a (python) agent 
> reads from OVN SB DB (port binding events) and triggers FRR so that the 
> needed routes gets advertised. It leverages kernel networking to redirect the 
> traffic to the OVN overlay, and therefore does not require any modifications 
> to ovn itself (at least for now). The PoC code can be found here: 
> https://github.com/luis5tb/bgp-agent
> 
> And there is a series of blog posts related to how to use it on OpenStack and 
> how it works:
> - OVN-BGP agent introduction: 
> https://ltomasbo.wordpress.com/2021/02/04/openstack-networking-with-bgp/
> - How to set ip up on DevStack Environment: 
> https://ltomasbo.wordpress.com/2021/02/04/ovn-bgp-agent-testing-setup/
> - In-depth traffic flow inspection: 
> https://ltomasbo.wordpress.com/2021/02/04/ovn-bgp-agent-in-depth-traffic-flow-inspection/
> 
> We are thinking that possible next steps if community is interested could be 
> related to adding multitenancy support (e.g., through EVPN), as well as 
> defining what could be the best API to decide what to expose through BGP. It 
> would be great to get some feedback on it!
> 
> Cheers,
> Luis
> 
> On Fri, Mar 12, 2021 at 8:09 PM Dan Sneddon  wrote:
>> 
>> 
>> On 3/10/21 2:09 PM, Sergey Chekanov wrote:
>> > I am looking to Gobgp (BGP implementation in Go) + go-openvswitch for 
>> > communicate with OVN Northbound Database right now, but not sure yet.
>> > FRR I think will be too heavy for it...
>> > 
>> > On 10.03.2021 05:05, Raymond Burkholder wrote:
>> >> You could look at it from a Free Range Routing perspective.  I've used 
>> >> it in combination with OVS for layer 2 and layer 3 handling.
>> >>
>> >> On 3/8/21 3:40 AM, Sergey Chekanov wrote:
>> >>> Hello!
>> >>>
>> >>> Is there are any plans for support BGP EVPN for extending virtual 
>> >>> networks to ToR hardware switches?
>> >>> Or why it is bad idea?
>> >>>
>> >>> ___
>> >>> discuss mailing list
>> >>> disc...@openvswitch.org
>> >>> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>> >>
>> > 
>> > ___
>> > discuss mailing list
>> > disc...@openvswitch.org
>> > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>> > 
>> 
>> FRR is delivered as a set of daemons which perform specific functions. 
>> If you only need BGP functionality, you can just run bgpd. The zebra 
>> daemon adds routing exchange between BGP and the kernel. The vtysh 
>> daemon provides a command-line interface to interact with the FRR 
>> processes. There is also a bi-directional forwarding detection (BFD) 
>> daemon that can be run to detect unidirectional forwarding failures. 
>> Other daemons provide other services and protocols. For this reason, I 
>> felt that it was lightweight enough to just run a few daemons in a 
>> container.
>> 
>> A secondary concern for my use case was support on Red Hat Enterprise 
>> Linux, which will be adding FRR to the supported packages shortly.
>> 
>> I'm curious to hear any input that anyone has on FRR compared with GoBGP 
>> and other daemons. Please feel free to respond on-list if it involves 
>> OVS, or off-list if not. Thanks.
>> 
>> -- 
>> Dan Sneddon |  Senior Principal Software Engineer
>> dsned...@redhat.com |  redhat.com/cloud
>> dsneddon:irc|  @dxs:twitter
>> 
>> ___
>> discuss mailing list
>> disc...@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
> 
> 
> -- 
> LUIS TOMÁS BOLÍVAR
> Principal Software Engineer
> Red Hat
> Madrid, Spain
> ltoma...@redhat.com   
>  
> ___
> discuss mailing list
> disc...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
> 
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


[ovs-discuss] [ovn] hardware offload status - supported NICs

2021-03-11 Thread Krzysztof Klimonda
Hi,

I've been trying to get a better feel for the current status of hardware 
offload for OVN. Is there an authoritative document that specifies which NICs 
support offloading the flows created by OVN? I'm mostly interested in Mellanox, 
as this is our preferred manufacturer, but if other NICs have better support, it 
would be good to know that too.
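
For context, I'm assuming the standard kernel TC offload path here, i.e. 
something along the lines of:

  # enable flow offload, then restart ovs-vswitchd (service name varies by distro)
  $ ovs-vsctl set open_vswitch . other_config:hw-offload=true
  # afterwards, check which datapath flows actually got offloaded
  $ ovs-appctl dpctl/dump-flows type=offloaded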

-- 
  Krzysztof Klimonda
  kklimo...@syntaxhighlighted.com
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] [ovn] TCP/UDP traffic dropped between IP addresses defined in allowed address pairs when remote security group is used

2020-12-15 Thread Krzysztof Klimonda
Hi Flavio, Dumitru

See inline for replies


On Tue, Dec 15, 2020, at 12:37, Flavio Fernandes wrote:
> 
> 
>> On Dec 15, 2020, at 6:20 AM, Dumitru Ceara  wrote:
>> 
>> On 12/15/20 12:02 PM, Krzysztof Klimonda wrote:
>>> Hi Dumitru,
>>> 
>>> Thanks for checking it out.
>>> 
>>> On Tue, Dec 15, 2020, at 10:45, Dumitru Ceara wrote:
>>>> Hi Krzysztof,
>>>> 
>>>> Thanks for the DBs and all the details.
>>>> 
>>>> I gave it a try on my local setup, using your DBs.  The behavior below
>>>> is identical on v20.06.2, branch-20.12 and current master.
>>> 
>>> I have to admit I'm not sure why in your tests the behaviour is the same in 
>>> 20.06.2 - unfortunately I no longer have access to the environment with 
>>> that version, so I can't re-verify that on my end and I based it on the 
>>> fact that this SG was created by magnum for kubernetes cluster deployed on 
>>> top of openstack, and the cluster was being tested successfully. The 
>>> earliest I can try recreating this with 20.06.2 release is next year, so 
>>> I'm not sure if this will be of any help.
>>> 
>> 
>> I just rechecked with 20.06.2 and I'm 100% sure that with the DBs you
>> provided and the behavior is exactly the same as with newer branches
>> (including latest master).
>> 
>>> [...]
>>>> 
>>>> For TCP traffic 172.16.0.11 -> 172.16.0.10 we don't hit any of the
>>>> allow-related ACLs.  172.16.0.11 is not set as a port address on port
>>>> 81d23182-37ac-4d3d-815e-4c25d26fe154 so it will not be included in the
>>>> auto-generated address_set pg_ed081ef3_754a_492f_80b2_fb73cd2dceed_ip4.
>>>> 
>>>> On the other hand, for TCP traffic 10.0.0.11 -> 172.16.0.10 we hit the
>>>> 4th allow-related ACL:
>>>> to-lport  1002 (outport == @pg_ed081ef3_754a_492f_80b2_fb73cd2dceed &&
>>>> ip4 && ip4.src == $pg_ed081ef3_754a_492f_80b2_fb73cd2dceed_ip4 && tcp)
>>>> allow-related
>>>> 
>>>> ICMP traffic matches the 5th allow-related ACL which doesn't check
>>>> source IPs:
>>>> to-lport  1002 (outport == @pg_ed081ef3_754a_492f_80b2_fb73cd2dceed &&
>>>> ip4 && ip4.src == 0.0.0.0/0 && icmp4) allow-related
>>>> 
>>>> So, unless I'm missing something, OVN is doing what it's supposed to do
>>>> based on the current configuration.  To avoid adding a new ACL, one
>>>> option would be to add the 172.16.0.10 and 172.16.0.11 IPs to the
>>>> corresponding logical_switch_ports "addresses" column, i.e., something 
>>>> like:
>>> 
>>> The problem with adding those addresses to the LSP is that they are outside 
>>> of OVN/neutron control - this is a kubernetes cluster deployed with magnum, 
>>> where the CNI plugin (calico) doesn't use any tunneling between k8s nodes; 
>>> each pod has an address in the 172.16.0.0/16 subnet and uses addresses from 
>>> 172.16.0.0/16 to communicate with other pods. To accommodate that, the ports 
>>> have 172.16.0.0/16 added to their allowed addresses.
>>> 
>>> I'm assuming that, if OVN is doing what it's supposed to be doing based on 
>>> the configuration[1], then there is a mismatch between neutron and OVN 
>>> behaviour in regards to SG with allowed address pairs? I guess someone from 
>>> neutron team would have to comment whether it's 
>> 
>> I agree, some input from neutron would be great. (cc-ing Daniel).
> 
> Hi folks. I'm caching up on this thread and have a follow up question/request.
> 
> @Krzysztof: can you tell me a little more about how the ips 172.16.0.10 and 
> 172.16.0.11 are added as
> secondary on the neutron ports? Are you using multiple fixed ip addresses? Do 
> you have 2 neutron subnets configured
> on the same neutron network?
> 
> The reason I ask is because a neutron network and its associated neutron 
> subnet(s) are "aggregated" into a single logical switch
> in OVN. In fact, neutron subnets have no corresponding row in OVN. So I 
> wonder if what you really want are 2 separate neutron
> ports for each vm, instead of a single port with overlapping subnets.

Not sure what you mean by overlapping subnets - the subnets in question are 
10.0.0.0/8 and 172.16.0.0/16.

The IP addresses are added either as secondary IP addresses set on the 
interface inside the VM (in my manual testing), or as a result of the calico CNI 
using the "physical" interface (as opposed to an IPIP tunnel) for pod-to-pod com

Re: [ovs-discuss] [ovn] TCP/UDP traffic dropped between IP addresses defined in allowed address pairs when remote security group is used

2020-12-15 Thread Krzysztof Klimonda
Hi Dumitru,

Thanks for checking it out.

On Tue, Dec 15, 2020, at 10:45, Dumitru Ceara wrote:
> Hi Krzysztof,
> 
> Thanks for the DBs and all the details.
> 
> I gave it a try on my local setup, using your DBs.  The behavior below
> is identical on v20.06.2, branch-20.12 and current master.

I have to admit I'm not sure why in your tests the behaviour is the same in 
20.06.2 - unfortunately I no longer have access to the environment with that 
version, so I can't re-verify it on my end; I based that on the fact that this 
SG was created by magnum for a kubernetes cluster deployed on top of openstack, 
and the cluster was being tested successfully. The earliest I can try recreating 
this with the 20.06.2 release is next year, so I'm not sure if this will be of 
any help.

[...]
> 
> For TCP traffic 172.16.0.11 -> 172.16.0.10 we don't hit any of the
> allow-related ACLs.  172.16.0.11 is not set as a port address on port
> 81d23182-37ac-4d3d-815e-4c25d26fe154 so it will not be included in the
> auto-generated address_set pg_ed081ef3_754a_492f_80b2_fb73cd2dceed_ip4.
> 
> On the other hand, for TCP traffic 10.0.0.11 -> 172.16.0.10 we hit the
> 4th allow-related ACL:
> to-lport  1002 (outport == @pg_ed081ef3_754a_492f_80b2_fb73cd2dceed &&
> ip4 && ip4.src == $pg_ed081ef3_754a_492f_80b2_fb73cd2dceed_ip4 && tcp)
> allow-related
> 
> ICMP traffic matches the 5th allow-related ACL which doesn't check
> source IPs:
> to-lport  1002 (outport == @pg_ed081ef3_754a_492f_80b2_fb73cd2dceed &&
> ip4 && ip4.src == 0.0.0.0/0 && icmp4) allow-related
> 
> So, unless I'm missing something, OVN is doing what it's supposed to do
> based on the current configuration.  To avoid adding a new ACL, one
> option would be to add the 172.16.0.10 and 172.16.0.11 IPs to the
> corresponding logical_switch_ports "addresses" column, i.e., something like:

The problem with adding those addresses to the LSP is that they are outside of 
OVN/neutron control - this is a kubernetes cluster deployed with magnum, where 
the CNI plugin (calico) doesn't use any tunneling between k8s nodes; each pod 
has an address in the 172.16.0.0/16 subnet and uses addresses from 
172.16.0.0/16 to communicate with other pods. To accommodate that, the ports 
have 172.16.0.0/16 added to their allowed addresses.

I'm assuming that, if OVN is doing what it's supposed to be doing based on the 
configuration[1], then there is a mismatch between neutron and OVN behaviour 
with regard to SGs with allowed address pairs? I guess someone from the neutron 
team would have to comment on whether it's

[1] I'm not entirely convinced that's the case, given that ICMP traffic is 
being forwarded - I see how it's doing what the programmed flows are telling it 
to do, but that doesn't seem to be the expected result.

Best Regards,
  Chris

> 
> Thanks,
> Dumitru
> 
> 
> On 12/14/20 4:43 PM, Krzysztof Klimonda wrote:
> > Hi Numan,
> > 
> > https://signal.klimonda.com/ovnnb.db-broken.txt - this is the "initial" 
> > state where TCP is not being established (but ping works)
> > https://signal.klimonda.com/ovnnb.db-working.txt - this is after I create a 
> > separate IP-based rule to allow TCP traffic
> > 
> > In both examples, security group in question is 
> > ed081ef3-754a-492f-80b2-fb73cd2dceed which is mapped to 
> > pg_ed081ef3_754a_492f_80b2_fb73cd2dceed port group.
> > 
> > In the second ovnnb.db (ovnnb.db-working.txt), there is an extra ACL 
> > fb464efc-f63b-494b-b59b-6c2860dcecba added from CLI via:
> > 
> > `openstack security group rule create --ingress --protocol tcp --remote-ip 
> > 172.16.0.0/24 default`
> > 
> > --
> >   Krzysztof Klimonda
> >   kklimo...@syntaxhighlighted.com
> >
> > On Mon, Dec 14, 2020, at 14:14, Numan Siddique wrote:
> >> On Mon, Dec 14, 2020 at 4:01 PM Krzysztof Klimonda
> >>  wrote:
> >>> Hi,
> >>>
> >>> After upgrading to OVN 20.12.0 snapshot d8bc0377c I've noticed a problem 
> >>> in communication between VMs that use allowed address pairs and remote 
> >>> group id in security groups. I believe it has worked properly with OVN 
> >>> 20.06.2 release (although I have no way of verifying it right now).
> >>>
> >> Thanks for reporting the issue.
> >>
> >> Is it possible for you to share the OVN NB DB somewhere ?
> >>
> >> It would be easier to reproduce the issue with the DB.
> >>
> >> Thanks
> >> Numan
> >>
> >>> Given the following scenario:
> >>>
> >>> - 2 VMs with IP addresses: vm-a with IP addresses 10.0.0.10 and 
> >>> 172.16.0.10 and vm-b with IP addresses 10.0.0.11 and 172.16.0.11 where 
&

Re: [ovs-discuss] [ovn] Broken ovs localport flow for ovnmeta namespaces created by neutron

2020-12-15 Thread Krzysztof Klimonda
Hi,

Just as a quick update - I've updated our ovn version to 20.12.0 snapshot 
(d8bc0377c) and so far the problem hasn't yet reoccurred after over 24 hours of 
tempest testing.

Best Regards,
-Chris


On Tue, Dec 15, 2020, at 11:13, Daniel Alvarez Sanchez wrote:
> Hey Krzysztof,
> 
> On Fri, Nov 20, 2020 at 1:17 PM Krzysztof Klimonda 
>  wrote:
>> Hi,
>> 
>> Doing some tempest runs on our pre-prod environment (stable/ussuri with ovn 
>> 20.06.2 release) I've noticed that some network connectivity tests were 
>> failing randomly. I've reproduced that by continuously rescuing and 
>> unrescuing an instance - network connectivity from and to VM works in general 
>> (dhcp is fine, access from outside is fine), however VM has no access to its 
>> metadata server (via 169.254.169.254 ip address). Tracing packet from VM to 
>> metadata via:
>> 
>> 8<8<8<
>> ovs-appctl ofproto/trace br-int 
>> in_port=tapa489d406-91,dl_src=fa:16:3e:2c:b0:fd,dl_dst=fa:16:3e:8b:b5:39
>> 8<8<8<
>> 
>> ends with
>> 
>> 8<8<8<
>> 65. reg15=0x1,metadata=0x97e, priority 100, cookie 0x15ec4875
>> output:1187
>>  >> Nonexistent output port
>> 8<8<8<
>> 
>> And I can verify that there is no flow for the actual ovnmeta tap interface 
>> (tap67731b0a-c0):
>> 
>> 8<8<8<
>> # docker exec -it openvswitch_vswitchd ovs-ofctl dump-flows br-int |grep -E 
>> output:'("tap67731b0a-c0"|1187)'
>>  cookie=0x15ec4875, duration=1868.378s, table=65, n_packets=524, 
>> n_bytes=40856, priority=100,reg15=0x1,metadata=0x97e actions=output:1187
>> #
>> 8<8<8<
>> 
>> From ovs-vswitchd.log it seems the interface tap67731b0a-c0 was added with 
>> index 1187, then deleted, and re-added with index 1189 - that's probably due 
>> to the fact that that is the only VM in that network and I'm constantly hard 
>> rebooting it via rescue/unrescue:
>> 
>> 8<8<8<
>> 2020-11-20T11:41:18.347Z|08043|bridge|INFO|bridge br-int: added interface 
>> tap67731b0a-c0 on port 1187
>> 2020-11-20T11:41:30.813Z|08044|bridge|INFO|bridge br-int: deleted interface 
>> tapa489d406-91 on port 1186
>> 2020-11-20T11:41:30.816Z|08045|bridge|WARN|could not open network device 
>> tapa489d406-91 (No such device)
>> 2020-11-20T11:41:31.040Z|08046|bridge|INFO|bridge br-int: deleted interface 
>> tap67731b0a-c0 on port 1187
>> 2020-11-20T11:41:31.044Z|08047|bridge|WARN|could not open network device 
>> tapa489d406-91 (No such device)
>> 2020-11-20T11:41:31.050Z|08048|bridge|WARN|could not open network device 
>> tapa489d406-91 (No such device)
>> 2020-11-20T11:41:31.235Z|08049|connmgr|INFO|br-int<->unix#31: 2069 flow_mods 
>> in the last 43 s (858 adds, 814 deletes, 397 modifications)
>> 2020-11-20T11:41:33.057Z|08050|bridge|INFO|bridge br-int: added interface 
>> tapa489d406-91 on port 1188
>> 2020-11-20T11:41:33.582Z|08051|bridge|INFO|bridge br-int: added interface 
>> tap67731b0a-c0 on port 1189
>> 2020-11-20T11:42:31.235Z|08052|connmgr|INFO|br-int<->unix#31: 168 flow_mods 
>> in the 2 s starting 59 s ago (114 adds, 10 deletes, 44 modifications) 
>> 8<8<8<
>> 
>> Once I restart ovn-controller it recalculates local ovs flows and the 
>> problem is fixed so I'm assuming it's a local problem and not related to NB 
>> and SB databases.
>> 
> 
> I have seen exactly the same which with 20.09, for the same port input and 
> output ofports do not match:
> 
> bash-4.4# ovs-ofctl dump-flows br-int table=0 | grep 745
>  cookie=0x38937d8e, duration=40387.372s, table=0, n_packets=1863, 
> n_bytes=111678, idle_age=1, priority=100,in_port=745 
> actions=load:0x4b->NXM_NX_REG13[],load:0x6a->NXM_NX_REG11[],load:0x69->NXM_NX_REG12[],load:0x18d->OXM_OF_METADATA[],load:0x1->NXM_NX_REG14[],resubmit(,8)
> 
> 
> bash-4.4# ovs-ofctl dump-flows br-int table=65 | grep 8937d8e
>  cookie=0x38937d8e, duration=40593.699s, table=65, n_packets=1848, 
> n_bytes=98960, idle_age=2599, priority=100,reg15=0x1,metadata=0x18d 
> actions=output:737
> 
> 
> In table=0, the ofport is fine (745) but in the output stage it is using a 
> different one (737).
> 
> By checking the OVS database transaction history, that port, at some point, 
> had the id 737:
> 
> record 6516: 2020-12-14 22:22:54.184
> 
>   table Interface row "tap71a5dfc1-10" (073801e2):
> ofport=737
>   table Open_vSwitch row 1d9566c8 (1d9566c8):
> cur_cfg=2023
> 
> So it looks like ovn-controller is not updating the ofport in the physical 
> flows for the output stage.
> 
> We'll try to figure out if this happens also in master.
> 
> Thanks,
> daniel
>  
>> -- 
>>   Krzysztof Klimonda
>>   kklimo...@syntaxhighlighted.com
>> ___
>> discuss mailing list
>> disc...@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>> 
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] [ovn] TCP/UDP traffic dropped between IP addresses defined in allowed address pairs when remote security group is used

2020-12-14 Thread Krzysztof Klimonda
Hi Numan,

https://signal.klimonda.com/ovnnb.db-broken.txt - this is the "initial" state 
where TCP is not being established (but ping works)
https://signal.klimonda.com/ovnnb.db-working.txt - this is after I create a 
separate IP-based rule to allow TCP traffic

In both examples, security group in question is 
ed081ef3-754a-492f-80b2-fb73cd2dceed which is mapped to 
pg_ed081ef3_754a_492f_80b2_fb73cd2dceed port group.

In the second ovnnb.db (ovnnb.db-working.txt), there is an extra ACL 
fb464efc-f63b-494b-b59b-6c2860dcecba added from CLI via:

`openstack security group rule create --ingress --protocol tcp --remote-ip 
172.16.0.0/24 default`
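
(A quick way to confirm the extra rule landed in the northbound DB - illustrative, 
the port group name is the one from above:)

8<8<
ovn-nbctl acl-list pg_ed081ef3_754a_492f_80b2_fb73cd2dceed | grep 172.16.0.0
8<8<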

-- 
  Krzysztof Klimonda
  kklimo...@syntaxhighlighted.com

On Mon, Dec 14, 2020, at 14:14, Numan Siddique wrote:
> On Mon, Dec 14, 2020 at 4:01 PM Krzysztof Klimonda
>  wrote:
> >
> > Hi,
> >
> > After upgrading to OVN 20.12.0 snapshot d8bc0377c I've noticed a problem in 
> > communication between VMs that use allowed address pairs and remote group 
> > id in security groups. I believe it has worked properly with OVN 20.06.2 
> > release (although I have no way of verifying it right now).
> >
> 
> Thanks for reporting the issue.
> 
> Is it possible for you to share the OVN NB DB somewhere ?
> 
> It would be easier to reproduce the issue with the DB.
> 
> Thanks
> Numan
> 
> > Given the following scenario:
> >
> > - 2 VMs with IP addresses: vm-a with IP addresses 10.0.0.10 and 172.16.0.10 
> > and vm-b with IP addresses 10.0.0.11 and 172.16.0.11 where 10.0.0.0/8 
> > addresses are set on ports, and 172.16.0.0/16 addresses are set in 
> > allowed-address for on ports
> > - There is single security group attached to both ports allowing for 
> > ingress tcp traffic coming from the same security group (remote-group)
> > - There is a TCP service listening on 10.0.0.10 on port 8000
> >
> > When I try accessing service from vm-b using 10.0.0.10 address, ovn 
> > forwards traffic properly. However, when I try accessing same service via 
> > 172.16.0.10 traffic is dropped.
> >
> > When I trace packets between VMs using ovn-trace, for first scenario the 
> > last step is:
> >
> > 8<8<
> > ct_next(ct_state=est|trk /* default (use --ct to customize) */)
> > ---
> >  4. ls_out_acl_hint (ovn-northd.c:5292): !ct.new && ct.est && !ct.rpl && 
> > ct_label.blocked == 0, priority 4, uuid ab5a233e
> > reg0[8] = 1;
> > reg0[10] = 1;
> > next;
> >  5. ls_out_acl (ovn-northd.c:5498): reg0[8] == 1 && (outport == 
> > @pg_ed081ef3_754a_492f_80b2_fb73cd2dceed && ip4 && ip4.src == 
> > $pg_ed081ef3_754a_492f_80b2_fb73cd2dceed_ip4 && tcp), priority 2002, uuid 
> > d92706d4
> > next;
> >  9. ls_out_port_sec_ip (ovn-northd.c:4525): outport == "864929" && eth.dst 
> > == fa:16:3e:bc:20:10 && ip4.dst == {255.255.255.255, 224.0.0.0/4, 
> > 10.0.0.10, 172.16.0.0/16}, priority 90, uuid ff3390b1
> > next;
> > 10. ls_out_port_sec_l2 (ovn-northd.c:4929): outport == "864929" && eth.dst 
> > == {fa:16:3e:bc:20:10}, priority 50, uuid af91c05c
> > output;
> > /* output to "864929", type "" */
> > 8<8<
> >
> > However, when I use 172.16.0.0/24 addresses, the last step changes to:
> >
> > 8<8<
> > ct_next(ct_state=est|trk /* default (use --ct to customize) */)
> > ---
> >  4. ls_out_acl_hint (ovn-northd.c:5292): !ct.new && ct.est && !ct.rpl && 
> > ct_label.blocked == 0, priority 4, uuid ab5a233e
> > reg0[8] = 1;
> > reg0[10] = 1;
> > next;
> >  5. ls_out_acl (ovn-northd.c:5553): reg0[10] == 1 && (outport == 
> > @neutron_pg_drop && ip), priority 2001, uuid e36c0840
> > ct_commit { ct_label.blocked = 1; };
> > 8<8<
> >
> > Further notes:
> >
> > - ICMP traffic between 172.16.0.0/24 addresses is forwarded correctly, with 
> > last step of ovn-trace being:
> >
> > 8<8<
> > ct_next(ct_state=est|trk /* default (use --ct to customize) */)
> > ---
> >  4. ls_out_acl_hint (ovn-northd.c:5292): !ct.new && ct.est && !ct.rpl && 
> > ct_label.blocked == 0, priority 4, uuid ab5a233e
> > reg0[8] = 1;
> > reg0[10] = 1;
> > next;
>

[ovs-discuss] [ovn] TCP/UDP traffic dropped between IP addresses defined in allowed address pairs when remote security group is used

2020-12-14 Thread Krzysztof Klimonda
Hi,

After upgrading to OVN 20.12.0 snapshot d8bc0377c I've noticed a problem in 
communication between VMs that use allowed address pairs and remote group id in 
security groups. I believe it has worked properly with OVN 20.06.2 release 
(although I have no way of verifying it right now).

Given the following scenario:

- 2 VMs: vm-a with IP addresses 10.0.0.10 and 172.16.0.10, and vm-b with IP 
addresses 10.0.0.11 and 172.16.0.11, where the 10.0.0.0/8 addresses are set on the 
ports and the 172.16.0.0/16 addresses are set in allowed-address pairs on the 
ports
- There is a single security group attached to both ports, allowing ingress 
tcp traffic coming from the same security group (remote-group)
- There is a TCP service listening on 10.0.0.10 on port 8000

When I try accessing the service from vm-b using the 10.0.0.10 address, ovn forwards 
the traffic properly. However, when I try accessing the same service via 172.16.0.10, 
the traffic is dropped.
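
(Something along these lines reproduces it from vm-b - a sketch, assuming OpenBSD 
netcat and that the listener accepts plain TCP connects:)

8<8<
# via the fixed IP - connection is established
nc -z -w 5 10.0.0.10 8000
# via the allowed-address-pair IP - times out
nc -z -w 5 172.16.0.10 8000
8<8<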

When I trace packets between VMs using ovn-trace, the last step for the first 
scenario is:

8<8<
ct_next(ct_state=est|trk /* default (use --ct to customize) */)
---
 4. ls_out_acl_hint (ovn-northd.c:5292): !ct.new && ct.est && !ct.rpl && 
ct_label.blocked == 0, priority 4, uuid ab5a233e
reg0[8] = 1;
reg0[10] = 1;
next;
 5. ls_out_acl (ovn-northd.c:5498): reg0[8] == 1 && (outport == 
@pg_ed081ef3_754a_492f_80b2_fb73cd2dceed && ip4 && ip4.src == 
$pg_ed081ef3_754a_492f_80b2_fb73cd2dceed_ip4 && tcp), priority 2002, uuid 
d92706d4
next;
 9. ls_out_port_sec_ip (ovn-northd.c:4525): outport == "864929" && eth.dst == 
fa:16:3e:bc:20:10 && ip4.dst == {255.255.255.255, 224.0.0.0/4, 10.0.0.10, 
172.16.0.0/16}, priority 90, uuid ff3390b1
next;
10. ls_out_port_sec_l2 (ovn-northd.c:4929): outport == "864929" && eth.dst == 
{fa:16:3e:bc:20:10}, priority 50, uuid af91c05c
output;
/* output to "864929", type "" */
8<8<

However, when I use 172.16.0.0/24 addresses, the last step changes to:

8<8<
ct_next(ct_state=est|trk /* default (use --ct to customize) */)
---
 4. ls_out_acl_hint (ovn-northd.c:5292): !ct.new && ct.est && !ct.rpl && 
ct_label.blocked == 0, priority 4, uuid ab5a233e
reg0[8] = 1;
reg0[10] = 1;
next;
 5. ls_out_acl (ovn-northd.c:5553): reg0[10] == 1 && (outport == 
@neutron_pg_drop && ip), priority 2001, uuid e36c0840
ct_commit { ct_label.blocked = 1; };
8<8<

Further notes:

- ICMP traffic between the 172.16.0.0/24 addresses is forwarded correctly, with 
the last step of ovn-trace being:

8<8<
ct_next(ct_state=est|trk /* default (use --ct to customize) */)
---
 4. ls_out_acl_hint (ovn-northd.c:5292): !ct.new && ct.est && !ct.rpl && 
ct_label.blocked == 0, priority 4, uuid ab5a233e
reg0[8] = 1;
reg0[10] = 1;
next;
 5. ls_out_acl (ovn-northd.c:5498): reg0[8] == 1 && (outport == 
@pg_ed081ef3_754a_492f_80b2_fb73cd2dceed && ip4 && ip4.src == 0.0.0.0/0 && 
icmp4), priority 2002, uuid cd1705d8
next;
 9. ls_out_port_sec_ip (ovn-northd.c:4525): outport == "864929" && eth.dst == 
fa:16:3e:bc:20:10 && ip4.dst == {255.255.255.255, 224.0.0.0/4, 10.0.0.10, 
172.16.0.0/16}, priority 90, uuid ff3390b1
next;
10. ls_out_port_sec_l2 (ovn-northd.c:4929): outport == "864929" && eth.dst == 
{fa:16:3e:bc:20:10}, priority 50, uuid af91c05c
output;
/* output to "864929", type "" */
8<8<

- If I replace the security group rule, changing the remote group to a remote IP, 
traffic is forwarded correctly and the last step in ovn-trace is:

8<8<
ct_next(ct_state=est|trk /* default (use --ct to customize) */)
---
 4. ls_out_acl_hint (ovn-northd.c:5292): !ct.new && ct.est && !ct.rpl && 
ct_label.blocked == 0, priority 4, uuid ab5a233e
reg0[8] = 1;
reg0[10] = 1;
next;
 5. ls_out_acl (ovn-northd.c:5498): reg0[8] == 1 && (outport == 
@pg_ed081ef3_754a_492f_80b2_fb73cd2dceed && ip4 && ip4.src == 172.16.0.0/24 && 
tcp), priority 2002, uuid a0871ca2
next;
 9. ls_out_port_sec_ip (ovn-northd.c:4525): outport == "864929" && eth.dst == 
fa:16:3e:bc:20:10 && ip4.dst == {255.255.255.255, 224.0.0.0/4, 10.0.0.10, 
172.16.0.0/16}, priority 90, uuid ff3390b1
next;
10. ls_out_port_sec_l2 (ovn-northd.c:4929): outport == "864929" && eth.dst == 
{fa:16:3e:bc:20:10}, priority 50, uuid af91c05c
output;
/* output to "864929", type "" */
8<8<
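
(Traces like the ones above can be reproduced with an invocation along these lines - 
a sketch, where the logical switch name, vm-b's inport and vm-b's MAC are 
placeholders:)

8<8<
ovn-trace <tenant-ls> 'inport == "<vm-b-port>" && eth.src == <vm-b-mac> &&
    eth.dst == fa:16:3e:bc:20:10 && ip4.src == 172.16.0.11 &&
    ip4.dst == 172.16.0.10 && ip.ttl == 64 && tcp && tcp.dst == 8000'
8<8<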

-- 
  Krzysztof Klimonda
  kklimo...@syntaxhighlighted.com
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


[ovs-discuss] [ovn] Broken ovs localport flow for ovnmeta namespaces created by neutron

2020-11-20 Thread Krzysztof Klimonda
Hi,

Doing some tempest runs on our pre-prod environment (stable/ussuri with the ovn 
20.06.2 release) I've noticed that some network connectivity tests were failing 
randomly. I've reproduced that by continuously rescuing and unrescuing an instance 
- network connectivity from and to the VM works in general (dhcp is fine, access 
from outside is fine), however the VM has no access to its metadata server (via the 
169.254.169.254 ip address). Tracing a packet from the VM to metadata via:

8<8<8<
ovs-appctl ofproto/trace br-int 
in_port=tapa489d406-91,dl_src=fa:16:3e:2c:b0:fd,dl_dst=fa:16:3e:8b:b5:39
8<8<8<

ends with

8<8<8<
65. reg15=0x1,metadata=0x97e, priority 100, cookie 0x15ec4875
output:1187
 >> Nonexistent output port
8<8<8<

And I can verify that there is no flow for the actual ovnmeta tap interface 
(tap67731b0a-c0):

8<8<8<
# docker exec -it openvswitch_vswitchd ovs-ofctl dump-flows br-int |grep -E 
output:'("tap67731b0a-c0"|1187)'
 cookie=0x15ec4875, duration=1868.378s, table=65, n_packets=524, n_bytes=40856, 
priority=100,reg15=0x1,metadata=0x97e actions=output:1187
#
8<8<8<

From ovs-vswitchd.log it seems the interface tap67731b0a-c0 was added with 
index 1187, then deleted, and re-added with index 1189 - that's probably because 
that is the only VM in that network and I'm constantly hard rebooting it via 
rescue/unrescue:

8<8<8<
2020-11-20T11:41:18.347Z|08043|bridge|INFO|bridge br-int: added interface 
tap67731b0a-c0 on port 1187
2020-11-20T11:41:30.813Z|08044|bridge|INFO|bridge br-int: deleted interface 
tapa489d406-91 on port 1186
2020-11-20T11:41:30.816Z|08045|bridge|WARN|could not open network device 
tapa489d406-91 (No such device)
2020-11-20T11:41:31.040Z|08046|bridge|INFO|bridge br-int: deleted interface 
tap67731b0a-c0 on port 1187
2020-11-20T11:41:31.044Z|08047|bridge|WARN|could not open network device 
tapa489d406-91 (No such device)
2020-11-20T11:41:31.050Z|08048|bridge|WARN|could not open network device 
tapa489d406-91 (No such device)
2020-11-20T11:41:31.235Z|08049|connmgr|INFO|br-int<->unix#31: 2069 flow_mods in 
the last 43 s (858 adds, 814 deletes, 397 modifications)
2020-11-20T11:41:33.057Z|08050|bridge|INFO|bridge br-int: added interface 
tapa489d406-91 on port 1188
2020-11-20T11:41:33.582Z|08051|bridge|INFO|bridge br-int: added interface 
tap67731b0a-c0 on port 1189
2020-11-20T11:42:31.235Z|08052|connmgr|INFO|br-int<->unix#31: 168 flow_mods in 
the 2 s starting 59 s ago (114 adds, 10 deletes, 44 modifications) 
8<8<8<
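
(A sketch of how the stale ofport can be confirmed without restarting anything - on 
kolla the commands would be prefixed with "docker exec openvswitch_vswitchd":)

8<8<8<
# ofport currently assigned to the metadata tap by ovs-vswitchd
ovs-vsctl --columns=name,ofport list Interface tap67731b0a-c0

# ofport that ovn-controller still has programmed in the output stage
ovs-ofctl dump-flows br-int table=65 | grep metadata=0x97e
8<8<8<

If the two numbers differ (1189 vs 1187 here), the output flow is stale.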

Once I restart ovn-controller, it recalculates the local ovs flows and the problem 
is fixed, so I'm assuming it's a local problem and not related to the NB and SB 
databases.

-- 
  Krzysztof Klimonda
  kklimo...@syntaxhighlighted.com
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] [OVN] Too many resubmits for packets coming from "external" network

2020-09-29 Thread Krzysztof Klimonda
On Tue, Sep 29, 2020, at 12:40, Dumitru Ceara wrote:
> On 9/29/20 12:14 PM, Daniel Alvarez Sanchez wrote:
> > 
> > 
> > On Tue, Sep 29, 2020 at 11:14 AM Krzysztof Klimonda
> >  > <mailto:kklimo...@syntaxhighlighted.com>> wrote:
> > 
> > On Tue, Sep 29, 2020, at 10:40, Dumitru Ceara wrote:
> > > On 9/29/20 12:42 AM, Krzysztof Klimonda wrote:
> > > > Hi Dumitru,
> > > >
> > > > This cluster is IPv4-only for now - there are no IPv6 networks
> > defined at all - overlay or underlay.
> > > >
> > > > However, once I increase a number of routers to ~250, a similar
> > behavior can be observed when I send ARP packets for non-existing
> > IPv4 addresses. The following warnings will flood ovs-vswitchd.log
> > for every address not known to OVN when I run `fping -g
> > 192.168.0.0/16` <http://192.168.0.0/16>:
> > > >
> > > > ---8<---8<---8<---
> > > >
> > 2020-09-28T22:26:40.967Z|21996|ofproto_dpif_xlate(handler6)|WARN|over 
> > 4096
> > resubmit actions on bridge br-int while processing
> > 
> > arp,in_port=1,vlan_tci=0x,dl_src=fa:16:3e:75:38:be,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=192.168.0.1,arp_tpa=192.168.0.35,arp_op=1,arp_sha=fa:16:3e:75:38:be,arp_tha=00:00:00:00:00:00
> > > > ---8<---8<---8<---
> > > >
> > > > This is even a larger concern for me, as some of our clusters
> > would be exposed to the internet where we can't easily prevent
> > scanning of an entire IP range.
> > > >
> > > > Perhaps this is something that should be handled differently for
> > traffic coming from external network? Is there any reason why OVN is
> > not dropping ARP requests and IPv6 ND for IP addresses it knows
> > nothing about? Or maybe OVN should drop most of BUM traffic on
> > external network in general? I think all this network is used for is
> > SNAT and/or SNAT+DNAT for overlay networks.
> > > >
> > >
> > > Ok, so I guess we need a combination of the existing broadcast domain
> > > limiting options:
> > >
> > > 1. send ARP/NS packets only to router ports that own the target IP
> > address.
> > > 2. flood IPv6 ND RS packets only to router ports with IPv6 addresses
> > > configured and ipv6_ra_configs.address_mode set.
> > > 3. according to the logical switch multicast configuration either
> > flood
> > > unkown IP multicast or forward it only to hosts that registered
> > for the
> > > IP multicast group.
> > > 4. drop all other BUM traffic.
> > >
> > > From the above, 1 and 3 are already implemented. 2 is what I suggested
> > > earlier. 4 would probably turn out to be configuration option that
> > needs
> > > to be explicitly enabled on the logical switch connected to the
> > external
> > > network.
> > >
> > > Would this work for you?
> > 
> > I believe it would work for me, although it may be a good idea to
> > consult with neutron developers and see if they have any input on that.
> > 
> > 
> > I think that's a good plan. Implementing 4) via a configuration option
> > sounds smart. From an OpenStack point of view, I think that as all the
> > ports are known, we can just have it on by default.
> > We need to make sure it works for 'edge' cases like virtual ports, load
> > balancers and subports (ports with a parent port and a tag) but the idea
> > sounds great to me.
> > 
> > Thanks folks for the discussion! 
> 
> Thinking more about it it's probably not OK to drop all other BUM
> traffic. Instead we should just flood it on all logical ports of a
> logical switch _except_ router ports.
> 
> Otherwise we'll be breaking E-W traffic between VIFs connected to the
> same logical switch. E.g., VM1 and VM2 connected to the same LS and VM1
> sending ARP request for VM2's IP.

Does it also matter for the LS that is used by openstack for external networks? 
We don't usually connect VMs directly to that network, instead using FIPs for some 
VMs and SNATing traffic from other VMs on the router. Or is it unrelated to how a 
VM is connected to the network, and it would break, for example, FIP<->FIP 
traffic?

> 
> > 
> > 
> > >
> > > Thanks,
> > > Dumitru
> > >
> > > > -- Krzysztof Klimonda kklim

Re: [ovs-discuss] [OVN] Too many resubmits for packets coming from "external" network

2020-09-29 Thread Krzysztof Klimonda
On Tue, Sep 29, 2020, at 10:40, Dumitru Ceara wrote:
> On 9/29/20 12:42 AM, Krzysztof Klimonda wrote:
> > Hi Dumitru,
> > 
> > This cluster is IPv4-only for now - there are no IPv6 networks defined at 
> > all - overlay or underlay.
> > 
> > However, once I increase a number of routers to ~250, a similar behavior 
> > can be observed when I send ARP packets for non-existing IPv4 addresses. 
> > The following warnings will flood ovs-vswitchd.log for every address not 
> > known to OVN when I run `fping -g 192.168.0.0/16`:
> > 
> > ---8<---8<---8<---
> > 2020-09-28T22:26:40.967Z|21996|ofproto_dpif_xlate(handler6)|WARN|over 4096 
> > resubmit actions on bridge br-int while processing 
> > arp,in_port=1,vlan_tci=0x,dl_src=fa:16:3e:75:38:be,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=192.168.0.1,arp_tpa=192.168.0.35,arp_op=1,arp_sha=fa:16:3e:75:38:be,arp_tha=00:00:00:00:00:00
> > ---8<---8<---8<---
> > 
> > This is even a larger concern for me, as some of our clusters would be 
> > exposed to the internet where we can't easily prevent scanning of an entire 
> > IP range.
> > 
> > Perhaps this is something that should be handled differently for traffic 
> > coming from external network? Is there any reason why OVN is not dropping 
> > ARP requests and IPv6 ND for IP addresses it knows nothing about? Or maybe 
> > OVN should drop most of BUM traffic on external network in general? I think 
> > all this network is used for is SNAT and/or SNAT+DNAT for overlay networks.
> > 
> 
> Ok, so I guess we need a combination of the existing broadcast domain
> limiting options:
> 
> 1. send ARP/NS packets only to router ports that own the target IP address.
> 2. flood IPv6 ND RS packets only to router ports with IPv6 addresses
> configured and ipv6_ra_configs.address_mode set.
> 3. according to the logical switch multicast configuration either flood
> unkown IP multicast or forward it only to hosts that registered for the
> IP multicast group.
> 4. drop all other BUM traffic.
> 
> From the above, 1 and 3 are already implemented. 2 is what I suggested
> earlier. 4 would probably turn out to be configuration option that needs
> to be explicitly enabled on the logical switch connected to the external
> network.
> 
> Would this work for you?

I believe it would work for me, although it may be a good idea to consult with 
neutron developers and see if they have any input on that.
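
For reference, if I understand correctly, point 3 from the list maps to the 
per-logical-switch multicast knobs that already exist today - a sketch, the switch 
name is a placeholder:

---8<---8<---8<---
ovn-nbctl set Logical_Switch <external-ls> \
    other_config:mcast_snoop=true other_config:mcast_flood_unregistered=false
---8<---8<---8<---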

> 
> Thanks,
> Dumitru
> 
> > -- Krzysztof Klimonda kklimo...@syntaxhighlighted.com On Mon, Sep 28,
> > 2020, at 21:14, Dumitru Ceara wrote:
> >> On 9/28/20 5:33 PM, Krzysztof Klimonda wrote:
> >>> Hi,
> >>>
> >> Hi Krzysztof,
> >>
> >>> We're still doing some scale tests of OpenStack ussuri with ml2/ovn 
> >>> driver. We've deployed 140 virtualized compute nodes, and started 
> >>> creating routers that share single external network between them. 
> >>> Additionally, each router is connected to a private network.
> >>> Previously[1] we hit a problem of too many logical flows being generated 
> >>> per router connected to the same "external" network - this put too much 
> >>> stress on ovn-controller and ovs-vswitchd on compute nodes, and we've 
> >>> applied a patch[2] to limit a number of logical flows created per router.
> >>> After we dealt with that we've done more testing and created 200 routers 
> >>> connected to single external network. After that we've noticed the 
> >>> following logs in ovs-vswitchd.log:
> >>>
> >>> ---8<---8<---8<---
> >>> 2020-09-28T11:10:18.938Z|18401|ofproto_dpif_xlate(handler9)|WARN|over 
> >>> 4096 resubmit actions on bridge br-int while processing 
> >>> icmp6,in_port=1,vlan_tci=0x,dl_src=fa:16:3e:9b:77:c3,dl_dst=33:33:00:00:00:02,ipv6_src=fe80::f816:3eff:fe9b:77c3,ipv6_dst=ff02::2,ipv6_label=0x2564e,nw_tos=0,nw_ecn=0,nw_ttl=255,icmp_type=133,icmp_code=0
> >>> ---8<---8<---8<---
> >>>
> >>> That starts happening after I create ~178 routers connected to the same 
> >>> external network.
> >>>
> >>> IPv6 RS ICMP packets are coming from the external network - that's due to 
> >>> the fact that all virtual compute nodes have IPv6 address on their 
> >>> interface used for the external network and are trying to discover a 
> >>> gateway. That's by accident, and we can remove IPv6 address from that 
> >>> interface, however I'm worried that it would just h

Re: [ovs-discuss] [OVN] Too many resubmits for packets coming from "external" network

2020-09-28 Thread Krzysztof Klimonda
Hi Dumitru,

This cluster is IPv4-only for now - there are no IPv6 networks defined at all - 
overlay or underlay.

However, once I increase the number of routers to ~250, a similar behavior can be 
observed when I send ARP packets for non-existing IPv4 addresses. The following 
warnings will flood ovs-vswitchd.log for every address not known to OVN when I 
run `fping -g 192.168.0.0/16`:

---8<---8<---8<---
2020-09-28T22:26:40.967Z|21996|ofproto_dpif_xlate(handler6)|WARN|over 4096 
resubmit actions on bridge br-int while processing 
arp,in_port=1,vlan_tci=0x,dl_src=fa:16:3e:75:38:be,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=192.168.0.1,arp_tpa=192.168.0.35,arp_op=1,arp_sha=fa:16:3e:75:38:be,arp_tha=00:00:00:00:00:00
---8<---8<---8<---

This is an even larger concern for me, as some of our clusters would be exposed 
to the internet, where we can't easily prevent scanning of an entire IP range.

Perhaps this is something that should be handled differently for traffic coming 
from the external network? Is there any reason why OVN is not dropping ARP requests 
and IPv6 ND for IP addresses it knows nothing about? Or maybe OVN should drop most 
of the BUM traffic on the external network in general? I think all this network is 
used for is SNAT and/or SNAT+DNAT for overlay networks.
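
(A single flooded ARP can also be replayed offline to see the resubmit explosion - 
a sketch reusing the flow from the log above; in_port=1 is assumed to be the patch 
port to the external bridge on this chassis:)

---8<---8<---8<---
ovs-appctl ofproto/trace br-int \
  'in_port=1,arp,dl_src=fa:16:3e:75:38:be,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=192.168.0.1,arp_tpa=192.168.0.35,arp_op=1'
---8<---8<---8<---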

-- 
  Krzysztof Klimonda
  kklimo...@syntaxhighlighted.com

On Mon, Sep 28, 2020, at 21:14, Dumitru Ceara wrote:
> On 9/28/20 5:33 PM, Krzysztof Klimonda wrote:
> > Hi,
> > 
> 
> Hi Krzysztof,
> 
> > We're still doing some scale tests of OpenStack ussuri with ml2/ovn driver. 
> > We've deployed 140 virtualized compute nodes, and started creating routers 
> > that share single external network between them. Additionally, each router 
> > is connected to a private network.
> > Previously[1] we hit a problem of too many logical flows being generated 
> > per router connected to the same "external" network - this put too much 
> > stress on ovn-controller and ovs-vswitchd on compute nodes, and we've 
> > applied a patch[2] to limit a number of logical flows created per router.
> > After we dealt with that we've done more testing and created 200 routers 
> > connected to single external network. After that we've noticed the 
> > following logs in ovs-vswitchd.log:
> > 
> > ---8<---8<---8<---
> > 2020-09-28T11:10:18.938Z|18401|ofproto_dpif_xlate(handler9)|WARN|over 4096 
> > resubmit actions on bridge br-int while processing 
> > icmp6,in_port=1,vlan_tci=0x,dl_src=fa:16:3e:9b:77:c3,dl_dst=33:33:00:00:00:02,ipv6_src=fe80::f816:3eff:fe9b:77c3,ipv6_dst=ff02::2,ipv6_label=0x2564e,nw_tos=0,nw_ecn=0,nw_ttl=255,icmp_type=133,icmp_code=0
> > ---8<---8<---8<---
> > 
> > That starts happening after I create ~178 routers connected to the same 
> > external network.
> > 
> > IPv6 RS ICMP packets are coming from the external network - that's due to 
> > the fact that all virtual compute nodes have IPv6 address on their 
> > interface used for the external network and are trying to discover a 
> > gateway. That's by accident, and we can remove IPv6 address from that 
> > interface, however I'm worried that it would just hide some bigger issue 
> > with flows generated by OVN.
> > 
> 
> Is this an IPv4 cluster; are there IPv6 addresses configured on the
> logical router ports connected to the external network?
> 
> If there are IPv6 addresses, do the logical router ports connected to
> the external network have
> Logical_Router_Port.ipv6_ra_configs.address_mode set?
> 
> If not, we could try to enhance the broadcast domain limiting code in
> OVN [3] to also limit sending router solicitations only to router ports
> with address_mode configured.
> 
> Regards,
> Dumitru
> 
> [3]
> https://github.com/ovn-org/ovn/blob/20a20439219493f27eb222617f045ba54c95ebfc/northd/ovn-northd.c#L6424
> 
> > software stack:
> > 
> > ovn: 20.06.2
> > ovs: 2.13.1
> > neutron: 16.1.0
> > 
> > [1] 
> > http://lists.openstack.org/pipermail/openstack-discuss/2020-September/017370.html
> > [2] https://review.opendev.org/#/c/752678/
> > 
> 
>
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


[ovs-discuss] [OVN] Too many resubmits for packets coming from "external" network

2020-09-28 Thread Krzysztof Klimonda
Hi,

We're still doing some scale tests of OpenStack ussuri with the ml2/ovn driver. 
We've deployed 140 virtualized compute nodes, and started creating routers that 
share a single external network between them. Additionally, each router is 
connected to a private network.
Previously[1] we hit a problem of too many logical flows being generated per 
router connected to the same "external" network - this put too much stress on 
ovn-controller and ovs-vswitchd on the compute nodes, and we applied a patch[2] 
to limit the number of logical flows created per router.
After we dealt with that we did more testing and created 200 routers 
connected to a single external network. After that we noticed the following 
logs in ovs-vswitchd.log:

---8<---8<---8<---
2020-09-28T11:10:18.938Z|18401|ofproto_dpif_xlate(handler9)|WARN|over 4096 
resubmit actions on bridge br-int while processing 
icmp6,in_port=1,vlan_tci=0x,dl_src=fa:16:3e:9b:77:c3,dl_dst=33:33:00:00:00:02,ipv6_src=fe80::f816:3eff:fe9b:77c3,ipv6_dst=ff02::2,ipv6_label=0x2564e,nw_tos=0,nw_ecn=0,nw_ttl=255,icmp_type=133,icmp_code=0
---8<---8<---8<---

That starts happening after I create ~178 routers connected to the same 
external network.

IPv6 RS ICMP packets are coming from the external network - that's because all 
virtual compute nodes have an IPv6 address on the interface used for the external 
network and are trying to discover a gateway. That's by accident, and we can remove 
the IPv6 address from that interface; however, I'm worried that it would just hide 
a bigger issue with the flows generated by OVN.
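
(In case it's useful to others: silencing the hypervisors themselves is a one-liner, 
although as said above it only hides the symptom - a sketch, <ifname> being the NIC 
attached to the external bridge:)

---8<---8<---8<---
sysctl -w net.ipv6.conf.<ifname>.accept_ra=0
sysctl -w net.ipv6.conf.<ifname>.disable_ipv6=1
---8<---8<---8<---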

software stack:

ovn: 20.06.2
ovs: 2.13.1
neutron: 16.1.0

[1] 
http://lists.openstack.org/pipermail/openstack-discuss/2020-September/017370.html
[2] https://review.opendev.org/#/c/752678/

-- 
  Krzysztof Klimonda
  kklimo...@syntaxhighlighted.com
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


[ovs-discuss] [ovn] Duplicated ARP replies from all gateway chassis when sending ARP who-has for IP address of the gateway port

2020-09-11 Thread Krzysztof Klimonda
Hi,

I'm testing an openstack ussuri deployment with the ovn ml2 driver (versions at the 
end of the email) and seeing unexpected behavior when sending ARP requests for the 
IP address of the gateway port configured on the router from "outside" of the OVN 
network: when I attach a subnet to the router and send an ARP request for the 
router's gateway IP (by pinging it), I receive ARP replies from all chassis, not 
only the active one.
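
A sketch of how the duplicates can be observed and compared against the scheduled 
gateway chassis - the interface, router gateway port name and gateway IP below are 
placeholders:

8<8<8<
# from a host on the provider network - every chassis answers instead of one
arping -I <ext-ifname> -c 3 <router-gateway-ip>

# which chassis should be answering, according to the NB DB
ovn-nbctl lrp-get-gateway-chassis <router-gateway-port>
8<8<8<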

I've already reproduced it with numans on #openvswitch, so I'm sending an email 
here to have a reference later if needed.

I've tested it with the following versions:

ovn: 20.06.2
openvswitch: 2.13.0
neutron: 16.1.0

Best Regards,
  Chris
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss