[ovs-discuss] OVS Tuple Space Searching Algorithm

2019-06-11 Thread pei Jikui
Hi,

I am trying to understand how Tuple Space Search works in ovs-vswitchd when 
it looks up an OpenFlow rule for upcall packets.

Below is my understanding of the procedure. I am not sure whether it is 
right, since I don't fully understand how the TSS hash tables work together 
with the trie trees.


1) The data structures.




 *   For every OpenFlow table, there are several sub-tables; within a 
sub-table, all rules share the same mask.
 *   For staged search purposes, in each sub-table there are 4 hash tables, 
built on metadata, metadata+L2, metadata+L2+L3, and metadata+L2+L3+L4 
respectively.
 *   For prefix tracking purposes, in each sub-table there are 2 or 3? 
prefix trees, which are used to track the source-IP and dest-IP fields? 
(From the source code, it seems we have 3 trie trees? What other field is 
tracked by the trie trees besides the source IP and dest IP?)


2) The rule search procedure of each table.

The routine will first search the four hash tables and find the rule, then 
use the prefix trees for prefix tracking, which will potentially reduce the 
flow entries in the datapath?  Is this right or not? If not, what is the 
correct matching procedure?
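
A side note on the trie question: my reading of ovs-vswitchd.conf.db(5) is
that the prefix-tracked fields default to ip_dst and ip_src, and can be
changed (up to 3 fields) through the "prefixes" column of the Flow_Table
table. The command below is adapted from the man page example and is
untested here, so please correct me if the syntax or my understanding is
wrong:

# Create a Flow_Table record for OpenFlow table 0 of bridge br0 and choose
# which fields its prefix tries should track
ovs-vsctl -- --id=@ft create Flow_Table name=table0 prefixes=ip_dst,ip_src \
          -- set Bridge br0 flow_tables:0=@ft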


Thanks


Jikui

___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] Small 802.1q-prepended packets not getting through to VM

2019-06-11 Thread Steinar H. Gunderson
On Tue, Jun 11, 2019 at 03:59:53PM -0700, Gregory Rose wrote:
> root@ubuntu-1604-base:~# ovs-vsctl add-br br0
> root@ubuntu-1604-base:~# ovs-vsctl show
> 6be291a9-6bab-4fff-bda9-7f54335b4884
>     Bridge "br0"
>         Port "br0"
>             Interface "br0"
>                 type: internal
>     ovs_version: "2.11.90"
> 
> As you can see no ports are automatically created or added to the bridge. 

Wait, what? You clearly have an interface there.

Perhaps go through everything I say and replace “port” with “interface”. :-)

> Some configuration script somewhere added
> vlan1 to your bridge and the source of that vlan1 interface should be in
> /etc/network/interfaces.

Yes, sure. I added it there because I need to talk to VMs (and the external
network).

> Could you provide your /etc/network/interfaces file?  Also, the output of
> 'ip link show' and 'ip addr show'.

Sure. Apologies for the non-English comments (and note the ethtool command
that I needed down there):

root@kaze:~# cat /etc/network/interfaces
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

source /etc/network/interfaces.d/*

# The loopback network interface
auto lo ovsbr0 ovsbr1 vlan1 vlan16 vlan50 vlan100
iface lo inet loopback

allow-ovs ovsbr0
iface ovsbr0 inet manual
ovs_type OVSBridge
# No physical ports

allow-ovs ovsbr1
iface ovsbr1 inet manual
ovs_type OVSBridge
ovs_ports enp57s2 enp57s3

allow-ovsbr1 enp57s2
iface enp57s2 inet manual
ovs_bridge ovsbr1
ovs_type OVSPort

allow-ovsbr1 enp57s3
iface enp57s3 inet manual
ovs_bridge ovsbr1
ovs_type OVSPort
ovs_options tag=1

# Telenor fiber admin
allow-ovsbr1 vlan16
iface vlan16 inet manual

# LAN
allow-ovsbr1 vlan1
iface vlan1 inet static
address 10.0.0.1
netmask 255.255.255.0
ovs_bridge ovsbr1
ovs_type OVSIntPort
ovs_options tag=1

# Nett mot Telenor (network toward Telenor)
allow-ovsbr1 vlan50
iface vlan50 inet static
# Linknett (link network)
address 193.214.81.22
netmask 255.255.255.252
# Brukersynlig adresse (user-visible address)
post-up ip addr add 193.213.32.206/32 dev lo
post-up ip route add default via 193.214.81.21 src 193.213.32.206
ovs_bridge ovsbr1
ovs_type OVSIntPort
ovs_options tag=50

# OfficeExtend VLAN
allow-ovsbr1 vlan100
iface vlan100 inet static
address 10.0.1.1
netmask 255.255.255.0
ovs_bridge ovsbr1
ovs_type OVSIntPort
ovs_options tag=100
# Pakker med feil sjekksum sendes ut gjennom OfficeExtend
# (packets with bad checksums are sent out through OfficeExtend)
post-up ethtool -K vlan100 tx off

root@kaze:~# ip link show
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN mode 
DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp57s2:  mtu 1500 qdisc mq master 
ovs-system state UP mode DEFAULT group default qlen 1000
link/ether 00:14:5e:5b:97:57 brd ff:ff:ff:ff:ff:ff
alias connected to 8 peers
3: enp57s3:  mtu 1500 qdisc mq master 
ovs-system state UP mode DEFAULT group default qlen 1000
link/ether 00:14:5e:5b:97:59 brd ff:ff:ff:ff:ff:ff
4: ovs-system:  mtu 1500 qdisc noop state DOWN mode 
DEFAULT group default qlen 1000
link/ether 82:cf:ff:43:a3:32 brd ff:ff:ff:ff:ff:ff
5: ovsbr0:  mtu 1500 qdisc noqueue state 
UNKNOWN mode DEFAULT group default qlen 1000
link/ether c6:2f:5b:e2:f8:41 brd ff:ff:ff:ff:ff:ff
6: vlan50:  mtu 1500 qdisc cake state UNKNOWN 
mode DEFAULT group default qlen 1000
link/ether 96:28:65:47:8d:47 brd ff:ff:ff:ff:ff:ff
7: vlan100:  mtu 1500 qdisc noqueue state 
UNKNOWN mode DEFAULT group default qlen 1000
link/ether 82:cf:74:29:4d:92 brd ff:ff:ff:ff:ff:ff
8: ovsbr1:  mtu 1500 qdisc noqueue state 
UNKNOWN mode DEFAULT group default qlen 1000
link/ether 00:14:5e:5b:97:57 brd ff:ff:ff:ff:ff:ff
9: vlan1:  mtu 1500 qdisc noqueue state 
UNKNOWN mode DEFAULT group default qlen 1000
link/ether 3e:29:2a:06:95:d3 brd ff:ff:ff:ff:ff:ff
10: sit0@NONE:  mtu 1480 qdisc noop state DOWN mode DEFAULT group 
default qlen 1000
link/sit 0.0.0.0 brd 0.0.0.0
11: he-ipv6@NONE:  mtu 1480 qdisc noqueue state 
UNKNOWN mode DEFAULT group default qlen 1000
link/sit 193.213.32.206 peer 216.66.80.90
12: vnet0:  mtu 1500 qdisc pfifo_fast master 
ovs-system state UNKNOWN mode DEFAULT group default qlen 1000
link/ether fe:54:00:0a:63:16 brd ff:ff:ff:ff:ff:ff
13: vnet1:  mtu 1500 qdisc pfifo_fast master 
ovs-system state UNKNOWN mode DEFAULT group default qlen 1000
link/ether fe:54:00:be:38:e5 brd ff:ff:ff:ff:ff:ff
alias connected to liawlc (Gi0/0/1)
14: nat64:  mtu 1500 qdisc pfifo_fast 
state UP mode DEFAULT group default qlen 500
link/none 

root@kaze:~# ip addr show
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group 
default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
   valid_lft forever preferred_lft forever

Re: [ovs-discuss] Small 802.1q-prepended packets not getting through to VM

2019-06-11 Thread Gregory Rose


On 6/11/2019 3:30 PM, Steinar H. Gunderson wrote:

On Tue, Jun 11, 2019 at 03:00:00PM -0700, Gregory Rose wrote:

I see the source of the confusion here.  The vlan1 interface is added to the
OVS bridge and the port type
is internal.  Here's a better way to look at this:

   Net Device                 OVS switch
|----------|                |---------|
|          |                |         |
|   vlan1  |<-------------->|  ovsbr1 |
|          |       ^        |         |
|----------|       |        |---------|
                   |
                   |
                    <- Think of that as a virtual Ethernet cable

When you ssh to a destination via the vlan1 interface then the vlan1
interface is generating the packets.  If it has a tcp checksum offload
capability then it would use it but that will depend on the master device
it is controlled by.  This port is in no way owned by OVS.  OVS has simply
added it to the bridge using a virtual port which is by convention called
an 'internal' port.  But think of it as the cable connecting your virtual
device 'vlan1' to the OVS bridge 'ovsbr1'.

Does that help explain?

Only if you can tell me what the vlan1 device is. :-) I had assumed this was
a first-class concept within ovs; after all, when you create a bridge you get
one of these.


You don't automatically get any ports on a bridge when it is created.  For
example:


root@ubuntu-1604-base:~# ovs-ctl start
 * Starting ovsdb-server
 * system ID not configured, please use --system-id
 * Configuring Open vSwitch system IDs
 * Starting ovs-vswitchd
 * Enabling remote OVSDB managers
root@ubuntu-1604-base:~# ovs-vsctl show
6be291a9-6bab-4fff-bda9-7f54335b4884
    ovs_version: "2.11.90"

Now I'll create a bridge:
root@ubuntu-1604-base:~# ovs-vsctl add-br br0
root@ubuntu-1604-base:~# ovs-vsctl show
6be291a9-6bab-4fff-bda9-7f54335b4884
    Bridge "br0"
        Port "br0"
            Interface "br0"
                type: internal
    ovs_version: "2.11.90"

As you can see no ports are automatically created or added to the 
bridge.  Some configuration script somewhere added
vlan1 to your bridge and the source of that vlan1 interface should be in 
/etc/network/interfaces.


Could you provide your /etc/network/interfaces file?  Also, the output 
of 'ip link show' and 'ip addr show'.


Thanks,

- Greg





It's created with Debian's ifupdown integration (an “OVSIntPort”-type
interface), which seems to do:

 ovs_vsctl -- --may-exist add-port "${IF_OVS_BRIDGE}"\
 "${IFACE}" ${IF_OVS_OPTIONS} -- set Interface "${IFACE}"\
 type=internal ${OVS_EXTRA+-- $OVS_EXTRA}


No, the internal ports are the virtual interfaces between physical/virtual
devices (vlan1) and the OVS bridge (ovsbr1).

So what are the vlan1 devices? Who should I nag to get them to pad their
packets correctly? :-)

/* Steinar */


___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] Small 802.1q-prepended packets not getting through to VM

2019-06-11 Thread Steinar H. Gunderson
On Tue, Jun 11, 2019 at 03:00:00PM -0700, Gregory Rose wrote:
> I see the source of the confusion here.  The vlan1 interface is added to the
> OVS bridge and the port type
> is internal.  Here's a better way to look at this:
> 
>    Net Device                 OVS switch
> |----------|                |---------|
> |          |                |         |
> |   vlan1  |<-------------->|  ovsbr1 |
> |          |       ^        |         |
> |----------|       |        |---------|
>                    |
>                    |
>                     <- Think of that as a virtual Ethernet cable
> 
> When you ssh to a destination via the vlan1 interface then the vlan1
> interface is generating the packets.  If it has a tcp checksum offload
> capability then it would use it but that will depend on the master device
> it is controlled by.  This port is in no way owned by OVS.  OVS has simply
> added it to the bridge using a virtual port which is by convention called
> an 'internal' port.  But think of it as the cable connecting your virtual
> device 'vlan1' to the OVS bridge 'ovsbr1'.
> 
> Does that help explain?

Only if you can tell me what the vlan1 device is. :-) I had assumed this was
a first-class concept within ovs; after all, when you create a bridge you get
one of these.

It's created with Debian's ifupdown integration (an “OVSIntPort”-type
interface), which seems to do:

ovs_vsctl -- --may-exist add-port "${IF_OVS_BRIDGE}"\
"${IFACE}" ${IF_OVS_OPTIONS} -- set Interface "${IFACE}"\
type=internal ${OVS_EXTRA+-- $OVS_EXTRA}
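
If I read that hook right, for vlan1 (ovs_bridge ovsbr1 and ovs_options tag=1
in my config) it should expand to roughly the following; I haven't traced the
script to verify the exact expansion:

 # vlan1 is created by ovs-vsctl itself as an internal interface on the bridge
 ovs-vsctl --may-exist add-port ovsbr1 vlan1 tag=1 \
  -- set Interface vlan1 type=internal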

> No, the internal ports are the virtual interfaces between physical/virtual
> devices (vlan1) and the OVS bridge (ovsbr1).

So what are the vlan1 devices? Who should I nag to get them to pad their
packets correctly? :-)

/* Steinar */
-- 
Homepage: https://www.sesse.net/
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] VxLAN in Userspace

2019-06-11 Thread Ben Pfaff
On Tue, Jun 11, 2019 at 03:17:24PM -0400, Vasu Dasari wrote:
> I am running into an issue which sounds pretty basic; I am probably
> missing something.

I think you're trying to use kernel tools to configure userspace
tunnels.  Did you read Documentation/howto/userspace-tunneling.rst?
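
Roughly, the approach described there is to give the local tunnel endpoint
address to a bridge that the userspace datapath manages, rather than to a
plain kernel device such as s1-eth3. From memory (check the howto for the
exact steps), something like:

# Put the underlay interface on its own userspace bridge
ovs-vsctl add-br br-phy -- set Bridge br-phy datapath_type=netdev
ovs-vsctl add-port br-phy s1-eth3
# Move the local tunnel endpoint address onto that bridge
ip addr flush dev s1-eth3
ip addr add 1.1.1.1/24 dev br-phy
ip link set br-phy up

After that, "ovs-appctl ovs/route/show" should list a route for 1.1.1.0/24
via br-phy, and native tunnel routing should be able to resolve 1.1.1.2.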
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] Small 802.1q-prepended packets not getting through to VM

2019-06-11 Thread Gregory Rose


On 6/11/2019 1:38 PM, Steinar H. Gunderson wrote:

On Tue, Jun 11, 2019 at 01:32:22PM -0700, Gregory Rose wrote:

I don't understand this.  The OVS internal port is a switch port. It does
not originate packets.  When you say you 'send TCP on an OVS internal
port', how are you doing that?

vlan1 is an OVS internal port:

   ifconfig vlan1 10.0.0.1 netmask 255.255.255.0
   ssh foo@10.0.0.2


Also, please provide the output of the following:

'ip -s link show '
'ovs-vsctl show'

on the system where the OVS bridge and internal port you mention reside.

9: vlan1:  mtu 1500 qdisc noqueue state 
UNKNOWN mode DEFAULT group default qlen 1000
 link/ether 3e:29:2a:06:95:d3 brd ff:ff:ff:ff:ff:ff
 RX: bytes  packets  errors  dropped overrun mcast
 13904405117 30446263 0   0   0   0
 TX: bytes  packets  errors  dropped carrier collsns
 54370564514 24255974 0   0   0   0

root@kaze:~# ovs-vsctl show
826aeca2-2786-49da-8bf5-f5cae976abb3
    Bridge "ovsbr1"
        Port "vlan50"
            tag: 50
            Interface "vlan50"
                type: internal
        Port "vlan100"
            tag: 100
            Interface "vlan100"
                type: internal
        Port "vnet1"
            Interface "vnet1"
        Port "vlan1"
            tag: 1
            Interface "vlan1"
                type: internal


I see the source of the confusion here.  The vlan1 interface is added to 
the OVS bridge and the port type

is internal.  Here's a better way to look at this:

   Net Device                 OVS switch
|----------|                |---------|
|          |                |         |
|   vlan1  |<-------------->|  ovsbr1 |
|          |       ^        |         |
|----------|       |        |---------|
                   |
                   |
                    <- Think of that as a virtual Ethernet cable


When you ssh to a destination via the vlan1 interface then the vlan1 
interface is generating
the packets.  If it has a tcp checksum offload capability then it would 
use it but that will
depend on the master device it is controlled by.  This port is in no way 
owned by OVS.  OVS has
simply added it to the bridge using a virtual port which is by 
convention called an 'internal'
port.  But think of it as the cable connecting your virtual device 
'vlan1' to the OVS bridge

'ovsbr1'.

Does that help explain?



        Port "ovsbr1"
            Interface "ovsbr1"
                type: internal
        Port "enp57s3"
            tag: 1
            Interface "enp57s3"
        Port "enp57s2"
            Interface "enp57s2"
    Bridge "ovsbr0"
        Port "ovsbr0"
            Interface "ovsbr0"
                type: internal
        Port "vnet0"
            Interface "vnet0"
    ovs_version: "2.10.1"


So I need to understand what you mean when you 'send TCP on an OVS internal
port'.  I have a hard time envisioning what you mean.

Maybe my nomenclature is somehow off here? I had assumed that if I gave an
OVS internal port an IP address and put it in my routing table, I'd be
sending TCP packets on it pretty fast. Isn't that what internal ports are
for?


No, the internal ports are the virtual interfaces between 
physical/virtual devices (vlan1) and the OVS bridge

(ovsbr1).

- Greg

___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] Small 802.1q-prepended packets not getting through to VM

2019-06-11 Thread Steinar H. Gunderson
On Tue, Jun 11, 2019 at 01:32:22PM -0700, Gregory Rose wrote:
> I don't understand this.  The OVS internal port is a switch port. It does
> not originate packets.  When you say you 'send TCP on an OVS internal
> port', how are you doing that?

vlan1 is an OVS internal port:

  ifconfig vlan1 10.0.0.1 netmask 255.255.255.0
  ssh foo@10.0.0.2

> Also, please provide the output of the following:
> 
> 'ip -s link show '
> 'ovs-vsctl show'
> 
> on the system where the OVS bridge and internal port you mention reside.

9: vlan1:  mtu 1500 qdisc noqueue state 
UNKNOWN mode DEFAULT group default qlen 1000
link/ether 3e:29:2a:06:95:d3 brd ff:ff:ff:ff:ff:ff
RX: bytes  packets  errors  dropped overrun mcast   
13904405117 30446263 0   0   0   0   
TX: bytes  packets  errors  dropped carrier collsns 
54370564514 24255974 0   0   0   0 

root@kaze:~# ovs-vsctl show
826aeca2-2786-49da-8bf5-f5cae976abb3
    Bridge "ovsbr1"
        Port "vlan50"
            tag: 50
            Interface "vlan50"
                type: internal
        Port "vlan100"
            tag: 100
            Interface "vlan100"
                type: internal
        Port "vnet1"
            Interface "vnet1"
        Port "vlan1"
            tag: 1
            Interface "vlan1"
                type: internal
        Port "ovsbr1"
            Interface "ovsbr1"
                type: internal
        Port "enp57s3"
            tag: 1
            Interface "enp57s3"
        Port "enp57s2"
            Interface "enp57s2"
    Bridge "ovsbr0"
        Port "ovsbr0"
            Interface "ovsbr0"
                type: internal
        Port "vnet0"
            Interface "vnet0"
    ovs_version: "2.10.1"

> So I need to understand what you mean when you 'send TCP on an OVS internal
> port'.  I have a hard time envisioning what you mean.

Maybe my nomenclature is somehow off here? I had assumed that if I gave an
OVS internal port an IP address and put it in my routing table, I'd be
sending TCP packets on it pretty fast. Isn't that what internal ports are
for?

/* Steinar */
-- 
Homepage: https://www.sesse.net/
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] Small 802.1q-prepended packets not getting through to VM

2019-06-11 Thread Gregory Rose


On 6/11/2019 11:28 AM, Steinar H. Gunderson wrote:

On Tue, Jun 11, 2019 at 11:06:15AM -0700, Gregory Rose wrote:

That depends.  Tap/tun interfaces can transfer data between user space
programs and in such cases the packets don't need to be padded to Ethernet
specs.  However, to the extent that any host interface, whether it is
virtual or physical, wants to send packets to an Ethernet switch (OVS in
this instance) then those interfaces should take care to pad their packets
to make them into legal Ethernet frames.  Or they might, and probably
should, be dropped.

The VM in question _does_ pad frames before sending them out. So that part of
the equation is just fine. The only problematic direction is OVS internal
port (whose implementation I don't know) -> OVS bridge -> tap -> VM.


OVS will recompute TCP/UDP checksums as well as IP checksums as necessary
based upon the packet transformations it is required to do.  However, if
the packet is not a legal Ethernet packet to begin with then behavior is
undefined since OVS is an Open Flow L2 Ethernet switch.

Again, when I send TCP on an OVS internal port, it appears to never get the
right checksum, and is forwarded to the VM with the wrong one (unless I
explicitly turn off offloading on the internal port). If that's not an OVS
bug, whose is it?

/* Steinar */


I don't understand this.  The OVS internal port is a switch port. It 
does not originate packets.  When

you say you 'send TCP on an OVS internal port', how are you doing that?

Also, please provide the output of the following:

'ip -s link show '
'ovs-vsctl show'

on the system where the OVS bridge and internal port you mention reside.

To help clear this up - OVS switches packets.  It does not, except in
some specific cases, originate packets of its own.  I certainly can't
think of any time it would need to generate TCP traffic.  Cases I can
think of are BPDUs and STP/RSTP, if it is configured for that, but maybe
I'm missing something obvious.  It wouldn't be the first time...


So I need to understand what you mean when you 'send TCP on an OVS
internal port'.  I have a hard time envisioning what you mean.

Thanks,

___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


[ovs-discuss] VxLAN in Userspace

2019-06-11 Thread Vasu Dasari
Hi,

I am running into an issue which sounds pretty basic; I am probably
missing something.

Using, OVS: 2.9.0:

1. Created a mininet instance using:
mn --topo single,3
2. Converted switch s1 to userspace switch and executed following commands:

# Delete controller
ovs-vsctl del-controller s1

#Convert s1 to userspace switch
ovs-vsctl set Bridge s1 datapath_type=netdev

# Add a VxLAN interface
ovs-vsctl add-port s1 vxlan -- set Interface vxlan type=vxlan \
options:key=5 options:local_ip=1.1.1.1 options:remote_ip=1.1.1.2 \
ofport_request=10

# Add a flow so any packet coming in on port 1 (s1-eth1) egresses out of
# the vxlan port
ovs-ofctl add-flow s1 "table=0,in_port=1 actions=output:10"

# Configure s1-eth3 to be the interface which connects to 1.1.1.0/24
ifconfig s1-eth3 1.1.1.1
arp -s 1.1.1.2 00:00:01:00:00:01

3. Now, I started ping h1 to h2 via mininet CLI.

I expected to see VxLAN tagged packets on s1-eth3. But I do not.

When I do ofproto/trace, I see the error "native tunnel routing
failed"

root@mininet:~# ovs-ofctl dump-flows s1
 cookie=0x0, duration=197.045s, table=0, n_packets=3, n_bytes=210,
in_port="s1-eth1" actions=output:vxlan

root@mininet:~# ovs-appctl ofproto/trace s1
in_port=1,dl_dst=00:01:02:03:04:05
Flow:
in_port=1,vlan_tci=0x,dl_src=00:00:00:00:00:00,dl_dst=00:01:02:03:04:05,dl_type=0x

bridge("s1")

 0. in_port=1, priority 32768
output:10
 -> output to native tunnel
 >> native tunnel routing failed

Final flow: unchanged
Megaflow: recirc_id=0,eth,in_port=1,dl_type=0x
Datapath actions: drop

root@mininet:~# ovs-appctl ovs/route/lookup 1.1.1.2
src 1.1.1.1
gateway ::
dev s1-eth3
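
In case it helps, this is the other userspace-datapath state I can dump
(going by the ovs-appctl command names as I understand them; they may vary
by version):

# Userspace routing table and tunnel ports as OVS sees them
ovs-appctl ovs/route/show
ovs-appctl tnl/ports/show
# Datapath flows actually installed
ovs-appctl dpctl/dump-flows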

The question is: is there anything extra I need to configure when the switch
uses the userspace datapath? Or is what I am seeing a bug?

If I do the same configuration in kernel datapath mode (everything except
the step which changes the bridge datapath type), I see VxLAN packets on
s1-eth3.

Thanks

*Vasu Dasari*
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] Small 802.1q-prepended packets not getting through to VM

2019-06-11 Thread Steinar H. Gunderson
On Tue, Jun 11, 2019 at 11:06:15AM -0700, Gregory Rose wrote:
> That depends.  Tap/tun interfaces can transfer data between user space
> programs and in such cases the packets don't need to be padded to Ethernet
> specs.  However, to the extent that any host interface, whether it is
> virtual or physical, wants to send packets to an Ethernet switch (OVS in
> this instance) then those interfaces should take care to pad their packets
> to make them into legal Ethernet frames.  Or they might, and probably
> should, be dropped.

The VM in question _does_ pad frames before sending them out. So that part of
the equation is just fine. The only problematic direction is OVS internal
port (whose implementation I don't know) -> OVS bridge -> tap -> VM.

> OVS will recompute TCP/UDP checksums as well as IP checksums as necessary
> based upon the packet transformations it is required to do.  However, if
> the packet is not a legal Ethernet packet to begin with then behavior is
> undefined since OVS is an Open Flow L2 Ethernet switch.

Again, when I send TCP on an OVS internal port, it appears to never get the
right checksum, and is forwarded to the VM with the wrong one (unless I
explicitly turn off offloading on the internal port). If that's not an OVS
bug, whose is it?
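
For reference, the workaround I mentioned is simply disabling TX checksum
offload on the internal port with ethtool, e.g. for vlan1 (the same thing I
do for vlan100 in my interfaces file):

  ethtool -k vlan1 | grep -i checksum   # see what is currently advertised
  ethtool -K vlan1 tx off               # disable TX checksum offload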

/* Steinar */
-- 
Homepage: https://www.sesse.net/
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] Small 802.1q-prepended packets not getting through to VM

2019-06-11 Thread Gregory Rose


On 6/10/2019 10:44 AM, Steinar H. Gunderson wrote:

On Mon, Jun 10, 2019 at 10:04:36AM -0700, Gregory Rose wrote:

It's not the issue we're encountering here. But it is an illustration that a
switch really needs to be able to pad; I don't believe openvswitch handles
this situation correctly either.

Specific examples might help.  Openvswitch does pad packets - see
pad_packet() in
../datapath/datapath.c.

Ah. If so, I'm probably mistaken.


2. However, for packets that originated internally (internal ports, STP,
   LLDP) or were modified in transit (VLAN tagging/untagging), and that
   egress, allow bad/no checksums on ingress, but pad (if needed) and
   compute checksum on egress. This will allow you to use the NIC's
   checksum offloading if any.

I believe that in reality, #2 is already in place, _except_ that it doesn't
pad in the situations where the NIC doesn't do it (e.g. virtio).

Yes, it's up to the NIC to do that.

What do you count as the NIC in the virtio case? Should tap pad when
returning from read()? What about hostnet?


That depends.  Tap/tun interfaces can transfer data between user space 
programs and in such cases
the packets don't need to be padded to Ethernet specs.  However, to the 
extent that any host interface,
whether it is virtual or physical, wants to send packets to an Ethernet 
switch (OVS in this instance) then
those interfaces should take care to pad their packets to make them into 
legal Ethernet frames.  Or

they might, and probably should, be dropped.

It appears that OVS is just forwarding them without checking the size.  
We are looking into addressing
that problem by checking the size of incoming packets and if they are 
under-size for Ethernet then they

should be dropped.




Hmm, we should fix the internal port's ethtool output to specify that it
does not do any checksum offloading.

I'll look into that if you would like but I can tell you that other than
recomputing checksums for case # 2 you mention above, I don't think there
is any TCP checksum offloading in the SW datapath.

By not being any TCP checksum offloading, do you mean that it never computes
checksums? From what I can see, when the flag is on, Linux doesn't compute
TCP/UDP checksums, so it arrives with the wrong checksum at the userspace
networking stack.

/* Steinar */


OVS will recompute TCP/UDP checksums as well as IP checksums as 
necessary based upon the packet
transformations it is required to do.  However, if the packet is not a 
legal Ethernet packet to begin with
then behavior is undefined since OVS is an Open Flow L2 Ethernet 
switch.  The only real problem I've seen
identified so far is that OVS will forward undersized Ethernet packets.  
That's incorrect behavior and we'll work

on fixing that so that those packets are dropped.

Thanks,

- Greg
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] GRE over IPv6 configuration

2019-06-11 Thread Gregory Rose



On 6/6/2019 2:25 PM, Eli Britstein wrote:


Thanks, I really appreciate it. However, though I see you found it to
work on Ubuntu 16.04 with kernel 4.4 (I will try it myself too), I think
we should consider how to fix the upstream tunnels (in the Linux kernel)
without the need to replace some integrated modules with ones from the OVS
tree. This is what I'm not clear about - which ones, and what/how to do
to fix it.



OVS has to replace some integrated modules in Linux because on older 
versions of Linux they do not
have the required level of feature support.  If you configure your build 
correctly it will do the right
things in most cases.  As I mentioned though sometimes some breakage can 
get into the builds because
we do not test on every single distro/kernel version.  Specifically, we 
do not do much testing on Fedora
Core at all since not many of our user base make use of that distro.  
Generally we do some build testing
against a variety of upstream kernels.  You can see what is supported in 
the .travis.yml file.


I have downloaded, installed and built OVS for FC 24.  The build is 
broken... And more to the point I get this

warning:
/home/roseg/prj/ovs-experimental/_build/datapath/linux/ip_gre.c:1125:36: 
warning: ‘ipgre_netdev_ops’ defined but not used [-Wunused-const-variable=]

 static const struct net_device_ops ipgre_netdev_ops = {

Then there are some other depmod errors but they don't seem related.

Did you have any luck with a different distro?

- Greg

___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] [OVN] ovn-controller Incremental Processing scale testing

2019-06-11 Thread Daniel Alvarez Sanchez
Thanks a lot Han for the answer!

On Tue, Jun 11, 2019 at 5:57 PM Han Zhou  wrote:
>
>
>
>
> On Tue, Jun 11, 2019 at 5:12 AM Dumitru Ceara  wrote:
> >
> > On Tue, Jun 11, 2019 at 10:40 AM Daniel Alvarez Sanchez
> >  wrote:
> > >
> > > Hi Han, all,
> > >
> > > Lucas, Numan and I have been doing some 'scale' testing of OpenStack
> > > using OVN and wanted to present some results and issues that we've
> > > found with the Incremental Processing feature in ovn-controller. Below
> > > is the scenario that we executed:
> > >
> > > * 7 baremetal nodes setup: 3 controllers (running
> > > ovn-northd/ovsdb-servers in A/P with pacemaker) + 4 compute nodes. OVS
> > > 2.10.
> > > * The test consists on:
> > >   - Create openstack network (OVN LS), subnet and router
> > >   - Attach subnet to the router and set gw to the external network
> > >   - Create an OpenStack port and apply a Security Group (ACLs to allow
> > > UDP, SSH and ICMP).
> > >   - Bind the port to one of the 4 compute nodes (randomly) by
> > > attaching it to a network namespace.
> > >   - Wait for the port to be ACTIVE in Neutron ('up == True' in NB)
> > >   - Wait until the test can ping the port
> > > * Running browbeat/rally with 16 simultaneous process to execute the
> > > test above 150 times.
> > > * When all the 150 'fake VMs' are created, browbeat will delete all
> > > the OpenStack/OVN resources.
> > >
> > > We first tried with OVS/OVN 2.10 and pulled some results which showed
> > > 100% success but ovn-controller is quite loaded (as expected) in all
> > > the nodes especially during the deletion phase:
> > >
> > > - Compute node: https://imgur.com/a/tzxfrIR
> > > - Controller node (ovn-northd and ovsdb-servers): 
> > > https://imgur.com/a/8ffKKYF
> > >
> > > After conducting the tests above, we replaced ovn-controller in all 7
> > > nodes by the one with the current master branch (actually from last
> > > week). We also replaced ovn-northd and ovsdb-servers but the
> > > ovs-vswitchd has been left untouched (still on 2.10). The expected
> > > results were to get less ovn-controller CPU usage and also better
> > > times due to the Incremental Processing feature introduced recently.
> > > However, the results don't look very good:
> > >
> > > - Compute node: https://imgur.com/a/wuq87F1
> > > - Controller node (ovn-northd and ovsdb-servers): 
> > > https://imgur.com/a/99kiyDp
> > >
> > > One thing that we can tell from the ovs-vswitchd CPU consumption is
> > > that it's much less in the Incremental Processing (IP) case which
> > > apparently doesn't make much sense. This led us to think that perhaps
> > > ovn-controller was not installing the necessary flows in the switch
> > > and we confirmed this hypothesis by looking into the dataplane
> > > results. Out of the 150 VMs, 10% of them were unreachable via ping
> > > when using ovn-controller from master.
> > >
> > > @Han, others, do you have any ideas as of what could be happening
> > > here? We'll be able to use this setup for a few more days so let me
> > > know if you want us to pull some other data/traces, ...
> > >
> > > Some other interesting things:
> > > On each of the compute nodes, (with an almost evenly distributed
> > > number of logical ports bound to them), the max amount of logical
> > > flows in br-int is ~90K (by the end of the test, right before deleting
> > > the resources).
> > >
> > > It looks like with the IP version, ovn-controller leaks some memory:
> > > https://imgur.com/a/trQrhWd
> > > While with OVS 2.10, it remains pretty flat during the test:
> > > https://imgur.com/a/KCkIT4O
> >
> > Hi Daniel, Han,
> >
> > I just sent a small patch for the ovn-controller memory leak:
> > https://patchwork.ozlabs.org/patch/1113758/
> >
> > At least on my setup this is what valgrind was pointing at.
> >
> > Cheers,
> > Dumitru
> >
> > >
> > > Looking forward to hearing back :)
> > > Daniel
> > >
> > > PS. Sorry for my previous email, I sent it by mistake without the subject
> > > ___
> > > discuss mailing list
> > > disc...@openvswitch.org
> > > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>
> Thanks Daniel for the testing and reporting, and thanks Dumitru for fixing 
> the memory leak.
>
> Currently ovn-controller incremental processing only handles below SB changes 
> incrementally:
> - logical_flow
> - port_binding (for regular VIF binding NOT on current chassis)
> - mc_group
> - address_set
> - port_group
> - mac_binding
>
> So, in test scenario you described, since each iteration creates network (SB 
> datapath changes) and router ports (port_binding changes for non VIF), the 
> incremental processing would not help much, because most steps in your test 
> should trigger recompute. It would help if you create more Fake VMs in each 
> iteration, e.g. create 10 VMs or more on each LS. Secondly, when VIF 
> port-binding happens on current chassis, the ovn-controller will still do 
> re-compute, and because you have only 4 

Re: [ovs-discuss] [OVN] ovn-controller Incremental Processing scale testing

2019-06-11 Thread Han Zhou
On Tue, Jun 11, 2019 at 5:12 AM Dumitru Ceara  wrote:
>
> On Tue, Jun 11, 2019 at 10:40 AM Daniel Alvarez Sanchez
>  wrote:
> >
> > Hi Han, all,
> >
> > Lucas, Numan and I have been doing some 'scale' testing of OpenStack
> > using OVN and wanted to present some results and issues that we've
> > found with the Incremental Processing feature in ovn-controller. Below
> > is the scenario that we executed:
> >
> > * 7 baremetal nodes setup: 3 controllers (running
> > ovn-northd/ovsdb-servers in A/P with pacemaker) + 4 compute nodes. OVS
> > 2.10.
> > * The test consists on:
> >   - Create openstack network (OVN LS), subnet and router
> >   - Attach subnet to the router and set gw to the external network
> >   - Create an OpenStack port and apply a Security Group (ACLs to allow
> > UDP, SSH and ICMP).
> >   - Bind the port to one of the 4 compute nodes (randomly) by
> > attaching it to a network namespace.
> >   - Wait for the port to be ACTIVE in Neutron ('up == True' in NB)
> >   - Wait until the test can ping the port
> > * Running browbeat/rally with 16 simultaneous process to execute the
> > test above 150 times.
> > * When all the 150 'fake VMs' are created, browbeat will delete all
> > the OpenStack/OVN resources.
> >
> > We first tried with OVS/OVN 2.10 and pulled some results which showed
> > 100% success but ovn-controller is quite loaded (as expected) in all
> > the nodes especially during the deletion phase:
> >
> > - Compute node: https://imgur.com/a/tzxfrIR
> > - Controller node (ovn-northd and ovsdb-servers):
https://imgur.com/a/8ffKKYF
> >
> > After conducting the tests above, we replaced ovn-controller in all 7
> > nodes by the one with the current master branch (actually from last
> > week). We also replaced ovn-northd and ovsdb-servers but the
> > ovs-vswitchd has been left untouched (still on 2.10). The expected
> > results were to get less ovn-controller CPU usage and also better
> > times due to the Incremental Processing feature introduced recently.
> > However, the results don't look very good:
> >
> > - Compute node: https://imgur.com/a/wuq87F1
> > - Controller node (ovn-northd and ovsdb-servers):
https://imgur.com/a/99kiyDp
> >
> > One thing that we can tell from the ovs-vswitchd CPU consumption is
> > that it's much less in the Incremental Processing (IP) case which
> > apparently doesn't make much sense. This led us to think that perhaps
> > ovn-controller was not installing the necessary flows in the switch
> > and we confirmed this hypothesis by looking into the dataplane
> > results. Out of the 150 VMs, 10% of them were unreachable via ping
> > when using ovn-controller from master.
> >
> > @Han, others, do you have any ideas as of what could be happening
> > here? We'll be able to use this setup for a few more days so let me
> > know if you want us to pull some other data/traces, ...
> >
> > Some other interesting things:
> > On each of the compute nodes, (with an almost evenly distributed
> > number of logical ports bound to them), the max amount of logical
> > flows in br-int is ~90K (by the end of the test, right before deleting
> > the resources).
> >
> > It looks like with the IP version, ovn-controller leaks some memory:
> > https://imgur.com/a/trQrhWd
> > While with OVS 2.10, it remains pretty flat during the test:
> > https://imgur.com/a/KCkIT4O
>
> Hi Daniel, Han,
>
> I just sent a small patch for the ovn-controller memory leak:
> https://patchwork.ozlabs.org/patch/1113758/
>
> At least on my setup this is what valgrind was pointing at.
>
> Cheers,
> Dumitru
>
> >
> > Looking forward to hearing back :)
> > Daniel
> >
> > PS. Sorry for my previous email, I sent it by mistake without the
subject
> > ___
> > discuss mailing list
> > disc...@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss

Thanks Daniel for the testing and reporting, and thanks Dumitru for fixing
the memory leak.

Currently ovn-controller incremental processing only handles below SB
changes incrementally:
- logical_flow
- port_binding (for regular VIF binding NOT on current chassis)
- mc_group
- address_set
- port_group
- mac_binding

So, in the test scenario you described, since each iteration creates a
network (SB datapath changes) and router ports (port_binding changes for
non-VIF ports), incremental processing would not help much, because most
steps in your test trigger a recompute. It would help if you created more
fake VMs in each iteration, e.g. 10 VMs or more on each LS. Secondly, when a
VIF port binding happens on the current chassis, ovn-controller will still
recompute, and because you have only 4 compute nodes, each chassis still
recomputes for roughly 1/4 of the VIF bindings. With more compute nodes you
would see incremental processing be more effective.

However, what really worries me is the 10% of VMs that are unreachable. I have
one point of confusion about the test steps. The last step you described was: - 

Re: [ovs-discuss] [OVN] ovn-controller Incremental Processing scale testing

2019-06-11 Thread Dumitru Ceara
On Tue, Jun 11, 2019 at 10:40 AM Daniel Alvarez Sanchez
 wrote:
>
> Hi Han, all,
>
> Lucas, Numan and I have been doing some 'scale' testing of OpenStack
> using OVN and wanted to present some results and issues that we've
> found with the Incremental Processing feature in ovn-controller. Below
> is the scenario that we executed:
>
> * 7 baremetal nodes setup: 3 controllers (running
> ovn-northd/ovsdb-servers in A/P with pacemaker) + 4 compute nodes. OVS
> 2.10.
> * The test consists on:
>   - Create openstack network (OVN LS), subnet and router
>   - Attach subnet to the router and set gw to the external network
>   - Create an OpenStack port and apply a Security Group (ACLs to allow
> UDP, SSH and ICMP).
>   - Bind the port to one of the 4 compute nodes (randomly) by
> attaching it to a network namespace.
>   - Wait for the port to be ACTIVE in Neutron ('up == True' in NB)
>   - Wait until the test can ping the port
> * Running browbeat/rally with 16 simultaneous process to execute the
> test above 150 times.
> * When all the 150 'fake VMs' are created, browbeat will delete all
> the OpenStack/OVN resources.
>
> We first tried with OVS/OVN 2.10 and pulled some results which showed
> 100% success but ovn-controller is quite loaded (as expected) in all
> the nodes especially during the deletion phase:
>
> - Compute node: https://imgur.com/a/tzxfrIR
> - Controller node (ovn-northd and ovsdb-servers): https://imgur.com/a/8ffKKYF
>
> After conducting the tests above, we replaced ovn-controller in all 7
> nodes by the one with the current master branch (actually from last
> week). We also replaced ovn-northd and ovsdb-servers but the
> ovs-vswitchd has been left untouched (still on 2.10). The expected
> results were to get less ovn-controller CPU usage and also better
> times due to the Incremental Processing feature introduced recently.
> However, the results don't look very good:
>
> - Compute node: https://imgur.com/a/wuq87F1
> - Controller node (ovn-northd and ovsdb-servers): https://imgur.com/a/99kiyDp
>
> One thing that we can tell from the ovs-vswitchd CPU consumption is
> that it's much less in the Incremental Processing (IP) case which
> apparently doesn't make much sense. This led us to think that perhaps
> ovn-controller was not installing the necessary flows in the switch
> and we confirmed this hypothesis by looking into the dataplane
> results. Out of the 150 VMs, 10% of them were unreachable via ping
> when using ovn-controller from master.
>
> @Han, others, do you have any ideas as of what could be happening
> here? We'll be able to use this setup for a few more days so let me
> know if you want us to pull some other data/traces, ...
>
> Some other interesting things:
> On each of the compute nodes, (with an almost evenly distributed
> number of logical ports bound to them), the max amount of logical
> flows in br-int is ~90K (by the end of the test, right before deleting
> the resources).
>
> It looks like with the IP version, ovn-controller leaks some memory:
> https://imgur.com/a/trQrhWd
> While with OVS 2.10, it remains pretty flat during the test:
> https://imgur.com/a/KCkIT4O

Hi Daniel, Han,

I just sent a small patch for the ovn-controller memory leak:
https://patchwork.ozlabs.org/patch/1113758/

At least on my setup this is what valgrind was pointing at.

Cheers,
Dumitru

>
> Looking forward to hearing back :)
> Daniel
>
> PS. Sorry for my previous email, I sent it by mistake without the subject
> ___
> discuss mailing list
> disc...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


[ovs-discuss] [OVN] ovn-controller Incremental Processing scale testing

2019-06-11 Thread Daniel Alvarez Sanchez
Hi Han, all,

Lucas, Numan and I have been doing some 'scale' testing of OpenStack
using OVN and wanted to present some results and issues that we've
found with the Incremental Processing feature in ovn-controller. Below
is the scenario that we executed:

* 7-baremetal-node setup: 3 controllers (running
ovn-northd/ovsdb-servers in A/P with pacemaker) + 4 compute nodes. OVS
2.10.
* The test consists of:
  - Create openstack network (OVN LS), subnet and router
  - Attach subnet to the router and set gw to the external network
  - Create an OpenStack port and apply a Security Group (ACLs to allow
UDP, SSH and ICMP).
  - Bind the port to one of the 4 compute nodes (randomly) by
attaching it to a network namespace.
  - Wait for the port to be ACTIVE in Neutron ('up == True' in NB)
  - Wait until the test can ping the port
* Running browbeat/rally with 16 simultaneous processes to execute the
test above 150 times.
* When all the 150 'fake VMs' are created, browbeat will delete all
the OpenStack/OVN resources.

We first tried with OVS/OVN 2.10 and pulled some results which showed
100% success but ovn-controller is quite loaded (as expected) in all
the nodes especially during the deletion phase:

- Compute node: https://imgur.com/a/tzxfrIR
- Controller node (ovn-northd and ovsdb-servers): https://imgur.com/a/8ffKKYF

After conducting the tests above, we replaced ovn-controller in all 7
nodes by the one with the current master branch (actually from last
week). We also replaced ovn-northd and ovsdb-servers but the
ovs-vswitchd has been left untouched (still on 2.10). The expected
results were to get less ovn-controller CPU usage and also better
times due to the Incremental Processing feature introduced recently.
However, the results don't look very good:

- Compute node: https://imgur.com/a/wuq87F1
- Controller node (ovn-northd and ovsdb-servers): https://imgur.com/a/99kiyDp

One thing that we can tell from the ovs-vswitchd CPU consumption is
that it's much less in the Incremental Processing (IP) case which
apparently doesn't make much sense. This led us to think that perhaps
ovn-controller was not installing the necessary flows in the switch
and we confirmed this hypothesis by looking into the dataplane
results. Out of the 150 VMs, 10% of them were unreachable via ping
when using ovn-controller from master.

@Han, others, do you have any ideas as to what could be happening
here? We'll be able to use this setup for a few more days so let me
know if you want us to pull some other data/traces, ...

Some other interesting things:
On each of the compute nodes, (with an almost evenly distributed
number of logical ports bound to them), the max amount of logical
flows in br-int is ~90K (by the end of the test, right before deleting
the resources).

It looks like with the IP version, ovn-controller leaks some memory:
https://imgur.com/a/trQrhWd
While with OVS 2.10, it remains pretty flat during the test:
https://imgur.com/a/KCkIT4O

Looking forward to hearing back :)
Daniel

PS. Sorry for my previous email, I sent it by mistake without the subject
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


[ovs-discuss] (no subject)

2019-06-11 Thread Daniel Alvarez Sanchez
Hi Han, all,

Lucas, Numan and I have been doing some 'scale' testing of OpenStack
using OVN and wanted to present some results and issues that we've
found with the Incremental Processing feature in ovn-controller. Below
is the scenario that we executed:

* 7-baremetal-node setup: 3 controllers (running
ovn-northd/ovsdb-servers in A/P with pacemaker) + 4 compute nodes. OVS
2.10.
* The test consists of:
  - Create openstack network (OVN LS), subnet and router
  - Attach subnet to the router and set gw to the external network
  - Create an OpenStack port and apply a Security Group (ACLs to allow
UDP, SSH and ICMP).
  - Bind the port to one of the 4 compute nodes (randomly) by
attaching it to a network namespace.
  - Wait for the port to be ACTIVE in Neutron ('up == True' in NB)
  - Wait until the test can ping the port
* Running browbeat/rally with 16 simultaneous processes to execute the
test above 150 times.
* When all the 150 'fake VMs' are created, browbeat will delete all
the OpenStack/OVN resources.

We first tried with OVS/OVN 2.10 and pulled some results which showed
100% success but ovn-controller is quite loaded (as expected) in all
the nodes especially during the deletion phase:

- Compute node: https://imgur.com/a/tzxfrIR
- Controller node (ovn-northd and ovsdb-servers): https://imgur.com/a/8ffKKYF

After conducting the tests above, we replaced ovn-controller in all 7
nodes by the one with the current master branch (actually from last
week). We also replaced ovn-northd and ovsdb-servers but the
ovs-vswitchd has been left untouched (still on 2.10). The expected
results were to get less ovn-controller CPU usage and also better
times due to the Incremental Processing feature introduced recently.
However, the results don't look very good:

- Compute node: https://imgur.com/a/wuq87F1
- Controller node (ovn-northd and ovsdb-servers): https://imgur.com/a/99kiyDp

One thing that we can tell from the ovs-vswitchd CPU consumption is
that it's much less in the Incremental Processing (IP) case which
apparently doesn't make much sense. This led us to think that perhaps
ovn-controller was not installing the necessary flows in the switch
and we confirmed this hypothesis by looking into the dataplane
results. Out of the 150 VMs, 10% of them were unreachable via ping
when using ovn-controller from master.

@Han, others, do you have any ideas as to what could be happening
here? We'll be able to use this setup for a few more days so let me
know if you want us to pull some other data/traces, ...

Some other interesting things:
On each of the compute nodes, (with an almost evenly distributed
number of logical ports bound to them), the max amount of logical
flows in br-int is ~90K (by the end of the test, right before deleting
the resources).

It looks like with the IP version, ovn-controller leaks some memory:
https://imgur.com/a/trQrhWd
While with OVS 2.10, it remains pretty flat during the test:
https://imgur.com/a/KCkIT4O

Looking forward to hearing back :)
Daniel
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss