Re: [ovs-dev] [PATCH v2] ovs-tcpdump: Cleanup mirror port on SIGHUP/SIGTERM

2022-08-31 Thread Mike Pattrick
On Thu, Aug 25, 2022 at 6:14 AM Daniel Ding  wrote:
>
> If ovs-tcpdump receives a HUP or TERM signal, the mirror and mirror
> interface should be destroyed. This often happens when the controlling
> terminal is closed (e.g. an ssh session ends) or when another user
> kills the process.
>
> Signed-off-by: Daniel Ding 
> ---
>  utilities/ovs-tcpdump.in | 30 ++
>  1 file changed, 18 insertions(+), 12 deletions(-)
>
> diff --git a/utilities/ovs-tcpdump.in b/utilities/ovs-tcpdump.in
> index 7fd26e405..1d88a4666 100755
> --- a/utilities/ovs-tcpdump.in
> +++ b/utilities/ovs-tcpdump.in
> @@ -44,6 +44,7 @@ try:
>  from ovs import jsonrpc
>  from ovs.poller import Poller
>  from ovs.stream import Stream
> +from ovs.fatal_signal import add_hook
>  except Exception:
>  print("ERROR: Please install the correct Open vSwitch python support")
>  print("   libraries (version @VERSION@).")
> @@ -405,6 +406,20 @@ def py_which(executable):
> for path in os.environ["PATH"].split(os.pathsep))
>
>
> +def teardown(db_sock, interface, mirror_interface, tap_created):
> +def cleanup_mirror():
> +try:
> +ovsdb = OVSDB(db_sock)
> +ovsdb.destroy_mirror(interface, ovsdb.port_bridge(interface))
> +ovsdb.destroy_port(mirror_interface, ovsdb.port_bridge(interface))
> +if tap_created is True:
> +_del_taps[sys.platform](mirror_interface)
> +except Exception as e:
> +print("ERROR: Unable to clean mirror: %s" % str(e))
> +
> +add_hook(cleanup_mirror, None, True)
> +
> +
>  def main():
>  rundir = os.environ.get('OVS_RUNDIR', '@RUNDIR@')
>  db_sock = 'unix:%s' % os.path.join(rundir, "db.sock")
> @@ -489,6 +504,9 @@ def main():
>  print("ERROR: Mirror port (%s) exists for port %s." %
>(mirror_interface, interface))
>  sys.exit(1)
> +
> +teardown(db_sock, interface, mirror_interface, tap_created)
> +
>  try:
>  ovsdb.make_port(mirror_interface, ovsdb.port_bridge(interface))
>  ovsdb.bridge_mirror(interface, mirror_interface,
> @@ -496,12 +514,6 @@ def main():
>  mirror_select_all)
>  except OVSDBException as oe:
>  print("ERROR: Unable to properly setup the mirror: %s." % str(oe))
> -try:
> -ovsdb.destroy_port(mirror_interface, ovsdb.port_bridge(interface))
> -if tap_created is True:
> -_del_taps[sys.platform](mirror_interface)
> -except Exception:
> -pass
>  sys.exit(1)
>
>  ovsdb.close_idl()
> @@ -517,12 +529,6 @@ def main():
>  except KeyboardInterrupt:
>  if pipes.poll() is None:
>  pipes.terminate()
> -
> -ovsdb = OVSDB(db_sock)
> -ovsdb.destroy_mirror(interface, ovsdb.port_bridge(interface))
> -ovsdb.destroy_port(mirror_interface, ovsdb.port_bridge(interface))
> -if tap_created is True:
> -_del_taps[sys.platform](mirror_interface)
>  except Exception:
>  print("Unable to tear down the ports and mirrors.")
>  print("Please use ovs-vsctl to remove the ports and mirrors 
> created.")

Nit: This exception text is no longer relevant. That said, it probably
wasn't even relevant before, as it wouldn't have caught any exceptions
in the KeyboardInterrupt code above.
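Since the cleanup now lives in a fatal-signal hook, that trailing
try/except could probably be dropped entirely. For anyone unfamiliar
with the hook semantics, here is a minimal standalone sketch (not part
of the patch; it only assumes the OVS Python libraries are installed):

    # Hooks registered with add_hook() run on SIGHUP/SIGINT/SIGTERM and,
    # with run_at_exit=True, on normal interpreter exit as well.
    import time

    from ovs.fatal_signal import add_hook

    def cleanup_mirror():
        # In ovs-tcpdump this is where the mirror, the mirror port, and
        # any created tap device would be destroyed.
        print("cleaning up mirror state")

    add_hook(cleanup_mirror, None, True)  # (hook, cancel, run_at_exit)
    time.sleep(60)  # send SIGHUP/SIGTERM to this process to see it fire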

Otherwise, it looks good!

Acked-by: Mike Pattrick 

> --
> 2.27.0
>



[ovs-dev] [OVSCON] Submission deadline extended to September 14

2022-08-31 Thread Aaron Conole
Greetings,

There were a few requests over the weekend to extend the final
deadline.  We also had a number of submissions (about 1/2 so far) that
have asked us to accommodate virtual presentations.

After the conference planning meeting today, we have worked to make sure
that we can accommodate remote / virtual presentations.
Additionally, we have at least one talk submission for which we are
still waiting on a "final" abstract.

Because of the above, we are extending the submission deadline by two
weeks.  There may be folks who will decide to submit now that the
option to present remotely is available; this gives them time to put
an abstract together.

Because the submission deadline is extended, we will also extend the
final acceptance announcements to September 21.

Thanks,
OvS + OVN Conf planning



Re: [ovs-dev] [PATCH v2] netdev-dpdk: fix tx_dropped counters value

2022-08-31 Thread Mike Pattrick
On Wed, Aug 3, 2022 at 4:58 AM Robin Jarry  wrote:
>
> Packets that could not be transmitted because the TXQ is full should be
> taken into account in the global ovs_tx_failure_drops as it was the case
> before commit 29b94e12d57d ("netdev-dpdk: Refactor the DPDK transmit
> path.").
>
> Also, the global tx_dropped should take all ovs_*_tx_drops counters into
> account.
>
> Fixes: 29b94e12d57d ("netdev-dpdk: Refactor the DPDK transmit path.")
> Signed-off-by: Robin Jarry 
> ---
> v1 -> v2: fixed build error (last minute copy paste without testing...)
>
>  lib/netdev-dpdk.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 0dd655507b50..84f0fbf24355 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -2883,8 +2883,12 @@ netdev_dpdk_eth_send(struct netdev *netdev, int qid,
>  cnt = netdev_dpdk_common_send(netdev, batch, &stats);
>
>  dropped = batch_cnt - cnt;
> -
>  dropped += netdev_dpdk_eth_tx_burst(dev, qid, pkts, cnt);
> +stats.tx_failure_drops += dropped;

Hello Robin,

I think this will double count tx_failure_drops if there were any in
netdev_dpdk_common_send().

> +dropped = stats.tx_mtu_exceeded_drops +
> +  stats.tx_qos_drops +
> +  stats.tx_failure_drops +
> +  stats.tx_invalid_hwol_drops;

I'm not too clear on why we have to recalculate this; in what case
would 'dropped' be incorrect here?
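To make the first concern concrete, here is a standalone toy model (not
OVS code; it assumes netdev_dpdk_common_send() already folds its own
failure drops into stats.tx_failure_drops before returning):

    #include <stdio.h>

    int main(void)
    {
        int batch_cnt = 32;
        int common_send_drops = 4; /* already in tx_failure_drops */
        int txq_full_drops = 2;    /* reported by eth_tx_burst() */

        int tx_failure_drops = common_send_drops;
        int cnt = batch_cnt - common_send_drops; /* packets left to send */

        int dropped = (batch_cnt - cnt) + txq_full_drops; /* 4 + 2 = 6 */
        tx_failure_drops += dropped; /* 4 + 6 = 10, but only 6 were lost */

        printf("counted %d, actually dropped %d\n",
               tx_failure_drops, common_send_drops + txq_full_drops);
        return 0;
    }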

Thank you,
M

>  if (OVS_UNLIKELY(dropped)) {
>  struct netdev_dpdk_sw_stats *sw_stats = dev->sw_stats;
>
> --
> 2.37.1
>



Re: [ovs-dev] [PATCH ovn v2] Don't blindly save original dst IP and Port to avoid megaflow unwildcarding.

2022-08-31 Thread Ilya Maximets
On 8/31/22 17:14, Ilya Maximets wrote:
> On 8/31/22 17:12, Han Zhou wrote:
>>
>>
>> On Wed, Aug 31, 2022 at 2:06 AM Ilya Maximets wrote:
>>>
>>> On 8/31/22 01:32, Han Zhou wrote:


 On Tue, Aug 30, 2022 at 9:35 AM Ilya Maximets wrote:
>
> On 8/24/22 08:40, Han Zhou wrote:
>> The ls_in_pre_stateful priority 120 flow that saves dst IP and Port to
>> registers is causing a critical dataplane performance impact to
>> short-lived connections, because it unwildcards megaflows with exact
>> match on dst IP and L4 ports. Any new connections with a different
>> client side L4 port will encounter datapath flow miss and upcall to
>> ovs-vswitchd, which makes typical use cases such as HTTP1.0 based
>> RESTful API calls suffer big performance degradations.
>>
>> These fields (dst IP and port) were saved to registers to solve a
>> problem of LB hairpin use case when different VIPs are sharing
>> overlapping backend+port [0]. The change [0] might not have as wide
>> performance impact as it is now because at that time one of the match
>> condition "REGBIT_CONNTRACK_NAT == 1" was set only for established and
>> natted traffic, while now the impact is more obvious because
>> REGBIT_CONNTRACK_NAT is now set for all IP traffic (if any VIP
>> configured on the LS) since commit [1], after several other indirectly
>> related optimizations and refactors.
>>
>> This patch fixes the problem by modifying the priority-120 flows in
>> ls_in_pre_stateful. Instead of blindly saving dst IP and L4 port for any
>> traffic with the REGBIT_CONNTRACK_NAT == 1, we now save dst IP and L4
>> port only for traffic matching the LB VIPs, because these are the ones
 that need to be saved for the hairpin purpose. The existing priority-110
 flows will match the rest of the traffic just like before but would
 not save dst IP and L4 port, so any server->client traffic would not
>> unwildcard megaflows with client side L4 ports.
>
> Hmm, but if higher priority flows have matches on these fields, datapath
> flows will have them unwildcarded anyway.  So, why exactly that is better
> than the current approach?
>
 Hi Ilya,

 The problem of the current approach is that it blindly saves the L4 dst 
 port for any traffic in any direction, as long as there are VIPs 
 configured on the datapath.
 So consider the most typical scenario of a client sending API requests to 
 server backends behind a VIP. On the server side, any *reply* packets 
 would hit the flow that saves the client side L4 port because for 
 server->client direction the client port is the dst. If the client sends 
 10 requests, each with a different source port, the server side will end 
 up with unwildcarded DP flows like below: (192.168.1.2 is client IP)
 recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst=192.168.1.2,proto=6,frag=no),tcp(dst=51224), packets:5, bytes:2475, used:1.118s, flags:FP., actions:ct(zone=8,nat),recirc(0x20)
 recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst=192.168.1.2,proto=6,frag=no),tcp(dst=51226), packets:5, bytes:2475, used:1.105s, flags:FP., actions:ct(zone=8,nat),recirc(0x21)
 recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst=192.168.1.2,proto=6,frag=no),tcp(dst=37798), packets:5, bytes:2475, used:0.574s, flags:FP., actions:ct(zone=8,nat),recirc(0x40)
 recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst=192.168.1.2,proto=6,frag=no),tcp(dst=51250), packets:5, bytes:2475, used:0.872s, flags:FP., actions:ct(zone=8,nat),recirc(0x2d)
 recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst=192.168.1.2,proto=6,frag=no),tcp(dst=46940), packets:5, bytes:2475, used:0.109s, flags:FP., actions:ct(zone=8,nat),recirc(0x60)
 recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst=192.168.1.2,proto=6,frag=no),tcp(dst=46938), packets:5, bytes:2475, used:0.118s, flags:FP., actions:ct(zone=8,nat),recirc(0x5f)
 recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst=192.168.1.2,proto=6,frag=no),tcp(dst=51236), packets:5, bytes:2475, used:0.938s, flags:FP., actions:ct(zone=8,nat),recirc(0x26)
 ...

 As a result, DP flows explode and every new request is going to be a miss 
 and upcall to userspace, which is very inefficient.

Re: [ovs-dev] [PATCH ovn v2] Don't blindly save original dst IP and Port to avoid megaflow unwildcarding.

2022-08-31 Thread Ilya Maximets
On 8/31/22 17:12, Han Zhou wrote:
> 
> 
> On Wed, Aug 31, 2022 at 2:06 AM Ilya Maximets wrote:
>>
>> On 8/31/22 01:32, Han Zhou wrote:
>> >
>> >
>> > On Tue, Aug 30, 2022 at 9:35 AM Ilya Maximets wrote:
>> >>
>> >> On 8/24/22 08:40, Han Zhou wrote:
>> >> > The ls_in_pre_stateful priority 120 flow that saves dst IP and Port to
>> >> > registers is causing a critical dataplane performance impact to
>> >> > short-lived connections, because it unwildcards megaflows with exact
>> >> > match on dst IP and L4 ports. Any new connections with a different
>> >> > client side L4 port will encounter datapath flow miss and upcall to
>> >> > ovs-vswitchd, which makes typical use cases such as HTTP1.0 based
>> >> > RESTful API calls suffer big performance degradations.
>> >> >
>> >> > These fields (dst IP and port) were saved to registers to solve a
>> >> > problem of LB hairpin use case when different VIPs are sharing
>> >> > overlapping backend+port [0]. The change [0] might not have as wide
>> >> > performance impact as it is now because at that time one of the match
>> >> > condition "REGBIT_CONNTRACK_NAT == 1" was set only for established and
>> >> > natted traffic, while now the impact is more obvious because
>> >> > REGBIT_CONNTRACK_NAT is now set for all IP traffic (if any VIP
>> >> > configured on the LS) since commit [1], after several other indirectly
>> >> > related optimizations and refactors.
>> >> >
>> >> > This patch fixes the problem by modifying the priority-120 flows in
>> >> > ls_in_pre_stateful. Instead of blindly saving dst IP and L4 port for any
>> >> > traffic with the REGBIT_CONNTRACK_NAT == 1, we now save dst IP and L4
>> >> > port only for traffic matching the LB VIPs, because these are the ones
>> >> > that need to be saved for the hairpin purpose. The existing priority-110
>> >> > flows will match the rest of the traffic just like before but would
>> >> > not save dst IP and L4 port, so any server->client traffic would not
>> >> > unwildcard megaflows with client side L4 ports.
>> >>
>> >> Hmm, but if higher priority flows have matches on these fields, datapath
>> >> flows will have them unwildcarded anyway.  So, why exactly that is better
>> >> than the current approach?
>> >>
>> > Hi Ilya,
>> >
>> > The problem of the current approach is that it blindly saves the L4 dst 
>> > port for any traffic in any direction, as long as there are VIPs 
>> > configured on the datapath.
>> > So consider the most typical scenario of a client sending API requests to 
>> > server backends behind a VIP. On the server side, any *reply* packets 
>> > would hit the flow that saves the client side L4 port because for 
>> > server->client direction the client port is the dst. If the client sends 
>> > 10 requests, each with a different source port, the server side will end 
>> > up with unwildcarded DP flows like below: (192.168.1.2 is client IP)
>> > recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst=192.168.1.2,proto=6,frag=no),tcp(dst=51224), packets:5, bytes:2475, used:1.118s, flags:FP., actions:ct(zone=8,nat),recirc(0x20)
>> > recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst=192.168.1.2,proto=6,frag=no),tcp(dst=51226), packets:5, bytes:2475, used:1.105s, flags:FP., actions:ct(zone=8,nat),recirc(0x21)
>> > recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst=192.168.1.2,proto=6,frag=no),tcp(dst=37798), packets:5, bytes:2475, used:0.574s, flags:FP., actions:ct(zone=8,nat),recirc(0x40)
>> > recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst=192.168.1.2,proto=6,frag=no),tcp(dst=51250), packets:5, bytes:2475, used:0.872s, flags:FP., actions:ct(zone=8,nat),recirc(0x2d)
>> > recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst=192.168.1.2,proto=6,frag=no),tcp(dst=46940), packets:5, bytes:2475, used:0.109s, flags:FP., actions:ct(zone=8,nat),recirc(0x60)
>> > recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst=192.168.1.2,proto=6,frag=no),tcp(dst=46938), packets:5, bytes:2475, used:0.118s, flags:FP., actions:ct(zone=8,nat),recirc(0x5f)
>> > recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst=192.168.1.2,proto=6,frag=no),tcp(dst=51236), packets:5, bytes:2475, used:0.938s, flags:FP., actions:ct(zone=8,nat),recirc(0x26)
>> > ...
>> >
>> > As a result, DP flows explode and every new request is going to be a miss 
>> > and upcall to userspace, which is very inefficient.

Re: [ovs-dev] [PATCH ovn v2] Don't blindly save original dst IP and Port to avoid megaflow unwildcarding.

2022-08-31 Thread Han Zhou
On Wed, Aug 31, 2022 at 2:06 AM Ilya Maximets  wrote:
>
> On 8/31/22 01:32, Han Zhou wrote:
> >
> >
> > On Tue, Aug 30, 2022 at 9:35 AM Ilya Maximets <i.maxim...@ovn.org> wrote:
> >>
> >> On 8/24/22 08:40, Han Zhou wrote:
> >> > The ls_in_pre_stateful priority 120 flow that saves dst IP and Port to
> >> > registers is causing a critical dataplane performance impact to
> >> > short-lived connections, because it unwildcards megaflows with exact
> >> > match on dst IP and L4 ports. Any new connections with a different
> >> > client side L4 port will encounter datapath flow miss and upcall to
> >> > ovs-vswitchd, which makes typical use cases such as HTTP1.0 based
> >> > RESTful API calls suffer big performance degradations.
> >> >
> >> > These fields (dst IP and port) were saved to registers to solve a
> >> > problem of LB hairpin use case when different VIPs are sharing
> >> > overlapping backend+port [0]. The change [0] might not have as wide
> >> > performance impact as it is now because at that time one of the match
> >> > condition "REGBIT_CONNTRACK_NAT == 1" was set only for established and
> >> > natted traffic, while now the impact is more obvious because
> >> > REGBIT_CONNTRACK_NAT is now set for all IP traffic (if any VIP
> >> > configured on the LS) since commit [1], after several other indirectly
> >> > related optimizations and refactors.
> >> >
> >> > This patch fixes the problem by modifying the priority-120 flows in
> >> > ls_in_pre_stateful. Instead of blindly saving dst IP and L4 port for any
> >> > traffic with the REGBIT_CONNTRACK_NAT == 1, we now save dst IP and L4
> >> > port only for traffic matching the LB VIPs, because these are the ones
> >> > that need to be saved for the hairpin purpose. The existing priority-110
> >> > flows will match the rest of the traffic just like before but would
> >> > not save dst IP and L4 port, so any server->client traffic would not
> >> > unwildcard megaflows with client side L4 ports.
> >>
> >> Hmm, but if higher priority flows have matches on these fields, datapath
> >> flows will have them unwildcarded anyway.  So, why exactly that is better
> >> than the current approach?
> >>
> > Hi Ilya,
> >
> > The problem of the current approach is that it blindly saves the L4 dst
> > port for any traffic in any direction, as long as there are VIPs
> > configured on the datapath.
> > So consider the most typical scenario of a client sending API requests
> > to server backends behind a VIP. On the server side, any *reply* packets
> > would hit the flow that saves the client side L4 port because for
> > server->client direction the client port is the dst. If the client sends
> > 10 requests, each with a different source port, the server side will end
> > up with unwildcarded DP flows like below: (192.168.1.2 is client IP)
> > recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst=192.168.1.2,proto=6,frag=no),tcp(dst=51224), packets:5, bytes:2475, used:1.118s, flags:FP., actions:ct(zone=8,nat),recirc(0x20)
> > recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst=192.168.1.2,proto=6,frag=no),tcp(dst=51226), packets:5, bytes:2475, used:1.105s, flags:FP., actions:ct(zone=8,nat),recirc(0x21)
> > recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst=192.168.1.2,proto=6,frag=no),tcp(dst=37798), packets:5, bytes:2475, used:0.574s, flags:FP., actions:ct(zone=8,nat),recirc(0x40)
> > recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst=192.168.1.2,proto=6,frag=no),tcp(dst=51250), packets:5, bytes:2475, used:0.872s, flags:FP., actions:ct(zone=8,nat),recirc(0x2d)
> > recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst=192.168.1.2,proto=6,frag=no),tcp(dst=46940), packets:5, bytes:2475, used:0.109s, flags:FP., actions:ct(zone=8,nat),recirc(0x60)
> > recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst=192.168.1.2,proto=6,frag=no),tcp(dst=46938), packets:5, bytes:2475, used:0.118s, flags:FP., actions:ct(zone=8,nat),recirc(0x5f)
> > recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst=192.168.1.2,proto=6,frag=no),tcp(dst=51236), packets:5, bytes:2475, used:0.938s, flags:FP., actions:ct(zone=8,nat),recirc(0x26)
> > ...
> >
> > As a result, DP flows explode and every new request is going to be a
> > miss and upcall to userspace, which is very inefficient. Even worse, as
> > the flow is so generic, even traffic unrelated to the VIP would have the
> > same impact, as long as a server on a LS with any VIP configuration is
> > replying to client requests.
> > With the fix, only the client->VIP packets would hit such flows, and in
> > those cases the dst port is the server (well known)

Re: [ovs-dev] [PATCH ovn v2] Don't blindly save original dst IP and Port to avoid megaflow unwildcarding.

2022-08-31 Thread Han Zhou
On Wed, Aug 31, 2022 at 1:38 AM Dumitru Ceara  wrote:
>
> On 8/31/22 02:17, Han Zhou wrote:
> > On Tue, Aug 30, 2022 at 8:18 AM Dumitru Ceara  wrote:
> >>
> >> Hi Han,
> >>
> >> On 8/24/22 08:40, Han Zhou wrote:
> >>> The ls_in_pre_stateful priority 120 flow that saves dst IP and Port to
> >>> registers is causing a critical dataplane performance impact to
> >>> short-lived connections, because it unwildcards megaflows with exact
> >>> match on dst IP and L4 ports. Any new connections with a different
> >>> client side L4 port will encounter datapath flow miss and upcall to
> >>> ovs-vswitchd, which makes typical use cases such as HTTP1.0 based
> >>> RESTful API calls suffer big performance degradations.
> >>>
> >>> These fields (dst IP and port) were saved to registers to solve a
> >>> problem of LB hairpin use case when different VIPs are sharing
> >>> overlapping backend+port [0]. The change [0] might not have as wide
> >>> performance impact as it is now because at that time one of the match
> >>> condition "REGBIT_CONNTRACK_NAT == 1" was set only for established and
> >>> natted traffic, while now the impact is more obvious because
> >>> REGBIT_CONNTRACK_NAT is now set for all IP traffic (if any VIP
> >>> configured on the LS) since commit [1], after several other indirectly
> >>> related optimizations and refactors.
> >>>
> >>> This patch fixes the problem by modifying the priority-120 flows in
> >>> ls_in_pre_stateful. Instead of blindly saving dst IP and L4 port for any
> >>> traffic with the REGBIT_CONNTRACK_NAT == 1, we now save dst IP and L4
> >>> port only for traffic matching the LB VIPs, because these are the ones
> >>> that need to be saved for the hairpin purpose. The existing priority-110
> >>> flows will match the rest of the traffic just like before but would
> >>> not save dst IP and L4 port, so any server->client traffic would not
> >>> unwildcard megaflows with client side L4 ports.
> >>>
> >>
> >> Just to be 100% sure, the client->server traffic will still generate up
> >> to V x H flows where V is "the number of VIP:port tuples" and H is "the
> >> number of potential (dp_)hash values", right?
> >
> > Yes, if all the VIPs and backends are being accessed at the same time.
> > This is expected. For any given pair of client<->server, regardless of
> > the connections (source port number changes), the packets should match
> > the same mega-flow and be handled by fastpath (instead of going to
> > userspace, which is the behavior before this patch).
> >
>
> Thanks for the confirmation!
>
> >>
> >>> [0] ce0ef8d59850 ("Properly handle hairpin traffic for VIPs with shared backends.")
> >>> [1] 0038579d1928 ("northd: Optimize ct nat for load balancer traffic.")
> >>>
> >>> Signed-off-by: Han Zhou 
> >>> ---
> >>> v1 -> v2: Add the missing changes for ovn-northd.8.xml which I forgot
> >>>   to commit before sending v1.
> >>>
> >>>  northd/northd.c | 125 +---
> >>>  northd/ovn-northd.8.xml |  14 ++---
> >>>  tests/ovn-northd.at |  87 ++--
> >>>  tests/ovn.at|  14 ++---
> >>>  4 files changed, 118 insertions(+), 122 deletions(-)
> >>>
> >>> diff --git a/northd/northd.c b/northd/northd.c
> >>> index 7e2681865..860641936 100644
> >>> --- a/northd/northd.c
> >>> +++ b/northd/northd.c
> >>> @@ -273,15 +273,15 @@ enum ovn_stage {
> >>>   * || REGBIT_ACL_HINT_{ALLOW_NEW/ALLOW/DROP/BLOCK} |   |  |
> >>>   * || REGBIT_ACL_LABEL | X |  |
> >>>   * ++--+ X |  |
> >>> - * | R1 | ORIG_DIP_IPV4 (>= IN_STATEFUL)   | R |  |
> >>> + * | R1 | ORIG_DIP_IPV4 (>= IN_PRE_STATEFUL)   | R |  |
> >>>   * ++--+ E |  |
> >>> - * | R2 | ORIG_TP_DPORT (>= IN_STATEFUL)   | G |  |
> >>> + * | R2 | ORIG_TP_DPORT (>= IN_PRE_STATEFUL)   | G |  |
> >>>   * ++--+ 0 |  |
> >>>   * | R3 |  ACL LABEL   |   |  |
> >>>   * ++--+---+--+
> >>>   * | R4 |   UNUSED |   |  |
> >>> - * ++--+ X | ORIG_DIP_IPV6  |
> >>> - * | R5 |   UNUSED | X | (>= IN_STATEFUL) |
> >>> + * ++--+ X | ORIG_DIP_IPV6(>= |
> >>> + * | R5 |   UNUSED | X | IN_PRE_STATEFUL) |
> >>>   * ++--+ R |  |
> >>>   * | R6 |   UNUSED | E |  |
> >>>   * ++--+ G |  |
> >>> @@

[ovs-dev] OVN branch-22.09 created; 22.09.0 release delayed

2022-08-31 Thread Mark Michelson

Hi everyone.

With 
https://patchwork.ozlabs.org/project/ovn/patch/20220819134550.3954822-1-mmich...@redhat.com/ 
being merged, I have created branch-22.09 of OVN.


The published release date for OVN 22.09.0 is this Friday, which is a 
very short turnaround time for release. Therefore, the release is going 
to be delayed until Friday 16 September. This gives us 2+ weeks for 
testing and bug fixing before release.


Thanks,
Mark Michelson



Re: [ovs-dev] [PATCH ovn] northd: don't add drop lflow if LB VIP matches LRP IP

2022-08-31 Thread Vladislav Odintsov
Please, add this tag before applying the patch:

Reported-at: 
https://mail.openvswitch.org/pipermail/ovs-discuss/2022-August/052021.html

Regards,
Vladislav Odintsov

> On 31 Aug 2022, at 16:06, Vladislav Odintsov  wrote:
> 
> If a Load Balancer needs to be created within an LR with a VIP that
> matches any of the LR's LRP IPs, there is no need to create a SNAT
> entry.  Now such traffic destined to the LRP IP is not dropped.
> 
> With this patch a drop lflow with match=(ipX.dst == {IP}) is not added
> to the lr_in_ip_input stage if the LRP IP matches an LB VIP associated
> with this LR.
> 
> Tests are added as well.
> 
> Signed-off-by: Vladislav Odintsov 
> ---
> NEWS|  3 ++
> northd/northd.c | 10 --
> tests/ovn-northd.at | 86 +
> 3 files changed, 96 insertions(+), 3 deletions(-)
> 
> diff --git a/NEWS b/NEWS
> index 0f12b6abf..98dc17dd3 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -18,6 +18,9 @@ Post v22.06.0
>   - Added MAC binding aging mechanism, that is disabled by default.
> It can be enabled per logical router with option
> "mac_binding_age_threshold".
> +  - If it is needed to create Load Balancer within LR with VIP, which matches
> +any of LR's LRP IP, there is no need to create SNAT entry.  Now such
> +traffic destined to LRP IP is not dropped.
> 
> OVN v22.06.0 - 03 Jun 2022
> --
> diff --git a/northd/northd.c b/northd/northd.c
> index 7e2681865..338091728 100644
> --- a/northd/northd.c
> +++ b/northd/northd.c
> @@ -10664,7 +10664,9 @@ build_lrouter_drop_own_dest(struct ovn_port *op, enum ovn_stage stage,
> const char *ip = op->lrp_networks.ipv4_addrs[i].addr_s;
> 
> bool router_ip_in_snat_ips = !!shash_find(&op->od->snat_ips, ip);
> -bool drop_router_ip = (drop_snat_ip == router_ip_in_snat_ips);
> +bool router_ip_in_lb_ips = !!sset_find(&op->od->lb_ips_v4, ip);
> +bool drop_router_ip = (drop_snat_ip == (router_ip_in_snat_ips ||
> +router_ip_in_lb_ips));
> 
> if (drop_router_ip) {
> ds_put_format(&match_ips, "%s, ", ip);
> @@ -10690,7 +10692,9 @@ build_lrouter_drop_own_dest(struct ovn_port *op, enum ovn_stage stage,
> const char *ip = op->lrp_networks.ipv6_addrs[i].addr_s;
> 
> bool router_ip_in_snat_ips = !!shash_find(&op->od->snat_ips, ip);
> -bool drop_router_ip = (drop_snat_ip == router_ip_in_snat_ips);
> +bool router_ip_in_lb_ips = !!sset_find(&op->od->lb_ips_v6, ip);
> +bool drop_router_ip = (drop_snat_ip == (router_ip_in_snat_ips ||
> +router_ip_in_lb_ips));
> 
> if (drop_router_ip) {
> ds_put_format(&match_ips, "%s, ", ip);
> @@ -12865,7 +12869,7 @@ build_lrouter_ipv4_ip_input(struct ovn_port *op,
>  * also a SNAT IP. Those are dropped later, in stage
>  * "lr_in_arp_resolve", if unSNAT was unsuccessful.
>  *
> - * If op->pd->lb_force_snat_router_ip is true, it means the IP of the
> + * If op->od->lb_force_snat_router_ip is true, it means the IP of the
>  * router port is also SNAT IP.
>  *
>  * Priority 60.
> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
> index 157f9f60c..a60b3b0a9 100644
> --- a/tests/ovn-northd.at
> +++ b/tests/ovn-northd.at
> @@ -1499,6 +1499,92 @@ AT_CHECK([grep "lr_in_unsnat" sbflows | sort], [0], 
> [dnl
> AT_CLEANUP
> ])
> 
> +OVN_FOR_EACH_NORTHD([
> +AT_SETUP([LRP same IP as VIP or SNAT])
> +ovn_start
> +
> +check ovn-nbctl lr-add lr0
> +check ovn-nbctl lrp-add lr0 lr0-public 00:00:00:00:00:10 192.168.0.1/24 
> 2000::1/64
> +check ovn-nbctl --wait=sb lrp-add lr0 lr0-join 00:00:00:00:00:20 
> 10.10.0.1/24 192.168.1.1/24
> +
> +ovn-sbctl dump-flows lr0 > sbflows
> +AT_CAPTURE_FILE([sbflows])
> +
> +# There should be drop lflows for all IPs of both LRPs
> +AT_CHECK([grep "lr_in_ip_input" sbflows | grep 'ip.\.dst == {' | grep drop | 
> sed 's/table=../table=??/g' | sort], [0], [dnl
> +  table=??(lr_in_ip_input ), priority=60   , match=(ip4.dst == 
> {10.10.0.1, 192.168.1.1}), action=(drop;)
> +  table=??(lr_in_ip_input ), priority=60   , match=(ip4.dst == 
> {192.168.0.1}), action=(drop;)
> +  table=??(lr_in_ip_input ), priority=60   , match=(ip6.dst == {2000::1, 
> fe80::200:ff:fe00:10}), action=(drop;)
> +  table=??(lr_in_ip_input ), priority=60   , match=(ip6.dst == 
> {fe80::200:ff:fe00:20}), action=(drop;)
> +])
> +
> +# create SNAT with external IP equal to LRP's IP
> +check ovn-nbctl --wait=sb lr-nat-add lr0 snat 192.168.0.1 10.10.0.0/24
> +
> +ovn-sbctl dump-flows lr0 > sbflows
> +AT_CAPTURE_FILE([sbflows])
> +
> +# There should be no drop lflow for 192.168.0.1
> +AT_CHECK([grep "lr_in_ip_input" sbflows | grep 'ip.\.dst == {' | grep drop | 
> sed 's/table=../table=??/g' | sort], [0], [dnl
> +  table=??(lr_in_ip_input ), priority=60   , 

[ovs-dev] [PATCH ovn] northd: don't add drop lflow if LB VIP matches LRP IP

2022-08-31 Thread Vladislav Odintsov
If a Load Balancer needs to be created within an LR with a VIP that
matches any of the LR's LRP IPs, there is no need to create a SNAT
entry.  Now such traffic destined to the LRP IP is not dropped.

With this patch a drop lflow with match=(ipX.dst == {IP}) is not added
to the lr_in_ip_input stage if the LRP IP matches an LB VIP associated
with this LR.

Tests are added as well.

Signed-off-by: Vladislav Odintsov 
---
 NEWS|  3 ++
 northd/northd.c | 10 --
 tests/ovn-northd.at | 86 +
 3 files changed, 96 insertions(+), 3 deletions(-)

diff --git a/NEWS b/NEWS
index 0f12b6abf..98dc17dd3 100644
--- a/NEWS
+++ b/NEWS
@@ -18,6 +18,9 @@ Post v22.06.0
   - Added MAC binding aging mechanism, that is disabled by default.
 It can be enabled per logical router with option
 "mac_binding_age_threshold".
+  - If it is needed to create Load Balancer within LR with VIP, which matches
+any of LR's LRP IP, there is no need to create SNAT entry.  Now such
+traffic destined to LRP IP is not dropped.
 
 OVN v22.06.0 - 03 Jun 2022
 --
diff --git a/northd/northd.c b/northd/northd.c
index 7e2681865..338091728 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -10664,7 +10664,9 @@ build_lrouter_drop_own_dest(struct ovn_port *op, enum ovn_stage stage,
 const char *ip = op->lrp_networks.ipv4_addrs[i].addr_s;
 
 bool router_ip_in_snat_ips = !!shash_find(&op->od->snat_ips, ip);
-bool drop_router_ip = (drop_snat_ip == router_ip_in_snat_ips);
+bool router_ip_in_lb_ips = !!sset_find(&op->od->lb_ips_v4, ip);
+bool drop_router_ip = (drop_snat_ip == (router_ip_in_snat_ips ||
+router_ip_in_lb_ips));
 
 if (drop_router_ip) {
 ds_put_format(&match_ips, "%s, ", ip);
@@ -10690,7 +10692,9 @@ build_lrouter_drop_own_dest(struct ovn_port *op, enum ovn_stage stage,
 const char *ip = op->lrp_networks.ipv6_addrs[i].addr_s;
 
 bool router_ip_in_snat_ips = !!shash_find(&op->od->snat_ips, ip);
-bool drop_router_ip = (drop_snat_ip == router_ip_in_snat_ips);
+bool router_ip_in_lb_ips = !!sset_find(&op->od->lb_ips_v6, ip);
+bool drop_router_ip = (drop_snat_ip == (router_ip_in_snat_ips ||
+router_ip_in_lb_ips));
 
 if (drop_router_ip) {
 ds_put_format(&match_ips, "%s, ", ip);
@@ -12865,7 +12869,7 @@ build_lrouter_ipv4_ip_input(struct ovn_port *op,
  * also a SNAT IP. Those are dropped later, in stage
  * "lr_in_arp_resolve", if unSNAT was unsuccessful.
  *
- * If op->pd->lb_force_snat_router_ip is true, it means the IP of the
+ * If op->od->lb_force_snat_router_ip is true, it means the IP of the
  * router port is also SNAT IP.
  *
  * Priority 60.
diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
index 157f9f60c..a60b3b0a9 100644
--- a/tests/ovn-northd.at
+++ b/tests/ovn-northd.at
@@ -1499,6 +1499,92 @@ AT_CHECK([grep "lr_in_unsnat" sbflows | sort], [0], [dnl
 AT_CLEANUP
 ])
 
+OVN_FOR_EACH_NORTHD([
+AT_SETUP([LRP same IP as VIP or SNAT])
+ovn_start
+
+check ovn-nbctl lr-add lr0
+check ovn-nbctl lrp-add lr0 lr0-public 00:00:00:00:00:10 192.168.0.1/24 
2000::1/64
+check ovn-nbctl --wait=sb lrp-add lr0 lr0-join 00:00:00:00:00:20 10.10.0.1/24 
192.168.1.1/24
+
+ovn-sbctl dump-flows lr0 > sbflows
+AT_CAPTURE_FILE([sbflows])
+
+# There should be drop lflows for all IPs of both LRPs
+AT_CHECK([grep "lr_in_ip_input" sbflows | grep 'ip.\.dst == {' | grep drop | 
sed 's/table=../table=??/g' | sort], [0], [dnl
+  table=??(lr_in_ip_input ), priority=60   , match=(ip4.dst == {10.10.0.1, 
192.168.1.1}), action=(drop;)
+  table=??(lr_in_ip_input ), priority=60   , match=(ip4.dst == 
{192.168.0.1}), action=(drop;)
+  table=??(lr_in_ip_input ), priority=60   , match=(ip6.dst == {2000::1, 
fe80::200:ff:fe00:10}), action=(drop;)
+  table=??(lr_in_ip_input ), priority=60   , match=(ip6.dst == 
{fe80::200:ff:fe00:20}), action=(drop;)
+])
+
+# create SNAT with external IP equal to LRP's IP
+check ovn-nbctl --wait=sb lr-nat-add lr0 snat 192.168.0.1 10.10.0.0/24
+
+ovn-sbctl dump-flows lr0 > sbflows
+AT_CAPTURE_FILE([sbflows])
+
+# There should be no drop lflow for 192.168.0.1
+AT_CHECK([grep "lr_in_ip_input" sbflows | grep 'ip.\.dst == {' | grep drop | 
sed 's/table=../table=??/g' | sort], [0], [dnl
+  table=??(lr_in_ip_input ), priority=60   , match=(ip4.dst == {10.10.0.1, 
192.168.1.1}), action=(drop;)
+  table=??(lr_in_ip_input ), priority=60   , match=(ip6.dst == {2000::1, 
fe80::200:ff:fe00:10}), action=(drop;)
+  table=??(lr_in_ip_input ), priority=60   , match=(ip6.dst == 
{fe80::200:ff:fe00:20}), action=(drop;)
+])
+
+check ovn-nbctl lr-nat-del lr0
+
+# create SNAT with external IPv6 equal to LRP's IPv6
+check ovn-nbctl --wait=sb lr-nat-add 

Re: [ovs-dev] [PATCH] github: Skip debian packaging for dpdk-latest.

2022-08-31 Thread Ilya Maximets
On 8/25/22 14:52, David Marchand wrote:
> Debian packaging target builds OVS against a packaged DPDK version.
> As a result, it is not relevant in the dpdk-latest branch which follows
> DPDK development branch.
> 
> Signed-off-by: David Marchand 
> ---
>  .github/workflows/build-and-test.yml | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/.github/workflows/build-and-test.yml 
> b/.github/workflows/build-and-test.yml
> index 58ab85e5d7..b555f45024 100644
> --- a/.github/workflows/build-and-test.yml
> +++ b/.github/workflows/build-and-test.yml
> @@ -198,6 +198,7 @@ jobs:
>  path: config.log
>  
>build-linux-deb:
> +if: ${{ github.ref != 'refs/heads/dpdk-latest' }}

Hmm.  This doesn't seem to work for the robot as it doesn't use
'dpdk-latest' as a branch name.  Maybe we should define the DPDK_VER
variable globally and check it instead?


BTW, not a problem of this patch, but I spotted this:

  - compiler: gcc
dpdk_shared:  dpdk_experimental

While it should be:

  - compiler:   gcc
dpdk_experimental:  yes

Otherwise, I don't think we're testing experimental APIs.

Best regards, Ilya Maximets.


Re: [ovs-dev] [PATCH] ofproto-dpif-xlate: Fix error messages for nonexistent ports/recirc_ids.

2022-08-31 Thread Eelco Chaudron

On 31 Aug 2022, at 12:25, Ilya Maximets wrote:

> If tnl_port_should_receive() is false, we're looking for a normal
> port, not a tunnel one.  And it's better to print recirculation IDs
> in hex since they are typically printed this way in flow dumps.
>
> Fixes: d40533fc820c ("odp-util: Improve log messages and error reporting for 
> Netlink parsing.")
> Signed-off-by: Ilya Maximets 

Changes look good to me with one small nit below. Compile and make check tested.

Acked-by: Eelco Chaudron 

//Eelco

> ---
>  ofproto/ofproto-dpif-xlate.c |  4 ++--
>  tests/ofproto-dpif.at| 14 ++
>  2 files changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> index fda802e83..5c3021765 100644
> --- a/ofproto/ofproto-dpif-xlate.c
> +++ b/ofproto/ofproto-dpif-xlate.c
> @@ -1515,7 +1515,7 @@ xlate_lookup_ofproto_(const struct dpif_backer *backer,
>  if (OVS_UNLIKELY(!recirc_id_node)) {
>  if (errorp) {
>  *errorp = xasprintf("no recirculation data for recirc_id "
> -"%"PRIu32, flow->recirc_id);
> +"%#"PRIx32, flow->recirc_id);
>  }
>  return NULL;
>  }
> @@ -1556,7 +1556,7 @@ xlate_lookup_ofproto_(const struct dpif_backer *backer,
>  if (errorp) {
>  *errorp = (tnl_port_should_receive(flow)
> ? xstrdup("no OpenFlow tunnel port for this packet")
> -   : xasprintf("no OpenFlow tunnel port for datapath "
> +   : xasprintf("no OpenFlow port for datapath "
> "port %"PRIu32, flow->in_port.odp_port));

nit: should we move "port" to the previous line now that we have room? It
makes grepping for strings easier.

Something like:

+   : xasprintf("no OpenFlow port for datapath port %"
+   PRIu32, flow->in_port.odp_port));

>  }
>  return NULL;
> diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at
> index 2e91ae1a1..f32a16981 100644
> --- a/tests/ofproto-dpif.at
> +++ b/tests/ofproto-dpif.at
> @@ -6393,6 +6393,20 @@ AT_CHECK([tail -2 stderr], [0], [dnl
>  ovs-appctl: ovs-vswitchd: server returned an error
>  ])
>
> +# Test incorrect command: ofproto/trace with nonexistent port number
> +AT_CHECK([ovs-appctl ofproto/trace ovs-dummy "in_port(42)" ], [2], [stdout], 
> [stderr])
> +AT_CHECK([tail -2 stderr], [0], [dnl
> +no OpenFlow port for datapath port 42
> +ovs-appctl: ovs-vswitchd: server returned an error
> +])
> +
> +# Test incorrect command: ofproto/trace with nonexistent recirc_id
> +AT_CHECK([ovs-appctl ofproto/trace ovs-dummy "recirc_id(0x42)" ], [2], 
> [stdout], [stderr])
> +AT_CHECK([tail -2 stderr], [0], [dnl
> +no recirculation data for recirc_id 0x42
> +ovs-appctl: ovs-vswitchd: server returned an error
> +])
> +
>  OVS_VSWITCHD_STOP
>  AT_CLEANUP
>
> -- 
> 2.34.3
>



[ovs-dev] [PATCH v2] tests: Use NETNS_DAEMONIZE instead of NS_CHECK_EXEC to start tcpdump.

2022-08-31 Thread Eelco Chaudron
Using NETNS_DAEMONIZE will start tcpdump in the background, and it will
also make sure it gets killed in corner cases.

For the check_pkt_len tests, we also kill tcpdump between individual
tests in the same test case to avoid confusion when analyzing results.

Fixes: 02dabb21f243 ("tests: Add check_pkt_len action test to system-offload-traffic.")
Suggested-by: david.march...@redhat.com
Signed-off-by: Eelco Chaudron 
---
v2:
  - Replaced NS_CHECK_EXEC with NETNS_DAEMONIZE for all tcpdump use cases.

 tests/system-offloads-traffic.at |   28 
 tests/system-traffic.at  |   22 +++---
 2 files changed, 27 insertions(+), 23 deletions(-)

diff --git a/tests/system-offloads-traffic.at b/tests/system-offloads-traffic.at
index 1e1012965..d9b815a5d 100644
--- a/tests/system-offloads-traffic.at
+++ b/tests/system-offloads-traffic.at
@@ -297,8 +297,8 @@ table=4,in_port=1,reg0=0x0 actions=output:4
 ])
 AT_CHECK([ovs-ofctl --protocols=OpenFlow10 add-flows br0 flows.txt])
 
-NS_CHECK_EXEC([at_ns3], [tcpdump -l -n -U -i p3 dst 10.1.1.2 and icmp > 
p3.pcap 2>/dev/null &])
-NS_CHECK_EXEC([at_ns4], [tcpdump -l -n -U -i p4 dst 10.1.1.2 and icmp > 
p4.pcap 2>/dev/null &])
+NETNS_DAEMONIZE([at_ns3], [tcpdump -l -n -U -i p3 dst 10.1.1.2 and icmp > 
p3.pcap 2>/dev/null], [tcpdump3.pid])
+NETNS_DAEMONIZE([at_ns4], [tcpdump -l -n -U -i p4 dst 10.1.1.2 and icmp > 
p4.pcap 2>/dev/null], [tcpdump4.pid])
 sleep 1
 
 NS_CHECK_EXEC([at_ns1], [ping -q -c 10 -i 0.1 -w 2 -s 64 10.1.1.2 | 
FORMAT_PING], [0], [dnl
@@ -325,10 +325,10 @@ AT_CHECK([test $(ovs-appctl upcall/show | grep -c 
"offloaded flows") -eq 0], [0]
 
 OVS_TRAFFIC_VSWITCHD_STOP
 
-AT_CHECK([cat p3.pcap | awk '{print $NF}' | uniq -c | awk '{$1=$1;print}'], 
[0], [dnl
+AT_CHECK([cat p3.pcap | awk 'NF{print $NF}' | uniq -c | awk '{$1=$1;print}'], 
[0], [dnl
 10 1032
 ])
-AT_CHECK([cat p4.pcap | awk '{print $NF}' | uniq -c | awk '{$1=$1;print}'], 
[0], [dnl
+AT_CHECK([cat p4.pcap | awk 'NF{print $NF}' | uniq -c | awk '{$1=$1;print}'], 
[0], [dnl
 10 72
 ])
 
@@ -355,8 +355,8 @@ table=4,in_port=1,reg0=0x0 actions=output:4
 ])
 AT_CHECK([ovs-ofctl --protocols=OpenFlow10 add-flows br0 flows.txt])
 
-NS_CHECK_EXEC([at_ns3], [tcpdump -l -n -U -i p3 dst 10.1.1.2 and icmp > 
p3.pcap 2>/dev/null &])
-NS_CHECK_EXEC([at_ns4], [tcpdump -l -n -U -i p4 dst 10.1.1.2 and icmp > 
p4.pcap 2>/dev/null &])
+NETNS_DAEMONIZE([at_ns3], [tcpdump -l -n -U -i p3 dst 10.1.1.2 and icmp > 
p3.pcap 2>/dev/null], [tcpdump3.pid])
+NETNS_DAEMONIZE([at_ns4], [tcpdump -l -n -U -i p4 dst 10.1.1.2 and icmp > 
p4.pcap 2>/dev/null], [tcpdump4.pid])
 sleep 1
 
 NS_CHECK_EXEC([at_ns1], [ping -q -c 10 -i 0.1 -w 2 -s 64 10.1.1.2 | 
FORMAT_PING], [0], [dnl
@@ -382,10 +382,12 @@ in_port(3),eth(),eth_type(0x0800),ipv4(frag=no), 
packets:19, bytes:11614, used:0
 AT_CHECK([ovs-appctl upcall/show | grep -E "offloaded flows : [[1-9]]"], [0], 
[ignore])
 
 sleep 1
-AT_CHECK([cat p3.pcap | awk '{print $NF}' | uniq -c | awk '{$1=$1;print}'], 
[0], [dnl
+kill $(cat tcpdump3.pid)
+kill $(cat tcpdump4.pid)
+AT_CHECK([cat p3.pcap | awk 'NF{print $NF}' | uniq -c | awk '{$1=$1;print}'], 
[0], [dnl
 10 1032
 ])
-AT_CHECK([cat p4.pcap | awk '{print $NF}' | uniq -c | awk '{$1=$1;print}'], 
[0], [dnl
+AT_CHECK([cat p4.pcap | awk 'NF{print $NF}' | uniq -c | awk '{$1=$1;print}'], 
[0], [dnl
 10 72
 ])
 
@@ -501,8 +503,8 @@ table=4,in_port=1,reg0=0x0 actions=mod_nw_tos:8,output:4
 
 AT_CHECK([ovs-ofctl --protocols=OpenFlow10 add-flows br0 flows.txt])
 
-NS_CHECK_EXEC([at_ns3], [tcpdump -l -n -U -i p3 dst 10.1.1.2 and icmp > 
p3_2.pcap 2>/dev/null &])
-NS_CHECK_EXEC([at_ns4], [tcpdump -l -n -U -i p4 dst 10.1.1.2 and icmp > 
p4_2.pcap 2>/dev/null &])
+NETNS_DAEMONIZE([at_ns3], [tcpdump -l -n -U -i p3 dst 10.1.1.2 and icmp > 
p3_2.pcap 2>/dev/null], [tcpdump3_2.pid])
+NETNS_DAEMONIZE([at_ns4], [tcpdump -l -n -U -i p4 dst 10.1.1.2 and icmp > 
p4_2.pcap 2>/dev/null], [tcpdump4_2.pid])
 sleep 1
 
 NS_CHECK_EXEC([at_ns1], [ping -q -c 10 -i 0.1 -w 2 -s 64 10.1.1.2 | 
FORMAT_PING], [0], [dnl
@@ -519,10 +521,12 @@ in_port(3),eth(),eth_type(0x0800),ipv4(frag=no), 
packets:20, bytes:11720, used:0
 ])
 
 sleep 1
-AT_CHECK([cat p3_2.pcap | awk '{print $NF}' | uniq -c | awk '{$1=$1;print}'], 
[0], [dnl
+kill $(cat tcpdump3_2.pid)
+kill $(cat tcpdump4_2.pid)
+AT_CHECK([cat p3_2.pcap | awk 'NF{print $NF}' | uniq -c | awk 
'{$1=$1;print}'], [0], [dnl
 10 1032
 ])
-AT_CHECK([cat p4_2.pcap | awk '{print $NF}' | uniq -c | awk '{$1=$1;print}'], 
[0], [dnl
+AT_CHECK([cat p4_2.pcap | awk 'NF{print $NF}' | uniq -c | awk 
'{$1=$1;print}'], [0], [dnl
 10 72
 ])
 
diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index f7f885036..3f76b3f19 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -1894,7 +1894,7 @@ dnl eth/ip/icmp --> OVS --> eth/mpls/eth/ip/icmp
 AT_CHECK([ovs-ofctl -Oopenflow13 add-flow br0 
"table=0,priority=100,dl_type=0x0800 

[ovs-dev] [PATCH V2 1/1] netdev-offload: Set 'miss_api_supported' to be under netdev

2022-08-31 Thread Eli Britstein via dev
Commit [1] introduced a flag in dpif-netdev level, to optimize
performance and avoid hw_miss_packet_recover() for devices with no such
support.
However, there is a race condition between traffic processing and
assigning a 'flow_api' object to the netdev. In such case, EOPNOTSUPP is
returned by netdev_hw_miss_packet_recover() in netdev-offload.c layer
because 'flow_api' is not yet initialized. As a result, the flag is
falsely disabled, and subsequent packets won't be recovered, though they
should.

In order to fix it, move the flag to be in netdev-offload layer, to
avoid that race.

[1]: 6e50c1651869 ("dpif-netdev: Avoid hw_miss_packet_recover() for devices with no support.")

Fixes: 6e50c1651869 ("dpif-netdev: Avoid hw_miss_packet_recover() for devices with no support.")
Signed-off-by: Eli Britstein 
---
 lib/dpif-netdev.c| 18 +++---
 lib/netdev-offload.c | 28 +++-
 lib/netdev-offload.h |  2 ++
 lib/netdev.c |  1 +
 4 files changed, 33 insertions(+), 16 deletions(-)

Revision history:
- v2: bool -> atomic_bool

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index a45b460145..2c08a71c8d 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -431,7 +431,6 @@ struct dp_netdev_rxq {
 unsigned intrvl_idx;   /* Write index for 'cycles_intrvl'. */
 struct dp_netdev_pmd_thread *pmd;  /* pmd thread that polls this queue. */
 bool is_vhost; /* Is rxq of a vhost port. */
-bool hw_miss_api_supported;/* hw_miss_packet_recover() supported.*/
 
 /* Counters of cycles spent successfully polling and processing pkts. */
 atomic_ullong cycles[RXQ_N_CYCLES];
@@ -5416,7 +5415,6 @@ port_reconfigure(struct dp_netdev_port *port)
 
 port->rxqs[i].port = port;
 port->rxqs[i].is_vhost = !strncmp(port->type, "dpdkvhost", 9);
-port->rxqs[i].hw_miss_api_supported = true;
 
 err = netdev_rxq_open(netdev, >rxqs[i].rx, i);
 if (err) {
@@ -8034,17 +8032,15 @@ dp_netdev_hw_flow(const struct dp_netdev_pmd_thread *pmd,
 #ifdef ALLOW_EXPERIMENTAL_API /* Packet restoration API required. */
 /* Restore the packet if HW processing was terminated before completion. */
 struct dp_netdev_rxq *rxq = pmd->ctx.last_rxq;
+bool miss_api_supported;
 
-if (rxq->hw_miss_api_supported) {
+atomic_read_relaxed(&rxq->port->netdev->hw_info.miss_api_supported,
+&miss_api_supported);
+if (miss_api_supported) {
 int err = netdev_hw_miss_packet_recover(rxq->port->netdev, packet);
-if (err) {
-if (err != EOPNOTSUPP) {
-COVERAGE_INC(datapath_drop_hw_miss_recover);
-return -1;
-} else {
-/* API unsupported by the port; avoid subsequent calls. */
-rxq->hw_miss_api_supported = false;
-}
+if (err && err != EOPNOTSUPP) {
+COVERAGE_INC(datapath_drop_hw_miss_recover);
+return -1;
 }
 }
 #endif
diff --git a/lib/netdev-offload.c b/lib/netdev-offload.c
index 9fde5f7a95..4592262bd3 100644
--- a/lib/netdev-offload.c
+++ b/lib/netdev-offload.c
@@ -183,6 +183,7 @@ netdev_assign_flow_api(struct netdev *netdev)
 CMAP_FOR_EACH (rfa, cmap_node, _flow_apis) {
 if (!rfa->flow_api->init_flow_api(netdev)) {
 ovs_refcount_ref(&rfa->refcnt);
+atomic_store_relaxed(&netdev->hw_info.miss_api_supported, true);
 ovsrcu_set(&netdev->flow_api, rfa->flow_api);
 VLOG_INFO("%s: Assigned flow API '%s'.",
   netdev_get_name(netdev), rfa->flow_api->type);
@@ -191,6 +192,7 @@ netdev_assign_flow_api(struct netdev *netdev)
 VLOG_DBG("%s: flow API '%s' is not suitable.",
  netdev_get_name(netdev), rfa->flow_api->type);
 }
+atomic_store_relaxed(&netdev->hw_info.miss_api_supported, false);
 VLOG_INFO("%s: No suitable flow API found.", netdev_get_name(netdev));
 
 return -1;
@@ -322,12 +324,28 @@ int
 netdev_hw_miss_packet_recover(struct netdev *netdev,
   struct dp_packet *packet)
 {
-const struct netdev_flow_api *flow_api =
-ovsrcu_get(const struct netdev_flow_api *, &netdev->flow_api);
+const struct netdev_flow_api *flow_api;
+bool miss_api_supported;
+int rv;
+
+atomic_read_relaxed(&netdev->hw_info.miss_api_supported,
+&miss_api_supported);
+if (!miss_api_supported) {
+return EOPNOTSUPP;
+}
+
+flow_api = ovsrcu_get(const struct netdev_flow_api *, &netdev->flow_api);
+if (!flow_api || !flow_api->hw_miss_packet_recover) {
+return EOPNOTSUPP;
+}
+
+rv = flow_api->hw_miss_packet_recover(netdev, packet);
+if (rv == EOPNOTSUPP) {
+/* API unsupported by the port; avoid subsequent calls. */
+atomic_store_relaxed(&netdev->hw_info.miss_api_supported, false);
+}
 
-return (flow_api && flow_api->hw_miss_packet_recover)
-? 

[ovs-dev] [PATCH] ofproto-dpif-xlate: Fix error messages for nonexistent ports/recirc_ids.

2022-08-31 Thread Ilya Maximets
If tnl_port_should_receive() is false, we're looking for a normal
port, not a tunnel one.  And it's better to print recirculation IDs
in hex since they are typically printed this way in flow dumps.

Fixes: d40533fc820c ("odp-util: Improve log messages and error reporting for Netlink parsing.")
Signed-off-by: Ilya Maximets 
---
 ofproto/ofproto-dpif-xlate.c |  4 ++--
 tests/ofproto-dpif.at| 14 ++
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index fda802e83..5c3021765 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -1515,7 +1515,7 @@ xlate_lookup_ofproto_(const struct dpif_backer *backer,
 if (OVS_UNLIKELY(!recirc_id_node)) {
 if (errorp) {
 *errorp = xasprintf("no recirculation data for recirc_id "
-"%"PRIu32, flow->recirc_id);
+"%#"PRIx32, flow->recirc_id);
 }
 return NULL;
 }
@@ -1556,7 +1556,7 @@ xlate_lookup_ofproto_(const struct dpif_backer *backer,
 if (errorp) {
 *errorp = (tnl_port_should_receive(flow)
? xstrdup("no OpenFlow tunnel port for this packet")
-   : xasprintf("no OpenFlow tunnel port for datapath "
+   : xasprintf("no OpenFlow port for datapath "
"port %"PRIu32, flow->in_port.odp_port));
 }
 return NULL;
diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at
index 2e91ae1a1..f32a16981 100644
--- a/tests/ofproto-dpif.at
+++ b/tests/ofproto-dpif.at
@@ -6393,6 +6393,20 @@ AT_CHECK([tail -2 stderr], [0], [dnl
 ovs-appctl: ovs-vswitchd: server returned an error
 ])
 
+# Test incorrect command: ofproto/trace with nonexistent port number
+AT_CHECK([ovs-appctl ofproto/trace ovs-dummy "in_port(42)" ], [2], [stdout], 
[stderr])
+AT_CHECK([tail -2 stderr], [0], [dnl
+no OpenFlow port for datapath port 42
+ovs-appctl: ovs-vswitchd: server returned an error
+])
+
+# Test incorrect command: ofproto/trace with nonexistent recirc_id
+AT_CHECK([ovs-appctl ofproto/trace ovs-dummy "recirc_id(0x42)" ], [2], 
[stdout], [stderr])
+AT_CHECK([tail -2 stderr], [0], [dnl
+no recirculation data for recirc_id 0x42
+ovs-appctl: ovs-vswitchd: server returned an error
+])
+
 OVS_VSWITCHD_STOP
 AT_CLEANUP
 
-- 
2.34.3



Re: [ovs-dev] [PATCH] tests: Kill tcpdump in check_pkt_len action tests.

2022-08-31 Thread Eelco Chaudron

On 30 Aug 2022, at 23:45, Ilya Maximets wrote:

> On 8/29/22 12:02, Eelco Chaudron wrote:
>> This change will terminate tcpdump between each test case to avoid
>> confusion when analyzing results.
>>
>> Fixes: 02dabb21f243 ("tests: Add check_pkt_len action test to 
>> system-offload-traffic.")
>> Suggested-by: David Marchand 
>> Signed-off-by: Eelco Chaudron 
>> ---
>>  tests/system-offloads-traffic.at |   15 +--
>>  1 file changed, 9 insertions(+), 6 deletions(-)
>>
>> diff --git a/tests/system-offloads-traffic.at 
>> b/tests/system-offloads-traffic.at
>> index 1e1012965..7260654b0 100644
>> --- a/tests/system-offloads-traffic.at
>> +++ b/tests/system-offloads-traffic.at
>> @@ -324,11 +324,12 @@ AT_CHECK([ovs-appctl dpctl/dump-flows 
>> type=tc,offloaded], [0], [])
>>  AT_CHECK([test $(ovs-appctl upcall/show | grep -c "offloaded flows") -eq 
>> 0], [0], [ignore])
>>
>>  OVS_TRAFFIC_VSWITCHD_STOP
>> +kill $(pidof tcpdump)
>
> Can we just use NETNS_DAEMONIZE() to start tcpdump in the first place?
> Should we also do that in all other system tests?

Yes, will change this to NETNS_DAEMONIZE() so we know the pid and only kill 
those tcpdump instances :)

Will also change the instances in system-traffic.

//Eelco




Re: [ovs-dev] [PATCH] sset, smap, hmapx: Reserve hash map space while cloning.

2022-08-31 Thread Ilya Maximets
On 8/30/22 09:18, Eelco Chaudron wrote:
> 
> 
> On 29 Aug 2022, at 22:52, Ilya Maximets wrote:
> 
>> This makes the clone a little bit faster by avoiding multiple
>> incremental expansions with re-allocations on big sets.
>>
>> Signed-off-by: Ilya Maximets 
> 
> Changes look good to me, only compile and make check tested.
> 
> Acked-by: Eelco Chaudron 
> 

Applied.  Thanks!

Best regards, Ilya Maximets.


Re: [ovs-dev] [PATCH] raft: Fix unnecessary periodic compactions.

2022-08-31 Thread Ilya Maximets
On 8/26/22 16:08, Dumitru Ceara wrote:
> On 8/20/22 01:08, Ilya Maximets wrote:
>> While creating a new database file and storing a new snapshot
>> into it, raft module by mistake updates the base offset for the
>> old file.  So, the base offset of a new file remains zero.  Then
>> the old file is getting replaced with the new one, copying new
>> offsets as well.  In the end, after a full compaction, base offset
>> is always zero.  And any offset is twice as large as zero.  That
>> triggers a new compaction again at the earliest scheduled time.
>> In practice this issue triggers compaction every 10-20 minutes
>> regardless of the database load, after the first one is triggered
>> by the actual file growth or by the 24h maximum limit.
>>
>> Fixes: 1b1d2e6daa56 ("ovsdb: Introduce experimental support for clustered 
>> databases.")
>> Reported-at: 
>> https://mail.openvswitch.org/pipermail/ovs-discuss/2022-August/051977.html
>> Reported-by: Oleksandr Mykhalskyi 
>> Signed-off-by: Ilya Maximets 
>> ---
>>  ovsdb/raft.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/ovsdb/raft.c b/ovsdb/raft.c
>> index b2c21e70f..b2361b173 100644
>> --- a/ovsdb/raft.c
>> +++ b/ovsdb/raft.c
>> @@ -4039,7 +4039,7 @@ raft_write_snapshot(struct raft *raft, struct 
>> ovsdb_log *log,
>>  if (error) {
>>  return error;
>>  }
>> -ovsdb_log_mark_base(raft->log);
>> +ovsdb_log_mark_base(log);
>>  
>>  /* Write log records. */
>>  for (uint64_t index = new_log_start; index < raft->log_end; index++) {
> 
> Nice!
> 
> Acked-by: Dumitru Ceara 
> 

Thanks!  Applied and backported down to 2.13.

Best regards, Ilya Maximets.
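The arithmetic of the failure mode is easy to demonstrate with a toy
check (not the actual raft code; the real logic compares the current
log size against the size recorded at the last compaction):

    #include <stdbool.h>
    #include <stdio.h>

    /* Doubling heuristic: compact once the log has at least doubled
     * since the last snapshot.  A base stuck at 0 always passes. */
    static bool
    needs_compaction(long base_bytes, long now_bytes)
    {
        return now_bytes >= 2 * base_bytes;
    }

    int main(void)
    {
        printf("base=0:   %d\n", needs_compaction(0, 123));        /* 1 */
        printf("base=1MB: %d\n", needs_compaction(1 << 20, 123));  /* 0 */
        return 0;
    }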


Re: [ovs-dev] [PATCH] netdev-offload-tc: Parse tunnel options only for geneve ports.

2022-08-31 Thread Ilya Maximets
On 8/28/22 10:45, Roi Dayan wrote:
> 
> 
> On 2022-08-19 10:51 PM, Ilya Maximets wrote:
>> Cited commit correctly fixed the issue of incorrect reporting
>> of zero-length geneve option matches as wildcarded.  But as a
>> side effect, exact match on the metadata length was added to
>> every tunnel flow match, even if the tunnel is not geneve.
>> That doesn't generate any functional issues, but it may be
>> confusing to see 'tunnel(...,geneve(),...)' while looking at
>> datapath flow dumps for, e.g., vxlan tunnel flows.
>>
>> Fix that by checking the port type before parsing geneve options.
>> tunnel() attribute itself doesn't have enough information to
>> figure out the tunnel type.
>>
>> Fixes: 7a6c8074c5d2 ("netdev-offload-tc: Fix the mask for tunnel metadata 
>> length.")
>> Signed-off-by: Ilya Maximets 
>> ---
>>   lib/netdev-offload-tc.c | 18 --
>>   1 file changed, 12 insertions(+), 6 deletions(-)
>>
> 
> Reviewed-by: Roi Dayan 

Thanks, Roi and Eelco!   Applied.  Backported down to 2.17.

Best regards, Ilya Maximets.


Re: [ovs-dev] [PATCH] ovsdb: Don't store rows that didn't change in transaction history.

2022-08-31 Thread Ilya Maximets
On 8/26/22 15:46, Dumitru Ceara wrote:
> On 8/4/22 18:37, Ilya Maximets wrote:
>> Transaction history is used to construct database updates for clients.
>> But if the row didn't change it will never be used for monitor updates,
>> because ovsdb_monitor_changes_classify() will always return
>> OVSDB_CHANGES_NO_EFFECT.  So, ovsdb_monitor_history_change_cb()
>> will never add it to the update.
>>
>> This condition is very common for rows with references.  While
>> processing strong references in ovsdb_txn_adjust_atom_refs(), the
>> whole destination row is cloned into the transaction just to update
>> the reference counter.  If the row is not changed later in the
>> transaction, it stays in that state and is added to the transaction
>> history.  Since the data didn't change, the 'old' and 'new' datums
>> are identical and equal to the one in the database.  So, we keep
>> two copies of the same row in memory and never use them.  In this
>> case, we should simply not add them to the transaction history in
>> the first place.
>>
>> This change should save some space in the transaction history for
>> transactions on rows with a large number of strong references.  It
>> should also speed up processing, since we no longer clone these
>> rows for the transaction history or count their atoms.
>>
>> Testing shows about 5-10% performance improvement in ovn-heater
>> test scenarios.
>>
>> The transaction's 'n_atoms' counter is adjusted to count only
>> changed rows, so the number of atoms in the history is accurate.
>>
>> Signed-off-by: Ilya Maximets 
>> ---
> 
> Looks good to me, thanks!
> 
> Acked-by: Dumitru Ceara 

Applied.  Thanks!

Best regards, Ilya Maximets.
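
The essence of the optimization is a no-op test before a row is copied
into the history.  A sketch under assumed names (the datum comparison
and row layout are simplified from ovsdb/transaction.c):

    #include <stdbool.h>

    struct ovsdb_datum;                             /* Opaque here. */
    bool datums_equal(const struct ovsdb_datum *,
                      const struct ovsdb_datum *);  /* Illustrative. */

    struct txn_row {
        const struct ovsdb_datum *old;  /* NULL for an inserted row. */
        const struct ovsdb_datum *new;  /* NULL for a deleted row. */
    };

    /* A row cloned only to bump a strong-reference counter ends up
     * with identical 'old' and 'new' data.  Monitors classify such a
     * change as OVSDB_CHANGES_NO_EFFECT, so keeping it in the history
     * wastes memory and atom counting; skip it instead. */
    static bool
    txn_row_worth_keeping(const struct txn_row *r)
    {
        return !r->old || !r->new || !datums_equal(r->old, r->new);
    }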


Re: [ovs-dev] [PATCH 1/2] netlink-conntrack: Do not fail to parse if optional TCP protocol attributes are not found.

2022-08-31 Thread Ilya Maximets
On 8/4/22 18:07, Paolo Valerio wrote:
> Some of the CTA_PROTOINFO_TCP nested attributes are not always
> included in the received message, but the parsing logic considers
> them required and fails when they are not found.
> 
> This was observed while monitoring some connections by reading the
> events sent by conntrack:
> 
> ./ovstest test-netlink-conntrack monitor
> [...]
> 2022-08-04T09:39:02Z|7|netlink_conntrack|ERR|Could not parse nested TCP protoinfo
>   options. Possibly incompatible Linux kernel version.
> 2022-08-04T09:39:02Z|8|netlink_notifier|WARN|unexpected netlink message contents
> [...]
> 
> All the TCP DELETE/DESTROY events fail to parse with the message
> above.
> 
> Fix it by marking the relevant attributes as optional.
> 
> Signed-off-by: Paolo Valerio 
> ---
> - [1] is the related piece of code that skips flags and wscale for the
>   destroy events.
> 
> [1] 
> https://github.com/torvalds/linux/blob/master/net/netfilter/nf_conntrack_proto_tcp.c#L1202
> ---
>  lib/netlink-conntrack.c |   45 +++--
>  1 file changed, 27 insertions(+), 18 deletions(-)

Thanks!  I applied this one patch.

Best regards, Ilya Maximets.
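
The pattern of the fix, sketched with a simplified policy table (OVS's
real struct nl_policy has more fields, and the attribute indexes below
are illustrative): attributes the kernel omits on DESTROY events become
optional, and their slots in the parsed output may be NULL.

    #include <stdbool.h>

    /* Simplified model of an OVS netlink attribute policy. */
    enum nl_attr_type { NL_A_U8, NL_A_BE16 };

    struct nl_policy {
        enum nl_attr_type type;
        bool optional;      /* If true, parsing succeeds when absent. */
    };

    enum {
        TCP_STATE,          /* Always present. */
        TCP_WSCALE_ORIG,    /* Omitted by the kernel on DESTROY. */
        TCP_WSCALE_REPLY,   /* Likewise. */
        TCP_FLAGS_ORIG,     /* Likewise. */
        TCP_MAX
    };

    static const struct nl_policy tcp_protoinfo_policy[TCP_MAX] = {
        [TCP_STATE]        = { .type = NL_A_U8,   .optional = false },
        [TCP_WSCALE_ORIG]  = { .type = NL_A_U8,   .optional = true  },
        [TCP_WSCALE_REPLY] = { .type = NL_A_U8,   .optional = true  },
        [TCP_FLAGS_ORIG]   = { .type = NL_A_BE16, .optional = true  },
    };

    /* After a successful parse, optional entries may be NULL in the
     * output attribute array and must be checked before use. */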


Re: [ovs-dev] [PATCH] python-c-ext: Use designated initializers for type and module.

2022-08-31 Thread Ilya Maximets
On 8/26/22 15:27, Dumitru Ceara wrote:
> On 7/22/22 21:19, Ilya Maximets wrote:
>> Python documentation suggests doing so "to avoid listing all the
>> PyTypeObject fields that you don't care about and also to avoid
>> caring about the fields' declaration order".  And that does make
>> sense.
>>
>> Signed-off-by: Ilya Maximets 
>> ---
> 
> Aside from the 0-day bot style issue this looks good to me:
> 
> Acked-by: Dumitru Ceara 

Thanks!  I added the missed space and applied the patch.

Best regards, Ilya Maximets.
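
For readers unfamiliar with the pattern, here is what designated
initialization of a PyTypeObject looks like (the type and member names
are illustrative, not the actual ovs C extension):

    #include <Python.h>

    typedef struct {
        PyObject_HEAD
        /* ... extension-specific state ... */
    } example_object;

    /* Only the fields we care about are listed, in any order; the
     * rest are zero-initialized.  A positional initializer would have
     * to enumerate every PyTypeObject slot in declaration order and
     * would break silently if CPython ever reordered them. */
    static PyTypeObject example_type = {
        PyVarObject_HEAD_INIT(NULL, 0)
        .tp_name = "example.Example",
        .tp_basicsize = sizeof(example_object),
        .tp_flags = Py_TPFLAGS_DEFAULT,
        .tp_doc = PyDoc_STR("Minimal example type"),
        .tp_new = PyType_GenericNew,
    };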


Re: [ovs-dev] [PATCH ovn v2] Don't blindly save original dst IP and Port to avoid megaflow unwildcarding.

2022-08-31 Thread Ilya Maximets
On 8/31/22 01:32, Han Zhou wrote:
> 
> 
On Tue, Aug 30, 2022 at 9:35 AM Ilya Maximets wrote:
>>
>> On 8/24/22 08:40, Han Zhou wrote:
>> > The ls_in_pre_stateful priority 120 flow that saves dst IP and Port to
>> > registers is causing a critical dataplane performance impact to
>> > short-lived connections, because it unwildcards megaflows with exact
>> > match on dst IP and L4 ports. Any new connections with a different
>> > client side L4 port will encounter datapath flow miss and upcall to
>> > ovs-vswitchd, which makes typical use cases such as HTTP1.0 based
>> > RESTful API calls suffer big performance degradations.
>> >
>> > These fields (dst IP and port) were saved to registers to solve a
>> > problem of LB hairpin use case when different VIPs are sharing
>> > overlapping backend+port [0]. The change [0] might not have had as wide
>> > a performance impact at the time because one of the match
>> > condition "REGBIT_CONNTRACK_NAT == 1" was set only for established and
>> > natted traffic, while now the impact is more obvious because
>> > REGBIT_CONNTRACK_NAT is now set for all IP traffic (if any VIP
>> > configured on the LS) since commit [1], after several other indirectly
>> > related optimizations and refactors.
>> >
>> > This patch fixes the problem by modifying the priority-120 flows in
>> > ls_in_pre_stateful. Instead of blindly saving dst IP and L4 port for any
>> > traffic with the REGBIT_CONNTRACK_NAT == 1, we now save dst IP and L4
>> > port only for traffic matching the LB VIPs, because these are the ones
>> > that need to be saved for the hairpin purpose. The existing priority-110
>> > flows will match the rest of the traffic just like before, but without
>> > saving dst IP and L4 port, so any server->client traffic would not
>> > unwildcard megaflows with client side L4 ports.
>>
>> Hmm, but if higher priority flows have matches on these fields, datapath
>> flows will have them unwildcarded anyway.  So, why exactly that is better
>> than the current approach?
>>
> Hi Ilya,
> 
> The problem of the current approach is that it blindly saves the L4 dst port 
> for any traffic in any direction, as long as there are VIPs configured on the 
> datapath.
> So consider the most typical scenario of a client sending API requests to 
> server backends behind a VIP. On the server side, any *reply* packets would 
> hit the flow that saves the client side L4 port because for server->client 
> direction the client port is the dst. If the client sends 10 requests, each 
> with a different source port, the server side will end up with unwildcarded 
> DP flows like below: (192.168.1.2 is client IP)
> recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst=192.168.1.2,proto=6,frag=no),tcp(dst=51224),
>   packets:5, bytes:2475, used:1.118s, flags:FP., actions:ct(zone=8,nat),recirc(0x20)
> recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst=192.168.1.2,proto=6,frag=no),tcp(dst=51226),
>   packets:5, bytes:2475, used:1.105s, flags:FP., actions:ct(zone=8,nat),recirc(0x21)
> recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst=192.168.1.2,proto=6,frag=no),tcp(dst=37798),
>   packets:5, bytes:2475, used:0.574s, flags:FP., actions:ct(zone=8,nat),recirc(0x40)
> recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst=192.168.1.2,proto=6,frag=no),tcp(dst=51250),
>   packets:5, bytes:2475, used:0.872s, flags:FP., actions:ct(zone=8,nat),recirc(0x2d)
> recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst=192.168.1.2,proto=6,frag=no),tcp(dst=46940),
>   packets:5, bytes:2475, used:0.109s, flags:FP., actions:ct(zone=8,nat),recirc(0x60)
> recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst=192.168.1.2,proto=6,frag=no),tcp(dst=46938),
>   packets:5, bytes:2475, used:0.118s, flags:FP., actions:ct(zone=8,nat),recirc(0x5f)
> recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst=192.168.1.2,proto=6,frag=no),tcp(dst=51236),
>   packets:5, bytes:2475, used:0.938s, flags:FP., actions:ct(zone=8,nat),recirc(0x26)
> ...
> 
> As a result, DP flows explode and every new request is going to be a miss and 
> upcall to userspace, which is very inefficient. Even worse, as the flow is so 
> generic, even traffic unrelated to the VIP would have the same impact, as 
> long as a server on a LS with any VIP configuration is replying client 
> requests.
> With the fix, only the client->VIP packets would hit such flows, and in those 
> cases the dst port is the server (well known) port, which is expected to be 
> matched in megaflows anyway, while the client 

Re: [ovs-dev] [PATCH ovn v2] Don't blindly save original dst IP and Port to avoid megaflow unwildcarding.

2022-08-31 Thread Dumitru Ceara
On 8/31/22 02:17, Han Zhou wrote:
> On Tue, Aug 30, 2022 at 8:18 AM Dumitru Ceara  wrote:
>>
>> Hi Han,
>>
>> On 8/24/22 08:40, Han Zhou wrote:
>>> The ls_in_pre_stateful priority 120 flow that saves dst IP and Port to
>>> registers is causing a critical dataplane performance impact to
>>> short-lived connections, because it unwildcards megaflows with exact
>>> match on dst IP and L4 ports. Any new connections with a different
>>> client side L4 port will encounter datapath flow miss and upcall to
>>> ovs-vswitchd, which makes typical use cases such as HTTP1.0 based
>>> RESTful API calls suffer big performance degradations.
>>>
>>> These fields (dst IP and port) were saved to registers to solve a
>>> problem of LB hairpin use case when different VIPs are sharing
>>> overlapping backend+port [0]. The change [0] might not have had as wide
>>> a performance impact at the time because one of the match
>>> condition "REGBIT_CONNTRACK_NAT == 1" was set only for established and
>>> natted traffic, while now the impact is more obvious because
>>> REGBIT_CONNTRACK_NAT is now set for all IP traffic (if any VIP
>>> configured on the LS) since commit [1], after several other indirectly
>>> related optimizations and refactors.
>>>
>>> This patch fixes the problem by modifying the priority-120 flows in
>>> ls_in_pre_stateful. Instead of blindly saving dst IP and L4 port for any
>>> traffic with the REGBIT_CONNTRACK_NAT == 1, we now save dst IP and L4
>>> port only for traffic matching the LB VIPs, because these are the ones
>>> that need to be saved for the hairpin purpose. The existing priority-110
>>> flows will match the rest of the traffic just like before, but without
>>> saving dst IP and L4 port, so any server->client traffic would not
>>> unwildcard megaflows with client side L4 ports.
>>>
>>
>> Just to be 100% sure, the client->server traffic will still generate up
>> to V x H flows where V is "the number of VIP:port tuples" and H is "the
>> number of potential (dp_)hash values", right?
> 
> Yes, if all the VIPs and backends are being accessed at the same time. This
> is expected. For any given pair of client<->server, regardless of the
> connections (source port number changes), the packets should match the same
> mega-flow and be handled by fastpath (instead of going to userspace, which
> is the behavior before this patch).
> 

Thanks for the confirmation!

>>
>>> [0] ce0ef8d59850 ("Properly handle hairpin traffic for VIPs with shared backends.")
>>> [1] 0038579d1928 ("northd: Optimize ct nat for load balancer traffic.")
>>>
>>> Signed-off-by: Han Zhou 
>>> ---
>>> v1 -> v2: Add the missing changes for ovn-northd.8.xml which I forgot
>>> to commit before sending v1.
>>>
>>>  northd/northd.c | 125 +---
>>>  northd/ovn-northd.8.xml |  14 ++---
>>>  tests/ovn-northd.at |  87 ++--
>>>  tests/ovn.at|  14 ++---
>>>  4 files changed, 118 insertions(+), 122 deletions(-)
>>>
>>> diff --git a/northd/northd.c b/northd/northd.c
>>> index 7e2681865..860641936 100644
>>> --- a/northd/northd.c
>>> +++ b/northd/northd.c
>>> @@ -273,15 +273,15 @@ enum ovn_stage {
>>>   * || REGBIT_ACL_HINT_{ALLOW_NEW/ALLOW/DROP/BLOCK} |   |  |
>>>   * || REGBIT_ACL_LABEL | X |  |
>>>   * ++--+ X |  |
>>> - * | R1 | ORIG_DIP_IPV4 (>= IN_STATEFUL)   | R |  |
>>> + * | R1 | ORIG_DIP_IPV4 (>= IN_PRE_STATEFUL)   | R |  |
>>>   * ++--+ E |  |
>>> - * | R2 | ORIG_TP_DPORT (>= IN_STATEFUL)   | G |  |
>>> + * | R2 | ORIG_TP_DPORT (>= IN_PRE_STATEFUL)   | G |  |
>>>   * ++--+ 0 |  |
>>>   * | R3 |  ACL LABEL   |   |  |
>>>   * ++--+---+--+
>>>   * | R4 |   UNUSED |   |  |
>>> - * ++--+ X | ORIG_DIP_IPV6  |
>>> - * | R5 |   UNUSED | X | (>= IN_STATEFUL) |
>>> + * ++--+ X | ORIG_DIP_IPV6(>= |
>>> + * | R5 |   UNUSED | X | IN_PRE_STATEFUL) |
>>>   * ++--+ R |  |
>>>   * | R6 |   UNUSED | E |  |
>>>   * ++--+ G |  |
>>> @@ -5899,43 +5899,17 @@ build_pre_stateful(struct ovn_datapath *od,
>>>  ovn_lflow_add(lflows, od, S_SWITCH_IN_PRE_STATEFUL, 0, "1", "next;");
>>>  ovn_lflow_add(lflows, od, S_SWITCH_OUT_PRE_STATEFUL, 0, "1", "next;");
>>>
>>> -const char *ct_lb_action = 

Re: [ovs-dev] [PATCH ovn v2 2/2] Prepare for post-22.09.0

2022-08-31 Thread Numan Siddique
On Sat, Aug 20, 2022 at 1:09 AM Ilya Maximets  wrote:
>
> On 8/19/22 15:45, Mark Michelson wrote:
> > Signed-off-by: Mark Michelson 
> > ---
> >  NEWS | 3 +++
> >  configure.ac | 2 +-
> >  debian/changelog | 6 ++
> >  3 files changed, 10 insertions(+), 1 deletion(-)
>
> LGTM,
> Acked-by: Ilya Maximets 

Looks like the branch-22.09 is not yet created.

For the entire series :
Acked-by: Numan Siddique 

Numan
