Re: [ovs-dev] [PATCH v4] datapath: Add a new action dec_ttl

2021-05-26 Thread Matteo Croce
On Wed, May 26, 2021 at 2:50 PM Eelco Chaudron  wrote:
>
>
>
> On 26 May 2021, at 14:46, Matteo Croce wrote:
>
> > On Wed, May 26, 2021 at 2:34 PM Eelco Chaudron  wrote:
> >> On 18 May 2021, at 20:15, Ilya Maximets wrote:
> >>>> +/* Tests whether 'dpif' datapath supports decrement of the IP TTL via
> >>>> + * OVS_ACTION_DEC_TTL. */
> >>>> +static bool
> >>>> +check_dec_ttl_action(struct dpif *dpif)
> >>>> +{
> >>>> +struct odputil_keybuf keybuf;
> >>>> +struct flow flow = { 0 };
> >>>
> >>> It's probably better to just memset it as in other similar functions
> >>> to avoid compiler complaints.
> >>
> >> ACK, will use a memset here.
> >>
> >
> > Which complaint? Memset is a bit slower because it also clears the
> > struct padding.
>
> See the robot’s responses to this patchset:
>
> ofproto/ofproto-dpif.c:1195:12: error: missing braces around initializer 
> [-Werror=missing-braces]
>  struct flow flow = { 0 };
>
> But I guess memset is ok, as other similar functions use it.
>
> >
> > --
> > per aspera ad upstream
>

With GCC/Clang you can just do (the empty initializer is a GNU extension, standardized only in C23):

struct flow flow = { };

-- 
per aspera ad upstream
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v4] datapath: Add a new action dec_ttl

2021-05-26 Thread Matteo Croce
On Wed, May 26, 2021 at 2:34 PM Eelco Chaudron  wrote:
> On 18 May 2021, at 20:15, Ilya Maximets wrote:
> >> +/* Tests whether 'dpif' datapath supports decrement of the IP TTL via
> >> + * OVS_ACTION_DEC_TTL. */
> >> +static bool
> >> +check_dec_ttl_action(struct dpif *dpif)
> >> +{
> >> +struct odputil_keybuf keybuf;
> >> +struct flow flow = { 0 };
> >
> > It's probably better to just memset it as in other similar functions
> > to avoid compiler complaints.
>
> ACK, will use a memset here.
>

Which complaint? Memset is a bit slower because it also clears the
struct padding.

-- 
per aspera ad upstream


Re: [ovs-dev] [PATCH v3] datapath: Add a new action dec_ttl

2021-02-15 Thread Matteo Croce
On Tue, Nov 24, 2020 at 11:43 AM Eelco Chaudron  wrote:
>
> Add support for the dec_ttl action. Instead of programming the datapath with
> a flow that matches the packet TTL and an IP set, use a single dec_ttl action.
>
> The old behavior is kept if the new action is not supported by the datapath.
>
>   # ovs-ofctl dump-flows br0
>cookie=0x0, duration=12.538s, table=0, n_packets=4, n_bytes=392, ip 
> actions=dec_ttl,NORMAL
>cookie=0x0, duration=12.536s, table=0, n_packets=4, n_bytes=168, 
> actions=NORMAL
>
>   # ping -c1 -t 20 192.168.0.2
>   PING 192.168.0.2 (192.168.0.2) 56(84) bytes of data.
>   IP (tos 0x0, ttl 19, id 45336, offset 0, flags [DF], proto ICMP (1), length 
> 84)
>   192.168.0.1 > 192.168.0.2: ICMP echo request, id 8865, seq 1, length 64
>
> Linux netlink datapath support depends on upstream Linux commit:
>   744676e77720 ("openvswitch: add TTL decrement action")
>
>
> Note that in the Linux kernel tree the OVS_ACTION_ATTR_ADD_MPLS has been
> defined, and to make sure the IDs are in sync, it had to be added to the
> OVS source tree. This required some additional case statements, which
> should be revisited once the OVS implementation is added.
>
>
> Co-developed-by: Matteo Croce 
> Co-developed-by: Bindiya Kurle 
> Signed-off-by: Eelco Chaudron 
>

Acked-by: Matteo Croce 

-- 
per aspera ad upstream


Re: [ovs-dev] [PATCH v10] dpif-netlink: distribute polling to discrete handlers

2020-11-26 Thread Matteo Croce
On Thu, Nov 26, 2020 at 9:25 PM Paolo Valerio  wrote:
>
> Hi Mark,
>
> Mark Gray  writes:
>
> > From: Aaron Conole 
> >
> > Currently, the channel handlers are polled globally.  On some
> > systems, this causes a thundering herd issue where multiple
> > handler threads become active, only to do no work and immediately
> > sleep.
> >
> > > The approach here is to push the netlink socket channels to discrete
> > handler threads to process, rather than polling on every thread.
> > This will eliminate the need to wake multiple threads.
> >
> > To check:
> >
> >   ip netns add left
> >   ip netns add right
> >   ip link add center-left type veth peer name left0 netns left
> >   ip link add center-right type veth peer name right0 netns right
> >   ip link set center-left up
> >   ip link set center-right up
> >   ip -n left ip link set left0 up
> >   ip -n left ip addr add 172.31.110.10/24 dev left0
> >   ip -n right ip link set right0 up
> >   ip -n right ip addr add 172.31.110.11/24 dev right0
> >
>
> I think the "ip -n" sequence should be like:
>
> ip -n left link set left0 up
>
> Paolo
>

Right, good catch!

-- 
Matteo Croce

perl -e 'for($t=0;;$t++){print chr($t*($t>>8|$t>>13)&255)}' |aplay


Re: [ovs-dev] [PATCH net] net: openvswitch: fix TTL decrement action netlink message format

2020-11-23 Thread Matteo Croce
On Fri, Nov 20, 2020 at 10:12 PM Jakub Kicinski  wrote:
>
> On Thu, 19 Nov 2020 04:04:04 -0500 Eelco Chaudron wrote:
> > Currently, the openvswitch module is not accepting the correctly formatted
> > netlink message for the TTL decrement action. For both setting and getting
> > the dec_ttl action, the actions should be nested in the
> > OVS_DEC_TTL_ATTR_ACTION attribute as mentioned in the openvswitch.h uapi.
>
> IOW this change will not break any known user space, correct?
>
> But existing OvS user space already expects it to work like you
> make it work now?
>
> What's the harm in leaving it as is?
>
> > Fixes: 744676e77720 ("openvswitch: add TTL decrement action")
> > Signed-off-by: Eelco Chaudron 
>
> Can we get a review from OvS folks? Matteo looks good to you (as the
> original author)?
>

Hi,

I think that userspace still has to implement the dec_ttl action;
for now dec_ttl is implemented with set_ttl().
So there is no breakage yet.

Eelco, with this fix, will we encode the netlink attribute in the same
way for both the kernel and netdev datapaths?

If so, go for it.


> > - err = __ovs_nla_copy_actions(net, attr, key, sfa, eth_type,
> > + err = __ovs_nla_copy_actions(net, actions, key, sfa, eth_type,
> >vlan_tci, mpls_label_count, log);
> >   if (err)
> >   return err;
>
> You're not canceling any nests on error, I assume this is normal.
>
> > + add_nested_action_end(*sfa, action_start);
> >   add_nested_action_end(*sfa, start);
> >   return 0;
> >  }
>


-- 
per aspera ad upstream


Re: [ovs-dev] [PATCH v2] datapath: Add a new action dec_ttl

2020-11-13 Thread Matteo Croce
On Fri, Nov 13, 2020 at 3:28 PM Eelco Chaudron  wrote:
>
> It is already in the mainline kernel, so changing it now would break the
> UAPI. Don't think this is allowed from the kernel side.
>
> > What I'm suggesting is to send a bugfix to kernel
> > to accept only format with nested OVS_DEC_TTL_ATTR_ACTION.  Since this
> > feature was never implemented in userspace OVS, this change will not
> > break anything.  On the OVS side we should always format netlink
> > messages
> > in a correct way.  We have a code that checks feature existence in
> > kernel
> > and it should fail if kernel is broken (as it is right now).  In this
> > case
> > OVS will continue to use old implementation with setting the ttl
> > field.
> >
> > Since the patch for a kernel is a bug fix, it should be likely
> > backported
> > to stable versions, and distributions will pick it up too.
>
> If the kernel UAPI breakage is not a problem, I think this would work.
>
> > Thoughts?
> >
> > Best regards, Ilya Maximets.
>

Who is the only user of the kernel dec_ttl action now?
I guess only a patched vswitchd would be, unless someone tries to
program the datapath by hand.

Cheers,



--
per aspera ad upstream


Re: [ovs-dev] [PATCH] datapath: Add a new action dec_ttl

2020-11-11 Thread Matteo Croce
On Wed, Nov 11, 2020 at 4:13 PM Eelco Chaudron  wrote:
>
> Add support for the dec_ttl action. Instead of programming the datapath with
> a flow that matches the packet TTL and an IP set, use a single dec_ttl action.
>
> The old behavior is kept if the new action is not supported by the datapath.
>
>   # ovs-ofctl dump-flows br0
>cookie=0x0, duration=12.538s, table=0, n_packets=4, n_bytes=392, ip 
> actions=dec_ttl,NORMAL
>cookie=0x0, duration=12.536s, table=0, n_packets=4, n_bytes=168, 
> actions=NORMAL
>
>   # ping -c1 -t 20 192.168.0.2
>   PING 192.168.0.2 (192.168.0.2) 56(84) bytes of data.
>   IP (tos 0x0, ttl 19, id 45336, offset 0, flags [DF], proto ICMP (1), length 
> 84)
>   192.168.0.1 > 192.168.0.2: ICMP echo request, id 8865, seq 1, length 64
>
> Linux netlink datapath support depends on upstream Linux commit:
>   744676e77720 ("openvswitch: add TTL decrement action")
>
>
> Note that in the Linux kernel tree the OVS_ACTION_ATTR_ADD_MPLS has been
> defined, and to make sure the IDs are in sync, it had to be added to the
> OVS source tree. This required some additional case statements, which
> should be revisited once the OVS implementation is added.
>
>
> Co-developed-by: Matteo Croce 
> Co-developed-by: Bindiya Kurle 
> Signed-off-by: Eelco Chaudron 
> ---
>  datapath/linux/compat/include/linux/openvswitch.h |8 ++
>  lib/dpif-netdev.c |4 +
>  lib/dpif.c|4 +
>  lib/odp-execute.c |  102 
> -
>  lib/odp-execute.h |2
>  lib/odp-util.c|   55 +++
>  lib/packets.h |   13 ++-
>  ofproto/ofproto-dpif-ipfix.c  |2
>  ofproto/ofproto-dpif-sflow.c  |2
>  ofproto/ofproto-dpif-xlate.c  |   54 +--
>  ofproto/ofproto-dpif.c|   37 
>  ofproto/ofproto-dpif.h|6 +
>  12 files changed, 266 insertions(+), 23 deletions(-)
>
> diff --git a/datapath/linux/compat/include/linux/openvswitch.h 
> b/datapath/linux/compat/include/linux/openvswitch.h
> index 2d884312f..66c386462 100644
> --- a/datapath/linux/compat/include/linux/openvswitch.h
> +++ b/datapath/linux/compat/include/linux/openvswitch.h
> @@ -1021,6 +1021,8 @@ enum ovs_action_attr {
> OVS_ACTION_ATTR_METER,/* u32 meter number. */
> OVS_ACTION_ATTR_CLONE,/* Nested OVS_CLONE_ATTR_*.  */
> OVS_ACTION_ATTR_CHECK_PKT_LEN, /* Nested OVS_CHECK_PKT_LEN_ATTR_*. */
> +   OVS_ACTION_ATTR_ADD_MPLS,  /* struct ovs_action_add_mpls. */
> +   OVS_ACTION_ATTR_DEC_TTL,   /* Nested OVS_DEC_TTL_ATTR_*. */
>
>  #ifndef __KERNEL__
> OVS_ACTION_ATTR_TUNNEL_PUSH,   /* struct ovs_action_push_tnl*/
> @@ -1040,6 +1042,12 @@ enum ovs_action_attr {
>
>  #define OVS_ACTION_ATTR_MAX (__OVS_ACTION_ATTR_MAX - 1)
>
> +enum ovs_dec_ttl_attr {
> +   OVS_DEC_TTL_ATTR_UNSPEC,
> +   OVS_DEC_TTL_ATTR_ACTION,/* Nested struct nlattr. */
> +   __OVS_DEC_TTL_ATTR_MAX
> +};
> +
>  /* Meters. */
>  #define OVS_METER_FAMILY  "ovs_meter"
>  #define OVS_METER_MCGROUP "ovs_meter"
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 300861ca5..35c6560c3 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -7975,6 +7975,8 @@ dp_execute_cb(void *aux_, struct dp_packet_batch 
> *packets_,
>  case OVS_ACTION_ATTR_CT_CLEAR:
>  case OVS_ACTION_ATTR_CHECK_PKT_LEN:
>  case OVS_ACTION_ATTR_DROP:
> +case OVS_ACTION_ATTR_DEC_TTL:
> +case OVS_ACTION_ATTR_ADD_MPLS:
>  case __OVS_ACTION_ATTR_MAX:
>  OVS_NOT_REACHED();
>  }
> @@ -7991,7 +7993,7 @@ dp_netdev_execute_actions(struct dp_netdev_pmd_thread 
> *pmd,
>  struct dp_netdev_execute_aux aux = { pmd, flow };
>
>  odp_execute_actions(&aux, packets, should_steal, actions,
> -actions_len, dp_execute_cb);
> +actions_len, dp_execute_cb, false);
>  }
>
>  struct dp_netdev_ct_dump {
> diff --git a/lib/dpif.c b/lib/dpif.c
> index ac2860764..644d32d47 100644
> --- a/lib/dpif.c
> +++ b/lib/dpif.c
> @@ -1273,6 +1273,8 @@ dpif_execute_helper_cb(void *aux_, struct 
> dp_packet_batch *packets_,
>  case OVS_ACTION_ATTR_UNSPEC:
>  case OVS_ACTION_ATTR_CHECK_PKT_LEN:
>  case OVS_ACTION_ATTR_DROP:
> +case OVS_ACTION_ATTR_ADD_MPLS:
> +case OVS_ACTION_ATTR_DEC_TTL:
>  case __OVS_ACTION_ATTR_MAX:
> 

Re: [ovs-dev] [PATCH v3 2/3] dpif-netlink: distribute polling to discrete handlers

2020-08-28 Thread Matteo Croce
On Fri, Aug 28, 2020 at 7:00 PM Mark Gray  wrote:
> v3: one port latency results as per Matteo's comments
>
> Stock:
>   min/avg/max/mdev = 21.5/36.5/96.5/1.0 us
> With Patch:
>   min/avg/max/mdev = 5.3/9.7/98.4/0.5 us
>

Awesome, LGTM!

-- 
Matteo Croce

perl -e 'for($t=0;;$t++){print chr($t*($t>>8|$t>>13)&255)}' |aplay


Re: [ovs-dev] [PATCH v2] dpif-netlink: distribute polling to discrete handlers

2020-07-22 Thread Matteo Croce
On Wed, Jul 22, 2020 at 1:27 AM Aaron Conole  wrote:
> To check:
>
>   ip netns add left
>   ip netns add right
>   ip link add center-left type veth peer name left0
>   ip link add center-right type veth peer name right0
>   ip link set left0 netns left
>   ip link set right0 netns right

Nit: ip can set the peer netns upon veth creation:

  ip link add center-left type veth peer name left0 netns left
  ip link add center-right type veth peer name right0 netns right

> +static uint32_t
> +dpif_handler_port_idx_max(const struct dpif_handler *handler,
> +  struct dpif_channel **out_chn)
> +{
> +uint32_t max = 0;
> +size_t i;
> +
> +for (i = 0; i < handler->n_channels; ++i) {

Any reason for using the prefix increment operator everywhere?
Prefix and postfix should be equivalent here nowadays.

> --
> 2.25.4
>

Good job!
Did you have the chance to run the same tests I did in 69c51582ff78?
I'm curious to see how much it improves.

Bye,
-- 
Matteo Croce

perl -e 'for($t=0;;$t++){print chr($t*($t>>8|$t>>13)&255)}' |aplay


Re: [ovs-dev] [PATCH RFC] dpif-netlink: distribute polling to discrete handlers

2020-06-25 Thread Matteo Croce
On Wed, Jun 24, 2020 at 3:48 PM Aaron Conole  wrote:
>
> Currently, the channel handlers are polled globally.  On some
> systems, this causes a thundering herd issue where multiple
> handler threads become active, only to do no work and immediately
> sleep.
>
> > The approach here is to push the netlink socket channels to discrete
> handler threads to process, rather than polling on every thread.
> This will eliminate the need to wake multiple threads.
>
> To check:
>
>   ip netns add left
>   ip netns add right
>   ip link add center-left type veth peer name left0
>   ip link add center-right type veth peer name right0
>   ip link set left0 netns left
>   ip link set right0 netns right
>   ip link set center-left up
>   ip link set center-right up
>   ip netns exec left ip link set left0 up
>   ip netns exec left ip addr add 172.31.110.10/24 dev left0
>   ip netns exec right ip link set right0 up
>   ip netns exec right ip addr add 172.31.110.11/24 dev right0
>
>   ovs-vsctl add-br br0
>   ovs-vsctl add-port br0 center-right
>   ovs-vsctl add-port br0 center-left
>
>   # in one terminal
>   perf record -e sched:sched_wakeup,irq:softirq_entry -ag
>
>   # in a separate terminal
>   ip netns exec left arping -I left0 -c 1 172.31.110.11
>
>   # in the perf terminal after exiting
>   perf script
>
> Look for the number of 'handler' threads which were made active.
>
> Reported-by: David Ahern 
> Reported-at: 
> https://mail.openvswitch.org/pipermail/ovs-dev/2019-December/365857.html
> Cc: Matteo Croce 
> Cc: Flavio Leitner 
> Signed-off-by: Aaron Conole 
>

Great work!

Just to understand: does this revert some logic of my patch and swap
the roles of the threads and sockets?

Maybe it's worth adding a Fixes tag pointing to 69c51582f?

Nitpick: use `ip -n` instead of `ip netns exec` where possible, e.g.

ip netns exec left ip link set left0 up

becomes

ip -n left link set left0 up

Cheers,
-- 
Matteo Croce

perl -e 'for($t=0;;$t++){print chr($t*($t>>8|$t>>13)&255)}' |aplay


[ovs-dev] [PATCH] tests: add unit test for lb-output-action

2020-05-25 Thread Matteo Croce
Extend the balance-tcp test so it covers lb-output-action too.
The test checks that the option is shown in bond/show,
and that the lb_output action is programmed in the datapath.

Signed-off-by: Matteo Croce 
---
 tests/ofproto-dpif.at | 31 ---
 1 file changed, 28 insertions(+), 3 deletions(-)

diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at
index 41164d735..dfcfc7161 100644
--- a/tests/ofproto-dpif.at
+++ b/tests/ofproto-dpif.at
@@ -152,6 +152,8 @@ ovs-appctl time/stop
 ovs-appctl time/warp 100
 ovs-appctl lacp/show > lacp.txt
 ovs-appctl bond/show > bond.txt
+# Check that lb_output is not enabled by default
+AT_CHECK([grep -q '^lb-output-action: disabled' bond.txt])
 (
 for i in `seq 0 255` ;
 do
@@ -164,9 +166,32 @@ AT_CHECK([ovs-appctl dpif/dump-flows br0 |grep tcp > 
br0_flows.txt])
 AT_CHECK([ovs-appctl dpif/dump-flows br1 |grep tcp > br1_flows.txt])
 # Make sure there is reasonable distribution to all three ports.
 # We don't want to make this check precise, in case hash function changes.
-AT_CHECK([test `grep in_port.4 br1_flows.txt |wc -l` -gt 24])
-AT_CHECK([test `grep in_port.5 br1_flows.txt |wc -l` -gt 24])
-AT_CHECK([test `grep in_port.6 br1_flows.txt |wc -l` -gt 24])
+AT_CHECK([test $(grep -c in_port.4 br1_flows.txt) -gt 24])
+AT_CHECK([test $(grep -c in_port.5 br1_flows.txt) -gt 24])
+AT_CHECK([test $(grep -c in_port.6 br1_flows.txt) -gt 24])
+# Check that bonding is doing dp_hash
+AT_CHECK([grep -q dp_hash br0_flows.txt])
+# Enabling lb_output
+AT_CHECK([ovs-vsctl set Port bond0 other_config:lb-output-action=true])
+OVS_WAIT_UNTIL([ovs-appctl bond/show | grep -q '^lb-output-action: enabled'])
+ovs-appctl time/warp 1 500
+ovs-appctl revalidator/wait
+OVS_WAIT_WHILE([ovs-appctl dpif/dump-flows br1 | grep -q tcp])
+(
+for i in $(seq 256) ;
+do
+
pkt="in_port(7),eth(src=50:54:00:00:00:05,dst=50:54:00:00:01:00),eth_type(0x0800),ipv4(src=10.0.0.2,dst=10.0.0.1,proto=6,tos=0,ttl=64,frag=no),tcp(src=8,dst=$i),tcp_flags(ack)"
+AT_CHECK([ovs-appctl netdev-dummy/receive p7 $pkt])
+done
+)
+ovs-appctl time/warp 300 100
+AT_CHECK([ovs-appctl dpif/dump-flows br0 | grep tcp > br0_flows.txt])
+AT_CHECK([ovs-appctl dpif/dump-flows br1 | grep tcp > br1_flows.txt])
+# Make sure there is reasonable distribution to all three ports, again.
+AT_CHECK([test $(grep -c in_port.4 br1_flows.txt) -gt 24])
+AT_CHECK([test $(grep -c in_port.5 br1_flows.txt) -gt 24])
+AT_CHECK([test $(grep -c in_port.6 br1_flows.txt) -gt 24])
+AT_CHECK([grep -q lb_output br0_flows.txt])
 
 OVS_VSWITCHD_STOP()
 AT_CLEANUP
-- 
2.26.2



Re: [ovs-dev] [PATCH v12 0/1] Balance-tcp bond mode optimization

2020-03-31 Thread Matteo Croce
On Wed, Mar 11, 2020 at 3:26 PM Vishal Deep Ajmera
 wrote:
>
> v11->v12:
>  Addressed most of comments from Ilya and Eelco.
>  https://mail.openvswitch.org/pipermail/ovs-dev/2020-February/367832.html
>  https://mail.openvswitch.org/pipermail/ovs-dev/2020-February/367842.html
>  Rebased to OVS master.
>

Hi Vishal,

I tested your patch, but it seems there is a memory leak which depletes
all the buffers in a very short time.
I've found that reverting one chunk fixes it; this is the change I made:

--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -7354,15 +7354,17 @@ dp_execute_lb_output_action(struct dp_netdev_pmd_thread 
*pmd,
 struct dp_packet_batch *packets_,
 bool should_steal, uint32_t bond)
 {
-struct dp_packet *packet;
-uint32_t i;
-const uint32_t cnt = dp_packet_batch_size(packets_);
 struct tx_bond *p_bond = tx_bond_lookup(&pmd->tx_bonds, bond);
 
 if (!p_bond) {
 return false;
 }
-DP_PACKET_BATCH_REFILL_FOR_EACH (i, cnt, packet, packets_) {
+
+struct dp_packet_batch del_pkts;
+dp_packet_batch_init(&del_pkts);
+
+struct dp_packet *packet;
+DP_PACKET_BATCH_FOR_EACH (i, packet, packets_) {
 /*
  * Lookup the bond-hash table using hash to get the slave.
  */
@@ -7379,11 +7381,16 @@ dp_execute_lb_output_action(struct dp_netdev_pmd_thread 
*pmd,
 non_atomic_ullong_add(_entry->n_packets, 1);
 non_atomic_ullong_add(_entry->n_bytes, size);
 } else {
-dp_packet_batch_refill(packets_, packet, i);
+dp_packet_batch_add(&del_pkts, packet);
 }
 }
 
-return dp_packet_batch_is_empty(packets_);
+/* Delete packets that failed OUTPUT action */
+COVERAGE_ADD(datapath_drop_invalid_port,
+ dp_packet_batch_size(&del_pkts));
+dp_packet_delete_batch(&del_pkts, should_steal);
+
+return true;
 }
 
 static void

Regards,
-- 
Matteo Croce
per aspera ad upstream



[ovs-dev] [PATCH net-next v5] openvswitch: add TTL decrement action

2020-02-15 Thread Matteo Croce
New action to decrement TTL instead of setting it to a fixed value.
This action will decrement the TTL and, in case of expired TTL, drop it
or execute an action passed via a nested attribute.
The default TTL expired action is to drop the packet.

Supports both IPv4 and IPv6 via the ttl and hop_limit fields, respectively.

Tested with a corresponding change in the userspace:

# ovs-dpctl dump-flows
in_port(2),eth(),eth_type(0x0800), packets:0, bytes:0, used:never, 
actions:dec_ttl{ttl<=1 action:(drop)},1
in_port(1),eth(),eth_type(0x0800), packets:0, bytes:0, used:never, 
actions:dec_ttl{ttl<=1 action:(drop)},2
in_port(1),eth(),eth_type(0x0806), packets:0, bytes:0, used:never, actions:2
in_port(2),eth(),eth_type(0x0806), packets:0, bytes:0, used:never, actions:1

# ping -c1 192.168.0.2 -t 42
IP (tos 0x0, ttl 41, id 61647, offset 0, flags [DF], proto ICMP (1), length 
84)
192.168.0.1 > 192.168.0.2: ICMP echo request, id 386, seq 1, length 64
# ping -c1 192.168.0.2 -t 120
IP (tos 0x0, ttl 119, id 62070, offset 0, flags [DF], proto ICMP (1), 
length 84)
192.168.0.1 > 192.168.0.2: ICMP echo request, id 388, seq 1, length 64
# ping -c1 192.168.0.2 -t 1
#

Co-developed-by: Bindiya Kurle 
Signed-off-by: Bindiya Kurle 
Signed-off-by: Matteo Croce 
---
 include/uapi/linux/openvswitch.h |  7 
 net/openvswitch/actions.c| 67 ++
 net/openvswitch/flow_netlink.c   | 70 
 3 files changed, 144 insertions(+)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index ae2bff14e7e1..9b14519e74d9 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -958,6 +958,7 @@ enum ovs_action_attr {
OVS_ACTION_ATTR_CLONE,/* Nested OVS_CLONE_ATTR_*.  */
OVS_ACTION_ATTR_CHECK_PKT_LEN, /* Nested OVS_CHECK_PKT_LEN_ATTR_*. */
OVS_ACTION_ATTR_ADD_MPLS, /* struct ovs_action_add_mpls. */
+   OVS_ACTION_ATTR_DEC_TTL,  /* Nested OVS_DEC_TTL_ATTR_*. */
 
__OVS_ACTION_ATTR_MAX,/* Nothing past this will be accepted
   * from userspace. */
@@ -1050,4 +1051,10 @@ struct ovs_zone_limit {
__u32 count;
 };
 
+enum ovs_dec_ttl_attr {
+   OVS_DEC_TTL_ATTR_UNSPEC,
+   OVS_DEC_TTL_ATTR_ACTION,/* Nested struct nlattr */
+   __OVS_DEC_TTL_ATTR_MAX
+};
+
 #endif /* _LINUX_OPENVSWITCH_H */
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 7fbfe2adfffa..fc0efd8833c8 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -964,6 +964,25 @@ static int output_userspace(struct datapath *dp, struct 
sk_buff *skb,
	return ovs_dp_upcall(dp, skb, key, &upcall, cutlen);
 }
 
+static int dec_ttl_exception_handler(struct datapath *dp, struct sk_buff *skb,
+struct sw_flow_key *key,
+const struct nlattr *attr, bool last)
+{
+   /* The first action is always 'OVS_DEC_TTL_ATTR_ARG'. */
+   struct nlattr *dec_ttl_arg = nla_data(attr);
+   int rem = nla_len(attr);
+
+   if (nla_len(dec_ttl_arg)) {
+   struct nlattr *actions = nla_next(dec_ttl_arg, &rem);
+
+   if (actions)
+   return clone_execute(dp, skb, key, 0, actions, rem,
+last, false);
+   }
+   consume_skb(skb);
+   return 0;
+}
+
 /* When 'last' is true, sample() should always consume the 'skb'.
  * Otherwise, sample() should keep 'skb' intact regardless what
  * actions are executed within sample().
@@ -1180,6 +1199,45 @@ static int execute_check_pkt_len(struct datapath *dp, 
struct sk_buff *skb,
 nla_len(actions), last, clone_flow_key);
 }
 
+static int execute_dec_ttl(struct sk_buff *skb, struct sw_flow_key *key)
+{
+   int err;
+
+   if (skb->protocol == htons(ETH_P_IPV6)) {
+   struct ipv6hdr *nh;
+
+   err = skb_ensure_writable(skb, skb_network_offset(skb) +
+ sizeof(*nh));
+   if (unlikely(err))
+   return err;
+
+   nh = ipv6_hdr(skb);
+
+   if (nh->hop_limit <= 1)
+   return -EHOSTUNREACH;
+
+   key->ip.ttl = --nh->hop_limit;
+   } else {
+   struct iphdr *nh;
+   u8 old_ttl;
+
+   err = skb_ensure_writable(skb, skb_network_offset(skb) +
+ sizeof(*nh));
+   if (unlikely(err))
+   return err;
+
+   nh = ip_hdr(skb);
+   if (nh->ttl <= 1)
+   return -EHOSTUNREACH;
+
+   old_ttl = nh->ttl--;
+   csum_replace2(&nh->check, htons(old_ttl << 8),
+ htons(nh->ttl << 8));

[ovs-dev] [PATCH net-next v4] openvswitch: add TTL decrement action

2020-02-06 Thread Matteo Croce
New action to decrement TTL instead of setting it to a fixed value.
This action will decrement the TTL and, in case of expired TTL, drop it
or execute an action passed via a nested attribute.
The default TTL expired action is to drop the packet.

Supports both IPv4 and IPv6 via the ttl and hop_limit fields, respectively.

Tested with a corresponding change in the userspace:

# ovs-dpctl dump-flows
in_port(2),eth(),eth_type(0x0800), packets:0, bytes:0, used:never, 
actions:dec_ttl{ttl<=1 action:(drop)},1
in_port(1),eth(),eth_type(0x0800), packets:0, bytes:0, used:never, 
actions:dec_ttl{ttl<=1 action:(drop)},2
in_port(1),eth(),eth_type(0x0806), packets:0, bytes:0, used:never, actions:2
in_port(2),eth(),eth_type(0x0806), packets:0, bytes:0, used:never, actions:1

# ping -c1 192.168.0.2 -t 42
IP (tos 0x0, ttl 41, id 61647, offset 0, flags [DF], proto ICMP (1), length 
84)
192.168.0.1 > 192.168.0.2: ICMP echo request, id 386, seq 1, length 64
# ping -c1 192.168.0.2 -t 120
IP (tos 0x0, ttl 119, id 62070, offset 0, flags [DF], proto ICMP (1), 
length 84)
192.168.0.1 > 192.168.0.2: ICMP echo request, id 388, seq 1, length 64
# ping -c1 192.168.0.2 -t 1
#

Co-developed-by: Bindiya Kurle 
Signed-off-by: Bindiya Kurle 
Signed-off-by: Matteo Croce 
---
 include/uapi/linux/openvswitch.h |  7 
 net/openvswitch/actions.c| 67 ++
 net/openvswitch/flow_netlink.c   | 70 
 3 files changed, 144 insertions(+)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index ae2bff14e7e1..9b14519e74d9 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -958,6 +958,7 @@ enum ovs_action_attr {
OVS_ACTION_ATTR_CLONE,/* Nested OVS_CLONE_ATTR_*.  */
OVS_ACTION_ATTR_CHECK_PKT_LEN, /* Nested OVS_CHECK_PKT_LEN_ATTR_*. */
OVS_ACTION_ATTR_ADD_MPLS, /* struct ovs_action_add_mpls. */
+   OVS_ACTION_ATTR_DEC_TTL,  /* Nested OVS_DEC_TTL_ATTR_*. */
 
__OVS_ACTION_ATTR_MAX,/* Nothing past this will be accepted
   * from userspace. */
@@ -1050,4 +1051,10 @@ struct ovs_zone_limit {
__u32 count;
 };
 
+enum ovs_dec_ttl_attr {
+   OVS_DEC_TTL_ATTR_UNSPEC,
+   OVS_DEC_TTL_ATTR_ACTION,/* Nested struct nlattr */
+   __OVS_DEC_TTL_ATTR_MAX
+};
+
 #endif /* _LINUX_OPENVSWITCH_H */
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 7fbfe2adfffa..fc0efd8833c8 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -964,6 +964,25 @@ static int output_userspace(struct datapath *dp, struct 
sk_buff *skb,
	return ovs_dp_upcall(dp, skb, key, &upcall, cutlen);
 }
 
+static int dec_ttl_exception_handler(struct datapath *dp, struct sk_buff *skb,
+struct sw_flow_key *key,
+const struct nlattr *attr, bool last)
+{
+   /* The first action is always 'OVS_DEC_TTL_ATTR_ARG'. */
+   struct nlattr *dec_ttl_arg = nla_data(attr);
+   int rem = nla_len(attr);
+
+   if (nla_len(dec_ttl_arg)) {
+   struct nlattr *actions = nla_next(dec_ttl_arg, &rem);
+
+   if (actions)
+   return clone_execute(dp, skb, key, 0, actions, rem,
+last, false);
+   }
+   consume_skb(skb);
+   return 0;
+}
+
 /* When 'last' is true, sample() should always consume the 'skb'.
  * Otherwise, sample() should keep 'skb' intact regardless what
  * actions are executed within sample().
@@ -1180,6 +1199,45 @@ static int execute_check_pkt_len(struct datapath *dp, 
struct sk_buff *skb,
 nla_len(actions), last, clone_flow_key);
 }
 
+static int execute_dec_ttl(struct sk_buff *skb, struct sw_flow_key *key)
+{
+   int err;
+
+   if (skb->protocol == htons(ETH_P_IPV6)) {
+   struct ipv6hdr *nh;
+
+   err = skb_ensure_writable(skb, skb_network_offset(skb) +
+ sizeof(*nh));
+   if (unlikely(err))
+   return err;
+
+   nh = ipv6_hdr(skb);
+
+   if (nh->hop_limit <= 1)
+   return -EHOSTUNREACH;
+
+   key->ip.ttl = --nh->hop_limit;
+   } else {
+   struct iphdr *nh;
+   u8 old_ttl;
+
+   err = skb_ensure_writable(skb, skb_network_offset(skb) +
+ sizeof(*nh));
+   if (unlikely(err))
+   return err;
+
+   nh = ip_hdr(skb);
+   if (nh->ttl <= 1)
+   return -EHOSTUNREACH;
+
+   old_ttl = nh->ttl--;
+   csum_replace2(&nh->check, htons(old_ttl << 8),
+ htons(nh->ttl << 8));

[ovs-dev] [PATCH net-next v3] openvswitch: add TTL decrement action

2020-01-15 Thread Matteo Croce
New action to decrement TTL instead of setting it to a fixed value.
This action will decrement the TTL and, in case of expired TTL, drop it
or execute an action passed via a nested attribute.
The default TTL expired action is to drop the packet.

Supports both IPv4 and IPv6 via the ttl and hop_limit fields, respectively.

Tested with a corresponding change in the userspace:

# ovs-dpctl dump-flows
in_port(2),eth(),eth_type(0x0800), packets:0, bytes:0, used:never, 
actions:dec_ttl{ttl<=1 action:(drop)},1
in_port(1),eth(),eth_type(0x0800), packets:0, bytes:0, used:never, 
actions:dec_ttl{ttl<=1 action:(drop)},2
in_port(1),eth(),eth_type(0x0806), packets:0, bytes:0, used:never, actions:2
in_port(2),eth(),eth_type(0x0806), packets:0, bytes:0, used:never, actions:1

# ping -c1 192.168.0.2 -t 42
IP (tos 0x0, ttl 41, id 61647, offset 0, flags [DF], proto ICMP (1), length 
84)
192.168.0.1 > 192.168.0.2: ICMP echo request, id 386, seq 1, length 64
# ping -c1 192.168.0.2 -t 120
IP (tos 0x0, ttl 119, id 62070, offset 0, flags [DF], proto ICMP (1), 
length 84)
192.168.0.1 > 192.168.0.2: ICMP echo request, id 388, seq 1, length 64
# ping -c1 192.168.0.2 -t 1
#

Co-developed-by: Bindiya Kurle 
Signed-off-by: Bindiya Kurle 
Signed-off-by: Matteo Croce 
---
 include/uapi/linux/openvswitch.h |  2 +
 net/openvswitch/actions.c| 67 ++
 net/openvswitch/flow_netlink.c   | 71 
 3 files changed, 140 insertions(+)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index ae2bff14e7e1..9d3f040847af 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -958,6 +958,7 @@ enum ovs_action_attr {
OVS_ACTION_ATTR_CLONE,/* Nested OVS_CLONE_ATTR_*.  */
OVS_ACTION_ATTR_CHECK_PKT_LEN, /* Nested OVS_CHECK_PKT_LEN_ATTR_*. */
OVS_ACTION_ATTR_ADD_MPLS, /* struct ovs_action_add_mpls. */
+   OVS_ACTION_ATTR_DEC_TTL,  /* Nested OVS_DEC_TTL_ATTR_*. */
 
__OVS_ACTION_ATTR_MAX,/* Nothing past this will be accepted
   * from userspace. */
@@ -1050,4 +1051,5 @@ struct ovs_zone_limit {
__u32 count;
 };
 
+#define OVS_DEC_TTL_ATTR_EXEC  0
 #endif /* _LINUX_OPENVSWITCH_H */
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 7fbfe2adfffa..1b0afc9bf1ad 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -964,6 +964,26 @@ static int output_userspace(struct datapath *dp, struct 
sk_buff *skb,
	return ovs_dp_upcall(dp, skb, key, &upcall, cutlen);
 }
 
+static int dec_ttl(struct datapath *dp, struct sk_buff *skb,
+  struct sw_flow_key *key,
+  const struct nlattr *attr, bool last)
+{
+   /* The first action is always 'OVS_DEC_TTL_ATTR_ARG'. */
+   struct nlattr *dec_ttl_arg = nla_data(attr);
+   u32 nested = nla_get_u32(dec_ttl_arg);
+   int rem = nla_len(attr);
+
+   if (nested) {
+   struct nlattr *actions = nla_next(dec_ttl_arg, &rem);
+
+   if (actions)
+   return clone_execute(dp, skb, key, 0, actions, rem,
+last, false);
+   }
+   consume_skb(skb);
+   return 0;
+}
+
 /* When 'last' is true, sample() should always consume the 'skb'.
  * Otherwise, sample() should keep 'skb' intact regardless what
  * actions are executed within sample().
@@ -1180,6 +1200,45 @@ static int execute_check_pkt_len(struct datapath *dp, struct sk_buff *skb,
 nla_len(actions), last, clone_flow_key);
 }
 
+static int execute_dec_ttl(struct sk_buff *skb, struct sw_flow_key *key)
+{
+   int err;
+
+   if (skb->protocol == htons(ETH_P_IPV6)) {
+   struct ipv6hdr *nh;
+
+   err = skb_ensure_writable(skb, skb_network_offset(skb) +
+ sizeof(*nh));
+   if (unlikely(err))
+   return err;
+
+   nh = ipv6_hdr(skb);
+
+   if (nh->hop_limit <= 1)
+   return -EHOSTUNREACH;
+
+   key->ip.ttl = --nh->hop_limit;
+   } else {
+   struct iphdr *nh;
+   u8 old_ttl;
+
+   err = skb_ensure_writable(skb, skb_network_offset(skb) +
+ sizeof(*nh));
+   if (unlikely(err))
+   return err;
+
+   nh = ip_hdr(skb);
+   if (nh->ttl <= 1)
+   return -EHOSTUNREACH;
+
+   old_ttl = nh->ttl--;
+   csum_replace2(&nh->check, htons(old_ttl << 8),
+ htons(nh->ttl << 8));
+   key->ip.ttl = nh->ttl;
+   }
+   return 0;
+}
+
 /* Execute a list of actions against 'skb'. */

Re: [ovs-dev] [PATCH net-next v2] openvswitch: add TTL decrement action

2019-12-20 Thread Matteo Croce
On Tue, Dec 17, 2019 at 5:30 PM Nikolay Aleksandrov
 wrote:
>
> On 17/12/2019 17:51, Matteo Croce wrote:
> > New action to decrement TTL instead of setting it to a fixed value.
> > This action will decrement the TTL and, in case of expired TTL, drop it
> > or execute an action passed via a nested attribute.
> > The default TTL expired action is to drop the packet.
> >
> > Supports both IPv4 and IPv6 via the ttl and hop_limit fields, respectively.
> >
> > Tested with a corresponding change in the userspace:
> >
> > # ovs-dpctl dump-flows
> > in_port(2),eth(),eth_type(0x0800), packets:0, bytes:0, used:never, 
> > actions:dec_ttl{ttl<=1 action:(drop)},1,1
> > in_port(1),eth(),eth_type(0x0800), packets:0, bytes:0, used:never, 
> > actions:dec_ttl{ttl<=1 action:(drop)},1,2
> > in_port(1),eth(),eth_type(0x0806), packets:0, bytes:0, used:never, 
> > actions:2
> > in_port(2),eth(),eth_type(0x0806), packets:0, bytes:0, used:never, 
> > actions:1
> >
> > # ping -c1 192.168.0.2 -t 42
> > IP (tos 0x0, ttl 41, id 61647, offset 0, flags [DF], proto ICMP (1), 
> > length 84)
> > 192.168.0.1 > 192.168.0.2: ICMP echo request, id 386, seq 1, length 
> > 64
> > # ping -c1 192.168.0.2 -t 120
> > IP (tos 0x0, ttl 119, id 62070, offset 0, flags [DF], proto ICMP (1), 
> > length 84)
> > 192.168.0.1 > 192.168.0.2: ICMP echo request, id 388, seq 1, length 
> > 64
> > # ping -c1 192.168.0.2 -t 1
> > #
> >
> > Co-authored-by: Bindiya Kurle 
> > Signed-off-by: Bindiya Kurle 
> > Signed-off-by: Matteo Croce 
> > ---
> >  include/uapi/linux/openvswitch.h |  22 +++
> >  net/openvswitch/actions.c|  71 +
> >  net/openvswitch/flow_netlink.c   | 105 +++
> >  3 files changed, 198 insertions(+)
> >
>
> Hi Matteo,
>
> [snip]
> > +}
> > +
> >  /* When 'last' is true, sample() should always consume the 'skb'.
> >   * Otherwise, sample() should keep 'skb' intact regardless what
> >   * actions are executed within sample().
> > @@ -1176,6 +1201,44 @@ static int execute_check_pkt_len(struct datapath 
> > *dp, struct sk_buff *skb,
> >nla_len(actions), last, clone_flow_key);
> >  }
> >
> > +static int execute_dec_ttl(struct sk_buff *skb, struct sw_flow_key *key)
> > +{
> > + int err;
> > +
> > + if (skb->protocol == htons(ETH_P_IPV6)) {
> > + struct ipv6hdr *nh = ipv6_hdr(skb);
> > +
> > + err = skb_ensure_writable(skb, skb_network_offset(skb) +
> > +   sizeof(*nh));
>
> skb_ensure_writable() calls pskb_may_pull() which may reallocate so nh might 
> become invalid.
> It seems the IPv4 version below is ok as the ptr is reloaded.
>

Right

> One q as I don't know ovs that much - can this action be called only with
> skb->protocol ==  ETH_P_IP/IPV6 ? I.e. Are we sure that if it's not v6, then 
> it must be v4 ?
>

I'm adding a check in validate_and_copy_dec_ttl() so only IPv4/IPv6
packets will pass.

Thanks,

-- 
Matteo Croce
per aspera ad upstream

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v2] openvswitch: add TTL decrement action

2019-12-19 Thread Matteo Croce
On Wed, Dec 18, 2019 at 4:06 AM Pravin Shelar  wrote:
>
> On Tue, Dec 17, 2019 at 7:51 AM Matteo Croce  wrote:
> >
> > New action to decrement TTL instead of setting it to a fixed value.
> > This action will decrement the TTL and, in case of expired TTL, drop it
> > or execute an action passed via a nested attribute.
> > The default TTL expired action is to drop the packet.
> >
> > Supports both IPv4 and IPv6 via the ttl and hop_limit fields, respectively.
> >
> > Tested with a corresponding change in the userspace:
> >
> > # ovs-dpctl dump-flows
> > in_port(2),eth(),eth_type(0x0800), packets:0, bytes:0, used:never, 
> > actions:dec_ttl{ttl<=1 action:(drop)},1,1
> > in_port(1),eth(),eth_type(0x0800), packets:0, bytes:0, used:never, 
> > actions:dec_ttl{ttl<=1 action:(drop)},1,2
> > in_port(1),eth(),eth_type(0x0806), packets:0, bytes:0, used:never, 
> > actions:2
> > in_port(2),eth(),eth_type(0x0806), packets:0, bytes:0, used:never, 
> > actions:1
> >
> > # ping -c1 192.168.0.2 -t 42
> > IP (tos 0x0, ttl 41, id 61647, offset 0, flags [DF], proto ICMP (1), 
> > length 84)
> > 192.168.0.1 > 192.168.0.2: ICMP echo request, id 386, seq 1, length 
> > 64
> > # ping -c1 192.168.0.2 -t 120
> > IP (tos 0x0, ttl 119, id 62070, offset 0, flags [DF], proto ICMP (1), 
> > length 84)
> > 192.168.0.1 > 192.168.0.2: ICMP echo request, id 388, seq 1, length 
> > 64
> > # ping -c1 192.168.0.2 -t 1
> > #
> >
> > Co-authored-by: Bindiya Kurle 
> > Signed-off-by: Bindiya Kurle 
> > Signed-off-by: Matteo Croce 
> > ---
> >  include/uapi/linux/openvswitch.h |  22 +++
> >  net/openvswitch/actions.c|  71 +
> >  net/openvswitch/flow_netlink.c   | 105 +++
> >  3 files changed, 198 insertions(+)
> >
> > diff --git a/include/uapi/linux/openvswitch.h 
> > b/include/uapi/linux/openvswitch.h
> > index a87b44cd5590..b6684bc04883 100644
> > --- a/include/uapi/linux/openvswitch.h
> > +++ b/include/uapi/linux/openvswitch.h
> > @@ -927,6 +927,7 @@ enum ovs_action_attr {
> > OVS_ACTION_ATTR_METER,/* u32 meter ID. */
> > OVS_ACTION_ATTR_CLONE,/* Nested OVS_CLONE_ATTR_*.  */
> > OVS_ACTION_ATTR_CHECK_PKT_LEN, /* Nested OVS_CHECK_PKT_LEN_ATTR_*. 
> > */
> > +   OVS_ACTION_ATTR_DEC_TTL,   /* Nested OVS_DEC_TTL_ATTR_*. */
> >
> > __OVS_ACTION_ATTR_MAX,/* Nothing past this will be accepted
> >* from userspace. */
> > @@ -939,6 +940,23 @@ enum ovs_action_attr {
> >  };
> >
> >  #define OVS_ACTION_ATTR_MAX (__OVS_ACTION_ATTR_MAX - 1)
> > +enum ovs_dec_ttl_attr {
> > +   OVS_DEC_TTL_ATTR_UNSPEC,
> > +   OVS_DEC_TTL_ATTR_ACTION_TYPE,/* Action Type u32 */
> > +   OVS_DEC_TTL_ATTR_ACTION, /* nested action */
> > +   __OVS_DEC_TTL_ATTR_MAX,
> > +#ifdef __KERNEL__
> > +   OVS_DEC_TTL_ATTR_ARG  /* struct sample_arg  */
> > +#endif
> > +};
> > +
>
> I do not see need for type or OVS_DEC_TTL_ACTION_DROP, if there are no
> nested action the datapath can drop the packet.
>
> > +#ifdef __KERNEL__
> > +struct dec_ttl_arg {
> > +   u32 action_type;/* dec_ttl action type.*/
> > +};
> > +#endif
> > +
> > +#define OVS_DEC_TTL_ATTR_MAX (__OVS_DEC_TTL_ATTR_MAX - 1)
> >
> >  /* Meters. */
> >  #define OVS_METER_FAMILY  "ovs_meter"
> > @@ -1009,6 +1027,10 @@ enum ovs_ct_limit_attr {
> > __OVS_CT_LIMIT_ATTR_MAX
> >  };
> >
> > +enum ovs_dec_ttl_action {/*Actions supported by dec_ttl */
> > +   OVS_DEC_TTL_ACTION_DROP,
> > +   OVS_DEC_TTL_ACTION_USER_SPACE
> > +};
> >  #define OVS_CT_LIMIT_ATTR_MAX (__OVS_CT_LIMIT_ATTR_MAX - 1)
> >
> >  #define OVS_ZONE_LIMIT_DEFAULT_ZONE -1
> > diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> > index 4c8395462303..5329668732b1 100644
> > --- a/net/openvswitch/actions.c
> > +++ b/net/openvswitch/actions.c
> > @@ -960,6 +960,31 @@ static int output_userspace(struct datapath *dp, 
> > struct sk_buff *skb,
> > return ovs_dp_upcall(dp, skb, key, &upcall, cutlen);
> >  }
> >
> > +static int dec_ttl(struct datapath *dp, struct sk_buff *skb,
> > +  struct sw_flow_key *fk, const struct nlattr *attr, bool 
> > last)
> > +{
> > +   struct nlattr 

Re: [ovs-dev] [PATCH v8 1/1] Avoid dp_hash recirculation for balance-tcp bond selection mode

2019-12-17 Thread Matteo Croce
On Tue, Dec 17, 2019 at 11:03 AM Vishal Deep Ajmera
 wrote:
>
> Problem:
> 
> In OVS-DPDK, flows with output over a bond interface of type “balance-tcp”
> (using a hash on TCP/UDP 5-tuple) get translated by the ofproto layer into
> "HASH" and "RECIRC" datapath actions. After recirculation, the packet is
> forwarded to the bond member port based on 8-bits of the datapath hash
> value computed through dp_hash. This causes performance degradation in the
> following ways:
>

ACK.

Just a nitpick: can you remove the non-ASCII double quotes from the
commit message? They show up in git log as:

<80><9C>balance-tcp<80><9D>

Regards,
-- 
Matteo Croce
per aspera ad upstream



[ovs-dev] [PATCH net-next v2] openvswitch: add TTL decrement action

2019-12-17 Thread Matteo Croce
New action to decrement TTL instead of setting it to a fixed value.
This action will decrement the TTL and, in case of expired TTL, drop it
or execute an action passed via a nested attribute.
The default TTL expired action is to drop the packet.

Supports both IPv4 and IPv6 via the ttl and hop_limit fields, respectively.

Tested with a corresponding change in the userspace:

# ovs-dpctl dump-flows
in_port(2),eth(),eth_type(0x0800), packets:0, bytes:0, used:never, actions:dec_ttl{ttl<=1 action:(drop)},1,1
in_port(1),eth(),eth_type(0x0800), packets:0, bytes:0, used:never, actions:dec_ttl{ttl<=1 action:(drop)},1,2
in_port(1),eth(),eth_type(0x0806), packets:0, bytes:0, used:never, actions:2
in_port(2),eth(),eth_type(0x0806), packets:0, bytes:0, used:never, actions:1

# ping -c1 192.168.0.2 -t 42
IP (tos 0x0, ttl 41, id 61647, offset 0, flags [DF], proto ICMP (1), length 84)
192.168.0.1 > 192.168.0.2: ICMP echo request, id 386, seq 1, length 64
# ping -c1 192.168.0.2 -t 120
IP (tos 0x0, ttl 119, id 62070, offset 0, flags [DF], proto ICMP (1), length 84)
192.168.0.1 > 192.168.0.2: ICMP echo request, id 388, seq 1, length 64
# ping -c1 192.168.0.2 -t 1
#

Co-authored-by: Bindiya Kurle 
Signed-off-by: Bindiya Kurle 
Signed-off-by: Matteo Croce 
---
 include/uapi/linux/openvswitch.h |  22 +++
 net/openvswitch/actions.c|  71 +
 net/openvswitch/flow_netlink.c   | 105 +++
 3 files changed, 198 insertions(+)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index a87b44cd5590..b6684bc04883 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -927,6 +927,7 @@ enum ovs_action_attr {
OVS_ACTION_ATTR_METER,/* u32 meter ID. */
OVS_ACTION_ATTR_CLONE,/* Nested OVS_CLONE_ATTR_*.  */
OVS_ACTION_ATTR_CHECK_PKT_LEN, /* Nested OVS_CHECK_PKT_LEN_ATTR_*. */
+   OVS_ACTION_ATTR_DEC_TTL,   /* Nested OVS_DEC_TTL_ATTR_*. */
 
__OVS_ACTION_ATTR_MAX,/* Nothing past this will be accepted
   * from userspace. */
@@ -939,6 +940,23 @@ enum ovs_action_attr {
 };
 
 #define OVS_ACTION_ATTR_MAX (__OVS_ACTION_ATTR_MAX - 1)
+enum ovs_dec_ttl_attr {
+   OVS_DEC_TTL_ATTR_UNSPEC,
+   OVS_DEC_TTL_ATTR_ACTION_TYPE,/* Action Type u32 */
+   OVS_DEC_TTL_ATTR_ACTION, /* nested action */
+   __OVS_DEC_TTL_ATTR_MAX,
+#ifdef __KERNEL__
+   OVS_DEC_TTL_ATTR_ARG  /* struct sample_arg  */
+#endif
+};
+
+#ifdef __KERNEL__
+struct dec_ttl_arg {
+   u32 action_type;/* dec_ttl action type.*/
+};
+#endif
+
+#define OVS_DEC_TTL_ATTR_MAX (__OVS_DEC_TTL_ATTR_MAX - 1)
 
 /* Meters. */
 #define OVS_METER_FAMILY  "ovs_meter"
@@ -1009,6 +1027,10 @@ enum ovs_ct_limit_attr {
__OVS_CT_LIMIT_ATTR_MAX
 };
 
+enum ovs_dec_ttl_action {/*Actions supported by dec_ttl */
+   OVS_DEC_TTL_ACTION_DROP,
+   OVS_DEC_TTL_ACTION_USER_SPACE
+};
 #define OVS_CT_LIMIT_ATTR_MAX (__OVS_CT_LIMIT_ATTR_MAX - 1)
 
 #define OVS_ZONE_LIMIT_DEFAULT_ZONE -1
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 4c8395462303..5329668732b1 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -960,6 +960,31 @@ static int output_userspace(struct datapath *dp, struct sk_buff *skb,
	return ovs_dp_upcall(dp, skb, key, &upcall, cutlen);
 }
 
+static int dec_ttl(struct datapath *dp, struct sk_buff *skb,
+  struct sw_flow_key *fk, const struct nlattr *attr, bool last)
+{
+   struct nlattr *actions;
+   struct nlattr *dec_ttl_arg;
+   int rem = nla_len(attr);
+   const struct dec_ttl_arg *arg;
+
+   /* The first action is always OVS_DEC_TTL_ATTR_ARG. */
+   dec_ttl_arg = nla_data(attr);
+   arg = nla_data(dec_ttl_arg);
+   actions = nla_next(dec_ttl_arg, &rem);
+
+   switch (arg->action_type) {
+   case OVS_DEC_TTL_ACTION_DROP:
+   consume_skb(skb);
+   break;
+
+   case OVS_DEC_TTL_ACTION_USER_SPACE:
+   return clone_execute(dp, skb, fk, 0, actions, rem, last, false);
+   }
+
+   return 0;
+}
+
 /* When 'last' is true, sample() should always consume the 'skb'.
  * Otherwise, sample() should keep 'skb' intact regardless what
  * actions are executed within sample().
@@ -1176,6 +1201,44 @@ static int execute_check_pkt_len(struct datapath *dp, struct sk_buff *skb,
 nla_len(actions), last, clone_flow_key);
 }
 
+static int execute_dec_ttl(struct sk_buff *skb, struct sw_flow_key *key)
+{
+   int err;
+
+   if (skb->protocol == htons(ETH_P_IPV6)) {
+   struct ipv6hdr *nh = ipv6_hdr(skb);
+
+   err = skb_ensure_writable(skb, skb_network_offset(skb) +
+ sizeof(*nh));
+   

Re: [ovs-dev] thundering herd wakeup of handler threads

2019-12-10 Thread Matteo Croce
On Tue, Dec 10, 2019 at 10:00 PM David Ahern  wrote:
>
> [ adding Jason as author of the patch that added the epoll exclusive flag ]
>
> On 12/10/19 12:37 PM, Matteo Croce wrote:
> > On Tue, Dec 10, 2019 at 8:13 PM David Ahern  wrote:
> >>
> >> Hi Matteo:
> >>
> >> On a hypervisor running a 4.14.91 kernel and OVS 2.11 I am seeing a
> >> thundering herd wake up problem. Every packet punted to userspace wakes
> >> up every one of the handler threads. On a box with 96 cpus, there are 71
> >> handler threads which means 71 process wakeups for every packet punted.
> >>
> >> This is really easy to see, just watch sched:sched_wakeup tracepoints.
> >> With a few extra probes:
> >>
> >> perf probe sock_def_readable sk=%di
> >> perf probe ep_poll_callback wait=%di mode=%si sync=%dx key=%cx
> >> perf probe __wake_up_common wq_head=%di mode=%si nr_exclusive=%dx
> >> wake_flags=%cx key=%8
> >>
> >> you can see there is a single netlink socket and its wait queue contains
> >> an entry for every handler thread.
> >>
> >> This does not happen with the 2.7.3 version. Roaming commits it appears
> >> that the change in behavior comes from this commit:
> >>
> >> commit 69c51582ff786a68fc325c1c50624715482bc460
> >> Author: Matteo Croce 
> >> Date:   Tue Sep 25 10:51:05 2018 +0200
> >>
> >> dpif-netlink: don't allocate per thread netlink sockets
> >>
> >>
> >> Is this a known problem?
> >>
> >> David
> >>
> >
> > Hi David,
> >
> > before my patch, vswitchd created NxM sockets, being N the ports and M
> > the active cores,
> > because every thread opens a netlink socket per port.
> >
> > With my patch, a pool is created with N socket, one per port, and all
> > the threads polls the same list
> > with the EPOLLEXCLUSIVE flag.
> > As the name suggests, EPOLLEXCLUSIVE lets the kernel wakeup only one
> > of the waiting threads.
> >
> > I'm not aware of this problem, but it goes against the intended
> > behaviour of EPOLLEXCLUSIVE.
> > Such flag exists since Linux 4.5, can you check that it's passed
> > correctly to epoll()?
> >
>
> This the commit that added the EXCLUSIVE flag:
>
> commit df0108c5da561c66c333bb46bfe3c1fc65905898
> Author: Jason Baron 
> Date:   Wed Jan 20 14:59:24 2016 -0800
>
> epoll: add EPOLLEXCLUSIVE flag
>
>
> The commit message acknowledges that multiple threads can still be awakened:
>
> "The implementation walks the list of exclusive waiters, and queues an
> event to each epfd, until it finds the first waiter that has threads
> blocked on it via epoll_wait().  The idea is to search for threads which
> are idle and ready to process the wakeup events.  Thus, we queue an
> event to at least 1 epfd, but may still potentially queue an event to
> all epfds that are attached to the shared fd source."
>
> To me that means all idle handler threads are going to be awakened on
> each upcall message even though only 1 is needed to handle the message.
>
> Jason: What was the rationale behind the exclusive flag that still wakes
> up more than 1 waiter? In the case of OVS and vswitchd I am seeing all N
> handler threads awakened on every single event which is a horrible
> scaling property.
>

Actually, I didn't look at that commit message, but I read the
epoll_ctl manpage which says:

"When a wakeup event occurs and multiple epoll file descriptors
are attached to the same target file using EPOLLEXCLUSIVE, one
or more of the epoll file descriptors will receive an event
with epoll_wait(2).  The default in this scenario (when
EPOLLEXCLUSIVE is not set) is for all epoll file descriptors
to receive an event.  EPOLLEXCLUSIVE is thus useful for avoiding
thundering herd problems in certain scenarios."

I'd expect "one or more" to be probably greater than 1, but still much
lower than all.

Before this patch (which unfortunately is needed to avoid -EMFILE
errors with many ports), how many sockets are awakened when an ARP is
received?

Regards,

-- 
Matteo Croce
per aspera ad upstream



Re: [ovs-dev] thundering herd wakeup of handler threads

2019-12-10 Thread Matteo Croce
On Tue, Dec 10, 2019 at 8:41 PM David Ahern  wrote:
>
> On 12/10/19 12:37 PM, Matteo Croce wrote:
> > On Tue, Dec 10, 2019 at 8:13 PM David Ahern  wrote:
> >>
> >> Hi Matteo:
> >>
> >> On a hypervisor running a 4.14.91 kernel and OVS 2.11 I am seeing a
> >> thundering herd wake up problem. Every packet punted to userspace wakes
> >> up every one of the handler threads. On a box with 96 cpus, there are 71
> >> handler threads which means 71 process wakeups for every packet punted.
> >>
> >> This is really easy to see, just watch sched:sched_wakeup tracepoints.
> >> With a few extra probes:
> >>
> >> perf probe sock_def_readable sk=%di
> >> perf probe ep_poll_callback wait=%di mode=%si sync=%dx key=%cx
> >> perf probe __wake_up_common wq_head=%di mode=%si nr_exclusive=%dx
> >> wake_flags=%cx key=%8
> >>
> >> you can see there is a single netlink socket and its wait queue contains
> >> an entry for every handler thread.
> >>
> >> This does not happen with the 2.7.3 version. Roaming commits it appears
> >> that the change in behavior comes from this commit:
> >>
> >> commit 69c51582ff786a68fc325c1c50624715482bc460
> >> Author: Matteo Croce 
> >> Date:   Tue Sep 25 10:51:05 2018 +0200
> >>
> >> dpif-netlink: don't allocate per thread netlink sockets
> >>
> >>
> >> Is this a known problem?
> >>
> >> David
> >>
> >
> > Hi David,
> >
> > before my patch, vswitchd created NxM sockets, being N the ports and M
> > the active cores,
> > because every thread opens a netlink socket per port.
> >
> > With my patch, a pool is created with N socket, one per port, and all
> > the threads polls the same list
> > with the EPOLLEXCLUSIVE flag.
> > As the name suggests, EPOLLEXCLUSIVE lets the kernel wakeup only one
> > of the waiting threads.
> >
> > I'm not aware of this problem, but it goes against the intended
> > behaviour of EPOLLEXCLUSIVE.
> > Such flag exists since Linux 4.5, can you check that it's passed
> > correctly to epoll()?
> >
>
> I get the theory, but the reality is that all threads are awakened.
> Also, it is not limited to the 4.14 kernel; I see the same behavior with
> 5.4.
>

So all threads are awakened, even if there is only one upcall packet to read?
This is not good: epoll() should wake just one thread at a time, as the
manpage says.

I have to check this. Do you have a minimal setup? How do you generate
upcalls, is it BUM traffic or via action=userspace?

Bye,
-- 
Matteo Croce
per aspera ad upstream



Re: [ovs-dev] thundering herd wakeup of handler threads

2019-12-10 Thread Matteo Croce
On Tue, Dec 10, 2019 at 8:13 PM David Ahern  wrote:
>
> Hi Matteo:
>
> On a hypervisor running a 4.14.91 kernel and OVS 2.11 I am seeing a
> thundering herd wake up problem. Every packet punted to userspace wakes
> up every one of the handler threads. On a box with 96 cpus, there are 71
> handler threads which means 71 process wakeups for every packet punted.
>
> This is really easy to see, just watch sched:sched_wakeup tracepoints.
> With a few extra probes:
>
> perf probe sock_def_readable sk=%di
> perf probe ep_poll_callback wait=%di mode=%si sync=%dx key=%cx
> perf probe __wake_up_common wq_head=%di mode=%si nr_exclusive=%dx
> wake_flags=%cx key=%8
>
> you can see there is a single netlink socket and its wait queue contains
> an entry for every handler thread.
>
> This does not happen with the 2.7.3 version. Roaming commits it appears
> that the change in behavior comes from this commit:
>
> commit 69c51582ff786a68fc325c1c50624715482bc460
> Author: Matteo Croce 
> Date:   Tue Sep 25 10:51:05 2018 +0200
>
> dpif-netlink: don't allocate per thread netlink sockets
>
>
> Is this a known problem?
>
> David
>

Hi David,

before my patch, vswitchd created NxM sockets, N being the number of
ports and M the number of active cores, because every thread opens a
netlink socket per port.

With my patch, a pool is created with N sockets, one per port, and all
the threads poll the same list with the EPOLLEXCLUSIVE flag.
As the name suggests, EPOLLEXCLUSIVE lets the kernel wake up only one
of the waiting threads.

I'm not aware of this problem, but it goes against the intended
behaviour of EPOLLEXCLUSIVE.
Such flag exists since Linux 4.5, can you check that it's passed
correctly to epoll()?

Bye,

-- 
Matteo Croce
per aspera ad upstream



Re: [ovs-dev] [PATCH net-next] openvswitch: add TTL decrement action

2019-11-22 Thread Matteo Croce
On Mon, Nov 18, 2019 at 5:20 PM Ben Pfaff  wrote:
>
> On Tue, Nov 12, 2019 at 04:46:12PM +0100, Matteo Croce wrote:
> > On Tue, Nov 12, 2019 at 4:00 PM Simon Horman  
> > wrote:
> > >
> > > On Tue, Nov 12, 2019 at 11:25:18AM +0100, Matteo Croce wrote:
> > > > New action to decrement TTL instead of setting it to a fixed value.
> > > > This action will decrement the TTL and, in case of expired TTL, send the
> > > > packet to userspace via output_userspace() to take care of it.
> > > >
> > > > Supports both IPv4 and IPv6 via the ttl and hop_limit fields, 
> > > > respectively.
> > > >
> > >
> > > Usually OVS achieves this behaviour by matching on the TTL and
> > > setting it to the desired value, pre-calculated as TTL -1.
> > > With that in mind could you explain the motivation for this
> > > change?
> > >
> >
> > Hi,
> >
> > the problem is that OVS creates a flow for each ttl it see. I can let
> > vswitchd create 255 flows with like this:
> >
> > $ for i in {2..255}; do ping 192.168.0.2 -t $i -c1 -w1 &>/dev/null & done
> > $ ovs-dpctl dump-flows |fgrep -c 'set(ipv4(ttl'
> > 255
>
> Sure, you can easily invent a situation.  In real traffic there's not
> usually such a variety of TTLs for a flow that matches on the number of
> fields that OVS usually needs to match.  Do you see a real problem given
> actual traffic in practice?
>

Hi Ben,

yes, my situation was a bit artificial, but you can get a similar
situation in practice.

Imagine a router with some subnetworks behind it on N levels, with
some nodes hosting virtual machines.
Windows and Linux have different default TTL values, 128 and 64
respectively, so you could see N*2 different TTL values.

Bye,



--
Matteo Croce
per aspera ad upstream



Re: [ovs-dev] [PATCH net-next] openvswitch: add TTL decrement action

2019-11-12 Thread Matteo Croce
On Tue, Nov 12, 2019 at 4:00 PM Simon Horman  wrote:
>
> On Tue, Nov 12, 2019 at 11:25:18AM +0100, Matteo Croce wrote:
> > New action to decrement TTL instead of setting it to a fixed value.
> > This action will decrement the TTL and, in case of expired TTL, send the
> > packet to userspace via output_userspace() to take care of it.
> >
> > Supports both IPv4 and IPv6 via the ttl and hop_limit fields, respectively.
> >
>
> Usually OVS achieves this behaviour by matching on the TTL and
> setting it to the desired value, pre-calculated as TTL -1.
> With that in mind could you explain the motivation for this
> change?
>

Hi,

the problem is that OVS creates a flow for each TTL it sees. I can make
vswitchd create 255 flows like this:

$ for i in {2..255}; do ping 192.168.0.2 -t $i -c1 -w1 &>/dev/null & done
$ ovs-dpctl dump-flows |fgrep -c 'set(ipv4(ttl'
255


> > @@ -1174,6 +1174,43 @@ static int execute_check_pkt_len(struct datapath 
> > *dp, struct sk_buff *skb,
> >nla_len(actions), last, clone_flow_key);
> >  }
> >
> > +static int execute_dec_ttl(struct sk_buff *skb, struct sw_flow_key *key)
> > +{
> > + int err;
> > +
> > + if (skb->protocol == htons(ETH_P_IPV6)) {
> > + struct ipv6hdr *nh = ipv6_hdr(skb);
> > +
> > + err = skb_ensure_writable(skb, skb_network_offset(skb) +
> > +   sizeof(*nh));
> > + if (unlikely(err))
> > + return err;
> > +
> > + if (nh->hop_limit <= 1)
> > + return -EHOSTUNREACH;
> > +
> > + key->ip.ttl = --nh->hop_limit;
> > + } else {
> > + struct iphdr *nh = ip_hdr(skb);
> > + u8 old_ttl;
> > +
> > + err = skb_ensure_writable(skb, skb_network_offset(skb) +
> > +   sizeof(*nh));
> > + if (unlikely(err))
> > + return err;
> > +
> > + if (nh->ttl <= 1)
> > + return -EHOSTUNREACH;
> > +
> > + old_ttl = nh->ttl--;
> > + csum_replace2(&nh->check, htons(old_ttl << 8),
> > +   htons(nh->ttl << 8));
> > + key->ip.ttl = nh->ttl;
> > + }
>
> The above may send packets with TTL = 0, is that desired?
>

If TTL is 1 or 0, execute_dec_ttl() returns -EHOSTUNREACH, and the
caller will just send the packet to userspace and then free it.
I think this is enough, am I missing something?

Bye,
-- 
Matteo Croce
per aspera ad upstream



[ovs-dev] [PATCH net-next] openvswitch: add TTL decrement action

2019-11-12 Thread Matteo Croce
New action to decrement TTL instead of setting it to a fixed value.
This action will decrement the TTL and, in case of expired TTL, send the
packet to userspace via output_userspace() to take care of it.

Supports both IPv4 and IPv6 via the ttl and hop_limit fields, respectively.

Tested with a corresponding change in the userspace:

# ovs-dpctl dump-flows
in_port(2),eth(),eth_type(0x0800), packets:0, bytes:0, used:never, actions:dec_ttl,1
in_port(1),eth(),eth_type(0x0800), packets:0, bytes:0, used:never, actions:dec_ttl,2
in_port(1),eth(),eth_type(0x0806), packets:0, bytes:0, used:never, actions:2
in_port(2),eth(),eth_type(0x0806), packets:0, bytes:0, used:never, actions:1

# ping -c1 192.168.0.2 -t 42
IP (tos 0x0, ttl 41, id 61647, offset 0, flags [DF], proto ICMP (1), length 84)
192.168.0.1 > 192.168.0.2: ICMP echo request, id 386, seq 1, length 64
# ping -c1 192.168.0.2 -t 120
IP (tos 0x0, ttl 119, id 62070, offset 0, flags [DF], proto ICMP (1), length 84)
192.168.0.1 > 192.168.0.2: ICMP echo request, id 388, seq 1, length 64
# ping -c1 192.168.0.2 -t 1
#

Co-authored-by: Bindiya Kurle 
Signed-off-by: Bindiya Kurle 
Signed-off-by: Matteo Croce 
---
 include/uapi/linux/openvswitch.h |  2 ++
 net/openvswitch/actions.c| 46 
 net/openvswitch/flow_netlink.c   |  6 +
 3 files changed, 54 insertions(+)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 1887a451c388..a3bdb1ecd1e7 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -890,6 +890,7 @@ struct check_pkt_len_arg {
  * @OVS_ACTION_ATTR_CHECK_PKT_LEN: Check the packet length and execute a set
  * of actions if greater than the specified packet length, else execute
  * another set of actions.
+ * @OVS_ACTION_ATTR_DEC_TTL: Decrement the IP TTL.
  *
  * Only a single header can be set with a single %OVS_ACTION_ATTR_SET.  Not all
  * fields within a header are modifiable, e.g. the IPv4 protocol and fragment
@@ -925,6 +926,7 @@ enum ovs_action_attr {
OVS_ACTION_ATTR_METER,/* u32 meter ID. */
OVS_ACTION_ATTR_CLONE,/* Nested OVS_CLONE_ATTR_*.  */
OVS_ACTION_ATTR_CHECK_PKT_LEN, /* Nested OVS_CHECK_PKT_LEN_ATTR_*. */
+   OVS_ACTION_ATTR_DEC_TTL,  /* Decrement ttl action */
 
__OVS_ACTION_ATTR_MAX,/* Nothing past this will be accepted
   * from userspace. */
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 12936c151cc0..077b7f309c93 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -1174,6 +1174,43 @@ static int execute_check_pkt_len(struct datapath *dp, struct sk_buff *skb,
 nla_len(actions), last, clone_flow_key);
 }
 
+static int execute_dec_ttl(struct sk_buff *skb, struct sw_flow_key *key)
+{
+   int err;
+
+   if (skb->protocol == htons(ETH_P_IPV6)) {
+   struct ipv6hdr *nh = ipv6_hdr(skb);
+
+   err = skb_ensure_writable(skb, skb_network_offset(skb) +
+ sizeof(*nh));
+   if (unlikely(err))
+   return err;
+
+   if (nh->hop_limit <= 1)
+   return -EHOSTUNREACH;
+
+   key->ip.ttl = --nh->hop_limit;
+   } else {
+   struct iphdr *nh = ip_hdr(skb);
+   u8 old_ttl;
+
+   err = skb_ensure_writable(skb, skb_network_offset(skb) +
+ sizeof(*nh));
+   if (unlikely(err))
+   return err;
+
+   if (nh->ttl <= 1)
+   return -EHOSTUNREACH;
+
+   old_ttl = nh->ttl--;
+   csum_replace2(&nh->check, htons(old_ttl << 8),
+ htons(nh->ttl << 8));
+   key->ip.ttl = nh->ttl;
+   }
+
+   return 0;
+}
+
 /* Execute a list of actions against 'skb'. */
 static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
  struct sw_flow_key *key,
@@ -1345,6 +1382,15 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
 
break;
}
+
+   case OVS_ACTION_ATTR_DEC_TTL:
+   err = execute_dec_ttl(skb, key);
+   if (err == -EHOSTUNREACH) {
+   output_userspace(dp, skb, key, a, attr,
+len, OVS_CB(skb)->cutlen);
+   OVS_CB(skb)->cutlen = 0;
+   }
+   break;
}
 
if (unlikely(err)) {
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index 65c2e3458ff5..d17f2d4b420f 100644
--- a/net/ope

Re: [ovs-dev] Extending ovs_action_attr to add a new action

2019-11-09 Thread Matteo Croce
On Fri, Nov 8, 2019 at 11:04 PM William Tu  wrote:
>
> On Fri, Nov 08, 2019 at 05:12:55PM +0100, Matteo Croce wrote:
> > Hi,
> >
> > I need to add a field to enum ovs_action_attr, but I see that the
> > definition between the upstream header[1] and the one in compat[2]
> > differs.
> > Upstream enum stops at OVS_ACTION_ATTR_CHECK_PKT_LEN, with an extra
> > "hidden" element after __OVS_ACTION_ATTR_MAX (22)
> > Our compat version instead, has OVS_ACTION_ATTR_TUNNEL_{PUSH,POP}
> > defined only #ifndef __KERNEL__, with __OVS_ACTION_ATTR_MAX being 22
> > for the kernel and 24 for userspace.
> >
> > If I add a field OVS_ACTION_ATTR_WHATEVER just before
> > __OVS_ACTION_ATTR_MAX in the kernel, older userspace will incorrectly
> > see the new action as OVS_ACTION_ATTR_TUNNEL_PUSH.
>
> "older userspace" means you're using userspace datapath (dpif-netdev)?
> If true, then it's not using kernel module.
>
> if "older userspace" means just ovs-vswitchd using kernel module,
> and you want to upgrade ovs kernel module with your new action
> and without upgrade ovs-vswitchd?
>

Yes, I mean older vswitchd with a new kernel module. If I add a field
after OVS_ACTION_ATTR_TUNNEL_POP and then downgrade the userspace
utils, I end up in this situation:

# ovs-dpctl dump-flows
in_port(1),eth(),eth_type(0x0800), packets:1, bytes:98, used:605.381s,
actions:tnl_push(tnl_port(65544),header(size=252,type=131097,eth(dst=14:00:00:00:84:03,src=00:00:03:01:00:00,dl_type=0x0500),ipv6(src=1400::800:1300:0:0:800,dst=200::800:300:200:0:800,label=21504,proto=8,tclass=0x0,hlimit=0),),out_port(2)),2

while adding it before OVS_ACTION_ATTR_TUNNEL_POP I get this:

# ovs-dpctl dump-flows
in_port(1),eth(),eth_type(0x0800), packets:1, bytes:98, used:3.661s,
actions:bad length 0, expected 4 for: action22,2

> Usually we also upgrade ovs-vswitchd, so I don't know how this can be done.
>

So it's not a problem; we can assume that the userspace can't be older
than the kernel datapath.

Regards,
-- 
Matteo Croce
per aspera ad upstream

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Extending ovs_action_attr to add a new action

2019-11-08 Thread Matteo Croce
Hi,

I need to add a field to enum ovs_action_attr, but I see that the
definition between the upstream header[1] and the one in compat[2]
differs.
Upstream enum stops at OVS_ACTION_ATTR_CHECK_PKT_LEN, with an extra
"hidden" element after __OVS_ACTION_ATTR_MAX (22)
Our compat version instead, has OVS_ACTION_ATTR_TUNNEL_{PUSH,POP}
defined only #ifndef __KERNEL__, with __OVS_ACTION_ATTR_MAX being 22
for the kernel and 24 for userspace.

If I add a field OVS_ACTION_ATTR_WHATEVER just before
__OVS_ACTION_ATTR_MAX in the kernel, older userspace will incorrectly
see the new action as OVS_ACTION_ATTR_TUNNEL_PUSH.

How can we extend this enum without breaking compatibility?
If OVS_ACTION_ATTR_TUNNEL_* really is unused in the kernel datapath,
what if we pad the kernel header with two padding fields?

-%<-
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -925,6 +926,8 @@ enum ovs_action_attr {
  OVS_ACTION_ATTR_METER,/* u32 meter ID. */
  OVS_ACTION_ATTR_CLONE,/* Nested OVS_CLONE_ATTR_*.  */
  OVS_ACTION_ATTR_CHECK_PKT_LEN, /* Nested OVS_CHECK_PKT_LEN_ATTR_*. */
+ _OVS_ACTION_ATTR_TUNNEL_PUSH, /* unused in kernel datapath. */
+ _OVS_ACTION_ATTR_TUNNEL_POP,  /* unused in kernel datapath. */

  __OVS_ACTION_ATTR_MAX,   /* Nothing past this will be accepted
->%-

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/openvswitch.h?h=v5.3#n899
[2] 
https://github.com/openvswitch/ovs/blob/v2.12.0/datapath/linux/compat/include/linux/openvswitch.h#L962

Regards,
-- 
Matteo Croce
per aspera ad upstream



Re: [ovs-dev] [PATCH v7 0/1] Balance-tcp bond mode optimization

2019-09-19 Thread Matteo Croce
On Tue, Sep 17, 2019 at 10:57 AM Vishal Deep Ajmera
 wrote:
>
> v6->v7:
>  Fixed issue reported by Matteo for bond/show.
>
> v5->v6:
>  Addressed comments from Ilya Maximets.
>  https://mail.openvswitch.org/pipermail/ovs-dev/2019-August/362001.html
>  Rebased to OVS master.
>
> v4->v5:
>  Support for stats per hash bucket.
>  Support for dynamic load balancing.
>  Rebased to OVS Master.
>
> v3->v4:
>  Addressed Ilya Maximets comments.
>  https://mail.openvswitch.org/pipermail/ovs-dev/2019-July/360452.html
>
> v2->v3:
>  Rebased to OVS master.
>  Fixed git merge issue.
>
> v1->v2:
>  Updated datapath action to hash + lb-output.
>  Updated throughput test observations.
>  Rebased to OVS master.
>

All unit test passed, plus others I did.

Tested-by: Matteo Croce 

-- 
Matteo Croce
per aspera ad upstream



Re: [ovs-dev] [PATCH v6 0/1] Balance-tcp bond mode optimization

2019-09-13 Thread Matteo Croce
On Wed, Sep 11, 2019 at 8:37 PM Matteo Croce  wrote:
>
> On Tue, Sep 10, 2019 at 11:53 AM Vishal Deep Ajmera
>  wrote:
> >
> > v5->v6:
> >  Addressed comments from Ilya Maximets.
> >  https://mail.openvswitch.org/pipermail/ovs-dev/2019-August/362001.html
> >  Rebased to OVS master.
> >
> > v4->v5:
> >  Support for stats per hash bucket.
> >  Support for dynamic load balancing.
> >  Rebased to OVS Master.
> >
> > v3->v4:
> >  Addressed Ilya Maximets comments.
> >  https://mail.openvswitch.org/pipermail/ovs-dev/2019-July/360452.html
> >
> > v2->v3:
> >  Rebased to OVS master.
> >  Fixed git merge issue.
> >
> > v1->v2:
> >  Updated datapath action to hash + lb-output.
> >  Updated throughput test observations.
> >  Rebased to OVS master.
> >
> > Vishal Deep Ajmera (1):
> >   Avoid dp_hash recirculation for balance-tcp bond selection mode
> >
> >  datapath/linux/compat/include/linux/openvswitch.h |   2 +
> >  lib/dpif-netdev.c | 515 
> > --
> >  lib/dpif-netlink.c|   3 +
> >  lib/dpif-provider.h   |   8 +
> >  lib/dpif.c|  48 ++
> >  lib/dpif.h|   7 +
> >  lib/odp-execute.c |   2 +
> >  lib/odp-util.c|   4 +
> >  ofproto/bond.c|  52 ++-
> >  ofproto/bond.h|   9 +
> >  ofproto/ofproto-dpif-ipfix.c  |   1 +
> >  ofproto/ofproto-dpif-sflow.c  |   1 +
> >  ofproto/ofproto-dpif-xlate.c  |  39 +-
> >  ofproto/ofproto-dpif.c|  32 ++
> >  ofproto/ofproto-dpif.h|  12 +-
> >  tests/lacp.at |   9 +
> >  vswitchd/bridge.c |   4 +
> >  vswitchd/vswitch.xml  |  10 +
> >  18 files changed, 698 insertions(+), 60 deletions(-)
> >
> > --
> > 1.9.1
> >
>
> Hi,
>
> I confirm a decent performance improvement with DPDK and balance-tcp bonding:
>
> lb-output-action=false
>
> rx: 740 Mbps 1446 kpps
>
> lb-output-action=true
>
> rx: 860 Mbps 1680 kpps
>
> I'm running a very simple test with a tweaked version of testpmd which
> generates 256 L4 flows, I guess that with much flows the improvement
> is way higher.
>
> Tested-by: Matteo Croce 
>
> --
> Matteo Croce
> per aspera ad upstream

Hi,

I found an issue with the patch. It's not 100% reproducible, but
sometimes the option gets enabled regardless of the bonding type and
the configuration.
This breaks the unit tests, as the bond/show output is wrong:

# ovs-vsctl add-br br0
[63129.589500] device ovs-system entered promiscuous mode
[63129.591802] device br0 entered promiscuous mode

# ovs-vsctl add-bond br0 bond dummy0 dummy1 -- set Port bond
lacp=active bond-mode=active-backup
[63150.700542] device dummy1 entered promiscuous mode
[63150.700892] device dummy0 entered promiscuous mode

# ovs-vsctl get port bond other_config
{}

# ovs-appctl bond/show
 bond 
bond_mode: active-backup
bond may use recirculation: no, Recirc-ID : -1
bond-hash-basis: 0
lb-output-action: enabled
updelay: 0 ms
downdelay: 0 ms
lacp_status: negotiated
lacp_fallback_ab: false
active slave mac: 00:00:00:00:00:00(none)

slave dummy0: disabled
may_enable: false

slave dummy1: disabled
may_enable: false

Regards,
-- 
Matteo Croce
per aspera ad upstream


Re: [ovs-dev] [PATCH v6 0/1] Balance-tcp bond mode optimization

2019-09-11 Thread Matteo Croce
On Tue, Sep 10, 2019 at 11:53 AM Vishal Deep Ajmera
 wrote:
>
> v5->v6:
>  Addressed comments from Ilya Maximets.
>  https://mail.openvswitch.org/pipermail/ovs-dev/2019-August/362001.html
>  Rebased to OVS master.
>
> v4->v5:
>  Support for stats per hash bucket.
>  Support for dynamic load balancing.
>  Rebased to OVS Master.
>
> v3->v4:
>  Addressed Ilya Maximets comments.
>  https://mail.openvswitch.org/pipermail/ovs-dev/2019-July/360452.html
>
> v2->v3:
>  Rebased to OVS master.
>  Fixed git merge issue.
>
> v1->v2:
>  Updated datapath action to hash + lb-output.
>  Updated throughput test observations.
>  Rebased to OVS master.
>
> Vishal Deep Ajmera (1):
>   Avoid dp_hash recirculation for balance-tcp bond selection mode
>
>  datapath/linux/compat/include/linux/openvswitch.h |   2 +
>  lib/dpif-netdev.c | 515 
> --
>  lib/dpif-netlink.c|   3 +
>  lib/dpif-provider.h   |   8 +
>  lib/dpif.c|  48 ++
>  lib/dpif.h|   7 +
>  lib/odp-execute.c |   2 +
>  lib/odp-util.c|   4 +
>  ofproto/bond.c|  52 ++-
>  ofproto/bond.h|   9 +
>  ofproto/ofproto-dpif-ipfix.c  |   1 +
>  ofproto/ofproto-dpif-sflow.c  |   1 +
>  ofproto/ofproto-dpif-xlate.c  |  39 +-
>  ofproto/ofproto-dpif.c|  32 ++
>  ofproto/ofproto-dpif.h|  12 +-
>  tests/lacp.at |   9 +
>  vswitchd/bridge.c |   4 +
>  vswitchd/vswitch.xml  |  10 +
>  18 files changed, 698 insertions(+), 60 deletions(-)
>
> --
> 1.9.1
>

Hi,

I confirm a decent performance improvement with DPDK and balance-tcp bonding:

lb-output-action=false

rx: 740 Mbps 1446 kpps

lb-output-action=true

rx: 860 Mbps 1680 kpps

I'm running a very simple test with a tweaked version of testpmd which
generates 256 L4 flows; I guess that with more flows the improvement
is even higher.

Tested-by: Matteo Croce 

--
Matteo Croce
per aspera ad upstream


Re: [ovs-dev] [PATCH v5] Avoid dp_hash recirculation for balance-tcp bond selection mode

2019-08-26 Thread Matteo Croce
36576, used:0.000s, 
> actions:hash(l4(0)),lb_output(bond,1)
>
> A new CLI has been added to dump the per-PMD bond cache as given below.
>
> “sudo ovs-appctl dpif-netdev/pmd-bond-show”
>
> root@ubuntu-190:performance_scripts # sudo ovs-appctl 
> dpif-netdev/pmd-bond-show
> pmd thread numa_id 0 core_id 4:
> Bond cache:
> bond-id 1 :
> bucket 0 - slave 2
> bucket 1 - slave 1
> bucket 2 - slave 2
> bucket 3 - slave 1
>
> Performance improvement:
> 
> With a prototype of the proposed idea, the following perf improvement is seen
> with Phy-VM-Phy UDP traffic, single flow. With multiple flows, the improvement
> is even more enhanced (due to reduced number of flows).
>
> 1 VM:
> +--------------------------------------+
> |                 mpps                 |
> +--------------------------------------+
> | Flows    master    with-opt.  %delta |
> +--------------------------------------+
> | 1        4.53      5.89       29.96  |
> | 10       4.16      5.89       41.51  |
> | 400      3.55      5.55       56.22  |
> | 1k       3.44      5.45       58.30  |
> | 10k      2.50      4.63       85.34  |
> | 100k     2.29      4.27       86.15  |
> | 500k     2.25      4.27       89.23  |
> +--------------------------------------+
> mpps: million packets per second.
>
> Signed-off-by: Manohar Krishnappa Chidambaraswamy 
> Co-authored-by: Manohar Krishnappa Chidambaraswamy 
> Signed-off-by: Vishal Deep Ajmera 
>
> CC: Jan Scheurich 
> CC: Venkatesan Pradeep 
> CC: Nitin Katiyar 

Hi Vishal,

I quickly tested your patch on two servers with 2 ixgbe cards each
linked via a juniper switch with LACP.
With testpmd using a single core the switching rate raised from ~2.0
Mpps to ~2.4+ Mpps, so I read at least a +20% gain.

Please add an example command on how to enable it in the commit message, e.g.

  ovs-vsctl set port bond0 other_config:opt-bond-tcp=true

Thanks,
-- 
Matteo Croce
per aspera ad upstream


Re: [ovs-dev] [PATCH v5] Avoid dp_hash recirculation for balance-tcp bond selection mode

2019-08-09 Thread Matteo Croce
On Thu, Aug 8, 2019 at 10:57 AM Vishal Deep Ajmera
 wrote:
> --- a/ofproto/bond.c
> +++ b/ofproto/bond.c
> @@ -1939,3 +1959,9 @@ bond_get_changed_active_slave(const char *name, struct 
> eth_addr *mac,
>
>  return false;
>  }
> +
> +bool
> +bond_get_cache_mode(const struct bond *bond)
> +{
> +return bond->use_bond_cache;
> +}

Hi,

why not a static function in the header file? So it gets inlined.

Regards,
-- 
Matteo Croce
per aspera ad upstream


[ovs-dev] [PATCH v2] dpif-netlink: fix null pointer

2018-10-08 Thread Matteo Croce
In dpif_netlink_port_add__(), socksp could be NULL, because
vport_socksp_to_pids() would allocate a new array and return a single
zero element.
Following vport_socksp_to_pids() removal, a NULL pointer can happen when
dpif_netlink_port_add__() is called and dpif->handlers is 0.

Restore the old behaviour of using a zero pid when dpif->handlers is 0.

Fixes: 69c51582f ("dpif-netlink: don't allocate per thread netlink sockets")
Reported-by: Flavio Leitner 
Reported-by: Guru Shetty 
Tested-by: Guru Shetty 
Signed-off-by: Matteo Croce 
---
v2: fix checkpatch.py error about coding style

 lib/dpif-netlink.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
index 21315033c..ac3d2edeb 100644
--- a/lib/dpif-netlink.c
+++ b/lib/dpif-netlink.c
@@ -712,7 +712,7 @@ dpif_netlink_port_add__(struct dpif_netlink *dpif, const 
char *name,
 struct dpif_netlink_vport request, reply;
 struct ofpbuf *buf;
 struct nl_sock *socksp = NULL;
-uint32_t upcall_pids;
+uint32_t upcall_pids = 0;
 int error = 0;
 
 if (dpif->handlers) {
@@ -728,7 +728,9 @@ dpif_netlink_port_add__(struct dpif_netlink *dpif, const 
char *name,
 request.name = name;
 
 request.port_no = *port_nop;
-upcall_pids = nl_sock_pid(socksp);
+if (socksp) {
+upcall_pids = nl_sock_pid(socksp);
+}
 request.n_upcall_pids = 1;
 request.upcall_pids = &upcall_pids;
 
-- 
2.17.1



Re: [ovs-dev] [PATCH] dpif-netlink: fix null pointer

2018-10-08 Thread Matteo Croce
On Mon, Oct 8, 2018 at 3:11 PM Guru Shetty  wrote:
>
>
>
> On Sat, 6 Oct 2018 at 09:20, Matteo Croce  wrote:
>>
>> In dpif_netlink_port_add__(), socksp could be NULL, because
>> vport_socksp_to_pids() would allocate a new array and return a single
>> zero element.
>> Following vport_socksp_to_pids() removal, a NULL pointer can happen when
>> dpif_netlink_port_add__() is called and dpif->handlers is 0.
>>
>> Restore the old behaviour of using a zero pid when dpif->handlers is 0.
>>
>> Fixes: 69c51582f ("dpif-netlink: don't allocate per thread netlink sockets")
>> Reported-by: Flavio Leitner 
>> Reported-by: Guru Shetty 
>> Signed-off-by: Matteo Croce 
>> ---
>
>
> Not a review of the code. But I can confirm that the patch does fix the 
> segmentation fault that I was facing.
>
>>
>>  lib/dpif-netlink.c | 5 +++--
>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
>> index 21315033c..310bc947d 100644
>> --- a/lib/dpif-netlink.c
>> +++ b/lib/dpif-netlink.c
>> @@ -712,7 +712,7 @@ dpif_netlink_port_add__(struct dpif_netlink *dpif, const 
>> char *name,
>>  struct dpif_netlink_vport request, reply;
>>  struct ofpbuf *buf;
>>  struct nl_sock *socksp = NULL;
>> -uint32_t upcall_pids;
>> +uint32_t upcall_pids = 0;
>>  int error = 0;
>>
>>  if (dpif->handlers) {
>> @@ -728,7 +728,8 @@ dpif_netlink_port_add__(struct dpif_netlink *dpif, const 
>> char *name,
>>  request.name = name;
>>
>>  request.port_no = *port_nop;
>> -upcall_pids = nl_sock_pid(socksp);
>> +if (socksp)
>> +upcall_pids = nl_sock_pid(socksp);
>>  request.n_upcall_pids = 1;
>>  request.upcall_pids = &upcall_pids;
>>
>> --
>> 2.17.1
>>

OK, thanks. I'm sending a v2 with the checkpatch.py warning found by
Aaron's bot fixed.

-- 
Matteo Croce
per aspera ad upstream


Re: [ovs-dev] [PATCH v2] dpif-netlink: don't allocate per thread netlink sockets

2018-10-06 Thread Matteo Croce
On Fri, Oct 5, 2018 at 9:58 AM Markos Chandras  wrote:
>
> On 05/10/2018 09:55, Matteo Croce wrote:
> > On Fri, Oct 5, 2018 at 8:32 AM Markos Chandras  wrote:
> >>
> >> On 25/09/2018 22:14, Ben Pfaff wrote:
> >>>
> >>> Applied to master thanks!
> >>>
> >>> I sent a patch to remove support for multiple queues in the
> >>> infrastructure layer:
> >>> https://patchwork.ozlabs.org/patch/974755/
> >>> ___
> >>> dev mailing list
> >>> d...@openvswitch.org
> >>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >>>
> >>
> >> Hello Ben and Matteo,
> >>
> >> This patch applies cleanly to branch-2.10, branch-2.9, and branch-2.8.
> >> Would it make sense to backport it to these as well?
> >>
> >> --
> >> markos
> >>
> >> SUSE LINUX GmbH | GF: Felix Imendörffer, Jane Smithard, Graham Norton
> >> HRB 21284 (AG Nürnberg) Maxfeldstr. 5, D-90409, Nürnberg
> >
> > Hi Markos,
> >
> > the file descriptor limit issue was first reported on OVS 2.8, so it's 
> > worth it.
> > If you backport it, please consider backporting the following commit from 
> > Ben,
> > which removes some code which become unused after my patch:
> >
> > commit 769b50349f28c5f9e4bff102bc61dadcb9b99c37
> > Author: Ben Pfaff 
> > Date:   Tue Sep 25 15:14:13 2018 -0700
> >
> > dpif: Remove support for multiple queues per port.
> >
> > Regards,
> >
>
> Thank you Matteo,
>
> Ben, please let me know if you can backport these up to branch-2.8
> otherwise I can send patches for each branch myself.
>
> --
> markos
>
> SUSE LINUX GmbH | GF: Felix Imendörffer, Jane Smithard, Graham Norton
> HRB 21284 (AG Nürnberg) Maxfeldstr. 5, D-90409, Nürnberg

Hi Markos,

there is a possible null pointer in my patch, please wait until the
possible fix is merged into master and backport it as well. Here is
the fix:

https://patchwork.ozlabs.org/patch/979946/

Regards,
-- 
Matteo Croce
per aspera ad upstream


[ovs-dev] [PATCH] dpif-netlink: fix null pointer

2018-10-06 Thread Matteo Croce
In dpif_netlink_port_add__(), socksp could be NULL, because
vport_socksp_to_pids() would allocate a new array and return a single
zero element.
Following vport_socksp_to_pids() removal, a NULL pointer can happen when
dpif_netlink_port_add__() is called and dpif->handlers is 0.

Restore the old behaviour of using a zero pid when dpif->handlers is 0.

Fixes: 69c51582f ("dpif-netlink: don't allocate per thread netlink sockets")
Reported-by: Flavio Leitner 
Reported-by: Guru Shetty 
Signed-off-by: Matteo Croce 
---
 lib/dpif-netlink.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
index 21315033c..310bc947d 100644
--- a/lib/dpif-netlink.c
+++ b/lib/dpif-netlink.c
@@ -712,7 +712,7 @@ dpif_netlink_port_add__(struct dpif_netlink *dpif, const 
char *name,
 struct dpif_netlink_vport request, reply;
 struct ofpbuf *buf;
 struct nl_sock *socksp = NULL;
-uint32_t upcall_pids;
+uint32_t upcall_pids = 0;
 int error = 0;
 
 if (dpif->handlers) {
@@ -728,7 +728,8 @@ dpif_netlink_port_add__(struct dpif_netlink *dpif, const 
char *name,
 request.name = name;
 
 request.port_no = *port_nop;
-upcall_pids = nl_sock_pid(socksp);
+if (socksp)
+upcall_pids = nl_sock_pid(socksp);
 request.n_upcall_pids = 1;
 request.upcall_pids = &upcall_pids;
 
-- 
2.17.1



Re: [ovs-dev] [PATCH v2] dpif-netlink: don't allocate per thread netlink sockets

2018-10-05 Thread Matteo Croce
On Fri, Oct 5, 2018 at 11:27 PM Guru Shetty  wrote:
> I get a segfault with this patch with the following backtrace. I have not 
> investigated.
>
> Program received signal SIGSEGV, Segmentation fault.
> nl_sock_pid (sock=0x0) at lib/netlink-socket.c:1424
> 1424 return sock->pid;
> (gdb) where
> #0  nl_sock_pid (sock=0x0) at lib/netlink-socket.c:1424
> #1  0x0052fb79 in dpif_netlink_port_add__ (dpif=dpif@entry=0x89de90, 
> name=name@entry=0x7fffdf20 "genev_sys_6081", 
> type=type@entry=OVS_VPORT_TYPE_GENEVE,
> options=0x7fffdee0, port_nop=0x7fffdf9c) at lib/dpif-netlink.c:731
> #2  0x0052fdc7 in dpif_netlink_port_add_compat 
> (port_nop=0x7fffdf9c, netdev=, dpif=0x89de90) at 
> lib/dpif-netlink.c:836
> #3  dpif_netlink_port_add (dpif_=0x89de90, netdev=, 
> port_nop=0x7fffdf9c) at lib/dpif-netlink.c:882
> #4  0x004778be in dpif_port_add (dpif=0x89de90, 
> netdev=netdev@entry=0x911830, port_nop=port_nop@entry=0x7fffdffc) at 
> lib/dpif.c:579
> #5  0x00426efe in port_add (ofproto_=0x8a97f0, netdev=0x911830) at 
> ofproto/ofproto-dpif.c:3689
> #6  0x0041d951 in ofproto_port_add (ofproto=0x8a97f0, 
> netdev=0x911830, ofp_portp=ofp_portp@entry=0x7fffe0d8) at 
> ofproto/ofproto.c:2008
> #7  0x0040c709 in iface_do_create (errp=0x7fffe0e8, 
> netdevp=0x7fffe0e0, ofp_portp=0x7fffe0d8, iface_cfg=0x8a0620, 
> br=0x8a8bb0) at vswitchd/bridge.c:1803
> #8  iface_create (port_cfg=0x8a4e60, iface_cfg=0x8a0620, br=0x8a8bb0) at 
> vswitchd/bridge.c:1841
> #9  bridge_add_ports__ (br=br@entry=0x8a8bb0, 
> wanted_ports=wanted_ports@entry=0x8a8c90, 
> with_requested_port=with_requested_port@entry=false) at vswitchd/bridge.c:935
> #10 0x0040e02f in bridge_add_ports (wanted_ports=0x8a8c90, 
> br=0x8a8bb0) at vswitchd/bridge.c:951
> #11 bridge_reconfigure (ovs_cfg=ovs_cfg@entry=0x8a7fc0) at 
> vswitchd/bridge.c:665
> #12 0x004114b9 in bridge_run () at vswitchd/bridge.c:3023
> #13 0x004080c5 in main (argc=2, argv=0x7fffe5e8) at 
> vswitchd/ovs-vswitchd.c:125
>
>

Hi Guru,

I received a similar report on IRC and I have already investigated it.
Can you check whether the following patch fixes it?

Fixes: 69c51582f ("dpif-netlink: don't allocate per thread netlink sockets")
Signed-off-by: Matteo Croce 
---
 lib/dpif-netlink.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
index 21315033cc..310bc947db 100644
--- a/lib/dpif-netlink.c
+++ b/lib/dpif-netlink.c
@@ -712,7 +712,7 @@ dpif_netlink_port_add__(struct dpif_netlink *dpif,
const char *name,
 struct dpif_netlink_vport request, reply;
 struct ofpbuf *buf;
 struct nl_sock *socksp = NULL;
-uint32_t upcall_pids;
+uint32_t upcall_pids = 0;
 int error = 0;

 if (dpif->handlers) {
@@ -728,7 +728,8 @@ dpif_netlink_port_add__(struct dpif_netlink *dpif,
const char *name,
 request.name = name;

 request.port_no = *port_nop;
-upcall_pids = nl_sock_pid(socksp);
+if (socksp)
+upcall_pids = nl_sock_pid(socksp);
 request.n_upcall_pids = 1;
 request.upcall_pids = &upcall_pids;


--
Matteo Croce
per aspera ad upstream


Re: [ovs-dev] [PATCH v2] dpif-netlink: don't allocate per thread netlink sockets

2018-10-05 Thread Matteo Croce
On Fri, Oct 5, 2018 at 8:32 AM Markos Chandras  wrote:
>
> On 25/09/2018 22:14, Ben Pfaff wrote:
> >
> > Applied to master thanks!
> >
> > I sent a patch to remove support for multiple queues in the
> > infrastructure layer:
> > https://patchwork.ozlabs.org/patch/974755/
> > ___
> > dev mailing list
> > d...@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >
>
> Hello Ben and Matteo,
>
> This patch applies cleanly to branch-2.10, branch-2.9, and branch-2.8.
> Would it make sense to backport it to these as well?
>
> --
> markos
>
> SUSE LINUX GmbH | GF: Felix Imendörffer, Jane Smithard, Graham Norton
> HRB 21284 (AG Nürnberg) Maxfeldstr. 5, D-90409, Nürnberg

Hi Markos,

the file descriptor limit issue was first reported on OVS 2.8, so it's worth it.
If you backport it, please consider backporting the following commit from Ben,
which removes some code that became unused after my patch:

commit 769b50349f28c5f9e4bff102bc61dadcb9b99c37
Author: Ben Pfaff 
Date:   Tue Sep 25 15:14:13 2018 -0700

dpif: Remove support for multiple queues per port.

Regards,
-- 
Matteo Croce
per aspera ad upstream


Re: [ovs-dev] [PATCH] dpif: Remove support for multiple queues per port.

2018-09-26 Thread Matteo Croce
On Wed, Sep 26, 2018 at 10:29 PM Yifeng Sun  wrote:
>
> It looks good to me, and testing shows no problem. Thanks.
>
> Tested-by: Yifeng Sun 
> Reviewed-by: Yifeng Sun 
>

Works fine here too.
Regards,
-- 
Matteo Croce
per aspera ad upstream


[ovs-dev] [PATCH v2] dpif-netlink: don't allocate per thread netlink sockets

2018-09-25 Thread Matteo Croce
When using the kernel datapath, OVS allocates a pool of sockets to handle
netlink events. The number of sockets is: ports * n-handler-threads, where
n-handler-threads is user configurable and defaults to 3/4*number of cores.

This is because vswitchd starts n-handler-threads threads, each one with a
netlink socket for every port of the switch. Each thread then starts
listening for events on its set of sockets with epoll().

On setups with lots of CPUs and ports, the number of sockets easily hits
the process file descriptor limit, and ovs-vswitchd will exit with -EMFILE.

Change the number of allocated sockets to just one per port by moving
the socket array from a per-handler structure to a per-datapath one,
and let all the handlers share the same sockets by using the EPOLLEXCLUSIVE
epoll flag, which avoids duplicate events, on systems that support it.

The patch was tested on a 56 core machine running Linux 4.18 and latest
Open vSwitch. A bridge was created with 2000+ ports, some of them being
veth interfaces with the peer outside the bridge. The latency of the upcall
is measured by setting a single 'action=controller,local' OpenFlow rule to
force all the packets going to the slow path and then to the local port.
A tool[1] injects some packets to the veth outside the bridge, and measures
the delay until the packet is captured on the local port. The rx timestamp
is taken from the socket ancillary data via the SO_TIMESTAMPNS attribute, to
avoid having the scheduler delay in the measured time.

The first test measures the average latency for an upcall generated from
a single port. To measure it 100k packets, one every msec, are sent to a
single port and the latencies are measured.

The second test is meant to check latency fairness among ports, namely if
latency is equal between ports or if some ports have lower priority.
The previous test is repeated for every port, the average of the average
latencies and the standard deviation between averages is measured.

The third test serves to measure responsiveness under load. Heavy traffic
is sent through all ports, latency and packet loss is measured
on a single idle port.

The fourth test is all about fairness. Heavy traffic is injected in all
ports but one, latency and packet loss is measured on the single idle port.

This is the test setup:

  # nproc
  56
  # ovs-vsctl show |grep -c Port
  2223
  # ovs-ofctl dump-flows ovs_upc_br
   cookie=0x0, duration=4.827s, table=0, n_packets=0, n_bytes=0, 
actions=CONTROLLER:65535,LOCAL
  # uname -a
  Linux fc28 4.18.7-200.fc28.x86_64 #1 SMP Mon Sep 10 15:44:45 UTC 2018 x86_64 
x86_64 x86_64 GNU/Linux

And these are the results of the tests:

  Stock OVS Patched
  netlink sockets
  in use by vswitchd
  lsof -p $(pidof ovs-vswitchd) \
   |grep -c GENERIC  91187   2227

  Test 1
  one port latency
  min/avg/max/mdev (us)   2.7/6.6/238.7/1.8   1.6/6.8/160.6/1.7

  Test 2
  all port
  avg latency/mdev (us)   6.51/0.97   6.86/0.17

  Test 3
  single port latency
  under load
  avg/mdev (us) 7.5/5.9 3.8/4.8
   packet loss  95 %   62 %

  Test 4
  idle port latency
  under load
  min/avg/max/mdev (us)   0.8/1.5/210.5/0.9   1.0/2.1/344.5/1.2
  packet loss  94 % 4 %

CPU and RAM usage seems not to be affected, the resource usage of vswitchd
idle with 2000+ ports is unchanged:

  # ps u $(pidof ovs-vswitchd)
  USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
  openvsw+  5430 54.3  0.3 4263964 510968 pts/1  RLl+ 16:20   0:50 ovs-vswitchd

Additionally, to check if vswitchd is thread safe with this patch, the
following test was run for circa 48 hours: on a 56 core machine, a
bridge with kernel datapath is filled with 2200 dummy interfaces and 22
veth, then 22 traffic generators are run in parallel piping traffic into
the veths peers outside the bridge.
To generate as many upcalls as possible, all packets were forced to the
slowpath with an openflow rule like 'action=controller,local' and packet
size was set to 64 byte. Also, to avoid overflowing the FDB early and
slowing down the upcall processing, generated mac addresses were restricted
to a small interval. vswitchd ran without problems for 48+ hours,
obviously with all the handler threads at almost 99% CPU usage.

[1] https://github.com/teknoraver/network-tools/blob/master/weed.c

Signed-off-by: Matteo Croce 
---
v1 -> v2:
  - define EPOLLEXCLUSIVE on systems with older kernel headers
  - explain the thread safety test in the commit message

 lib/dpif-netlink.c | 311 -
 1 file changed, 82 insertions(+), 229 deletions(-)

diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
index e6d5a6ec5..bb565ffee 100644
--- a/lib/dpif-netlink.c
++

Re: [ovs-dev] [PATCH] dpif-netlink: don't allocate per thread netlink sockets

2018-09-23 Thread Matteo Croce
On Fri, Sep 21, 2018 at 8:41 PM Ben Pfaff  wrote:
>
> On Wed, Sep 19, 2018 at 02:47:03PM +0200, Matteo Croce wrote:
> > When using the kernel datapath, OVS allocates a pool of sockets to handle
> > netlink events. The number of sockets is: ports * n-handler-threads, where
> > n-handler-threads is user configurable and defaults to 3/4*number of cores.
> >
> > This because vswitchd starts n-handler-threads threads, each one with a
> > netlink socket for every port of the switch. Every thread then, starts
> > listening on events on its set of sockets with epoll().
> >
> > On setup with lot of CPUs and ports, the number of sockets easily hits
> > the process file descriptor limit, and ovs-vswitchd will exit with -EMFILE.
> >
> > Change the number of allocated sockets to just one per port by moving
> > the socket array from a per handler structure to a per datapath one,
> > and let all the handlers share the same sockets by using EPOLLEXCLUSIVE
> > epoll flag which avoids duplicate events, on systems that support it.
>
> Thanks a lot for working on this.  OVS clearly uses too many fds in some
> situations and we need to fix it.
>
> I noticed that the check for EPOLLEXCLUSIVE is only a build-time check,
> using #ifdef.  This will only be effective if the build system is where
> OVS runs and only if the build system uses its own kernel headers for
> the build.  Usually, in OVS, we instead try to always enable a feature
> at build time, e.g. through something like:
>
> #ifndef EPOLLEXCLUSIVE
> #define EPOLLEXCLUSIVE (whatever)
> #endif
>
> and then at runtime test whether the feature is actually available and
> try to use it.
>

Hi Ben,

Thanks for the review.

Got it, will do it in a v2

BTW, one of my tests was to run the binary compiled against the latest
kernel headers on:
- RHEL 7.2, which has a 3.10 kernel without EPOLLEXCLUSIVE backported
(it was backported in 7.3)
- Devuan Jessie, which has a 3.16 kernel.
I ran strace on vswitchd and I clearly saw this call to epoll_ctl()
with an unknown flag:

epoll_ctl(22, EPOLL_CTL_ADD, 23, {EPOLLIN|0x10000000, {u32=0, u64=0}}) = 0

which means that the call doesn't fail and the extra flag is ignored.

> Is EPOLLEXCLUSIVE safe, that is, can we miss events or end up sleeping
> before processing them?  If it is safe, then what invariants does OVS
> need to maintain to assure them?
>

What do you mean by safe? I've skimmed the kernel code: the flag just
makes the kernel use add_wait_queue_exclusive() instead of
add_wait_queue(), so the events raised (or possibly missed) should be
the same, just delivered to only one handler.

> With this commit, the "hash" argument to dpif_port_get_pid() is never
> used.  The overall design of the function can be simplified.  That could
> be folded into this patch or go in a later patch.
>

Yes, I noticed it, but I didn't want to mess with generic code, so I
simply used OVS_UNUSED to suppress compiler warnings.
Such a refactor can be done as a second step.

> Does this patch maintain the same level of thread safety as before?
>

I thought a lot about it. Reading from a socket is thread safe; I
highly doubt that a netlink datagram can be split and received half
and half by two handlers.
Also, the code which adds or removes sockets is protected with
OVS_REQ_WRLOCK(dpif->upcall_lock) so everything should be safe as
before.

I ran this test: on a 56-core machine, I created a bridge with 2200
dummy interfaces and 22 veths, then started 22 mausezahn instances, one
on every veth peer outside the bridge.
All packets were forced to the slowpath with action=controller, and the
whole test lasted 48 hours; no errors were found so far.

As always, four eyes are better than two; if you find something
suspicious, I'll try to sort it out.

Regards,
-- 
Matteo Croce
per aspera ad upstream
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] dpif-netlink: don't allocate per thread netlink sockets

2018-09-19 Thread Matteo Croce
When using the kernel datapath, OVS allocates a pool of sockets to handle
netlink events. The number of sockets is: ports * n-handler-threads, where
n-handler-threads is user configurable and defaults to 3/4*number of cores.

This is because vswitchd starts n-handler-threads threads, each one with a
netlink socket for every port of the switch. Each thread then starts
listening for events on its set of sockets with epoll().

On setups with lots of CPUs and ports, the number of sockets easily hits
the process file descriptor limit, and ovs-vswitchd will exit with -EMFILE.

Change the number of allocated sockets to just one per port by moving
the socket array from a per handler structure to a per datapath one,
and let all the handlers share the same sockets by using EPOLLEXCLUSIVE
epoll flag which avoids duplicate events, on systems that support it.

The patch was tested on a 56-core machine running Linux 4.18 and the
latest Open vSwitch. A bridge was created with 2000+ ports, some of them
being veth interfaces with the peer outside the bridge. The latency of
the upcall is measured by setting a single 'action=controller,local'
OpenFlow rule to force all packets to the slow path and then to the
local port. A tool[1] injects packets into the veth outside the bridge
and measures the delay until the packet is captured on the local port.
The rx timestamp is taken from the socket ancillary data (the
SO_TIMESTAMPNS attribute), to keep the scheduler delay out of the
measured time.

The first test measures the average latency for an upcall generated from
a single port. To measure it, 100k packets (one every msec) are sent to a
single port and the latencies are recorded.

The second test is meant to check latency fairness among ports, namely
whether latency is equal across ports or some ports have lower priority.
The previous test is repeated for every port; the average of the per-port
average latencies and the standard deviation between averages are measured.

The third test serves to measure responsiveness under load. Heavy traffic
is sent through all ports; latency and packet loss are measured
on a single idle port.

The fourth test is all about fairness. Heavy traffic is injected into all
ports but one; latency and packet loss are measured on the single idle port.

This is the test setup:

  # nproc
  56
  # ovs-vsctl show |grep -c Port
  2223
  # ovs-ofctl dump-flows ovs_upc_br
   cookie=0x0, duration=4.827s, table=0, n_packets=0, n_bytes=0, actions=CONTROLLER:65535,LOCAL
  # uname -a
  Linux fc28 4.18.7-200.fc28.x86_64 #1 SMP Mon Sep 10 15:44:45 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

And these are the results of the tests:

  Stock OVS   Patched
  netlink sockets
  in use by vswitchd
  lsof -p $(pidof ovs-vswitchd) \
  |grep -c GENERIC  91187   2227

  Test 1
  one port latency
  min/avg/max/mdev (us)   2.7/6.6/238.7/1.8   1.6/6.8/160.6/1.7

  Test 2
  all ports
  avg latency/mdev (us)   6.51/0.97   6.86/0.17

  Test 3
  single port latency
  under load
  avg/mdev (us) 7.5/5.9 3.8/4.8
  packet loss   95 %    62 %

  Test 4
  idle port latency
  under load
  min/avg/max/mdev (us)   0.8/1.5/210.5/0.9   1.0/2.1/344.5/1.2
  packet loss  94 % 4 %

CPU and RAM usage seem not to be affected; the resource usage of an idle
vswitchd with 2000+ ports is unchanged:

  # ps u $(pidof ovs-vswitchd)
  USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
  openvsw+  5430 54.3  0.3 4263964 510968 pts/1  RLl+ 16:20   0:50 ovs-vswitchd

[1] https://github.com/teknoraver/network-tools/blob/master/weed.c

Signed-off-by: Matteo Croce 
---
 lib/dpif-netlink.c | 308 -
 1 file changed, 80 insertions(+), 228 deletions(-)

diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
index e6d5a6ec5..79036bf0c 100644
--- a/lib/dpif-netlink.c
+++ b/lib/dpif-netlink.c
@@ -170,7 +170,6 @@ struct dpif_windows_vport_sock {
 #endif
 
 struct dpif_handler {
-struct dpif_channel *channels;/* Array of channels for each handler. */
 struct epoll_event *epoll_events;
 int epoll_fd; /* epoll fd that includes channel socks. */
 int n_events; /* Num events returned by epoll_wait(). */
@@ -193,6 +192,7 @@ struct dpif_netlink {
 struct fat_rwlock upcall_lock;
 struct dpif_handler *handlers;
 uint32_t n_handlers;   /* Num of upcall handlers. */
+struct dpif_channel *channels; /* Array of channels for each port. */
 int uc_array_size; /* Size of 'handler->channels' and */
/* 'handler->epoll_events'. */
 
@@ -331,43 +331,6 @@ open_dpif(const struct dpif_netlink_dp *dp, struct dpif *

Re: [ovs-dev] [PATCH RFC net-next] openvswitch: Queue upcalls to userspace in per-port round-robin order

2018-07-31 Thread Matteo Croce
On Mon, Jul 16, 2018 at 4:54 PM Matteo Croce  wrote:
>
> On Tue, Jul 10, 2018 at 6:31 PM Pravin Shelar  wrote:
> >
> > On Wed, Jul 4, 2018 at 7:23 AM, Matteo Croce  wrote:
> > > From: Stefano Brivio 
> > >
> > > Open vSwitch sends to userspace all received packets that have
> > > no associated flow (thus doing an "upcall"). Then the userspace
> > > program creates a new flow and determines the actions to apply
> > > based on its configuration.
> > >
> > > When a single port generates a high rate of upcalls, it can
> > > prevent other ports from dispatching their own upcalls. vswitchd
> > > overcomes this problem by creating many netlink sockets for each
> > > port, but it quickly exceeds any reasonable maximum number of
> > > open files when dealing with huge amounts of ports.
> > >
> > > This patch queues all the upcalls into a list, ordering them in
> > > a per-port round-robin fashion, and schedules a deferred work to
> > > queue them to userspace.
> > >
> > > The algorithm to queue upcalls in a round-robin fashion,
> > > provided by Stefano, is based on these two rules:
> > >  - upcalls for a given port must be inserted after all the other
> > >occurrences of upcalls for the same port already in the queue,
> > >in order to avoid out-of-order upcalls for a given port
> > >  - insertion happens once the highest upcall count for any given
> > >port (excluding the one currently at hand) is greater than the
> > >count for the port we're queuing to -- if this condition is
> > >never true, upcall is queued at the tail. This results in a
> > >per-port round-robin order.
> > >
> > > In order to implement a fair round-robin behaviour, a variable
> > > queueing delay is introduced. This will be zero if the upcalls
> > > rate is below a given threshold, and grows linearly with the
> > > queue utilisation (i.e. upcalls rate) otherwise.
> > >
> > > This ensures fairness among ports under load and with few
> > > netlink sockets.
> > >
> > Thanks for the patch.
> > This patch is adding following overhead for upcall handling:
> > 1. kmalloc.
> > 2. global spin-lock.
> > 3. context switch to single worker thread.
> > I think this could become bottle neck on most of multi core systems.
> > You have mentioned issue with existing fairness mechanism, Can you
> > elaborate on those, I think we could improve that before implementing
> > heavy weight fairness in upcall handling.
>
> Hi Pravin,
>
> vswitchd allocates N * P netlink sockets, where N is the number of
> online CPU cores, and P the number of ports.
> With some setups, this number can grow quite fast, also exceeding the
> system maximum file descriptor limit.
> I've seen a 48 core server failing with -EMFILE when trying to create
> more than 65535 netlink sockets needed for handling 1800+ ports.
>
> I made a previous attempt to reduce the sockets to one per CPU, but
> this was discussed and rejected on ovs-dev because it would remove
> fairness among ports[1].
> I think that the current approach of opening a huge number of sockets
> doesn't really work, (it doesn't scale for sure), it still needs some
> queueing logic (either in kernel or user space) if we really want to
> be sure that low traffic ports gets their upcalls quota when other
> ports are doing way more traffic.
>
> If you are concerned about the kmalloc or spinlock, we can solve them
> with kmem_cache or two copies of the list and rcu, I'll happy to
> discuss the implementation details, as long as we all agree that the
> current implementation doesn't scale well and has an issue.
>
> [1] https://mail.openvswitch.org/pipermail/ovs-dev/2018-February/344279.html
>
> --
> Matteo Croce
> per aspera ad upstream

Hi all,

Any idea how to solve the file descriptor limit hit by the netlink sockets?
I see this issue happen very often, and raising the FD limit to 400k
does not seem like the right way to solve it.
Any other suggestion on how to improve the patch, or solve the problem
in a different way?

Regards,



--
Matteo Croce
per aspera ad upstream


Re: [ovs-dev] [PATCH RFC net-next] openvswitch: Queue upcalls to userspace in per-port round-robin order

2018-07-16 Thread Matteo Croce
On Tue, Jul 10, 2018 at 6:31 PM Pravin Shelar  wrote:
>
> On Wed, Jul 4, 2018 at 7:23 AM, Matteo Croce  wrote:
> > From: Stefano Brivio 
> >
> > Open vSwitch sends to userspace all received packets that have
> > no associated flow (thus doing an "upcall"). Then the userspace
> > program creates a new flow and determines the actions to apply
> > based on its configuration.
> >
> > When a single port generates a high rate of upcalls, it can
> > prevent other ports from dispatching their own upcalls. vswitchd
> > overcomes this problem by creating many netlink sockets for each
> > port, but it quickly exceeds any reasonable maximum number of
> > open files when dealing with huge amounts of ports.
> >
> > This patch queues all the upcalls into a list, ordering them in
> > a per-port round-robin fashion, and schedules a deferred work to
> > queue them to userspace.
> >
> > The algorithm to queue upcalls in a round-robin fashion,
> > provided by Stefano, is based on these two rules:
> >  - upcalls for a given port must be inserted after all the other
> >occurrences of upcalls for the same port already in the queue,
> >in order to avoid out-of-order upcalls for a given port
> >  - insertion happens once the highest upcall count for any given
> >port (excluding the one currently at hand) is greater than the
> >count for the port we're queuing to -- if this condition is
> >never true, upcall is queued at the tail. This results in a
> >per-port round-robin order.
> >
> > In order to implement a fair round-robin behaviour, a variable
> > queueing delay is introduced. This will be zero if the upcalls
> > rate is below a given threshold, and grows linearly with the
> > queue utilisation (i.e. upcalls rate) otherwise.
> >
> > This ensures fairness among ports under load and with few
> > netlink sockets.
> >
> Thanks for the patch.
> This patch is adding following overhead for upcall handling:
> 1. kmalloc.
> 2. global spin-lock.
> 3. context switch to single worker thread.
> I think this could become bottle neck on most of multi core systems.
> You have mentioned issue with existing fairness mechanism, Can you
> elaborate on those, I think we could improve that before implementing
> heavy weight fairness in upcall handling.

Hi Pravin,

vswitchd allocates N * P netlink sockets, where N is the number of
online CPU cores, and P the number of ports.
With some setups, this number can grow quite fast, also exceeding the
system maximum file descriptor limit.
I've seen a 48 core server failing with -EMFILE when trying to create
more than 65535 netlink sockets needed for handling 1800+ ports.

I made a previous attempt to reduce the sockets to one per CPU, but
this was discussed and rejected on ovs-dev because it would remove
fairness among ports[1].
I think that the current approach of opening a huge number of sockets
doesn't really work (it certainly doesn't scale): it still needs some
queueing logic (either in kernel or user space) if we really want to
be sure that low-traffic ports get their upcall quota when other
ports are doing way more traffic.

If you are concerned about the kmalloc or spinlock, we can solve them
with a kmem_cache, or with two copies of the list and RCU. I'll be
happy to discuss the implementation details, as long as we all agree
that the current implementation doesn't scale well and has an issue.

[1] https://mail.openvswitch.org/pipermail/ovs-dev/2018-February/344279.html

--
Matteo Croce
per aspera ad upstream


[ovs-dev] [PATCH RFC net-next] openvswitch: Queue upcalls to userspace in per-port round-robin order

2018-07-04 Thread Matteo Croce
From: Stefano Brivio 

Open vSwitch sends to userspace all received packets that have
no associated flow (thus doing an "upcall"). Then the userspace
program creates a new flow and determines the actions to apply
based on its configuration.

When a single port generates a high rate of upcalls, it can
prevent other ports from dispatching their own upcalls. vswitchd
overcomes this problem by creating many netlink sockets for each
port, but it quickly exceeds any reasonable maximum number of
open files when dealing with huge amounts of ports.

This patch queues all the upcalls into a list, ordering them in
a per-port round-robin fashion, and schedules a deferred work to
queue them to userspace.

The algorithm to queue upcalls in a round-robin fashion,
provided by Stefano, is based on these two rules:
 - upcalls for a given port must be inserted after all the other
   occurrences of upcalls for the same port already in the queue,
   in order to avoid out-of-order upcalls for a given port
 - insertion happens once the highest upcall count for any given
   port (excluding the one currently at hand) is greater than the
   count for the port we're queuing to -- if this condition is
   never true, upcall is queued at the tail. This results in a
   per-port round-robin order.

In order to implement a fair round-robin behaviour, a variable
queueing delay is introduced. This will be zero if the upcalls
rate is below a given threshold, and grows linearly with the
queue utilisation (i.e. upcalls rate) otherwise.

This ensures fairness among ports under load and with few
netlink sockets.

Signed-off-by: Matteo Croce 
Co-authored-by: Stefano Brivio 
---
 net/openvswitch/datapath.c | 143 ++---
 net/openvswitch/datapath.h |  27 ++-
 2 files changed, 161 insertions(+), 9 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 0f5ce77460d4..2cfd504562d8 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -59,6 +59,10 @@
 #include "vport-internal_dev.h"
 #include "vport-netdev.h"
 
+#define UPCALL_QUEUE_TIMEOUT   msecs_to_jiffies(10)
+#define UPCALL_QUEUE_MAX_DELAY msecs_to_jiffies(10)
+#define UPCALL_QUEUE_MAX_LEN   200
+
 unsigned int ovs_net_id __read_mostly;
 
 static struct genl_family dp_packet_genl_family;
@@ -225,6 +229,116 @@ void ovs_dp_detach_port(struct vport *p)
ovs_vport_del(p);
 }
 
+static void ovs_dp_upcall_dequeue(struct work_struct *work)
+{
+   struct datapath *dp = container_of(work, struct datapath,
+  upcalls.work.work);
+   struct dp_upcall_info *u, *n;
+
+   spin_lock_bh(&dp->upcalls.lock);
+   list_for_each_entry_safe(u, n, &dp->upcalls.list, list) {
+   if (unlikely(ovs_dp_upcall(dp, u->skb, &u->key, u, 0)))
+   kfree_skb(u->skb);
+   else
+   consume_skb(u->skb);
+   kfree(u);
+   }
+   dp->upcalls.len = 0;
+   INIT_LIST_HEAD(&dp->upcalls.list);
+   spin_unlock_bh(&dp->upcalls.lock);
+}
+
+/* Calculate the delay of the deferred work which sends the upcalls. If it ran
+ * more than UPCALL_QUEUE_TIMEOUT ago, schedule the work immediately. Otherwise
+ * return a time between 0 and UPCALL_QUEUE_MAX_DELAY, depending linearly on the
+ * queue utilisation.
+ */
+static unsigned long ovs_dp_upcall_delay(int queue_len, unsigned long last_run)
+{
+   if (jiffies - last_run >= UPCALL_QUEUE_TIMEOUT)
+   return 0;
+
+   return UPCALL_QUEUE_MAX_DELAY -
+  UPCALL_QUEUE_MAX_DELAY * queue_len / UPCALL_QUEUE_MAX_LEN;
+}
+
+static int ovs_dp_upcall_queue_roundrobin(struct datapath *dp,
+ struct dp_upcall_info *upcall)
+{
+   struct list_head *head = &dp->upcalls.list;
+   struct dp_upcall_info *here = NULL, *pos;
+   bool find_next = true;
+   unsigned long delay;
+   int err = 0;
+   u8 count;
+
+   spin_lock_bh(&dp->upcalls.lock);
+   if (dp->upcalls.len > UPCALL_QUEUE_MAX_LEN) {
+   err = -ENOSPC;
+   goto out;
+   }
+
+   /* Insert upcalls in the list in a per-port round-robin fashion, look
+* for insertion point:
+* - to avoid out-of-order per-port upcalls, we can insert only after
+*   the last occurrence of upcalls for the same port
+* - insert upcall only after we reach a count of occurrences for a
+*   given port greater than the one we're inserting this upcall for
+*/
+   list_for_each_entry(pos, head, list) {
+   /* Count per-port upcalls. */
+   if (dp->upcalls.count[pos->port_no] == U8_MAX - 1) {
+   err = -ENOSPC;
+   goto out_clear;
+   }
+   dp->upcalls.count[pos->port_no]++;
+
+   if (pos->port_no == upcall-

[ovs-dev] [PATCH] docs: fix the documentation of n-handler-threads and n-revalidator-threads

2018-02-06 Thread Matteo Croce
Documentation for both n-handler-threads and n-revalidator-threads was
wrong: the spawned threads are per port, not per datapath.

Update the docs according to vswitchd/ovs-vswitchd.8.in and code behaviour.

Signed-off-by: Matteo Croce <mcr...@redhat.com>
---
 vswitchd/vswitch.xml | 16 +---
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index 0c6a43d60..4b3d7610a 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -379,15 +379,12 @@
   type='{"type": "integer", "minInteger": 1}'>
 
   Specifies the number of threads for software datapaths to use for
-  handling new flows.  The default the number of online CPU cores minus
+  handling new flows.  The default is the number of online CPU cores minus
   the number of revalidators.
 
 
-  This configuration is per datapath.  If you have more than one
-  software datapath (e.g. some system bridges and some
-  netdev bridges), then the total number of threads is
-  n-handler-threads times the number of software
-  datapaths.
+  This configuration is per port: the total number of threads is
+  n-handler-threads times the number of ports.
 
   
 
@@ -403,11 +400,8 @@
   of handler threads.
 
 
-  This configuration is per datapath.  If you have more than one
-  software datapath (e.g. some system bridges and some
-  netdev bridges), then the total number of threads is
-  n-handler-threads times the number of software
-  datapaths.
+  This configuration is per port: the total number of threads is
+  n-revalidator-threads times the number of ports.
 
   
 
-- 
2.14.3



Re: [ovs-dev] [PATCH v4] vswitchd: show DPDK version

2018-01-22 Thread Matteo Croce
On Mon, Jan 15, 2018 at 7:21 PM, Matteo Croce <mcr...@redhat.com> wrote:
> Show DPDK version if Open vSwitch is compiled with DPDK support.
> Version can be retrieved with `ovs-switchd --version` or from OVS logs.
> Small change in ovs-ctl to avoid breakage on output change.
>
> Signed-off-by: Matteo Croce <mcr...@redhat.com>
> ---
> v3:
>  print version in OVS logs too
>  don't save the version in the DB
>  use dpdk-stub.c instead of #ifdef
>
> v4:
>  fix only patch subject, I left ovs-vsctl from the older v1/v2, sorry
>
>  lib/dpdk-stub.c | 5 +
>  lib/dpdk.c  | 8 
>  lib/dpdk.h  | 1 +
>  utilities/ovs-ctl.in| 2 +-
>  vswitchd/ovs-vswitchd.c | 1 +
>  5 files changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/lib/dpdk-stub.c b/lib/dpdk-stub.c
> index 36021807c..041cd0cbb 100644
> --- a/lib/dpdk-stub.c
> +++ b/lib/dpdk-stub.c
> @@ -54,3 +54,8 @@ dpdk_vhost_iommu_enabled(void)
>  {
>  return false;
>  }
> +
> +void
> +print_dpdk_version(void)
> +{
> +}
> diff --git a/lib/dpdk.c b/lib/dpdk.c
> index 6710d10fc..3f5a55fc1 100644
> --- a/lib/dpdk.c
> +++ b/lib/dpdk.c
> @@ -24,6 +24,7 @@
>
>  #include 
>  #include 
> +#include <rte_version.h>
>  #ifdef DPDK_PDUMP
>  #include 
>  #include 
> @@ -471,6 +472,7 @@ dpdk_init(const struct smap *ovs_other_config)
>  static struct ovsthread_once once_enable = OVSTHREAD_ONCE_INITIALIZER;
>
>  if (ovsthread_once_start(&once_enable)) {
> +VLOG_INFO("Using %s", rte_version());
>  VLOG_INFO("DPDK Enabled - initializing...");
>  dpdk_init__(ovs_other_config);
>  enabled = true;
> @@ -501,3 +503,9 @@ dpdk_set_lcore_id(unsigned cpu)
>  ovs_assert(cpu != NON_PMD_CORE_ID);
>  RTE_PER_LCORE(_lcore_id) = cpu;
>  }
> +
> +void
> +print_dpdk_version(void)
> +{
> +puts(rte_version());
> +}
> diff --git a/lib/dpdk.h b/lib/dpdk.h
> index dc58d968a..b04153591 100644
> --- a/lib/dpdk.h
> +++ b/lib/dpdk.h
> @@ -38,5 +38,6 @@ void dpdk_init(const struct smap *ovs_other_config);
>  void dpdk_set_lcore_id(unsigned cpu);
>  const char *dpdk_get_vhost_sock_dir(void);
>  bool dpdk_vhost_iommu_enabled(void);
> +void print_dpdk_version(void);
>
>  #endif /* dpdk.h */
> diff --git a/utilities/ovs-ctl.in b/utilities/ovs-ctl.in
> index 1df56c4a5..ef06dd967 100755
> --- a/utilities/ovs-ctl.in
> +++ b/utilities/ovs-ctl.in
> @@ -72,7 +72,7 @@ set_hostname () {
>  set_system_ids () {
>  set ovs_vsctl set Open_vSwitch .
>
> -OVS_VERSION=`ovs-vswitchd --version | sed 's/.*) //;1q'`
> +OVS_VERSION=`ovs-vswitchd --version | awk '/Open vSwitch/{print $NF}'`
>  set "$@" ovs-version="$OVS_VERSION"
>
>  case $SYSTEM_ID in
> diff --git a/vswitchd/ovs-vswitchd.c b/vswitchd/ovs-vswitchd.c
> index d5e07c037..a5e8f4d39 100644
> --- a/vswitchd/ovs-vswitchd.c
> +++ b/vswitchd/ovs-vswitchd.c
> @@ -187,6 +187,7 @@ parse_options(int argc, char *argv[], char **unixctl_pathp)
>
>  case 'V':
>  ovs_print_version(0, 0);
> +print_dpdk_version();
>  exit(EXIT_SUCCESS);
>
>  case OPT_MLOCKALL:
> --
> 2.14.3
>

If you consider applying it, please fix the typo in the commit message:
ovs-vswitchd instead of ovs-switchd

Thanks
-- 
Matteo Croce
per aspera ad upstream


[ovs-dev] [RFC] dpif-netlink: don't allocate per port netlink sockets

2018-01-22 Thread Matteo Croce
When using the kernel datapath OVS allocates a pool of sockets to handle
netlink events. The number of sockets is: ports * n-handler-threads.
n-handler-threads is user configurable but defaults to the number of cores.

On machines with lots of CPUs and ports, the number of sockets easily hits
the process file descriptor limit, which is 65536, and will cause
ovs-vswitchd to abort.

Change the number of allocated sockets to just n-handler-threads which,
if set slightly greater than the number of cores, is enough to handle
the netlink events.

Replace the struct dpif_channel array with a single dpif_channel instance
and edit the code accordingly. By doing so, the port idx information is lost
in the upcall event code, so the call to report_loss() must be commented out,
as it already is in the Windows code path.

Signed-off-by: Matteo Croce <mcr...@redhat.com>
---
 lib/dpif-netlink.c | 167 ++---
 1 file changed, 55 insertions(+), 112 deletions(-)

diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
index f8d75eb45..c13d503f8 100644
--- a/lib/dpif-netlink.c
+++ b/lib/dpif-netlink.c
@@ -167,11 +167,10 @@ struct dpif_windows_vport_sock {
 #endif
 
 struct dpif_handler {
-struct dpif_channel *channels;/* Array of channels for each handler. */
-struct epoll_event *epoll_events;
+struct dpif_channel channels;
+struct epoll_event epoll_events;
 int epoll_fd; /* epoll fd that includes channel socks. */
 int n_events; /* Num events returned by epoll_wait(). */
-int event_offset; /* Offset into 'epoll_events'. */
 
 #ifdef _WIN32
 /* Pool of sockets. */
@@ -190,16 +189,16 @@ struct dpif_netlink {
 struct fat_rwlock upcall_lock;
 struct dpif_handler *handlers;
 uint32_t n_handlers;   /* Num of upcall handlers. */
-int uc_array_size; /* Size of 'handler->channels' and */
-   /* 'handler->epoll_events'. */
 
 /* Change notification. */
 struct nl_sock *port_notifier; /* vport multicast group subscriber. */
 bool refresh_channels;
 };
 
+/*
 static void report_loss(struct dpif_netlink *, struct dpif_channel *,
 uint32_t ch_idx, uint32_t handler_id);
+*/
 
static struct vlog_rate_limit error_rl = VLOG_RATE_LIMIT_INIT(9999, 5);
 
@@ -508,8 +507,7 @@ vport_socksp_to_pids(struct nl_sock **socksp, uint32_t n_socks)
 /* Given the port number 'port_idx', extracts the pids of netlink sockets
  * associated to the port and assigns it to 'upcall_pids'. */
 static bool
-vport_get_pids(struct dpif_netlink *dpif, uint32_t port_idx,
-   uint32_t **upcall_pids)
+vport_get_pids(struct dpif_netlink *dpif, uint32_t **upcall_pids)
 {
 uint32_t *pids;
 size_t i;
@@ -517,7 +515,7 @@ vport_get_pids(struct dpif_netlink *dpif, uint32_t port_idx,
 /* Since the nl_sock can only be assigned in either all
  * or none "dpif->handlers" channels, the following check
  * would suffice. */
-if (!dpif->handlers[0].channels[port_idx].sock) {
+if (!dpif->handlers[0].channels.sock) {
 return false;
 }
 ovs_assert(!WINDOWS || dpif->n_handlers <= 1);
@@ -525,7 +523,7 @@ vport_get_pids(struct dpif_netlink *dpif, uint32_t port_idx,
 pids = xzalloc(dpif->n_handlers * sizeof *pids);
 
 for (i = 0; i < dpif->n_handlers; i++) {
-pids[i] = nl_sock_pid(dpif->handlers[i].channels[port_idx].sock);
+pids[i] = nl_sock_pid(dpif->handlers[i].channels.sock);
 }
 
 *upcall_pids = pids;
@@ -542,37 +540,17 @@ vport_add_channels(struct dpif_netlink *dpif, odp_port_t port_no,
 size_t i, j;
 int error;
 
-if (dpif->handlers == NULL) {
+/* Since the sock can only be assigned in either all or none
+ * of "dpif->handlers" channels, the following check would
+ * suffice. */
+if (dpif->handlers == NULL || dpif->handlers[0].channels.sock) {
 return 0;
 }
 
-/* We assume that the datapath densely chooses port numbers, which can
- * therefore be used as an index into 'channels' and 'epoll_events' of
- * 'dpif->handler'. */
-if (port_idx >= dpif->uc_array_size) {
-uint32_t new_size = port_idx + 1;
-
-if (new_size > MAX_PORTS) {
-VLOG_WARN_RL(&error_rl, "%s: datapath port %"PRIu32" too big",
- dpif_name(&dpif->dpif), port_no);
-return EFBIG;
-}
-
-for (i = 0; i < dpif->n_handlers; i++) {
-struct dpif_handler *handler = &dpif->handlers[i];
-
-handler->channels = xrealloc(handler->channels,
- new_size * sizeof *handler->channels);
-
-for (j = dpif->uc_array_size; j < new_size; j++) {
-handler->channels[j].sock = NULL;
-}
-
-   

[ovs-dev] [PATCH v4] vswitchd: show DPDK version

2018-01-15 Thread Matteo Croce
Show DPDK version if Open vSwitch is compiled with DPDK support.
Version can be retrieved with `ovs-switchd --version` or from OVS logs.
Small change in ovs-ctl to avoid breakage on output change.

Signed-off-by: Matteo Croce <mcr...@redhat.com>
---
v3:
 print version in OVS logs too
 don't save the version in the DB
 use dpdk-stub.c instead of #ifdef

v4:
 fix only patch subject, I left ovs-vsctl from the older v1/v2, sorry

 lib/dpdk-stub.c | 5 +
 lib/dpdk.c  | 8 
 lib/dpdk.h  | 1 +
 utilities/ovs-ctl.in| 2 +-
 vswitchd/ovs-vswitchd.c | 1 +
 5 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/lib/dpdk-stub.c b/lib/dpdk-stub.c
index 36021807c..041cd0cbb 100644
--- a/lib/dpdk-stub.c
+++ b/lib/dpdk-stub.c
@@ -54,3 +54,8 @@ dpdk_vhost_iommu_enabled(void)
 {
 return false;
 }
+
+void
+print_dpdk_version(void)
+{
+}
diff --git a/lib/dpdk.c b/lib/dpdk.c
index 6710d10fc..3f5a55fc1 100644
--- a/lib/dpdk.c
+++ b/lib/dpdk.c
@@ -24,6 +24,7 @@
 
 #include 
 #include 
+#include <rte_version.h>
 #ifdef DPDK_PDUMP
 #include 
 #include 
@@ -471,6 +472,7 @@ dpdk_init(const struct smap *ovs_other_config)
 static struct ovsthread_once once_enable = OVSTHREAD_ONCE_INITIALIZER;
 
    if (ovsthread_once_start(&once_enable)) {
+VLOG_INFO("Using %s", rte_version());
 VLOG_INFO("DPDK Enabled - initializing...");
 dpdk_init__(ovs_other_config);
 enabled = true;
@@ -501,3 +503,9 @@ dpdk_set_lcore_id(unsigned cpu)
 ovs_assert(cpu != NON_PMD_CORE_ID);
 RTE_PER_LCORE(_lcore_id) = cpu;
 }
+
+void
+print_dpdk_version(void)
+{
+puts(rte_version());
+}
diff --git a/lib/dpdk.h b/lib/dpdk.h
index dc58d968a..b04153591 100644
--- a/lib/dpdk.h
+++ b/lib/dpdk.h
@@ -38,5 +38,6 @@ void dpdk_init(const struct smap *ovs_other_config);
 void dpdk_set_lcore_id(unsigned cpu);
 const char *dpdk_get_vhost_sock_dir(void);
 bool dpdk_vhost_iommu_enabled(void);
+void print_dpdk_version(void);
 
 #endif /* dpdk.h */
diff --git a/utilities/ovs-ctl.in b/utilities/ovs-ctl.in
index 1df56c4a5..ef06dd967 100755
--- a/utilities/ovs-ctl.in
+++ b/utilities/ovs-ctl.in
@@ -72,7 +72,7 @@ set_hostname () {
 set_system_ids () {
 set ovs_vsctl set Open_vSwitch .
 
-OVS_VERSION=`ovs-vswitchd --version | sed 's/.*) //;1q'`
+OVS_VERSION=`ovs-vswitchd --version | awk '/Open vSwitch/{print $NF}'`
 set "$@" ovs-version="$OVS_VERSION"
 
 case $SYSTEM_ID in
diff --git a/vswitchd/ovs-vswitchd.c b/vswitchd/ovs-vswitchd.c
index d5e07c037..a5e8f4d39 100644
--- a/vswitchd/ovs-vswitchd.c
+++ b/vswitchd/ovs-vswitchd.c
@@ -187,6 +187,7 @@ parse_options(int argc, char *argv[], char **unixctl_pathp)
 
 case 'V':
 ovs_print_version(0, 0);
+print_dpdk_version();
 exit(EXIT_SUCCESS);
 
 case OPT_MLOCKALL:
-- 
2.14.3



[ovs-dev] [PATCH v3] ovs-vsctl: show DPDK version

2018-01-15 Thread Matteo Croce
Show DPDK version if Open vSwitch is compiled with DPDK support.
Version can be retrieved with `ovs-switchd --version` or from OVS logs.
Small change in ovs-ctl to avoid breakage on output change.

Signed-off-by: Matteo Croce <mcr...@redhat.com>
---
v3:
 print version in OVS logs too
 don't save the version in the DB
 use dpdk-stub.c instead of #ifdef

 lib/dpdk-stub.c | 5 +
 lib/dpdk.c  | 8 
 lib/dpdk.h  | 1 +
 utilities/ovs-ctl.in| 2 +-
 vswitchd/ovs-vswitchd.c | 1 +
 5 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/lib/dpdk-stub.c b/lib/dpdk-stub.c
index 36021807c..041cd0cbb 100644
--- a/lib/dpdk-stub.c
+++ b/lib/dpdk-stub.c
@@ -54,3 +54,8 @@ dpdk_vhost_iommu_enabled(void)
 {
 return false;
 }
+
+void
+print_dpdk_version(void)
+{
+}
diff --git a/lib/dpdk.c b/lib/dpdk.c
index 6710d10fc..3f5a55fc1 100644
--- a/lib/dpdk.c
+++ b/lib/dpdk.c
@@ -24,6 +24,7 @@
 
 #include 
 #include 
+#include <rte_version.h>
 #ifdef DPDK_PDUMP
 #include 
 #include 
@@ -471,6 +472,7 @@ dpdk_init(const struct smap *ovs_other_config)
 static struct ovsthread_once once_enable = OVSTHREAD_ONCE_INITIALIZER;
 
    if (ovsthread_once_start(&once_enable)) {
+VLOG_INFO("Using %s", rte_version());
 VLOG_INFO("DPDK Enabled - initializing...");
 dpdk_init__(ovs_other_config);
 enabled = true;
@@ -501,3 +503,9 @@ dpdk_set_lcore_id(unsigned cpu)
 ovs_assert(cpu != NON_PMD_CORE_ID);
 RTE_PER_LCORE(_lcore_id) = cpu;
 }
+
+void
+print_dpdk_version(void)
+{
+puts(rte_version());
+}
diff --git a/lib/dpdk.h b/lib/dpdk.h
index dc58d968a..b04153591 100644
--- a/lib/dpdk.h
+++ b/lib/dpdk.h
@@ -38,5 +38,6 @@ void dpdk_init(const struct smap *ovs_other_config);
 void dpdk_set_lcore_id(unsigned cpu);
 const char *dpdk_get_vhost_sock_dir(void);
 bool dpdk_vhost_iommu_enabled(void);
+void print_dpdk_version(void);
 
 #endif /* dpdk.h */
diff --git a/utilities/ovs-ctl.in b/utilities/ovs-ctl.in
index 1df56c4a5..ef06dd967 100755
--- a/utilities/ovs-ctl.in
+++ b/utilities/ovs-ctl.in
@@ -72,7 +72,7 @@ set_hostname () {
 set_system_ids () {
 set ovs_vsctl set Open_vSwitch .
 
-OVS_VERSION=`ovs-vswitchd --version | sed 's/.*) //;1q'`
+OVS_VERSION=`ovs-vswitchd --version | awk '/Open vSwitch/{print $NF}'`
 set "$@" ovs-version="$OVS_VERSION"
 
 case $SYSTEM_ID in
diff --git a/vswitchd/ovs-vswitchd.c b/vswitchd/ovs-vswitchd.c
index d5e07c037..a5e8f4d39 100644
--- a/vswitchd/ovs-vswitchd.c
+++ b/vswitchd/ovs-vswitchd.c
@@ -187,6 +187,7 @@ parse_options(int argc, char *argv[], char **unixctl_pathp)
 
 case 'V':
 ovs_print_version(0, 0);
+print_dpdk_version();
 exit(EXIT_SUCCESS);
 
 case OPT_MLOCKALL:
-- 
2.14.3



Re: [ovs-dev] [PATCH v3] ovs-vsctl: show DPDK version

2018-01-04 Thread Matteo Croce
On Thu, Jan 4, 2018 at 11:50 AM, Vishal Deep Ajmera
<vishal.deep.ajm...@ericsson.com> wrote:
> Hi Matteo,
>
> Can we also have DPDK version in the logs (VLOG_INFO) when DPDK is 
> initialized ? May be in dpdk_init() function. This will help in identifying 
> the correct DPDK version from the log files while debugging issues.
>
> Warm Regards,
> Vishal Ajmera
>

Sure, makes sense.

Thanks,
-- 
Matteo Croce
per aspera ad upstream


Re: [ovs-dev] [PATCH v3] ovs-vsctl: show DPDK version

2018-01-02 Thread Matteo Croce
On Tue, Jan 2, 2018 at 6:08 PM, Aaron Conole <acon...@redhat.com> wrote:
> Hi Matteo,
>
> Matteo Croce <mcr...@redhat.com> writes:
>
>> Show DPDK version if Open vSwitch is compiled with DPDK support.
>> Add a `dpdk_version` column in the datamodel, change ovs-ctl and ovs-dev.py
>> to populate the field with the right value.
>>
>> Signed-off-by: Matteo Croce <mcr...@redhat.com>
>> ---
>
> Sorry I missed some of the development for this - I think it's better to
> have the dpdk version retrieved via an appctl command.  I can't think of
> a need for this information to be persistent - if it's for historical
> information when debugging, we can put it in with the logs when dpdk is
> started.  The user cannot influence DPDK version dynamically - it is not
> just read-only, but it's read-only from a compile time decision.  I
> might be misunderstanding the point of putting this in the DB, though?

Hi Aaron,

I was treating dpdk_version the same way as ovs_version.
ovs_version is saved into the DB by ovs-ctl and it isn't read-only either.
What about putting it only in `ovs-vswitchd --version` output and not
into the DB?

-- 
Matteo Croce
per aspera ad upstream


Re: [ovs-dev] [PATCH v3] ovs-vsctl: show DPDK version

2017-12-26 Thread Matteo Croce
On Tue, Dec 26, 2017 at 6:38 PM, Ben Pfaff <b...@ovn.org> wrote:
> On Mon, Dec 25, 2017 at 03:31:02PM +0100, Matteo Croce wrote:
>> Show DPDK version if Open vSwitch is compiled with DPDK support.
>> Add a `dpdk_version` column in the datamodel, change ovs-ctl and ovs-dev.py
>> to populate the field with the right value.
>>
>> Signed-off-by: Matteo Croce <mcr...@redhat.com>
>
> OK, this is better, but why does ovs-vswitchd need a shell helper for
> this?  It should just set the key itself.

On ovs-vswitchd startup?
I think I set dpdk_version the same way ovs_version was set.
I can do it in ovs-vswitchd startup but in that case ovs_version
should be set in ovs-vswitchd too.

Cheers,
-- 
Matteo Croce
per aspera ad upstream


[ovs-dev] [PATCH v3] ovs-vsctl: show DPDK version

2017-12-25 Thread Matteo Croce
Show DPDK version if Open vSwitch is compiled with DPDK support.
Add a `dpdk_version` column in the datamodel, change ovs-ctl and ovs-dev.py
to populate the field with the right value.

Signed-off-by: Matteo Croce <mcr...@redhat.com>
---
 utilities/ovs-ctl.in   | 9 -
 utilities/ovs-dev.py   | 7 +++
 vswitchd/ovs-vswitchd.c| 7 +++
 vswitchd/vswitch.ovsschema | 7 +--
 vswitchd/vswitch.xml   | 4 
 5 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/utilities/ovs-ctl.in b/utilities/ovs-ctl.in
index f1b01d1d3..1e334de2b 100755
--- a/utilities/ovs-ctl.in
+++ b/utilities/ovs-ctl.in
@@ -76,9 +76,16 @@ set_hostname () {
 set_system_ids () {
 set ovs_vsctl set Open_vSwitch .
 
-OVS_VERSION=`ovs-vswitchd --version | sed 's/.*) //;1q'`
+OVS_VERSION=`ovs-vswitchd --version |awk '/Open vSwitch/{print $NF}'`
 set "$@" ovs-version="$OVS_VERSION"
 
+DPDK_VERSION=`ovs-vswitchd --version |awk '/^DPDK/{print$2}'`
+if [ -n "$DPDK_VERSION" ]; then
+set "$@" dpdk-version="$DPDK_VERSION"
+else
+ovs_vsctl clear Open_vSwitch . dpdk_version
+fi
+
 case $SYSTEM_ID in
 random)
 id_file=$etcdir/system-id.conf
diff --git a/utilities/ovs-dev.py b/utilities/ovs-dev.py
index 9ce0f04c7..70170e21d 100755
--- a/utilities/ovs-dev.py
+++ b/utilities/ovs-dev.py
@@ -284,9 +284,16 @@ def run():
 "other_config:dpdk-init=true" % root_uuid)
 _sh("ovs-vsctl --no-wait set Open_vSwitch %s other_config:"
 "dpdk-extra=\"%s\"" % (root_uuid, ' '.join(options.dpdk)))
+dpdk_version = _sh("ovs-vswitchd --version", capture=True)
+dpdk_version = [s for s in dpdk_version if "DPDK" in s]
+if dpdk_version:
+dpdk_version = dpdk_version[0].decode().strip().split()[1]
+_sh("ovs-vsctl --no-wait set Open_vSwitch %s dpdk_version=%s"
+% (root_uuid, dpdk_version))
 else:
 _sh("ovs-vsctl --no-wait set Open_vSwitch %s "
 "other_config:dpdk-init=false" % root_uuid)
+_sh("ovs-vsctl --no-wait clear Open_vSwitch %s dpdk_version" % root_uuid)
 
 if options.gdb:
 cmd = ["gdb", "--args"] + cmd
diff --git a/vswitchd/ovs-vswitchd.c b/vswitchd/ovs-vswitchd.c
index d5e07c037..e033a7d1f 100644
--- a/vswitchd/ovs-vswitchd.c
+++ b/vswitchd/ovs-vswitchd.c
@@ -25,6 +25,10 @@
 #include 
 #endif
 
+#ifdef DPDK_NETDEV
+#include <rte_version.h>
+#endif
+
 #include "bridge.h"
 #include "command-line.h"
 #include "compiler.h"
@@ -187,6 +191,9 @@ parse_options(int argc, char *argv[], char **unixctl_pathp)
 
 case 'V':
 ovs_print_version(0, 0);
+#ifdef DPDK_NETDEV
+puts(rte_version());
+#endif
 exit(EXIT_SUCCESS);
 
 case OPT_MLOCKALL:
diff --git a/vswitchd/vswitch.ovsschema b/vswitchd/vswitch.ovsschema
index 90e50b626..7015a2687 100644
--- a/vswitchd/vswitch.ovsschema
+++ b/vswitchd/vswitch.ovsschema
@@ -1,6 +1,6 @@
 {"name": "Open_vSwitch",
- "version": "7.15.1",
- "cksum": "3682332033 23608",
+ "version": "7.16.0",
+ "cksum": "654116098 23718",
  "tables": {
"Open_vSwitch": {
  "columns": {
@@ -36,6 +36,9 @@
"db_version": {
  "type": {"key": {"type": "string"},
   "min": 0, "max": 1}},
+   "dpdk_version": {
+ "type": {"key": {"type": "string"},
+  "min": 0, "max": 1}},
"system_type": {
  "type": {"key": {"type": "string"},
   "min": 0, "max": 1}},
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index 37d04b7cf..74890f72b 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -611,6 +611,10 @@
 
   
 
+      <column name="dpdk_version">
+        DPDK version number, e.g. 17.11.
+      </column>
+
   
 
   An identifier for the type of system on top of which Open vSwitch
-- 
2.14.3



Re: [ovs-dev] [PATCH v2] ovs-vsctl: show DPDK version

2017-12-25 Thread Matteo Croce
On Sun, Dec 24, 2017 at 8:02 PM, Ben Pfaff <b...@ovn.org> wrote:
> On Sun, Dec 24, 2017 at 02:58:37PM +0100, Matteo Croce wrote:
>> Show DPDK version if Open vSwitch is compiled with DPDK support.
>> Add a `dpdk_version` column in the datamodel, change ovs-ctl and ovs-dev.py
>> to populate the field with the right value.
>>
>> Signed-off-by: Matteo Croce <mcr...@redhat.com>
>> ---
>> v2: edit ovs-ctl as well
>
> Thank you for the update.
>
> I think that this is likely to cause confusion eventually because it
> reports the version of DPDK that ovs-vsctl is linked against.  That
> doesn't even make sense because the sole reason that ovs-vsctl is linked
> against DPDK is to report the version.
>
> I think it would be more sensible to make ovs-vswitchd set and report
> the dpdk version.  That can't go wrong in the same way.
>
> Thanks,
>
> Ben.

That's true. Initially I thought of putting the version info in vswitchd,
but I didn't want to break anything relying on
'ovs-vswitchd --version' output, so I put it in ovs-vsctl, which
already contained another field, the DB version.
I'm making a v3 with the DPDK version in vswitchd and a safer
ovs-ctl/ovs-dev.py which doesn't require the OVS version to be on the
first line of output.

Cheers,
-- 
Matteo Croce
per aspera ad upstream


[ovs-dev] [PATCH v2] ovs-vsctl: show DPDK version

2017-12-24 Thread Matteo Croce
Show DPDK version if Open vSwitch is compiled with DPDK support.
Add a `dpdk_version` column in the datamodel, change ovs-ctl and ovs-dev.py
to populate the field with the right value.

Signed-off-by: Matteo Croce <mcr...@redhat.com>
---
v2: edit ovs-ctl as well

 utilities/ovs-ctl.in   |  7 +++
 utilities/ovs-dev.py   | 15 ---
 utilities/ovs-vsctl.c  |  7 +++
 vswitchd/vswitch.ovsschema |  7 +--
 vswitchd/vswitch.xml   |  4 
 5 files changed, 35 insertions(+), 5 deletions(-)

diff --git a/utilities/ovs-ctl.in b/utilities/ovs-ctl.in
index f1b01d1d3..3ce8e1849 100755
--- a/utilities/ovs-ctl.in
+++ b/utilities/ovs-ctl.in
@@ -79,6 +79,13 @@ set_system_ids () {
 OVS_VERSION=`ovs-vswitchd --version | sed 's/.*) //;1q'`
 set "$@" ovs-version="$OVS_VERSION"
 
+DPDK_VERSION=`ovs-vsctl --version |awk '/^DPDK/{print$2}'`
+if [ -n "$DPDK_VERSION" ]; then
+set "$@" dpdk-version="$DPDK_VERSION"
+else
+ovs_vsctl clear Open_vSwitch . dpdk_version
+fi
+
 case $SYSTEM_ID in
 random)
 id_file=$etcdir/system-id.conf
diff --git a/utilities/ovs-dev.py b/utilities/ovs-dev.py
index 9ce0f04c7..1bc0ba8c7 100755
--- a/utilities/ovs-dev.py
+++ b/utilities/ovs-dev.py
@@ -270,11 +270,20 @@ def run():
 " %s/ovsclient-cert.pem %s/vswitchd.cacert"
 % (pki_dir, pki_dir, pki_dir))
 version = _sh("ovs-vsctl --no-wait --version", capture=True)
-version = version[0].decode().strip().split()[3]
 root_uuid = _sh("ovs-vsctl --no-wait --bare list Open_vSwitch",
 capture=True)[0].decode().strip()
-_sh("ovs-vsctl --no-wait set Open_vSwitch %s ovs_version=%s"
-% (root_uuid, version))
+
+ovs_version = [s for s in version if "Open vSwitch" in s]
+if ovs_version:
+ovs_version = ovs_version[0].decode().strip().split()[3]
+_sh("ovs-vsctl --no-wait set Open_vSwitch %s ovs_version=%s"
+% (root_uuid, ovs_version))
+
+dpdk_version = [s for s in version if "DPDK" in s]
+if dpdk_version:
+dpdk_version = dpdk_version[0].decode().strip().split()[1]
+_sh("ovs-vsctl --no-wait set Open_vSwitch %s dpdk_version=%s"
+% (root_uuid, dpdk_version))
 
 build = BUILD_CLANG if options.clang else BUILD_GCC
 cmd = [build + "/vswitchd/ovs-vswitchd"]
diff --git a/utilities/ovs-vsctl.c b/utilities/ovs-vsctl.c
index 7b909431d..f830120ad 100644
--- a/utilities/ovs-vsctl.c
+++ b/utilities/ovs-vsctl.c
@@ -27,6 +27,10 @@
 #include 
 #include 
 
+#ifdef DPDK_NETDEV
+#include <rte_version.h>
+#endif
+
 #include "db-ctl-base.h"
 
 #include "command-line.h"
@@ -304,6 +308,9 @@ parse_options(int argc, char *argv[], struct shash *local_options)
 case 'V':
 ovs_print_version(0, 0);
 printf("DB Schema %s\n", ovsrec_get_db_version());
+#ifdef DPDK_NETDEV
+printf("%s\n", rte_version());
+#endif
 exit(EXIT_SUCCESS);
 
 case 't':
diff --git a/vswitchd/vswitch.ovsschema b/vswitchd/vswitch.ovsschema
index 90e50b626..7015a2687 100644
--- a/vswitchd/vswitch.ovsschema
+++ b/vswitchd/vswitch.ovsschema
@@ -1,6 +1,6 @@
 {"name": "Open_vSwitch",
- "version": "7.15.1",
- "cksum": "3682332033 23608",
+ "version": "7.16.0",
+ "cksum": "654116098 23718",
  "tables": {
"Open_vSwitch": {
  "columns": {
@@ -36,6 +36,9 @@
"db_version": {
  "type": {"key": {"type": "string"},
   "min": 0, "max": 1}},
+   "dpdk_version": {
+ "type": {"key": {"type": "string"},
+  "min": 0, "max": 1}},
"system_type": {
  "type": {"key": {"type": "string"},
   "min": 0, "max": 1}},
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index 37d04b7cf..74890f72b 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -611,6 +611,10 @@
 
   
 
+      <column name="dpdk_version">
+        DPDK version number, e.g. 17.11.
+      </column>
+
   
 
   An identifier for the type of system on top of which Open vSwitch
-- 
2.14.3



[ovs-dev] [PATCH] ovs-vsctl: show DPDK version

2017-12-22 Thread Matteo Croce
Show DPDK version if Open vSwitch is compiled with DPDK support.
Add a `dpdk_version` column in the datamodel and also edit ovs-dev.py to
populate the field with the right value.

Signed-off-by: Matteo Croce <mcr...@redhat.com>
---
 utilities/ovs-dev.py   | 15 ---
 utilities/ovs-vsctl.c  |  7 +++
 vswitchd/vswitch.ovsschema |  7 +--
 vswitchd/vswitch.xml   |  4 
 4 files changed, 28 insertions(+), 5 deletions(-)

diff --git a/utilities/ovs-dev.py b/utilities/ovs-dev.py
index 9ce0f04c7..1bc0ba8c7 100755
--- a/utilities/ovs-dev.py
+++ b/utilities/ovs-dev.py
@@ -270,11 +270,20 @@ def run():
 " %s/ovsclient-cert.pem %s/vswitchd.cacert"
 % (pki_dir, pki_dir, pki_dir))
 version = _sh("ovs-vsctl --no-wait --version", capture=True)
-version = version[0].decode().strip().split()[3]
 root_uuid = _sh("ovs-vsctl --no-wait --bare list Open_vSwitch",
 capture=True)[0].decode().strip()
-_sh("ovs-vsctl --no-wait set Open_vSwitch %s ovs_version=%s"
-% (root_uuid, version))
+
+ovs_version = [s for s in version if "Open vSwitch" in s]
+if ovs_version:
+ovs_version = ovs_version[0].decode().strip().split()[3]
+_sh("ovs-vsctl --no-wait set Open_vSwitch %s ovs_version=%s"
+% (root_uuid, ovs_version))
+
+dpdk_version = [s for s in version if "DPDK" in s]
+if dpdk_version:
+dpdk_version = dpdk_version[0].decode().strip().split()[1]
+_sh("ovs-vsctl --no-wait set Open_vSwitch %s dpdk_version=%s"
+% (root_uuid, dpdk_version))
 
 build = BUILD_CLANG if options.clang else BUILD_GCC
 cmd = [build + "/vswitchd/ovs-vswitchd"]
diff --git a/utilities/ovs-vsctl.c b/utilities/ovs-vsctl.c
index 7b909431d..f830120ad 100644
--- a/utilities/ovs-vsctl.c
+++ b/utilities/ovs-vsctl.c
@@ -27,6 +27,10 @@
 #include 
 #include 
 
+#ifdef DPDK_NETDEV
+#include <rte_version.h>
+#endif
+
 #include "db-ctl-base.h"
 
 #include "command-line.h"
@@ -304,6 +308,9 @@ parse_options(int argc, char *argv[], struct shash *local_options)
 case 'V':
 ovs_print_version(0, 0);
 printf("DB Schema %s\n", ovsrec_get_db_version());
+#ifdef DPDK_NETDEV
+printf("%s\n", rte_version());
+#endif
 exit(EXIT_SUCCESS);
 
 case 't':
diff --git a/vswitchd/vswitch.ovsschema b/vswitchd/vswitch.ovsschema
index 90e50b626..757d8ec17 100644
--- a/vswitchd/vswitch.ovsschema
+++ b/vswitchd/vswitch.ovsschema
@@ -1,6 +1,6 @@
 {"name": "Open_vSwitch",
- "version": "7.15.1",
- "cksum": "3682332033 23608",
+ "version": "7.15.2",
+ "cksum": "3876676959 23718",
  "tables": {
"Open_vSwitch": {
  "columns": {
@@ -36,6 +36,9 @@
"db_version": {
  "type": {"key": {"type": "string"},
   "min": 0, "max": 1}},
+   "dpdk_version": {
+ "type": {"key": {"type": "string"},
+  "min": 0, "max": 1}},
"system_type": {
  "type": {"key": {"type": "string"},
   "min": 0, "max": 1}},
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index 37d04b7cf..74890f72b 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -611,6 +611,10 @@
 
   
 
+      <column name="dpdk_version">
+        DPDK version number, e.g. 17.11.
+      </column>
+
   
 
   An identifier for the type of system on top of which Open vSwitch
-- 
2.14.3



Re: [ovs-dev] [PATCH] netdev-dpdk: defer MTU set after interface start

2017-11-24 Thread Matteo Croce
On Fri, Nov 24, 2017 at 9:26 AM, Stokes, Ian <ian.sto...@intel.com> wrote:
>> Some PMD assumes that the RX buffers are already allocated when setting
>> the device MTU, because the RX buffer size depends on the MTU.
>> This worked until 67fe6d635193 added a call to rte_eth_dev_set_mtu() in
>> the init code, which would set the MTU before the RX buffer allocation,
>> triggering a segmentation fault with some PMD:
>>
>> Stack trace of thread 20680:
>> #0  0x559464396534 qede_set_mtu (ovs-vswitchd)
>> #1  0x559464323c41 rte_eth_dev_set_mtu (ovs-vswitchd)
>> #2  0x5594645f5e85 dpdk_eth_dev_queue_setup (ovs-vswitchd)
>> #3  0x5594645f8ae6 netdev_dpdk_reconfigure (ovs-vswitchd)
>> #4  0x55946452225c reconfigure_datapath (ovs-vswitchd)
>> #5  0x559464522d07 do_add_port (ovs-vswitchd)
>> #6  0x559464522e8d dpif_netdev_port_add (ovs-vswitchd)
>> #7  0x559464528dde dpif_port_add (ovs-vswitchd)
>> #8  0x5594644dc0e0 port_add (ovs-vswitchd)
>> #9  0x5594644d2ab1 ofproto_port_add (ovs-vswitchd)
>> #10 0x5594644c0a85 bridge_add_ports__ (ovs-vswitchd)
>> #11 0x5594644c26e8 bridge_reconfigure (ovs-vswitchd)
>> #12 0x5594644c5c49 bridge_run (ovs-vswitchd)
>> #13 0x5594642d4155 main (ovs-vswitchd)
>> #14 0x7f0e1444bc05 __libc_start_main (libc.so.6)
>> #15 0x5594642d8328 _start (ovs-vswitchd)
>>
>> A possible solution could be to move the first call to
>> rte_eth_dev_set_mtu() just after the device start instead of
>> dpdk_eth_dev_queue_setup() which, by the way, set the MTU multiple times
>> as the call to rte_eth_dev_set_mtu() was in a loop.
>>
>> CC: Mark Kavanagh <mark.b.kavan...@intel.com>
>> Fixes: 67fe6d635193 ("netdev-dpdk: use rte_eth_dev_set_mtu.")
>> Signed-off-by: Matteo Croce <mcr...@redhat.com>
>> ---
>>  lib/netdev-dpdk.c | 14 +++---
>>  1 file changed, 7 insertions(+), 7 deletions(-)
>>
>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index
>> 76e79be25..229aa4a76 100644
>> --- a/lib/netdev-dpdk.c
>> +++ b/lib/netdev-dpdk.c
>> @@ -750,13 +750,6 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int
>> n_rxq, int n_txq)
>>  break;
>>  }
>>
>> -diag = rte_eth_dev_set_mtu(dev->port_id, dev->mtu);
>> -if (diag) {
>> -VLOG_ERR("Interface %s MTU (%d) setup error: %s",
>> -dev->up.name, dev->mtu, rte_strerror(-diag));
>> -break;
>> -}
>> -
>>  for (i = 0; i < n_txq; i++) {
>>  diag = rte_eth_tx_queue_setup(dev->port_id, i, dev->txq_size,
>>dev->socket_id, NULL); @@ -
>> 849,6 +842,13 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev)
>>  return -diag;
>>  }
>>
>> +diag = rte_eth_dev_set_mtu(dev->port_id, dev->mtu);
>> +if (diag) {
>> +VLOG_ERR("Interface %s MTU (%d) setup error: %s",
>> +dev->up.name, dev->mtu, rte_strerror(-diag));
>> +return -diag;
>> +}
>> +
>
> In my testing I didn't see any issues here, but would like to flag it to our 
> MTU Guru (Mark K).
>
> Any opinion on this? From the API description it looks like it is ok?
>
>
>>  rte_eth_promiscuous_enable(dev->port_id);
>>  rte_eth_allmulticast_enable(dev->port_id);
>>
>> --
>> 2.13.6
>>
>> ___
>> dev mailing list
>> d...@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

The issue only arises with the qede PMD, and only since 67fe6d635193
("netdev-dpdk: use rte_eth_dev_set_mtu.")

Regards,
-- 
Matteo Croce
per aspera ad upstream


[ovs-dev] [PATCH] netdev-dpdk: defer MTU set after interface start

2017-11-16 Thread Matteo Croce
Some PMDs assume that the RX buffers are already allocated when setting the
device MTU, because the RX buffer size depends on the MTU.
This worked until 67fe6d635193 added a call to rte_eth_dev_set_mtu() in
the init code, which would set the MTU before the RX buffer allocation,
triggering a segmentation fault with some PMD:

Stack trace of thread 20680:
#0  0x559464396534 qede_set_mtu (ovs-vswitchd)
#1  0x559464323c41 rte_eth_dev_set_mtu (ovs-vswitchd)
#2  0x5594645f5e85 dpdk_eth_dev_queue_setup (ovs-vswitchd)
#3  0x5594645f8ae6 netdev_dpdk_reconfigure (ovs-vswitchd)
#4  0x55946452225c reconfigure_datapath (ovs-vswitchd)
#5  0x559464522d07 do_add_port (ovs-vswitchd)
#6  0x559464522e8d dpif_netdev_port_add (ovs-vswitchd)
#7  0x559464528dde dpif_port_add (ovs-vswitchd)
#8  0x5594644dc0e0 port_add (ovs-vswitchd)
#9  0x5594644d2ab1 ofproto_port_add (ovs-vswitchd)
#10 0x5594644c0a85 bridge_add_ports__ (ovs-vswitchd)
#11 0x5594644c26e8 bridge_reconfigure (ovs-vswitchd)
#12 0x5594644c5c49 bridge_run (ovs-vswitchd)
#13 0x5594642d4155 main (ovs-vswitchd)
#14 0x7f0e1444bc05 __libc_start_main (libc.so.6)
#15 0x5594642d8328 _start (ovs-vswitchd)

A possible solution could be to move the first call to
rte_eth_dev_set_mtu() just after the device start instead of in
dpdk_eth_dev_queue_setup(), which, incidentally, set the MTU multiple
times because the call to rte_eth_dev_set_mtu() was inside a loop.

CC: Mark Kavanagh <mark.b.kavan...@intel.com>
Fixes: 67fe6d635193 ("netdev-dpdk: use rte_eth_dev_set_mtu.")
Signed-off-by: Matteo Croce <mcr...@redhat.com>
---
 lib/netdev-dpdk.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 76e79be25..229aa4a76 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -750,13 +750,6 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
 break;
 }
 
-diag = rte_eth_dev_set_mtu(dev->port_id, dev->mtu);
-if (diag) {
-VLOG_ERR("Interface %s MTU (%d) setup error: %s",
-dev->up.name, dev->mtu, rte_strerror(-diag));
-break;
-}
-
 for (i = 0; i < n_txq; i++) {
 diag = rte_eth_tx_queue_setup(dev->port_id, i, dev->txq_size,
   dev->socket_id, NULL);
@@ -849,6 +842,13 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev)
 return -diag;
 }
 
+diag = rte_eth_dev_set_mtu(dev->port_id, dev->mtu);
+if (diag) {
+VLOG_ERR("Interface %s MTU (%d) setup error: %s",
+dev->up.name, dev->mtu, rte_strerror(-diag));
+return -diag;
+}
+
 rte_eth_promiscuous_enable(dev->port_id);
 rte_eth_allmulticast_enable(dev->port_id);
 
-- 
2.13.6



[ovs-dev] [PATCH] ovs-lib: don't purge corrupted DB

2017-09-27 Thread Matteo Croce
In ovs-lib there is a function named upgrade_db which tries to convert a
database after OVS {up,down}grades. This function uses ovsdb-tool to
check whether the DB needs to be upgraded. If the upgrade fails,
it purges the DB and creates an empty one.
ovsdb-tool returns "yes" or "no" to indicate whether the DB needs upgrading,
but if the DB is corrupted it returns a list of errors.
Change the condition from "!= no" to "= yes", because in case of DB
corruption upgrade_db would otherwise purge the existing DB without writing
anything to the logs.
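A sketch of the behavioral difference, with hypothetical stand-ins for the three kinds of ovsdb-tool output (the corruption error text below is invented for illustration; the real tool's wording differs):

```sh
# Hypothetical stand-ins for `ovsdb-tool needs-conversion` output: it
# prints "yes" or "no" on success, but error text when the DB is corrupted.
for result in 'no' 'yes' 'ovsdb-tool: db: cannot be read as a database'; do
    # Old test: anything that is not exactly "no" goes down the
    # convert-and-purge path, including corruption errors.
    if test X"$result" != Xno; then old=convert-or-purge; else old=keep; fi
    # New test: only an explicit "yes" triggers conversion; corrupted
    # output falls through and the DB is left in place.
    if test X"$result" = Xyes; then new=convert; else new=keep; fi
    echo "'$result' -> old:$old new:$new"
done
```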

Signed-off-by: Matteo Croce <mcr...@redhat.com>
---
 utilities/ovs-lib.in | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/utilities/ovs-lib.in b/utilities/ovs-lib.in
index 8665698bb..ea5a12375 100644
--- a/utilities/ovs-lib.in
+++ b/utilities/ovs-lib.in
@@ -400,7 +400,7 @@ upgrade_db () {
 log_warning_msg "$DB_FILE does not exist"
 install_dir `dirname $DB_FILE`
 create_db "$DB_FILE" "$DB_SCHEMA"
-elif test X"`ovsdb_tool needs-conversion "$DB_FILE" "$DB_SCHEMA"`" != Xno; then
+elif test X"`ovsdb_tool needs-conversion "$DB_FILE" "$DB_SCHEMA"`" = Xyes; then
 # Back up the old version.
 version=`ovsdb_tool db-version "$DB_FILE"`
 cksum=`ovsdb_tool db-cksum "$DB_FILE" | awk '{print $1}'`
-- 
2.13.5
