Re: [ovs-dev] [PATCH v2 0/3] dpdk: Add support for TSO

2019-01-11 Thread Darrell Ball
On Fri, Jan 11, 2019 at 8:22 AM Ilya Maximets 
wrote:

> Nothing significantly changed since the previous versions.
> This patch set effectively breaks following cases if multi-segment
> mbufs enabled:
>
>   1. Host <-> VM communication broken.
>
>  With this patch applied VMs has offloading enabled by default.
>  This makes all the packets that goes out from the VM to have
>  incorrect checksums (beside possible multi-segmenting). As a
>  result any packet that reaches host interfaces dropped by the
>  host kernel as corrupted. Could be easily reproduced by trying
>  to use ssh from host to VM, which is the common case for usual
>  OpenStack installations.
>
>  If checksums will be fixed, segmented packets will be dropped
>  anyway in netdev-linux/bsd/dummy.
>
>   2. Broken VM to VM communication in case of not negotiated offloading
>  in one of VMs.
>
>  If one of VMs does not negotiate offloading, all the packets
>  will be dropped for the same reason as in Host to VM case. This
>  is a really bad case because virtio driver could change the
>  feature set in runtime and this should be handled in OVS.
>
>  Note that linux kernel virtio driver always has rx checksum
>  offloading enabled, i.e. it doesn't check/recalculates the
>  checksums in software if needed. So, you need the dpdk driver or
>  a FreeBSD machine to reproduce checksumming issue.
>
>   3. Broken support of all the virtual and HW NICs that doen't support TSO.
>
>  It's actually big number of NICs. More than a half of PMDs in DPDK
>  does not support TSO.
>  All the packets just dropped by netdev-dpdk before send.
>
> This is most probably not a full list of issues. To fix most of them
> fallback to full software TSO implementation required.
>
> Until this is done I prefer not to merge the feature because it
> breaks the basic functionality. I understand that you're going to make
> it 'experimental', but, IMHO, this doesn't worth even that. Patches
> are fairly small and there is nothing actually we need to test before
> the full support. The main part is not implemented yet.
>
> NACK for the series. Sorry.
>
> 
>
> One more thing I wanted to mention is that I was surprised to not see
> the performance comparison of TSO with usual Jumbo frames in your slides
> on a recent OVS Conference [1]. You had a comparison slide with 1500 MTU,
> TSO and the kernel vhost, but not the jumbo frames. In my testing
> I have more than 22 Gbps for VM<->VM jumbo frames performance with
> current OVS and iperf. And this is almost equal to vhost kernel with TSO
> performance on my system (I have probably a bit less powerful CPU, so the
> results are a bit lower than in your slides). Also, in real cases there
> will
> be significant performance drop because of fallback to software TSO in
> many cases. Sorry, but I don't really see the benefits of using the
> Multi-segment mbufs and TSO.
>

Tiago

I am also interested in the performance benefit question.
Could you answer it ?

Thanks Darrell




> All of this work positioned as a feature to cover the gap between the
> kernel
> and userspace datapaths, but there is no any gap from the performance point
> of view


Tiago

Is this true ?

Thanks Darrell




> and overcomplicating the OVS by multisegmenting and TSO is really
> not necessary.
>




>
> [1] http://www.openvswitch.org/support/ovscon2018/5/0935-lam.pptx
>
>
> On 10.01.2019 19:58, Tiago Lam wrote:
> > Enabling TSO offload allows a host stack to delegate the segmentation of
> > oversized TCP packets to the underlying physical NIC, if supported. In
> the case
> > of a VM this means that the segmentation of the packets is not performed
> by the
> > guest kernel, but by the host NIC itself. In turn, since the TSO
> calculations
> > and checksums are being performed in hardware, this alleviates the CPU
> load on
> > the host system. In inter VM communication this might account to
> significant
> > savings, and higher throughput, even more so if the VMs are running on
> the same
> > host.
> >
> > Thus, although inter VM communication is already possible as is, there's
> a
> > sacrifice in terms of CPU, which may affect the overall throughput.
> >
> > This series adds support for TSO in OvS-DPDK, by making use of the TSO
> > offloading feature already supported by DPDK vhost backend, having the
> > following scenarios in mind:
> > - Inter VM communication on the same host;
> > - Inter VM communication on different hosts;
> > - The same two use cases above, but on a VLAN network.
> >
> > The work is based on [1]; It has been rebased to run on top of the
> > multi-segment mbufs work (v13) [2] and re-worked to use DPDK v18.11.
> >
> > [1] https://patchwork.ozlabs.org/patch/749564/
> > [2]
> https://mail.openvswitch.org/pipermail/ovs-dev/2019-January/354950.html
> >
> > Considerations:
> > - As mentioned above, this series depends on the multi-segment mbuf
> series
> >   (v13) 

Re: [ovs-dev] [PATCH v2 0/3] dpdk: Add support for TSO

2019-01-11 Thread Lam, Tiago
On 11/01/2019 19:37, Ian Stokes wrote:
> On 1/11/2019 4:14 PM, Ilya Maximets wrote:
>> Nothing significantly changed since the previous versions.
>> This patch set effectively breaks following cases if multi-segment
>> mbufs enabled:
>>
> 
> Hi Ilya, thanks for your feedback. A few queries and clarifications for 
> discussion below.
> 
>  From reading the mail at first I was under the impression that all 
> these features were being broken by default after applying the patch?
> 
> To clarify, in your testing, after applying these patches, are the use 
> cases below broken if you are not using multiseg/TSO? (by default these 
> are disabled).
> 
> The intention of this work, as an experimental feature, would be that it 
> is disabled by default and will not introduce new regressions to users 
> who do not enable it. So there would be no impact on a user who does not 
> wish to enable this and use OVS DPDK as is today.
> 
> The series would be the first steps to enable OVS DPDK to use TSO in 
> both inter VM and Host to Host environments using DPDK interfaces that 
> support the feature. Kernel interfaces such as tap devices are not 
> catered for as of yet.
> 
> The use cases that are catered for are
> 
> - Inter VM communication on the same host using DPDK Interfaces;
> - Inter VM communication on different hosts DPDK Interfaces;
> - The same two use cases above, but on a VLAN network.
> 
> Again these are the first steps showing TSO in DPDK with multiseg feature.
> 
>>1. Host <-> VM communication broken.
>>
>>   With this patch applied VMs has offloading enabled by default.
>>   This makes all the packets that goes out from the VM to have
>>   incorrect checksums (beside possible multi-segmenting). As a
>>   result any packet that reaches host interfaces dropped by the
>>   host kernel as corrupted. Could be easily reproduced by trying
>>   to use ssh from host to VM, which is the common case for usual
>>   OpenStack installations.
>>
> I feel it's important to stress that by default however the patches 
> don't break this functionality. Both Multi-segment mbufs and TSO are 
> disabled by default, and thus, TSO won't be negotiated and the VMs won't 
> be sending multi-segment mbufs to OvS.
> 
>  From Tiagos testing if you decide to enable multi-segments mbuf and 
> TSO, *which is considered experimental* at the moment, the VM-Host 
> communication will be broken only if your VM negotiates TSO enabled by 
> default. This is because with TSO enabled checksum offloading comes into 
> play, which means packets from the VM will arrive to the host with bad 
> checksums.
> 
> Otherwise, if your VM doesn't enable TSO by default, or if you disable 
> with `ethtool -K  tx off`, everything should work as before, but 
> using multi-segment mbufs. That was my impression.
> 
> @Tiago can you confirm?

That's correct, yes. The expected is that with TSO it won't work (for
the reasons you mention above), but one will get the appropriate
warnings / errors. However, making use of multi-segment mbus only
doesn't break this use-case.
> 
> The only way to circumvent that, as you mention below, is "fallback to 
> full software TSO implementation".

Which is called GSO.

> 
> As an experimental feature I would be surprised to see Open Stack 
> provide support for this. Again by default it should not impact an Open 
> Stack deployment unless it is explicitly enabled and support for that 
> would be some way off as of yet.
> 
>>   If checksums will be fixed, segmented packets will be dropped
>>   anyway in netdev-linux/bsd/dummy.
> 
> By segmented I assume you are referring to TCP segment rather than multi 
> segment, can you clarify?
> 
> This is a limitation, but one to be factored in by a user when choosing 
> to enable or disable TSO with OVS DPDK until it is resolved.
> 
>>
>>2. Broken VM to VM communication in case of not negotiated offloading
>>   in one of VMs.
>>
>>   If one of VMs does not negotiate offloading, all the packets
>>   will be dropped for the same reason as in Host to VM case. This
>>   is a really bad case because virtio driver could change the
>>   feature set in runtime and this should be handled in OVS.
>>
>>   Note that linux kernel virtio driver always has rx checksum
>>   offloading enabled, i.e. it doesn't check/recalculates the
>>   checksums in software if needed. So, you need the dpdk driver or
>>   a FreeBSD machine to reproduce checksumming issue.
>>
> 
> Again by default I believe this still functions as expected. Tiago can 
> you comment any further on this from your testing?

Sure.

That's correct, this is not broken when using the defaults. In fact, if
multi-segments is disabled, we keep the explicit call to
rte_vhost_driver_disable_features() as before, in order to disable TSO
and checksum offloading.

If you decide to enable multi-segments mbuf and TSO, as you say, I
haven't noticed any issues with Linux VMs, 

Re: [ovs-dev] [PATCH v2 0/3] dpdk: Add support for TSO

2019-01-11 Thread Lam, Tiago
On 11/01/2019 16:14, Ilya Maximets wrote:

[snip]

> One more thing I wanted to mention is that I was surprised to not see
> the performance comparison of TSO with usual Jumbo frames in your slides
> on a recent OVS Conference [1]. You had a comparison slide with 1500 MTU,
> TSO and the kernel vhost, but not the jumbo frames. In my testing
> I have more than 22 Gbps for VM<->VM jumbo frames performance with
> current OVS and iperf. And this is almost equal to vhost kernel with TSO
> performance on my system (I have probably a bit less powerful CPU, so the
> results are a bit lower than in your slides). Also, in real cases there will
> be significant performance drop because of fallback to software TSO in
> many cases. Sorry, but I don't really see the benefits of using the
> Multi-segment mbufs and TSO.
> All of this work positioned as a feature to cover the gap between the kernel
> and userspace datapaths, but there is no any gap from the performance point
> of view and overcomplicating the OVS by multisegmenting and TSO is really
> not necessary.
> 
> [1] http://www.openvswitch.org/support/ovscon2018/5/0935-lam.pptx
> 

Right. Thanks for watching...

You can have the same behavior without this patchset. No one said
otherwise. At the cost of user experience. And that's why every so often
there's a new person in the ovs-discuss mailing list complaining about
performance issues on a PVP setup using OvS-DPDK. TSO improves usability
since it leaves up to the Host on how to do that segmentation, which is
where it matters. Otherwise you need to set the MTU in the VM
accordingly, risking double segmentation or screwing something up.

What's more, if TCP's Window Scaling option is being negotiated between
VMs, if you don't have TSO you will inevitably have a performance hit.
You've configured a manual MTU which is likely going to constrain what's
negotiated with Window Scaling, leading to your VM TCP/IP stack
segmenting the packet according to that. With TSO you avoid that and
only do the segmentation, in the Host, if the packet is actually leaving
the host. VM to VM communication within the same host would have no need
for segmentation.

Tiago.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 0/3] dpdk: Add support for TSO

2019-01-11 Thread Ian Stokes

On 1/11/2019 4:14 PM, Ilya Maximets wrote:

Nothing significantly changed since the previous versions.
This patch set effectively breaks following cases if multi-segment
mbufs enabled:



Hi Ilya, thanks for your feedback. A few queries and clarifications for 
discussion below.


From reading the mail at first I was under the impression that all 
these features were being broken by default after applying the patch?


To clarify, in your testing, after applying these patches, are the use 
cases below broken if you are not using multiseg/TSO? (by default these 
are disabled).


The intention of this work, as an experimental feature, would be that it 
is disabled by default and will not introduce new regressions to users 
who do not enable it. So there would be no impact on a user who does not 
wish to enable this and use OVS DPDK as is today.


The series would be the first steps to enable OVS DPDK to use TSO in 
both inter VM and Host to Host environments using DPDK interfaces that 
support the feature. Kernel interfaces such as tap devices are not 
catered for as of yet.


The use cases that are catered for are

- Inter VM communication on the same host using DPDK Interfaces;
- Inter VM communication on different hosts DPDK Interfaces;
- The same two use cases above, but on a VLAN network.

Again these are the first steps showing TSO in DPDK with multiseg feature.


   1. Host <-> VM communication broken.

  With this patch applied VMs has offloading enabled by default.
  This makes all the packets that goes out from the VM to have
  incorrect checksums (beside possible multi-segmenting). As a
  result any packet that reaches host interfaces dropped by the
  host kernel as corrupted. Could be easily reproduced by trying
  to use ssh from host to VM, which is the common case for usual
  OpenStack installations.

I feel it's important to stress that by default however the patches 
don't break this functionality. Both Multi-segment mbufs and TSO are 
disabled by default, and thus, TSO won't be negotiated and the VMs won't 
be sending multi-segment mbufs to OvS.


From Tiagos testing if you decide to enable multi-segments mbuf and 
TSO, *which is considered experimental* at the moment, the VM-Host 
communication will be broken only if your VM negotiates TSO enabled by 
default. This is because with TSO enabled checksum offloading comes into 
play, which means packets from the VM will arrive to the host with bad 
checksums.


Otherwise, if your VM doesn't enable TSO by default, or if you disable 
with `ethtool -K  tx off`, everything should work as before, but 
using multi-segment mbufs. That was my impression.


@Tiago can you confirm?

The only way to circumvent that, as you mention below, is "fallback to 
full software TSO implementation".


As an experimental feature I would be surprised to see Open Stack 
provide support for this. Again by default it should not impact an Open 
Stack deployment unless it is explicitly enabled and support for that 
would be some way off as of yet.



  If checksums will be fixed, segmented packets will be dropped
  anyway in netdev-linux/bsd/dummy.


By segmented I assume you are referring to TCP segment rather than multi 
segment, can you clarify?


This is a limitation, but one to be factored in by a user when choosing 
to enable or disable TSO with OVS DPDK until it is resolved.




   2. Broken VM to VM communication in case of not negotiated offloading
  in one of VMs.

  If one of VMs does not negotiate offloading, all the packets
  will be dropped for the same reason as in Host to VM case. This
  is a really bad case because virtio driver could change the
  feature set in runtime and this should be handled in OVS.

  Note that linux kernel virtio driver always has rx checksum
  offloading enabled, i.e. it doesn't check/recalculates the
  checksums in software if needed. So, you need the dpdk driver or
  a FreeBSD machine to reproduce checksumming issue.



Again by default I believe this still functions as expected. Tiago can 
you comment any further on this from your testing?



   3. Broken support of all the virtual and HW NICs that doen't support TSO.

  It's actually big number of NICs. More than a half of PMDs in DPDK
  does not support TSO.
  All the packets just dropped by netdev-dpdk before send.


There are a number of NICS/VDEVs that do not support TSO in DPDK. 
However there are a number of NICs that do support it and this will 
increase in future DPDK releases.


I don't think should be a blocking factor.

In the case where multi segment is enabled in OVS DPDK we discussed 
simply not allowing a NIC to be added if it did not support multi seg 
mbufs. This can be extended to TSO also, as currently multi seg is a 
prerequirement for TSO.


This avoids a case for the moment where someone is mixing devices where 
support does and does not exist. Again this occurs only if the 

Re: [ovs-dev] [patch v1] conntrack: Fix FTP seq_skew boundary adjustments.

2019-01-11 Thread Darrell Ball
On Thu, Jan 10, 2019 at 12:58 PM Darrell Ball  wrote:

>
>
> On Thu, Jan 10, 2019 at 1:03 AM David Marchand 
> wrote:
>
>> Hello,
>>
>> On Wed, Jan 9, 2019 at 4:44 AM Darrell Ball  wrote:
>>
>>> Fixes: bd5e81a0e596 ("Userspace Datapath: Add ALG infra and FTP.")
>>> Signed-off-by: Darrell Ball 
>>> ---
>>>
>>> Backport to 2.8.
>>>
>>>  lib/conntrack.c | 10 --
>>>  1 file changed, 4 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/lib/conntrack.c b/lib/conntrack.c
>>> index 6f6021a..e992b77 100644
>>> --- a/lib/conntrack.c
>>> +++ b/lib/conntrack.c
>>> @@ -3255,10 +3255,9 @@ handle_ftp_ctl(struct conntrack *ct, const struct
>>> conn_lookup_ctx *ctx,
>>>  uint32_t tcp_ack = ntohl(get_16aligned_be32(>tcp_ack));
>>>
>>>  if ((seq_skew > 0) && (tcp_ack < seq_skew)) {
>>> -/* Should not be possible; will be marked invalid. */
>>> -tcp_ack = 0;
>>> +tcp_ack = UINT32_MAX - (seq_skew - tcp_ack - 1);
>>>  } else if ((seq_skew < 0) && (UINT32_MAX - tcp_ack <
>>> -seq_skew)) {
>>> -tcp_ack = (-seq_skew) - (UINT32_MAX - tcp_ack);
>>> +tcp_ack = (-seq_skew) - (UINT32_MAX - tcp_ack) - 1;
>>>  } else {
>>>  tcp_ack -= seq_skew;
>>>  }
>>> @@ -3267,10 +3266,9 @@ handle_ftp_ctl(struct conntrack *ct, const struct
>>> conn_lookup_ctx *ctx,
>>>  } else {
>>>  uint32_t tcp_seq = ntohl(get_16aligned_be32(>tcp_seq));
>>>  if ((seq_skew > 0) && (UINT32_MAX - tcp_seq < seq_skew)) {
>>> -tcp_seq = seq_skew - (UINT32_MAX - tcp_seq);
>>> +tcp_seq = seq_skew - (UINT32_MAX - tcp_seq) - 1;
>>>  } else if ((seq_skew < 0) && (tcp_seq < -seq_skew)) {
>>> -/* Should not be possible; will be marked invalid. */
>>> -tcp_seq = 0;
>>> +tcp_seq = UINT32_MAX - ((-seq_skew) - tcp_seq - 1);
>>>  } else {
>>>  tcp_seq += seq_skew;
>>>  }
>>> --
>>> 1.9.1
>>>
>>>
>> Ah, now that I think about it, I had seen some packets with ack == 0 but
>> did not reproduce (it was in my todolist).
>> Good catch.
>>
>> Just wondering, can't we rely integer promotion then truncation on 32bits
>> ?
>> My test program tends to show it works.
>>
>
> I believe a compiler is free to convert the operands to int64 and AFAIK
> overflow/underflow is undefined for signed variables.
>

nevermind; it appears we can trust the compiler in this case and simplify.



>
>
>>
>>
>
>>
>
>> Something like:
>>
>> diff --git a/lib/conntrack.c b/lib/conntrack.c
>> index eb36353..04ddf5e 100644
>> --- a/lib/conntrack.c
>> +++ b/lib/conntrack.c
>> @@ -3329,32 +3329,13 @@ handle_ftp_ctl(struct conntrack *ct, const struct
>> conn_lookup_ctx *ctx,
>>
>>  if (nat && ec->seq_skew != 0) {
>>  if (ctx->reply != ec->seq_skew_dir) {
>> -
>>  uint32_t tcp_ack = ntohl(get_16aligned_be32(>tcp_ack));
>>
>> -if ((ec->seq_skew > 0) && (tcp_ack < ec->seq_skew)) {
>> -/* Should not be possible; will be marked invalid. */
>> -tcp_ack = 0;
>> -} else if ((ec->seq_skew < 0) &&
>> -   (UINT32_MAX - tcp_ack < -ec->seq_skew)) {
>> -tcp_ack = (-ec->seq_skew) - (UINT32_MAX - tcp_ack);
>> -} else {
>> -tcp_ack -= ec->seq_skew;
>> -}
>> -ovs_be32 new_tcp_ack = htonl(tcp_ack);
>> -put_16aligned_be32(>tcp_ack, new_tcp_ack);
>> +put_16aligned_be32(>tcp_ack, htonl(tcp_ack -
>> ec->seq_skew));
>>  } else {
>>  uint32_t tcp_seq = ntohl(get_16aligned_be32(>tcp_seq));
>> -if ((ec->seq_skew > 0) && (UINT32_MAX - tcp_seq <
>> ec->seq_skew)) {
>> -tcp_seq = ec->seq_skew - (UINT32_MAX - tcp_seq);
>> -} else if ((ec->seq_skew < 0) && (tcp_seq < -ec->seq_skew)) {
>> -/* Should not be possible; will be marked invalid. */
>> -tcp_seq = 0;
>> -} else {
>> -tcp_seq += ec->seq_skew;
>> -}
>> -ovs_be32 new_tcp_seq = htonl(tcp_seq);
>> -put_16aligned_be32(>tcp_seq, new_tcp_seq);
>> +
>> +put_16aligned_be32(>tcp_seq, htonl(tcp_seq +
>> ec->seq_skew));
>>  }
>>  }
>>
>>
>> --
>> David Marchand
>>
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 8/8] ofproto: Handle flow monitor requests with multiple parts.

2019-01-11 Thread Justin Pettit


> On Aug 30, 2018, at 1:00 PM, Ben Pfaff  wrote:
> 
> Signed-off-by: Ben Pfaff 
> ---
> ofproto/ofproto.c | 84 +--
> 1 file changed, 44 insertions(+), 40 deletions(-)
> 
> diff --git a/ofproto/ofproto.c b/ofproto/ofproto.c
> index d65a3fea1559..54f56d9f100e 100644
> --- a/ofproto/ofproto.c
> +++ b/ofproto/ofproto.c
> @@ -6344,49 +6344,60 @@ flow_monitor_delete(struct ofconn *ofconn, uint32_t 
> id)
> }
> 
> static enum ofperr
> -handle_flow_monitor_request(struct ofconn *ofconn, const struct ofp_header 
> *oh)
> +handle_flow_monitor_request(struct ofconn *ofconn, const struct ovs_list 
> *msgs)
> OVS_EXCLUDED(ofproto_mutex)
> {
> ...
> +if (request.table_id != 0xff

Do you think it would be worth using OFPTT_ALL?

> @@ -8398,7 +8400,7 @@ handle_single_part_openflow(struct ofconn *ofconn, 
> const struct ofp_header *oh,
> return handle_port_desc_stats_request(ofconn, oh);
> 
> case OFPTYPE_FLOW_MONITOR_STATS_REQUEST:
> -return handle_flow_monitor_request(ofconn, oh);
> +OVS_NOT_REACHED();

Grouping this with the OFPTYPE_TABLE_FEATURES_STATS_REQUEST case and adding a 
comment that they are handled by the multi-part requests might make it clearer 
why this should not be reached.

Acked-by: Justin Pettit 

--Justin


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC 2/2] lib/tc: add set ipv6 traffic class action offload via pedit

2019-01-11 Thread Ben Pfaff
On Fri, Jan 11, 2019 at 05:35:45PM +, Pieter Jansen van Vuuren wrote:
> On 11/01/2019 16:49, Ben Pfaff wrote:
> > On Fri, Jan 11, 2019 at 11:51:53AM +, Pieter Jansen van Vuuren wrote:
> >> +/* These functions specifically help shifting words in network
> >> + * byte order, given that they are specified in host order. */
> >> +static inline uint32_t
> >> +shift_ovs_be32_left(uint32_t word, int shift)
> >> +{
> >> +uint32_t word_shifted = (OVS_FORCE uint32_t)htonl(word) << shift;
> >> +
> >> +return ntohl((OVS_FORCE ovs_be32)word_shifted);
> >> +}
> >> +
> >> +static inline uint32_t
> >> +shift_ovs_be32_right(uint32_t word, int shift)
> >> +{
> >> +uint32_t word_shifted = (OVS_FORCE uint32_t)htonl(word) >> shift;
> >> +
> >> +return ntohl((OVS_FORCE ovs_be32)word_shifted);
> >> +}
> > 
> > I don't understand how these new operations make sense.  Can you
> > explain?
> > 
> 
> Sure,
> 
> Background:
> The original issue is that pedit performs set action operation per word, and
> the way we are describing the translation from OvS to TC is by using byte
> offsets(flower_pedit_map). However some fields like Traffic Class in IPv6 span
> 2 pedit-words - one nibble of the field is in one word and the other nibble is
> in another word. In order to accommodate this a shift may be used.
> 
> We need to do this shift in network order thus we need the conversion from 
> host
> to network, but the types for htonl/ntohl either takes as parameter or returns
> ovs_be32. Doing the shift operation on ovs_be32 results in the compiler
> degrades to integer:
> 
>  error: restricted ovs_be32 degrades to integer
> 
> Passing that back to ntohl would result in:
> 
>  error: incorrect type in argument 1 (different base types)
> expected restricted ovs_be32 [usertype] x
> got unsigned int
> 
> therefore we introduced these new operations. Hope this helps. I'm open to
> alternate approaches to this problem.

The part I don't understand is why doing a shift of a word in network
byte order makes any sense.  It will have different semantics depending
on whether the host is little-endian or big-endian.  Why is that
desirable?
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC 2/2] lib/tc: add set ipv6 traffic class action offload via pedit

2019-01-11 Thread Pieter Jansen van Vuuren
On 11/01/2019 16:49, Ben Pfaff wrote:
> On Fri, Jan 11, 2019 at 11:51:53AM +, Pieter Jansen van Vuuren wrote:
>> +/* These functions specifically help shifting words in network
>> + * byte order, given that they are specified in host order. */
>> +static inline uint32_t
>> +shift_ovs_be32_left(uint32_t word, int shift)
>> +{
>> +uint32_t word_shifted = (OVS_FORCE uint32_t)htonl(word) << shift;
>> +
>> +return ntohl((OVS_FORCE ovs_be32)word_shifted);
>> +}
>> +
>> +static inline uint32_t
>> +shift_ovs_be32_right(uint32_t word, int shift)
>> +{
>> +uint32_t word_shifted = (OVS_FORCE uint32_t)htonl(word) >> shift;
>> +
>> +return ntohl((OVS_FORCE ovs_be32)word_shifted);
>> +}
> 
> I don't understand how these new operations make sense.  Can you
> explain?
> 

Sure,

Background:
The original issue is that pedit performs set action operation per word, and
the way we are describing the translation from OvS to TC is by using byte
offsets(flower_pedit_map). However some fields like Traffic Class in IPv6 span
2 pedit-words - one nibble of the field is in one word and the other nibble is
in another word. In order to accommodate this a shift may be used.

We need to do this shift in network order thus we need the conversion from host
to network, but the types for htonl/ntohl either takes as parameter or returns
ovs_be32. Doing the shift operation on ovs_be32 results in the compiler
degrades to integer:

 error: restricted ovs_be32 degrades to integer

Passing that back to ntohl would result in:

 error: incorrect type in argument 1 (different base types)
expected restricted ovs_be32 [usertype] x
got unsigned int

therefore we introduced these new operations. Hope this helps. I'm open to
alternate approaches to this problem.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC 2/2] lib/tc: add set ipv6 traffic class action offload via pedit

2019-01-11 Thread Ben Pfaff
On Fri, Jan 11, 2019 at 11:51:53AM +, Pieter Jansen van Vuuren wrote:
> +/* These functions specifically help shifting words in network
> + * byte order, given that they are specified in host order. */
> +static inline uint32_t
> +shift_ovs_be32_left(uint32_t word, int shift)
> +{
> +uint32_t word_shifted = (OVS_FORCE uint32_t)htonl(word) << shift;
> +
> +return ntohl((OVS_FORCE ovs_be32)word_shifted);
> +}
> +
> +static inline uint32_t
> +shift_ovs_be32_right(uint32_t word, int shift)
> +{
> +uint32_t word_shifted = (OVS_FORCE uint32_t)htonl(word) >> shift;
> +
> +return ntohl((OVS_FORCE ovs_be32)word_shifted);
> +}

I don't understand how these new operations make sense.  Can you
explain?
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 0/3] flake8 fixes

2019-01-11 Thread Ben Pfaff
On Fri, Jan 11, 2019 at 03:48:44PM +0530, Numan Siddique wrote:
> On Fri, Jan 11, 2019 at 4:54 AM Ben Pfaff  wrote:
> 
> > This is needed to make OVS work with flake8 3.6.
> >
> >
> Tested all the patches in this series on fedora 29 with flake8 3.7 and it
> fixes all
> the flake8 issues.
> 
> Ack for the whole series.
> 
> Acked-by: Numan Siddique 
> Tested-by: Numan Siddique 

Thanks!  I applied this series to master.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] ovs-actions: Remove unneded unicode symbols.

2019-01-11 Thread Ben Pfaff
On Fri, Jan 11, 2019 at 12:36:52PM +0300, Ilya Maximets wrote:
> Fixes manpage-check warnings on FreeBSD 11.2:
> 
> lib/ovs-actions.7:1389: warning: invalid input character code 128
> lib/ovs-actions.7:1389: warning: invalid input character code 128
> lib/ovs-actions.7:1389: warning: can't find character with input code 144
> 
> Fixes: be51cd417343 ("ovs-actions: New document describing OVS actions in 
> detail.")
> Signed-off-by: Ilya Maximets 

Thanks, applied to master.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] vconn: Fix using of uninitialized deadline.

2019-01-11 Thread Ben Pfaff
On Fri, Jan 11, 2019 at 11:09:19AM +0300, Ilya Maximets wrote:
> Typo introduced while making minor refactoring before applying the
> patch.
> 
> Fixes logic and the clang build:
> 
>   lib/vconn.c:707:47: error:
>   variable 'deadline' is uninitialized when
>   used within its own initialization [-Werror,-Wuninitialized]
>   ? time_msec() + deadline
>   ^~~~
> 
> Fixes: 04895042e9f6 ("vconn: Allow timeout configuration for blocking 
> connection.")
> Signed-off-by: Ilya Maximets 

Well, I'm an idiot.  Sorry about that.

I applied this to master.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 2/3] netdev-dpdk: Consider packets marked for TSO.

2019-01-11 Thread Lam, Tiago
Hi Ian,

Agreed with all that's said below and will work on the changes, just a
small comment in-line.

On 11/01/2019 12:26, Ian Stokes wrote:
> On 1/10/2019 4:58 PM, Tiago Lam wrote:
>> Previously, TSO was being explicity disabled on vhost interfaces,
>> meaning the guests wouldn't have TSO support negotiated in. With TSO
>> negotiated and enabled, packets are now marked for TSO, through the
>> PKT_TX_TCP_SEG flag.
>>
>> In order to deal with this type of packets, a new function,
>> netdev_dpdk_prep_tso_packet(), has been introduced, with the main
>> purpose of setting correctly the l2, l3 and l4 length members of the
>> mbuf struct, and the appropriate ol_flags. This function supports TSO
>> both in IPv4 and IPv6.
>>
>> netdev_dpdk_prep_tso_packet() is then only called when packets are
>> marked with the PKT_TX_TCP_SEG flag, meaning they have been marked for
>> TSO, and when the packet will be traversing the NIC.
>>
>> Additionally, if a packet is marked for TSO but the egress netdev
>> doesn't support it, the packet is dropped.
>>
> 
> Hi Tiago,
> 
> a high level first pass, I still need to test this in more detail so not 
> a full review yet but some minor issues to be addressed below.
> 
>> Co-authored-by: Mark Kavanagh 
>>
>> Signed-off-by: Mark Kavanagh 
>> Signed-off-by: Tiago Lam 
>> ---
>>   lib/dp-packet.h|  14 +++
>>   lib/netdev-bsd.c   |  11 -
>>   lib/netdev-dpdk.c  | 121 
>> ++---
>>   lib/netdev-dummy.c |  11 -
>>   lib/netdev-linux.c |  15 +++
>>   5 files changed, 146 insertions(+), 26 deletions(-)
>>
>> diff --git a/lib/dp-packet.h b/lib/dp-packet.h
>> index 970aaf2..c384416 100644
>> --- a/lib/dp-packet.h
>> +++ b/lib/dp-packet.h
>> @@ -104,6 +104,8 @@ static inline void dp_packet_set_size(struct dp_packet 
>> *, uint32_t);
>>   static inline uint16_t dp_packet_get_allocated(const struct dp_packet *);
>>   static inline void dp_packet_set_allocated(struct dp_packet *, uint16_t);
>>   
>> +static inline bool dp_packet_is_tso(struct dp_packet *b);
>> +
>>   void *dp_packet_resize_l2(struct dp_packet *, int increment);
>>   void *dp_packet_resize_l2_5(struct dp_packet *, int increment);
>>   static inline void *dp_packet_eth(const struct dp_packet *);
>> @@ -761,6 +763,12 @@ dp_packet_set_allocated(struct dp_packet *b, uint16_t s)
>>   b->mbuf.buf_len = s;
>>   }
>>   
>> +static inline bool
>> +dp_packet_is_tso(struct dp_packet *b)
>> +{
>> +return (b->mbuf.ol_flags & PKT_TX_TCP_SEG) ? true : false;
>> +}
>> +
>>   static inline void
>>   dp_packet_copy_mbuf_flags(struct dp_packet *dst, const struct dp_packet 
>> *src)
>>   {
>> @@ -972,6 +980,12 @@ dp_packet_get_allocated(const struct dp_packet *b)
>>   return b->allocated_;
>>   }
>>   
>> +static inline bool
>> +dp_packet_is_tso(struct dp_packet *b)
>> +{
> This will cause a unused parameter warning. Need to pass OVS_UNUSED 
> above also.
> 
>> +return false;
>> +}
>> +
>>   static inline void
>>   dp_packet_set_allocated(struct dp_packet *b, uint16_t s)
>>   {
>> diff --git a/lib/netdev-bsd.c b/lib/netdev-bsd.c
>> index 278c8a9..5e8c5cc 100644
>> --- a/lib/netdev-bsd.c
>> +++ b/lib/netdev-bsd.c
>> @@ -700,13 +700,22 @@ netdev_bsd_send(struct netdev *netdev_, int qid 
>> OVS_UNUSED,
>>   }
>>   
>>   DP_PACKET_BATCH_FOR_EACH (i, packet, batch) {
>> +size_t size = dp_packet_size(packet);
>> +
>> +/* TSO not supported in BSD netdev */
>> +if (dp_packet_is_tso(packet)) {
>> +VLOG_WARN_RL(, "%s: No TSO enabled on port, TSO packet 
>> dropped "
>> + "%" PRIu32 " ", name, size);
>> +
>> +continue;
>> +}
>> +
>>   /* We need the whole data to send the packet on the device */
>>   if (!dp_packet_is_linear(packet)) {
>>   dp_packet_linearize(packet);
>>   }
>>   
>>   const void *data = dp_packet_data(packet);
>> -size_t size = dp_packet_size(packet);
>>   
>>   while (!error) {
>>   ssize_t retval;
>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>> index 77d04fc..ad7223a 100644
>> --- a/lib/netdev-dpdk.c
>> +++ b/lib/netdev-dpdk.c
>> @@ -1375,14 +1375,16 @@ netdev_dpdk_vhost_construct(struct netdev *netdev)
>>   goto out;
>>   }
>>   
>> -err = rte_vhost_driver_disable_features(dev->vhost_id,
>> -1ULL << VIRTIO_NET_F_HOST_TSO4
>> -| 1ULL << VIRTIO_NET_F_HOST_TSO6
>> -| 1ULL << VIRTIO_NET_F_CSUM);
>> -if (err) {
>> -VLOG_ERR("rte_vhost_driver_disable_features failed for vhost user "
>> - "port: %s\n", name);
>> -goto out;
>> +if (!dpdk_multi_segment_mbufs) {
>> +err = rte_vhost_driver_disable_features(dev->vhost_id,
>> +1ULL << VIRTIO_NET_F_HOST_TSO4
>> +| 1ULL << 

Re: [ovs-dev] [PATCH v2 1/3] netdev-dpdk: Validate packets burst before Tx.

2019-01-11 Thread Ian Stokes

On 1/11/2019 4:11 PM, David Marchand wrote:


On Fri, Jan 11, 2019 at 5:07 PM Ian Stokes > wrote:


On 1/11/2019 3:45 PM, Lam, Tiago wrote:
 > Thanks for bringing this up. I can see from the thread that this is
 > pretty much clarified by now, but do we still want to include a
comment?
 > Since the docs are also versioned, and David's patch (thanks!)
will be
 > for master, I assume we still want a comment here, since people
might be
 > looking into DPDK's v18.11 docs specifically.

Good point, I would think Davids doc patch would be backported to DPDK
but I can confirm, if it is backported to 18.11, I don't think the
comment is needed.


I was going to reply that I intended to ask for a backport in 18.11 at 
least :-)


That's perfect, thanks David, much appreciated.


--
David Marchand



___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 0/3] dpdk: Add support for TSO

2019-01-11 Thread Ilya Maximets
Nothing significantly changed since the previous versions.
This patch set effectively breaks following cases if multi-segment
mbufs enabled:

  1. Host <-> VM communication broken.

 With this patch applied VMs has offloading enabled by default.
 This makes all the packets that goes out from the VM to have
 incorrect checksums (beside possible multi-segmenting). As a
 result any packet that reaches host interfaces dropped by the
 host kernel as corrupted. Could be easily reproduced by trying
 to use ssh from host to VM, which is the common case for usual
 OpenStack installations.

 If checksums will be fixed, segmented packets will be dropped
 anyway in netdev-linux/bsd/dummy.

  2. Broken VM to VM communication in case of not negotiated offloading
 in one of VMs.

 If one of VMs does not negotiate offloading, all the packets
 will be dropped for the same reason as in Host to VM case. This
 is a really bad case because virtio driver could change the
 feature set in runtime and this should be handled in OVS.

 Note that linux kernel virtio driver always has rx checksum
 offloading enabled, i.e. it doesn't check/recalculates the
 checksums in software if needed. So, you need the dpdk driver or
 a FreeBSD machine to reproduce checksumming issue.

  3. Broken support of all the virtual and HW NICs that doen't support TSO.

 It's actually big number of NICs. More than a half of PMDs in DPDK
 does not support TSO.
 All the packets just dropped by netdev-dpdk before send.

This is most probably not a full list of issues. To fix most of them
fallback to full software TSO implementation required.

Until this is done I prefer not to merge the feature because it
breaks the basic functionality. I understand that you're going to make
it 'experimental', but, IMHO, this doesn't worth even that. Patches
are fairly small and there is nothing actually we need to test before
the full support. The main part is not implemented yet.

NACK for the series. Sorry.



One more thing I wanted to mention is that I was surprised to not see
the performance comparison of TSO with usual Jumbo frames in your slides
on a recent OVS Conference [1]. You had a comparison slide with 1500 MTU,
TSO and the kernel vhost, but not the jumbo frames. In my testing
I have more than 22 Gbps for VM<->VM jumbo frames performance with
current OVS and iperf. And this is almost equal to vhost kernel with TSO
performance on my system (I have probably a bit less powerful CPU, so the
results are a bit lower than in your slides). Also, in real cases there will
be significant performance drop because of fallback to software TSO in
many cases. Sorry, but I don't really see the benefits of using the
Multi-segment mbufs and TSO.
All of this work positioned as a feature to cover the gap between the kernel
and userspace datapaths, but there is no any gap from the performance point
of view and overcomplicating the OVS by multisegmenting and TSO is really
not necessary.

[1] http://www.openvswitch.org/support/ovscon2018/5/0935-lam.pptx


On 10.01.2019 19:58, Tiago Lam wrote:
> Enabling TSO offload allows a host stack to delegate the segmentation of
> oversized TCP packets to the underlying physical NIC, if supported. In the 
> case
> of a VM this means that the segmentation of the packets is not performed by 
> the
> guest kernel, but by the host NIC itself. In turn, since the TSO calculations
> and checksums are being performed in hardware, this alleviates the CPU load on
> the host system. In inter VM communication this might account to significant
> savings, and higher throughput, even more so if the VMs are running on the 
> same
> host.
> 
> Thus, although inter VM communication is already possible as is, there's a
> sacrifice in terms of CPU, which may affect the overall throughput.
> 
> This series adds support for TSO in OvS-DPDK, by making use of the TSO
> offloading feature already supported by DPDK vhost backend, having the
> following scenarios in mind:
> - Inter VM communication on the same host;
> - Inter VM communication on different hosts;
> - The same two use cases above, but on a VLAN network.
> 
> The work is based on [1]; It has been rebased to run on top of the
> multi-segment mbufs work (v13) [2] and re-worked to use DPDK v18.11.
> 
> [1] https://patchwork.ozlabs.org/patch/749564/
> [2] https://mail.openvswitch.org/pipermail/ovs-dev/2019-January/354950.html
> 
> Considerations:
> - As mentioned above, this series depends on the multi-segment mbuf series
>   (v13) and can't be applied on master as is;
> - The `rte_eth_tx_prepare()` API in DPDK is marked experimental, and although
>   I'm not getting any errors / warnings while compiling, do shout if get into
>   trouble while testing;
> - I'm due to send v3 in the next few days, but sending v2 now to enable early
>   testing;
> 
> Tiago Lam (3):
>   netdev-dpdk: Validate packets burst before Tx.
>   

Re: [ovs-dev] [PATCH v2 1/3] netdev-dpdk: Validate packets burst before Tx.

2019-01-11 Thread David Marchand
On Fri, Jan 11, 2019 at 5:07 PM Ian Stokes  wrote:

> On 1/11/2019 3:45 PM, Lam, Tiago wrote:
> > Thanks for bringing this up. I can see from the thread that this is
> > pretty much clarified by now, but do we still want to include a comment?
> > Since the docs are also versioned, and David's patch (thanks!) will be
> > for master, I assume we still want a comment here, since people might be
> > looking into DPDK's v18.11 docs specifically.
>
> Good point, I would think Davids doc patch would be backported to DPDK
> but I can confirm, if it is backported to 18.11, I don't think the
> comment is needed.
>

I was going to reply that I intended to ask for a backport in 18.11 at
least :-)

-- 
David Marchand
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 1/3] netdev-dpdk: Validate packets burst before Tx.

2019-01-11 Thread Ian Stokes

On 1/11/2019 3:45 PM, Lam, Tiago wrote:

Hi Ian,

Thanks, comments in-line.



[snip]


+/* Validate the burst of packets for Tx. */
+nb_prep = rte_eth_tx_prepare(dev->port_id, qid, pkts, cnt);


So one of the gotchas here is that rte_eth_tx_prepare is experimental
although I didn't see any compilation issues (warnings etc.) and it
builds OK with travis.

Possibly a comment above it to explain it's currently experimental and
subject to change in DPDK would be useful, can be removed once the api
is non experimental in DPDK. It would be one of the conditions to remove
the experimental tag for TSO in OVS DPDK I suspect in the future.


Thanks for bringing this up. I can see from the thread that this is
pretty much clarified by now, but do we still want to include a comment?
Since the docs are also versioned, and David's patch (thanks!) will be
for master, I assume we still want a comment here, since people might be
looking into DPDK's v18.11 docs specifically.


Good point, I would think Davids doc patch would be backported to DPDK 
but I can confirm, if it is backported to 18.11, I don't think the 
comment is needed.


Ian


Tiago.



___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 3/3] netdev-dpdk: Enable TSO when using multi-seg mbufs

2019-01-11 Thread Ian Stokes

On 1/10/2019 4:58 PM, Tiago Lam wrote:

TCP Segmentation Offload (TSO) is a feature which enables the TCP/IP
network stack to delegate segmentation of a TCP segment to the hardware
NIC, thus saving compute resources. This may improve performance
significantly for TCP workload in virtualized environments.

While a previous commit already added the necesary logic to netdev-dpdk
to deal with packets marked for TSO, this set of changes enables TSO by
default when using multi-segment mbufs.

Thus, to enable TSO on the physical DPDK interfaces, only the following
command needs to be issued before starting OvS:
 ovs-vsctl set Open_vSwitch . other_config:dpdk-multi-seg-mbufs=true


A general question (and not a blocker by any means), is it more 
intuitive to change above to be 'other_config:dpdk-tso=true' or 
something similar?


I'm thinking from a user perspective, they would be familiar with the 
concept of TSO already but not necessarily multi segments. It seems to 
be a technical per-requisite for DPDK (and OVS DPDK) under the hood.


But again multi segments will be needed for other offloads besides TSO 
so maybe it makes more sense to stick with the argument you have already.




Co-authored-by: Mark Kavanagh 

Signed-off-by: Mark Kavanagh 
Signed-off-by: Tiago Lam 
---
  Documentation/automake.mk   |   1 +
  Documentation/topics/dpdk/index.rst |   1 +
  Documentation/topics/dpdk/tso.rst   | 111 
  NEWS|   1 +
  lib/netdev-dpdk.c   |  63 +---
  5 files changed, 168 insertions(+), 9 deletions(-)
  create mode 100644 Documentation/topics/dpdk/tso.rst

diff --git a/Documentation/automake.mk b/Documentation/automake.mk
index 082438e..a20deb8 100644
--- a/Documentation/automake.mk
+++ b/Documentation/automake.mk
@@ -39,6 +39,7 @@ DOC_SOURCE = \
Documentation/topics/dpdk/index.rst \
Documentation/topics/dpdk/bridge.rst \
Documentation/topics/dpdk/jumbo-frames.rst \
+   Documentation/topics/dpdk/tso.rst \
Documentation/topics/dpdk/memory.rst \
Documentation/topics/dpdk/pdump.rst \
Documentation/topics/dpdk/phy.rst \
diff --git a/Documentation/topics/dpdk/index.rst 
b/Documentation/topics/dpdk/index.rst
index cf24a7b..eb2a04d 100644
--- a/Documentation/topics/dpdk/index.rst
+++ b/Documentation/topics/dpdk/index.rst
@@ -40,4 +40,5 @@ The DPDK Datapath
 /topics/dpdk/qos
 /topics/dpdk/pdump
 /topics/dpdk/jumbo-frames
+   /topics/dpdk/tso
 /topics/dpdk/memory
diff --git a/Documentation/topics/dpdk/tso.rst 
b/Documentation/topics/dpdk/tso.rst
new file mode 100644
index 000..503354f
--- /dev/null
+++ b/Documentation/topics/dpdk/tso.rst
@@ -0,0 +1,111 @@
+..
+  Copyright 2018, Red Hat, Inc.


Probably Intel in this case rather than RH.


+
+  Licensed under the Apache License, Version 2.0 (the "License"); you may
+  not use this file except in compliance with the License. You may obtain
+  a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+  WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+  License for the specific language governing permissions and limitations
+  under the License.
+
+  Convention for heading levels in Open vSwitch documentation:
+
+  ===  Heading 0 (reserved for the title in a document)
+  ---  Heading 1
+  ~~~  Heading 2
+  +++  Heading 3
+  '''  Heading 4
+
+  Avoid deeper levels because they do not render well.
+
+===
+TSO
+===

You should probably add a notice that this feature is (experimental).


+
+.. versionadded:: 2.11.0
It's good practice to add the version added but traditionally we've done 
this after the fact once the release is out. Other wise you can have 
commits on master referring to a release that is not officially out yet.


This has come up before and in that case we waited for the release 
before adding.

+
+TCP Segmentation Offload (TSO) is a mechanism which allows a TCP/IP stack to
+offload the TCP segmentation into hardware, thus saving the cycles that would
+be required to perform this same segmentation in Software.
Minor: Is software capitalized for a reason above and throughout the 
text below? I would have thought lower case would be fine.



+
+TCP Segmentation Offload (TSO) enables a network stack to delegate segmentation
+of an oversized TCP segment to the underlying physical NIC. Offload of frame
+segmentation achieves computational savings in the core, freeing up CPU cycles
+for more useful work.
+
+A common use case for TSO is when using virtualization, where traffic that's
+coming in from a VM can offload the TCP segmentation, thus avoiding the
+fragmentation in Software. Additionally, if the traffic 

Re: [ovs-dev] [PATCH v2 1/3] netdev-dpdk: Validate packets burst before Tx.

2019-01-11 Thread Lam, Tiago
Hi Ian,

Thanks, comments in-line.

On 11/01/2019 10:50, Ian Stokes wrote:
> On 1/10/2019 4:58 PM, Tiago Lam wrote:
>> Given that multi-segment mbufs might be sent between interfaces that
>> support different capabilities, and may even support different layouts
>> of mbufs, outgoing packets should be validated before sent on the egress
>> interface. Thus, netdev_dpdk_eth_tx_burst() now calls DPDK's
>> rte_eth_tx_prepare() function, if and only multi-segments is enbaled, in
>> order to validate the following (taken from the DPDK documentation), on
>> a device specific manner:
>> - Check if packet meets devices requirements for tx offloads.
>> - Check limitations about number of segments.
>> - Check additional requirements when debug is enabled.
>> - Update and/or reset required checksums when tx offload is set for
>> packet.
>>
> 
> Thanks Tiago comments below.
> 
>> Signed-off-by: Tiago Lam 
>> ---
>>   lib/netdev-dpdk.c | 21 +++--
>>   1 file changed, 19 insertions(+), 2 deletions(-)
>>
>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>> index d6114ee..77d04fc 100644
>> --- a/lib/netdev-dpdk.c
>> +++ b/lib/netdev-dpdk.c
>> @@ -2029,6 +2029,10 @@ netdev_dpdk_rxq_dealloc(struct netdev_rxq *rxq)
>>   
>>   /* Tries to transmit 'pkts' to txq 'qid' of device 'dev'.  Takes ownership 
>> of
>>* 'pkts', even in case of failure.
>> + * In case multi-segment mbufs / TSO is being used, it also prepares. In 
>> such
>> + * cases, only the prepared packets will be sent to Tx burst, meaning that 
>> if
>> + * an invalid packet appears in 'pkts'[3] only the validated packets in 
>> indices
>> + * 0, 1 and 2 will be sent.
>>*
>>* Returns the number of packets that weren't transmitted. */
>>   static inline int
>> @@ -2036,11 +2040,24 @@ netdev_dpdk_eth_tx_burst(struct netdev_dpdk *dev, 
>> int qid,
>>struct rte_mbuf **pkts, int cnt)
>>   {
>>   uint32_t nb_tx = 0;
>> +uint16_t nb_prep = cnt;
>>   
>> -while (nb_tx != cnt) {
>> +if (dpdk_multi_segment_mbufs) {
> As this feature would be experimental and not enabled, by default this 
> would be false.
> 
> Would it make sense to deal with the default non multiseg case first?
> 
> Extra checks in the datapath can be costly and I'd like to minimize any 
> impact for the non multi seg traffic which is the default use case 
> currently. Possibly checking for !dpdk_multi_segment_mbufs first? Or 
> possibly the use of OVS_LIKELY/OVS_UNLIKELY. Have you given thought to this?

Sounds reasonable to me, since this won't be the default. I'll use
`OVS_UNLIKELY` for the next iteration.

> 
>> +/* Validate the burst of packets for Tx. */
>> +nb_prep = rte_eth_tx_prepare(dev->port_id, qid, pkts, cnt);
> 
> So one of the gotchas here is that rte_eth_tx_prepare is experimental 
> although I didn't see any compilation issues (warnings etc.) and it 
> builds OK with travis.
> 
> Possibly a comment above it to explain it's currently experimental and 
> subject to change in DPDK would be useful, can be removed once the api 
> is non experimental in DPDK. It would be one of the conditions to remove 
> the experimental tag for TSO in OVS DPDK I suspect in the future.

Thanks for bringing this up. I can see from the thread that this is
pretty much clarified by now, but do we still want to include a comment?
Since the docs are also versioned, and David's patch (thanks!) will be
for master, I assume we still want a comment here, since people might be
looking into DPDK's v18.11 docs specifically.

Tiago.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v13 06/11] dp-packet: Add support for data "linearization".

2019-01-11 Thread Lam, Tiago
On 11/01/2019 11:29, Ilya Maximets wrote:
> Not a full review.
> Comments inline.

Thanks Ilya, responses in-line.

> 
> Best regards, Ilya Maximets.
> 
> On 09.01.2019 21:05, Tiago Lam wrote:
>> Previous commits have added support to the dp_packet API to handle
>> multi-segmented packets, where data is not stored contiguously in
>> memory. However, in some cases, it is inevitable and data must be
>> provided contiguously. Examples of such cases are when performing csums
>> over the entire packet data, or when write()'ing to a file descriptor
>> (for a tap interface, for example). For such cases, the dp_packet API
>> has been extended to provide a way to transform a multi-segmented
>> DPBUF_DPDK packet into a DPBUF_MALLOC system packet (at the expense of a
>> copy of memory). If the packet's data is already stored in memory
>> contigously then there's no need to convert the packet.
>>
>> Thus, the main use cases that were assuming that a dp_packet's data is
>> always held contiguously in memory were changed to make use of the new
>> "linear functions" in the dp_packet API when there's a need to traverse
>> the entire's packet data. Per the example above, when the packet's data
>> needs to be write() to the tap's file descriptor, or when the conntrack
>> module needs to verify a packet's checksum, the data is now linearized.
>>
>> Additionally, the miniflow_extract() function has been modified to check
>> if the respective packet headers don't span across multiple mbufs. This
>> requirement is needed to guarantee that callers can assume headers are
>> always in contiguous memory.
>>
>> Signed-off-by: Tiago Lam 
>> ---
>>  lib/bfd.c |   3 +-
>>  lib/conntrack-private.h   |   4 +-
>>  lib/conntrack.c   |  43 +--
>>  lib/crc32c.c  |  35 --
>>  lib/crc32c.h  |   2 +
>>  lib/dp-packet.c   |  18 +++
>>  lib/dp-packet.h   | 267 
>> ++
>>  lib/dpif-netdev.c |  13 +-
>>  lib/dpif-netlink.c|   5 +
>>  lib/dpif.c|   9 ++
>>  lib/flow.c|  97 ---
>>  lib/flow.h|   2 +-
>>  lib/mcast-snooping.c  |   2 +
>>  lib/netdev-bsd.c  |   5 +
>>  lib/netdev-dummy.c|  13 +-
>>  lib/netdev-linux.c|  13 +-
>>  lib/netdev-native-tnl.c   |  26 ++--
>>  lib/odp-execute.c |  12 +-
>>  lib/ovs-lldp.c|   3 +-
>>  lib/packets.c |  85 --
>>  lib/packets.h |   7 ++
>>  ofproto/ofproto-dpif-upcall.c |  20 +++-
>>  ofproto/ofproto-dpif-xlate.c  |  32 -
>>  tests/test-rstp.c |   8 +-
>>  tests/test-stp.c  |   8 +-
>>  25 files changed, 579 insertions(+), 153 deletions(-)
>>
> 
> [snip]
> 
>> +/* Linearizes the data on packet 'b', by copying the data into system's 
>> memory.
>> + * After this the packet is effectively a DPBUF_MALLOC packet. The caller is
>> + * responsible * for ensuring 'b' needs linearization, by calling
>> + * dp_packet_is_linear().
>> + *
>> + * This is an expensive operation which should only be performed as a last
>> + * resort, when multi-segments are under use but data must be accessed
>> + * linearly. */
>> +static inline void
>> +dp_packet_linearize(struct dp_packet *b)
>> +{
>> +struct rte_mbuf *mbuf = CONST_CAST(struct rte_mbuf *, >mbuf);
>> +struct dp_packet *pkt = CONST_CAST(struct dp_packet *, b);
>> +uint32_t pkt_len = dp_packet_size(pkt);
>> +struct mbuf_state *mstate = NULL;
>> +void *dst = xmalloc(pkt_len);
>> +
>> +/* Copy packet's data to system's memory */
>> +if (!rte_pktmbuf_read(mbuf, 0, pkt_len, dst)) {
> 
> 'dst' leak.

ACK. Will fix.

> 
>> +return;
>> +}
>> +
>> +/* Free all mbufs except for the first */
>> +dp_packet_clear(pkt);
>> +
>> +/* Save mbuf's buf_addr to restore later */
>> +mstate = xmalloc(sizeof(*mstate));
>> +mstate->addr = pkt->mbuf.buf_addr;
>> +mstate->len = pkt->mbuf.buf_len;
>> +mstate->off = pkt->mbuf.data_off;
>> +pkt->mstate = mstate;
>> +
>> +/* Tranform DPBUF_DPDK packet into a DPBUF_MALLOC packet */
>> +pkt->source = DPBUF_MALLOC;
>> +pkt->mbuf.buf_addr = dst;
>> +pkt->mbuf.buf_len = pkt_len;
>> +pkt->mbuf.data_off = 0;
>> +dp_packet_set_size(pkt, pkt_len);
>> +}
>>  #else /* DPDK_NETDEV */
>>  static inline bool
>>  dp_packet_equal(const struct dp_packet *a, const struct dp_packet *b)
>> @@ -945,6 +1050,24 @@ dp_packet_has_flow_mark(struct dp_packet *p OVS_UNUSED,
>>  {
>>  return false;
>>  }
>> +
> 
> [snip]
> 
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> index 1564db9..1f1188d 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -5710,6 +5710,11 @@ dp_netdev_upcall(struct dp_netdev_pmd_thread *pmd, 
>> struct dp_packet *packet_,
>>  .support = 

Re: [ovs-dev] [PATCH v2 2/3] netdev-dpdk: Consider packets marked for TSO.

2019-01-11 Thread Ian Stokes

On 1/11/2019 12:26 PM, Ian Stokes wrote:

On 1/10/2019 4:58 PM, Tiago Lam wrote:

Previously, TSO was being explicity disabled on vhost interfaces,
meaning the guests wouldn't have TSO support negotiated in. With TSO
negotiated and enabled, packets are now marked for TSO, through the
PKT_TX_TCP_SEG flag.

In order to deal with this type of packets, a new function,
netdev_dpdk_prep_tso_packet(), has been introduced, with the main
purpose of setting correctly the l2, l3 and l4 length members of the
mbuf struct, and the appropriate ol_flags. This function supports TSO
both in IPv4 and IPv6.

netdev_dpdk_prep_tso_packet() is then only called when packets are
marked with the PKT_TX_TCP_SEG flag, meaning they have been marked for
TSO, and when the packet will be traversing the NIC.

Additionally, if a packet is marked for TSO but the egress netdev
doesn't support it, the packet is dropped.



Hi Tiago,

a high level first pass, I still need to test this in more detail so not 
a full review yet but some minor issues to be addressed below.



One more addition to below.

[snip]


diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index fa79b2a..1476096 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -1379,6 +1379,13 @@ netdev_linux_sock_batch_send(int sock, int 
ifindex,

  struct dp_packet *packet;
  DP_PACKET_BATCH_FOR_EACH (i, packet, batch) {
+    /* TSO not supported in Linux netdev */
+    if (dp_packet_is_tso(packet)) {
+    VLOG_WARN_RL(, "%d: No TSO enabled on port, TSO packet 
dropped "

+ "%ld", sock, size);

Will cause compilation warnings:

format ‘%ld’ expects argument of type ‘long int’, but argument 6 has 
type ‘size_t’


Ian
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 1/3] netdev-dpdk: Validate packets burst before Tx.

2019-01-11 Thread David Marchand
On Fri, Jan 11, 2019 at 1:49 PM Ian Stokes  wrote:

> On 1/11/2019 12:29 PM, Ferruh Yigit wrote:
> > On 1/11/2019 11:20 AM, Andrew Rybchenko wrote:
> >> On 1/11/19 2:11 PM, David Marchand wrote:
> >>> Indeed, the documentation tells it is experimental, but the
> experimental tag
> >>> has always been missing.
> >>> On the other hand, this api has been there since 17.02 and its form
> did not
> >>> change afaics.
> >>> I am not sure it qualifies as experimental anymore.
> >>
> >> I think that we should fix the documentation and remove experimental
> tag.
> >
> > Agreed to remove experimental from rte_eth_tx_prepare().
> >
> > Most probably rte_eth_tx_prepare() developed before we started to use
> > __rte_experimental tag, that is why it doesn't have it.
> >
> Sounds good, thanks Ferruh.
>

Ok, let me send a patch for dpdk.


-- 
David Marchand
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 1/3] netdev-dpdk: Validate packets burst before Tx.

2019-01-11 Thread Ian Stokes

On 1/11/2019 12:29 PM, Ferruh Yigit wrote:

On 1/11/2019 11:20 AM, Andrew Rybchenko wrote:

On 1/11/19 2:11 PM, David Marchand wrote:


On Fri, Jan 11, 2019 at 11:56 AM Ian Stokes mailto:ian.sto...@intel.com>> wrote:

 On 1/10/2019 4:58 PM, Tiago Lam wrote:

 > +        /* Validate the burst of packets for Tx. */
 > +        nb_prep = rte_eth_tx_prepare(dev->port_id, qid, pkts, cnt);

 So one of the gotchas here is that rte_eth_tx_prepare is experimental
 although I didn't see any compilation issues (warnings etc.) and it
 builds OK with travis.


Indeed, the documentation tells it is experimental, but the experimental tag
has always been missing.
On the other hand, this api has been there since 17.02 and its form did not
change afaics.
I am not sure it qualifies as experimental anymore.


I think that we should fix the documentation and remove experimental tag.


Agreed to remove experimental from rte_eth_tx_prepare().

Most probably rte_eth_tx_prepare() developed before we started to use
__rte_experimental tag, that is why it doesn't have it.


Sounds good, thanks Ferruh.

Ian
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 1/3] netdev-dpdk: Validate packets burst before Tx.

2019-01-11 Thread Ferruh Yigit
On 1/11/2019 11:20 AM, Andrew Rybchenko wrote:
> On 1/11/19 2:11 PM, David Marchand wrote:
>>
>> On Fri, Jan 11, 2019 at 11:56 AM Ian Stokes > > wrote:
>>
>> On 1/10/2019 4:58 PM, Tiago Lam wrote:
>>
>> > +        /* Validate the burst of packets for Tx. */
>> > +        nb_prep = rte_eth_tx_prepare(dev->port_id, qid, pkts, cnt);
>>
>> So one of the gotchas here is that rte_eth_tx_prepare is experimental
>> although I didn't see any compilation issues (warnings etc.) and it
>> builds OK with travis.
>>
>>
>> Indeed, the documentation tells it is experimental, but the experimental tag
>> has always been missing.
>> On the other hand, this api has been there since 17.02 and its form did not
>> change afaics.
>> I am not sure it qualifies as experimental anymore.
> 
> I think that we should fix the documentation and remove experimental tag.

Agreed to remove experimental from rte_eth_tx_prepare().

Most probably rte_eth_tx_prepare() developed before we started to use
__rte_experimental tag, that is why it doesn't have it.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 1/3] netdev-dpdk: Validate packets burst before Tx.

2019-01-11 Thread Ferruh Yigit
On 1/11/2019 11:20 AM, Andrew Rybchenko wrote:
> On 1/11/19 2:11 PM, David Marchand wrote:
>>
>> On Fri, Jan 11, 2019 at 11:56 AM Ian Stokes > > wrote:
>>
>> On 1/10/2019 4:58 PM, Tiago Lam wrote:
>>
>> > +        /* Validate the burst of packets for Tx. */
>> > +        nb_prep = rte_eth_tx_prepare(dev->port_id, qid, pkts, cnt);
>>
>> So one of the gotchas here is that rte_eth_tx_prepare is experimental
>> although I didn't see any compilation issues (warnings etc.) and it
>> builds OK with travis.
>>
>>
>> Indeed, the documentation tells it is experimental, but the experimental tag
>> has always been missing.
>> On the other hand, this api has been there since 17.02 and its form did not
>> change afaics.
>> I am not sure it qualifies as experimental anymore.
> 
> I think that we should fix the documentation and remove experimental tag.

Agreed to remove experimental from rte_eth_tx_prepare().

Most probably rte_eth_tx_prepare() developed before we started to use
__rte_experimental tag, that is why it doesn't have it.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 2/3] netdev-dpdk: Consider packets marked for TSO.

2019-01-11 Thread Ian Stokes

On 1/10/2019 4:58 PM, Tiago Lam wrote:

Previously, TSO was being explicity disabled on vhost interfaces,
meaning the guests wouldn't have TSO support negotiated in. With TSO
negotiated and enabled, packets are now marked for TSO, through the
PKT_TX_TCP_SEG flag.

In order to deal with this type of packets, a new function,
netdev_dpdk_prep_tso_packet(), has been introduced, with the main
purpose of setting correctly the l2, l3 and l4 length members of the
mbuf struct, and the appropriate ol_flags. This function supports TSO
both in IPv4 and IPv6.

netdev_dpdk_prep_tso_packet() is then only called when packets are
marked with the PKT_TX_TCP_SEG flag, meaning they have been marked for
TSO, and when the packet will be traversing the NIC.

Additionally, if a packet is marked for TSO but the egress netdev
doesn't support it, the packet is dropped.



Hi Tiago,

a high level first pass, I still need to test this in more detail so not 
a full review yet but some minor issues to be addressed below.



Co-authored-by: Mark Kavanagh 

Signed-off-by: Mark Kavanagh 
Signed-off-by: Tiago Lam 
---
  lib/dp-packet.h|  14 +++
  lib/netdev-bsd.c   |  11 -
  lib/netdev-dpdk.c  | 121 ++---
  lib/netdev-dummy.c |  11 -
  lib/netdev-linux.c |  15 +++
  5 files changed, 146 insertions(+), 26 deletions(-)

diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 970aaf2..c384416 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -104,6 +104,8 @@ static inline void dp_packet_set_size(struct dp_packet *, 
uint32_t);
  static inline uint16_t dp_packet_get_allocated(const struct dp_packet *);
  static inline void dp_packet_set_allocated(struct dp_packet *, uint16_t);
  
+static inline bool dp_packet_is_tso(struct dp_packet *b);

+
  void *dp_packet_resize_l2(struct dp_packet *, int increment);
  void *dp_packet_resize_l2_5(struct dp_packet *, int increment);
  static inline void *dp_packet_eth(const struct dp_packet *);
@@ -761,6 +763,12 @@ dp_packet_set_allocated(struct dp_packet *b, uint16_t s)
  b->mbuf.buf_len = s;
  }
  
+static inline bool

+dp_packet_is_tso(struct dp_packet *b)
+{
+return (b->mbuf.ol_flags & PKT_TX_TCP_SEG) ? true : false;
+}
+
  static inline void
  dp_packet_copy_mbuf_flags(struct dp_packet *dst, const struct dp_packet *src)
  {
@@ -972,6 +980,12 @@ dp_packet_get_allocated(const struct dp_packet *b)
  return b->allocated_;
  }
  
+static inline bool

+dp_packet_is_tso(struct dp_packet *b)
+{
This will cause a unused parameter warning. Need to pass OVS_UNUSED 
above also.



+return false;
+}
+
  static inline void
  dp_packet_set_allocated(struct dp_packet *b, uint16_t s)
  {
diff --git a/lib/netdev-bsd.c b/lib/netdev-bsd.c
index 278c8a9..5e8c5cc 100644
--- a/lib/netdev-bsd.c
+++ b/lib/netdev-bsd.c
@@ -700,13 +700,22 @@ netdev_bsd_send(struct netdev *netdev_, int qid 
OVS_UNUSED,
  }
  
  DP_PACKET_BATCH_FOR_EACH (i, packet, batch) {

+size_t size = dp_packet_size(packet);
+
+/* TSO not supported in BSD netdev */
+if (dp_packet_is_tso(packet)) {
+VLOG_WARN_RL(, "%s: No TSO enabled on port, TSO packet dropped "
+ "%" PRIu32 " ", name, size);
+
+continue;
+}
+
  /* We need the whole data to send the packet on the device */
  if (!dp_packet_is_linear(packet)) {
  dp_packet_linearize(packet);
  }
  
  const void *data = dp_packet_data(packet);

-size_t size = dp_packet_size(packet);
  
  while (!error) {

  ssize_t retval;
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 77d04fc..ad7223a 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -1375,14 +1375,16 @@ netdev_dpdk_vhost_construct(struct netdev *netdev)
  goto out;
  }
  
-err = rte_vhost_driver_disable_features(dev->vhost_id,

-1ULL << VIRTIO_NET_F_HOST_TSO4
-| 1ULL << VIRTIO_NET_F_HOST_TSO6
-| 1ULL << VIRTIO_NET_F_CSUM);
-if (err) {
-VLOG_ERR("rte_vhost_driver_disable_features failed for vhost user "
- "port: %s\n", name);
-goto out;
+if (!dpdk_multi_segment_mbufs) {
+err = rte_vhost_driver_disable_features(dev->vhost_id,
+1ULL << VIRTIO_NET_F_HOST_TSO4
+| 1ULL << VIRTIO_NET_F_HOST_TSO6
+| 1ULL << VIRTIO_NET_F_CSUM);
+if (err) {
+VLOG_ERR("rte_vhost_driver_disable_features failed for vhost user "
+ "client port: %s\n", dev->up.name);
+goto out;
+}
  }
  
  err = rte_vhost_driver_start(dev->vhost_id);

@@ -2027,6 +2029,44 @@ netdev_dpdk_rxq_dealloc(struct netdev_rxq *rxq)
  rte_free(rx);
  }
  
+/* Should only be called if PKT_TX_TCP_SEG is set in ol_flags.

+ * 

[ovs-dev] (no subject)

2019-01-11 Thread Phee Rujiphan


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v4] Adding support for PMD auto load balancing

2019-01-11 Thread Kevin Traynor
On 01/11/2019 07:45 AM, Nitin Katiyar wrote:
> Port rx queues that have not been statically assigned to PMDs are currently
> assigned based on periodically sampled load measurements.
> The assignment is performed at specific instances – port addition, port
> deletion, upon reassignment request via CLI etc.
> 
> Due to change in traffic pattern over time it can cause uneven load among
> the PMDs and thus resulting in lower overall throughout.
> 
> This patch enables the support of auto load balancing of PMDs based on
> measured load of RX queues. Each PMD measures the processing load for each
> of its associated queues every 10 seconds. If the aggregated PMD load reaches
> 95% for 6 consecutive intervals then PMD considers itself to be overloaded.
> 
> If any PMD is overloaded, a dry-run of the PMD assignment algorithm is
> performed by OVS main thread. The dry-run does NOT change the existing
> queue to PMD assignments.
> 
> If the resultant mapping of dry-run indicates an improved distribution
> of the load then the actual reassignment will be performed.
> 
> The automatic rebalancing will be disabled by default and has to be
> enabled via configuration option. The interval (in minutes) between
> two consecutive rebalancing can also be configured via CLI, default
> is 1 min.
> 
> Following example commands can be used to set the auto-lb params:
> ovs-vsctl set open_vswitch . other_config:pmd-auto-lb="true"
> ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-rebal-interval="5"
> 
> Co-authored-by: Rohith Basavaraja 
> Co-authored-by: Venkatesan Pradeep 
> Signed-off-by: Rohith Basavaraja 
> Signed-off-by: Venkatesan Pradeep 
> Signed-off-by: Nitin Katiyar 
> ---
>  Documentation/topics/dpdk/pmd.rst |  41 +
>  NEWS  |   1 +
>  lib/dpif-netdev.c | 378 
> ++
>  vswitchd/vswitch.xml  |  41 +
>  4 files changed, 461 insertions(+)
> 
> diff --git a/Documentation/topics/dpdk/pmd.rst 
> b/Documentation/topics/dpdk/pmd.rst
> index dd9172d..c273b40 100644
> --- a/Documentation/topics/dpdk/pmd.rst
> +++ b/Documentation/topics/dpdk/pmd.rst
> @@ -183,3 +183,44 @@ or can be triggered by using::
> In addition, the output of ``pmd-rxq-show`` was modified to include
> Rx queue utilization of the PMD as a percentage. Prior to this, tracking 
> of
> stats was not available.
> +
> +Automatic assignment of Port/Rx Queue to PMD Threads (experimental)
> +---
> +
> +Cycle or utilization based allocation of Rx queues to PMDs gives efficient
> +load distribution but it is not adaptive to change in traffic pattern 
> occuring
> +over the time. This causes uneven load among the PMDs which results in 
> overall
> +lower throughput.
> +
> +To address this automatic load balancing of PMDs can be set by::
> +
> +$ ovs-vsctl set open_vswitch . other_config:pmd-auto-lb="true"
> +
> +If pmd-auto-lb is set to true AND cycle based assignment is enabled then auto
> +load balancing of PMDs is enabled provided there are 2 or more non-isolated
> +PMDs and at least one of these PMDs is polling more than one RX queue.
> +
> +Once auto load balancing is set, each non-isolated PMD measures the 
> processing
> +load for each of its associated queues every 10 seconds. If the aggregated 
> PMD
> +load reaches 95% for 6 consecutive intervals then PMD considers itself to be
> +overloaded.
> +
> +If any PMD is overloaded, a dry-run of the PMD assignment algorithm is
> +performed by OVS main thread. The dry-run does NOT change the existing queue
> +to PMD assignments.
> +
> +If the resultant mapping of dry-run indicates an improved distribution of the
> +load then the actual reassignment will be performed.
> +
> +The minimum time between 2 consecutive PMD auto load balancing iterations can
> +also be configured by::
> +
> +$ ovs-vsctl set open_vswitch .\
> +other_config:pmd-auto-lb-rebal-interval=""
> +
> +where  is a value in minutes. The default interval is 1 minute
> +and setting it to 0 will also result in default value i.e. 1 min.
> +
> +A user can use this option to avoid frequent trigger of auto load balancing 
> of
> +PMDs. For e.g. set this (in min) such that it occurs once in few hours or a 
> day
> +or a week.
> diff --git a/NEWS b/NEWS
> index 2de844f..0e9fcb1 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -23,6 +23,7 @@ Post-v2.10.0
>   * Add option for simple round-robin based Rxq to PMD assignment.
> It can be set with pmd-rxq-assign.
>   * Add support for DPDK 18.11
> + * Add support for Auto load balancing of PMDs (experimental)
> - Add 'symmetric_l3' hash function.
> - OVS now honors 'updelay' and 'downdelay' for bonds with LACP configured.
> - ovs-vswitchd:
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 1564db9..81e9cbf 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -80,6 +80,12 @@
>  
>  

[ovs-dev] [RFC 1/2] lib/tc: add set ipv4 dscp and ecn action offload via pedit

2019-01-11 Thread Pieter Jansen van Vuuren
Add setting of ipv4 dscp and ecn fields in tc offload using pedit.

Signed-off-by: Pieter Jansen van Vuuren 
Signed-off-by: Louis Peens 
Reviewed-by: Simon Horman 
---
 lib/netdev-tc-offloads.c | 6 +-
 lib/tc.c | 5 +
 lib/tc.h | 1 +
 3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/lib/netdev-tc-offloads.c b/lib/netdev-tc-offloads.c
index 73ce7b952..90bd3c585 100644
--- a/lib/netdev-tc-offloads.c
+++ b/lib/netdev-tc-offloads.c
@@ -52,7 +52,7 @@ struct netlink_field {
 int size;
 };
 
-static struct netlink_field set_flower_map[][3] = {
+static struct netlink_field set_flower_map[][4] = {
 [OVS_KEY_ATTR_IPV4] = {
 { offsetof(struct ovs_key_ipv4, ipv4_src),
   offsetof(struct tc_flower_key, ipv4.ipv4_src),
@@ -66,6 +66,10 @@ static struct netlink_field set_flower_map[][3] = {
   offsetof(struct tc_flower_key, ipv4.rewrite_ttl),
   MEMBER_SIZEOF(struct tc_flower_key, ipv4.rewrite_ttl)
 },
+{ offsetof(struct ovs_key_ipv4, ipv4_tos),
+  offsetof(struct tc_flower_key, ipv4.rewrite_tos),
+  MEMBER_SIZEOF(struct tc_flower_key, ipv4.rewrite_tos)
+},
 },
 [OVS_KEY_ATTR_IPV6] = {
 { offsetof(struct ovs_key_ipv6, ipv6_src),
diff --git a/lib/tc.c b/lib/tc.c
index b19f075f2..1696c9836 100644
--- a/lib/tc.c
+++ b/lib/tc.c
@@ -91,6 +91,11 @@ static struct flower_key_to_pedit flower_pedit_map[] = {
 8,
 offsetof(struct tc_flower_key, ipv4.rewrite_ttl),
 MEMBER_SIZEOF(struct tc_flower_key, ipv4.rewrite_ttl)
+}, {
+TCA_PEDIT_KEY_EX_HDR_TYPE_IP4,
+1,
+offsetof(struct tc_flower_key, ipv4.rewrite_tos),
+MEMBER_SIZEOF(struct tc_flower_key, ipv4.rewrite_tos)
 }, {
 TCA_PEDIT_KEY_EX_HDR_TYPE_IP6,
 7,
diff --git a/lib/tc.h b/lib/tc.h
index 7196a32d7..04b08e298 100644
--- a/lib/tc.h
+++ b/lib/tc.h
@@ -103,6 +103,7 @@ struct tc_flower_key {
 ovs_be32 ipv4_src;
 ovs_be32 ipv4_dst;
 uint8_t rewrite_ttl;
+uint8_t rewrite_tos;
 } ipv4;
 struct {
 struct in6_addr ipv6_src;
-- 
2.17.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [RFC 0/2] extend ovs-tc offload for more pedit action

2019-01-11 Thread Pieter Jansen van Vuuren
Hi all,

This set extends the ovs-tc pedit interface to allow setting ipv4
dscp and ecn fields as well as ipv6 traffic class in tc via pedit.
Patch 2 in this set also introduces the notion of boundary shifts
to allow translating non-byte-aligned fields like traffic class and
flow limit for IPv6 to be offloaded.

Pieter Jansen van Vuuren (2):
  lib/tc: add set ipv4 dscp and ecn action offload via pedit
  lib/tc: add set ipv6 traffic class action offload via pedit

 lib/byte-order.h | 18 +++
 lib/netdev-tc-offloads.c | 10 +-
 lib/tc.c | 66 +---
 lib/tc.h |  2 ++
 4 files changed, 78 insertions(+), 18 deletions(-)

-- 
2.17.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 1/3] netdev-dpdk: Validate packets burst before Tx.

2019-01-11 Thread Andrew Rybchenko

On 1/11/19 2:11 PM, David Marchand wrote:

On Fri, Jan 11, 2019 at 11:56 AM Ian Stokes 
mailto:ian.sto...@intel.com>> wrote:
On 1/10/2019 4:58 PM, Tiago Lam wrote:


+/* Validate the burst of packets for Tx. */
+nb_prep = rte_eth_tx_prepare(dev->port_id, qid, pkts, cnt);


So one of the gotchas here is that rte_eth_tx_prepare is experimental
although I didn't see any compilation issues (warnings etc.) and it
builds OK with travis.

Indeed, the documentation tells it is experimental, but the experimental tag 
has always been missing.
On the other hand, this api has been there since 17.02 and its form did not 
change afaics.
I am not sure it qualifies as experimental anymore.

I think that we should fix the documentation and remove experimental tag.

Andrew.
The information contained in this message is confidential and is intended for 
the addressee(s) only. If you have received this message in error, please 
notify the sender immediately and delete the message. Unless you are an 
addressee (or authorized to receive for an addressee), you may not use, copy or 
disclose to anyone this message or any information contained in this message. 
The unauthorized use, disclosure, copying or alteration of this message is 
strictly prohibited.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v13 06/11] dp-packet: Add support for data "linearization".

2019-01-11 Thread Ilya Maximets
Not a full review.
Comments inline.

Best regards, Ilya Maximets.

On 09.01.2019 21:05, Tiago Lam wrote:
> Previous commits have added support to the dp_packet API to handle
> multi-segmented packets, where data is not stored contiguously in
> memory. However, in some cases, it is inevitable and data must be
> provided contiguously. Examples of such cases are when performing csums
> over the entire packet data, or when write()'ing to a file descriptor
> (for a tap interface, for example). For such cases, the dp_packet API
> has been extended to provide a way to transform a multi-segmented
> DPBUF_DPDK packet into a DPBUF_MALLOC system packet (at the expense of a
> copy of memory). If the packet's data is already stored in memory
> contigously then there's no need to convert the packet.
> 
> Thus, the main use cases that were assuming that a dp_packet's data is
> always held contiguously in memory were changed to make use of the new
> "linear functions" in the dp_packet API when there's a need to traverse
> the entire's packet data. Per the example above, when the packet's data
> needs to be write() to the tap's file descriptor, or when the conntrack
> module needs to verify a packet's checksum, the data is now linearized.
> 
> Additionally, the miniflow_extract() function has been modified to check
> if the respective packet headers don't span across multiple mbufs. This
> requirement is needed to guarantee that callers can assume headers are
> always in contiguous memory.
> 
> Signed-off-by: Tiago Lam 
> ---
>  lib/bfd.c |   3 +-
>  lib/conntrack-private.h   |   4 +-
>  lib/conntrack.c   |  43 +--
>  lib/crc32c.c  |  35 --
>  lib/crc32c.h  |   2 +
>  lib/dp-packet.c   |  18 +++
>  lib/dp-packet.h   | 267 
> ++
>  lib/dpif-netdev.c |  13 +-
>  lib/dpif-netlink.c|   5 +
>  lib/dpif.c|   9 ++
>  lib/flow.c|  97 ---
>  lib/flow.h|   2 +-
>  lib/mcast-snooping.c  |   2 +
>  lib/netdev-bsd.c  |   5 +
>  lib/netdev-dummy.c|  13 +-
>  lib/netdev-linux.c|  13 +-
>  lib/netdev-native-tnl.c   |  26 ++--
>  lib/odp-execute.c |  12 +-
>  lib/ovs-lldp.c|   3 +-
>  lib/packets.c |  85 --
>  lib/packets.h |   7 ++
>  ofproto/ofproto-dpif-upcall.c |  20 +++-
>  ofproto/ofproto-dpif-xlate.c  |  32 -
>  tests/test-rstp.c |   8 +-
>  tests/test-stp.c  |   8 +-
>  25 files changed, 579 insertions(+), 153 deletions(-)
> 

[snip]

> +/* Linearizes the data on packet 'b', by copying the data into system's 
> memory.
> + * After this the packet is effectively a DPBUF_MALLOC packet. The caller is
> + * responsible * for ensuring 'b' needs linearization, by calling
> + * dp_packet_is_linear().
> + *
> + * This is an expensive operation which should only be performed as a last
> + * resort, when multi-segments are under use but data must be accessed
> + * linearly. */
> +static inline void
> +dp_packet_linearize(struct dp_packet *b)
> +{
> +struct rte_mbuf *mbuf = CONST_CAST(struct rte_mbuf *, >mbuf);
> +struct dp_packet *pkt = CONST_CAST(struct dp_packet *, b);
> +uint32_t pkt_len = dp_packet_size(pkt);
> +struct mbuf_state *mstate = NULL;
> +void *dst = xmalloc(pkt_len);
> +
> +/* Copy packet's data to system's memory */
> +if (!rte_pktmbuf_read(mbuf, 0, pkt_len, dst)) {

'dst' leak.

> +return;
> +}
> +
> +/* Free all mbufs except for the first */
> +dp_packet_clear(pkt);
> +
> +/* Save mbuf's buf_addr to restore later */
> +mstate = xmalloc(sizeof(*mstate));
> +mstate->addr = pkt->mbuf.buf_addr;
> +mstate->len = pkt->mbuf.buf_len;
> +mstate->off = pkt->mbuf.data_off;
> +pkt->mstate = mstate;
> +
> +/* Tranform DPBUF_DPDK packet into a DPBUF_MALLOC packet */
> +pkt->source = DPBUF_MALLOC;
> +pkt->mbuf.buf_addr = dst;
> +pkt->mbuf.buf_len = pkt_len;
> +pkt->mbuf.data_off = 0;
> +dp_packet_set_size(pkt, pkt_len);
> +}
>  #else /* DPDK_NETDEV */
>  static inline bool
>  dp_packet_equal(const struct dp_packet *a, const struct dp_packet *b)
> @@ -945,6 +1050,24 @@ dp_packet_has_flow_mark(struct dp_packet *p OVS_UNUSED,
>  {
>  return false;
>  }
> +

[snip]

> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 1564db9..1f1188d 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -5710,6 +5710,11 @@ dp_netdev_upcall(struct dp_netdev_pmd_thread *pmd, 
> struct dp_packet *packet_,
>  .support = dp_netdev_support,
>  };
>  
> +/* Gather the whole data for printing the packet (if debug enabled) 
> */
> +if (!dp_packet_is_linear(packet_)) {
> +dp_packet_linearize(packet_);
> +}
> +
> 

Re: [ovs-dev] [PATCH] vconn: Fix using of uninitialized deadline.

2019-01-11 Thread Kevin Traynor
On 01/11/2019 08:09 AM, Ilya Maximets wrote:
> Typo introduced while making minor refactoring before applying the
> patch.
> 
> Fixes logic and the clang build:
> 
>   lib/vconn.c:707:47: error:
>   variable 'deadline' is uninitialized when
>   used within its own initialization [-Werror,-Wuninitialized]
>   ? time_msec() + deadline
>   ^~~~
> 
> Fixes: 04895042e9f6 ("vconn: Allow timeout configuration for blocking 
> connection.")
> Signed-off-by: Ilya Maximets 

Acked-by: Kevin Traynor 

> ---
>  lib/vconn.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/vconn.c b/lib/vconn.c
> index 40ebc5818..403461662 100644
> --- a/lib/vconn.c
> +++ b/lib/vconn.c
> @@ -704,7 +704,7 @@ int
>  vconn_connect_block(struct vconn *vconn, long long int timeout)
>  {
>  long long int deadline = (timeout >= 0
> -  ? time_msec() + deadline
> +  ? time_msec() + timeout
>: LLONG_MAX);
>  
>  int error;
> 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 1/3] netdev-dpdk: Validate packets burst before Tx.

2019-01-11 Thread David Marchand
On Fri, Jan 11, 2019 at 11:56 AM Ian Stokes  wrote:

> On 1/10/2019 4:58 PM, Tiago Lam wrote:
>
> > +/* Validate the burst of packets for Tx. */
> > +nb_prep = rte_eth_tx_prepare(dev->port_id, qid, pkts, cnt);
>
> So one of the gotchas here is that rte_eth_tx_prepare is experimental
> although I didn't see any compilation issues (warnings etc.) and it
> builds OK with travis.
>

Indeed, the documentation tells it is experimental, but the experimental
tag has always been missing.
On the other hand, this api has been there since 17.02 and its form did not
change afaics.
I am not sure it qualifies as experimental anymore.

CC Ferruh, Andrew and Thomas.


-- 
David Marchand
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 1/3] netdev-dpdk: Validate packets burst before Tx.

2019-01-11 Thread Ian Stokes

On 1/10/2019 4:58 PM, Tiago Lam wrote:

Given that multi-segment mbufs might be sent between interfaces that
support different capabilities, and may even support different layouts
of mbufs, outgoing packets should be validated before sent on the egress
interface. Thus, netdev_dpdk_eth_tx_burst() now calls DPDK's
rte_eth_tx_prepare() function, if and only multi-segments is enbaled, in
order to validate the following (taken from the DPDK documentation), on
a device specific manner:
- Check if packet meets devices requirements for tx offloads.
- Check limitations about number of segments.
- Check additional requirements when debug is enabled.
- Update and/or reset required checksums when tx offload is set for
packet.



Thanks Tiago comments below.


Signed-off-by: Tiago Lam 
---
  lib/netdev-dpdk.c | 21 +++--
  1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index d6114ee..77d04fc 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -2029,6 +2029,10 @@ netdev_dpdk_rxq_dealloc(struct netdev_rxq *rxq)
  
  /* Tries to transmit 'pkts' to txq 'qid' of device 'dev'.  Takes ownership of

   * 'pkts', even in case of failure.
+ * In case multi-segment mbufs / TSO is being used, it also prepares. In such
+ * cases, only the prepared packets will be sent to Tx burst, meaning that if
+ * an invalid packet appears in 'pkts'[3] only the validated packets in indices
+ * 0, 1 and 2 will be sent.
   *
   * Returns the number of packets that weren't transmitted. */
  static inline int
@@ -2036,11 +2040,24 @@ netdev_dpdk_eth_tx_burst(struct netdev_dpdk *dev, int 
qid,
   struct rte_mbuf **pkts, int cnt)
  {
  uint32_t nb_tx = 0;
+uint16_t nb_prep = cnt;
  
-while (nb_tx != cnt) {

+if (dpdk_multi_segment_mbufs) {
As this feature would be experimental and not enabled, by default this 
would be false.


Would it make sense to deal with the default non multiseg case first?

Extra checks in the datapath can be costly and I'd like to minimize any 
impact for the non multi seg traffic which is the default use case 
currently. Possibly checking for !dpdk_multi_segment_mbufs first? Or 
possibly the use of OVS_LIKELY/OVS_UNLIKELY. Have you given thought to this?



+/* Validate the burst of packets for Tx. */
+nb_prep = rte_eth_tx_prepare(dev->port_id, qid, pkts, cnt);


So one of the gotchas here is that rte_eth_tx_prepare is experimental 
although I didn't see any compilation issues (warnings etc.) and it 
builds OK with travis.


Possibly a comment above it to explain it's currently experimental and 
subject to change in DPDK would be useful, can be removed once the api 
is non experimental in DPDK. It would be one of the conditions to remove 
the experimental tag for TSO in OVS DPDK I suspect in the future.


Ian

+if (nb_prep != cnt) {
+VLOG_WARN_RL(, "%s: Preparing packet tx burst failed (%u/%u "
+ "packets valid): %s", dev->up.name, nb_prep, cnt,
+ rte_strerror(rte_errno));
+}
+}
+
+/* Tx the validated burst of packets only. */
+while (nb_tx != nb_prep) {
  uint32_t ret;
  
-ret = rte_eth_tx_burst(dev->port_id, qid, pkts + nb_tx, cnt - nb_tx);

+ret = rte_eth_tx_burst(dev->port_id, qid, pkts + nb_tx,
+   nb_prep - nb_tx);
  if (!ret) {
  break;
  }



___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2] Monitor Database table to manage lifecycle of IDL client.

2019-01-11 Thread Numan Siddique
On Fri, Jan 4, 2019 at 5:26 PM Numan Siddique  wrote:

> Hi Ted,
>
> This patch is failing the below test cases for me. All are python3
> related. Can you please
> take a look if that's the case with you as well ?
> **
> 2139: simple idl, writing via IDL with unicode - Python3 FAILED (
> ovsdb-idl.at:456)
> 2141: simple idl, writing via IDL with unicode - Python3 -
> register_columns FAILED (ovsdb-idl.at:456)
> 2143: simple idl, writing via IDL with unicode - Python3 - tcp FAILED (
> ovsdb-idl.at:456)
> 2145: simple idl, writing via IDL with unicode - Python3 (multiple
> remotes) - tcp FAILED (ovsdb-idl.at:456)
> 2147: simple idl, writing via IDL with unicode - Python3 - tcp6 FAILED (
> ovsdb-idl.at:456)
> 2149: simple idl, writing via IDL with unicode - Python3 (multiple
> remotes) - tcp6 FAILED (ovsdb-idl.at:456)
> 2151: simple idl, writing via IDL with unicode - Python3 - SSL FAILED (
> ovsdb-idl.at:456)
> 2153: simple idl, writing large data via IDL with unicode - Python3 FAILED
> (ovsdb-idl.at:490)
> **
>

These failures are not related to this patch. Please ignore. There seems to
be some issues with python3.7 with unicode
in Fedora 29 and I think Timothy  is planning to submit a patch to fix
these failures.

Thanks
Numan


> Please see some comments below.
>
> Thanks
> Numan
>
> On Fri, Jan 4, 2019 at 7:47 AM Ted Elhourani 
> wrote:
>
>> The Python IDL implementation supports ovsdb cluster connections.
>> This patch is a follow up to commit 31e434fc98, it adds the option of
>> connecting to the leader (the default) in the Raft-based cluster. It
>> mimics
>> the exisiting C IDL support for clusters introduced in commit 1b1d2e6daa.
>>
>> The _Server database schema is first requested, then a monitor of the
>> Database table in the _Server Database. Method __check_server_db verifies
>> the eligibility of the server. If the attempt to obtain a monitor of the
>> _Server database fails and a cluster id was not provided this
>> implementation
>> proceeds to request the data monitor. If a cluster id was provided via the
>> set_cluster_id method then the connection is aborted and a connection to a
>> different node is instead attempted, until a valid cluster node is found.
>> Thus, when supplied, cluster id is interpreted as the intention to only
>> allow connections to a clustered database. If not supplied, connections to
>> standalone nodes, or nodes that do not have the _Server database are
>> allowed. change_seqno is not incremented in the case of Database table
>> updates.
>> Signed-off-by: Ted Elhourani 
>> ---
>>  python/ovs/db/idl.py| 217
>> 
>>  python/ovs/reconnect.py |   3 +
>>  tests/ovsdb-idl.at  |  62 +++---
>>  3 files changed, 237 insertions(+), 45 deletions(-)
>>
>> diff --git a/python/ovs/db/idl.py b/python/ovs/db/idl.py
>> index 250e897..f989548 100644
>> --- a/python/ovs/db/idl.py
>> +++ b/python/ovs/db/idl.py
>> @@ -38,6 +38,7 @@ ROW_DELETE = "delete"
>>  OVSDB_UPDATE = 0
>>  OVSDB_UPDATE2 = 1
>>
>> +CLUSTERED = "clustered"
>>
>>  class Idl(object):
>>  """Open vSwitch Database Interface Definition Language (OVSDB IDL).
>> @@ -92,10 +93,12 @@ class Idl(object):
>>  """
>>
>>  IDL_S_INITIAL = 0
>> -IDL_S_MONITOR_REQUESTED = 1
>> -IDL_S_MONITOR_COND_REQUESTED = 2
>> +IDL_S_SERVER_SCHEMA_REQUESTED = 1
>> +IDL_S_SERVER_MONITOR_REQUESTED = 2
>> +IDL_S_DATA_MONITOR_REQUESTED = 3
>> +IDL_S_DATA_MONITOR_COND_REQUESTED = 4
>>
>> -def __init__(self, remote, schema_helper, probe_interval=None):
>> +def __init__(self, remote, schema_helper, leader_only=True,
>> probe_interval=None):
>>
>
> Can you please add the "leader_only" arg at the end. Older clients of
> python idl would
> break if they were creating Idl object as
> my_idl = Idl(remote, helper, 1000)
>
>
>  """Creates and returns a connection to the database named
>> 'db_name' on
>>  'remote', which should be in a form acceptable to
>>  ovs.jsonrpc.session.open().  The connection will maintain an
>> in-memory
>> @@ -119,6 +122,9 @@ class Idl(object):
>>
>>  The IDL uses and modifies 'schema' directly.
>>
>> +If 'leader_only' is set to True (default value) the IDL will
>> only monitor
>> +and transact with the leader of the cluster.
>> +
>>  If "probe_interval" is zero it disables the connection keepalive
>>  feature. If non-zero the value will be forced to at least 1000
>>  milliseconds. If None it will just use the default value in OVS.
>> @@ -137,6 +143,20 @@ class Idl(object):
>>  self._last_seqno = None
>>  self.change_seqno = 0
>>  self.uuid = uuid.uuid1()
>> +
>> +# Server monitor.
>> +self._server_schema_request_id = None
>> +self._server_monitor_request_id = None
>> +self._db_change_aware_request_id = None
>> +self._server_db_name = '_Server'
>> +self._server_db_table = 'Database'

Re: [ovs-dev] [PATCH v2 0/3] flake8 fixes

2019-01-11 Thread Numan Siddique
On Fri, Jan 11, 2019 at 4:54 AM Ben Pfaff  wrote:

> This is needed to make OVS work with flake8 3.6.
>
>
Tested all the patches in this series on fedora 29 with flake8 3.7 and it
fixes all
the flake8 issues.

Ack for the whole series.

Acked-by: Numan Siddique 
Tested-by: Numan Siddique 

Thanks
Numan


> v1->v2:
>- Added a few more fixes to patch 1.
>- Patch 3 is new.
>
> Ben Pfaff (3):
>   python: Fix invalid escape sequences.
>   python: Disable flake8 warning W504 "line break after binary
> operator".
>   python: Avoid flake8 warning for unused variables.
>
>  Makefile.am|  3 ++-
>  python/build/nroff.py  |  4 ++--
>  python/ovs/db/schema.py|  2 +-
>  python/ovs/json.py |  4 ++--
>  python/ovs/socket_util.py  |  2 +-
>  python/ovs/stream.py   |  2 +-
>  python/ovs/unixctl/__init__.py |  2 +-
>  python/ovs/util.py |  2 +-
>  python/ovs/vlog.py |  2 +-
>  tests/test-ovsdb.py|  2 +-
>  utilities/checkpatch.py| 24 
>  utilities/ovs-dev.py   |  2 +-
>  12 files changed, 26 insertions(+), 25 deletions(-)
>
> --
> 2.16.1
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v13 06/11] dp-packet: Add support for data "linearization".

2019-01-11 Thread Lam, Tiago
On 11/01/2019 01:34, Darrell Ball wrote:
> thanks; not a full review

Thanks, responses in-line.

> 
> On Wed, Jan 9, 2019 at 10:06 AM Tiago Lam  > wrote:
> 
> Previous commits have added support to the dp_packet API to handle
> multi-segmented packets, where data is not stored contiguously in
> memory. However, in some cases, it is inevitable and data must be
> provided contiguously. Examples of such cases are when performing csums
> over the entire packet data, or when write()'ing to a file descriptor
> (for a tap interface, for example). For such cases, the dp_packet API
> has been extended to provide a way to transform a multi-segmented
> DPBUF_DPDK packet into a DPBUF_MALLOC system packet (at the expense of a
> copy of memory). If the packet's data is already stored in memory
> contigously then there's no need to convert the packet.
> 
> Thus, the main use cases that were assuming that a dp_packet's data is
> always held contiguously in memory were changed to make use of the new
> "linear functions" in the dp_packet API when there's a need to traverse
> the entire's packet data. Per the example above, when the packet's data
> needs to be write() to the tap's file descriptor, or when the conntrack
> module needs to verify a packet's checksum, the data is now linearized.
> 
> Additionally, the miniflow_extract() function has been modified to check
> if the respective packet headers don't span across multiple mbufs. This
> requirement is needed to guarantee that callers can assume headers are
> always in contiguous memory.
> 
> Signed-off-by: Tiago Lam  >
> ---
>  lib/bfd.c                     |   3 +-
>  lib/conntrack-private.h       |   4 +-
>  lib/conntrack.c               |  43 +--
>  lib/crc32c.c                  |  35 --
>  lib/crc32c.h                  |   2 +
>  lib/dp-packet.c               |  18 +++
>  lib/dp-packet.h               | 267
> ++
>  lib/dpif-netdev.c             |  13 +-
>  lib/dpif-netlink.c            |   5 +
>  lib/dpif.c                    |   9 ++
>  lib/flow.c                    |  97 ---
>  lib/flow.h                    |   2 +-
>  lib/mcast-snooping.c          |   2 +
>  lib/netdev-bsd.c              |   5 +
>  lib/netdev-dummy.c            |  13 +-
>  lib/netdev-linux.c            |  13 +-
>  lib/netdev-native-tnl.c       |  26 ++--
>  lib/odp-execute.c             |  12 +-
>  lib/ovs-lldp.c                |   3 +-
>  lib/packets.c                 |  85 --
>  lib/packets.h                 |   7 ++
>  ofproto/ofproto-dpif-upcall.c |  20 +++-
>  ofproto/ofproto-dpif-xlate.c  |  32 -
>  tests/test-rstp.c             |   8 +-
>  tests/test-stp.c              |   8 +-
>  25 files changed, 579 insertions(+), 153 deletions(-)
> 
> diff --git a/lib/bfd.c b/lib/bfd.c
> index cc8c685..12e076a 100644
> --- a/lib/bfd.c
> +++ b/lib/bfd.c
> @@ -721,7 +721,8 @@ bfd_process_packet(struct bfd *bfd, const struct
> flow *flow,
>      if (!msg) {
>          VLOG_INFO_RL(, "%s: Received too-short BFD control
> message (only "
>                       "%"PRIdPTR" bytes long, at least %d required).",
> -                     bfd->name, (uint8_t *) dp_packet_tail(p) - l7,
> +                     bfd->name, dp_packet_size(p) -
> +                     (l7 - (uint8_t *) dp_packet_data(p)),
>                       BFD_PACKET_LEN);
>          goto out;
>      }
> diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
> index a344801..1be2df1 100644
> --- a/lib/conntrack-private.h
> +++ b/lib/conntrack-private.h
> @@ -159,8 +159,8 @@ tcp_payload_length(struct dp_packet *pkt)
>  {
>      const char *tcp_payload = dp_packet_get_tcp_payload(pkt);
>      if (tcp_payload) {
> -        return ((char *) dp_packet_tail(pkt) -
> dp_packet_l2_pad_size(pkt)
> -                - tcp_payload);
> +        return dp_packet_l4_size(pkt) -
> +               (tcp_payload - (char *) dp_packet_l4(pkt));
> 
> 
> why was this change needed ?

`dp_packet_get_tcp_payload()` already makes sure the payload is within
the a segment, returning NULL otherwise. So, while this change itself
isn't needed to make this code path correct, it helps with readability
in the future now with the adoption with multi-segment mbufs. Above, the
resulting addresses of `dp_packet_tail(pkt)` and `tcp_payload` may not
be in the same segment when using multi-segment mbufs.

>  
> 
>      } else {
>          return 0;
>      }
> diff --git a/lib/conntrack.c b/lib/conntrack.c
> index 6f6021a..0dd2dcc 100644
> --- a/lib/conntrack.c
> +++ b/lib/conntrack.c
> @@ -636,12 +636,22 

[ovs-dev] [PATCH] ovs-actions: Remove unneded unicode symbols.

2019-01-11 Thread Ilya Maximets
Fixes manpage-check warnings on FreeBSD 11.2:

lib/ovs-actions.7:1389: warning: invalid input character code 128
lib/ovs-actions.7:1389: warning: invalid input character code 128
lib/ovs-actions.7:1389: warning: can't find character with input code 144

Fixes: be51cd417343 ("ovs-actions: New document describing OVS actions in 
detail.")
Signed-off-by: Ilya Maximets 
---
 lib/ovs-actions.xml | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/ovs-actions.xml b/lib/ovs-actions.xml
index f285d2607..fec0b95fe 100644
--- a/lib/ovs-actions.xml
+++ b/lib/ovs-actions.xml
@@ -2419,11 +2419,11 @@ table=1,in_port=2,ip,ct_state=+trk+est,action=1
 symmetric_l4
 
   Hashes Ethernet source, destination, and type, VLAN ID, IPv4/IPv6
-  source, destination, and proto‐ col, and TCP or SCTP (but not UDP)
+  source, destination, and protocol, and TCP or SCTP (but not UDP)
   ports.  The hash is computed so that pairs of corresponding flows in
   each direction hash to the same value, in environments where L2 paths
   are the same in each direction.  UDP ports are not included in the
-  hash to support protocols such as VXLAN that use asym‐ metric ports
+  hash to support protocols such as VXLAN that use asymmetric ports
   in each direction.
 
 
-- 
2.17.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] Support for match & set ICMPv6 reserved and options type fields

2019-01-11 Thread 0-day Robot
Bleep bloop.  Greetings Vishal Deep Ajmera, I am a robot and I have tried out 
your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
WARNING: Line is 83 characters long (recommended limit is 79)
#502 FILE: lib/odp-util.c:2504:
[OVS_KEY_ATTR_ND_EXTENSIONS] = { .len = sizeof(struct 
ovs_key_nd_extensions) },

Lines checked: 780, Warnings: 1, Errors: 0


build:
mv -f ofproto/.deps/ofproto_libofproto_la-ofproto-dpif.Tpo 
ofproto/.deps/ofproto_libofproto_la-ofproto-dpif.Plo
/bin/sh ./libtool  --tag=CC   --mode=compile gcc -std=gnu99 -DHAVE_CONFIG_H -I. 
   -I ./include -I ./include -I ./lib -I ./lib-Wstrict-prototypes -Wall 
-Wextra -Wno-sign-compare -Wpointer-arith -Wformat -Wformat-security 
-Wswitch-enum -Wunused-parameter -Wbad-function-cast -Wcast-align 
-Wstrict-prototypes -Wold-style-definition -Wmissing-prototypes 
-Wmissing-field-initializers -fno-strict-aliasing -Wshadow -Werror -Werror   -g 
-O2 -MT ofproto/ofproto_libofproto_la-ofproto-dpif-ipfix.lo -MD -MP -MF 
ofproto/.deps/ofproto_libofproto_la-ofproto-dpif-ipfix.Tpo -c -o 
ofproto/ofproto_libofproto_la-ofproto-dpif-ipfix.lo `test -f 
'ofproto/ofproto-dpif-ipfix.c' || echo './'`ofproto/ofproto-dpif-ipfix.c
libtool: compile:  gcc -std=gnu99 -DHAVE_CONFIG_H -I. -I ./include -I ./include 
-I ./lib -I ./lib -Wstrict-prototypes -Wall -Wextra -Wno-sign-compare 
-Wpointer-arith -Wformat -Wformat-security -Wswitch-enum -Wunused-parameter 
-Wbad-function-cast -Wcast-align -Wstrict-prototypes -Wold-style-definition 
-Wmissing-prototypes -Wmissing-field-initializers -fno-strict-aliasing -Wshadow 
-Werror -Werror -g -O2 -MT ofproto/ofproto_libofproto_la-ofproto-dpif-ipfix.lo 
-MD -MP -MF ofproto/.deps/ofproto_libofproto_la-ofproto-dpif-ipfix.Tpo -c 
ofproto/ofproto-dpif-ipfix.c -o 
ofproto/ofproto_libofproto_la-ofproto-dpif-ipfix.o
mv -f ofproto/.deps/ofproto_libofproto_la-ofproto-dpif-ipfix.Tpo 
ofproto/.deps/ofproto_libofproto_la-ofproto-dpif-ipfix.Plo
/bin/sh ./libtool  --tag=CC   --mode=compile gcc -std=gnu99 -DHAVE_CONFIG_H -I. 
   -I ./include -I ./include -I ./lib -I ./lib-Wstrict-prototypes -Wall 
-Wextra -Wno-sign-compare -Wpointer-arith -Wformat -Wformat-security 
-Wswitch-enum -Wunused-parameter -Wbad-function-cast -Wcast-align 
-Wstrict-prototypes -Wold-style-definition -Wmissing-prototypes 
-Wmissing-field-initializers -fno-strict-aliasing -Wshadow -Werror -Werror   -g 
-O2 -MT ofproto/ofproto_libofproto_la-ofproto-dpif-mirror.lo -MD -MP -MF 
ofproto/.deps/ofproto_libofproto_la-ofproto-dpif-mirror.Tpo -c -o 
ofproto/ofproto_libofproto_la-ofproto-dpif-mirror.lo `test -f 
'ofproto/ofproto-dpif-mirror.c' || echo './'`ofproto/ofproto-dpif-mirror.c
libtool: compile:  gcc -std=gnu99 -DHAVE_CONFIG_H -I. -I ./include -I ./include 
-I ./lib -I ./lib -Wstrict-prototypes -Wall -Wextra -Wno-sign-compare 
-Wpointer-arith -Wformat -Wformat-security -Wswitch-enum -Wunused-parameter 
-Wbad-function-cast -Wcast-align -Wstrict-prototypes -Wold-style-definition 
-Wmissing-prototypes -Wmissing-field-initializers -fno-strict-aliasing -Wshadow 
-Werror -Werror -g -O2 -MT ofproto/ofproto_libofproto_la-ofproto-dpif-mirror.lo 
-MD -MP -MF ofproto/.deps/ofproto_libofproto_la-ofproto-dpif-mirror.Tpo -c 
ofproto/ofproto-dpif-mirror.c -o 
ofproto/ofproto_libofproto_la-ofproto-dpif-mirror.o
mv -f ofproto/.deps/ofproto_libofproto_la-ofproto-dpif-mirror.Tpo 
ofproto/.deps/ofproto_libofproto_la-ofproto-dpif-mirror.Plo
/bin/sh ./libtool  --tag=CC   --mode=compile gcc -std=gnu99 -DHAVE_CONFIG_H -I. 
   -I ./include -I ./include -I ./lib -I ./lib-Wstrict-prototypes -Wall 
-Wextra -Wno-sign-compare -Wpointer-arith -Wformat -Wformat-security 
-Wswitch-enum -Wunused-parameter -Wbad-function-cast -Wcast-align 
-Wstrict-prototypes -Wold-style-definition -Wmissing-prototypes 
-Wmissing-field-initializers -fno-strict-aliasing -Wshadow -Werror -Werror   -g 
-O2 -MT ofproto/ofproto_libofproto_la-ofproto-dpif-monitor.lo -MD -MP -MF 
ofproto/.deps/ofproto_libofproto_la-ofproto-dpif-monitor.Tpo -c -o 
ofproto/ofproto_libofproto_la-ofproto-dpif-monitor.lo `test -f 
'ofproto/ofproto-dpif-monitor.c' || echo './'`ofproto/ofproto-dpif-monitor.c
libtool: compile:  gcc -std=gnu99 -DHAVE_CONFIG_H -I. -I ./include -I ./include 
-I ./lib -I ./lib -Wstrict-prototypes -Wall -Wextra -Wno-sign-compare 
-Wpointer-arith -Wformat -Wformat-security -Wswitch-enum -Wunused-parameter 
-Wbad-function-cast -Wcast-align -Wstrict-prototypes -Wold-style-definition 
-Wmissing-prototypes -Wmissing-field-initializers -fno-strict-aliasing -Wshadow 
-Werror -Werror -g -O2 -MT 
ofproto/ofproto_libofproto_la-ofproto-dpif-monitor.lo -MD -MP -MF 
ofproto/.deps/ofproto_libofproto_la-ofproto-dpif-monitor.Tpo -c 
ofproto/ofproto-dpif-monitor.c -o 
ofproto/ofproto_libofproto_la-ofproto-dpif-monitor.o
mv -f ofproto/.deps/ofproto_libofproto_la-ofproto-dpif-monitor.Tpo 
ofproto/.deps/ofproto_libofproto_la-ofproto-dpif-monitor.Plo
/bin/sh 

[ovs-dev] [PATCH] vconn: Fix using of uninitialized deadline.

2019-01-11 Thread Ilya Maximets
Typo introduced while making minor refactoring before applying the
patch.

Fixes logic and the clang build:

  lib/vconn.c:707:47: error:
  variable 'deadline' is uninitialized when
  used within its own initialization [-Werror,-Wuninitialized]
  ? time_msec() + deadline
  ^~~~

Fixes: 04895042e9f6 ("vconn: Allow timeout configuration for blocking 
connection.")
Signed-off-by: Ilya Maximets 
---
 lib/vconn.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/vconn.c b/lib/vconn.c
index 40ebc5818..403461662 100644
--- a/lib/vconn.c
+++ b/lib/vconn.c
@@ -704,7 +704,7 @@ int
 vconn_connect_block(struct vconn *vconn, long long int timeout)
 {
 long long int deadline = (timeout >= 0
-  ? time_msec() + deadline
+  ? time_msec() + timeout
   : LLONG_MAX);
 
 int error;
-- 
2.17.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev