Re: [ovs-dev] [PATCH v2 1/1] netdev-dpdk: Fix egress policer error detection bug.

2016-08-09 Thread Daniele Di Proietto
2016-08-09 10:20 GMT-07:00 Ian Stokes :

> When egress policer is set as a QoS type for a port, an error may occur
> during
> setup if incorrect parameters are used for the rte_meter. If this occurs
> the egress policer construct and set functions should free any allocated
> memory relevant to the policer and set the QoS configuration pointer to
> null. The netdev_dpdk_set_qos function should check the error value
> returned
> for any QoS construct/set calls with an assertion to avoid segfault.
> Also this commit modifies egress_policer_qos_set() to correctly lock the
> QoS
> spinlock while the egress policer configuration is updated to avoid
> segfault.
>
> Signed-off-by: Ian Stokes 
> ---
> v2
> * netdev-dpdk.c
> - Simplify assertion in netdev_dpdk_set_qos() to check that no error
>   has been returned and that a QoS configuration exists before checking
>   and logging an error.
> - Use rte_strerror  in netdev_dpdk_set_qos() when logging error for a
>   textual representation.
> - Align VLOG message for correct formatting in netdev_dpdk_set_qos().
> - egress_policer_qos_construct() now returns positive error.
> - egress_policer_qos_set() now return positive error.
> - Document addition of spinlock in egress_policer_qos_set() in commit
>   message.
> ---
>  lib/netdev-dpdk.c |   30 --
>  1 files changed, 28 insertions(+), 2 deletions(-)
>
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index bf3a898..f37130e 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -2731,11 +2731,16 @@ netdev_dpdk_set_qos(struct netdev *netdev,
>
>  /* Install new QoS configuration. */
>  error = new_ops->qos_construct(netdev, details);
> -ovs_assert((error == 0) == (dev->qos_conf != NULL));
>  }
>  } else {
>  error = new_ops->qos_construct(netdev, details);
> -ovs_assert((error == 0) == (dev->qos_conf != NULL));
> +}
> +
> +ovs_assert((error == 0) == (dev->qos_conf != NULL));
> +if (error) {
> +VLOG_ERR("Failed to set QoS type %s on port %s, returned error:
> %s",
> + type, netdev->name, rte_strerror(-error));
> +ovs_assert(dev->qos_conf == NULL);
>

This assert should be unnecessary, given the assert above the if.

I removed it and pushed this to master, thanks.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
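
A note on why the second assert is redundant: once the invariant
`(error == 0) == (dev->qos_conf != NULL)` has been asserted, a non-zero
error already implies a NULL QoS configuration. The sketch below checks
this with stand-in types; `toy_dev` and both helper functions are
hypothetical names for illustration, not OVS code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant part of struct netdev_dpdk. */
struct toy_dev {
    void *qos_conf;
};

/* The invariant asserted after qos_construct(): the call succeeded if and
 * only if a QoS configuration is installed. */
static bool
toy_invariant_holds(int error, const struct toy_dev *dev)
{
    return (error == 0) == (dev->qos_conf != NULL);
}

/* Returns true if a later "qos_conf == NULL" check inside the error branch
 * could still fail.  With the invariant already asserted, it never can. */
static bool
toy_second_check_can_fail(int error, const struct toy_dev *dev)
{
    assert(toy_invariant_holds(error, dev));
    return error != 0 && dev->qos_conf != NULL;
}
```

For any (error, qos_conf) pair that passes the first assertion,
`toy_second_check_can_fail()` returns false, which is the redundancy
pointed out in the review.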


Re: [ovs-dev] [PATCH] INSTALL.DPDK-ADVANCED: Add vhost multiqueue loopback testcase.

2016-08-09 Thread Daniele Di Proietto
Applied to master, thanks

2016-07-28 5:48 GMT-07:00 Bhanuprakash Bodireddy <
bhanuprakash.bodire...@intel.com>:

> Add steps for loopback test using vhost-user configured with multiqueue
> doing packet forwarding in kernel.
>
> Signed-off-by: Bhanuprakash Bodireddy 
> ---
>  INSTALL.DPDK-ADVANCED.md | 86 ++
> ++
>  1 file changed, 86 insertions(+)
>
> diff --git a/INSTALL.DPDK-ADVANCED.md b/INSTALL.DPDK-ADVANCED.md
> index 9ae536d..63440d0 100644
> --- a/INSTALL.DPDK-ADVANCED.md
> +++ b/INSTALL.DPDK-ADVANCED.md
> @@ -372,6 +372,91 @@ For users wanting to do packet forwarding using
> kernel stack below are the steps
> where "-n 0" refers to ring '0' i.e dpdkr0
> ```
>
> +### 5.3 PHY-VM-PHY [VHOST MULTIQUEUE]
> +
> +  The steps (1-5) in 3.3 section of [INSTALL DPDK] guide will create &
> initialize DB,
> +  start vswitchd and add dpdk devices to bridge br0.
> +
> +  1. Configure PMD and RXQs. For example set no. of dpdk port rx queues
> to atleast 2.
> + The number of rx queues at vhost-user interface gets automatically
> configured after
> + virtio device connection and doesn't need manual configuration.
> +
> + ```
> + ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=c
> + ovs-vsctl set Interface dpdk0 options:n_rxq=2
> + ovs-vsctl set Interface dpdk1 options:n_rxq=2
> + ```
> +
> +  2. Instantiate Guest VM using Qemu cmdline
> +
> +   Guest Configuration
> +
> +   ```
> +   | configuration        | values | comments
> +   |----------------------|--------|-----------------
> +   | qemu version         | 2.5.0  |
> +   | qemu thread affinity |2 cores | taskset 0x30
> +   | memory               | 4GB    | -
> +   | cores                | 2      | -
> +   | Qcow2 image          |Fedora22| -
> +   | multiqueue           |   on   | -
> +
> +   Instantiate Guest
> +
> +   ```
> +   export VM_NAME=vhost-vm
> +   export GUEST_MEM=4096M
> +   export QCOW2_IMAGE=/root/Fedora22_x86_64.qcow2
> +   export VHOST_SOCK_DIR=/usr/local/var/run/openvswitch
> +
> +   taskset 0x30 qemu-system-x86_64 -cpu host -smp 2,cores=2 -drive
> file=$QCOW2_IMAGE -m 4096M --enable-kvm -name $VM_NAME -nographic -object
> memory-backend-file,id=mem,size=$GUEST_MEM,mem-path=/dev/hugepages,share=on
> -numa node,memdev=mem -mem-prealloc -chardev 
> socket,id=char1,path=$VHOST_SOCK_DIR/dpdkvhostuser0
> -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce,queues=2
> -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mq=on,vectors=6
> -chardev socket,id=char2,path=$VHOST_SOCK_DIR/dpdkvhostuser1 -netdev
> type=vhost-user,id=mynet2,chardev=char2,vhostforce,queues=2 -device
> virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2,mq=on,vectors=6
> +   ```
> +
> +   Note: Queue value above should match the queues configured in OVS,
> The vector value
> +   should be set to 'no. of queues x 2 + 2'.
> +
> +  3. Guest interface configuration
> +
> + Assuming there are 2 interfaces in the guest named eth0, eth1 check
> the channel
> + configuration and set the number of combined channels to 2 for
> virtio devices.
> + More information can be found in [Vhost walkthrough] section.
> +
> +   ```
> +   ethtool -l eth0
> +   ethtool -L eth0 combined 2
> +   ethtool -L eth1 combined 2
> +   ```
> +
> +  4. Kernel Packet forwarding
> +
> + Configure IP and enable interfaces
> +
> + ```
> + ifconfig eth0 5.5.5.1/24 up
> + ifconfig eth1 90.90.90.1/24 up
> + ```
> +
> + Configure IP forwarding and add route entries
> +
> + ```
> + sysctl -w net.ipv4.ip_forward=1
> + sysctl -w net.ipv4.conf.all.rp_filter=0
> + sysctl -w net.ipv4.conf.eth0.rp_filter=0
> + sysctl -w net.ipv4.conf.eth1.rp_filter=0
> + ip route add 2.1.1.0/24 dev eth1
> + route add default gw 2.1.1.2 eth1
> + route add default gw 90.90.90.90 eth1
> + arp -s 90.90.90.90 DE:AD:BE:EF:CA:FE
> + arp -s 2.1.1.2 DE:AD:BE:EF:CA:FA
> + ```
> +
> + Check traffic on multiple queues
> +
> + ```
> + cat /proc/interrupts | grep virtio
> + ```
> +
>  ##  6. Vhost Walkthrough
>
>  DPDK 16.04 supports two types of vhost:
> @@ -848,5 +933,6 @@ Please report problems to b...@openvswitch.org.
>  [DPDK Docs]: http://dpdk.org/doc
>  [libvirt]: http://libvirt.org/formatdomain.html
>  [Guest VM using libvirt]: INSTALL.DPDK.md#ovstc
> +[Vhost walkthrough]: INSTALL.DPDK.md#vhost
>  [INSTALL DPDK]: INSTALL.DPDK.md#build
>  [INSTALL OVS]: INSTALL.DPDK.md#build
> --
> 2.4.11
>
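
For reference, the vector rule in the note of the patch
('no. of queues x 2 + 2') can be written as a one-line helper. This is
only a worked example of that arithmetic; the function name is
hypothetical and is not part of QEMU or OVS.

```c
#include <assert.h>

/* Vector count suggested by the note in the patch: queues x 2 + 2. */
static unsigned int
toy_virtio_vectors_for_queues(unsigned int n_queues)
{
    assert(n_queues > 0);
    return n_queues * 2 + 2;
}
```

With queues=2, as in the QEMU command line of the patch, this gives 6,
which matches the `vectors=6` used there.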


Re: [ovs-dev] [PATCH] Revert "pvector: Expose non-concurrent priority vector."

2016-08-09 Thread Daniele Di Proietto
Simple revert, looks good to me, thanks

Acked-by: Daniele Di Proietto <diproiet...@vmware.com>


2016-08-09 13:59 GMT-07:00 Jarno Rajahalme <ja...@ovn.org>:

> This reverts commit 8bdfe1313894047d44349fa4cf4402970865950f.
>
> I failed to see that lib/dpif-netdev.c actually needs the concurrency
> provided by pvector prior to this change.  More specifically, when a
> subtable is removed, concurrent lookups may skip over another subtable
> swapped in to the place of the removed subtable in the vector.
>
> Since this was the only use of the non-concurrent pvector, it is
> cleaner to revert the whole patch.
>
> Reported-by: Jan Scheurich <jan.scheur...@ericsson.com>
> Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
> ---
>  lib/classifier.c|  30 
>  lib/classifier.h|   6 +-
>  lib/dpif-netdev.c   |  14 ++--
>  lib/pvector.c   | 190 +++---
> --
>  lib/pvector.h   | 187 +++---
> -
>  tests/test-classifier.c |  12 +--
>  6 files changed, 182 insertions(+), 257 deletions(-)
>
> diff --git a/lib/classifier.c b/lib/classifier.c
> index 8f195d5..0551146 100644
> --- a/lib/classifier.c
> +++ b/lib/classifier.c
> @@ -325,7 +325,7 @@ classifier_init(struct classifier *cls, const uint8_t
> *flow_segments)
>  {
>  cls->n_rules = 0;
>  cmap_init(&cls->subtables_map);
> -cpvector_init(&cls->subtables);
> +pvector_init(&cls->subtables);
>  cls->n_flow_segments = 0;
>  if (flow_segments) {
>  while (cls->n_flow_segments < CLS_MAX_INDICES
> @@ -359,7 +359,7 @@ classifier_destroy(struct classifier *cls)
>  }
>  cmap_destroy(&cls->subtables_map);
>
> -cpvector_destroy(&cls->subtables);
> +pvector_destroy(&cls->subtables);
>  }
>  }
>
> @@ -658,20 +658,20 @@ classifier_replace(struct classifier *cls, const
> struct cls_rule *rule,
>  if (n_rules == 1) {
>  subtable->max_priority = rule->priority;
>  subtable->max_count = 1;
> -cpvector_insert(&cls->subtables, subtable, rule->priority);
> +pvector_insert(&cls->subtables, subtable, rule->priority);
>  } else if (rule->priority == subtable->max_priority) {
>  ++subtable->max_count;
>  } else if (rule->priority > subtable->max_priority) {
>  subtable->max_priority = rule->priority;
>  subtable->max_count = 1;
> -cpvector_change_priority(&cls->subtables, subtable,
> rule->priority);
> +pvector_change_priority(&cls->subtables, subtable,
> rule->priority);
>  }
>
>  /* Nothing was replaced. */
>  cls->n_rules++;
>
>  if (cls->publish) {
> -cpvector_publish(&cls->subtables);
> +pvector_publish(&cls->subtables);
>  }
>
>  return NULL;
> @@ -803,12 +803,12 @@ check_priority:
>  }
>  }
>  subtable->max_priority = max_priority;
> -cpvector_change_priority(&cls->subtables, subtable,
> max_priority);
> +pvector_change_priority(&cls->subtables, subtable,
> max_priority);
>  }
>  }
>
>  if (cls->publish) {
> -cpvector_publish(&cls->subtables);
> +pvector_publish(&cls->subtables);
>  }
>
>  /* free the rule. */
> @@ -959,8 +959,8 @@ classifier_lookup__(const struct classifier *cls,
> ovs_version_t version,
>
>  /* Main loop. */
>  struct cls_subtable *subtable;
> -CPVECTOR_FOR_EACH_PRIORITY (subtable, hard_pri + 1, 2, sizeof
> *subtable,
> -&cls->subtables) {
> +PVECTOR_FOR_EACH_PRIORITY (subtable, hard_pri + 1, 2, sizeof
> *subtable,
> +   &cls->subtables) {
>  struct cls_conjunction_set *conj_set;
>
>  /* Skip subtables with no match, or where the match is
> lower-priority
> @@ -1231,8 +1231,8 @@ classifier_rule_overlaps(const struct classifier
> *cls,
>  struct cls_subtable *subtable;
>
>  /* Iterate subtables in the descending max priority order. */
> -CPVECTOR_FOR_EACH_PRIORITY (subtable, target->priority, 2,
> -sizeof(struct cls_subtable),
> &cls->subtables) {
> +PVECTOR_FOR_EACH_PRIORITY (subtable, target->priority, 2,
> +   sizeof(struct cls_subtable),
> &cls->subtables) {
>  struct {
>  struct minimask mask;
>  uint64_t storage[FLOW_U64S];
> @@ -1350,8 +1350,8 @@ cls_cursor_start(const struct classifier *cls, const
> struct cls_rule *target,
>  cursor.
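
To illustrate the race described in the revert message (a lookup skipping
a subtable that was swapped into the slot of a removed one), here is a
single-threaded simulation. It is a sketch under stated assumptions:
`toy_vec`, `toy_swap_remove()` and `toy_reader_sees()` are hypothetical
and do not reproduce the pvector implementation; the "reader" and
"writer" are interleaved by hand rather than run in separate threads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_CAP 8

/* Hypothetical vector that is modified in place, i.e. without publishing
 * a fresh copy for readers. */
struct toy_vec {
    int elems[TOY_CAP];
    size_t n;
};

/* Removes the element at 'idx' by moving the last element into its slot. */
static void
toy_swap_remove(struct toy_vec *v, size_t idx)
{
    assert(idx < v->n);
    v->elems[idx] = v->elems[v->n - 1];
    v->n--;
}

/* Simulates a reader that has visited indices [0, resume_at) when a writer
 * removes 'remove_idx' in place, and then finishes its iteration.  Returns
 * true if the reader ever saw 'wanted'. */
static bool
toy_reader_sees(struct toy_vec *v, size_t resume_at, size_t remove_idx,
                int wanted)
{
    bool seen = false;
    size_t i;

    for (i = 0; i < resume_at && i < v->n; i++) {
        seen = seen || v->elems[i] == wanted;
    }
    toy_swap_remove(v, remove_idx);
    for (i = resume_at; i < v->n; i++) {
        seen = seen || v->elems[i] == wanted;
    }
    return seen;
}
```

With elements {40, 30, 20, 10}, a reader that has already passed the first
two slots never sees 10 if slot 0 is removed underneath it, because 10 is
moved behind the reader's position. A reader that starts after the removal
does see it.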

Re: [ovs-dev] netdev-dpdk: Fix deadlock in destroy_device().

2016-08-09 Thread Daniele Di Proietto





On 08/08/2016 01:02, "Kavanagh, Mark B" <mark.b.kavan...@intel.com> wrote:

>>
>>Minor comment inline.
>>
>>Acked-by: Ilya Maximets <i.maxim...@samsung.com>
>>
>
>Other than the comment mentioned by Ilya, this LGTM also - thanks again for 
>resolving, Daniele.
>
>Acked-by: mark.b.kavan...@intel.com

Thanks for the reviews, I applied this to master

>
>>On 05.08.2016 23:57, Daniele Di Proietto wrote:
>>> netdev_dpdk_vhost_destruct() calls rte_vhost_driver_unregister(), which
>>> can trigger the destroy_device() callback.  destroy_device() will try to
>>> take two mutexes already held by netdev_dpdk_vhost_destruct(), causing a
>>> deadlock.
>>>
>>> This problem can be solved by dropping the mutexes before calling
>>> rte_vhost_driver_unregister().  The netdev_dpdk_vhost_destruct() and
>>> construct() call are already serialized by netdev_mutex.
>>>
>>> This commit also makes clear that dev->vhost_id is constant and can be
>>> accessed without taking any mutexes in the lifetime of the devices.
>>>
>>> Fixes: 8d38823bdf8b("netdev-dpdk: fix memory leak")
>>> Reported-by: Ilya Maximets <i.maxim...@samsung.com>
>>> Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
>>> ---
>>>  lib/netdev-dpdk.c | 34 --
>>>  1 file changed, 24 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>>> index f37ec1c..98bff62 100644
>>> --- a/lib/netdev-dpdk.c
>>> +++ b/lib/netdev-dpdk.c
>>> @@ -355,8 +355,10 @@ struct netdev_dpdk {
>>>  /* True if vHost device is 'up' and has been reconfigured at least 
>>> once */
>>>  bool vhost_reconfigured;
>>>
>>> -/* Identifier used to distinguish vhost devices from each other */
>>> -char vhost_id[PATH_MAX];
>>> +/* Identifier used to distinguish vhost devices from each other.  It 
>>> does
>>> + * not change during the lifetime of a struct netdev_dpdk.  It can be 
>>> read
>>> + * without holding any mutex. */
>>> +const char vhost_id[PATH_MAX];
>>>
>>>  /* In dpdk_list. */
>>>  struct ovs_list list_node OVS_GUARDED_BY(dpdk_mutex);
>>> @@ -846,7 +848,8 @@ netdev_dpdk_vhost_cuse_construct(struct netdev *netdev)
>>>  }
>>>
>>>  ovs_mutex_lock(&dpdk_mutex);
>>> -strncpy(dev->vhost_id, netdev->name, sizeof(dev->vhost_id));
>>> +strncpy(CONST_CAST(char *, dev->vhost_id), netdev->name,
>>> +sizeof dev->vhost_id);
>>>  err = vhost_construct_helper(netdev);
>>>  ovs_mutex_unlock(&dpdk_mutex);
>>>  return err;
>>> @@ -878,7 +881,7 @@ netdev_dpdk_vhost_user_construct(struct netdev *netdev)
>>>  /* Take the name of the vhost-user port and append it to the location 
>>> where
>>>   * the socket is to be created, then register the socket.
>>>   */
>>> -snprintf(dev->vhost_id, sizeof(dev->vhost_id), "%s/%s",
>>> +snprintf(CONST_CAST(char *,dev->vhost_id), sizeof(dev->vhost_id), 
>>> "%s/%s",
>>
>>Space between arguments of 'CONST_CAST()' and parenthesized operand of 
>>'sizeof'.

Fixed, thanks

>>
>>>   vhost_sock_dir, name);
>>>
>>>  err = rte_vhost_driver_register(dev->vhost_id, flags);
>>> @@ -938,6 +941,17 @@ netdev_dpdk_destruct(struct netdev *netdev)
>>>  ovs_mutex_unlock(&dpdk_mutex);
>>>  }
>>>
>>> +/* rte_vhost_driver_unregister() can call back destroy_device(), which will
>>> + * try to acquire 'dpdk_mutex' and possibly 'dev->mutex'.  To avoid a
>>> + * deadlock, none of the mutexes must be held while calling this function. 
>>> */
>>> +static int
>>> +dpdk_vhost_driver_unregister(struct netdev_dpdk *dev)
>>> +OVS_EXCLUDED(dpdk_mutex)
>>> +OVS_EXCLUDED(dev->mutex)
>>> +{
>>> +return rte_vhost_driver_unregister(dev->vhost_id);
>>> +}
>>> +
>>>  static void
>>>  netdev_dpdk_vhost_destruct(struct netdev *netdev)
>>>  {
>>> @@ -955,12 +969,6 @@ netdev_dpdk_vhost_destruct(struct netdev *netdev)
>>>   dev->vhost_id);
>>>  }
>>>
>>> -if (rte_vhost_driver_unregister(dev->vhost_id)) {
>>> -VLOG_ERR("Unable to remove vhost-user socket %s", dev->vhost_id);
>>> -} else {
>>> -fatal_signal_remove_file_to_unlink(dev->vhost_id);
>>> -}
>>> -
>>>  free(ovsrcu_get_protected(struct ingress_policer *,
>>>  &dev->ingress_policer));
>>>
>>> @@ -970,6 +978,12 @@ netdev_dpdk_vhost_destruct(struct netdev *netdev)
>>>
>>>  ovs_mutex_unlock(&dev->mutex);
>>>  ovs_mutex_unlock(&dpdk_mutex);
>>> +
>>> +if (dpdk_vhost_driver_unregister(dev)) {
>>> +VLOG_ERR("Unable to remove vhost-user socket %s", dev->vhost_id);
>>> +} else {
>>> +fatal_signal_remove_file_to_unlink(dev->vhost_id);
>>> +}
>>>  }
>>>
>>>  static void
>>>
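
The locking pattern in the patch above (release the mutexes, then call a
function that may call back into code taking the same mutexes) can be
sketched without threads. Everything here is a hypothetical stand-in:
`toy_lock` is a non-recursive lock modelled as a flag, `toy_callback()`
plays the role of destroy_device(), and `toy_destruct()` plays the role
of the destruct path. A failed acquire stands for "would deadlock".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical non-recursive lock, modelled as a flag. */
struct toy_lock {
    bool held;
};

/* Returns false where a real non-recursive mutex would block forever. */
static bool
toy_try_acquire(struct toy_lock *l)
{
    if (l->held) {
        return false;
    }
    l->held = true;
    return true;
}

static void
toy_release(struct toy_lock *l)
{
    assert(l->held);
    l->held = false;
}

/* Stand-in for the callback: it needs the lock to do its work. */
static bool
toy_callback(struct toy_lock *l)
{
    if (!toy_try_acquire(l)) {
        return false;
    }
    toy_release(l);
    return true;
}

/* Stand-in for the destruct path.  Returns true if the callback could run,
 * false if it would have deadlocked. */
static bool
toy_destruct(struct toy_lock *l, bool drop_lock_first)
{
    bool acquired = toy_try_acquire(l);
    bool ok;

    assert(acquired);
    /* ... teardown that really needs the lock would go here ... */
    if (drop_lock_first) {
        toy_release(l);
        return toy_callback(l);
    }
    ok = toy_callback(l);
    toy_release(l);
    return ok;
}
```

Calling `toy_destruct(&l, false)` reports the would-be deadlock, while
`toy_destruct(&l, true)` lets the callback complete, which mirrors moving
the unregister call after the unlocks.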


Re: [ovs-dev] [PATCH V7 7/7] netdev-dpdk: add support for jumbo frames

2016-08-09 Thread Daniele Di Proietto
Thanks for all the series and the reviews, I will push this when the
dependencies (patch 2 and patch 6) are reviewed.


Daniele
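
As background for the 1518B figure in the commit message quoted below:
for an untagged Ethernet frame, the frame length is the MTU plus the
14-byte Ethernet header and the 4-byte CRC. The helper and macro names
in this sketch are hypothetical, and it deliberately ignores VLAN tags
and any mbuf headroom.

```c
#include <assert.h>

#define TOY_ETHER_HDR_LEN 14    /* dst MAC + src MAC + EtherType. */
#define TOY_ETHER_CRC_LEN 4     /* Frame check sequence. */

/* Length on the wire of an untagged frame carrying 'mtu' payload bytes. */
static unsigned int
toy_mtu_to_frame_len(unsigned int mtu)
{
    return mtu + TOY_ETHER_HDR_LEN + TOY_ETHER_CRC_LEN;
}
```

The default MTU of 1500 gives 1518, and an mtu_request of 9000 gives
9018, so a single-segment mbuf has to hold at least that many bytes of
frame data.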

2016-08-09 9:01 GMT-07:00 Mark Kavanagh <mark.b.kavan...@intel.com>:

> Add support for Jumbo Frames to DPDK-enabled port types,
> using single-segment-mbufs.
>
> Using this approach, the amount of memory allocated to each mbuf
> to store frame data is increased to a value greater than 1518B
> (typical Ethernet maximum frame length). The increased space
> available in the mbuf means that an entire Jumbo Frame of a specific
> size can be carried in a single mbuf, as opposed to partitioning
> it across multiple mbuf segments.
>
> The amount of space allocated to each mbuf to hold frame data is
> defined dynamically by the user with ovs-vsctl, via the 'mtu_request'
> parameter.
>
> Signed-off-by: Mark Kavanagh <mark.b.kavan...@intel.com>
> Signed-off-by: Ilya Maximets <i.maxim...@samsung.com>
> [diproiet...@vmware.com rebased]
> Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
> ---
>
> v7:
> - add 'Signed-off-by' for Ilya Maximets (i.maxim...@samsung.com)
>
> v6:
> - include device name in netdev_dpdk_set_mtu error log
> - resolve minor coding standards infractions
>
> v5:
> - rename dpdk_mp_configure to netdev_dpdk_mempool_configure
> - consolidate socket_id and mtu changes within
>   netdev_dpdk_mempool_configure
> - add lower bounds check for user-supplied MTU
> - add socket_id and mtu fields to mempool configure error report
> - minor cosmetic changes
>
> v4:
> - restore error reporting in *_reconfigure functions (for
>   non-mtu-configuration based errors)
> - remove 'goto' in the event of dpdk_mp_configure failure
> - remove superfluous error variables
>
>  v3:
> - replace netdev_dpdk.last_mtu with local variable
> - add comment for dpdk_mp_configure
>
>  v2:
>  - rebase to HEAD of master
>  - fall back to previous 'good' MTU if reconfigure fails
>  - introduce new field 'last_mtu' in struct netdev_dpdk to facilitate
>fall-back
>  - rename 'mtu_request' to 'requested_mtu' in struct netdev_dpdk
>  - remove rebasing artifact in INSTALL.DPDK-Advanced.md
>  - remove superfluous variable in dpdk_mp_configure
>  - fix minor coding style infraction
>
>
>  INSTALL.DPDK-ADVANCED.md |  58 ++-
>  INSTALL.DPDK.md  |   1 -
>  NEWS |   1 +
>  lib/netdev-dpdk.c| 145 ++
> +
>  4 files changed, 176 insertions(+), 29 deletions(-)
>
> diff --git a/INSTALL.DPDK-ADVANCED.md b/INSTALL.DPDK-ADVANCED.md
> index 0ab43d4..5e758ce 100755
> --- a/INSTALL.DPDK-ADVANCED.md
> +++ b/INSTALL.DPDK-ADVANCED.md
> @@ -1,5 +1,5 @@
>  OVS DPDK ADVANCED INSTALL GUIDE
> -=
> +===
>
>  ## Contents
>
> @@ -12,7 +12,8 @@ OVS DPDK ADVANCED INSTALL GUIDE
>  7. [QOS](#qos)
>  8. [Rate Limiting](#rl)
>  9. [Flow Control](#fc)
> -10. [Vsperf](#vsperf)
> +10. [Jumbo Frames](#jumbo)
> +11. [Vsperf](#vsperf)
>
>  ##  1. Overview
>
> @@ -862,7 +863,58 @@ respective parameter. To disable the flow control at
> tx side,
>
>  `ovs-vsctl set Interface dpdk0 options:tx-flow-ctrl=false`
>
> -##  10. Vsperf
> +##  10. Jumbo Frames
> +
> +By default, DPDK ports are configured with standard Ethernet MTU (1500B).
> To
> +enable Jumbo Frames support for a DPDK port, change the Interface's
> `mtu_request`
> +attribute to a sufficiently large value.
> +
> +e.g. Add a DPDK Phy port with MTU of 9000:
> +
> +`ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk -- set
> Interface dpdk0 mtu_request=9000`
> +
> +e.g. Change the MTU of an existing port to 6200:
> +
> +`ovs-vsctl set Interface dpdk0 mtu_request=6200`
> +
> +When Jumbo Frames are enabled, the size of a DPDK port's mbuf segments are
> +increased, such that a full Jumbo Frame of a specific size may be
> accommodated
> +within a single mbuf segment.
> +
> +Jumbo frame support has been validated against 9728B frames (largest
> frame size
> +supported by Fortville NIC), using the DPDK `i40e` driver, but larger
> frames
> +(particularly in use cases involving East-West traffic only), and other
> DPDK NIC
> +drivers may be supported.
> +
> +### 9.1 vHost Ports and Jumbo Frames
> +
> +Some additional configuration is needed to take advantage of jumbo frames
> with
> +vhost ports:
> +
> +1. `mergeable buffers` must be enabled for vHost ports, as
> demonstrated in
> +the QEMU command line snippet below:
> +

Re: [ovs-dev] [PATCH V7 1/7] ofproto: Consider datapath_type when looking for internal ports.

2016-08-09 Thread Daniele Di Proietto




On 09/08/2016 09:08, "Thadeu Lima de Souza Cascardo" <casca...@redhat.com> 
wrote:

>On Tue, Aug 09, 2016 at 05:01:14PM +0100, Mark Kavanagh wrote:
>> From: Daniele Di Proietto <diproiet...@vmware.com>
>> 
>> Interfaces with type "internal" end up having a netdev with type "tap"
>> in the dpif-netdev datapath, so a strcmp will fail to match internal
>> interfaces.
>> 
>> We can translate the types with ofproto_port_open_type() before calling
>> strcmp to fix this.
>> 
>> This fixes a minor issue where internal interfaces are considered
>> non-internal in the userspace datapath for the purpose of adjusting the
>> MTU.
>> 
>> Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
>
>Acked-by: Thadeu Lima de Souza Cascardo <casca...@redhat.com>

Thanks guys, I pushed this to master

>
>Hi, Mark.
>
>Can you keep my Ack in further submissions in case there are no changes to this
>patch?
>
>Thanks.
>Cascardo.
>
>> ---
>>  ofproto/ofproto.c | 16 +---
>>  1 file changed, 9 insertions(+), 7 deletions(-)
>> 
>> diff --git a/ofproto/ofproto.c b/ofproto/ofproto.c
>> index 8e59c69..088f91a 100644
>> --- a/ofproto/ofproto.c
>> +++ b/ofproto/ofproto.c
>> @@ -220,7 +220,8 @@ static void learned_cookies_flush(struct ofproto *, 
>> struct ovs_list *dead_cookie
>>  /* ofport. */
>>  static void ofport_destroy__(struct ofport *) OVS_EXCLUDED(ofproto_mutex);
>>  static void ofport_destroy(struct ofport *, bool del);
>> -static inline bool ofport_is_internal(const struct ofport *);
>> +static inline bool ofport_is_internal(const struct ofproto *,
>> +  const struct ofport *);
>>  
>>  static int update_port(struct ofproto *, const char *devname);
>>  static int init_ports(struct ofproto *);
>> @@ -2465,7 +2466,7 @@ static void
>>  ofport_remove(struct ofport *ofport)
>>  {
>>  struct ofproto *p = ofport->ofproto;
>> -bool is_internal = ofport_is_internal(ofport);
>> +bool is_internal = ofport_is_internal(p, ofport);
>>  
>>  connmgr_send_port_status(ofport->ofproto->connmgr, NULL, &ofport->pp,
>>   OFPPR_DELETE);
>> @@ -2751,9 +2752,10 @@ init_ports(struct ofproto *p)
>>  }
>>  
>>  static inline bool
>> -ofport_is_internal(const struct ofport *port)
>> +ofport_is_internal(const struct ofproto *p, const struct ofport *port)
>>  {
>> -return !strcmp(netdev_get_type(port->netdev), "internal");
>> +return !strcmp(netdev_get_type(port->netdev),
>> +   ofproto_port_open_type(p->type, "internal"));
>>  }
>>  
>>  /* Find the minimum MTU of all non-datapath devices attached to 'p'.
>> @@ -2770,7 +2772,7 @@ find_min_mtu(struct ofproto *p)
>>  
>>  /* Skip any internal ports, since that's what we're trying to
>>   * set. */
>> -if (ofport_is_internal(ofport)) {
>> +if (ofport_is_internal(p, ofport)) {
>>  continue;
>>  }
>>  
>> @@ -2797,7 +2799,7 @@ update_mtu(struct ofproto *p, struct ofport *port)
>>  port->mtu = 0;
>>  return;
>>  }
>> -if (ofport_is_internal(port)) {
>> +if (ofport_is_internal(p, port)) {
>>  if (dev_mtu > p->min_mtu) {
>> if (!netdev_set_mtu(port->netdev, p->min_mtu)) {
>> dev_mtu = p->min_mtu;
>> @@ -2827,7 +2829,7 @@ update_mtu_ofproto(struct ofproto *p)
>>  HMAP_FOR_EACH (ofport, hmap_node, &p->ports) {
>>  struct netdev *netdev = ofport->netdev;
>>  
>> -if (ofport_is_internal(ofport)) {
>> +if (ofport_is_internal(p, ofport)) {
>>  if (!netdev_set_mtu(netdev, p->min_mtu)) {
>>  ofport->mtu = p->min_mtu;
>>  }
>> -- 
>> 1.9.3
>> 
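
The type mismatch fixed by this patch can be shown in a few lines. This
is a sketch only: `toy_port_open_type()` is a hypothetical stand-in for
ofproto_port_open_type() that hard-codes the one mapping described in the
commit message ("internal" becomes "tap" in the userspace datapath, whose
type is assumed here to be "netdev").

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for ofproto_port_open_type(): translates a port
 * type into the netdev type actually opened by the datapath. */
static const char *
toy_port_open_type(const char *datapath_type, const char *port_type)
{
    if (!strcmp(datapath_type, "netdev") && !strcmp(port_type, "internal")) {
        return "tap";
    }
    return port_type;
}

/* Old check: compares the netdev type against "internal" directly. */
static bool
toy_is_internal_naive(const char *netdev_type)
{
    return !strcmp(netdev_type, "internal");
}

/* New check: translates "internal" for the datapath before comparing. */
static bool
toy_is_internal(const char *datapath_type, const char *netdev_type)
{
    return !strcmp(netdev_type,
                   toy_port_open_type(datapath_type, "internal"));
}
```

An internal port in the userspace datapath has netdev type "tap", so the
naive check misses it while the translated check matches. For a datapath
that keeps the "internal" type, both checks agree.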


Re: [ovs-dev] [PATCH v2 3/3] netdev-dpdk: vHost client mode and reconnect

2016-08-05 Thread Daniele Di Proietto
The patch mostly looks good to me, thanks.

I'm not 100% sure about the interface.  Can we make the flag interface
specific?

If I'm not mistaken we currently limit vhost-sock-dir to be under OVS
rundir.  With client mode this is not necessary anymore.

I hope that client will be made the default mode at some point, I think we
should keep that in mind when considering the interface.

Since we're planning to break compatibility with the dpdk phy naming
change, maybe we can break compatibility also with vhost ports and add a
path option.

Thoughts?

Daniele
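
On the v2 changelog item quoted below ("Use strcmp instead of strncmp
when processing 'vhost-driver-mode'"): the patch does not state the
reason, but one difference between the two is that a length-limited
prefix comparison accepts longer strings. The helpers below are
hypothetical and only illustrate that difference.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Prefix comparison: true for any string that starts with "client". */
static bool
toy_is_client_prefix(const char *mode)
{
    return !strncmp(mode, "client", 6);
}

/* Exact comparison: true only for the string "client". */
static bool
toy_is_client_exact(const char *mode)
{
    return !strcmp(mode, "client");
}
```

A value such as "clientele" passes the prefix check but not the exact
one, so the exact comparison is the stricter way to validate a
configuration string.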

2016-08-04 7:09 GMT-07:00 Ciara Loftus :

> A new other_config DB option has been added called 'vhost-driver-mode'.
> By default this is set to 'server' which is the mode of operation OVS
> with DPDK has used up until this point - whereby OVS creates and manages
> vHost user sockets.
>
> If set to 'client', OVS will act as the vHost client and connect to
> sockets created and managed by QEMU which acts as the server. This mode
> allows for reconnect capability, which allows vHost ports to resume
> normal connectivity in event of switch reset.
>
> QEMU v2.7.0+ is required when using OVS in client mode and QEMU in
> server mode.
>
> Signed-off-by: Ciara Loftus 
> ---
> v2
> - Updated comments in vhost construct & destruct
> - Add check for server-mode before printing error when destruct is called
>   on a running VM
> - Fixed coding style/standards issues
> - Use strcmp instead of strncmp when processing 'vhost-driver-mode'
>
>  INSTALL.DPDK-ADVANCED.md | 27 +++
>  NEWS |  1 +
>  lib/netdev-dpdk.c| 31 +++
>  vswitchd/vswitch.xml | 13 +
>  4 files changed, 64 insertions(+), 8 deletions(-)
>
> diff --git a/INSTALL.DPDK-ADVANCED.md b/INSTALL.DPDK-ADVANCED.md
> index f9587b5..a773533 100755
> --- a/INSTALL.DPDK-ADVANCED.md
> +++ b/INSTALL.DPDK-ADVANCED.md
> @@ -483,6 +483,33 @@ For users wanting to do packet forwarding using
> kernel stack below are the steps
> where `-L`: Changes the numbers of channels of the specified
> network device
> and `combined`: Changes the number of multi-purpose channels.
>
> +4. Enable OVS vHost client-mode & vHost reconnect (OPTIONAL)
> +
> +   By default, OVS DPDK acts as the vHost socket server and QEMU the
> +   client. In QEMU v2.7 the option is available for QEMU to act as the
> +   server. In order for this to work, OVS DPDK must be switched to
> 'client'
> +   mode. This is possible by setting the 'vhost-driver-mode' DB entry
> to
> +   'client' like so:
> +
> +   ```
> +   ovs-vsctl set Open_vSwitch . other_config:vhost-driver-
> mode="client"
> +   ```
> +
> +   This must be done before the switch is launched. It cannot
> sucessfully
> +   be changed after switch has launched.
> +
> +   One must also append ',server' to the 'chardev' arguments on the
> QEMU
> +   command line, to instruct QEMU to use vHost server mode, like so:
> +
> +   
> +   -chardev socket,id=char0,path=/usr/local/var/run/openvswitch/
> vhost0,server
> +   
> +
> +   One benefit of using this mode is the ability for vHost ports to
> +   'reconnect' in event of the switch crashing or being brought down.
> Once
> +   it is brought back up, the vHost ports will reconnect
> automatically and
> +   normal service will resume.
> +
>- VM Configuration with libvirt
>
>  * change the user/group, access control policty and restart libvirtd.
> diff --git a/NEWS b/NEWS
> index 9f09e1c..99412ba 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -70,6 +70,7 @@ Post-v2.5.0
> fragmentation or NAT support yet)
>   * Support for DPDK 16.07
>   * Remove dpdkvhostcuse port type.
> + * OVS client mode for vHost and vHost reconnect (Requires QEMU 2.7)
> - Increase number of registers to 16.
> - ovs-benchmark: This utility has been removed due to lack of use and
>   bitrot.
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 7692cc8..39c448b 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -136,7 +136,8 @@ BUILD_ASSERT_DECL((MAX_NB_MBUF /
> ROUND_DOWN_POW2(MAX_NB_MBUF/MIN_NB_MBUF))
>  #define OVS_VHOST_QUEUE_DISABLED(-2) /* Queue was disabled by guest
> and not
>* yet mapped to another queue.
> */
>
> -static char *vhost_sock_dir = NULL;   /* Location of vhost-user sockets */
> +static char *vhost_sock_dir = NULL; /* Location of vhost-user sockets
> */
> +static uint64_t vhost_driver_flags = 0; /* Denote whether client/server
> mode */
>
>  #define VHOST_ENQ_RETRY_NUM 8
>  #define IF_NAME_SZ (PATH_MAX > IFNAMSIZ ? PATH_MAX : IFNAMSIZ)
> @@ -833,7 +834,6 @@ netdev_dpdk_vhost_user_construct(struct netdev
> *netdev)
>  struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>  const char *name = netdev->name;
>  int err;
> 

Re: [ovs-dev] [PATCH 1/3] system-userspace-macros: Check the exit code of ethtool.

2016-08-05 Thread Daniele Di Proietto





On 05/08/2016 11:16, "Joe Stringer" <j...@ovn.org> wrote:

>On 4 August 2016 at 18:40, Daniele Di Proietto <diproiet...@vmware.com> wrote:
>> If the ethtool command is not available on the system we should fail,
>> since the userspace testsuite cannot work properly without disabling
>> offloads.
>>
>> Also, add ethtool to the list of installed packages on Vagrantfile.
>>
>> Fixes: ddcf96d2dcc1 ("system-tests: Disable offloads in userspace tests.")
>> Reported-by: Joe Stringer <j...@ovn.org>
>> Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
>
>Thanks, this should make it more obvious when offloads are causing failures.
>
>The commit message doesn't really explain why the vagrantfile change
>is in the same commit; a simple mention that it's being added here 'to
>ensure that offloads don't cause test failures in the vagrant VM when
>the kernel is updated' would make it more clear why these two changes
>are in the same commit.

Ok

>
>Acked-by: Joe Stringer <j...@ovn.org>

Thanks! I applied this and the next patch to master


Re: [ovs-dev] [PATCH 3/3] check-kernel: Remove '-d' from TESTSUITEFLAGS.

2016-08-05 Thread Daniele Di Proietto





On 05/08/2016 10:18, "Andy Zhou" <az...@ovn.org> wrote:

>
>
>On Thu, Aug 4, 2016 at 6:43 PM, Daniele Di Proietto 
><diproiet...@vmware.com> wrote:
>
>The '-d' flag tells autotest to always keep the testcase output, but
>prevents '--recheck' from working.  If a user wants to always keep the
>output from the tests, the '-d' flag can be passed explicitly.  This is
>more in line with other test make target ('check',
>'check-system-userspace').
>
>CC: Andy Zhou <az...@ovn.org>
>Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
>---
> tests/automake.mk | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/tests/automake.mk b/tests/automake.mk
>index a9ebf91..5d12ae5 100644
>--- a/tests/automake.mk
>+++ b/tests/automake.mk
>@@ -243,7 +243,7 @@ EXTRA_DIST += tests/run-ryu
>
> # Run kmod tests. Assume kernel modules has been installed or linked into the 
> kernel
> check-kernel: all tests/atconfig tests/atlocal $(SYSTEM_KMOD_TESTSUITE)
>-   $(SHELL) '$(SYSTEM_KMOD_TESTSUITE)' -C tests  
>AUTOTEST_PATH='$(AUTOTEST_PATH)' -d $(TESTSUITEFLAGS) -j1
>+   $(SHELL) '$(SYSTEM_KMOD_TESTSUITE)' -C tests  
>AUTOTEST_PATH='$(AUTOTEST_PATH)' $(TESTSUITEFLAGS) -j1
>
> # Testing the out of tree Kernel module
> check-kmod: all tests/atconfig tests/atlocal $(SYSTEM_KMOD_TESTSUITE)
>
>
>
>
>LGTM
>Acked-by: Andy Zhou <az...@ovn.org>

Thanks, pushed to master


Re: [ovs-dev] [ovs-dev,V2] netdev-dpdk: fix memory leak

2016-08-05 Thread Daniele Di Proietto
Thanks for the report, I didn't realize that the callback could come in the
same thread.

I sent a patch that I believe should fix the deadlock here:

http://openvswitch.org/pipermail/dev/2016-August/077315.html
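
For readers following the quoted patch below, the "free when the
reference count reaches 0" logic of dpdk_mp_put() can be sketched with
plain malloc/free. This is an illustration under stated assumptions:
`toy_pool` and its functions are hypothetical, no DPDK call is used, and
the caller is assumed to hold whatever lock protects the pool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical reference-counted pool. */
struct toy_pool {
    int refcount;
    void *mem;
};

/* Creates a pool holding one reference, or returns NULL on failure. */
static struct toy_pool *
toy_pool_create(size_t size)
{
    struct toy_pool *p = malloc(sizeof *p);

    if (!p) {
        return NULL;
    }
    p->mem = malloc(size);
    if (!p->mem) {
        free(p);
        return NULL;
    }
    p->refcount = 1;
    return p;
}

static void
toy_pool_ref(struct toy_pool *p)
{
    p->refcount++;
}

/* Drops one reference.  Returns true if this was the last reference and
 * the pool was freed; 'p' must not be used afterwards in that case. */
static bool
toy_pool_put(struct toy_pool *p)
{
    if (!p) {
        return false;
    }
    assert(p->refcount > 0);
    if (--p->refcount == 0) {
        free(p->mem);
        free(p);
        return true;
    }
    return false;
}
```

With two references outstanding, the first put only decrements the count
and the second one releases the memory.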

2016-08-05 7:48 GMT-07:00 Ilya Maximets :

> On 04.08.2016 12:49, Mark Kavanagh wrote:
> > DPDK v16.07 introduces the ability to free memzones.
> > Up until this point, DPDK memory pools created in OVS could
> > not be destroyed, thus incurring a memory leak.
> >
> > Leverage the DPDK v16.07 rte_mempool API to free DPDK
> > mempools when their associated reference count reaches 0 (this
> > indicates that the memory pool is no longer in use).
> >
> > Signed-off-by: Mark Kavanagh 
> > ---
> >
> > v2->v1: rebase to head of master, and remove 'RFC' tag
> >
> >  lib/netdev-dpdk.c | 29 +++--
> >  1 file changed, 15 insertions(+), 14 deletions(-)
> >
> > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> > index aaac0d1..ffcd35c 100644
> > --- a/lib/netdev-dpdk.c
> > +++ b/lib/netdev-dpdk.c
> > @@ -506,7 +506,7 @@ dpdk_mp_get(int socket_id, int mtu)
> OVS_REQUIRES(dpdk_mutex)
> >  }
> >
> >  static void
> > -dpdk_mp_put(struct dpdk_mp *dmp)
> > +dpdk_mp_put(struct dpdk_mp *dmp) OVS_REQUIRES(dpdk_mutex)
> >  {
> >
> >  if (!dmp) {
> > @@ -514,15 +514,12 @@ dpdk_mp_put(struct dpdk_mp *dmp)
> >  }
> >
> >  dmp->refcount--;
> > -ovs_assert(dmp->refcount >= 0);
> >
> > -#if 0
> > -/* I could not find any API to destroy mp. */
> > -if (dmp->refcount == 0) {
> > -list_delete(dmp->list_node);
> > -/* destroy mp-pool. */
> > -}
> > -#endif
> > +if (OVS_UNLIKELY(!dmp->refcount)) {
> > +ovs_list_remove(&dmp->list_node);
> > +rte_mempool_free(dmp->mp);
> > + }
> > +
> >  }
> >
> >  static void
> > @@ -928,16 +925,18 @@ netdev_dpdk_destruct(struct netdev *netdev)
> >  {
> >  struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> >
> > +ovs_mutex_lock(&dpdk_mutex);
> >  ovs_mutex_lock(&dev->mutex);
> > +
> >  rte_eth_dev_stop(dev->port_id);
> >  free(ovsrcu_get_protected(struct ingress_policer *,
> >                            &dev->ingress_policer));
> > -ovs_mutex_unlock(&dev->mutex);
> >
> > -ovs_mutex_lock(&dpdk_mutex);
> >  rte_free(dev->tx_q);
> >  ovs_list_remove(&dev->list_node);
> >  dpdk_mp_put(dev->dpdk_mp);
> > +
> > +ovs_mutex_unlock(&dev->mutex);
> >  ovs_mutex_unlock(&dpdk_mutex);
> >  }
> >
> > @@ -946,6 +945,9 @@ netdev_dpdk_vhost_destruct(struct netdev *netdev)
> >  {
> >  struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> >
> > +ovs_mutex_lock(&dpdk_mutex);
> > +ovs_mutex_lock(&dev->mutex);
> > +
> >  /* Guest becomes an orphan if still attached. */
> >  if (netdev_dpdk_get_vid(dev) >= 0) {
> >  VLOG_ERR("Removing port '%s' while vhost device still
> attached.",
> > @@ -961,15 +963,14 @@ netdev_dpdk_vhost_destruct(struct netdev *netdev)
> >  fatal_signal_remove_file_to_unlink(dev->vhost_id);
> >  }
> >
> > -ovs_mutex_lock(&dev->mutex);
> >  free(ovsrcu_get_protected(struct ingress_policer *,
> >                            &dev->ingress_policer));
> > -ovs_mutex_unlock(&dev->mutex);
> >
> > -ovs_mutex_lock(&dpdk_mutex);
> >  rte_free(dev->tx_q);
> >  ovs_list_remove(&dev->list_node);
> >  dpdk_mp_put(dev->dpdk_mp);
> > +
> > +ovs_mutex_unlock(&dev->mutex);
> >  ovs_mutex_unlock(&dpdk_mutex);
> >  }
>
> I agree that locking here was wrong but this change introduces issue
> because
> 'rte_vhost_driver_unregister()' may call 'destroy_device()' and OVS will
> be aborted
> on attempt to lock 'dpdk_mutex' again:
>
> VHOST_CONFIG: free connfd = 37 for device '/vhost1'
> ovs-vswitchd: lib/netdev-dpdk.c:2305: pthread_mutex_lock failed (Resource
> deadlock avoided)
>
> Program received signal SIGABRT, Aborted.
> 0x007fb7ad6d38 in raise () from /lib64/libc.so.6
> (gdb) bt
> #0  0x007fb7ad6d38 in raise () from /lib64/libc.so.6
> #1  0x007fb7ad8aa8 in abort () from /lib64/libc.so.6
> #2  0x00692be0 in ovs_abort_valist at lib/util.c:335
> #3  0x00692ba0 in ovs_abort at lib/util.c:327
> #4  0x00651800 in ovs_mutex_lock_at (l_=0x899ab0 ,
> where=0x78a458 "lib/netdev-dpdk.c:2305") at lib/ovs-thread.c:76
> #5  0x006c0190 in destroy_device (vid=0) at lib/netdev-dpdk.c:2305
> #6  0x004ea850 in vhost_destroy_device ()
> #7  0x004ee578 in rte_vhost_driver_unregister ()
> #8  0x006bc8c8 in netdev_dpdk_vhost_destruct (netdev=0x7f6bffed00)
> at lib/netdev-dpdk.c:944
> #9  0x005e4ad4 in netdev_unref (dev=0x7f6bffed00) at
> lib/netdev.c:499
> #10 0x005e4b9c in netdev_close (netdev=0x7f6bffed00) at
> lib/netdev.c:523
> [...]
> #20 0x0053ad94 in main (argc=7, argv=0x7ff318) at
> vswitchd/ovs-vswitchd.c:112
>
> May be reproduced by removing port while virtio still attached.
> This blocks reconnection feature and deletion of port while QEMU still
> 

[ovs-dev] [PATCH] netdev-dpdk: Fix deadlock in destroy_device().

2016-08-05 Thread Daniele Di Proietto
netdev_dpdk_vhost_destruct() calls rte_vhost_driver_unregister(), which
can trigger the destroy_device() callback.  destroy_device() will try to
take two mutexes already held by netdev_dpdk_vhost_destruct(), causing a
deadlock.

This problem can be solved by dropping the mutexes before calling
rte_vhost_driver_unregister().  The netdev_dpdk_vhost_destruct() and
construct() calls are already serialized by netdev_mutex.

This commit also makes clear that dev->vhost_id is constant and can be
accessed without taking any mutexes in the lifetime of the devices.

Fixes: 8d38823bdf8b ("netdev-dpdk: fix memory leak")
Reported-by: Ilya Maximets <i.maxim...@samsung.com>
Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
---
 lib/netdev-dpdk.c | 34 --
 1 file changed, 24 insertions(+), 10 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index f37ec1c..98bff62 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -355,8 +355,10 @@ struct netdev_dpdk {
 /* True if vHost device is 'up' and has been reconfigured at least once */
 bool vhost_reconfigured;
 
-/* Identifier used to distinguish vhost devices from each other */
-char vhost_id[PATH_MAX];
+/* Identifier used to distinguish vhost devices from each other.  It does
+ * not change during the lifetime of a struct netdev_dpdk.  It can be read
+ * without holding any mutex. */
+const char vhost_id[PATH_MAX];
 
 /* In dpdk_list. */
 struct ovs_list list_node OVS_GUARDED_BY(dpdk_mutex);
@@ -846,7 +848,8 @@ netdev_dpdk_vhost_cuse_construct(struct netdev *netdev)
 }
 
 ovs_mutex_lock(&dpdk_mutex);
-strncpy(dev->vhost_id, netdev->name, sizeof(dev->vhost_id));
+strncpy(CONST_CAST(char *, dev->vhost_id), netdev->name,
+sizeof dev->vhost_id);
 err = vhost_construct_helper(netdev);
 ovs_mutex_unlock(&dpdk_mutex);
 return err;
@@ -878,7 +881,7 @@ netdev_dpdk_vhost_user_construct(struct netdev *netdev)
 /* Take the name of the vhost-user port and append it to the location where
  * the socket is to be created, then register the socket.
  */
-snprintf(dev->vhost_id, sizeof(dev->vhost_id), "%s/%s",
+snprintf(CONST_CAST(char *,dev->vhost_id), sizeof(dev->vhost_id), "%s/%s",
  vhost_sock_dir, name);
 
 err = rte_vhost_driver_register(dev->vhost_id, flags);
@@ -938,6 +941,17 @@ netdev_dpdk_destruct(struct netdev *netdev)
 ovs_mutex_unlock(&dpdk_mutex);
 }
 
+/* rte_vhost_driver_unregister() can call back destroy_device(), which will
+ * try to acquire 'dpdk_mutex' and possibly 'dev->mutex'.  To avoid a
+ * deadlock, none of the mutexes must be held while calling this function. */
+static int
+dpdk_vhost_driver_unregister(struct netdev_dpdk *dev)
+OVS_EXCLUDED(dpdk_mutex)
+OVS_EXCLUDED(dev->mutex)
+{
+return rte_vhost_driver_unregister(dev->vhost_id);
+}
+
 static void
 netdev_dpdk_vhost_destruct(struct netdev *netdev)
 {
@@ -955,12 +969,6 @@ netdev_dpdk_vhost_destruct(struct netdev *netdev)
  dev->vhost_id);
 }
 
-if (rte_vhost_driver_unregister(dev->vhost_id)) {
-VLOG_ERR("Unable to remove vhost-user socket %s", dev->vhost_id);
-} else {
-fatal_signal_remove_file_to_unlink(dev->vhost_id);
-}
-
 free(ovsrcu_get_protected(struct ingress_policer *,
                           &dev->ingress_policer));
 
@@ -970,6 +978,12 @@ netdev_dpdk_vhost_destruct(struct netdev *netdev)
 
 ovs_mutex_unlock(&dev->mutex);
 ovs_mutex_unlock(&dpdk_mutex);
+
+if (dpdk_vhost_driver_unregister(dev)) {
+VLOG_ERR("Unable to remove vhost-user socket %s", dev->vhost_id);
+} else {
+fatal_signal_remove_file_to_unlink(dev->vhost_id);
+}
 }
 
 static void
-- 
2.8.1
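
The locking problem described in this commit message can be reproduced outside OVS. The following is a minimal sketch, assuming an error-checking POSIX mutex (which would explain the "Resource deadlock avoided" message in the report). All names here (big_lock, on_destroy, unregister_driver, destruct_locked, destruct_unlocked) are hypothetical stand-ins, not OVS or DPDK functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t big_lock;   /* Stand-in for dpdk_mutex. */
static int callback_result;        /* Result of the callback's lock attempt. */

/* Creates 'big_lock' as an error-checking mutex, so that relocking from the
 * owning thread returns EDEADLK instead of hanging. */
static void init_lock(void)
{
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&big_lock, &attr);
    pthread_mutexattr_destroy(&attr);
}

/* Stand-in for destroy_device(): it takes the lock on its own. */
static void on_destroy(void)
{
    callback_result = pthread_mutex_lock(&big_lock);
    if (!callback_result) {
        pthread_mutex_unlock(&big_lock);
    }
}

/* Stand-in for rte_vhost_driver_unregister(): it may invoke the callback
 * synchronously, in the caller's thread. */
static void unregister_driver(void)
{
    on_destroy();
}

/* Problematic order: the callback runs while the lock is still held. */
static int destruct_locked(void)
{
    pthread_mutex_lock(&big_lock);
    unregister_driver();
    pthread_mutex_unlock(&big_lock);
    return callback_result;
}

/* Fixed order: finish the locked teardown, drop the lock, then unregister. */
static int destruct_unlocked(void)
{
    pthread_mutex_lock(&big_lock);
    /* ... teardown that needs the lock ... */
    pthread_mutex_unlock(&big_lock);
    unregister_driver();
    return callback_result;
}
```

After init_lock(), destruct_locked() returns EDEADLK because the callback tries to relock a mutex its own thread already owns, while destruct_unlocked() returns 0.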



Re: [ovs-dev] [ovs-discuss] [openvswitch 2.5.90] testsuite: 2224 failed

2016-08-05 Thread Daniele Di Proietto
I can reproduce this too

With -march=native, if the CPU has CRC32 extensions we use a different hash
function.  I suspect the DHCP options are output on the packet in a
different order because of this.  Perhaps we should make the test agnostic
of the order, or order the options on the DHCP packet.
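
As a rough illustration of the second option (ordering the options on the packet), the sketch below sorts options by code before serializing them, so the output no longer depends on the order in which they were collected. The struct layout and function names are hypothetical and are not taken from the OVN code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory form of one DHCP option. */
struct dhcp_opt {
    uint8_t code;
    uint8_t len;
    const uint8_t *data;
};

/* qsort() comparator: ascending option code. */
static int cmp_code(const void *a_, const void *b_)
{
    const struct dhcp_opt *a = a_;
    const struct dhcp_opt *b = b_;

    return (a->code > b->code) - (a->code < b->code);
}

/* Sorts 'opts' by code and writes them as code/len/value triples into 'buf'.
 * Returns the number of bytes written, or 0 if 'buf' is too small. */
static size_t put_opts_sorted(struct dhcp_opt *opts, size_t n,
                              uint8_t *buf, size_t size)
{
    size_t ofs = 0;

    qsort(opts, n, sizeof *opts, cmp_code);
    for (size_t i = 0; i < n; i++) {
        if (ofs + 2 + opts[i].len > size) {
            return 0;
        }
        buf[ofs++] = opts[i].code;
        buf[ofs++] = opts[i].len;
        memcpy(buf + ofs, opts[i].data, opts[i].len);
        ofs += opts[i].len;
    }
    return ofs;
}
```

With this approach two runs that collect the same options in a different order produce byte-identical output, which is what the test's fixed expected packet needs.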

Thanks,

Daniele

2016-08-05 11:41 GMT-07:00 Lance Richardson :

> Wow, that is a very strange finding.
>
> I also see it on Fedora 23 with gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6),
> 2/100 failures with default configuration, 100% failure rate with
> -march=native.
>
>Lance
>
> - Original Message -
> > From: "Ilya Maximets" 
> > To: "Numan Siddique" , "Ben Pfaff" ,
> b...@openvswitch.org
> > Cc: dev@openvswitch.org, "Ramu Ramamurthy" ,
> "Dyasly Sergey" 
> > Sent: Friday, August 5, 2016 9:21:33 AM
> > Subject: Re: [ovs-dev] [openvswitch 2.5.90] testsuite: 2224 failed
> >
> > Exactly same situation with gcc (GCC) 6.1.1 20160510 (Red Hat 6.1.1-2).
> >
> > On 05.08.2016 14:37, Ilya Maximets wrote:
> > > There is one interesting bug:
> > >
> > > Test 2224 (ovn -- dhcpv4 : 1 HV, 2 LS, 2 LSPs/LS) constantly fails
> > > with 'CFLAGS=-march=native'. All other tests works normally.
> > >
> > > Environment:
> > >
> > > * OVS current master:
> > >   commit d59831e9b08e ("bridge: No QoS configured is not an error")
> > > * Red Hat Enterprise Linux Server release 7.2 (Maipo)
> > > * Compiler: gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-4)
> > > * Intel(R) Xeon(R) CPU E5-2690 v3
> > >
> > > Test scenario:
> > >
> > > 1. Checkout current master branch.
> > >
> > > 2. Configure OVS with default configuration:
> > >
> > ># ./boot.sh && ./configure && make
> > >
> > > 3. Check test #2224
> > >
> > ># make check TESTSUITEFLAGS='2224'
> > >2224: ovn -- dhcpv4 : 1 HV, 2 LS, 2 LSPs/LS   ok
> > >
> > > 4. Clean up
> > >
> > ># make distclean
> > >
> > > 5. Configure OVS with '-march=native':
> > >
> > ># ./boot.sh && ./configure CFLAGS="-march=native" && make
> > >
> > > 6. Check test #2224
> > >
> > ># make check TESTSUITEFLAGS='2224'
> > >2224: ovn -- dhcpv4 : 1 HV, 2 LS, 2 LSPs/LS   FAILED
> > >(ovn.at:3205)
> > >
> > > Test failed because of bad packet:
> > >
> > > ./ovn.at:3205: cat 1.packets | cut -c 53-
> > > --- expout  2016-08-05 14:29:47.205360523 +0300
> > > +++ /ovs/tests/testsuite.dir/at-groups/2224/stdout   2016-08-05
> > > 14:29:47.215360172 +0300
> > > @@ -1 +1 @@
> > > -0a010a0400430044011c020106006359aa76
> 0a04
> > >  f001
> 
> > >  
> 
> > >  
> 
> > >  
> 
> > >  
> 638253633501020104ff
> > >  0003040a0136040a0133040e10ff
> > > +0a010a0400430044011c020106006359aa76
> 0a04
> > >  f001
> 
> > >  
> 
> > >  
> 
> > >  
> 
> > >  
> 6382536335010236040a
> > >  010104ff0003040a0133040e10ff
> > >
> > > Full log attached.
> > >
> > > Best regards, Ilya Maximets.
> > >


[ovs-dev] [PATCH 3/3] check-kernel: Remove '-d' from TESTSUITEFLAGS.

2016-08-04 Thread Daniele Di Proietto
The '-d' flag tells autotest to always keep the testcase output, but
prevents '--recheck' from working.  If a user wants to always keep the
output from the tests, the '-d' flag can be passed explicitly.  This is
more in line with other test make targets ('check',
'check-system-userspace').

CC: Andy Zhou <az...@ovn.org>
Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
---
 tests/automake.mk | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/automake.mk b/tests/automake.mk
index a9ebf91..5d12ae5 100644
--- a/tests/automake.mk
+++ b/tests/automake.mk
@@ -243,7 +243,7 @@ EXTRA_DIST += tests/run-ryu
 
 # Run kmod tests. Assume kernel modules has been installed or linked into the kernel
 check-kernel: all tests/atconfig tests/atlocal $(SYSTEM_KMOD_TESTSUITE)
-   $(SHELL) '$(SYSTEM_KMOD_TESTSUITE)' -C tests  AUTOTEST_PATH='$(AUTOTEST_PATH)' -d $(TESTSUITEFLAGS) -j1
+   $(SHELL) '$(SYSTEM_KMOD_TESTSUITE)' -C tests  AUTOTEST_PATH='$(AUTOTEST_PATH)' $(TESTSUITEFLAGS) -j1
 
 # Testing the out of tree Kernel module
 check-kmod: all tests/atconfig tests/atlocal $(SYSTEM_KMOD_TESTSUITE)
-- 
2.8.1



[ovs-dev] [PATCH 2/3] system-traffic: Flush conntrack after debug ping6.

2016-08-04 Thread Daniele Di Proietto
We want to discard any state created by the initial ping6 (used to wait
for an available IP address).  Otherwise some weird state can show up in
the connection tracking tables (such as ICMP connection from link-local
addresses).

Fixes: e5cf8cce2759 ("system-tests: Add ping through conntrack test.")
Reported-by: Joe Stringer <j...@ovn.org>
Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
---
 tests/system-traffic.at | 4 
 1 file changed, 4 insertions(+)

diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index 666a14d..5b0a1ce 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -675,6 +675,10 @@ AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
 
 OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::2])
 
+dnl The above ping creates state in the connection tracker.  We're not
+dnl interested in that state.
+AT_CHECK([ovs-appctl dpctl/flush-conntrack])
+
 dnl Pings from ns1->ns0 should fail.
 NS_CHECK_EXEC([at_ns1], [ping6 -q -c 3 -i 0.3 -w 2 fc00::1 | FORMAT_PING], 
[0], [dnl
 7 packets transmitted, 0 received, 100% packet loss, time 0ms
-- 
2.8.1



[ovs-dev] [PATCH 1/3] system-userspace-macros: Check the exit code of ethtool.

2016-08-04 Thread Daniele Di Proietto
If the ethtool command is not available on the system we should fail,
since the userspace testsuite cannot work properly without disabling
offloads.

Also, add ethtool to the list of installed packages on Vagrantfile.

Fixes: ddcf96d2dcc1 ("system-tests: Disable offloads in userspace tests.")
Reported-by: Joe Stringer <j...@ovn.org>
Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
---
 Vagrantfile  | 2 +-
 tests/system-userspace-macros.at | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/Vagrantfile b/Vagrantfile
index fb06b42..843d88c 100644
--- a/Vagrantfile
+++ b/Vagrantfile
@@ -11,7 +11,7 @@ dnf -y install autoconf automake openssl-devel libtool \
python-twisted-core python-zope-interface \
desktop-file-utils groff graphviz rpmdevtools nc \
wget python-six pyftpdlib checkpolicy selinux-policy-devel \
-   libcap-ng-devel kernel-devel-`uname -r`
+   libcap-ng-devel kernel-devel-`uname -r` ethtool
 echo "search extra update built-in" >/etc/depmod.d/search_path.conf
 cd /vagrant
 ./boot.sh
diff --git a/tests/system-userspace-macros.at b/tests/system-userspace-macros.at
index 213425f..7e10b6c 100644
--- a/tests/system-userspace-macros.at
+++ b/tests/system-userspace-macros.at
@@ -55,7 +55,7 @@ m4_define([OVS_TRAFFIC_VSWITCHD_STOP],
 # This is a workaround, and should be removed when offloads are properly
 # supported in netdev-linux.
 m4_define([CONFIGURE_VETH_OFFLOADS],
-[ethtool -K $1 tx off]
+[AT_CHECK([ethtool -K $1 tx off], [0], [ignore])]
 )
 
 # CHECK_CONNTRACK()
-- 
2.8.1



Re: [ovs-dev] [PATCH] system-traffic: Make ping6 vlan test more reliable.

2016-08-04 Thread Daniele Di Proietto
LGTM, thanks

Acked-by: 

2016-08-04 17:40 GMT-07:00 Joe Stringer :

> Previously we checked on the underlying interfaces rather than the vlan
> interfaces to verify whether IPv6 connectivity is available;
> occasionally this would fail on some systems. Wait on the VLAN IP
> instead.
>
> Signed-off-by: Joe Stringer 
> ---
>  tests/system-traffic.at | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
> index 666a14d44ff6..e9df90e27ac8 100644
> --- a/tests/system-traffic.at
> +++ b/tests/system-traffic.at
> @@ -113,7 +113,7 @@ ADD_VLAN(p1, at_ns1, 100, "fc00:1::2/96")
>  dnl Linux seems to take a little time to get its IPv6 stack in order.
> Without
>  dnl waiting, we get occasional failures due to the following error:
>  dnl "connect: Cannot assign requested address"
> -OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::2])
> +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00:1::2])
>
>  NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00:1::2 |
> FORMAT_PING], [0], [dnl
>  3 packets transmitted, 3 received, 0% packet loss, time 0ms
> --
> 2.9.2
>


Re: [ovs-dev] [PATCH] bridge: No QoS configured is not an error

2016-08-04 Thread Daniele Di Proietto
Applied to master, thanks

2016-08-02 8:27 GMT-07:00 Stokes, Ian :

> > If no QoS is configured, type value is likely to be an empty string.
> >
> > This is not an error though, so use the regular command reply function,
> > not the error one.
> >
> > For example, before this patch:
> >   # ovs-appctl -t ovs-vswitchd qos/show vhost-user1
> >   QoS not configured on vhost-user1
> >   ovs-appctl: ovs-vswitchd: server returned an error
> >
> > After the patch:
> >   # ovs-appctl -t ovs-vswitchd qos/show vhost-user1
> >   QoS not configured on vhost-user1
> >
> > Signed-off-by: Maxime Coquelin 
> > ---
> >  vswitchd/bridge.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c index
> > 07f7b55..ddf1fe5 100644
> > --- a/vswitchd/bridge.c
> > +++ b/vswitchd/bridge.c
> > @@ -3199,7 +3199,7 @@ qos_unixctl_show(struct unixctl_conn *conn, int
> > argc OVS_UNUSED,
> >  unixctl_command_reply(conn, ds_cstr(&ds));
> >  } else {
> >  ds_put_format(&ds, "QoS not configured on %s\n", iface->name);
> > -unixctl_command_reply_error(conn, ds_cstr(&ds));
> > +unixctl_command_reply(conn, ds_cstr(&ds));
> >  }
> >  } else {
> >  ds_put_format(&ds, "%s: failed to retrieve QOS configuration (%s)\n",
> > --
> > 2.7.4
> >
> Thanks for the patch Maxime, looks good to me.
>
> Acked-by: Ian Stokes 


Re: [ovs-dev] [PATCH v1 1/1] netdev-dpdk: Fix egress policer error detection bug.

2016-08-04 Thread Daniele Di Proietto
Thanks for the patch, comments inline

2016-08-02 9:37 GMT-07:00 Ian Stokes :

> When egress policer is set as a QoS type for a port, an error may occur
> during
> setup if incorrect parameters are used for the rte_meter. If this occurs
> the egress policer construct and set functions should free any allocated
> memory relevant to the policer and set the QoS configuration pointer to
> null. The netdev_dpdk_set_qos function should check the error value
> returned
> for any QoS construct/set calls with an assertion to avoid segfault.
>
> Signed-off-by: Ian Stokes 
> ---
>  lib/netdev-dpdk.c |   29 -
>  1 files changed, 28 insertions(+), 1 deletions(-)
>
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index c208f32..c382270 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -2679,12 +2679,19 @@ netdev_dpdk_set_qos(struct netdev *netdev,
>
>  /* Install new QoS configuration. */
>  error = new_ops->qos_construct(netdev, details);
> -ovs_assert((error == 0) == (dev->qos_conf != NULL));
>  }
>  } else {
>  error = new_ops->qos_construct(netdev, details);
> +}
> +
> +if (!error) {
>  ovs_assert((error == 0) == (dev->qos_conf != NULL));
>  }
> +else {
> +VLOG_ERR("Failed to set QoS type %s on port %s, returned error
> %d",
> +type, netdev->name, error);
> +ovs_assert(dev->qos_conf == NULL);
> +}
>

I think we can replace this with:

ovs_assert((error == 0) == (dev->qos_conf != NULL));
if (error) {
   VLOG(...)
}

type should be aligned with " on the above line

Can we use rte_strerror to print a textual representation?
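
To make the suggestion concrete, here is a self-contained sketch of the invariant plus a textual error message. It is only an approximation: the structs, qos_construct() and set_qos() are hypothetical, and the standard strerror() is used as a stand-in for rte_strerror() so the example builds without DPDK.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct qos_conf {
    unsigned long long cir;
    unsigned long long cbs;
};

struct port {
    const char *name;
    struct qos_conf *qos_conf;
};

/* Hypothetical construct step: returns 0 and installs a configuration, or
 * returns a positive errno value and leaves 'qos_conf' set to NULL. */
static int qos_construct(struct port *p, unsigned long long cir,
                         unsigned long long cbs)
{
    struct qos_conf *conf;

    p->qos_conf = NULL;
    if (!cir || !cbs) {
        return EINVAL;
    }
    conf = malloc(sizeof *conf);
    if (!conf) {
        return ENOMEM;
    }
    conf->cir = cir;
    conf->cbs = cbs;
    p->qos_conf = conf;
    return 0;
}

/* A single assertion covers both outcomes: success implies a configuration
 * exists, failure implies it does not.  The message is logged only when an
 * error was returned. */
static int set_qos(struct port *p, const char *type,
                   unsigned long long cir, unsigned long long cbs)
{
    int error = qos_construct(p, cir, cbs);

    assert((error == 0) == (p->qos_conf != NULL));
    if (error) {
        fprintf(stderr, "Failed to set QoS type %s on port %s: %s\n",
                type, p->name, strerror(error));
    }
    return error;
}
```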


>
>  ovs_mutex_unlock(&dev->mutex);
>  return error;
> @@ -2726,6 +2733,15 @@ egress_policer_qos_construct(struct netdev *netdev,
>  policer->app_srtcm_params.ebs = 0;
>  err = rte_meter_srtcm_config(&policer->egress_meter,
>  &policer->app_srtcm_params);
> +
> +if (err < 0) {
> +/* Error occurred during rte_meter creation, destroy the policer
> + * and set the qos configuration for the netdev dpdk to NULL
> + */
> +free(policer);
> +dev->qos_conf = NULL;
> +}
> +
>

Can we return a positive error number instead of a negative one? This is
more in line with the rest of OVS
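
A small sketch of the sign conversion being asked for, assuming the underlying library call reports failure as a negative errno value (as the 'err < 0' check in the patch implies). lib_config() and policer_config() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a library call that returns 0 on success or a
 * negative errno value on failure. */
static int lib_config(unsigned long long cir)
{
    return cir ? 0 : -EINVAL;
}

/* Wrapper that reports failures as positive errno values. */
static int policer_config(unsigned long long cir)
{
    int err = lib_config(cir);

    return err < 0 ? -err : err;
}
```

Callers can then pass the result straight to strerror()-style helpers without having to remember which sign convention applies.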

>  rte_spinlock_unlock(&dev->qos_lock);
>
>  return err;
> @@ -2756,11 +2772,13 @@ static int
>  egress_policer_qos_set(struct netdev *netdev, const struct smap *details)
>  {
>  struct egress_policer *policer;
> +struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>  const char *cir_s;
>  const char *cbs_s;
>  int err = 0;
>
>  policer = egress_policer_get__(netdev);
> +rte_spinlock_lock(&dev->qos_lock);
>  cir_s = smap_get(details, "cir");
>  cbs_s = smap_get(details, "cbs");
>  policer->app_srtcm_params.cir = cir_s ? strtoull(cir_s, NULL, 10) : 0;
> @@ -2769,6 +2787,15 @@ egress_policer_qos_set(struct netdev *netdev, const
> struct smap *details)
>  err = rte_meter_srtcm_config(&policer->egress_meter,
>  &policer->app_srtcm_params);
>
> +if (err < 0) {
> +/* Error occurred during rte_meter creation, destroy the policer
> + * and set the qos configuration for the netdev dpdk to NULL
> + */
> +free(policer);
> +dev->qos_conf = NULL;
> +}
> +rte_spinlock_unlock(&dev->qos_lock);
> +
>

Can we return a positive error number instead of a negative one? This is
more in line with the rest of OVS

I guess we forgot to lock the spinlock here on the original patch and this
commit fixes it. Can you document this in the commit message?

In the long term I'd like this to use RCU, as we wouldn't need so many
critical sections, but it's fine to avoid it for now



>  return err;
>  }
>
>

Thanks,

Daniele


Re: [ovs-dev] [PATCH] netdev-dpdk: When no QoS set, set type to empty string

2016-08-04 Thread Daniele Di Proietto
Thanks for the fix!

I added your name to AUTHORS and applied this to master
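
For readers skimming the thread, the pattern behind the fix quoted below can be sketched as follows: an out-parameter is assigned on every path, including the "nothing configured" one. The types and the port_get_qos() name are hypothetical simplifications, not the OVS definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct qos_conf {
    const char *qos_name;
};

struct port {
    struct qos_conf *qos_conf;   /* NULL when no QoS is configured. */
};

/* Always stores a valid string in '*typep', so callers can format it
 * without first checking whether QoS is configured. */
static int port_get_qos(const struct port *p, const char **typep)
{
    if (p->qos_conf) {
        *typep = p->qos_conf->qos_name;
    } else {
        /* No QoS configuration set, return an empty string. */
        *typep = "";
    }
    return 0;
}
```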

2016-08-02 9:52 GMT-07:00 Maxime Coquelin :

>
>
> On 08/02/2016 05:19 PM, Stokes, Ian wrote:
>
>> This patch sets *typep to an empty string instead of letting it
>>> uninitialized when no QoS configuration is set.
>>>
>>> It fixes the following vswitchd crash when no QoS has been set on vhost-
>>> user interface:
>>>
>>>  $> ovs-appctl -t ovs-vswitchd qos/show vhost-user1
>>>
>>>  #0  0x7efcbadf18d7 in raise () from /lib64/libc.so.6
>>>  #1  0x7efcbadf353a in abort () from /lib64/libc.so.6
>>>  #2  0x0068d5be in ovs_abort_valist at lib/util.c:335
>>>  #3  0x00693d90 in vlog_abort_valist at lib/vlog.c:1204
>>>  #4  0x00693e17 in vlog_abort at lib/vlog.c:1218
>>>  #5  0x0068d3ae in ovs_assert_failure at lib/util.c:72
>>>  #6  0x0060425c in ds_put_format_valist at lib/dynamic-
>>> string.c:168
>>>  #7  0x006042e7 in ds_put_format at lib/dynamic-string.c:142
>>>  #8  0x005a9e75 in qos_unixctl_show at vswitchd/bridge.c:3185
>>>  #9  0x0068cda1 in process_command at lib/unixctl.c:347
>>>  #11 unixctl_server_run at lib/unixctl.c:400
>>>  #12 0x0040a3ff in main at vswitchd/ovs-vswitchd.c:113
>>>
>>> Signed-off-by: Maxime Coquelin 
>>> ---
>>>  lib/netdev-dpdk.c | 3 +++
>>>  1 file changed, 3 insertions(+)
>>>
>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index
>>> a0d541a..159fe73 100644
>>> --- a/lib/netdev-dpdk.c
>>> +++ b/lib/netdev-dpdk.c
>>> @@ -2680,6 +2680,9 @@ netdev_dpdk_get_qos(const struct netdev *netdev,
>>>  *typep = dev->qos_conf->ops->qos_name;
>>>  error = (dev->qos_conf->ops->qos_get
>>>   ? dev->qos_conf->ops->qos_get(netdev, details): 0);
>>> +} else {
>>> +/* No QoS configuration set, return an empty string */
>>> +*typep = "";
>>>  }
>>>  ovs_mutex_unlock(&dev->mutex);
>>>
>>> --
>>> 2.7.4
>>>
>>
>> Thanks for the Patch Maxime.
>>
>> I tried to recreate the segfault with the steps you've outlined on my own
>> system without the patch but could not.
>>
>
> Maybe you were just lucky? Or actually, not lucky!
> Indeed, as *typep contains uninitialized value, maybe that in your case,
> its value was a valid address that pointed to 0?
>
>
>> I'm Running Fedora 22 with kernel 4.1.8-200 and gcc 5.3.1. Out of
>> interest what was your test environment?
>>
>
> Fedora 21, kernel 3.19.3-200 and gcc 4.9.2.
>
> Either way I agree that type should be set to "" when no QoS is not
>> configured.
>>
>> Acked-by: Ian Stokes 
>>
>
> Thanks!
> Maxime
>


Re: [ovs-dev] [PATCH] netdev-dpdk: Make libnuma dependencies optional

2016-08-04 Thread Daniele Di Proietto
LGTM, thanks

Pushed to master

2016-08-04 3:44 GMT-07:00 Ciara Loftus :

> Prior to this patch, OVS with DPDK required the libnuma packages to
> build. This patch removes this dependency, making it only a requirement
> when the CONFIG_RTE_LIBRTE_VHOST_NUMA option is detected as enabled in
> the DPDK build.
>
> Signed-off-by: Ciara Loftus 
> ---
>  .travis.yml |  1 -
>  INSTALL.DPDK-ADVANCED.md|  2 +-
>  INSTALL.DPDK.md |  2 +-
>  acinclude.m4| 14 --
>  rhel/openvswitch-fedora.spec.in |  2 --
>  5 files changed, 14 insertions(+), 7 deletions(-)
>
> diff --git a/.travis.yml b/.travis.yml
> index a46994d..4ae6a5b 100644
> --- a/.travis.yml
> +++ b/.travis.yml
> @@ -11,7 +11,6 @@ addons:
>  packages:
>- bc
>- gcc-multilib
> -  - libnuma-dev
>- libssl-dev
>- llvm-dev
>- libjemalloc1
> diff --git a/INSTALL.DPDK-ADVANCED.md b/INSTALL.DPDK-ADVANCED.md
> index c8d69ae..0ab43d4 100755
> --- a/INSTALL.DPDK-ADVANCED.md
> +++ b/INSTALL.DPDK-ADVANCED.md
> @@ -181,7 +181,7 @@ right PCIe slot.
>CONFIG_RTE_LIBRTE_VHOST_NUMA=y, vHost User ports automatically
>detect the NUMA socket of the QEMU vCPUs and will be serviced by a PMD
>from the same node provided a core on this node is enabled in the
> -  pmd-cpu-mask.
> +  pmd-cpu-mask. libnuma packages are required for this feature.
>
>  ### 3.7 Compiler Optimizations
>
> diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
> index 0dae2ab..253d022 100644
> --- a/INSTALL.DPDK.md
> +++ b/INSTALL.DPDK.md
> @@ -21,7 +21,7 @@ The DPDK support of Open vSwitch is considered
> 'experimental'.
>
>  ### Prerequisites
>
> -* Required: DPDK 16.07, libnuma
> +* Required: DPDK 16.07
>  * Hardware: [DPDK Supported NICs] when physical ports in use
>
>  ##  2. Building and Installation
> diff --git a/acinclude.m4 b/acinclude.m4
> index f02166d..bbbad23 100644
> --- a/acinclude.m4
> +++ b/acinclude.m4
> @@ -209,7 +209,17 @@ AC_DEFUN([OVS_CHECK_DPDK], [
>[AC_DEFINE([VHOST_CUSE], [1], [DPDK vhost-cuse support enabled,
> vhost-user disabled.])
> DPDK_EXTRA_LIB="-lfuse"])
>
> -AC_SEARCH_LIBS([get_mempolicy],[numa],[],[AC_MSG_ERROR([unable to
> find libnuma, install the dependency package])])
> +AC_COMPILE_IFELSE([
> +  AC_LANG_PROGRAM(
> +[
> +  #include <rte_config.h>
> +#if RTE_LIBRTE_VHOST_NUMA
> +#error
> +#endif
> +], [])
> +  ], [],
> +  [AC_SEARCH_LIBS([get_mempolicy],[numa],[],[AC_MSG_ERROR([unable to
> find libnuma, install the dependency package])])
> +   DPDK_EXTRA_LIB="-lnuma"])
>
>  # On some systems we have to add -ldl to link with dpdk
>  #
> @@ -221,7 +231,7 @@ AC_DEFUN([OVS_CHECK_DPDK], [
>  DPDKLIB_FOUND=false
>  save_LIBS=$LIBS
>  for extras in "" "-ldl"; do
> -LIBS="$DPDK_LIB $extras $save_LIBS $DPDK_EXTRA_LIB -lnuma"
> +LIBS="$DPDK_LIB $extras $save_LIBS $DPDK_EXTRA_LIB"
>  AC_LINK_IFELSE(
> [AC_LANG_PROGRAM([#include 
>   #include ],
> diff --git a/rhel/openvswitch-fedora.spec.in b/rhel/openvswitch-fedora.
> spec.in
> index 34c0f37..0657e81 100644
> --- a/rhel/openvswitch-fedora.spec.in
> +++ b/rhel/openvswitch-fedora.spec.in
> @@ -54,8 +54,6 @@ BuildRequires: libcap-ng libcap-ng-devel
>  %endif
>  %if %{with dpdk}
>  BuildRequires: dpdk-devel >= 2.2.0
> -BuildRequires: numactl-devel
> -Requires: numactl-libs
>  Provides: %{name}-dpdk = %{version}-%{release}
>  %endif
>
> --
> 2.4.3
>


Re: [ovs-dev] [PATCH V2] netdev-dpdk: fix memory leak

2016-08-04 Thread Daniele Di Proietto
I'm glad we can finally uncomment this code!

The patch looks good to me.

I made a few style changes and pushed this to master

Thanks
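
The reference-counting scheme the patch enables can be sketched in plain C as below. This is a simplified illustration only: struct pool and the pool_*() helpers are hypothetical, free() stands in for rte_mempool_free(), and the locking that the real code requires (dpdk_mutex) is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct pool {
    int refcount;
    void *mem;          /* Stand-in for the DPDK mempool. */
};

static int pools_freed; /* Counts destroyed pools, for demonstration. */

/* Creates a pool holding one reference. */
static struct pool *pool_create(size_t size)
{
    struct pool *p = malloc(sizeof *p);

    if (!p) {
        return NULL;
    }
    p->mem = malloc(size);
    if (!p->mem) {
        free(p);
        return NULL;
    }
    p->refcount = 1;
    return p;
}

/* Takes an additional reference. */
static struct pool *pool_ref(struct pool *p)
{
    if (p) {
        p->refcount++;
    }
    return p;
}

/* Drops one reference and releases the memory when the last one is gone. */
static void pool_put(struct pool *p)
{
    if (!p) {
        return;
    }
    assert(p->refcount > 0);
    if (--p->refcount == 0) {
        free(p->mem);
        free(p);
        pools_freed++;
    }
}
```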

2016-08-04 2:49 GMT-07:00 Mark Kavanagh :

> DPDK v16.07 introduces the ability to free memzones.
> Up until this point, DPDK memory pools created in OVS could
> not be destroyed, thus incurring a memory leak.
>
> Leverage the DPDK v16.07 rte_mempool API to free DPDK
> mempools when their associated reference count reaches 0 (this
> indicates that the memory pool is no longer in use).
>
> Signed-off-by: Mark Kavanagh 
> ---
>
> v2->v1: rebase to head of master, and remove 'RFC' tag
>
>  lib/netdev-dpdk.c | 29 +++--
>  1 file changed, 15 insertions(+), 14 deletions(-)
>
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index aaac0d1..ffcd35c 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -506,7 +506,7 @@ dpdk_mp_get(int socket_id, int mtu)
> OVS_REQUIRES(dpdk_mutex)
>  }
>
>  static void
> -dpdk_mp_put(struct dpdk_mp *dmp)
> +dpdk_mp_put(struct dpdk_mp *dmp) OVS_REQUIRES(dpdk_mutex)
>  {
>
>  if (!dmp) {
> @@ -514,15 +514,12 @@ dpdk_mp_put(struct dpdk_mp *dmp)
>  }
>
>  dmp->refcount--;
> -ovs_assert(dmp->refcount >= 0);
>
> -#if 0
> -/* I could not find any API to destroy mp. */
> -if (dmp->refcount == 0) {
> -list_delete(dmp->list_node);
> -/* destroy mp-pool. */
> -}
> -#endif
> +if (OVS_UNLIKELY(!dmp->refcount)) {
> +ovs_list_remove(&dmp->list_node);
> +rte_mempool_free(dmp->mp);
> + }
> +
>  }
>
>  static void
> @@ -928,16 +925,18 @@ netdev_dpdk_destruct(struct netdev *netdev)
>  {
>  struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>
> +ovs_mutex_lock(&dpdk_mutex);
>  ovs_mutex_lock(&dev->mutex);
> +
>  rte_eth_dev_stop(dev->port_id);
>  free(ovsrcu_get_protected(struct ingress_policer *,
>                            &dev->ingress_policer));
> -ovs_mutex_unlock(&dev->mutex);
>
> -ovs_mutex_lock(&dpdk_mutex);
>  rte_free(dev->tx_q);
>  ovs_list_remove(&dev->list_node);
>  dpdk_mp_put(dev->dpdk_mp);
> +
> +ovs_mutex_unlock(&dev->mutex);
>  ovs_mutex_unlock(&dpdk_mutex);
>  }
>
> @@ -946,6 +945,9 @@ netdev_dpdk_vhost_destruct(struct netdev *netdev)
>  {
>  struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>
> +ovs_mutex_lock(&dpdk_mutex);
> +ovs_mutex_lock(&dev->mutex);
> +
>  /* Guest becomes an orphan if still attached. */
>  if (netdev_dpdk_get_vid(dev) >= 0) {
>  VLOG_ERR("Removing port '%s' while vhost device still attached.",
> @@ -961,15 +963,14 @@ netdev_dpdk_vhost_destruct(struct netdev *netdev)
>  fatal_signal_remove_file_to_unlink(dev->vhost_id);
>  }
>
> -ovs_mutex_lock(&dev->mutex);
>  free(ovsrcu_get_protected(struct ingress_policer *,
>                            &dev->ingress_policer));
> -ovs_mutex_unlock(&dev->mutex);
>
> -ovs_mutex_lock(&dpdk_mutex);
>  rte_free(dev->tx_q);
>  ovs_list_remove(&dev->list_node);
>  dpdk_mp_put(dev->dpdk_mp);
> +
> +ovs_mutex_unlock(&dev->mutex);
>  ovs_mutex_unlock(&dpdk_mutex);
>  }
>
> --
> 1.9.3
>


Re: [ovs-dev] [PATCH] ovs-rcu: Add new ovsrcu_index type.

2016-08-03 Thread Daniele Di Proietto
Applied to master, thanks!
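
As a rough picture of what an RCU-style integer index involves, the sketch below wraps an int in a C11 atomic with release/acquire ordering. It is not the ovs-rcu implementation: index_t, index_get() and index_set() are hypothetical names, and the grace-period step (waiting for readers before destroying the object, as ovsrcu_synchronize() does in the quoted example) is not modelled here.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical wrapper type, so the value can only be accessed through the
 * helper functions below. */
typedef struct {
    atomic_int v;
} index_t;

/* Reader side: load the current index with acquire ordering. */
static int index_get(index_t *i)
{
    return atomic_load_explicit(&i->v, memory_order_acquire);
}

/* Writer side: publish a new index with release ordering. */
static void index_set(index_t *i, int value)
{
    atomic_store_explicit(&i->v, value, memory_order_release);
}
```

Wrapping the integer in a struct is one way to stop callers from reading or writing it directly, which is presumably also why a dedicated type was chosen in the patch.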



On 03/08/2016 15:00, "Jarno Rajahalme" <ja...@ovn.org> wrote:

>Looks good to me,
>
>Acked-by: Jarno Rajahalme <ja...@ovn.org>
>
>> On Aug 2, 2016, at 5:03 PM, Daniele Di Proietto <diproiet...@vmware.com> 
>> wrote:
>> 
>> With RCU in Open vSwitch it's very easy to protect objects accessed by
>> a pointer, but sometimes a pointer is not available.
>> 
>> One example is the vhost id for DPDK 16.07.  Until DPDK 16.04 a pointer
>> was used to access a vhost device with RCU semantics.  From DPDK 16.07
>> an integer id (which is an array index) is used to access a vhost
>> device.  Ideally, we want the exact same RCU semantics that we had for
>> the pointer, on the integer (atomicity, memory barriers, behaviour
>> around quiescent states)
>> 
>> This commit implements a new type in ovs-rcu: ovsrcu_index. The newly
>> implemented ovsrcu_index_*() functions should be used to access the
>> type.
>> 
>> Even though we say "Do not, in general, declare a typedef for a struct,
>> union, or enum.", I think we're not in the "general" case.
>> 
>> Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
>> ---
>> lib/ovs-rcu.h | 84 
>> +++
>> 1 file changed, 84 insertions(+)
>> 
>> diff --git a/lib/ovs-rcu.h b/lib/ovs-rcu.h
>> index dc75749..2887bb8 100644
>> --- a/lib/ovs-rcu.h
>> +++ b/lib/ovs-rcu.h
>> @@ -125,6 +125,36 @@
>>  * ovs_mutex_unlock(&mutex);
>>  * }
>>  *
>> + * In some rare cases an object may not be addressable with a pointer, but 
>> only
>> + * through an array index (e.g. because it's provided by another library).  
>> It
>> + * is still possible to have RCU semantics by using the ovsrcu_index type.
>> + *
>> + * static struct ovs_mutex mutex = OVS_MUTEX_INITIALIZER;
>> + *
>> + * ovsrcu_index port_id;
>> + *
>> + * void tx()
>> + * {
>> + * int id = ovsrcu_index_get(&port_id);
>> + * if (id == -1) {
>> + * return;
>> + * }
>> + * port_tx(id);
>> + * }
>> + *
>> + * void delete()
>> + * {
>> + * int id;
>> + *
>> + * ovs_mutex_lock(&mutex);
>> + * id = ovsrcu_index_get_protected(&port_id);
>> + * ovsrcu_index_set(&port_id, -1);
>> + * ovs_mutex_unlock(&mutex);
>> + *
>> + * ovsrcu_synchronize();
>> + * port_delete(id);
>> + * }
>> + *
>>  */
>> 
>> #include "compiler.h"
>> @@ -213,6 +243,60 @@ void ovsrcu_postpone__(void (*function)(void *aux), 
>> void *aux);
>>  (void) sizeof(*(ARG)), \
>>  ovsrcu_postpone__((void (*)(void *))(FUNCTION), ARG))
>> 
>> +/* An array index protected by RCU semantics.  This is an easier 
>> alternative to
>> + * an RCU protected pointer to a malloc'd int. */
>> +typedef struct { atomic_int v; } ovsrcu_index;
>> +
>> +static inline int ovsrcu_index_get__(const ovsrcu_index *i, memory_order 
>> order)
>> +{
>> +int ret;
>> +atomic_read_explicit(CONST_CAST(atomic_int *, &i->v), &ret, order);
>> +return ret;
>> +}
>> +
>> +/* Returns the index contained in 'i'.  The returned value can be used until
>> + * the next grace period. */
>> +static inline int ovsrcu_index_get(const ovsrcu_index *i)
>> +{
>> +return ovsrcu_index_get__(i, memory_order_consume);
>> +}
>> +
>> +/* Returns the index contained in 'i'.  This is an alternative to
>> + * ovsrcu_index_get() that can be used when there's no possible concurrent
>> + * writer. */
>> +static inline int ovsrcu_index_get_protected(const ovsrcu_index *i)
>> +{
>> +return ovsrcu_index_get__(i, memory_order_relaxed);
>> +}
>> +
>> +static inline void ovsrcu_index_set__(ovsrcu_index *i, int value,
>> +  memory_order order)
>> +{
>> +atomic_store_explicit(&i->v, value, order);
>> +}
>> +
>> +/* Writes the index 'value' in 'i'.  The previous value of 'i' may still be
>> + * used by readers until the next grace period. */
>> +static inline void ovsrcu_index_set(ovsrcu_index *i, int value)
>> +{
>> +ovsrcu_index_set__(i, value, memory_order_release);
>> +}
>> +
>> +/* Writes the index 'value' in 'i'.  This is an alternative to
>> + * ovsrcu_index_set() that can be used when there's no possible concurrent
>> + * reader. */
>> +static inline void ovsrcu_index_set_hidden(ovsrcu_index *i, int value)
>> +{
>> +ovsrcu_index_set__(i, value, memory_order_relaxed);
>> +}
>> +
>> +/* Initializes 'i' with 'value'.  This is safe to call as long as there are 
>> no
>> + * concurrent readers. */
>> +static inline void ovsrcu_index_init(ovsrcu_index *i, int value)
>> +{
>> +atomic_init(&i->v, value);
>> +}
>> +
>> /* Quiescent states. */
>> void ovsrcu_quiesce_start(void);
>> void ovsrcu_quiesce_end(void);
>> -- 
>> 2.8.1
>> 
>


Re: [ovs-dev] [PATCH] test-netlink-conntrack: Fix sparse warning.

2016-08-03 Thread Daniele Di Proietto





On 03/08/2016 10:54, "Joe Stringer" <j...@ovn.org> wrote:

>On 3 August 2016 at 10:50, Daniele Di Proietto <diproiet...@vmware.com> wrote:
>> On some systems I get a sparse warning when compiling
>> tests/test-netlink-conntrack.c
>>
>> /usr/include/x86_64-linux-gnu/sys/cdefs.h:307:10: warning: preprocessor
>> token __always_inline redefined
>> /usr/include/linux/stddef.h:4:9: this was the original definition
>>
>> The problem seems to be that Linux upstream commit
>> 283d75737837("uapi/linux/stddef.h: Provide __always_inline to userspace
>> headers") introduced __always_inline in stddef.h, but glibc headers
>> didn't like that until e0835a5354ab("Bug 20215: Always undefine
>> __always_inline before defining it.").
>>
>> This commit works around the issue by including a glibc header before a
>> kernel header.
>>
>> Fixes: 2c06d9a927c5("ovstest: Add test-netlink-conntrack command.")
>> Reported-by: Joe Stringer <j...@ovn.org>
>> Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
>
>Thanks.
>
>Acked-by: Joe Stringer <j...@ovn.org>

Thanks, applied to master and branch-2.5


[ovs-dev] [PATCH] test-netlink-conntrack: Fix sparse warning.

2016-08-03 Thread Daniele Di Proietto
On some systems I get a sparse warning when compiling
tests/test-netlink-conntrack.c

/usr/include/x86_64-linux-gnu/sys/cdefs.h:307:10: warning: preprocessor
token __always_inline redefined
/usr/include/linux/stddef.h:4:9: this was the original definition

The problem seems to be that Linux upstream commit
283d75737837("uapi/linux/stddef.h: Provide __always_inline to userspace
headers") introduced __always_inline in stddef.h, but glibc headers
didn't like that until e0835a5354ab("Bug 20215: Always undefine
__always_inline before defining it.").

This commit works around the issue by including a glibc header before a
kernel header.
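
As a rough illustration of the glibc-side fix referenced above (undefine the
macro before defining it), here is a minimal sketch.  The names
'my_always_inline' and add_one() are hypothetical stand-ins, not taken from
either header, and the attribute syntax assumes GCC or Clang.

```c
/* Stand-in for the definition that is seen first, e.g. from a kernel
 * uapi header.  'my_always_inline' is a hypothetical name. */
#define my_always_inline inline

/* Stand-in for the header that is seen second.  Undefining first means
 * the different definition below is not reported as a redefinition. */
#undef my_always_inline
#define my_always_inline static inline __attribute__((__always_inline__))

/* Hypothetical user of the macro, only here to show that the final
 * definition is usable. */
my_always_inline int add_one(int x)
{
    return x + 1;
}
```

Without the #undef, a preprocessor (or sparse) would flag the second #define
because its replacement text differs from the first.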

Fixes: 2c06d9a927c5("ovstest: Add test-netlink-conntrack command.")
Reported-by: Joe Stringer <j...@ovn.org>
Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
---
 tests/test-netlink-conntrack.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/test-netlink-conntrack.c b/tests/test-netlink-conntrack.c
index 62bef13..f0d48f7 100644
--- a/tests/test-netlink-conntrack.c
+++ b/tests/test-netlink-conntrack.c
@@ -16,6 +16,7 @@
 
 #include 
 
+#include 
 #include 
 
 #include "ct-dpif.h"
-- 
2.8.1



Re: [ovs-dev] [PATCH RFC v3 1/1] netdev-dpdk: Add support for DPDK 16.07

2016-08-02 Thread Daniele Di Proietto
Given that using vhost PMD doesn't seem viable in the very short term, I
think we should stick with the vhost lib.

I sent a patch for ovsrcu to add a new RCU protected array index.

http://openvswitch.org/pipermail/dev/2016-August/077097.html

Thanks,

Daniele

2016-07-28 6:26 GMT-07:00 Loftus, Ciara :

> >
> > Thanks for the patch.
> > I have another concern with this.  If we're still going to rely on RCU
> to protect
> > the vhost device (and as pointed out by Ilya, I think we should) we need
> to
> > use RCU-like semantics on the vid array index. I'm not sure a boolean
> flag is
> > going to be enough.
> > CCing Jarno:
> > We have this int, which is an index into an array of vhost devices (the
> array is
> > inside the DPDK library).  We want to make sure that when
> > ovsrcu_synchronize() returns nobody is using the old index anymore.
> > Should we introduce an RCU type for indexing into arrays?  I found some
> > negative opinions here:
> >
> > https://git.kernel.org/cgit/linux/kernel/git/next/linux-
> > next.git/tree/Documentation/RCU/arrayRCU.txt?id=refs/tags/next-
> > 20160722#n13
> > but I think using atomics should prevent the compiler from playing
> tricks with
> > the index.
> >
> > How about something like the code below?
> > Thanks,
> > Daniele
>
> I think the best way forward here is to avoid the RCU mechanisms by
> merging the vHost PMD first as you have previously suggested. What do you
> think?
> If we don't go with that, I think we need to make a decision ASAP on how
> to handle the RCU (ie. is below code needed?) as both DPDK and 2.6 releases
> are imminent.
>
> Thanks,
> Ciara
>
> >
> >
> > diff --git a/lib/ovs-rcu.h b/lib/ovs-rcu.h
> > index dc75749..d1a57f6 100644
> > --- a/lib/ovs-rcu.h
> > +++ b/lib/ovs-rcu.h
> > @@ -130,6 +130,41 @@
> >  #include "compiler.h"
> >  #include "ovs-atomic.h"
> >
> > +typedef struct { atomic_int v; } ovsrcu_int;
> > +
> > +static inline int ovsrcu_int_get__(const ovsrcu_int *i, memory_order
> order)
> > +{
> > +int ret;
> > +atomic_read_explicit(CONST_CAST(atomic_int *, &i->v), &ret, order);
> > +return ret;
> > +}
> > +
> > +static inline int ovsrcu_int_get(const ovsrcu_int *i)
> > +{
> > +return ovsrcu_int_get__(i, memory_order_consume);
> > +}
> > +
> > +static inline int ovsrcu_int_get_protected(const ovsrcu_int *i)
> > +{
> > +return ovsrcu_int_get__(i, memory_order_relaxed);
> > +}
> > +
> > +static inline void ovsrcu_int_set__(ovsrcu_int *i, int value,
> > +memory_order order)
> > +{
> > +atomic_store_explicit(&i->v, value, order);
> > +}
> > +
> > +static inline void ovsrcu_int_set(ovsrcu_int *i, int value)
> > +{
> > +ovsrcu_int_set__(i, value, memory_order_release);
> > +}
> > +
> > +static inline void ovsrcu_int_set_protected(ovsrcu_int *i, int value)
> > +{
> > +ovsrcu_int_set__(i, value, memory_order_relaxed);
> > +}
> > +
> >  #if __GNUC__
> >  #define OVSRCU_TYPE(TYPE) struct { ATOMIC(TYPE) p; }
> >  #define OVSRCU_INITIALIZER(VALUE) { ATOMIC_VAR_INIT(VALUE) }
> >
> >
> > 2016-07-22 8:55 GMT-07:00 Ciara Loftus :
> > This commit introduces support for DPDK 16.07 and consequently breaks
> > compatibility with DPDK 16.04.
> >
> > DPDK 16.07 introduces some changes to various APIs. These have been
> > updated in OVS, including:
> > * xstats API: changes to structure of xstats
> > * vhost API:  replace virtio-net references with 'vid'
> >
> > Signed-off-by: Ciara Loftus 
> > Tested-by: Maxime Coquelin 
> >
> > v3:
> > - fixed style issues
> > - fixed & simplified xstats frees
> > - use xcalloc & free instead of rte_mzalloc & rte_free for stats
> > - remove libnuma include
> > - fixed & simplified vHost NUMA set
> > - added flag to indicate device reconfigured at least once
> > - re-add call to rcu synchronise in destroy_device
> > - define IF_NAME_SZ and use instead of PATH_MAX
> >
> > v2:
> > - rebase with DPDK rc2
> > - rebase with OVS master
> > - fix vhost cuse compilation
> > ---
> >  .travis/linux-build.sh   |   2 +-
> >  INSTALL.DPDK-ADVANCED.md |   8 +-
> >  INSTALL.DPDK.md  |  20 ++---
> >  NEWS |   1 +
> >  lib/netdev-dpdk.c| 220
> +++---
> > -
> >  5 files changed, 126 insertions(+), 125 deletions(-)
> >
> > diff --git a/.travis/linux-build.sh b/.travis/linux-build.sh
> > index 065de39..1b3d43d 100755
> > --- a/.travis/linux-build.sh
> > +++ b/.travis/linux-build.sh
> > @@ -68,7 +68,7 @@ fi
> >
> >  if [ "$DPDK" ]; then
> >  if [ -z "$DPDK_VER" ]; then
> > -DPDK_VER="16.04"
> > +DPDK_VER="16.07"
> >  fi
> >  install_dpdk $DPDK_VER
> >  if [ "$CC" = "clang" ]; then
> > diff --git a/INSTALL.DPDK-ADVANCED.md b/INSTALL.DPDK-ADVANCED.md
> > index 9ae536d..ec1de29 100644
> > --- a/INSTALL.DPDK-ADVANCED.md
> > +++ b/INSTALL.DPDK-ADVANCED.md
> > @@ -43,7 +43,7 @@ for DPDK and 

Re: [ovs-dev] [PATCH RFC v3 1/1] netdev-dpdk: Add support for DPDK 16.07

2016-08-02 Thread Daniele Di Proietto
2016-07-24 11:06 GMT-07:00 Ben Pfaff <b...@ovn.org>:

> On Sun, Jul 24, 2016 at 10:17:13AM -0400, Aaron Conole wrote:
> > Daniele Di Proietto <diproiet...@ovn.org> writes:
> >
> > > Thanks for the patch.
> > >
> > > I have another concern with this.  If we're still going to rely on RCU
> to
> > > protect the vhost device (and as pointed out by Ilya, I think we
> should) we
> > > need to use RCU-like semantics on the vid array index. I'm not sure a
> > > boolean flag is going to be enough.
> > >
> > > CCing Jarno:
> > >
> > > We have this int, which is an index into an array of vhost devices (the
> > > array is inside the DPDK library).  We want to make sure that when
> > > ovsrcu_synchronize() returns nobody is using the old index anymore.
> > >
> > > Should we introduce an RCU type for indexing into arrays?  I found some
> > > negative opinions here:
> > >
> > >
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/tree/Documentation/RCU/arrayRCU.txt?id=refs/tags/next-20160722#n13
> > >
> > > but I think using atomics should prevent the compiler from playing
> tricks
> > > with the index.
> >
> > Is there a reason to prefer atomics over something like a reference
> > counted array pointer (as described in the linked text)?  Are you
> > afraid of the malloc/memcpy overhead in the worst case?  I think
> > many times side effects of atomics can be difficult to debug, because
> > of the subtleties of various chip synchronization protocols.  Maybe it's
> > okay if you are only going to support Intel chips, though.  Just my
> > $0.02
>
> We definitely don't support just Intel chips.  (We're not etcd!)
>
> Thanks,
>
> Ben.
>

I think by using atomics and making sure that we follow the C11 memory
model, it should work on every architecture supported by the compiler.

I agree that using atomics is not always straightforward, but we had a
pretty good experience so far with RCU.  The approach I'm proposing uses
atomics in the same way they're used for pointers.

It'd be the same as using an RCU pointer to a malloc'd int (the vid
index), but I'd prefer avoiding the extra indirection in the fast path and
the extra allocation in the slow path.
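
To make the idea concrete, here is a minimal sketch in plain C11 atomics
rather than the OVS atomic wrappers.  Everything named here (struct
rcu_index, device_tx_count, tx_one(), detach()) is hypothetical and only for
illustration.  The read side uses memory_order_acquire as a conservative
substitute for the memory_order_consume used in the proposal, and the grace
period wait is only marked with a comment.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for an RCU-protected array index. */
struct rcu_index { atomic_int v; };

static inline void rcu_index_init(struct rcu_index *i, int value)
{
    atomic_init(&i->v, value);
}

/* Reader side.  Acquire ordering is used here in place of consume. */
static inline int rcu_index_get(struct rcu_index *i)
{
    return atomic_load_explicit(&i->v, memory_order_acquire);
}

/* Writer side: publish the new index with release ordering. */
static inline void rcu_index_set(struct rcu_index *i, int value)
{
    atomic_store_explicit(&i->v, value, memory_order_release);
}

/* Hypothetical per-device state that lives in "another library" and is
 * addressed only by index. */
#define N_DEVICES 4
static int device_tx_count[N_DEVICES];
static struct rcu_index current_dev;

/* Fast path: a single atomic load, no pointer to chase.  Returns the
 * index used, or -1 if no device is attached. */
static int tx_one(void)
{
    int id = rcu_index_get(&current_dev);

    if (id < 0) {
        return -1;
    }
    device_tx_count[id]++;
    return id;
}

/* Slow path: hide the index from new readers and return the old value.
 * A real implementation would wait for a grace period (in OVS,
 * ovsrcu_synchronize()) before tearing down the old device. */
static int detach(void)
{
    int old = atomic_load_explicit(&current_dev.v, memory_order_relaxed);

    rcu_index_set(&current_dev, -1);
    /* Grace period wait would go here. */
    return old;
}
```

Compared with an RCU pointer to a malloc'd int, the reader does not
dereference anything after the load and the writer has nothing to allocate or
free.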

Thanks,

Daniele


[ovs-dev] [PATCH] ovs-rcu: Add new ovsrcu_index type.

2016-08-02 Thread Daniele Di Proietto
With RCU in Open vSwitch it's very easy to protect objects accessed by
a pointer, but sometimes a pointer is not available.

One example is the vhost id for DPDK 16.07.  Until DPDK 16.04 a pointer
was used to access a vhost device with RCU semantics.  From DPDK 16.07
an integer id (which is an array index) is used to access a vhost
device.  Ideally, we want the exact same RCU semantics that we had for
the pointer, on the integer (atomicity, memory barriers, behaviour
around quiescent states).

This commit implements a new type in ovs-rcu: ovsrcu_index. The newly
implemented ovsrcu_index_*() functions should be used to access the
type.

Even though we say "Do not, in general, declare a typedef for a struct,
union, or enum.", I think we're not in the "general" case.

Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
---
 lib/ovs-rcu.h | 84 +++
 1 file changed, 84 insertions(+)

diff --git a/lib/ovs-rcu.h b/lib/ovs-rcu.h
index dc75749..2887bb8 100644
--- a/lib/ovs-rcu.h
+++ b/lib/ovs-rcu.h
@@ -125,6 +125,36 @@
  * ovs_mutex_unlock(&mutex);
  * }
  *
+ * In some rare cases an object may not be addressable with a pointer, but only
+ * through an array index (e.g. because it's provided by another library).  It
+ * is still possible to have RCU semantics by using the ovsrcu_index type.
+ *
+ * static struct ovs_mutex mutex = OVS_MUTEX_INITIALIZER;
+ *
+ * ovsrcu_index port_id;
+ *
+ * void tx()
+ * {
+ * int id = ovsrcu_index_get(&port_id);
+ * if (id == -1) {
+ * return;
+ * }
+ * port_tx(id);
+ * }
+ *
+ * void delete()
+ * {
+ * int id;
+ *
+ * ovs_mutex_lock(&mutex);
+ * id = ovsrcu_index_get_protected(&port_id);
+ * ovsrcu_index_set(&port_id, -1);
+ * ovs_mutex_unlock(&mutex);
+ *
+ * ovsrcu_synchronize();
+ * port_delete(id);
+ * }
+ *
  */
 
 #include "compiler.h"
@@ -213,6 +243,60 @@ void ovsrcu_postpone__(void (*function)(void *aux), void 
*aux);
  (void) sizeof(*(ARG)), \
  ovsrcu_postpone__((void (*)(void *))(FUNCTION), ARG))
 
+/* An array index protected by RCU semantics.  This is an easier alternative to
+ * an RCU protected pointer to a malloc'd int. */
+typedef struct { atomic_int v; } ovsrcu_index;
+
+static inline int ovsrcu_index_get__(const ovsrcu_index *i, memory_order order)
+{
+int ret;
+atomic_read_explicit(CONST_CAST(atomic_int *, &i->v), &ret, order);
+return ret;
+}
+
+/* Returns the index contained in 'i'.  The returned value can be used until
+ * the next grace period. */
+static inline int ovsrcu_index_get(const ovsrcu_index *i)
+{
+return ovsrcu_index_get__(i, memory_order_consume);
+}
+
+/* Returns the index contained in 'i'.  This is an alternative to
+ * ovsrcu_index_get() that can be used when there's no possible concurrent
+ * writer. */
+static inline int ovsrcu_index_get_protected(const ovsrcu_index *i)
+{
+return ovsrcu_index_get__(i, memory_order_relaxed);
+}
+
+static inline void ovsrcu_index_set__(ovsrcu_index *i, int value,
+  memory_order order)
+{
+atomic_store_explicit(&i->v, value, order);
+}
+
+/* Writes the index 'value' in 'i'.  The previous value of 'i' may still be
+ * used by readers until the next grace period. */
+static inline void ovsrcu_index_set(ovsrcu_index *i, int value)
+{
+ovsrcu_index_set__(i, value, memory_order_release);
+}
+
+/* Writes the index 'value' in 'i'.  This is an alternative to
+ * ovsrcu_index_set() that can be used when there's no possible concurrent
+ * reader. */
+static inline void ovsrcu_index_set_hidden(ovsrcu_index *i, int value)
+{
+ovsrcu_index_set__(i, value, memory_order_relaxed);
+}
+
+/* Initializes 'i' with 'value'.  This is safe to call as long as there are no
+ * concurrent readers. */
+static inline void ovsrcu_index_init(ovsrcu_index *i, int value)
+{
+atomic_init(&i->v, value);
+}
+
 /* Quiescent states. */
 void ovsrcu_quiesce_start(void);
 void ovsrcu_quiesce_end(void);
-- 
2.8.1



Re: [ovs-dev] [PATCH] tests: Fix conntrack tests on windows.

2016-08-02 Thread Daniele Di Proietto

On 02/08/2016 13:19, "Joe Stringer" <j...@ovn.org> wrote:

>On 2 August 2016 at 11:58, Daniele Di Proietto <diproiet...@vmware.com> wrote:
>> The conntrack unit tests seem to generate different megaflow masks on
>> Windows.  The megaflow masks depend on the internal ordering of the
>> subtables, which are sorted using qsort(), based on their max priority.
>> If two subtables have the same priority, the ordering between them depends
>> on the stability properties of qsort(), which apparently differ
>> between Windows and Linux/*BSD.
>>
>> This commit uses multiple OpenFlow tables to build our conntrack
>> pipelines in the tests, which gives us more control over the visited
>> subtables and also improves clarity.
>>
>> Reported-by: Alin Serdean <aserd...@cloudbasesolutions.com>
>> Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
>
>Thanks for fixing this, LGTM.
>
>Minor comment, the flows for port 2 in table 0 in each of these tests
>don't really need a match on ct_state=-trk now that we have different
>tables for pre-conntrack and post-conntrack flows.

Good point, I removed all the superfluous ct_state=-trk from the tables.

>
>Acked-by: Joe Stringer <j...@ovn.org>

Thanks for the fast review, applied to master

>
>> ---
>>  tests/ofproto-dpif.at | 263 
>> +-
>>  1 file changed, 174 insertions(+), 89 deletions(-)
>>
>> diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at
>> index 5ce6439..b2373d3 100644
>> --- a/tests/ofproto-dpif.at
>> +++ b/tests/ofproto-dpif.at
>> @@ -8107,11 +8107,17 @@ AT_CHECK([ovs-appctl vlog/set dpif_netdev:dbg 
>> vconn:info ofproto_dpif:info])
>>
>>  dnl Allow new connections on p1->p2, but not on p2->p1.
>>  AT_DATA([flows.txt], [dnl
>> -priority=1,action=drop
>> -priority=10,arp,action=normal
>> -priority=100,in_port=1,udp,action=ct(commit,zone=0),controller
>> -priority=100,in_port=2,ct_state=-trk,udp,action=ct(table=0,zone=0)
>> -priority=100,in_port=2,ct_state=+trk+est-new,udp,action=controller
>> +dnl Table 0
>> +dnl
>> +table=0,priority=100,arp,action=normal
>> +table=0,priority=10,in_port=1,udp,action=ct(commit,zone=0),controller
>> +table=0,priority=10,in_port=2,ct_state=-trk,udp,action=ct(table=1,zone=0)
>> +table=0,priority=1,action=drop
>> +dnl
>> +dnl Table 1
>> +dnl
>> +table=1,priority=10,in_port=2,ct_state=+trk+est-new,udp,action=controller
>> +table=1,priority=1,action=drop
>>  ])
>>
>>  AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
>> @@ -8137,7 +8143,7 @@ AT_CHECK([cat ofctl_monitor.log], [0], [dnl
>>  NXT_PACKET_IN (xid=0x0): cookie=0x0 total_len=42 in_port=1 (via action) 
>> data_len=42 (unbuffered)
>>  
>> udp,vlan_tci=0x,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10.1.1.1,nw_dst=10.1.1.2,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=1,tp_dst=2
>>  udp_csum:e9d6
>>  dnl
>> -NXT_PACKET_IN (xid=0x0): cookie=0x0 total_len=42 
>> ct_state=est|rpl|trk,in_port=2 (via action) data_len=42 (unbuffered)
>> +NXT_PACKET_IN (xid=0x0): table_id=1 cookie=0x0 total_len=42 
>> ct_state=est|rpl|trk,in_port=2 (via action) data_len=42 (unbuffered)
>>  
>> udp,vlan_tci=0x,dl_src=50:54:00:00:00:0a,dl_dst=50:54:00:00:00:09,nw_src=10.1.1.2,nw_dst=10.1.1.1,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=2,tp_dst=1
>>  udp_csum:e9d6
>>  ])
>>
>> @@ -8160,7 +8166,7 @@ AT_CHECK([cat ofctl_monitor.log], [0], [dnl
>>  NXT_PACKET_IN (xid=0x0): cookie=0x0 total_len=42 in_port=1 (via action) 
>> data_len=42 (unbuffered)
>>  
>> udp,vlan_tci=0x,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10.1.1.1,nw_dst=10.1.1.2,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=3,tp_dst=4
>>  udp_csum:e9d2
>>  dnl
>> -NXT_PACKET_IN (xid=0x0): cookie=0x0 total_len=42 
>> ct_state=est|rpl|trk,in_port=2 (via action) data_len=42 (unbuffered)
>> +NXT_PACKET_IN (xid=0x0): table_id=1 cookie=0x0 total_len=42 
>> ct_state=est|rpl|trk,in_port=2 (via action) data_len=42 (unbuffered)
>>  
>> udp,vlan_tci=0x,dl_src=50:54:00:00:00:0a,dl_dst=50:54:00:00:00:09,nw_src=10.1.1.2,nw_dst=10.1.1.1,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=4,tp_dst=3
>>  udp_csum:e9d2
>>  ])
>>
>> @@ -8176,11 +8182,16 @@ AT_CHECK([ovs-appctl vlog/set dpif_netdev:dbg 
>> vconn:info ofproto_dpif:info])
>>
>>  dnl Allow new connections on p1->p2, but not on p2->p1.
>>  AT_DATA([flows.txt], [dnl
>> -priority=1,action=drop
>> -priority=10,arp,action=normal
>> -priority=100,in_port=1,udp6,action=ct(com

Re: [ovs-dev] FAILING UNIT ofproto-dpif.at:8215: testing ofproto-dpif - conntrack - output action

2016-08-02 Thread Daniele Di Proietto
Hi Alin,

Thanks for the report!

I couldn't reproduce this on Linux or BSD, but Guru (thanks!) gave me a
Windows setup, which showed the problem.

I sent a patch to fix this and the other issue you reported here:

http://openvswitch.org/pipermail/dev/2016-August/077062.html

Thanks,

Daniele

2016-08-01 9:56 GMT-07:00 Alin Serdean :

> Error log:
>
> 1153. ofproto-dpif.at:8215: testing ofproto-dpif - conntrack - output
> action ...
> ./ofproto-dpif.at:8216: ovsdb-tool create conf.db
> $abs_top_srcdir/vswitchd/vswitch.ovsschema
> ./ofproto-dpif.at:8216: ovsdb-server --detach --no-chdir --pidfile
> --log-file --remote=punix:$OVS_RUNDIR/db.sock
> stderr:
> ./ofproto-dpif.at:8216: sed < stderr '
> /vlog|INFO|opened log file/d
> /ovsdb_server|INFO|ovsdb-server (Open vSwitch)/d'
> ./ofproto-dpif.at:8216: ovs-vsctl --no-wait init
> ./ofproto-dpif.at:8216: ovs-vswitchd --enable-dummy --disable-system
> --detach --no-chdir --pidfile --log-file -vvconn -vofproto_dpif -vunixctl
> stderr:
> ./ofproto-dpif.at:8216: sed < stderr '
> /ovs_numa|INFO|Discovered /d
> /vlog|INFO|opened log file/d
> /vswitchd|INFO|ovs-vswitchd (Open vSwitch)/d
> /reconnect|INFO|/d
> /ofproto|INFO|using datapath ID/d
> /netdev_linux|INFO|.*device has unknown hardware address family/d
> /ofproto|INFO|datapath ID changed to fedcba9876543210/d
> /dpdk|INFO|DPDK Disabled - to change this requires a restart./d'
> ./ofproto-dpif.at:8216: add_of_br 0
> ovs-vsctl -- add-port br0 p1 -- set Interface p1 type=dummy
> ofport_request=1 -- add-port br0 p2 -- set Interface p2 type=dummy
> ofport_request=2
> ./ofproto-dpif.at:8220: ovs-appctl vlog/set dpif_netdev:dbg vconn:info
> ofproto_dpif:info
> ./ofproto-dpif.at:8231: ovs-ofctl add-flows br0 flows.txt
> ./ofproto-dpif.at:8234: ovs-appctl netdev-dummy/receive p2
> 'in_port(2),eth(src=50:54:00:00:00:0a,dst=50:54:00:00:00:09),eth_type(0x0800),ipv4(src=10.1.1.2,
> dst=10.1.1.1,proto=17,tos=0,ttl=64,frag=no),udp(src=2,dst=1)'
> ./ofproto-dpif.at:8237: ovs-appctl netdev-dummy/receive p1
> 'in_port(1),eth(src=50:54:00:00:00:09,dst=50:54:00:00:00:0a),eth_type(0x0800),ipv4(src=10.1.1.1,
> dst=10.1.1.2,proto=17,tos=0,ttl=64,frag=no),udp(src=1,dst=2)'
> ./ofproto-dpif.at:8240: ovs-appctl netdev-dummy/receive p2
> 'in_port(2),eth(src=50:54:00:00:00:0a,dst=50:54:00:00:00:09),eth_type(0x0800),ipv4(src=10.1.1.2,
> dst=10.1.1.1,proto=17,tos=0,ttl=64,frag=no),udp(src=2,dst=1)'
> ./ofproto-dpif.at:8243: ovs-appctl netdev-dummy/receive p1
> 'in_port(1),eth(src=50:54:00:00:00:09,dst=50:54:00:00:00:0a),eth_type(0x0800),ipv4(src=10.1.1.1,
> dst=10.1.1.2,proto=17,tos=0,ttl=64,frag=no),udp(src=1,dst=2)'
> ./ofproto-dpif.at:8246: ovs-appctl netdev-dummy/receive p2
> 'in_port(2),eth(src=50:54:00:00:00:0a,dst=50:54:00:00:00:09),eth_type(0x0800),ipv4(src=10.1.1.2,
> dst=10.1.1.1,proto=17,tos=0,ttl=64,frag=no),udp(src=2,dst=1)'
> ./ofproto-dpif.at:8249: cat ovs-vswitchd.log | strip_ufid |
> filter_flow_install
> --- -   2016-08-01 19:53:12 +0300
> +++ /c/_2016/august/01/ovs/tests/testsuite.dir/at-groups/1153/stdout
> 2016-08-01 19:53:12 +0300
> @@ -1,5 +1,5 @@
>  
> ct_state(+new-est+trk),recirc_id(0x1),in_port(2),eth_type(0x0800),ipv4(frag=no),
> actions:drop
>  
> ct_state(-new+est+trk),recirc_id(0x1),in_port(2),eth_type(0x0800),ipv4(proto=17,frag=no),
> actions:1
> -ct_state(-trk),recirc_id(0),in_port(2),eth_type(0x0800),ipv4(proto=17,frag=no),
> actions:ct,recirc(0x1)
> +ct_state(-new-est-trk),recirc_id(0),in_port(2),eth_type(0x0800),ipv4(proto=17,frag=no),
> actions:ct,recirc(0x1)
>  recirc_id(0),in_port(1),eth_type(0x0800),ipv4(proto=17,frag=no),
> actions:ct(commit),2


[ovs-dev] [PATCH] tests: Fix conntrack tests on windows.

2016-08-02 Thread Daniele Di Proietto
The conntrack unit tests seem to generate different megaflow masks on
Windows.  The megaflow masks depend on the internal ordering of the
subtables, which are sorted using qsort(), based on their max priority.
If two subtables have the same priority, the ordering between them depends
on the stability properties of qsort(), which apparently differ
between Windows and Linux/*BSD.

This commit uses multiple OpenFlow tables to build our conntrack
pipelines in the tests, which gives us more control over the visited
subtables and also improves clarity.
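
For readers unfamiliar with the qsort() issue, the sketch below shows why
equal-priority elements can come out in a platform-dependent order, and how
a tie-breaker makes the result deterministic.  This is only an illustration
of the general problem: the patch itself does not change the comparator, it
restructures the test flows instead.  'struct subtable' here is a simplified,
hypothetical stand-in with a made-up 'id' field.

```c
#include <stdlib.h>

/* Simplified, hypothetical stand-in for a classifier subtable. */
struct subtable {
    int max_priority;
    int id;             /* Made-up unique identifier. */
};

/* Orders by descending priority only.  The C standard leaves the relative
 * order of elements that compare equal unspecified, so two libc
 * implementations may legitimately return different permutations. */
static int cmp_priority(const void *a_, const void *b_)
{
    const struct subtable *a = a_;
    const struct subtable *b = b_;

    return (a->max_priority < b->max_priority)
           - (a->max_priority > b->max_priority);
}

/* Same ordering, but ties are broken by ascending 'id', so the result no
 * longer depends on how a given qsort() treats equal elements. */
static int cmp_priority_then_id(const void *a_, const void *b_)
{
    const struct subtable *a = a_;
    const struct subtable *b = b_;
    int c = cmp_priority(a_, b_);

    if (c) {
        return c;
    }
    return (a->id > b->id) - (a->id < b->id);
}

static void sort_subtables(struct subtable *tables, size_t n)
{
    qsort(tables, n, sizeof *tables, cmp_priority_then_id);
}
```

With cmp_priority() alone, the two priority-100 entries in an input such as
{100,3}, {1,0}, {100,1}, {10,2} may appear in either order.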

Reported-by: Alin Serdean <aserd...@cloudbasesolutions.com>
Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
---
 tests/ofproto-dpif.at | 263 +-
 1 file changed, 174 insertions(+), 89 deletions(-)

diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at
index 5ce6439..b2373d3 100644
--- a/tests/ofproto-dpif.at
+++ b/tests/ofproto-dpif.at
@@ -8107,11 +8107,17 @@ AT_CHECK([ovs-appctl vlog/set dpif_netdev:dbg 
vconn:info ofproto_dpif:info])
 
 dnl Allow new connections on p1->p2, but not on p2->p1.
 AT_DATA([flows.txt], [dnl
-priority=1,action=drop
-priority=10,arp,action=normal
-priority=100,in_port=1,udp,action=ct(commit,zone=0),controller
-priority=100,in_port=2,ct_state=-trk,udp,action=ct(table=0,zone=0)
-priority=100,in_port=2,ct_state=+trk+est-new,udp,action=controller
+dnl Table 0
+dnl
+table=0,priority=100,arp,action=normal
+table=0,priority=10,in_port=1,udp,action=ct(commit,zone=0),controller
+table=0,priority=10,in_port=2,ct_state=-trk,udp,action=ct(table=1,zone=0)
+table=0,priority=1,action=drop
+dnl
+dnl Table 1
+dnl
+table=1,priority=10,in_port=2,ct_state=+trk+est-new,udp,action=controller
+table=1,priority=1,action=drop
 ])
 
 AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
@@ -8137,7 +8143,7 @@ AT_CHECK([cat ofctl_monitor.log], [0], [dnl
 NXT_PACKET_IN (xid=0x0): cookie=0x0 total_len=42 in_port=1 (via action) 
data_len=42 (unbuffered)
 
udp,vlan_tci=0x,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10.1.1.1,nw_dst=10.1.1.2,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=1,tp_dst=2
 udp_csum:e9d6
 dnl
-NXT_PACKET_IN (xid=0x0): cookie=0x0 total_len=42 
ct_state=est|rpl|trk,in_port=2 (via action) data_len=42 (unbuffered)
+NXT_PACKET_IN (xid=0x0): table_id=1 cookie=0x0 total_len=42 
ct_state=est|rpl|trk,in_port=2 (via action) data_len=42 (unbuffered)
 
udp,vlan_tci=0x,dl_src=50:54:00:00:00:0a,dl_dst=50:54:00:00:00:09,nw_src=10.1.1.2,nw_dst=10.1.1.1,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=2,tp_dst=1
 udp_csum:e9d6
 ])
 
@@ -8160,7 +8166,7 @@ AT_CHECK([cat ofctl_monitor.log], [0], [dnl
 NXT_PACKET_IN (xid=0x0): cookie=0x0 total_len=42 in_port=1 (via action) 
data_len=42 (unbuffered)
 
udp,vlan_tci=0x,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10.1.1.1,nw_dst=10.1.1.2,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=3,tp_dst=4
 udp_csum:e9d2
 dnl
-NXT_PACKET_IN (xid=0x0): cookie=0x0 total_len=42 
ct_state=est|rpl|trk,in_port=2 (via action) data_len=42 (unbuffered)
+NXT_PACKET_IN (xid=0x0): table_id=1 cookie=0x0 total_len=42 
ct_state=est|rpl|trk,in_port=2 (via action) data_len=42 (unbuffered)
 
udp,vlan_tci=0x,dl_src=50:54:00:00:00:0a,dl_dst=50:54:00:00:00:09,nw_src=10.1.1.2,nw_dst=10.1.1.1,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=4,tp_dst=3
 udp_csum:e9d2
 ])
 
@@ -8176,11 +8182,16 @@ AT_CHECK([ovs-appctl vlog/set dpif_netdev:dbg 
vconn:info ofproto_dpif:info])
 
 dnl Allow new connections on p1->p2, but not on p2->p1.
 AT_DATA([flows.txt], [dnl
-priority=1,action=drop
-priority=10,arp,action=normal
-priority=100,in_port=1,udp6,action=ct(commit,zone=0),controller
-priority=100,in_port=2,ct_state=-trk,udp6,action=ct(table=0,zone=0)
-priority=100,in_port=2,ct_state=+trk+est-new,udp6,action=controller
+dnl Table 0
+dnl
+table=0,priority=100,arp,action=normal
+table=0,priority=10,in_port=1,udp6,action=ct(commit,zone=0),controller
+table=0,priority=10,in_port=2,ct_state=-trk,udp6,action=ct(table=1,zone=0)
+table=0,priority=1,action=drop
+dnl Table 1
+dnl
+table=1,priority=10,in_port=2,ct_state=+trk+est-new,udp6,action=controller
+table=1,priority=1,action=drop
 ])
 
 AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
@@ -8205,7 +8216,7 @@ dnl happens because the ct_state field is available only 
after recirc.
 AT_CHECK([cat ofctl_monitor.log], [0], [dnl
 NXT_PACKET_IN (xid=0x0): cookie=0x0 total_len=62 in_port=1 (via action) 
data_len=62 (unbuffered)
 
udp6,vlan_tci=0x,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,ipv6_src=2001:db8::1,ipv6_dst=2001:db8::2,ipv6_label=0x0,nw_tos=112,nw_ecn=0,nw_ttl=128,tp_src=1,tp_dst=2
 udp_csum:a466
-NXT_PACKET_IN (xid=0x0): cookie=0x0 total_len=62 
ct_state=est|rpl|trk,in_port=2 (via action) data_len=62 (unbuffered)
+NXT_PACKET_IN (xid=0x0): table_id=1 cookie=0x0 total_len=62 
ct_state=est|rpl|trk,in_port=2 (via action) data_len=62 (unbuffered)
 
udp6,vlan_tci=0x,dl_src=50:54:00:00:00:0a,dl_dst=50:54:00:00:00:09,ipv6_src=2001:db8

[ovs-dev] [PATCH 2/7] vswitchd: Introduce 'mtu_request' column in Interface.

2016-07-29 Thread Daniele Di Proietto
The 'mtu_request' column can be used to set the MTU of a specific
interface.

This column is useful because it will allow changing the MTU of DPDK
devices (implemented in a future commit), which are not accessible
outside the ovs-vswitchd process, but it can be used for kernel
interfaces as well.

The current implementation of set_mtu() in netdev-dpdk is removed
because it's broken.  It will be reintroduced by a subsequent commit in
this series.

Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
---
 NEWS   |  2 ++
 lib/netdev-dpdk.c  | 53 +-
 vswitchd/bridge.c  |  9 
 vswitchd/vswitch.ovsschema | 10 +++--
 vswitchd/vswitch.xml   | 52 +
 5 files changed, 58 insertions(+), 68 deletions(-)

diff --git a/NEWS b/NEWS
index 8221505..0ff5616 100644
--- a/NEWS
+++ b/NEWS
@@ -100,6 +100,8 @@ Post-v2.5.0
- ovs-pki: Changed message digest algorithm from SHA-1 to SHA-512 because
  SHA-1 is no longer secure and some operating systems have started to
  disable it in OpenSSL.
+   - Add 'mtu_request' column to the Interface table. It can be used to
+ configure the MTU of non-internal ports.
 
 
 v2.5.0 - 26 Feb 2016
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index a0d541a..0b6e410 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -1635,57 +1635,6 @@ netdev_dpdk_get_mtu(const struct netdev *netdev, int *mtup)
 }
 
 static int
-netdev_dpdk_set_mtu(const struct netdev *netdev, int mtu)
-{
-struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
-int old_mtu, err, dpdk_mtu;
-struct dpdk_mp *old_mp;
-struct dpdk_mp *mp;
-uint32_t buf_size;
-
-ovs_mutex_lock(&dpdk_mutex);
-ovs_mutex_lock(&dev->mutex);
-if (dev->mtu == mtu) {
-err = 0;
-goto out;
-}
-
-buf_size = dpdk_buf_size(mtu);
-dpdk_mtu = FRAME_LEN_TO_MTU(buf_size);
-
-mp = dpdk_mp_get(dev->socket_id, dpdk_mtu);
-if (!mp) {
-err = ENOMEM;
-goto out;
-}
-
-rte_eth_dev_stop(dev->port_id);
-
-old_mtu = dev->mtu;
-old_mp = dev->dpdk_mp;
-dev->dpdk_mp = mp;
-dev->mtu = mtu;
-dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
-
-err = dpdk_eth_dev_init(dev);
-if (err) {
-dpdk_mp_put(mp);
-dev->mtu = old_mtu;
-dev->dpdk_mp = old_mp;
-dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
-dpdk_eth_dev_init(dev);
-goto out;
-}
-
-dpdk_mp_put(old_mp);
-netdev_change_seq_changed(netdev);
-out:
-ovs_mutex_unlock(&dev->mutex);
-ovs_mutex_unlock(&dpdk_mutex);
-return err;
-}
-
-static int
 netdev_dpdk_get_carrier(const struct netdev *netdev, bool *carrier);
 
 static int
@@ -2949,7 +2898,7 @@ netdev_dpdk_vhost_cuse_reconfigure(struct netdev *netdev)
 netdev_dpdk_set_etheraddr,\
 netdev_dpdk_get_etheraddr,\
 netdev_dpdk_get_mtu,  \
-netdev_dpdk_set_mtu,  \
+NULL,   /* set_mtu */ \
 netdev_dpdk_get_ifindex,  \
 GET_CARRIER,  \
 netdev_dpdk_get_carrier_resets,   \
diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
index 07f7b55..b947a7c 100644
--- a/vswitchd/bridge.c
+++ b/vswitchd/bridge.c
@@ -775,6 +775,15 @@ bridge_delete_or_reconfigure_ports(struct bridge *br)
 goto delete;
 }
 
+if (iface->cfg->n_mtu_request == 1
+&& strcmp(iface->type,
+  ofproto_port_open_type(br->type, "internal"))) {
+/* Try to set the MTU to the requested value.  This is not done
+ * for internal interfaces, since their MTU is decided by the
+ * ofproto module, based on other ports in the bridge. */
+netdev_set_mtu(iface->netdev, *iface->cfg->mtu_request);
+}
+
 /* If the requested OpenFlow port for 'iface' changed, and it's not
  * already the correct port, then we might want to temporarily delete
  * this interface, so we can add it back again with the new OpenFlow
diff --git a/vswitchd/vswitch.ovsschema b/vswitchd/vswitch.ovsschema
index 32fdf28..8966803 100644
--- a/vswitchd/vswitch.ovsschema
+++ b/vswitchd/vswitch.ovsschema
@@ -1,6 +1,6 @@
 {"name": "Open_vSwitch",
- "version": "7.13.0",
- "cksum": "889248633 22774",
+ "version": "7.14.0",
+ "cksum": "3974332717 22936",
  "tables": {
"Open_vSwitch": {
  "columns": {
@@ -321,6 +321,12 @@
"mtu": {
  "type": {"key&q

[ovs-dev] [PATCH 7/7] netdev-dpdk: add support for Jumbo Frames

2016-07-29 Thread Daniele Di Proietto
From: Mark Kavanagh <mark.b.kavan...@intel.com>

Add support for Jumbo Frames to DPDK-enabled port types,
using single-segment-mbufs.

Using this approach, the amount of memory allocated to each mbuf
to store frame data is increased to a value greater than 1518B
(typical Ethernet maximum frame length). The increased space
available in the mbuf means that an entire Jumbo Frame of a specific
size can be carried in a single mbuf, as opposed to partitioning
it across multiple mbuf segments.

The amount of space allocated to each mbuf to hold frame data is
defined dynamically by the user with ovs-vsctl, via the 'mtu_request'
parameter.
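
As a rough illustration of the sizes discussed above (this is not code from
the patch), the relationship between an MTU and the frame length it implies
can be sketched as follows. The `sketch_` names are hypothetical, and the
18B overhead assumes a plain Ethernet header plus CRC with no VLAN tag:

```c
/* Assumed L2 overhead: 14-byte Ethernet header plus 4-byte CRC (18B in
 * total).  VLAN tags are deliberately not accounted for in this sketch. */
#define SKETCH_ETHER_HDR_LEN 14
#define SKETCH_ETHER_CRC_LEN 4
#define SKETCH_L2_OVERHEAD (SKETCH_ETHER_HDR_LEN + SKETCH_ETHER_CRC_LEN)

/* Returns the MTU (L3 packet length) that fits in a frame of 'frame_len'
 * bytes. */
static int
sketch_frame_len_to_mtu(int frame_len)
{
    return frame_len - SKETCH_L2_OVERHEAD;
}

/* Returns the frame length needed to carry a packet of 'mtu' bytes. */
static int
sketch_mtu_to_frame_len(int mtu)
{
    return mtu + SKETCH_L2_OVERHEAD;
}
```

Under these assumptions the standard 1500B MTU corresponds to the 1518B
frame mentioned above, and a 9018B Jumbo Frame corresponds to an MTU of 9000.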

Signed-off-by: Mark Kavanagh <mark.b.kavan...@intel.com>
[diproiet...@vmware.com rebased]
Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
---
 INSTALL.DPDK-ADVANCED.md |  59 +-
 INSTALL.DPDK.md  |   1 -
 NEWS |   1 +
 lib/netdev-dpdk.c| 151 +++
 4 files changed, 185 insertions(+), 27 deletions(-)

diff --git a/INSTALL.DPDK-ADVANCED.md b/INSTALL.DPDK-ADVANCED.md
index 191e69e..5cd64bf 100755
--- a/INSTALL.DPDK-ADVANCED.md
+++ b/INSTALL.DPDK-ADVANCED.md
@@ -1,5 +1,5 @@
 OVS DPDK ADVANCED INSTALL GUIDE
-=
+===
 
 ## Contents
 
@@ -12,7 +12,8 @@ OVS DPDK ADVANCED INSTALL GUIDE
 7. [QOS](#qos)
 8. [Rate Limiting](#rl)
 9. [Flow Control](#fc)
-10. [Vsperf](#vsperf)
+10. [Jumbo Frames](#jumbo)
+11. [Vsperf](#vsperf)
 
 ##  1. Overview
 
@@ -862,7 +863,59 @@ respective parameter. To disable the flow control at tx side,
 
 `ovs-vsctl set Interface dpdk0 options:tx-flow-ctrl=false`
 
-##  10. Vsperf
+##  10. Jumbo Frames
+
+By default, DPDK ports are configured with standard Ethernet MTU (1500B). To
+enable Jumbo Frames support for a DPDK port, change the Interface's `mtu_request`
+attribute to a sufficiently large value.
+
+e.g. Add a DPDK Phy port with MTU of 9000:
+
+`ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk -- set Interface dpdk0 mtu_request=9000`
+
+e.g. Change the MTU of an existing port to 6200:
+
+`ovs-vsctl set Interface dpdk0 mtu_request=6200`
+
+When Jumbo Frames are enabled, the size of a DPDK port's mbuf segments is
+increased, such that a full Jumbo Frame of a specific size may be accommodated
+within a single mbuf segment.
+
+Jumbo frame support has been validated against 9728B frames (largest frame size
+supported by Fortville NIC), using the DPDK `i40e` driver, but larger frames
+(particularly in use cases involving East-West traffic only), and other DPDK NIC
+drivers may be supported.
+
+### 10.1 vHost Ports and Jumbo Frames
+
+Some additional configuration is needed to take advantage of jumbo frames with
+vhost ports:
+
+1. `mergeable buffers` must be enabled for vHost ports, as demonstrated in
+the QEMU command line snippet below:
+
+```
+'-netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \'
+'-device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=on'
+```
+
+2. Where virtio devices are bound to the Linux kernel driver in a guest
+   environment (i.e. interfaces are not bound to an in-guest DPDK driver),
+   the MTU of those logical network interfaces must also be increased to a
+   sufficiently large value. This avoids segmentation of Jumbo Frames
+   received in the guest. Note that 'MTU' refers to the length of the IP
+   packet only, and not that of the entire frame.
+
+   To calculate the exact MTU of a standard IPv4 frame, subtract the L2
+   header and CRC lengths (i.e. 18B) from the max supported frame size.
+   So, to set the MTU for a 9018B Jumbo Frame:
+
+   ```
+   ifconfig eth1 mtu 9000
+   ```
+
+##  11. Vsperf
 
 Vsperf project goal is to develop vSwitch test framework that can be used to
 validate the suitability of different vSwitch implementations in a Telco deployment
diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
index 7609aa7..25c79de 100644
--- a/INSTALL.DPDK.md
+++ b/INSTALL.DPDK.md
@@ -590,7 +590,6 @@ can be found in [Vhost Walkthrough].
 
 ##  6. Limitations
 
-  - Supports MTU size 1500, MTU setting for DPDK netdevs will be in future OVS release.
   - Currently DPDK ports does not use HW offload functionality.
   - Network Interface Firmware requirements:
 Each release of DPDK is validated against a specific firmware version for
diff --git a/NEWS b/NEWS
index 0ff5616..c004e5f 100644
--- a/NEWS
+++ b/NEWS
@@ -68,6 +68,7 @@ Post-v2.5.0
is enabled in DPDK.
  * Basic connection tracking for the userspace datapath (no ALG,
fragmentation or NAT support yet)
+ * Jumbo frame support
- Increase number of registers to 16.
- ovs-benchmark: This utility has been removed due to lack of use and
  bitrot.
diff

[ovs-dev] [PATCH 6/7] netdev: Make netdev_set_mtu() netdev parameter non-const.

2016-07-29 Thread Daniele Di Proietto
Every provider silently drops the const attribute when converting the
parameter to the appropriate subclass.  Might as well drop the const
attribute from the parameter, since this is a "set" function.
---
 lib/netdev-dummy.c| 2 +-
 lib/netdev-linux.c| 2 +-
 lib/netdev-provider.h | 2 +-
 lib/netdev.c  | 2 +-
 lib/netdev.h  | 2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/lib/netdev-dummy.c b/lib/netdev-dummy.c
index c8f82b7..dec1a8e 100644
--- a/lib/netdev-dummy.c
+++ b/lib/netdev-dummy.c
@@ -1150,7 +1150,7 @@ netdev_dummy_get_mtu(const struct netdev *netdev, int *mtup)
 }
 
 static int
-netdev_dummy_set_mtu(const struct netdev *netdev, int mtu)
+netdev_dummy_set_mtu(struct netdev *netdev, int mtu)
 {
 struct netdev_dummy *dev = netdev_dummy_cast(netdev);
 
diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index 1b5f7c1..20b5cc7 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -1382,7 +1382,7 @@ netdev_linux_get_mtu(const struct netdev *netdev_, int *mtup)
  * networking ioctl interface.
  */
 static int
-netdev_linux_set_mtu(const struct netdev *netdev_, int mtu)
+netdev_linux_set_mtu(struct netdev *netdev_, int mtu)
 {
 struct netdev_linux *netdev = netdev_linux_cast(netdev_);
 struct ifreq ifr;
diff --git a/lib/netdev-provider.h b/lib/netdev-provider.h
index 5bcfeba..cd04ae9 100644
--- a/lib/netdev-provider.h
+++ b/lib/netdev-provider.h
@@ -389,7 +389,7 @@ struct netdev_class {
  * If 'netdev' does not have an MTU (e.g. as some tunnels do not), then
  * this function should return EOPNOTSUPP.  This function may be set to
  * null if it would always return EOPNOTSUPP. */
-int (*set_mtu)(const struct netdev *netdev, int mtu);
+int (*set_mtu)(struct netdev *netdev, int mtu);
 
 /* Returns the ifindex of 'netdev', if successful, as a positive number.
  * On failure, returns a negative errno value.
diff --git a/lib/netdev.c b/lib/netdev.c
index 589d37c..5cf8bbb 100644
--- a/lib/netdev.c
+++ b/lib/netdev.c
@@ -869,7 +869,7 @@ netdev_get_mtu(const struct netdev *netdev, int *mtup)
  * MTU (as e.g. some tunnels do not).  On other failure, returns a positive
  * errno value. */
 int
-netdev_set_mtu(const struct netdev *netdev, int mtu)
+netdev_set_mtu(struct netdev *netdev, int mtu)
 {
 const struct netdev_class *class = netdev->netdev_class;
 int error;
diff --git a/lib/netdev.h b/lib/netdev.h
index dc7ede8..d8ec627 100644
--- a/lib/netdev.h
+++ b/lib/netdev.h
@@ -132,7 +132,7 @@ const char *netdev_get_name(const struct netdev *);
 const char *netdev_get_type(const struct netdev *);
 const char *netdev_get_type_from_name(const char *);
 int netdev_get_mtu(const struct netdev *, int *mtup);
-int netdev_set_mtu(const struct netdev *, int mtu);
+int netdev_set_mtu(struct netdev *, int mtu);
 int netdev_get_ifindex(const struct netdev *);
 int netdev_set_tx_multiq(struct netdev *, unsigned int n_txq);
 
-- 
2.8.1

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH 5/7] tests: Add a new MTU test.

2016-07-29 Thread Daniele Di Proietto
Also, netdev-dummy needs to call netdev_change_seq_changed() in
set_mtu().
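
A minimal sketch of the idea, assuming a simplified device structure (the
`sketch_` names are hypothetical and the counter stands in for what
netdev_change_seq_changed() does in OVS): the change sequence is bumped only
when the MTU actually changes, so observers are not woken for a no-op.

```c
/* Simplified stand-in for a netdev: an MTU plus a change counter. */
struct sketch_netdev {
    int mtu;
    unsigned long change_seq;
};

/* Stores 'mtu' in 'dev'.  The change counter is incremented only if the
 * value differs from the current one.  Always returns 0 (success). */
static int
sketch_set_mtu(struct sketch_netdev *dev, int mtu)
{
    if (dev->mtu != mtu) {
        dev->mtu = mtu;
        dev->change_seq++;
    }
    return 0;
}
```

Setting the same MTU twice in a row leaves the counter untouched after the
first call.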

Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
---
 lib/netdev-dummy.c|  5 -
 tests/ofproto-dpif.at | 30 ++
 2 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/lib/netdev-dummy.c b/lib/netdev-dummy.c
index 92af15f..c8f82b7 100644
--- a/lib/netdev-dummy.c
+++ b/lib/netdev-dummy.c
@@ -1155,7 +1155,10 @@ netdev_dummy_set_mtu(const struct netdev *netdev, int mtu)
 struct netdev_dummy *dev = netdev_dummy_cast(netdev);
 
 ovs_mutex_lock(&dev->mutex);
-dev->mtu = mtu;
+if (dev->mtu != mtu) {
+dev->mtu = mtu;
+netdev_change_seq_changed(netdev);
+}
 ovs_mutex_unlock(&dev->mutex);
 
 return 0;
diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at
index 0892f07..5fb5727 100644
--- a/tests/ofproto-dpif.at
+++ b/tests/ofproto-dpif.at
@@ -8774,3 +8774,33 @@ n_packets=0
 
 OVS_VSWITCHD_STOP
 AT_CLEANUP
+
+AT_SETUP([ofproto - set mtu])
+OVS_VSWITCHD_START
+
+add_of_ports br0 1
+
+# Check that initial MTU is 1500 for 'br0' and 'p1'.
+AT_CHECK([ovs-vsctl get Interface br0 mtu], [0], [dnl
+1500
+])
+AT_CHECK([ovs-vsctl get Interface p1 mtu], [0], [dnl
+1500
+])
+
+# Request new MTU for 'p1'
+AT_CHECK([ovs-vsctl set Interface p1 mtu_request=1600])
+
+# Check that the new MTU is applied
+AT_CHECK([ovs-vsctl --timeout=10 wait-until Interface p1 mtu=1600])
+# The internal port 'br0' should have the same MTU value as p1, because it's
+# the new bridge minimum.
+AT_CHECK([ovs-vsctl --timeout=10 wait-until Interface br0 mtu=1600])
+
+AT_CHECK([ovs-vsctl del-port br0 p1])
+
+# When 'p1' is deleted, the internal port should return to the default MTU
+AT_CHECK([ovs-vsctl --timeout=10 wait-until Interface br0 mtu=1500])
+
+OVS_VSWITCHD_STOP
+AT_CLEANUP
-- 
2.8.1



[ovs-dev] [PATCH 4/7] netdev-dummy: Add dummy-internal class.

2016-07-29 Thread Daniele Di Proietto
"internal" netdevs are treated specially in OVS (e.g. for MTU), but
the dummy datapath remaps both "system" and "internal" devices to the
same "dummy" netdev class, so there's no way to discern those in tests.

This commit adds a new "dummy-internal" netdev type, which will be used
by the dummy datapath for internal ports, so that other parts of the
code can understand which ports are internal just by looking at the
netdev object.

The alternative solution, using the original interface type ("internal")
instead of the translated netdev type ("dummy"), is harder to implement,
because in so many places only the netdev object is available.

Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
---
 lib/dpif-netdev.c |  2 +-
 lib/netdev-dummy.c| 14 --
 tests/bridge.at   |  6 +++---
 tests/dpctl.at| 12 ++--
 tests/mpls-xlate.at   |  4 ++--
 tests/netdev-type.at  |  2 +-
 tests/ofproto-dpif.at | 18 +-
 tests/ovs-vswitchd.at |  6 +++---
 tests/pmd.at  |  8 
 tests/tunnel-push-pop-ipv6.at |  4 ++--
 tests/tunnel-push-pop.at  |  4 ++--
 tests/tunnel.at   | 28 ++--
 12 files changed, 59 insertions(+), 49 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index e39362e..6f2e07d 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -888,7 +888,7 @@ static const char *
 dpif_netdev_port_open_type(const struct dpif_class *class, const char *type)
 {
 return strcmp(type, "internal") ? type
-  : dpif_netdev_class_is_dummy(class) ? "dummy"
+  : dpif_netdev_class_is_dummy(class) ? "dummy-internal"
   : "tap";
 }
 
diff --git a/lib/netdev-dummy.c b/lib/netdev-dummy.c
index 2a6aa56..92af15f 100644
--- a/lib/netdev-dummy.c
+++ b/lib/netdev-dummy.c
@@ -622,12 +622,15 @@ dummy_netdev_get_conn_state(struct dummy_packet_conn *conn)
 }
 
 static void
-netdev_dummy_run(const struct netdev_class *netdev_class OVS_UNUSED)
+netdev_dummy_run(const struct netdev_class *netdev_class)
 {
 struct netdev_dummy *dev;
 
 ovs_mutex_lock(&dummy_list_mutex);
 LIST_FOR_EACH (dev, list_node, &dummy_list) {
+if (netdev_get_class(&dev->up) != netdev_class) {
+continue;
+}
 ovs_mutex_lock(&dev->mutex);
 dummy_packet_conn_run(dev);
 ovs_mutex_unlock(&dev->mutex);
@@ -636,12 +639,15 @@ netdev_dummy_run(const struct netdev_class *netdev_class OVS_UNUSED)
 }
 
 static void
-netdev_dummy_wait(const struct netdev_class *netdev_class OVS_UNUSED)
+netdev_dummy_wait(const struct netdev_class *netdev_class)
 {
 struct netdev_dummy *dev;
 
 ovs_mutex_lock(&dummy_list_mutex);
 LIST_FOR_EACH (dev, list_node, &dummy_list) {
+if (netdev_get_class(&dev->up) != netdev_class) {
+continue;
+}
 ovs_mutex_lock(&dev->mutex);
 dummy_packet_conn_wait(&dev->conn);
 ovs_mutex_unlock(&dev->mutex);
@@ -1380,6 +1386,9 @@ netdev_dummy_update_flags(struct netdev *netdev_,
 static const struct netdev_class dummy_class =
 NETDEV_DUMMY_CLASS("dummy", false, NULL);
 
+static const struct netdev_class dummy_internal_class =
+NETDEV_DUMMY_CLASS("dummy-internal", false, NULL);
+
 static const struct netdev_class dummy_pmd_class =
 NETDEV_DUMMY_CLASS("dummy-pmd", true,
netdev_dummy_reconfigure);
@@ -1751,6 +1760,7 @@ netdev_dummy_register(enum dummy_level level)
 netdev_dummy_override("system");
 }
 netdev_register_provider(&dummy_class);
+netdev_register_provider(&dummy_internal_class);
 netdev_register_provider(&dummy_pmd_class);
 
 netdev_vport_tunnel_register();
diff --git a/tests/bridge.at b/tests/bridge.at
index 37c55ba..3dbabe5 100644
--- a/tests/bridge.at
+++ b/tests/bridge.at
@@ -12,7 +12,7 @@ add_of_ports br0 1 2
 AT_CHECK([ovs-appctl dpif/show], [0], [dnl
 dummy@ovs-dummy: hit:0 missed:0
br0:
-   br0 65534/100: (dummy)
+   br0 65534/100: (dummy-internal)
p1 1/1: (dummy)
p2 2/2: (dummy)
 ])
@@ -23,7 +23,7 @@ AT_CHECK([ovs-appctl dpctl/del-if dummy@ovs-dummy p1])
 AT_CHECK([ovs-appctl dpif/show], [0], [dnl
 dummy@ovs-dummy: hit:0 missed:0
br0:
-   br0 65534/100: (dummy)
+   br0 65534/100: (dummy-internal)
p2 2/2: (dummy)
 ])
 
@@ -32,7 +32,7 @@ AT_CHECK([ovs-vsctl del-port p2])
 AT_CHECK([ovs-appctl dpif/show], [0], [dnl
 dummy@ovs-dummy: hit:0 missed:0
br0:
-   br0 65534/100: (dummy)
+   br0 65534/100: (dummy-internal)
p1 1/1: (dummy)
 ])
 OVS_APP_EXIT_AND_WAIT([ovs-vswitchd])
diff --git a/tests/dpctl.at b/tests/dpctl.at
index b6d5dd6..8c761c8 100644
--- a/tests/dpctl.at
+++ b

[ovs-dev] [PATCH 1/7] ofproto: Consider datapath_type when looking for internal ports.

2016-07-29 Thread Daniele Di Proietto
Interfaces with type "internal" end up having a netdev with type "tap"
in the dpif-netdev datapath, so a strcmp will fail to match internal
interfaces.

We can translate the types with ofproto_port_open_type() before calling
strcmp to fix this.

This fixes a minor issue where internal interfaces are considered
non-internal in the userspace datapath for the purpose of adjusting the
MTU.
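
The comparison described above can be sketched roughly as follows. This is
an illustration only: `sketch_port_open_type()` is a hypothetical stand-in
for ofproto_port_open_type(), and the "netdev" datapath type name and its
"internal" to "tap" mapping are assumptions made for the example.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for ofproto_port_open_type(): translates an
 * interface type into the netdev type the datapath actually opens.  Here
 * only the userspace datapath (assumed to be named "netdev") remaps
 * "internal", to "tap". */
static const char *
sketch_port_open_type(const char *datapath_type, const char *port_type)
{
    if (!strcmp(datapath_type, "netdev") && !strcmp(port_type, "internal")) {
        return "tap";
    }
    return port_type;
}

/* Returns true if a netdev of type 'netdev_type' is an internal port on a
 * datapath of type 'datapath_type'.  Comparing against the translated type,
 * rather than the literal "internal", is the point of the fix. */
static bool
sketch_is_internal(const char *datapath_type, const char *netdev_type)
{
    return !strcmp(netdev_type,
                   sketch_port_open_type(datapath_type, "internal"));
}
```

With the literal comparison, a "tap" netdev on the userspace datapath would
never be recognised as internal; with the translated comparison it is.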

Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
---
 ofproto/ofproto.c | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/ofproto/ofproto.c b/ofproto/ofproto.c
index 8e59c69..088f91a 100644
--- a/ofproto/ofproto.c
+++ b/ofproto/ofproto.c
@@ -220,7 +220,8 @@ static void learned_cookies_flush(struct ofproto *, struct ovs_list *dead_cookie
 /* ofport. */
 static void ofport_destroy__(struct ofport *) OVS_EXCLUDED(ofproto_mutex);
 static void ofport_destroy(struct ofport *, bool del);
-static inline bool ofport_is_internal(const struct ofport *);
+static inline bool ofport_is_internal(const struct ofproto *,
+  const struct ofport *);
 
 static int update_port(struct ofproto *, const char *devname);
 static int init_ports(struct ofproto *);
@@ -2465,7 +2466,7 @@ static void
 ofport_remove(struct ofport *ofport)
 {
 struct ofproto *p = ofport->ofproto;
-bool is_internal = ofport_is_internal(ofport);
+bool is_internal = ofport_is_internal(p, ofport);
 
 connmgr_send_port_status(ofport->ofproto->connmgr, NULL, &ofport->pp,
  OFPPR_DELETE);
@@ -2751,9 +2752,10 @@ init_ports(struct ofproto *p)
 }
 
 static inline bool
-ofport_is_internal(const struct ofport *port)
+ofport_is_internal(const struct ofproto *p, const struct ofport *port)
 {
-return !strcmp(netdev_get_type(port->netdev), "internal");
+return !strcmp(netdev_get_type(port->netdev),
+   ofproto_port_open_type(p->type, "internal"));
 }
 
 /* Find the minimum MTU of all non-datapath devices attached to 'p'.
@@ -2770,7 +2772,7 @@ find_min_mtu(struct ofproto *p)
 
 /* Skip any internal ports, since that's what we're trying to
  * set. */
-if (ofport_is_internal(ofport)) {
+if (ofport_is_internal(p, ofport)) {
 continue;
 }
 
@@ -2797,7 +2799,7 @@ update_mtu(struct ofproto *p, struct ofport *port)
 port->mtu = 0;
 return;
 }
-if (ofport_is_internal(port)) {
+if (ofport_is_internal(p, port)) {
 if (dev_mtu > p->min_mtu) {
if (!netdev_set_mtu(port->netdev, p->min_mtu)) {
dev_mtu = p->min_mtu;
@@ -2827,7 +2829,7 @@ update_mtu_ofproto(struct ofproto *p)
 HMAP_FOR_EACH (ofport, hmap_node, &p->ports) {
 struct netdev *netdev = ofport->netdev;
 
-if (ofport_is_internal(ofport)) {
+if (ofport_is_internal(p, ofport)) {
 if (!netdev_set_mtu(netdev, p->min_mtu)) {
 ofport->mtu = p->min_mtu;
 }
-- 
2.8.1



[ovs-dev] [PATCH 3/7] netdev: Pass 'netdev_class' to ->run() and ->wait().

2016-07-29 Thread Daniele Di Proietto
This will allow run() and wait() methods to be shared between different
classes and still perform class-specific work.
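
A minimal sketch of what this enables, using simplified hypothetical types
rather than the real struct netdev_class: one run() body is shared by two
classes and uses the class pointer it receives to service only its own
devices.

```c
#include <stddef.h>

/* Simplified stand-ins for a netdev class and a device of that class. */
struct sketch_class {
    const char *type;
};

struct sketch_dev {
    const struct sketch_class *class_;
    int run_count;
};

static const struct sketch_class sketch_dummy = { "dummy" };
static const struct sketch_class sketch_dummy_internal = { "dummy-internal" };

/* Shared run() implementation.  Devices that belong to a different class
 * are skipped, so calling it once per registered class services every
 * device exactly once.  Returns the number of devices serviced. */
static size_t
sketch_run(const struct sketch_class *class_, struct sketch_dev *devs,
           size_t n_devs)
{
    size_t n_serviced = 0;

    for (size_t i = 0; i < n_devs; i++) {
        if (devs[i].class_ != class_) {
            continue;
        }
        devs[i].run_count++;
        n_serviced++;
    }
    return n_serviced;
}
```

Without the class argument, a function shared by both classes would have no
way to tell which invocation it is in and would process every device twice.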

Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
---
 lib/netdev-bsd.c  |  6 +++---
 lib/netdev-dummy.c|  4 ++--
 lib/netdev-linux.c|  6 +++---
 lib/netdev-provider.h | 14 ++
 lib/netdev-vport.c|  4 ++--
 lib/netdev.c  |  4 ++--
 6 files changed, 22 insertions(+), 16 deletions(-)

diff --git a/lib/netdev-bsd.c b/lib/netdev-bsd.c
index 2bba0ed..75a330b 100644
--- a/lib/netdev-bsd.c
+++ b/lib/netdev-bsd.c
@@ -146,7 +146,7 @@ static void ifr_set_flags(struct ifreq *, int flags);
 static int af_link_ioctl(unsigned long command, const void *arg);
 #endif
 
-static void netdev_bsd_run(void);
+static void netdev_bsd_run(const struct netdev_class *);
 static int netdev_bsd_get_mtu(const struct netdev *netdev_, int *mtup);
 
 static bool
@@ -180,7 +180,7 @@ netdev_get_kernel_name(const struct netdev *netdev)
  * interface status changes, and eventually calls all the user callbacks.
  */
 static void
-netdev_bsd_run(void)
+netdev_bsd_run(const struct netdev_class *netdev_class OVS_UNUSED)
 {
 rtbsd_notifier_run();
 }
@@ -190,7 +190,7 @@ netdev_bsd_run(void)
  * be called.
  */
 static void
-netdev_bsd_wait(void)
+netdev_bsd_wait(const struct netdev_class *netdev_class OVS_UNUSED)
 {
 rtbsd_notifier_wait();
 }
diff --git a/lib/netdev-dummy.c b/lib/netdev-dummy.c
index a950409..2a6aa56 100644
--- a/lib/netdev-dummy.c
+++ b/lib/netdev-dummy.c
@@ -622,7 +622,7 @@ dummy_netdev_get_conn_state(struct dummy_packet_conn *conn)
 }
 
 static void
-netdev_dummy_run(void)
+netdev_dummy_run(const struct netdev_class *netdev_class OVS_UNUSED)
 {
 struct netdev_dummy *dev;
 
@@ -636,7 +636,7 @@ netdev_dummy_run(void)
 }
 
 static void
-netdev_dummy_wait(void)
+netdev_dummy_wait(const struct netdev_class *netdev_class OVS_UNUSED)
 {
 struct netdev_dummy *dev;
 
diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index fa37bcf..1b5f7c1 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -526,7 +526,7 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
  * changes in the device miimon status, so we can use atomic_count. */
 static atomic_count miimon_cnt = ATOMIC_COUNT_INIT(0);
 
-static void netdev_linux_run(void);
+static void netdev_linux_run(const struct netdev_class *);
 
 static int netdev_linux_do_ethtool(const char *name, struct ethtool_cmd *,
int cmd, const char *cmd_name);
@@ -623,7 +623,7 @@ netdev_linux_miimon_enabled(void)
 }
 
 static void
-netdev_linux_run(void)
+netdev_linux_run(const struct netdev_class *netdev_class OVS_UNUSED)
 {
 struct nl_sock *sock;
 int error;
@@ -697,7 +697,7 @@ netdev_linux_run(void)
 }
 
 static void
-netdev_linux_wait(void)
+netdev_linux_wait(const struct netdev_class *netdev_class OVS_UNUSED)
 {
 struct nl_sock *sock;
 
diff --git a/lib/netdev-provider.h b/lib/netdev-provider.h
index ae390cb..5bcfeba 100644
--- a/lib/netdev-provider.h
+++ b/lib/netdev-provider.h
@@ -236,15 +236,21 @@ struct netdev_class {
 int (*init)(void);
 
 /* Performs periodic work needed by netdevs of this class.  May be null if
- * no periodic work is necessary. */
-void (*run)(void);
+ * no periodic work is necessary.
+ *
+ * 'netdev_class' points to the class.  It is useful in case the same
+ * function is used to implement different classes. */
+void (*run)(const struct netdev_class *netdev_class);
 
 /* Arranges for poll_block() to wake up if the "run" member function needs
  * to be called.  Implementations are additionally required to wake
  * whenever something changes in any of its netdevs which would cause their
  * ->change_seq() function to change its result.  May be null if nothing is
- * needed here. */
-void (*wait)(void);
+ * needed here.
+ *
+ * 'netdev_class' points to the class.  It is useful in case the same
+ * function is used to implement different classes. */
+void (*wait)(const struct netdev_class *netdev_class);
 
 /* ##  ## */
 /* ## netdev Functions ## */
diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c
index 87a30f8..7eabd2c 100644
--- a/lib/netdev-vport.c
+++ b/lib/netdev-vport.c
@@ -321,7 +321,7 @@ netdev_vport_update_flags(struct netdev *netdev OVS_UNUSED,
 }
 
 static void
-netdev_vport_run(void)
+netdev_vport_run(const struct netdev_class *netdev_class OVS_UNUSED)
 {
 uint64_t seq;
 
@@ -334,7 +334,7 @@ netdev_vport_run(void)
 }
 
 static void
-netdev_vport_wait(void)
+netdev_vport_wait(const struct netdev_class *netdev_class OVS_UNUSED)
 {
 uint64_t seq;
 
diff --git a/lib/netdev.c b/lib/netdev.c
index 75bf1cb..589d37c 100644
--- a/lib/netdev.c
+++ b/lib/netdev.c
@@ -160,7 +160,7 @@ netdev_run(void)
 struct netdev_registered_class *rc;
 CMAP_FOR_EACH (rc, cmap_node, &netdev_classes) {

Re: [ovs-dev] tests: Add new pmd test for pmd-rxq-affinity.

2016-07-29 Thread Daniele Di Proietto
Thanks for the review, applied to master




On 28/07/2016 01:59, "Ilya Maximets" <i.maxim...@samsung.com> wrote:

>Thanks for making this.
>
>Acked-by: Ilya Maximets <i.maxim...@samsung.com>
>
>On 27.07.2016 23:12, Daniele Di Proietto wrote:
>> This tests that the newly introduced pmd-rxq-affinity option works as
>> intended, at least for a single port.
>> 
>> Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
>> ---
>>  tests/pmd.at | 53 +
>>  1 file changed, 53 insertions(+)
>> 
>> diff --git a/tests/pmd.at b/tests/pmd.at
>> index 47639b6..3052f95 100644
>> --- a/tests/pmd.at
>> +++ b/tests/pmd.at
>> @@ -461,3 +461,56 @@ icmp,vlan_tci=0x,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10
>>  
>>  OVS_VSWITCHD_STOP
>>  AT_CLEANUP
>> +
>> +AT_SETUP([PMD - rxq affinity])
>> +OVS_VSWITCHD_START(
>> +  [], [], [], [--dummy-numa 0,0,0,0,0,0,0,0,0])
>> +AT_CHECK([ovs-appctl vlog/set dpif:dbg dpif_netdev:dbg])
>> +
>> +AT_CHECK([ovs-ofctl add-flow br0 actions=controller])
>> +
>> +AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=1fe])
>> +
>> +AT_CHECK([ovs-vsctl add-port br0 p1 -- set Interface p1 type=dummy-pmd ofport_request=1 options:n_rxq=4 other_config:pmd-rxq-affinity="0:3,1:7,2:2,3:8"])
>> +
>> +dnl The rxqs should be on the requested cores.
>> +AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show | parse_pmd_rxq_show], [0], [dnl
>> +p1 0 0 3
>> +p1 1 0 7
>> +p1 2 0 2
>> +p1 3 0 8
>> +])
>> +
>> +AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=6])
>> +
>> +dnl We removed the cores requested by some queues from pmd-cpu-mask.
>> +dnl Those queues will not be polled.
>> +AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show | parse_pmd_rxq_show], [0], [dnl
>> +p1 2 0 2
>> +])
>> +
>> +AT_CHECK([ovs-vsctl remove Interface p1 other_config pmd-rxq-affinity])
>> +
>> +dnl We removed the rxq-affinity request.  dpif-netdev should assign queues
>> +dnl in a round robin fashion.  We just make sure that every rxq is being
>> +dnl polled again.
>> +AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show | parse_pmd_rxq_show | cut -f 1,2 -d ' ' | sort], [0], [dnl
>> +p1 0
>> +p1 1
>> +p1 2
>> +p1 3
>> +])
>> +
>> +AT_CHECK([ovs-vsctl set Interface p1 other_config:pmd-rxq-affinity='0:1'])
>> +
>> +dnl We explicitly requested core 1 for queue 0.  Core 1 becomes isolated and
>> +dnl every other queue goes to core 2.
>> +AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show | parse_pmd_rxq_show], [0], [dnl
>> +p1 0 0 1
>> +p1 1 0 2
>> +p1 2 0 2
>> +p1 3 0 2
>> +])
>> +
>> +OVS_VSWITCHD_STOP(["/dpif_netdev|WARN|There is no PMD thread on core/d"])
>> +AT_CLEANUP
>> 
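
The second check in the quoted test (queues pinned to cores missing from
pmd-cpu-mask are not polled) can be reasoned about with a small sketch. This
is not OVS code: the `sketch_` helpers are hypothetical and simply treat the
mask as a 64-bit bitmap with one bit per core.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if bit 'core' is set in the CPU mask 'mask'. */
static bool
sketch_core_in_mask(uint64_t mask, unsigned int core)
{
    return core < 64 && ((mask >> core) & 1);
}

/* 'cores[i]' is the core requested for rx queue 'i'.  Sets 'polled[i]' to
 * whether that core is part of 'mask' and returns the number of queues that
 * would still be polled. */
static size_t
sketch_polled_queues(uint64_t mask, const unsigned int *cores, size_t n,
                     bool *polled)
{
    size_t n_polled = 0;

    for (size_t i = 0; i < n; i++) {
        polled[i] = sketch_core_in_mask(mask, cores[i]);
        if (polled[i]) {
            n_polled++;
        }
    }
    return n_polled;
}
```

With the affinity "0:3,1:7,2:2,3:8" from the test, mask 1fe covers cores 1
to 8, so all four queues are polled; mask 6 covers only cores 1 and 2, so
only queue 2 (pinned to core 2) remains, matching the expected output.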


Re: [ovs-dev] [PATCH] dpif-netdev: Fix xps revalidation.

2016-07-29 Thread Daniele Di Proietto
Thanks for the fix, applied to master!




On 29/07/2016 01:07, "Ilya Maximets"  wrote:

>Revalidation should work in case of 'dynamic_txqs == true'.
>
>Fixes: 324c8374852a ("dpif-netdev: XPS (Transmit Packet Steering) implementation.")
>Signed-off-by: Ilya Maximets 
>---
> lib/dpif-netdev.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>index 828171e..c446ae8 100644
>--- a/lib/dpif-netdev.c
>+++ b/lib/dpif-netdev.c
>@@ -4193,7 +4193,7 @@ dpif_netdev_xps_revalidate_pmd(const struct dp_netdev_pmd_thread *pmd,
> long long interval;
> 
> HMAP_FOR_EACH (tx, node, &pmd->port_cache) {
>-if (tx->port->dynamic_txqs) {
>+if (!tx->port->dynamic_txqs) {
> continue;
> }
> interval = now - tx->last_used;
>-- 
>2.7.4
>


Re: [ovs-dev] [PATCH v2] netdev-dpdk: Add Flow Control support.

2016-07-29 Thread Daniele Di Proietto
I changed the fc_conf initialization to use memset, because clang was
complaining about the initializer.

I moved the description in vswitch.xml to avoid nesting inside the "Common
Columns" group.

I changed slightly the wording in vswitch.xml, in case we want to implement
this even for non DPDK devices.

I added your name to AUTHORS

and pushed this to master

Thanks!

Daniele


2016-07-28 13:36 GMT-07:00 Chandran, Sugesh :

> Thank you Bhanu for reviewing and testing it.
>
>
> Regards
> _Sugesh
>
>
> > -Original Message-
> > From: Bodireddy, Bhanuprakash
> > Sent: Thursday, July 28, 2016 4:54 PM
> > To: Chandran, Sugesh ; diproiet...@ovn.org;
> > dev@openvswitch.org
> > Subject: RE: [PATCH v2] netdev-dpdk: Add Flow Control support.
> >
> > >-Original Message-
> > >From: Chandran, Sugesh
> > >Sent: Thursday, July 28, 2016 4:30 PM
> > >To: diproiet...@ovn.org; Bodireddy, Bhanuprakash
> > >; dev@openvswitch.org
> > >Cc: Chandran, Sugesh 
> > >Subject: [PATCH v2] netdev-dpdk: Add Flow Control support.
> > >
> > >Add support for flow-control(mac control frame) to DPDK enabled physical
> > >port types. By default, the flow-control is OFF on both rx and tx side.
> > >The flow control can be enabled/disabled either when adding a port to OVS
> > >or at run time.
> > >
> > >For eg:
> > >To enable flow control support at tx side while adding a port, add the
> > >'tx-flow-ctrl' option to the 'ovs-vsctl add-port' command-line as below.
> > >
> > > 'ovs-vsctl add-port br0 dpdk0 -- \
> > >  set Interface dpdk0 type=dpdk options:tx-flow-ctrl=true'
> > >
> > >Similarly to enable rx flow control,
> > > 'ovs-vsctl add-port br0 dpdk0 -- \
> > >  set Interface dpdk0 type=dpdk options:rx-flow-ctrl=true'
> > >
> > >And to enable the flow control auto-negotiation,
> > > 'ovs-vsctl add-port br0 dpdk0 -- \
> > >  set Interface dpdk0 type=dpdk options:flow-ctrl-autoneg=true'
> > >
> > >To turn ON the tx flow control at run time(After the port is being added
> > >to OVS), the command-line input will be,
> > > 'ovs-vsctl set Interface dpdk0 options:tx-flow-ctrl=true'
> > >
> > >The flow control parameters can be turned off by setting 'false' to the
> > >respective parameter. To disable the flow control at tx side,
> > > 'ovs-vsctl set Interface dpdk0 options:tx-flow-ctrl=false'
> > >
> > >Signed-off-by: Sugesh Chandran 
> >
> > LGTM, I tested it and can apply the rx flow control setting even when the
> > interface is transmitting.
> > Acked-by: Bhanuprakash Bodireddy 
> >
> > Regards,
> > Bhanu Prakash.
>


Re: [ovs-dev] [PATCH] ovs-numa: fixed cmask parse with 0x prefix

2016-07-29 Thread Daniele Di Proietto
2016-07-27 23:22 GMT-07:00 Shen, Wei1 <wei1.s...@intel.com>:

> Thanks for the reply. INSTALL.DPDK.md uses the “0x” prefix in its
> example:
>
>   * dpdk-lcore-mask
>   Specifies the CPU cores on which dpdk lcore threads should be spawned
>   and expects hex string (eg '0x123').
>
> so I think we should either make those documents compliant or make the
> parser accept both forms, as long as they are base 16, regardless of the
> presence of “0x”.
>

OVS already has no problem accepting 0x prefixes as part of
"dpdk-lcore-mask".

With your patch, OVS accepts the 0x prefix also as part of "pmd-cpu-mask",
which I think is an enhancement.  If this is the intended effect, please
update the commit message and submit another version.

Thanks,

Daniele


>
>
> Also thanks for the styling reminder… I haven’t gone through those in much
> detail. Let me send another patch that complies with those.
>
>
>
> --
>
> Best,
>
> Wei Shen.
>
>
>
> *From: *Daniele Di Proietto <diproiet...@ovn.org>
> *Date: *Wednesday, July 27, 2016 at 2:28 PM
> *To: *Wei1 Shen <wei1.s...@intel.com>
> *Cc: *"dev@openvswitch.org" <dev@openvswitch.org>
> *Subject: *Re: [ovs-dev] [PATCH] ovs-numa: fixed cmask parse with 0x
> prefix
>
>
>
> Thanks for the patch.
>
> We never accepted the 0x prefix for pmd-cpu-mask, but I guess there's no
> harm in doing it and it might make users' lives easier.
>
> We always use braces, even for a single statement; please read CodingStyle.md
>
> https://github.com/openvswitch/ovs/blob/master/CodingStyle.md#statements
>
> I cannot merge this unless you provide a signoff; the details and the
> meaning are explained here:
>
>
> https://github.com/openvswitch/ovs/blob/master/CONTRIBUTING.md#developers-certificate-of-origin
>
> Thanks,
>
> Daniele
>
>
>
> 2016-07-26 14:56 GMT-07:00 Wei Shen <wei1.s...@intel.com>:
>
> Fixed a minor bug that would print out a confusing warning about core mask,
> "ovs_numa|WARN|Invalid cpu mask: x", when dpdk-lcore-mask has 0x prefix,
> e.g. 0x123, which is the convention used in INSTALL.DPDK.md.
> ---
>  lib/ovs-numa.c | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/lib/ovs-numa.c b/lib/ovs-numa.c
> index c8173e0..c1938eb 100644
> --- a/lib/ovs-numa.c
> +++ b/lib/ovs-numa.c
> @@ -551,6 +551,10 @@ ovs_numa_set_cpu_mask(const char *cmask)
>  return;
>  }
>
> +/* Skip 0x if supplied in the cmask */
> +if (!strncmp(cmask, "0x", 2))
> +cmask += 2;
> +
>  for (i = strlen(cmask) - 1; i >= 0; i--) {
>  char hex = toupper((unsigned char)cmask[i]);
>  int bin, j;
> --
> 2.5.5
>
> ___
> dev mailing list
> dev@openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev
>
>
>
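
For illustration only, here is a minimal, self-contained sketch of hex mask
parsing that tolerates an optional 0x prefix. It is not the
ovs_numa_set_cpu_mask() code from the patch: parse_cpu_mask(), the 64-core
limit and the acceptance of an upper-case "0X" are assumptions made for the
example. It uses braces on every statement, as CodingStyle.md asks.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of OVS: parses a hexadecimal CPU mask such
 * as "123" or "0x123" into a 64-bit bitmap (bit N set means core N is
 * selected).  Returns false if the mask is empty, contains a non-hex
 * character, or has more than 16 hex digits. */
static bool
parse_cpu_mask(const char *cmask, uint64_t *mask)
{
    uint64_t result = 0;
    size_t len, i;

    /* Skip an optional "0x" or "0X" prefix. */
    if (cmask[0] == '0' && (cmask[1] == 'x' || cmask[1] == 'X')) {
        cmask += 2;
    }

    len = strlen(cmask);
    if (len == 0 || len > 16) {
        return false;
    }

    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char) cmask[i];
        unsigned int digit;

        if (isdigit(c)) {
            digit = c - '0';
        } else if (isxdigit(c)) {
            digit = toupper(c) - 'A' + 10;
        } else {
            return false;
        }
        result = (result << 4) | digit;
    }

    *mask = result;
    return true;
}
```

With this shape, "123" and "0x123" produce the same bitmap, which is the
behaviour the thread asks for in both dpdk-lcore-mask and pmd-cpu-mask.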


Re: [ovs-dev] [PATCH 2/4] tests: Remove trim_zeros() from ovn tests.

2016-07-29 Thread Daniele Di Proietto





On 28/07/2016 22:22, "Ben Pfaff" <b...@ovn.org> wrote:

>On Thu, Jul 28, 2016 at 07:58:04PM -0700, Daniele Di Proietto wrote:
>> trim_zeros() is not necessary anymore, since now we don't pad packets in
>> the userspace datapath.
>> 
>> Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
>
>Oops, I just committed a patch that made a change related to trim_zeros,
>so you'll have to adjust this.  Sorry!  Still, I imagine that most of it was
>sed -e 's/| trim_zeros//' -i ovn.at

No problem, it made my patch simpler!

>
>Acked-by: Ben Pfaff <b...@ovn.org>

Thanks for the reviews, I've pushed this to master
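
For readers unfamiliar with the helper being removed: the sed expression in
trim_zeros() strips trailing "00" pairs (zero bytes) from a hex-encoded
packet. A rough C equivalent, given only as a sketch (trim_zero_bytes() is a
hypothetical name, not something in the OVS tree), would be:

```c
#include <string.h>

/* Hypothetical helper: strips trailing "00" pairs from the hex string 's'
 * in place, mirroring sed 's/\(00\)\{1,\}$//'.  Only an even number of
 * trailing '0' characters is removed, so "f000" becomes "f0". */
static void
trim_zero_bytes(char *s)
{
    size_t len = strlen(s);
    size_t zeros = 0;

    /* Count the run of '0' characters at the end of the string. */
    while (zeros < len && s[len - 1 - zeros] == '0') {
        zeros++;
    }

    /* Drop the largest even-length part of that run. */
    s[len - (zeros & ~(size_t) 1)] = '\0';
}
```

Once the userspace datapath stops padding packets, the expected and received
hex strings already match, so this normalization step has nothing left to do.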


Re: [ovs-dev] [PATCH v5 13/16] system-tests: Run conntrack tests with userspace.

2016-07-28 Thread Daniele Di Proietto





On 27/07/2016 14:18, "Ben Pfaff" <b...@ovn.org> wrote:

>On Wed, Jul 27, 2016 at 01:51:00PM -0700, Jesse Gross wrote:
>> On Wed, Jul 27, 2016 at 1:40 PM, Daniele Di Proietto
>> <diproiet...@vmware.com> wrote:
>> > On 27/07/2016 13:12, "Joe Stringer" <j...@ovn.org> wrote:
>> >
>> >>On 26 July 2016 at 17:58, Daniele Di Proietto <diproiet...@vmware.com> 
>> >>wrote:
>> >>> The userspace connection tracker doesn't support ALGs, frag reassembly
>> >>> or NAT yet, so skip those tests.
>> >>>
>> >>> Also, connection tracking state input from a local port is not possible
>> >>> in userspace.
>> >>>
>> >>> The userspace datapath pads all frames with 0, to make them at
>> >>> least 64 bytes.
>> >>>
>> >>> Finally, the userspace datapath checks for the IPv4 header checksum, so
>> >>> fix those in the hardcoded packets.
>> >>>
>> >>> Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
>> >>> Acked-by: Joe Stringer <j...@ovn.org>
>> >>> Acked-by: Flavio Leitner <f...@sysclose.org>
>> >>> ---
>> >>
>> >>
>> >>
>> >>> @@ -1324,11 +1327,11 @@ dnl UDP packets from ns0->ns1 should solicit "destination unreachable" response.
>> >>>  NS_CHECK_EXEC([at_ns0], [bash -c "echo a | nc $NC_EOF_OPT -u 10.1.1.2 1"])
>> >>>
>> >>>  AT_CHECK([ovs-appctl revalidator/purge], [0])
>> >>> -AT_CHECK([ovs-ofctl dump-flows br0 | ofctl_strip | sort | grep -v drop], [0], [dnl
>> >>> - n_packets=1, n_bytes=44, priority=100,udp,in_port=1 actions=ct(commit,exec(load:0x1->NXM_NX_CT_MARK[[]])),output:2
>> >>> - n_packets=1, n_bytes=72, priority=100,ct_state=+rel+trk,ct_mark=0x1,icmp,in_port=2 actions=output:1
>> >>> - n_packets=1, n_bytes=72, priority=100,ct_state=-trk,icmp,in_port=2 actions=ct(table=0)
>> >>> - n_packets=2, n_bytes=84, priority=10,arp actions=NORMAL
>> >>> +AT_CHECK([ovs-ofctl dump-flows br0 | ofctl_strip | sort | grep -v drop | sed -e 's/n_bytes=[[0-9]]*/n_bytes=/g'], [0], [dnl
>> >>> + n_packets=1, n_bytes=, priority=100,udp,in_port=1 actions=ct(commit,exec(load:0x1->NXM_NX_CT_MARK[[]])),output:2
>> >>> + n_packets=1, n_bytes=, priority=100,ct_state=+rel+trk,ct_mark=0x1,icmp,in_port=2 actions=output:1
>> >>> + n_packets=1, n_bytes=, priority=100,ct_state=-trk,icmp,in_port=2 actions=ct(table=0)
>> >>> + n_packets=2, n_bytes=, priority=10,arp actions=NORMAL
>> >>>  NXST_FLOW reply:
>> >>>  ])
>> >>
>> >>I think this is a completely orthogonal point, but it's still a bit
>> >>surprising to me that the n_bytes would differ when receiving short
>> >>packets in kernel vs. userspace datapaths. I follow that userspace
>> >>pads shorter packets on receive, but shouldn't we be able to attribute
>> >>these stats consistently, regardless of the datapath?
>> >
>> > That's a good point.
>> >
>> > We call dp_packet_pad() in netdev_linux_recv().  That used to be in 
>> > netdev_recv() and can be traced back to the initial OVS commit.  Here's a 
>> > comment in netdev_recv():
>> >
>> > COVERAGE_INC(netdev_received);
>> > buffer->size += n_bytes;
>> >
>> > /* When the kernel internally sends out an Ethernet frame on an
>> >  * interface, it gives us a copy *before* padding the frame to the
>> >  * minimum length.  Thus, when it sends out something like an ARP
>> >  * request, we see a too-short frame.  So pad it out to the minimum
>> >  * length. */
>> > pad_to_minimum_length(buffer);
>> 
>> I wonder if anything in OVS actually cares about this anymore? I don't
>> know the history of that comment.
>
>I don't remember the origin anymore but it was probably my comment.
>It's possible that some code in OVS assumed that packets were at least
>64 bytes long at some point.  For example, flow_extract() might have
>been able to be slightly simplified on the basis that access to the IPv4
>header couldn't be beyond the end of the buffer.
>
>I doubt we do that kind of optimization any longer.

Thanks for clarifying.  I agree, it doesn't look like it's needed anymore.

I've sent a couple of patches to remove that, along with the connection tracker 
system tests for userspace here:

http://openvswitch.org/pipermail/dev/2016-July/076659.html
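
As a reference for the padding discussed above, this is roughly what
"pad to the minimum length" amounts to. It is a sketch, not the OVS
dp_packet_pad() implementation: pad_to_min_len() is a hypothetical helper,
and the 60-byte minimum (14-byte header plus 46-byte payload, without the
4-byte FCS) is an assumption of the example.

```c
#include <stddef.h>
#include <string.h>

/* Assumed minimum Ethernet frame length without the FCS. */
#define FRAME_MIN_LEN 60

/* Hypothetical helper: zero-pads the 'len'-byte frame stored in 'buf'
 * (which can hold 'cap' bytes) up to FRAME_MIN_LEN.  Returns the new frame
 * length, or 0 if the buffer is too small to hold the padded frame. */
static size_t
pad_to_min_len(unsigned char *buf, size_t len, size_t cap)
{
    if (len >= FRAME_MIN_LEN) {
        return len;                 /* Already long enough. */
    }
    if (cap < FRAME_MIN_LEN) {
        return 0;                   /* No room for the padding. */
    }
    memset(buf + len, 0, FRAME_MIN_LEN - len);
    return FRAME_MIN_LEN;
}
```

Under that assumption a short frame is counted with its padded length in
userspace but with its original length in the kernel datapath, which is the
kind of n_bytes difference the sed filter in the test hides.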


[ovs-dev] [PATCH 4/4] system-tests: Add ping through conntrack test.

2016-07-28 Thread Daniele Di Proietto
Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
Acked-by: Joe Stringer <j...@ovn.org>
---
 tests/system-traffic.at | 84 +
 1 file changed, 84 insertions(+)

diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index 5732d9b..de657e6 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -608,6 +608,90 @@ NS_CHECK_EXEC([at_ns1], [wget http://[[fc00::1]] -t 3 -T 1 -v -o wget1.log], [4]
 OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
+AT_SETUP([conntrack - IPv4 ping])
+CHECK_CONNTRACK()
+OVS_TRAFFIC_VSWITCHD_START()
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24")
+ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24")
+
+dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from ns1->ns0.
+AT_DATA([flows.txt], [dnl
+priority=1,action=drop
+priority=10,arp,action=normal
+priority=100,in_port=1,icmp,action=ct(commit),2
+priority=100,in_port=2,icmp,ct_state=-trk,action=ct(table=0)
+priority=100,in_port=2,icmp,ct_state=+trk+est,action=1
+])
+
+AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
+
+dnl Pings from ns0->ns1 should work fine.
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0], [dnl
+icmp,orig=(src=10.1.1.1,dst=10.1.1.2,id=,type=8,code=0),reply=(src=10.1.1.2,dst=10.1.1.1,id=,type=0,code=0)
+])
+
+AT_CHECK([ovs-appctl dpctl/flush-conntrack])
+
+dnl Pings from ns1->ns0 should fail.
+NS_CHECK_EXEC([at_ns1], [ping -q -c 3 -i 0.3 -w 2 10.1.1.1 | FORMAT_PING], [0], [dnl
+7 packets transmitted, 0 received, 100% packet loss, time 0ms
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([conntrack - IPv6 ping])
+CHECK_CONNTRACK()
+OVS_TRAFFIC_VSWITCHD_START()
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH(p0, at_ns0, br0, "fc00::1/96")
+ADD_VETH(p1, at_ns1, br0, "fc00::2/96")
+
+AT_DATA([flows.txt], [dnl
+
+dnl ICMPv6 echo request and reply go to table 1.  The rest of the traffic goes
+dnl through normal action.
+table=0,priority=10,icmp6,icmp_type=128,action=goto_table:1
+table=0,priority=10,icmp6,icmp_type=129,action=goto_table:1
+table=0,priority=1,action=normal
+
+dnl Allow everything from ns0->ns1. Only allow return traffic from ns1->ns0.
+table=1,priority=100,in_port=1,icmp6,action=ct(commit),2
+table=1,priority=100,in_port=2,icmp6,ct_state=-trk,action=ct(table=0)
+table=1,priority=100,in_port=2,icmp6,ct_state=+trk+est,action=1
+table=1,priority=1,action=drop
+])
+
+AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
+
+OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::2])
+
+dnl Pings from ns1->ns0 should fail.
+NS_CHECK_EXEC([at_ns1], [ping6 -q -c 3 -i 0.3 -w 2 fc00::1 | FORMAT_PING], [0], [dnl
+7 packets transmitted, 0 received, 100% packet loss, time 0ms
+])
+
+dnl Pings from ns0->ns1 should work fine.
+NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00::2 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(fc00::2)], [0], [dnl
+icmpv6,orig=(src=fc00::1,dst=fc00::2,id=,type=128,code=0),reply=(src=fc00::2,dst=fc00::1,id=,type=129,code=0)
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
 AT_SETUP([conntrack - commit, recirc])
 CHECK_CONNTRACK()
 OVS_TRAFFIC_VSWITCHD_START()
-- 
2.8.1



[ovs-dev] [PATCH 2/4] tests: Remove trim_zeros() from ovn tests.

2016-07-28 Thread Daniele Di Proietto
trim_zeros() is not necessary anymore, since now we don't pad packets in
the userspace datapath.

Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
---
 tests/ovn.at | 135 +++
 1 file changed, 43 insertions(+), 92 deletions(-)

diff --git a/tests/ovn.at b/tests/ovn.at
index dfdd110..5761981 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -737,9 +737,6 @@ vif_to_hv() {
 # digits) and Ethernet type ETHTYPE (4 hex digits).  The OUTPORTs (zero or
 # more) list the VIFs on which the packet should be received.  INPORT and the
 # OUTPORTs are specified as logical switch port numbers, e.g. 11 for vif11.
-trim_zeros() {
-sed 's/\(00\)\{1,\}$//'
-}
 for i in 1 2 3; do
 for j in 1 2 3; do
 : > $i$j.expected
@@ -751,7 +748,7 @@ test_packet() {
 vif=vif$inport
 as $hv ovs-appctl netdev-dummy/receive $vif $packet
 for outport; do
-echo $packet | trim_zeros >> $outport.expected
+echo $packet >> $outport.expected
 done
 }
 
@@ -935,7 +932,7 @@ for i in 1 2 3; do
 for j in 1 2 3; do
 file=hv$i/vif$i$j-tx.pcap
 echo $file
-$PYTHON "$top_srcdir/utilities/ovs-pcap.in" $file | trim_zeros > $i$j.packets
+$PYTHON "$top_srcdir/utilities/ovs-pcap.in" $file > $i$j.packets
 sort $i$j.expected > expout
 AT_CHECK([sort $i$j.packets], [0], [expout])
 echo
@@ -1038,9 +1035,6 @@ vif_to_hv() {
 # digits) and Ethernet type ETHTYPE (4 hex digits).  The OUTPORTs (zero or
 # more) list the VIFs on which the packet should be received.  INPORT and the
 # OUTPORTs are specified as logical switch port numbers, e.g. 11 for vif11.
-trim_zeros() {
-sed 's/\(00\)\{1,\}$//'
-}
 for i in 1 2; do
 for j in 1 2 3 4 5; do
 : > $i$j.expected
@@ -1053,7 +1047,7 @@ test_packet() {
 vif=vif$inport
 as $hv ovs-appctl netdev-dummy/receive $vif $packet
 for outport; do
-echo $packet | trim_zeros >> $outport.expected
+echo $packet >> $outport.expected
 done
 }
 
@@ -1120,7 +1114,7 @@ for i in 1 2; do
 for j in 1 2 3 4 5; do
 file=hv$i/vif$i$j-tx.pcap
 echo $file
-$PYTHON "$top_srcdir/utilities/ovs-pcap.in" $file | trim_zeros > $i$j.packets
+$PYTHON "$top_srcdir/utilities/ovs-pcap.in" $file > $i$j.packets
 sort $i$j.expected > expout
 AT_CHECK([sort $i$j.packets], [0], [expout])
 echo
@@ -1223,9 +1217,6 @@ sleep 1
 # digits) and Ethernet type ETHTYPE (4 hex digits).  The OUTPORTs (zero or
 # more) list the VIFs on which the packet should be received.  INPORT and the
 # OUTPORTs are specified as logical switch port numbers, e.g. 1 for vif1.
-trim_zeros() {
-sed 's/\(00\)\{1,\}$//'
-}
 for i in 1 2 3; do
 : > $i.expected
 done
@@ -1236,7 +1227,7 @@ test_packet() {
 vif=vif$inport
 as $hv ovs-appctl netdev-dummy/receive $vif $packet
 for outport; do
-echo $packet | trim_zeros >> $outport.expected
+echo $packet >> $outport.expected
 done
 }
 
@@ -1302,7 +1293,7 @@ AT_CHECK([as hv3 ovs-ofctl -O OpenFlow13 show br-int], [1], [],
 for i in 1 2 3; do
 file=hv$i/vif$i-tx.pcap
 echo $file
-$PYTHON "$top_srcdir/utilities/ovs-pcap.in" $file | trim_zeros > $i.packets
+$PYTHON "$top_srcdir/utilities/ovs-pcap.in" $file > $i.packets
 sort $i.expected > expout
 AT_CHECK([sort $i.packets], [0], [expout])
 echo
@@ -1384,9 +1375,6 @@ sleep 1
 # digits) and Ethernet type ETHTYPE (4 hex digits).  The OUTPORTs (zero or
 # more) list the VIFs on which the packet should be received.  INPORT and the
 # OUTPORTs are specified as lport numbers, e.g. 1 for vif1.
-trim_zeros() {
-sed 's/\(00\)\{1,\}$//'
-}
 for i in 1 2 3; do
 : > $i.expected
 done
@@ -1397,7 +1385,7 @@ test_packet() {
 vif=vif$inport
 as $hv ovs-appctl netdev-dummy/receive $vif $packet
 for outport; do
-echo $packet | trim_zeros >> $outport.expected
+echo $packet >> $outport.expected
 done
 }
 
@@ -1468,7 +1456,7 @@ as hv3 ovs-ofctl -O OpenFlow13 dump-flows br-phys
 for i in 1 2 3; do
 file=hv$i/vif$i-tx.pcap
 echo $file
-$PYTHON "$top_srcdir/utilities/ovs-pcap.in" $file | trim_zeros > $i.packets
+$PYTHON "$top_srcdir/utilities/ovs-pcap.in" $file > $i.packets
 sort $i.expected > expout
 AT_CHECK([sort $i.packets], [0], [expout])
 echo
@@ -1593,9 +1581,6 @@ sleep 1
 # digits) and Ethernet type ETHTYPE (4 hex digits).  The OUTPORTs (zero or
 # more) list the VIFs on which the packet should be received.  INPORT and the
 # OUTPORTs are specified as logical switch port numbers, e.g. 123 for vif123.
-trim_zeros() {
-sed 's/\(00\)\{1,\}$//'
-}
 for i in 1 2 3; do
 for j in 1 2 3; do
 for k in 1 2 3; do
@@ -1623,7 +1608,7 @@ 

[ovs-dev] [PATCH 3/4] system-tests: Run conntrack tests with userspace.

2016-07-28 Thread Daniele Di Proietto
The userspace connection tracker doesn't support ALGs, frag reassembly
or NAT yet, so skip those tests.

Also, connection tracking state input from a local port is not possible
in userspace.

Finally, the userspace datapath checks for the IPv4 header checksum, so
fix those in the hardcoded packets.

Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
Acked-by: Joe Stringer <j...@ovn.org>
Acked-by: Flavio Leitner <f...@sysclose.org>
---
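
For reference, the IPv4 header checksum mentioned in the commit message is
the standard Internet checksum (RFC 1071). The sketch below shows how such a
value can be computed or verified for a hardcoded packet. It is not taken
from the OVS sources; inet_checksum() is a hypothetical helper written for
this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: computes the Internet checksum (RFC 1071) over the
 * 'len' bytes at 'data', treating them as big-endian 16-bit words.
 *
 * To fill in an IPv4 header, zero the checksum field, call this over the
 * header and store the result in network byte order.  To verify a header,
 * call it over the header as received: a valid header yields 0.
 *
 * The 32-bit accumulator is sufficient for headers and ordinary packets. */
static uint16_t
inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    /* Add up all complete 16-bit words. */
    for (i = 0; i + 1 < len; i += 2) {
        sum += (uint32_t) data[i] << 8 | data[i + 1];
    }

    /* A trailing odd byte is padded with a zero low byte. */
    if (len & 1) {
        sum += (uint32_t) data[len - 1] << 8;
    }

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);
    }

    return (uint16_t) ~sum;
}
```

A packet whose header checksum field is left at zero or carries a stale
value would fail this verification, which is why the hardcoded packets in
the tests need a correct value for the userspace datapath.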
 tests/system-kmod-macros.at  | 28 +
 tests/system-ovn.at  |  3 +++
 tests/system-traffic.at  | 32 +---
 tests/system-userspace-macros.at | 45 +---
 4 files changed, 102 insertions(+), 6 deletions(-)

diff --git a/tests/system-kmod-macros.at b/tests/system-kmod-macros.at
index 2134db7..e1b5707 100644
--- a/tests/system-kmod-macros.at
+++ b/tests/system-kmod-macros.at
@@ -67,3 +67,31 @@ m4_define([CHECK_CONNTRACK],
  on_exit 'ovstest test-netlink-conntrack flush'
 ]
 )
+
+# CHECK_CONNTRACK_ALG()
+#
+# Perform requirements checks for running conntrack ALG tests. The kernel
+# supports ALG, so no check is needed.
+#
+m4_define([CHECK_CONNTRACK_ALG])
+
+# CHECK_CONNTRACK_FRAG()
+#
+# Perform requirements checks for running conntrack fragmentations tests.
+# The kernel always supports fragmentation, so no check is needed.
+m4_define([CHECK_CONNTRACK_FRAG])
+
+# CHECK_CONNTRACK_LOCAL_STACK()
+#
+# Perform requirements checks for running conntrack tests with local stack.
+# The kernel always supports reading the connection state of an skb coming
+# from an internal port, without an explicit ct() action, so no check is
+# needed.
+m4_define([CHECK_CONNTRACK_LOCAL_STACK])
+
+# CHECK_CONNTRACK_NAT()
+#
+# Perform requirements checks for running conntrack NAT tests. The kernel
+# always supports NAT, so no check is needed.
+#
+m4_define([CHECK_CONNTRACK_NAT])
diff --git a/tests/system-ovn.at b/tests/system-ovn.at
index 2a94d68..b96b260 100755
--- a/tests/system-ovn.at
+++ b/tests/system-ovn.at
@@ -2,6 +2,7 @@ AT_SETUP([ovn -- 2 LRs connected via LS, gateway router, SNAT and DNAT])
 AT_KEYWORDS([ovnnat])
 
 CHECK_CONNTRACK()
+CHECK_CONNTRACK_NAT()
 ovn_start
 OVS_TRAFFIC_VSWITCHD_START()
 ADD_BR([br-int])
@@ -172,6 +173,7 @@ AT_SETUP([ovn -- 2 LRs connected via LS, gateway router, easy SNAT])
 AT_KEYWORDS([ovnnat])
 
 CHECK_CONNTRACK()
+CHECK_CONNTRACK_NAT()
 ovn_start
 OVS_TRAFFIC_VSWITCHD_START()
 ADD_BR([br-int])
@@ -277,6 +279,7 @@ AT_SETUP([ovn -- load-balancing])
 AT_KEYWORDS([ovnlb])
 
 CHECK_CONNTRACK()
+CHECK_CONNTRACK_NAT()
 ovn_start
 OVS_TRAFFIC_VSWITCHD_START()
 ADD_BR([br-int])
diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index 1cdc2d2..5732d9b 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -510,13 +510,13 @@ AT_CAPTURE_FILE([ofctl_monitor.log])
 AT_CHECK([ovs-ofctl monitor br0 65534 invalid_ttl --detach --no-chdir --pidfile 2> ofctl_monitor.log])
 
 dnl Send an unsolicited reply from port 2. This should be dropped.
-AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 2 ct\(table=0\) '5054000a505400090800451c00110a0101020a010101000200010008'])
+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 2 ct\(table=0\) '5054000a505400090800451c0011a4cd0a0101020a010101000200010008'])
 
 dnl OK, now start a new connection from port 1.
-AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 1 ct\(commit\),controller '5054000a505400090800451c00110a0101010a010102000100020008'])
+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 1 ct\(commit\),controller '5054000a505400090800451c0011a4cd0a0101010a010102000100020008'])
 
 dnl Now try a reply from port 2.
-AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 2 ct\(table=0\) '5054000a505400090800451c00110a0101020a010101000200010008'])
+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 2 ct\(table=0\) '5054000a505400090800451c0011a4cd0a0101020a010101000200010008'])
 
 dnl Check this output. We only see the latter two packets, not the first.
 AT_CHECK([cat ofctl_monitor.log], [0], [dnl
@@ -906,6 +906,7 @@ AT_CLEANUP
 
 AT_SETUP([conntrack - multiple zones, local])
 CHECK_CONNTRACK()
+CHECK_CONNTRACK_LOCAL_STACK()
 OVS_TRAFFIC_VSWITCHD_START()
 
 ADD_NAMESPACES(at_ns0)
@@ -953,6 +954,7 @@ AT_CLEANUP
 
 AT_SETUP([conntrack - multiple namespaces, internal ports])
 CHECK_CONNTRACK()
+CHECK_CONNTRACK_LOCAL_STACK()
 OVS_TRAFFIC_VSWITCHD_START(
[set-fail-mode br0 secure -- ])
 
@@ -993,6 +995,7 @@ AT_CLEANUP
 
 AT_SETUP([conntrack - multi-stage pipeline, local])
 CHECK_CONNTRACK()
+CHECK_CONNTRACK_LOCAL_STACK()
 OVS_TRAFFIC_VSWITCHD_START()
 
 ADD_NAMESPACES(at_ns0)
@@ -1382,6 +1385,7 @@ AT_CLEANUP
 AT_SETUP([conntrack - FTP])
 AT_SKIP_IF([test $HAVE_PYFTPDLIB = no])
 CHECK_CONNTRACK()
+CHECK_CONNTRACK_ALG()
 OVS_T

Re: [ovs-dev] [PATCH v5 00/16] Userspace (DPDK) connection tracker

2016-07-28 Thread Daniele Di Proietto
I pushed the intended version, except that I forgot to squash that commit with
the previous one.  Sorry about this.


On 28/07/2016 00:40, "Ilya Maximets" <i.maxim...@samsung.com> wrote:

>Sorry.
>TO: Daniele Di Proietto <diproiet...@vmware.com>
>
>On 28.07.2016 09:27, Ilya Maximets wrote:
>> I guess, you pushed some development version of this patch set.
>> 
>> There is strange commit there:
>> 
>> commit 6c54734ed27bc22975d7035a6bd5f32a412335a0
>> Author: Daniele Di Proietto <diproiet...@vmware.com>
>> Date:   Wed Jul 27 18:32:15 2016 -0700
>> 
>> XXX Improve comment.
>> 
>> 
>> Best regards, Ilya Maximets.
>> 


Re: [ovs-dev] Mea culpa - rewound master branch

2016-07-28 Thread Daniele Di Proietto
2016-07-28 10:52 GMT-07:00 Flavio Leitner :

> On Thu, Jul 28, 2016 at 09:32:54AM -0700, Ben Pfaff wrote:
> > On Thu, Jul 28, 2016 at 12:49:37PM -0300, Flavio Leitner wrote:
> > > On Wed, Jul 27, 2016 at 11:49:18AM -0700, Ben Pfaff wrote:
> > > > Some time ago this morning, I accidentally pushed several patches
> that
> > > > were still under review to master.  I've force-pushed a correction to
> > > > the branch.  My apologies--I hope that this does not screw up
> anyone's
> > > > development process in a bad way.
> > >
> > > There is another weird one:
> > > $ git log master..origin/master --oneline
> > > [...]
> > >   f3edb dpif-netdev: Execute conntrack action.
> > >   b412490 tests: Add test-conntrack pcap test.
> > >   8cb1462 tests: Add very simple conntrack benchmark.
> > > * 6c54734 XXX Improve comment.
> > >   e6ef6cc conntrack: Periodically delete expired connections.
> > >   a489b16 conntrack: New userspace connection tracker.
> > > [...]
> >
> > Probably should have been squashed before Daniele pushed it, but not
> > really so bad if that's the only issue.
>
> Yup, hopefully it is just that.
>
>
Sorry, I forgot to squash that commit before pushing it and when I realized
it, it was too late to do a forced push.

Apologies for the inconvenience guys,

Daniele


Re: [ovs-dev] [PATCH v5 00/16] Userspace (DPDK) connection tracker

2016-07-28 Thread Daniele Di Proietto
Thanks for the reviews, I pushed this to master except for the system tests 
part.



On 26/07/2016 17:58, "Daniele Di Proietto" <diproiet...@vmware.com> wrote:

>This series aims to implement the ct() action for the dpif-netdev datapath.
>The bulk of the code is in the new conntrack module: it contains some packet
>parsing code, some lookup tables and the logic to implement all the ct bits.
>
>The conntrack module is helped by conntrack-tcp, for TCP window and flags
>tracking: the bulk of the code of this submodule is from the FreeBSD's pf
>subsystem, therefore is BSD licensed.
>
>The rest of the series integrates the connection tracker with the rest of
>OVS: the ct() action is implemented in dpif-netdev, and the debugging
>interfaces required by dpctl/{dump,flush}-conntrack are implemented.
>
>Besides adding some unit tests, this series ports the existing conntrack
>system test to the userspace datapath.  Some small modifications are
>required to pass the testsuite, and some tests still have to be skipped.
>
>This can also be downloaded at:
>
>https://github.com/ddiproietto/ovs/tree/userconntrack_20160726
>
>Any feedback is appreciated, thanks.
>
>v4 -> v5:
>* Rebase: hmap.h is moved, include ct_* field in some unit tests,
>  skip and adapt to the new ct dump format the OVN tests.
>* Style and typo fixes.
>* Add coverage counter to detect long cleanup.
>* Use ovs_barrier instead of pthread_barrier in test (fix compilation
>  on OS X).
>* Fix dumping tcp state in the reply direction.
>* Squash together flow_compose improvements (checksum and udp_len).
>
>v3 -> v4:
>* Rebase: use struct dp_packet_batch, add extra ct_ fields in some
>  new tests, use struct hmap_pos, skip some new system NAT tests.
>* Style and typo fixes.
>* Add OVS_NOT_REACHED() in switch in process_one().
>* New commit: use dl_type from flow or matching megaflow.
>
>v2 -> v3:
>* Rebased.
>* Squashed commits for flushing (in dpif-netdev and conntrack).
>* Squashed commits for dumping (in dpif-netdev and conntrack).
>* Use adaptive mutex instead of spinlock: this prevents livelock
>  if the cleanup thread is executed on the same CPU as a forwarding
>  thread.  Performance impact is minimal.
>* Validate L3 and L4 checksum.
>* Use proper L3 and L4 checksum in hardcoded packets in system and unit
>  tests.
>* Consider ICMPv6 as well as ICMP in l4_protos and conn_key_to_tuple.
>* Mention conntrack in NEWS and FAQ.md.
>* Use uint16_t for ct_state.
>* Fix possible NULL dereference for conn in process_one().
>* Add OVS_U128_MIN, OVS_U128_ZERO.
>* Use HMAP_FOR_EACH_POP.
>* Check that UDP length is valid.
>* Style fix: prefer 'sizeof *object' instead of 'sizeof type'
>* Don't accept packets from/to UDP/TCP port 0.
>* Use defines for timeouts.
>* Check expiration inside lookup loop in conn_key_lookup().
>* Limit the number of connections.
>* Simplify case if tcp_get_wscale().
>* Introduce general INT_MOD_* macros for comparisons in modular arithmetic.
>* Improve comments.
>* New cleanup mechanism: we keep connections in an ordered list and we have
>  a separate thread to perform the cleanup.  This doesn't block the main
>  thread for long intervals anymore.
>* Correctly fill UDP length and UDP/TCP/ICMP checksums in flow_compose():
>  it's useful to write testcases for the connection tracker.
>* Added system test with ICMP traffic through the connection tracker.
>* Track ICMP type and code.
>
>v1 -> v2:
>* Fixed bug in tcp_get_wscale(), related to TCP options parsing.
>* Changed names of ICMP constants: now they're different from Linux and
>  FreeBSD.
>* Fixed bug in parse_ipv6_ext_hdrs().
>* Used ALWAYS_INLINE in parse_vlan and parse_ethertype, to avoid a
>  performance regression in miniflow_extract().
>* Updated copyright info in COPYING and debian/copyright.in.
>* Rebased.
>* Changed batching strategy in conntrack_execute() to allow a newly
>  created connection to be picked up by packets in the same batch.
>* Added an ovs-test module to throw pcap files at the connection tracker.
>* Added a workaround for the userspace testsuite on new kernels and a tcp
>  non-conntrack test.
>
>
>
>Daniele Di Proietto (16):
>  packets: Define ICMP types.
>  flow: Export parse_ipv6_ext_hdrs().
>  flow: Introduce parse_dl_type().
>  conntrack: New userspace connection tracker.
>  conntrack: Periodically delete expired connections.
>  tests: Add very simple conntrack benchmark.
>  tests: Add test-conntrack pcap test.
>  dpif-netdev: Execute conntrack action.
>  dpif-netdev: Implement conntrack dump functions.
>  dpif-netdev: Implement conntrack flush interface.
>  flow: Generate checksum and udp_len in flow_compose().
>  tests: Ad

Re: [ovs-dev] [PATCH v5 05/16] conntrack: Periodically delete expired connections.

2016-07-28 Thread Daniele Di Proietto





On 27/07/2016 21:13, "Joe Stringer" <j...@ovn.org> wrote:

>On 27 July 2016 at 19:01, Daniele Di Proietto <diproiet...@vmware.com> wrote:
>>
>>
>>
>>
>>
>> On 27/07/2016 17:14, "Joe Stringer" <j...@ovn.org> wrote:
>>
>>>On 26 July 2016 at 17:58, Daniele Di Proietto <diproiet...@vmware.com> wrote:
>>>> This commit adds a thread that periodically removes expired connections.
>>>>
>>>> The expiration time of a connection can be expressed by:
>>>>
>>>> expiration = now + timeout
>>>>
>>>> For each possible 'timeout' value (there aren't many) we keep a list.
>>>> When the expiration is updated, we move the connection to the back of the
>>>> corresponding 'timeout' list. This way, the list is always ordered by
>>>> 'expiration'.
>>>>
>>>> When the cleanup thread iterates through the lists for expired
>>>> connections, it can stop at the first non expired connection.
>>>>
>>>> Suggested-by: Joe Stringer <j...@ovn.org>
>>>> Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
>>>
>>>Acked-by: Joe Stringer <j...@ovn.org>
>>
>> Thanks for the review!
>>
>>>
>>>Minor comments on comments below. Thanks!
>>>
>>>
>>>
>>>> +/* Cleanup:
>>>> + *
>>>
>>>Extra line.
>>
>> Fixed
>>
>>>
>>>> + * We must call conntrack_clean() periodically.  conntrack_clean() return
>>>> + * value gives an hint on when the next cleanup must be done (either because
>>>> + * there is an actual connection that expires, or because a new connection
>>>> + * might be created with the minimum timeout).
>>>> + *
>>>> + * The logic below has two goals:
>>>> + *
>>>> + * - Avoid calling conntrack_clean() too often.  If we call conntrack_clean()
>>>> + *   each time a connection expires, the thread will consume 100% CPU, so we
>>>> + *   try to call the function _at most_ once every CT_CLEAN_INTERVAL, to batch
>>>> + *   removal.
>>>
>>>Isn't it CT_CLEAN_MIN_INTERVAL that prevents calls happening too
>>>often? I would imagine that under high cps conditions where you're
>>>likely to peg 100% on cleanup, cleanup is behind and CT_CLEAN_INTERVAL
>>>logic won't come into the picture.
>>>
>>>> + * - On the other hand, it's not a good idea to keep the buckets locked for
>>>> + *   too long, as we might prevent traffic from flowing.  If conntrack_clean()
>>>> + *   returns a value which is in the past, it means that the internal limit
>>>> + *   has been reached and more cleanup is required.  In this case, just wait
>>>> + *   CT_CLEAN_MIN_INTERVAL before the next call.
>>>> + */
>>>> + */
>>>
>>>Keeping the buckets locked too long also happens if you constantly
>>>call conntrack_clean(), so I think these two paragraphs are arguing
>>>slightly different angles for the same parameter.
>>>
>>>CT_CLEAN_MIN_INTERVAL ensures that if cleanup is behind, there are
>>>at least some 200ms blocks of time when buckets will be left alone so
>>>the datapath can operate unhindered.
>>>
>>>CT_CLEAN_INTERVAL ensures that if we are coping with the current
>>>cleanup tasks, then we wait at least 5 seconds to do further cleanup.
>>>This seems like it's more targeted towards reducing wakeups when there
>>>is a wide distribution of timeouts but relatively small number of
>>>connections, that could be handled by less frequent cleanups.
>>>
>>>I like the "the logic has two goals" presentation of this, but maybe
>>>there is a better way we can frame the comment above?
>>
>> I couldn't have said it better, I almost stole your wording entirely:
>>
>> + * The logic below has two goals:
>> + *
>> + * - Avoid calling conntrack_clean() too often.  If we call conntrack_clean()
>> + *   each time a connection expires, the thread will consume 100% CPU, so we
>> + *   try to call the function _at most_ once every CT_CLEAN_INTERVAL, to batch
>> + *   removal.
>> + *
>> + * - On the other hand, it's not a good idea to keep the buckets locked for
>> + *   too long, as we might prevent traffic from fl

Re: [ovs-dev] [PATCH v5 05/16] conntrack: Periodically delete expired connections.

2016-07-27 Thread Daniele Di Proietto





On 27/07/2016 17:14, "Joe Stringer" <j...@ovn.org> wrote:

>On 26 July 2016 at 17:58, Daniele Di Proietto <diproiet...@vmware.com> wrote:
>> This commit adds a thread that periodically removes expired connections.
>>
>> The expiration time of a connection can be expressed by:
>>
>> expiration = now + timeout
>>
>> For each possible 'timeout' value (there aren't many) we keep a list.
>> When the expiration is updated, we move the connection to the back of the
>> corresponding 'timeout' list. This way, the list is always ordered by
>> 'expiration'.
>>
>> When the cleanup thread iterates through the lists for expired
>> connections, it can stop at the first non expired connection.
>>
>> Suggested-by: Joe Stringer <j...@ovn.org>
>> Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
>
>Acked-by: Joe Stringer <j...@ovn.org>

Thanks for the review!
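
The per-timeout lists described in the commit message above can be sketched
as follows.  This is only an illustration under stated assumptions: the
struct and function names (conn, exp_list, exp_list_sweep, ...) are
hypothetical and are not taken from the patch.

```c
#include <stddef.h>

/* Hypothetical connection record; names are illustrative only. */
struct conn {
    long long expiration;       /* Absolute expiration time, in ms. */
    struct conn *prev, *next;   /* Links within one per-timeout list. */
};

/* Circular doubly linked list with a sentinel head.  Every entry on the
 * list shares the same 'timeout'. */
struct exp_list {
    struct conn head;
    long long timeout;
};

static void
exp_list_init(struct exp_list *list, long long timeout)
{
    list->head.prev = list->head.next = &list->head;
    list->timeout = timeout;
}

static void
conn_unlink(struct conn *c)
{
    c->prev->next = c->next;
    c->next->prev = c->prev;
}

static void
exp_list_push_back(struct exp_list *list, struct conn *c)
{
    c->prev = list->head.prev;
    c->next = &list->head;
    list->head.prev->next = c;
    list->head.prev = c;
}

/* Adds a new connection with expiration = now + timeout. */
static void
conn_insert(struct exp_list *list, struct conn *c, long long now)
{
    c->expiration = now + list->timeout;
    exp_list_push_back(list, c);
}

/* Refreshes 'c', which must already be on 'list'.  Because 'now' never
 * decreases, moving the entry to the back keeps the list sorted. */
static void
conn_update_expiration(struct exp_list *list, struct conn *c, long long now)
{
    conn_unlink(c);
    c->expiration = now + list->timeout;
    exp_list_push_back(list, c);
}

/* Unlinks expired entries from the front and stops at the first entry
 * that has not expired.  Returns the number of entries removed. */
static size_t
exp_list_sweep(struct exp_list *list, long long now)
{
    size_t n = 0;

    while (list->head.next != &list->head
           && list->head.next->expiration <= now) {
        conn_unlink(list->head.next);
        n++;
    }
    return n;
}
```

Since every entry on a list has the same timeout, the sweep never has to
look past the first live entry, so its cost is proportional to the number
of expired connections rather than to the list length.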

>
>Minor comments on comments below. Thanks!
>
>
>
>> +/* Cleanup:
>> + *
>
>Extra line.

Fixed

>
>> + * We must call conntrack_clean() periodically.  conntrack_clean()'s return
>> + * value gives a hint on when the next cleanup must be done (either because
>> + * there is an actual connection that expires, or because a new connection
>> + * might be created with the minimum timeout).
>> + *
>> + * The logic below has two goals:
>> + *
>> + * - Avoid calling conntrack_clean() too often.  If we call 
>> conntrack_clean()
>> + *   each time a connection expires, the thread will consume 100% CPU, so we
>> + *   try to call the function _at most_ once every CT_CLEAN_INTERVAL, to 
>> batch
>> + *   removal.
>
>Isn't it CT_CLEAN_MIN_INTERVAL that prevents calls happening too
>often? I would imagine that under high cps conditions where you're
>likely to peg 100% on cleanup, cleanup is behind and CT_CLEAN_INTERVAL
>logic won't come into the picture.
>
>> + * - On the other hand, it's not a good idea to keep the buckets locked for
>> + *   too long, as we might prevent traffic from flowing.  If 
>> conntrack_clean()
>> + *   returns a value which is in the past, it means that the internal limit
>> + *   has been reached and more cleanup is required.  In this case, just wait
>> + *   CT_CLEAN_MIN_INTERVAL before the next call.
>> + */
>
>Keeping the buckets locked too long also happens if you constantly
>call conntrack_clean(), so I think these two paragraphs are arguing
>slightly different angles for the same parameter.
>
>CT_CLEAN_MIN_INTERVAL ensures that if cleanup is behind, there is
>at least some 200ms blocks of time when buckets will be left alone so
>the datapath can operate unhindered.
>
>CT_CLEAN_INTERVAL ensures that if we are coping with the current
>cleanup tasks, then we wait at least 5 seconds to do further cleanup.
>This seems like it's more targeted towards reducing wakeups when there
>is a wide distribution of timeouts but relatively small number of
>connections, that could be handled by less frequent cleanups.
>
>I like the "the logic has two goals" presentation of this, but maybe
>there is a better way we can frame the comment above?

I couldn't have said it better, I almost stole your wording entirely:

+ * The logic below has two goals:
+ *
+ * - Avoid calling conntrack_clean() too often.  If we call conntrack_clean()
+ *   each time a connection expires, the thread will consume 100% CPU, so we
+ *   try to call the function _at most_ once every CT_CLEAN_INTERVAL, to batch
+ *   removal.
+ *
+ * - On the other hand, it's not a good idea to keep the buckets locked for
+ *   too long, as we might prevent traffic from flowing.  If conntrack_clean()
+ *   returns a value which is in the past, it means that the internal limit
+ *   has been reached and more cleanup is required.  In this case, just wait
+ *   CT_CLEAN_MIN_INTERVAL before the next call.
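
One way to read the two goals above is as a small function that turns the
conntrack_clean() hint into the next wake-up time.  This is a sketch, not
the patch's code: next_cleanup_wakeup() is a hypothetical name, and the
constant values (5 s and 200 ms) are taken from the review discussion
rather than from the patch itself.

```c
/* Values assumed from the review discussion, in milliseconds. */
#define CT_CLEAN_INTERVAL 5000LL
#define CT_CLEAN_MIN_INTERVAL 200LL

/* 'hint' is the value returned by the cleanup pass, 'now' the current
 * time, both in ms.  Returns the time of the next cleanup pass. */
static long long
next_cleanup_wakeup(long long hint, long long now)
{
    if (hint < now) {
        /* Cleanup is behind: leave the buckets alone for a short while,
         * then continue. */
        return now + CT_CLEAN_MIN_INTERVAL;
    }
    /* Cleanup is keeping up: batch removals by waiting at least
     * CT_CLEAN_INTERVAL, or longer if nothing expires before then. */
    return hint > now + CT_CLEAN_INTERVAL ? hint : now + CT_CLEAN_INTERVAL;
}
```

With now = 1000, a hint of 900 gives 1200, a hint of 1500 gives 6000, and a
hint of 9000 gives 9000.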

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH] netdev-dpdk: Add Flow Control support.

2016-07-27 Thread Daniele Di Proietto
Thanks for the patch.

I agree, it'd be nice to document this in INSTALL.DPDK-ADVANCED.md.

We should also document the new fields in vswitchd/vswitch.xml.

Probably it's better to use "true" and "false" rather than "on" and "off",
for consistency with other configuration options and so that we can use
smap_get_bool().

I assume it's ok to call rte_eth_dev_flow_ctrl_get()/_set() while the
device is transmitting/receiving.

Maybe it would be better to cache fc_conf in struct netdev_dpdk and call
_set() only if we have to make a change?
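
A rough sketch of the two suggestions above (boolean options and a cached
configuration) follows.  Everything here is a stand-in: enum fc_mode and
struct fc_settings are hypothetical and do not mirror the DPDK types, and
parse_bool_option() is a simplified, case-sensitive imitation of a boolean
lookup, not the OVS smap_get_bool() function.

```c
#include <stdbool.h>
#include <string.h>

/* Stand-in for the flow control modes; not the real DPDK enum. */
enum fc_mode { FC_NONE, FC_RX_PAUSE, FC_TX_PAUSE, FC_FULL };

/* Stand-in for the cached subset of the flow control configuration. */
struct fc_settings {
    enum fc_mode mode;
    bool autoneg;
};

/* Returns true for "true", false for "false", and 'def' for a missing or
 * unrecognized value. */
static bool
parse_bool_option(const char *value, bool def)
{
    if (value && !strcmp(value, "true")) {
        return true;
    }
    if (value && !strcmp(value, "false")) {
        return false;
    }
    return def;
}

/* Combines the rx and tx switches into a single mode. */
static enum fc_mode
fc_mode_from_flags(bool rx, bool tx)
{
    if (rx && tx) {
        return FC_FULL;
    }
    if (rx) {
        return FC_RX_PAUSE;
    }
    return tx ? FC_TX_PAUSE : FC_NONE;
}

/* Updates 'cached' from 'wanted'.  Returns true only if something changed,
 * which is when the caller would need to reprogram the device. */
static bool
fc_settings_update(struct fc_settings *cached,
                   const struct fc_settings *wanted)
{
    if (cached->mode == wanted->mode && cached->autoneg == wanted->autoneg) {
        return false;
    }
    *cached = *wanted;
    return true;
}
```

With this shape, repeated set_config calls that carry unchanged options
would not touch the device at all.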

2016-07-22 6:18 GMT-07:00 Sugesh Chandran :

> Add support for flow-control(mac control frame) to DPDK enabled physical
> port types. By default, the flow-control is OFF on both rx and tx side.
> The flow control can be enabled/disabled either when adding a port to OVS
> or at run time.
>
> For eg:
> To enable flow control support at tx side while adding a port, add the
> 'tx-flow-ctrl' option to the 'ovs-vsctl add-port' command-line as below.
>
>  'ovs-vsctl add-port br0 dpdk0 -- \
>   set Interface dpdk0 type=dpdk options:tx-flow-ctrl=on'
>
> Similarly to enable rx flow control,
>  'ovs-vsctl add-port br0 dpdk0 -- \
>   set Interface dpdk0 type=dpdk options:rx-flow-ctrl=on'
>
> And to enable the flow control auto-negotiation,
>  'ovs-vsctl add-port br0 dpdk0 -- \
>   set Interface dpdk0 type=dpdk options:flow-ctrl-autoneg=on'
>
> To turn ON the tx flow control at run time(After the port is being added
> to OVS), the command-line input will be,
>  'ovs-vsctl set Interface dpdk0 options:tx-flow-ctrl=on'
>
> The flow control parameters can be turned off by setting 'off' to the
> respective parameter. To turn off the flow control at tx side,
>  'ovs-vsctl set Interface dpdk0 options:tx-flow-ctrl=off'
>
> Signed-off-by: Sugesh Chandran 
> ---
>  lib/netdev-dpdk.c | 68
> +++
>  1 file changed, 68 insertions(+)
>
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 85b18fd..74efd25 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -634,6 +634,67 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int
> n_rxq, int n_txq)
>  return diag;
>  }
>
> +static void
> +dpdk_eth_parse_flow_ctrl(struct netdev_dpdk *dev,
> + const struct smap *args,
> + struct rte_eth_fc_conf *fc_conf)
> + OVS_REQUIRES(dev->mutex)
>

Minor: the other thread safety annotations are not aligned with the
parameters, but are just indented four spaces.


> +{
> +int ret = 0;
> +int rx_fc_en = 0;
> +int tx_fc_en = 0;
> +const char *rx_flow_mode;
> +const char *tx_flow_mode;
> +const char *flow_autoneg;
> +enum rte_eth_fc_mode fc_mode_set[2][2] = {{RTE_FC_NONE,
> RTE_FC_TX_PAUSE},
> +  {RTE_FC_RX_PAUSE,
> RTE_FC_FULL}
> + };
> +
> +ret = rte_eth_dev_flow_ctrl_get(dev->port_id, fc_conf);
> +if (ret != 0) {
> +VLOG_DBG("cannot get flow control parameters on port=%d, err=%s",
> + dev->port_id, rte_strerror(ret));
>

I'm not sure, do we need to change 'ret' sign before passing it to
rte_strerror()?
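
To illustrate the question: if the call reports failures as negative errno
values (an assumption here, though it is the usual convention for DPDK
ethdev functions), the sign has to be flipped before asking for a message.
The sketch uses the standard strerror() as a stand-in for rte_strerror(),
and error_message() is a hypothetical helper.

```c
#include <errno.h>
#include <string.h>

/* Accepts a return value in either sign convention and returns the
 * message for the corresponding positive errno value. */
static const char *
error_message(int ret)
{
    return strerror(ret < 0 ? -ret : ret);
}
```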


> +return;
> +}
> +rx_flow_mode = smap_get(args, "rx-flow-ctrl");
> +tx_flow_mode = smap_get(args, "tx-flow-ctrl");
> +flow_autoneg = smap_get(args, "flow-ctrl-autoneg");
> +if (rx_flow_mode) {
> +if (!strcmp(rx_flow_mode, "on")) {
> +rx_fc_en = 1;
> +}
> +else if (!strcmp(rx_flow_mode, "off")) {
> +rx_fc_en = 0;
> +}
> +}
> +if (tx_flow_mode) {
> +if (!strcmp(tx_flow_mode, "on")) {
> +tx_fc_en =1;
> +}
> +else if (!strcmp(tx_flow_mode, "off")) {
> +tx_fc_en =0;
> +}
> +}
> +if (flow_autoneg) {
> +if (!strcmp(flow_autoneg, "on")) {
> +fc_conf->autoneg = 1;
> +}
> +else if (!strcmp(flow_autoneg, "off")) {
> +fc_conf->autoneg = 0;
> +}
> +}
> +fc_conf->mode = fc_mode_set[tx_fc_en][rx_fc_en];
> +}
> +
> +static void
> +dpdk_eth_flow_ctrl_config(struct netdev_dpdk *dev,
> +  struct rte_eth_fc_conf *fc_conf)
> +  OVS_REQUIRES(dev->mutex)
>

Minor: the other thread safety annotations are not aligned with the
parameters, but are just indented four spaces.


> +{
> +if (rte_eth_dev_flow_ctrl_set(dev->port_id, fc_conf) != 0) {
>

I'd drop the != 0


> +VLOG_ERR("Failed to enable flow control on device %d",
> dev->port_id);
>

VLOG_WARN is probably enough


> +}
> +}
>
>  static int
>  dpdk_eth_dev_init(struct netdev_dpdk *dev) OVS_REQUIRES(dpdk_mutex)
> @@ -991,6 +1052,13 @@ netdev_dpdk_set_config(struct netdev *netdev, const
> struct smap *args)
>  dev->requested_n_rxq = new_n_rxq;
>  

Re: [ovs-dev] [PATCH 0/5] netdev_open conflicting types

2016-07-27 Thread Daniele Di Proietto
Looks good to me, applied to master, thanks!

2016-07-27 8:06 GMT-07:00 Thadeu Lima de Souza Cascardo :

> Fix some uses of type on netdev_open.
>
> We established that there are two types: the database type and the netdev
> type.
> And that ofproto and dpif layers use the netdev type and return it when
> queried.
>
> Some calls to netdev_open should use NULL instead of system. If there is no
> netdev opened, system will be used as the default, but if a different type
> is
> opened, the caller does not expect it to be of type system.
>
> In other cases, the use of internal is incorrect and the appropriate
> port_open_type should be used instead.
>
> Finally, we make netdev_open return an error when a netdev of a different
> type
> than the one requested already exists.
>
> This will fix some bugs and alert users when conflicting interfaces exist
> on the
> system and the database. For example, when a user configures an interface
> with a
> type other than system, and there is a system interface with the same name.
>
> Thadeu Lima de Souza Cascardo (5):
>   in-band: use open_type when opening internal device
>   in-band: don't use system type when opening netdev
>   netdev-vport: don't use system type when opening netdev
>   dpif-netdev: use the open_type when creating the local port
>   netdev: do not allow devices to be opened with conflicting types
>
>  lib/dpif-netdev.c  | 11 ++-
>  lib/netdev-vport.c |  2 +-
>  lib/netdev.c   |  8 +++-
>  ofproto/in-band.c  |  5 +++--
>  tests/dpctl.at | 13 +++--
>  5 files changed, 24 insertions(+), 15 deletions(-)
>
> --
> 2.7.4
>


Re: [ovs-dev] [PATCH] ovs-numa: fixed cmask parse with 0x prefix

2016-07-27 Thread Daniele Di Proietto
Thanks for the patch.

We never accepted the 0x prefix for pmd-cpu-mask, but I guess there's no
harm in doing it and it might make users' lives easier.

We always use braces, even for single statements; please read CodingStyle.md:

https://github.com/openvswitch/ovs/blob/master/CodingStyle.md#statements

I cannot merge this unless you provide a signoff; the details and the
meaning are explained here:

https://github.com/openvswitch/ovs/blob/master/CONTRIBUTING.md#developers-certificate-of-origin

Thanks,

Daniele
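
For illustration, the prefix check from the patch with braces added could
look like the sketch below.  skip_hex_prefix() is a hypothetical helper
name; the patch performs the check inline in ovs_numa_set_cpu_mask().

```c
#include <string.h>

/* Returns 'cmask' advanced past a leading "0x", if there is one. */
static const char *
skip_hex_prefix(const char *cmask)
{
    if (!strncmp(cmask, "0x", 2)) {
        cmask += 2;
    }
    return cmask;
}
```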

2016-07-26 14:56 GMT-07:00 Wei Shen :

> Fixed a minor bug that would print out a confusing warning about core mask,
> "ovs_numa|WARN|Invalid cpu mask: x", when dpdk-lcore-mask has 0x prefix,
> e.g.
> 0x123, which is the convention used in INSTALL.DPDK.md.
> ---
>  lib/ovs-numa.c | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/lib/ovs-numa.c b/lib/ovs-numa.c
> index c8173e0..c1938eb 100644
> --- a/lib/ovs-numa.c
> +++ b/lib/ovs-numa.c
> @@ -551,6 +551,10 @@ ovs_numa_set_cpu_mask(const char *cmask)
>  return;
>  }
>
> +/* Skip 0x if supplied in the cmask */
> +if (!strncmp(cmask, "0x", 2))
> +cmask += 2;
> +
>  for (i = strlen(cmask) - 1; i >= 0; i--) {
>  char hex = toupper((unsigned char)cmask[i]);
>  int bin, j;
> --
> 2.5.5
>


Re: [ovs-dev] [PATCH v5] netdev-dpdk: Set pmd thread priority

2016-07-27 Thread Daniele Di Proietto
Thanks for the patch, the implementation looks good to me too.

During testing I kept noticing that it's way too easy to make OVS
completely unresponsive.  As you point out in the documentation, with
dpdk-lcore-mask the same as pmd-cpu-mask, OVS cannot even be killed (a kill
-9 is required).  I wonder what happens if one tries to set pmd-cpu-mask to
every core in the system.

As a way to mitigate the risk perhaps we can avoid setting the main thread
affinity to the first core in dpdk-lcore-mask by _always_ restoring it in
dpdk_init__(), also if auto_determine is false.

Perhaps we should start explicitly prohibiting creating a pmd thread on the
first core in dpdk-lcore-mask (I get why the previous version of this patch
didn't do it on core 0.  Perhaps we can generalize that to the first core in
dpdk-lcore-mask).

What's the behavior of other DPDK applications?

Thanks,

Daniele
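
For readers following along, the scheduling change under discussion amounts
to something like the sketch below.  It is not the patch's implementation
(which is not quoted in this message): the helper names are hypothetical,
and sched_setscheduler() is used here as a stand-in, relying on the Linux
behaviour that a pid of 0 refers to the calling thread.

```c
#include <errno.h>
#include <sched.h>

/* Highest static priority available for SCHED_RR, or -1 on failure. */
static int
rr_max_priority(void)
{
    return sched_get_priority_max(SCHED_RR);
}

/* Requests SCHED_RR at the maximum priority for the caller.  Returns 0 on
 * success or a positive errno value, typically EPERM when the process
 * lacks the privilege to use realtime policies. */
static int
set_rr_max_priority(void)
{
    int prio = rr_max_priority();

    if (prio < 0) {
        return errno;
    }

    struct sched_param param = { .sched_priority = prio };

    return sched_setscheduler(0, SCHED_RR, &param) ? errno : 0;
}
```

A thread running this way is never preempted by ordinary SCHED_OTHER tasks
on the same core, which is exactly why a busy-polling pmd thread that
shares a core with the main thread can starve it.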

2016-07-27 5:28 GMT-07:00 Kavanagh, Mark B :

> >
> >Set the DPDK pmd thread scheduling policy to SCHED_RR and static
> >priority to highest priority value of the policy. This is to deal with
> >pmd thread starvation case where another cpu hogging process can get
> >scheduled/affinitized on to the same core the pmd thread is running
> >there by significantly impacting the datapath performance.
> >
> >Setting the realtime scheduling policy to the pmd threads is one step
> >towards Fastpath Service Assurance in OVS DPDK.
> >
> >The realtime scheduling policy is applied only when CPU mask is passed
> >to 'pmd-cpu-mask'. For example:
> >
> >* In the absence of pmd-cpu-mask, one pmd thread shall be created
> >  and default scheduling policy and priority gets applied.
> >
> >* If pmd-cpu-mask is specified, one or more pmd threads shall be
> >  spawned on the corresponding core(s) in the mask and real time
> >  scheduling policy SCHED_RR and highest priority of the policy is
> >  applied to the pmd thread(s).
> >
> >To reproduce the pmd thread starvation case:
> >
> >ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=6
> >taskset 0x2 cat /dev/zero > /dev/null &
> >
> >With this commit, it is recommended that the OVS control thread and pmd
> >thread shouldn't be pinned to same core ('dpdk-lcore-mask','pmd-cpu-mask'
> >should be non-overlapping). Also other processes with same affinity as
> >PMD thread will be unresponsive.
> >
> >Signed-off-by: Bhanuprakash Bodireddy 
>
> LGTM - Acked-by: mark.b.kavan...@intel.com
>
> >---
> >v4->v5:
> >* Reword Note section in DPDK-ADVANCED.md
> >
> >v3->v4:
> >* Document update
> >* Use ovs_strerror for reporting errors in lib-numa.c
> >
> >v2->v3:
> >* Move set_priority() function to lib/ovs-numa.c
> >* Apply realtime scheduling policy and priority to pmd thread only if
> >  pmd-cpu-mask is passed.
> >* Update INSTALL.DPDK-ADVANCED.
> >
> >v1->v2:
> >* Removed #ifdef and introduced dummy function "pmd_thread_setpriority"
> >  in netdev-dpdk.h
> >* Rebase
> >
> > INSTALL.DPDK-ADVANCED.md | 17 +
> > lib/dpif-netdev.c|  9 +
> > lib/ovs-numa.c   | 18 ++
> > lib/ovs-numa.h   |  1 +
> > 4 files changed, 41 insertions(+), 4 deletions(-)
> >
> >diff --git a/INSTALL.DPDK-ADVANCED.md b/INSTALL.DPDK-ADVANCED.md
> >index 9ae536d..d76cb4e 100644
> >--- a/INSTALL.DPDK-ADVANCED.md
> >+++ b/INSTALL.DPDK-ADVANCED.md
> >@@ -205,8 +205,10 @@ needs to be affinitized accordingly.
> > pmd thread is CPU bound, and needs to be affinitized to isolated
> > cores for optimum performance.
> >
> >-By setting a bit in the mask, a pmd thread is created and pinned
> >-to the corresponding CPU core. e.g. to run a pmd thread on core 2
> >+By setting a bit in the mask, a pmd thread is created, pinned
> >+to the corresponding CPU core and the scheduling policy SCHED_RR
> >+along with maximum priority of the policy applied to the pmd thread.
> >+e.g. to pin a pmd thread on core 2
> >
> > `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=4`
> >
> >@@ -234,8 +236,10 @@ needs to be affinitized accordingly.
> >   responsible for different ports/rxq's. Assignment of ports/rxq's to
> >   pmd threads is done automatically.
> >
> >-  A set bit in the mask means a pmd thread is created and pinned
> >-  to the corresponding CPU core. e.g. to run pmd threads on core 1 and 2
> >+  A set bit in the mask means a pmd thread is created, pinned to the
> >+  corresponding CPU core and the scheduling policy SCHED_RR with highest
> >+  priority of the scheduling policy applied to pmd thread.
> >+  e.g. to run pmd threads on core 1 and 2
> >
> >   `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=6`
> >
> >@@ -246,6 +250,11 @@ needs to be affinitized accordingly.
> >
> >   NIC port0 <-> OVS <-> VM <-> OVS <-> NIC port 1
> >
> >+  Note: It is recommended that the OVS control thread and pmd thread
> >+  shouldn't be pinned to same core i.e 'dpdk-lcore-mask' and
> 

Re: [ovs-dev] [PATCH v5 13/16] system-tests: Run conntrack tests with userspace.

2016-07-27 Thread Daniele Di Proietto





On 27/07/2016 13:12, "Joe Stringer" <j...@ovn.org> wrote:

>On 26 July 2016 at 17:58, Daniele Di Proietto <diproiet...@vmware.com> wrote:
>> The userspace connection tracker doesn't support ALGs, frag reassembly
>> or NAT yet, so skip those tests.
>>
>> Also, connection tracking state input from a local port is not possible
>> in userspace.
>>
>> The userspace datapath pads all frames with 0, to make them at
>> least 64 bytes.
>>
>> Finally, the userspace datapath checks for the IPv4 header checksum, so
>> fix those in the hardcoded packets.
>>
>> Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
>> Acked-by: Joe Stringer <j...@ovn.org>
>> Acked-by: Flavio Leitner <f...@sysclose.org>
>> ---
>
>
>
>> @@ -1324,11 +1327,11 @@ dnl UDP packets from ns0->ns1 should solicit 
>> "destination unreachable" response.
>>  NS_CHECK_EXEC([at_ns0], [bash -c "echo a | nc $NC_EOF_OPT -u 10.1.1.2 
>> 1"])
>>
>>  AT_CHECK([ovs-appctl revalidator/purge], [0])
>> -AT_CHECK([ovs-ofctl dump-flows br0 | ofctl_strip | sort | grep -v drop], 
>> [0], [dnl
>> - n_packets=1, n_bytes=44, priority=100,udp,in_port=1 
>> actions=ct(commit,exec(load:0x1->NXM_NX_CT_MARK[[]])),output:2
>> - n_packets=1, n_bytes=72, 
>> priority=100,ct_state=+rel+trk,ct_mark=0x1,icmp,in_port=2 actions=output:1
>> - n_packets=1, n_bytes=72, priority=100,ct_state=-trk,icmp,in_port=2 
>> actions=ct(table=0)
>> - n_packets=2, n_bytes=84, priority=10,arp actions=NORMAL
>> +AT_CHECK([ovs-ofctl dump-flows br0 | ofctl_strip | sort | grep -v drop | 
>> sed -e 's/n_bytes=[[0-9]]*/n_bytes=/g'], [0], [dnl
>> + n_packets=1, n_bytes=, priority=100,udp,in_port=1 
>> actions=ct(commit,exec(load:0x1->NXM_NX_CT_MARK[[]])),output:2
>> + n_packets=1, n_bytes=, 
>> priority=100,ct_state=+rel+trk,ct_mark=0x1,icmp,in_port=2 actions=output:1
>> + n_packets=1, n_bytes=, priority=100,ct_state=-trk,icmp,in_port=2 
>> actions=ct(table=0)
>> + n_packets=2, n_bytes=, priority=10,arp actions=NORMAL
>>  NXST_FLOW reply:
>>  ])
>
>I think this is a completely orthogonal point, but it's still a bit
>surprising to me that the n_bytes would differ when receiving short
>packets in kernel vs. userspace datapaths. I follow that userspace
>pads shorter packets on receive, but shouldn't we be able to attribute
>these stats consistently, regardless of the datapath?

That's a good point.

We call dp_packet_pad() in netdev_linux_recv().  That used to be in 
netdev_recv() and can be traced back to the initial OVS commit.  Here's a 
comment in netdev_recv():

COVERAGE_INC(netdev_received);
buffer->size += n_bytes;

/* When the kernel internally sends out an Ethernet frame on an
 * interface, it gives us a copy *before* padding the frame to the
 * minimum length.  Thus, when it sends out something like an ARP
 * request, we see a too-short frame.  So pad it out to the minimum
 * length. */
pad_to_minimum_length(buffer);
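
A minimal sketch of that padding step, to show why the byte counters differ
between datapaths.  pad_frame() is a hypothetical helper, and the 60-byte
minimum (the Ethernet minimum frame length without the 4-byte FCS) is an
assumption of this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Assumed minimum frame length, excluding the FCS. */
#define ETH_MIN_LEN 60

/* Zero-pads 'frame', which holds 'len' bytes in a buffer of 'cap' bytes,
 * up to ETH_MIN_LEN.  Returns the resulting length; the frame is left
 * untouched if it is already long enough or the buffer is too small. */
static size_t
pad_frame(unsigned char *frame, size_t len, size_t cap)
{
    if (len < ETH_MIN_LEN && cap >= ETH_MIN_LEN) {
        memset(frame + len, 0, ETH_MIN_LEN - len);
        return ETH_MIN_LEN;
    }
    return len;
}
```

Under these assumptions a 44-byte frame would be accounted as 60 bytes once
padded, while a datapath that counts before padding would report 44.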


[ovs-dev] [PATCH] tests: Add new pmd test for pmd-rxq-affinity.

2016-07-27 Thread Daniele Di Proietto
This tests that the newly introduced pmd-rxq-affinity option works as
intended, at least for a single port.

Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
---
 tests/pmd.at | 53 +
 1 file changed, 53 insertions(+)

diff --git a/tests/pmd.at b/tests/pmd.at
index 47639b6..3052f95 100644
--- a/tests/pmd.at
+++ b/tests/pmd.at
@@ -461,3 +461,56 @@ 
icmp,vlan_tci=0x,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10
 
 OVS_VSWITCHD_STOP
 AT_CLEANUP
+
+AT_SETUP([PMD - rxq affinity])
+OVS_VSWITCHD_START(
+  [], [], [], [--dummy-numa 0,0,0,0,0,0,0,0,0])
+AT_CHECK([ovs-appctl vlog/set dpif:dbg dpif_netdev:dbg])
+
+AT_CHECK([ovs-ofctl add-flow br0 actions=controller])
+
+AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=1fe])
+
+AT_CHECK([ovs-vsctl add-port br0 p1 -- set Interface p1 type=dummy-pmd 
ofport_request=1 options:n_rxq=4 
other_config:pmd-rxq-affinity="0:3,1:7,2:2,3:8"])
+
+dnl The rxqs should be on the requested cores.
+AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show | parse_pmd_rxq_show], [0], [dnl
+p1 0 0 3
+p1 1 0 7
+p1 2 0 2
+p1 3 0 8
+])
+
+AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=6])
+
+dnl We removed the cores requested by some queues from pmd-cpu-mask.
+dnl Those queues will not be polled.
+AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show | parse_pmd_rxq_show], [0], [dnl
+p1 2 0 2
+])
+
+AT_CHECK([ovs-vsctl remove Interface p1 other_config pmd-rxq-affinity])
+
+dnl We removed the rxq-affinity request.  dpif-netdev should assign queues
+dnl in a round robin fashion.  We just make sure that every rxq is being
+dnl polled again.
+AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show | parse_pmd_rxq_show | cut -f 
1,2 -d ' ' | sort], [0], [dnl
+p1 0
+p1 1
+p1 2
+p1 3
+])
+
+AT_CHECK([ovs-vsctl set Interface p1 other_config:pmd-rxq-affinity='0:1'])
+
+dnl We explicitly requested core 1 for queue 0.  Core 1 becomes isolated and
+dnl every other queue goes to core 2.
+AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show | parse_pmd_rxq_show], [0], [dnl
+p1 0 0 1
+p1 1 0 2
+p1 2 0 2
+p1 3 0 2
+])
+
+OVS_VSWITCHD_STOP(["/dpif_netdev|WARN|There is no PMD thread on core/d"])
+AT_CLEANUP
-- 
2.8.1



Re: [ovs-dev] [PATCH v5 1/4] dpif-netdev: XPS (Transmit Packet Steering) implementation.

2016-07-27 Thread Daniele Di Proietto
I don't think dynamic_txqs should be atomic, since we change it when the pmd 
threads are stopped.

Also, in port_create() we should check for 'netdev_n_txq(netdev) < n_cores + 1' 
after we reconfigure the device.

Other than that this looks good to me, so I applied the following incremental 
and pushed this to master.

Thanks,

Daniele


---8<---
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index d1ba6f3..d45aba0 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -258,7 +258,7 @@ struct dp_netdev_port {
 struct netdev_saved_flags *sf;
 unsigned n_rxq; /* Number of elements in 'rxq' */
 struct netdev_rxq **rxq;
-atomic_bool dynamic_txqs;   /* If true XPS will be used. */
+bool dynamic_txqs;  /* If true XPS will be used. */
 unsigned *txq_used; /* Number of threads that uses each tx queue. 
*/
 struct ovs_mutex txq_used_mutex;
 char *type; /* Port type as requested by user. */
@@ -1151,6 +1151,7 @@ port_create(const char *devname, const char *open_type, 
const char *type,
 enum netdev_flags flags;
 struct netdev *netdev;
 int n_open_rxqs = 0;
+int n_cores = 0;
 int i, error;
 bool dynamic_txqs = false;

@@ -1171,7 +1172,7 @@ port_create(const char *devname, const char *open_type, 
const char *type,
 }

 if (netdev_is_pmd(netdev)) {
-int n_cores = ovs_numa_get_n_cores();
+n_cores = ovs_numa_get_n_cores();

 if (n_cores == OVS_CORE_UNSPEC) {
 VLOG_ERR("%s, cannot get cpu core info", devname);
@@ -1186,9 +1187,6 @@ port_create(const char *devname, const char *open_type, 
const char *type,
 VLOG_ERR("%s, cannot set multiq", devname);
 goto out;
 }
-if (netdev_n_txq(netdev) < n_cores + 1) {
-dynamic_txqs = true;
-}
 }

 if (netdev_is_reconf_required(netdev)) {
@@ -1198,6 +1196,12 @@ port_create(const char *devname, const char *open_type, 
const char *type,
 }
 }

+if (netdev_is_pmd(netdev)) {
+if (netdev_n_txq(netdev) < n_cores + 1) {
+dynamic_txqs = true;
+}
+}
+
 port = xzalloc(sizeof *port);
 port->port_no = port_no;
 port->netdev = netdev;
@@ -1206,7 +1210,7 @@ port_create(const char *devname, const char *open_type, 
const char *type,
 port->txq_used = xcalloc(netdev_n_txq(netdev), sizeof *port->txq_used);
 port->type = xstrdup(type);
 ovs_mutex_init(&port->txq_used_mutex);
-atomic_init(&port->dynamic_txqs, dynamic_txqs);
+port->dynamic_txqs = dynamic_txqs;

 for (i = 0; i < port->n_rxq; i++) {
 error = netdev_rxq_open(netdev, &port->rxq[i], i);
@@ -2718,8 +2722,7 @@ reconfigure_pmd_threads(struct dp_netdev *dp)
 seq_change(dp->port_seq);
 port_destroy(port);
 } else {
-atomic_init(&port->dynamic_txqs,
-netdev_n_txq(port->netdev) < n_cores + 1);
+port->dynamic_txqs = netdev_n_txq(port->netdev) < n_cores + 1;
 }
 }
 /* Restores the non-pmd. */
@@ -4015,11 +4018,9 @@ dpif_netdev_xps_revalidate_pmd(const struct 
dp_netdev_pmd_thread *pmd,
 struct tx_port *tx;
 struct dp_netdev_port *port;
 long long interval;
-bool dynamic_txqs;

 HMAP_FOR_EACH (tx, node, &pmd->port_cache) {
-atomic_read_relaxed(&tx->port->dynamic_txqs, &dynamic_txqs);
-if (dynamic_txqs) {
+if (tx->port->dynamic_txqs) {
 continue;
 }
 interval = now - tx->last_used;
@@ -4156,7 +4157,7 @@ dp_execute_cb(void *aux_, struct dp_packet_batch 
*packets_,
 int tx_qid;
 bool dynamic_txqs;

-atomic_read_relaxed(&p->port->dynamic_txqs, &dynamic_txqs);
+dynamic_txqs = p->port->dynamic_txqs;
 if (dynamic_txqs) {
 tx_qid = dpif_netdev_xps_get_tx_qid(pmd, p, now);
 } else {

---8<---


On 27/07/2016 07:44, "Ilya Maximets"  wrote:

>If CPU number in pmd-cpu-mask is not divisible by the number of queues and
>in a few more complex situations there may be unfair distribution of TX
>queue-ids between PMD threads.
>
>For example, if we have 2 ports with 4 queues and 6 CPUs in pmd-cpu-mask
>such distribution is possible:
><>
>pmd thread numa_id 0 core_id 13:
>port: vhost-user1   queue-id: 1
>port: dpdk0 queue-id: 3
>pmd thread numa_id 0 core_id 14:
>port: vhost-user1   queue-id: 2
>pmd thread numa_id 0 core_id 16:
>port: dpdk0 queue-id: 0
>pmd thread numa_id 0 core_id 17:
>port: dpdk0 queue-id: 1
>pmd thread numa_id 0 core_id 12:
>port: vhost-user1   queue-id: 0
>port: dpdk0 queue-id: 2
>pmd thread numa_id 0 core_id 15:
>port: vhost-user1   queue-id: 3
><>
>
>As we can see above dpdk0 port 

Re: [ovs-dev] [PATCH v5 4/4] dpif-netdev: Introduce pmd-rxq-affinity.

2016-07-27 Thread Daniele Di Proietto
Using ofputil_parse_key_value() to parse the affinity seems like a good idea, thanks!


I got a compiler warning on an unused variable.  I fixed that and applied the 
series to master.

Thanks,

Daniele
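
As an illustration of the "queue:core" list format used by
pmd-rxq-affinity, here is a standalone parsing sketch.  It does not use
ofputil_parse_key_value(); parse_affinity_list() and CORE_UNSPEC are
hypothetical names, and rejecting out-of-range queue ids is a choice made
for this sketch, not necessarily what the patch does.

```c
#include <stdlib.h>

/* Marks a queue that has no explicit core assignment. */
#define CORE_UNSPEC (-1)

/* Parses a list such as "0:3,1:7,3:8" into 'cores', an array of 'n_rxq'
 * entries indexed by queue id.  Queues that are not mentioned are set to
 * CORE_UNSPEC.  Returns 0 on success and -1 on malformed input, in which
 * case 'cores' may be partially filled. */
static int
parse_affinity_list(const char *list, int *cores, int n_rxq)
{
    const char *p = list;

    for (int i = 0; i < n_rxq; i++) {
        cores[i] = CORE_UNSPEC;
    }

    while (*p) {
        char *end;
        long queue = strtol(p, &end, 10);

        if (end == p || *end != ':' || queue < 0 || queue >= n_rxq) {
            return -1;
        }
        p = end + 1;

        long core = strtol(p, &end, 10);

        if (end == p || core < 0 || (*end != ',' && *end != '\0')) {
            return -1;
        }
        cores[queue] = (int) core;
        p = *end == ',' ? end + 1 : end;
    }
    return 0;
}
```

For the example in the commit message, "0:3,1:7,3:8" with four queues
yields cores 3, 7, CORE_UNSPEC and 8, matching the "Queue #2 not pinned"
line.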



On 27/07/2016 07:44, "Ilya Maximets"  wrote:

>New 'other_config:pmd-rxq-affinity' field for Interface table to
>perform manual pinning of RX queues to desired cores.
>
>This functionality is required to achieve maximum performance because
>all kinds of ports have different cost of rx/tx operations and
>only user can know about expected workload on different ports.
>
>Example:
>   # ./bin/ovs-vsctl set interface dpdk0 options:n_rxq=4 \
> other_config:pmd-rxq-affinity="0:3,1:7,3:8"
>   Queue #0 pinned to core 3;
>   Queue #1 pinned to core 7;
>   Queue #2 not pinned.
>   Queue #3 pinned to core 8;
>
>It's decided to automatically isolate cores that have rxq explicitly
>assigned to them because it's useful to keep constant polling rate on
>some performance critical ports while adding/deleting other ports
>without explicit pinning of all ports.
>
>Signed-off-by: Ilya Maximets 
>---
> INSTALL.DPDK.md  |  49 +++-
> NEWS |   2 +
> lib/dpif-netdev.c| 216 +--
> tests/pmd.at |   6 ++
> vswitchd/vswitch.xml |  23 ++
> 5 files changed, 254 insertions(+), 42 deletions(-)
>
>diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
>index 5407794..7609aa7 100644
>--- a/INSTALL.DPDK.md
>+++ b/INSTALL.DPDK.md
>@@ -289,14 +289,57 @@ advanced install guide [INSTALL.DPDK-ADVANCED.md]
>  # Check current stats
>ovs-appctl dpif-netdev/pmd-stats-show
> 
>+ # Clear previous stats
>+   ovs-appctl dpif-netdev/pmd-stats-clear
>+ ```
>+
>+  7. Port/rxq assignment to PMD threads
>+
>+ ```
>  # Show port/rxq assignment
>ovs-appctl dpif-netdev/pmd-rxq-show
>+ ```
> 
>- # Clear previous stats
>-   ovs-appctl dpif-netdev/pmd-stats-clear
>+ To change default rxq assignment to pmd threads rxqs may be manually
>+ pinned to desired cores using:
>+
>+ ```
>+ ovs-vsctl set Interface <iface> \
>+   other_config:pmd-rxq-affinity=<rxq-affinity-list>
>  ```
>+ where:
>+
>+ ```
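>+ <rxq-affinity-list> ::= NULL | <non-empty-list>
>+ <non-empty-list> ::= <affinity-pair> |
>+                      <affinity-pair> , <non-empty-list>
>+ <affinity-pair> ::= <queue-id> : <core-id>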
>+ ```
>+
>+ Example:
>+
>+ ```
>+ ovs-vsctl set interface dpdk0 options:n_rxq=4 \
>+   other_config:pmd-rxq-affinity="0:3,1:7,3:8"
>+
>+ Queue #0 pinned to core 3;
>+ Queue #1 pinned to core 7;
>+ Queue #2 not pinned.
>+ Queue #3 pinned to core 8;
>+ ```
>+
>+ After that PMD threads on cores where RX queues was pinned will become
>+ `isolated`. This means that this thread will poll only pinned RX queues.
>+
>+ WARNING: If there are no `non-isolated` PMD threads, `non-pinned` RX 
>queues
>+ will not be polled. Also, if provided `core_id` is not available (ex. 
>this
>+ `core_id` not in `pmd-cpu-mask`), RX queue will not be polled by any
>+ PMD thread.
>+
>+ Isolation of PMD threads also can be checked using
>+ `ovs-appctl dpif-netdev/pmd-rxq-show` command.
> 
>-  7. Stop vswitchd & Delete bridge
>+  8. Stop vswitchd & Delete bridge
> 
>  ```
>  ovs-appctl -t ovs-vswitchd exit
>diff --git a/NEWS b/NEWS
>index 73d3fcf..1a34f75 100644
>--- a/NEWS
>+++ b/NEWS
>@@ -45,6 +45,8 @@ Post-v2.5.0
>Old 'other_config:n-dpdk-rxqs' is no longer supported.
>Not supported by vHost interfaces. For them number of rx and tx queues
>is applied from connected virtio device.
>+ * New 'other_config:pmd-rxq-affinity' field for PMD interfaces, that
>+   allows to pin port's rx queues to desired cores.
>  * New appctl command 'dpif-netdev/pmd-rxq-show' to check the port/rxq
>assignment.
>  * Type of log messages from PMD threads changed from INFO to DBG.
>diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>index 1ef0cd7..33f1216 100644
>--- a/lib/dpif-netdev.c
>+++ b/lib/dpif-netdev.c
>@@ -53,7 +53,9 @@
> #include "openvswitch/list.h"
> #include "openvswitch/match.h"
> #include "openvswitch/ofp-print.h"
>+#include "openvswitch/ofp-util.h"
> #include "openvswitch/ofpbuf.h"
>+#include "openvswitch/shash.h"
> #include "openvswitch/vlog.h"
> #include "ovs-numa.h"
> #include "ovs-rcu.h"
>@@ -62,7 +64,7 @@
> #include "pvector.h"
> #include "random.h"
> #include "seq.h"
>-#include "openvswitch/shash.h"
>+#include "smap.h"
> #include "sset.h"
> #include "timeval.h"
> #include "tnl-neigh-cache.h"
>@@ -252,6 +254,12 @@ enum pmd_cycles_counter_type {
> 
> #define XPS_TIMEOUT_MS 500LL
> 
>+/* Contained by struct dp_netdev_port's 'rxqs' member.  */
>+struct dp_netdev_rxq {
>+struct netdev_rxq *rxq;
>+unsigned core_id;   /* Core to which this queue is pinned. */
>+};
>+
> /* A port in a netdev-based datapath. */
> struct dp_netdev_port {
> odp_port_t 

[ovs-dev] [PATCH v5 04/16] conntrack: New userspace connection tracker.

2016-07-26 Thread Daniele Di Proietto
This commit adds the conntrack module.

It is a connection tracker that resides entirely in userspace.  Its
primary user will be the dpif-netdev datapath.

The module main goal is to provide conntrack_execute(), which offers a
convenient interface to implement the datapath ct() action.

The conntrack module uses two submodules to deal with the l4 protocol
details (conntrack-other for UDP and ICMP, conntrack-tcp for TCP).

The conntrack-tcp submodule implementation is adapted from FreeBSD's pf
subsystem, therefore it's BSD licensed.  It has been slightly altered to
match the OVS coding style and to allow the pickup of already
established connections.

Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
Acked-by: Antonio Fischetti <antonio.fische...@intel.com>
Acked-by: Joe Stringer <j...@ovn.org>
---
 COPYING |   1 +
 debian/copyright.in |   4 +
 include/openvswitch/types.h |   4 +
 lib/automake.mk |   5 +
 lib/conntrack-other.c   |  85 +
 lib/conntrack-private.h |  89 +
 lib/conntrack-tcp.c | 462 +++
 lib/conntrack.c | 890 
 lib/conntrack.h | 150 
 lib/util.h  |   9 +
 10 files changed, 1699 insertions(+)
 create mode 100644 lib/conntrack-other.c
 create mode 100644 lib/conntrack-private.h
 create mode 100644 lib/conntrack-tcp.c
 create mode 100644 lib/conntrack.c
 create mode 100644 lib/conntrack.h

diff --git a/COPYING b/COPYING
index 308e3ea..afb98b9 100644
--- a/COPYING
+++ b/COPYING
@@ -25,6 +25,7 @@ License, version 2.
 The following files are licensed under the 2-clause BSD license.
 include/windows/getopt.h
 lib/getopt_long.c
+lib/conntrack-tcp.c
 
 The following files are licensed under the 3-clause BSD-license
 include/windows/netinet/icmp6.h
diff --git a/debian/copyright.in b/debian/copyright.in
index 57d007a..a15f4dd 100644
--- a/debian/copyright.in
+++ b/debian/copyright.in
@@ -21,6 +21,9 @@ Upstream Copyright Holders:
Copyright (c) 2014 Michael Chapman
Copyright (c) 2014 WindRiver, Inc.
Copyright (c) 2014 Avaya, Inc.
+   Copyright (c) 2001 Daniel Hartmeier
+   Copyright (c) 2002 - 2008 Henning Brauer
+   Copyright (c) 2012 Gleb Smirnoff <gleb...@freebsd.org>
 
 License:
 
@@ -90,6 +93,7 @@ License:
lib/getopt_long.c
include/windows/getopt.h
datapath-windows/ovsext/Conntrack-tcp.c
+   lib/conntrack-tcp.c
 
 * The following files are licensed under the 3-clause BSD-license
 
diff --git a/include/openvswitch/types.h b/include/openvswitch/types.h
index da56d4b..2f5fcca 100644
--- a/include/openvswitch/types.h
+++ b/include/openvswitch/types.h
@@ -108,6 +108,10 @@ static const ovs_u128 OVS_U128_MAX = { { UINT32_MAX, UINT32_MAX,
                                          UINT32_MAX, UINT32_MAX } };
 static const ovs_be128 OVS_BE128_MAX OVS_UNUSED = { { OVS_BE32_MAX, OVS_BE32_MAX,
                                            OVS_BE32_MAX, OVS_BE32_MAX } };
+static const ovs_u128 OVS_U128_MIN OVS_UNUSED = { {0, 0, 0, 0} };
+static const ovs_be128 OVS_BE128_MIN OVS_UNUSED = { {0, 0, 0, 0} };
+
+#define OVS_U128_ZERO OVS_U128_MIN
 
 /* A 64-bit value, in network byte order, that is only aligned on a 32-bit
  * boundary. */
diff --git a/lib/automake.mk b/lib/automake.mk
index 71c9d41..b1da53d 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -49,6 +49,11 @@ lib_libopenvswitch_la_SOURCES = \
lib/compiler.h \
lib/connectivity.c \
lib/connectivity.h \
+   lib/conntrack-private.h \
+   lib/conntrack-tcp.c \
+   lib/conntrack-other.c \
+   lib/conntrack.c \
+   lib/conntrack.h \
lib/coverage.c \
lib/coverage.h \
lib/crc32c.c \
diff --git a/lib/conntrack-other.c b/lib/conntrack-other.c
new file mode 100644
index 000..295cb2c
--- /dev/null
+++ b/lib/conntrack-other.c
@@ -0,0 +1,85 @@
+/*
+ * Copyright (c) 2015, 2016 Nicira, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include <config.h>
+
+#include "conntrack-private.h"
+#include "dp-packet.h"
+
+enum other_state {
+OTHERS_FIRST,
+OTHERS_MULTIPLE,
+OTHERS_BIDIR,
+};
+
+struct conn_other {
+struct conn up;
+enum other_state state;
+};
+
+static const enum ct_timeout other_timeouts[] = {
+[OTHERS_FIRST] = 

[ovs-dev] [PATCH v5 12/16] tests: Add conntrack ofproto-dpif tests.

2016-07-26 Thread Daniele Di Proietto
While the system testsuite already has connection tracking tests, it
is still useful to add some to the standard testsuite because:

* They're run more often by developers.
* Some of them are more interesting for the userspace datapath.

Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
Acked-by: Flavio Leitner <f...@sysclose.org>
---
 tests/ofproto-dpif.at | 678 ++
 1 file changed, 678 insertions(+)

diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at
index 67bb5e2..19ff4ce 100644
--- a/tests/ofproto-dpif.at
+++ b/tests/ofproto-dpif.at
@@ -8118,5 +8118,683 @@ AT_CHECK([grep "Final flow:" stdout], [0], [Final flow: 
unchanged
 AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 'in_port(100)'], [0], [stdout])
 AT_CHECK([grep "Final flow:" stdout], [0], [Final flow: unchanged
 ])
+
+OVS_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([ofproto-dpif - conntrack - controller])
+OVS_VSWITCHD_START
+
+add_of_ports br0 1 2
+
+AT_CHECK([ovs-appctl vlog/set dpif_netdev:dbg vconn:info ofproto_dpif:info])
+
+dnl Allow new connections on p1->p2, but not on p2->p1.
+AT_DATA([flows.txt], [dnl
+priority=1,action=drop
+priority=10,arp,action=normal
+priority=100,in_port=1,udp,action=ct(commit,zone=0),controller
+priority=100,in_port=2,ct_state=-trk,udp,action=ct(table=0,zone=0)
+priority=100,in_port=2,ct_state=+trk+est-new,udp,action=controller
+])
+
+AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
+
+AT_CAPTURE_FILE([ofctl_monitor.log])
+AT_CHECK([ovs-ofctl monitor br0 65534 invalid_ttl -P nxt_packet_in --detach 
--no-chdir --pidfile 2> ofctl_monitor.log])
+
+AT_CHECK([ovs-appctl netdev-dummy/receive p2 
'in_port(2),eth(src=50:54:00:00:00:0a,dst=50:54:00:00:00:09),eth_type(0x0800),ipv4(src=10.1.1.2,dst=10.1.1.1,proto=17,tos=0,ttl=64,frag=no),udp(src=2,dst=1)'])
+
+dnl OK, now start a new connection from port 1.
+AT_CHECK([ovs-appctl netdev-dummy/receive p1 
'in_port(1),eth(src=50:54:00:00:00:09,dst=50:54:00:00:00:0a),eth_type(0x0800),ipv4(src=10.1.1.1,dst=10.1.1.2,proto=17,tos=0,ttl=64,frag=no),udp(src=1,dst=2)'])
+
+dnl Now try a reply from port 2.
+AT_CHECK([ovs-appctl netdev-dummy/receive p2 
'in_port(2),eth(src=50:54:00:00:00:0a,dst=50:54:00:00:00:09),eth_type(0x0800),ipv4(src=10.1.1.2,dst=10.1.1.1,proto=17,tos=0,ttl=64,frag=no),udp(src=2,dst=1)'])
+
+OVS_WAIT_UNTIL([test `wc -l < ofctl_monitor.log` -ge 4])
+OVS_WAIT_UNTIL([ovs-appctl -t ovs-ofctl exit])
+
+dnl Check this output. We only see the latter two packets, not the first.
+dnl Note that the first packet doesn't have the ct_state bits set. This
+dnl happens because the ct_state field is available only after recirc.
+AT_CHECK([cat ofctl_monitor.log], [0], [dnl
+NXT_PACKET_IN (xid=0x0): cookie=0x0 total_len=60 in_port=1 (via action) 
data_len=60 (unbuffered)
+udp,vlan_tci=0x,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10.1.1.1,nw_dst=10.1.1.2,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=1,tp_dst=2
 udp_csum:e9d6
+dnl
+NXT_PACKET_IN (xid=0x0): cookie=0x0 total_len=60 
ct_state=est|rpl|trk,in_port=2 (via action) data_len=60 (unbuffered)
+udp,vlan_tci=0x,dl_src=50:54:00:00:00:0a,dl_dst=50:54:00:00:00:09,nw_src=10.1.1.2,nw_dst=10.1.1.1,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=2,tp_dst=1
 udp_csum:e9d6
+])
+
+AT_CHECK([ovs-ofctl monitor br0 65534 invalid_ttl -P nxt_packet_in --detach 
--no-chdir --pidfile 2> ofctl_monitor.log])
+
+dnl OK, now start a second connection from port 1
+AT_CHECK([ovs-appctl netdev-dummy/receive p1 
'in_port(1),eth(src=50:54:00:00:00:09,dst=50:54:00:00:00:0a),eth_type(0x0800),ipv4(src=10.1.1.1,dst=10.1.1.2,proto=17,tos=0,ttl=64,frag=no),udp(src=3,dst=4)'])
+
+dnl Now try a reply from port 2.
+AT_CHECK([ovs-appctl netdev-dummy/receive p2 
'in_port(2),eth(src=50:54:00:00:00:0a,dst=50:54:00:00:00:09),eth_type(0x0800),ipv4(src=10.1.1.2,dst=10.1.1.1,proto=17,tos=0,ttl=64,frag=no),udp(src=4,dst=3)'])
+
+
+OVS_WAIT_UNTIL([test `wc -l < ofctl_monitor.log` -ge 4])
+OVS_WAIT_UNTIL([ovs-appctl -t ovs-ofctl exit])
+
+dnl Check this output. We should see both packets
+dnl Note that the first packet doesn't have the ct_state bits set. This
+dnl happens because the ct_state field is available only after recirc.
+AT_CHECK([cat ofctl_monitor.log], [0], [dnl
+NXT_PACKET_IN (xid=0x0): cookie=0x0 total_len=60 in_port=1 (via action) 
data_len=60 (unbuffered)
+udp,vlan_tci=0x,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10.1.1.1,nw_dst=10.1.1.2,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=3,tp_dst=4
 udp_csum:e9d2
+dnl
+NXT_PACKET_IN (xid=0x0): cookie=0x0 total_len=60 
ct_state=est|rpl|trk,in_port=2 (via action) data_len=60 (unbuffered)
+udp,vlan_tci=0x,dl_src=50:54:00:00:00:0a,dl_dst=50:54:00:00:00:09,nw_src=10.1.1.2,nw_dst=10.1.1.1,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=4,tp_dst=3
 udp_csum:e9d2
+])
+
+OVS_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([ofproto-dpif - conntrack - ipv6])
+OVS_VSWITCHD_START
+
+add_of_ports br0 1 2
+
+AT_CHECK([ovs-appctl vlo

[ovs-dev] [PATCH v5 15/16] conntrack: Track ICMP type and code.

2016-07-26 Thread Daniele Di Proietto
From the connection tracker's perspective, an ICMP connection is a tuple
identified by source IP address, destination IP address and ICMP id.

While this allows basic ICMP traffic (pings) to work, it doesn't take
into account the ICMP type: the connection tracker will allow
requests/replies in any direction.

This is improved by making the ICMP type and code part of the connection
tuple.  An ICMP echo request packet from A to B will create a
connection that matches ICMP echo requests from A to B and ICMP echo
replies from B to A.  The same is done for timestamp and info
requests/replies, and for ICMPv6.

A new module, conntrack-icmp, is implemented to allow only "request"
types to create new connections.

Also, since they're tracked in both userspace and kernel
implementations, ICMP type and code are always printed in ct-dpif (a few
test cases are updated as a consequence).
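
To make the request/reply pairing concrete, here is a minimal sketch of
how the reply-direction key could be derived from a request key.  The
struct layout and the names (icmp_key_sketch, icmp4_reply_type,
icmp4_reply_key, the SK_ constants) are assumptions made for
illustration and are not taken from the patch; the type numbers are the
ICMPv4 values from RFC 792.

```c
#include <stdbool.h>
#include <stdint.h>

/* ICMPv4 message types (RFC 792). */
enum {
    SK_ICMP4_ECHO_REPLY = 0,
    SK_ICMP4_ECHO_REQUEST = 8,
    SK_ICMP4_TIMESTAMP = 13,
    SK_ICMP4_TIMESTAMPREPLY = 14,
    SK_ICMP4_INFOREQUEST = 15,
    SK_ICMP4_INFOREPLY = 16
};

/* Hypothetical one-direction connection key. */
struct icmp_key_sketch {
    uint32_t src;
    uint32_t dst;
    uint16_t id;
    uint8_t type;
    uint8_t code;
};

/* Returns the reply type matching a request type, or -1 if 'type' is not
 * a request and therefore must not create a new connection. */
static int
icmp4_reply_type(uint8_t type)
{
    switch (type) {
    case SK_ICMP4_ECHO_REQUEST:
        return SK_ICMP4_ECHO_REPLY;
    case SK_ICMP4_TIMESTAMP:
        return SK_ICMP4_TIMESTAMPREPLY;
    case SK_ICMP4_INFOREQUEST:
        return SK_ICMP4_INFOREPLY;
    default:
        return -1;
    }
}

/* Fills 'reply' with the key expected in the reverse direction.  Returns
 * false, leaving 'reply' untouched, if 'req' is not a request. */
static bool
icmp4_reply_key(const struct icmp_key_sketch *req,
                struct icmp_key_sketch *reply)
{
    int reply_type = icmp4_reply_type(req->type);

    if (reply_type < 0) {
        return false;
    }

    reply->src = req->dst;          /* Addresses are swapped. */
    reply->dst = req->src;
    reply->id = req->id;            /* The ICMP id is preserved. */
    reply->type = (uint8_t) reply_type;
    reply->code = req->code;
    return true;
}
```

With such a pairing, an echo reply that shows up without a preceding
echo request has no matching key and cannot open a connection.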

Reported-by: Subramani Paramasivam <subramani.paramasi...@wipro.com>
Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
Acked-by: Joe Stringer <j...@ovn.org>
---
 lib/automake.mk |   1 +
 lib/conntrack-icmp.c| 105 
 lib/conntrack-private.h |  11 -
 lib/conntrack.c |  62 
 lib/conntrack.h |   2 +
 lib/ct-dpif.c   |  24 ---
 lib/ct-dpif.h   |   3 +-
 lib/netlink-conntrack.c |   2 +-
 tests/system-traffic.at |  12 +++---
 9 files changed, 188 insertions(+), 34 deletions(-)
 create mode 100644 lib/conntrack-icmp.c

diff --git a/lib/automake.mk b/lib/automake.mk
index b1da53d..4110e5f 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -49,6 +49,7 @@ lib_libopenvswitch_la_SOURCES = \
lib/compiler.h \
lib/connectivity.c \
lib/connectivity.h \
+   lib/conntrack-icmp.c \
lib/conntrack-private.h \
lib/conntrack-tcp.c \
lib/conntrack-other.c \
diff --git a/lib/conntrack-icmp.c b/lib/conntrack-icmp.c
new file mode 100644
index 000..40fd1d8
--- /dev/null
+++ b/lib/conntrack-icmp.c
@@ -0,0 +1,105 @@
+/*
+ * Copyright (c) 2015, 2016 Nicira, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include <config.h>
+
+#include 
+#include 
+#include 
+#include 
+
+#include "conntrack-private.h"
+#include "dp-packet.h"
+
+enum icmp_state {
+ICMPS_FIRST,
+ICMPS_REPLY,
+};
+
+struct conn_icmp {
+struct conn up;
+enum icmp_state state;
+};
+
+static const enum ct_timeout icmp_timeouts[] = {
+[ICMPS_FIRST] = CT_TM_ICMP_FIRST,
+[ICMPS_REPLY] = CT_TM_ICMP_REPLY,
+};
+
+static struct conn_icmp *
+conn_icmp_cast(const struct conn *conn)
+{
+return CONTAINER_OF(conn, struct conn_icmp, up);
+}
+
+static enum ct_update_res
+icmp_conn_update(struct conn *conn_, struct conntrack_bucket *ctb,
+ struct dp_packet *pkt OVS_UNUSED, bool reply, long long now)
+{
+struct conn_icmp *conn = conn_icmp_cast(conn_);
+
+if (reply && conn->state != ICMPS_REPLY) {
+conn->state = ICMPS_REPLY;
+}
+
+conn_update_expiration(ctb, &conn->up, icmp_timeouts[conn->state], now);
+
+return CT_UPDATE_VALID;
+}
+
+static bool
+icmp4_valid_new(struct dp_packet *pkt)
+{
+struct icmp_header *icmp = dp_packet_l4(pkt);
+
+return icmp->icmp_type == ICMP4_ECHO_REQUEST
+   || icmp->icmp_type == ICMP4_INFOREQUEST
+   || icmp->icmp_type == ICMP4_TIMESTAMP;
+}
+
+static bool
+icmp6_valid_new(struct dp_packet *pkt)
+{
+struct icmp6_header *icmp6 = dp_packet_l4(pkt);
+
+return icmp6->icmp6_type == ICMP6_ECHO_REQUEST;
+}
+
+static struct conn *
+icmp_new_conn(struct conntrack_bucket *ctb, struct dp_packet *pkt OVS_UNUSED,
+   long long now)
+{
+struct conn_icmp *conn;
+
+conn = xzalloc(sizeof *conn);
+conn->state = ICMPS_FIRST;
+
+conn_init_expiration(ctb, &conn->up, icmp_timeouts[conn->state], now);
+
+return &conn->up;
+}
+
+struct ct_l4_proto ct_proto_icmp4 = {
+.new_conn = icmp_new_conn,
+.valid_new = icmp4_valid_new,
+.conn_update = icmp_conn_update,
+};
+
+struct ct_l4_proto ct_proto_icmp6 = {
+.new_conn = icmp_new_conn,
+.valid_new = icmp6_valid_new,
+.conn_update = icmp_conn_update,
+};
diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
index df32525..013f19f 100644
--- a/lib/conntrack-private.h
+++ b/lib/conntrack-private.h
@@ -

[ovs-dev] [PATCH v5 06/16] tests: Add very simple conntrack benchmark.

2016-07-26 Thread Daniele Di Proietto
This introduces a very limited but simple benchmark for
conntrack_execute(). It simply sends the same batch of packets through
the connection tracker repeatedly and returns the time spent processing
them.

While this is not a realistic benchmark, it has proven useful during
development to evaluate different batching and locking strategies.

E.g. the line:

`./tests/ovstest test-conntrack benchmark 1 1488 32`

starts 1 thread that will send 1488 packets to the connection
tracker, 32 at a time. It will print the time taken to process them.
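
Since 1488 is not a multiple of 32, it may help to spell out the
arithmetic.  The sketch below mirrors the per-thread loop in
ct_thread_main(), which advances by batch_size and always submits a
full batch; the helper names (benchmark_calls,
benchmark_packets_processed) are made up for this example.

```c
#include <stddef.h>

/* Number of conntrack_execute() calls one thread makes, using the same
 * loop shape as the benchmark: for (i = 0; i < n_pkts; i += batch_size).
 * 'batch_size' must be nonzero. */
static size_t
benchmark_calls(size_t n_pkts, size_t batch_size)
{
    size_t calls = 0;
    size_t i;

    for (i = 0; i < n_pkts; i += batch_size) {
        calls++;
    }
    return calls;
}

/* Packets actually pushed through the tracker: every call submits a full
 * batch, so the total is rounded up to a multiple of 'batch_size'. */
static size_t
benchmark_packets_processed(size_t n_pkts, size_t batch_size)
{
    return benchmark_calls(n_pkts, batch_size) * batch_size;
}
```

For the command above this gives 47 calls and 1504 packets per thread,
which is worth keeping in mind when comparing runs with different batch
sizes.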

Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
Acked-by: Flavio Leitner <f...@sysclose.org>
---
 tests/automake.mk  |   1 +
 tests/test-conntrack.c | 172 +
 2 files changed, 173 insertions(+)
 create mode 100644 tests/test-conntrack.c

diff --git a/tests/automake.mk b/tests/automake.mk
index 575ffeb..a9ebf91 100644
--- a/tests/automake.mk
+++ b/tests/automake.mk
@@ -328,6 +328,7 @@ tests_ovstest_SOURCES = \
tests/test-classifier.c \
tests/test-ccmap.c \
tests/test-cmap.c \
+   tests/test-conntrack.c \
tests/test-csum.c \
tests/test-flows.c \
tests/test-hash.c \
diff --git a/tests/test-conntrack.c b/tests/test-conntrack.c
new file mode 100644
index 000..37c7277
--- /dev/null
+++ b/tests/test-conntrack.c
@@ -0,0 +1,172 @@
+/*
+ * Copyright (c) 2015 Nicira, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include <config.h>
+#include "conntrack.h"
+
+#include "dp-packet.h"
+#include "fatal-signal.h"
+#include "flow.h"
+#include "netdev.h"
+#include "ovs-thread.h"
+#include "ovstest.h"
+#include "timeval.h"
+
+static const char payload[] = "5054000a505400090800451c00"
+  "11a4cd0a0101010a010102000100020008";
+
+static struct dp_packet_batch *
+prepare_packets(size_t n, bool change, unsigned tid)
+{
+struct dp_packet_batch *pkt_batch = xzalloc(sizeof *pkt_batch);
+struct flow flow;
+size_t i;
+
+ovs_assert(n <= ARRAY_SIZE(pkt_batch->packets));
+
+dp_packet_batch_init(pkt_batch);
+pkt_batch->count = n;
+
+for (i = 0; i < n; i++) {
+struct udp_header *udp;
+struct dp_packet *pkt = dp_packet_new(sizeof payload/2);
+
+dp_packet_put_hex(pkt, payload, NULL);
+flow_extract(pkt, &flow);
+
+udp = dp_packet_l4(pkt);
+udp->udp_src = htons(ntohs(udp->udp_src) + tid);
+
+if (change) {
+udp->udp_dst = htons(ntohs(udp->udp_dst) + i);
+}
+
+pkt_batch->packets[i] = pkt;
+}
+
+return pkt_batch;
+}
+
+static void
+destroy_packets(struct dp_packet_batch *pkt_batch)
+{
+dp_packet_delete_batch(pkt_batch, true);
+free(pkt_batch);
+}
+
+struct thread_aux {
+pthread_t thread;
+unsigned tid;
+};
+
+static struct conntrack ct;
+static unsigned long n_threads, n_pkts, batch_size;
+static bool change_conn = false;
+static struct ovs_barrier barrier;
+
+static void *
+ct_thread_main(void *aux_)
+{
+struct thread_aux *aux = aux_;
+struct dp_packet_batch *pkt_batch;
+size_t i;
+
+pkt_batch = prepare_packets(batch_size, change_conn, aux->tid);
+ovs_barrier_block(&barrier);
+for (i = 0; i < n_pkts; i += batch_size) {
+conntrack_execute(&ct, pkt_batch, true, 0, NULL, NULL, NULL);
+}
+ovs_barrier_block(&barrier);
+destroy_packets(pkt_batch);
+
+return NULL;
+}
+
+static void
+test_benchmark(struct ovs_cmdl_context *ctx)
+{
+struct thread_aux *threads;
+long long start;
+unsigned i;
+
+fatal_signal_init();
+
+/* Parse arguments */
+n_threads = strtoul(ctx->argv[1], NULL, 0);
+if (!n_threads) {
+ovs_fatal(0, "n_threads must be at least one");
+}
+n_pkts = strtoul(ctx->argv[2], NULL, 0);
+batch_size = strtoul(ctx->argv[3], NULL, 0);
+if (batch_size == 0 || batch_size > NETDEV_MAX_BURST) {
+ovs_fatal(0, "batch_size must be between 1 and NETDEV_MAX_BURST(%u)",
+  NETDEV_MAX_BURST);
+}
+if (ctx->argc > 4) {
+change_conn = strtoul(ctx->argv[4], NULL, 0);
+}
+
+threads = xcalloc(n_threads, sizeof *threads);
+ovs_barrier_init(&barrier, n_t

[ovs-dev] [PATCH v5 16/16] conntrack: Add 'dl_type' parameter to conntrack_execute().

2016-07-26 Thread Daniele Di Proietto
Now that dpif_execute has a 'flow' member, it's pretty easy to access
the flow (or the matching megaflow) in dp_execute_cb().

This means that it's no longer necessary for the connection tracker to
reextract 'dl_type' from the packet; it can be passed as a parameter.

This change means that we have to slightly complicate test-conntrack to
group the packets by dl_type before passing them to the connection
tracker.
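
One possible way to do such grouping, shown here only as a sketch, is to
split a packet sequence into runs of consecutive packets that share the
same dl_type and to submit each run as its own batch.  The helper names
(dl_type_run_end, count_dl_type_runs) are hypothetical, and the dl_type
values are plain host-order integers rather than ovs_be16.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the index one past the last packet of the run that starts at
 * 'start', i.e. the longest stretch of consecutive packets that have the
 * same dl_type as packet 'start'.  Requires start < n. */
static size_t
dl_type_run_end(const uint16_t *dl_types, size_t n, size_t start)
{
    size_t end = start + 1;

    while (end < n && dl_types[end] == dl_types[start]) {
        end++;
    }
    return end;
}

/* Counts how many batches would be handed to the connection tracker if
 * each run were submitted separately. */
static size_t
count_dl_type_runs(const uint16_t *dl_types, size_t n)
{
    size_t runs = 0;
    size_t i;

    for (i = 0; i < n; i = dl_type_run_end(dl_types, n, i)) {
        runs++;
    }
    return runs;
}
```

Grouping only consecutive packets keeps the original packet order, at the
cost of smaller batches when IPv4 and IPv6 traffic are interleaved.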

Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
Acked-by: Joe Stringer <j...@ovn.org>
---
 lib/conntrack.c| 47 ++---
 lib/conntrack.h|  3 ++-
 lib/dpif-netdev.c  | 21 ++--
 tests/test-conntrack.c | 52 +++---
 4 files changed, 77 insertions(+), 46 deletions(-)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index 8e6c826..6ef9114 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -53,7 +53,8 @@ struct conn_lookup_ctx {
 };
 
 static bool conn_key_extract(struct conntrack *, struct dp_packet *,
- struct conn_lookup_ctx *, uint16_t zone);
+ ovs_be16 dl_type, struct conn_lookup_ctx *,
+ uint16_t zone);
 static uint32_t conn_key_hash(const struct conn_key *, uint32_t basis);
 static void conn_key_reverse(struct conn_key *);
 static void conn_key_lookup(struct conntrack_bucket *ctb,
@@ -265,7 +266,8 @@ process_one(struct conntrack *ct, struct dp_packet *pkt,
  * 'setlabel' behaves similarly for the connection label.*/
 int
 conntrack_execute(struct conntrack *ct, struct dp_packet_batch *pkt_batch,
-  bool commit, uint16_t zone, const uint32_t *setmark,
+  ovs_be16 dl_type, bool commit, uint16_t zone,
+  const uint32_t *setmark,
   const struct ovs_key_ct_labels *setlabel,
   const char *helper)
 {
@@ -299,7 +301,7 @@ conntrack_execute(struct conntrack *ct, struct 
dp_packet_batch *pkt_batch,
 for (i = 0; i < cnt; i++) {
 unsigned bucket;
 
-if (!conn_key_extract(ct, pkts[i], [i], zone)) {
+if (!conn_key_extract(ct, pkts[i], dl_type, [i], zone)) {
 write_ct_md(pkts[i], CS_INVALID, zone, 0, OVS_U128_ZERO);
 continue;
 }
@@ -917,7 +919,7 @@ extract_l4(struct conn_key *key, const void *data, size_t 
size, bool *related,
 }
 
 static bool
-conn_key_extract(struct conntrack *ct, struct dp_packet *pkt,
+conn_key_extract(struct conntrack *ct, struct dp_packet *pkt, ovs_be16 dl_type,
  struct conn_lookup_ctx *ctx, uint16_t zone)
 {
 const struct eth_header *l2 = dp_packet_l2(pkt);
@@ -941,43 +943,32 @@ conn_key_extract(struct conntrack *ct, struct dp_packet 
*pkt,
  *We already have the l3 and l4 headers' pointers.  Extracting
  *the l3 addresses and the l4 ports is really cheap, since they
  *can be found at fixed locations.
- * 2) To extract the l3 and l4 types.
- *Extracting the l3 and l4 types (especially the l3[1]) on the
- *other hand is quite expensive, because they're not at a
- *fixed location.
+ * 2) To extract the l4 type.
+ *Extracting the l4 types, for IPv6 can be quite expensive, because
+ *it's not at a fixed location.
  *
  * Here's a way to avoid (2) with the help of the datapath.
- * The datapath doesn't keep the packet's extracted flow[2], so
+ * The datapath doesn't keep the packet's extracted flow[1], so
  * using that is not an option.  We could use the packet's matching
- * megaflow for l3 type (it's always unwildcarded), and for l4 type
- * (we have to unwildcard it first).  This means either:
+ * megaflow, but we have to make sure that the l4 type (nw_proto)
+ * is unwildcarded.  This means either:
  *
- * a) dpif-netdev passes the matching megaflow to dp_execute_cb(), which
- *is used to extract the l3 type.  Unfortunately, dp_execute_cb() is
- *used also in dpif_netdev_execute(), which doesn't have a matching
- *megaflow.
+ * a) dpif-netdev unwildcards the l4 type when a new flow is installed
+ *if the actions contains ct().
  *
- * b) We define an alternative OVS_ACTION_ATTR_CT, used only by the
- *userspace datapath, which includes l3 (and l4) type.  The
- *alternative action could be generated by ofproto-dpif specifically
- *for the userspace datapath. Having a different interface for
- *userspace and kernel doesn't seem very clean, though.
+ * b) ofproto-dpif-xlate unwildcards the l4 type when translating a ct()
+ *action.  This is already done in different actions, but it's
+ *unnecessary for the kernel.
  *
  * ---
- * [1] A simple benchmark (running only the connection tracker
- * over and over on the same packets) shows that if the
- * l3 type is alre

[ovs-dev] [PATCH v5 13/16] system-tests: Run conntrack tests with userspace.

2016-07-26 Thread Daniele Di Proietto
The userspace connection tracker doesn't support ALGs, frag reassembly
or NAT yet, so skip those tests.

Also, connection tracking state input from a local port is not possible
in userspace.

The userspace datapath pads all frames with 0, to make them at
least 64 bytes.

Finally, the userspace datapath checks for the IPv4 header checksum, so
fix those in the hardcoded packets.
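
For anyone who needs to fix up further hardcoded packets, the IPv4 header
checksum is the standard Internet checksum (RFC 1071) computed over the
header with the checksum field set to zero.  The sketch below is a
standalone illustration; ipv4_header_csum is a made-up name and not a
function from the OVS tree.

```c
#include <stddef.h>
#include <stdint.h>

/* Computes the Internet checksum over 'len' bytes of an IPv4 header and
 * returns it as a host-order 16-bit value.  'len' is expected to be even,
 * which always holds for IPv4 headers (a multiple of 4 bytes).
 *
 * With the checksum field zeroed, the result is the value to store in the
 * header.  With a correct checksum already in place, the result is 0. */
static uint16_t
ipv4_header_csum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    /* Sum the header as big-endian 16-bit words. */
    for (i = 0; i + 1 < len; i += 2) {
        sum += ((uint32_t) hdr[i] << 8) | hdr[i + 1];
    }

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);
    }

    return (uint16_t) ~sum;
}
```

A receiver that validates headers, as the userspace datapath does, can
run the same sum over the header as received and drop the packet if the
result is nonzero.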

Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
Acked-by: Joe Stringer <j...@ovn.org>
Acked-by: Flavio Leitner <f...@sysclose.org>
---
 tests/system-kmod-macros.at  | 28 +
 tests/system-ovn.at  | 10 +---
 tests/system-traffic.at  | 54 +---
 tests/system-userspace-macros.at | 45 ++---
 4 files changed, 116 insertions(+), 21 deletions(-)

diff --git a/tests/system-kmod-macros.at b/tests/system-kmod-macros.at
index 2134db7..e1b5707 100644
--- a/tests/system-kmod-macros.at
+++ b/tests/system-kmod-macros.at
@@ -67,3 +67,31 @@ m4_define([CHECK_CONNTRACK],
  on_exit 'ovstest test-netlink-conntrack flush'
 ]
 )
+
+# CHECK_CONNTRACK_ALG()
+#
+# Perform requirements checks for running conntrack ALG tests. The kernel
+# supports ALG, so no check is needed.
+#
+m4_define([CHECK_CONNTRACK_ALG])
+
+# CHECK_CONNTRACK_FRAG()
+#
+# Perform requirements checks for running conntrack fragmentation tests.
+# The kernel always supports fragmentation, so no check is needed.
+m4_define([CHECK_CONNTRACK_FRAG])
+
+# CHECK_CONNTRACK_LOCAL_STACK()
+#
+# Perform requirements checks for running conntrack tests with local stack.
+# The kernel always supports reading the connection state of an skb coming
+# from an internal port, without an explicit ct() action, so no check is
+# needed.
+m4_define([CHECK_CONNTRACK_LOCAL_STACK])
+
+# CHECK_CONNTRACK_NAT()
+#
+# Perform requirements checks for running conntrack NAT tests. The kernel
+# always supports NAT, so no check is needed.
+#
+m4_define([CHECK_CONNTRACK_NAT])
diff --git a/tests/system-ovn.at b/tests/system-ovn.at
index 13f380f..c043f74 100644
--- a/tests/system-ovn.at
+++ b/tests/system-ovn.at
@@ -2,6 +2,7 @@ AT_SETUP([ovn -- 2 LRs connected via LS, gateway router, NAT])
 AT_KEYWORDS([ovnnat])
 
 CHECK_CONNTRACK()
+CHECK_CONNTRACK_NAT()
 ovn_start
 OVS_TRAFFIC_VSWITCHD_START()
 ADD_BR([br-int])
@@ -111,7 +112,7 @@ NS_CHECK_EXEC([alice1], [ping -q -c 3 -i 0.3 -w 2 30.0.0.2 
| FORMAT_PING], \
 # Check conntrack entries.
 AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(172.16.1.2) | \
 sed -e 's/zone=[[0-9]]*/zone=/'], [0], [dnl
-icmp,orig=(src=172.16.1.2,dst=30.0.0.2,id=),reply=(src=192.168.1.2,dst=172.16.1.2,id=),zone=
+icmp,orig=(src=172.16.1.2,dst=30.0.0.2,id=,type=8,code=0),reply=(src=192.168.1.2,dst=172.16.1.2,id=,type=0,code=0),zone=
 ])
 
 # South-North SNAT: 'bar1' pings 'alice1'. But 'alice1' receives traffic
@@ -124,7 +125,7 @@ NS_CHECK_EXEC([bar1], [ping -q -c 3 -i 0.3 -w 2 172.16.1.2 
| FORMAT_PING], \
 # We verify that SNAT indeed happened via 'dump-conntrack' command.
 AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(30.0.0.1) | \
 sed -e 's/zone=[[0-9]]*/zone=/'], [0], [dnl
-icmp,orig=(src=192.168.2.2,dst=172.16.1.2,id=),reply=(src=172.16.1.2,dst=30.0.0.1,id=),zone=
+icmp,orig=(src=192.168.2.2,dst=172.16.1.2,id=,type=8,code=0),reply=(src=172.16.1.2,dst=30.0.0.1,id=,type=0,code=0),zone=
 ])
 
 # Add static routes to handle east-west NAT.
@@ -143,14 +144,14 @@ NS_CHECK_EXEC([bar1], [ping -q -c 3 -i 0.3 -w 2 30.0.0.2 
| FORMAT_PING], \
 # 30.0.0.2 to R2, it hits the DNAT rule and converts 30.0.0.2 to 192.168.1.2
 AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(30.0.0.2) | \
 sed -e 's/zone=[[0-9]]*/zone=/'], [0], [dnl
-icmp,orig=(src=192.168.2.2,dst=30.0.0.2,id=),reply=(src=192.168.1.2,dst=192.168.2.2,id=),zone=
+icmp,orig=(src=192.168.2.2,dst=30.0.0.2,id=,type=8,code=0),reply=(src=192.168.1.2,dst=192.168.2.2,id=,type=0,code=0),zone=
 ])
 
 # As we have a SNAT rule that converts 192.168.2.2 to 30.0.0.1, the source is
 # SNATted and 'foo1' receives it.
 AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(30.0.0.1) | \
 sed -e 's/zone=[[0-9]]*/zone=/'], [0], [dnl
-icmp,orig=(src=192.168.2.2,dst=192.168.1.2,id=),reply=(src=192.168.1.2,dst=30.0.0.1,id=),zone=
+icmp,orig=(src=192.168.2.2,dst=192.168.1.2,id=,type=8,code=0),reply=(src=192.168.1.2,dst=30.0.0.1,id=,type=0,code=0),zone=
 ])
 
 OVS_APP_EXIT_AND_WAIT([ovn-controller])
@@ -173,6 +174,7 @@ AT_SETUP([ovn -- load-balancing])
 AT_KEYWORDS([ovnlb])
 
 CHECK_CONNTRACK()
+CHECK_CONNTRACK_NAT()
 ovn_start
 OVS_TRAFFIC_VSWITCHD_START()
 ADD_BR([br-int])
diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index a337950..0b4b4b7 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -510,13 +510,13 @@ AT_CAPTURE_FILE([ofctl_monitor.log])
 AT_CHECK([ovs-ofctl monitor br0 65534 invalid_ttl --detach --no-chdir 
--pidfile 2> ofctl_monitor.log])
 
 dnl Send an u

[ovs-dev] [PATCH v5 03/16] flow: Introduce parse_dl_type().

2016-07-26 Thread Daniele Di Proietto
The function simply returns the Ethernet type of the packet (after
discarding the VLAN tag, if any).  It will be used by a following
commit.
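
As a standalone illustration of the idea, the sketch below reads the
Ethernet type from a raw frame and skips a single 802.1Q tag if one is
present.  It is deliberately simpler than the patch: the names
(sketch_parse_dl_type, read_be16, the SK_ constants) are hypothetical,
the result is in host byte order rather than ovs_be16, and LLC/SNAP
frames are not handled.

```c
#include <stddef.h>
#include <stdint.h>

#define SK_ETH_ADDR_LEN 6
#define SK_ETH_TYPE_VLAN 0x8100     /* 802.1Q tag protocol identifier. */
#define SK_VLAN_TAG_LEN 4

/* Reads a big-endian 16-bit value from 'p'. */
static uint16_t
read_be16(const uint8_t *p)
{
    return (uint16_t) ((p[0] << 8) | p[1]);
}

/* Returns the Ethernet type of 'frame' in host byte order, looking past
 * one 802.1Q VLAN tag if present.  Returns 0 if the frame is too short. */
static uint16_t
sketch_parse_dl_type(const uint8_t *frame, size_t size)
{
    size_t off = 2 * SK_ETH_ADDR_LEN;   /* Skip dst and src MAC. */
    uint16_t type;

    if (size < off + 2) {
        return 0;
    }
    type = read_be16(frame + off);

    if (type == SK_ETH_TYPE_VLAN) {
        off += SK_VLAN_TAG_LEN;         /* Skip TPID and TCI. */
        if (size < off + 2) {
            return 0;
        }
        type = read_be16(frame + off);
    }

    return type;
}
```

For an untagged IPv4 frame this yields 0x0800; for an IPv6 frame carried
inside a VLAN it yields 0x86dd instead of 0x8100.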

Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
Acked-by: Flavio Leitner <f...@sysclose.org>
---
 lib/flow.c | 14 --
 lib/flow.h |  1 +
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/lib/flow.c b/lib/flow.c
index f94b1f2..8cf707b 100644
--- a/lib/flow.c
+++ b/lib/flow.c
@@ -328,7 +328,7 @@ parse_mpls(const void **datap, size_t *sizep)
 return MIN(count, FLOW_MAX_MPLS_LABELS);
 }
 
-static inline ovs_be16
+static inline ALWAYS_INLINE ovs_be16
 parse_vlan(const void **datap, size_t *sizep)
 {
 const struct eth_header *eth = *datap;
@@ -350,7 +350,7 @@ parse_vlan(const void **datap, size_t *sizep)
 return 0;
 }
 
-static inline ovs_be16
+static inline ALWAYS_INLINE ovs_be16
 parse_ethertype(const void **datap, size_t *sizep)
 {
 const struct llc_snap_header *llc;
@@ -827,6 +827,16 @@ miniflow_extract(struct dp_packet *packet, struct miniflow 
*dst)
 dst->map = mf.map;
 }
 
+ovs_be16
+parse_dl_type(const struct eth_header *data_, size_t size)
+{
+const void *data = data_;
+
+parse_vlan(&data, &size);
+
+return parse_ethertype(&data, &size);
+}
+
 /* For every bit of a field that is wildcarded in 'wildcards', sets the
  * corresponding bit in 'flow' to zero. */
 void
diff --git a/lib/flow.h b/lib/flow.h
index c041e8a..fd9c712 100644
--- a/lib/flow.h
+++ b/lib/flow.h
@@ -108,6 +108,7 @@ void flow_compose(struct dp_packet *, const struct flow *);
 
 bool parse_ipv6_ext_hdrs(const void **datap, size_t *sizep, uint8_t *nw_proto,
  uint8_t *nw_frag);
+ovs_be16 parse_dl_type(const struct eth_header *data_, size_t size);
 
 static inline uint64_t
 flow_get_xreg(const struct flow *flow, int idx)
-- 
2.8.1

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH v5 14/16] system-tests: Add ping through conntrack test.

2016-07-26 Thread Daniele Di Proietto
Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
Acked-by: Joe Stringer <j...@ovn.org>
---
 tests/system-traffic.at | 84 +
 1 file changed, 84 insertions(+)

diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index 0b4b4b7..fd8b918 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -608,6 +608,90 @@ NS_CHECK_EXEC([at_ns1], [wget http://[[fc00::1]] -t 3 -T 1 
-v -o wget1.log], [4]
 OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
+AT_SETUP([conntrack - IPv4 ping])
+CHECK_CONNTRACK()
+OVS_TRAFFIC_VSWITCHD_START()
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24")
+ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24")
+
+dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from 
ns1->ns0.
+AT_DATA([flows.txt], [dnl
+priority=1,action=drop
+priority=10,arp,action=normal
+priority=100,in_port=1,icmp,action=ct(commit),2
+priority=100,in_port=2,icmp,ct_state=-trk,action=ct(table=0)
+priority=100,in_port=2,icmp,ct_state=+trk+est,action=1
+])
+
+AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
+
+dnl Pings from ns0->ns1 should work fine.
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING], 
[0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0], [dnl
+icmp,orig=(src=10.1.1.1,dst=10.1.1.2,id=),reply=(src=10.1.1.2,dst=10.1.1.1,id=)
+])
+
+AT_CHECK([ovs-appctl dpctl/flush-conntrack])
+
+dnl Pings from ns1->ns0 should fail.
+NS_CHECK_EXEC([at_ns1], [ping -q -c 3 -i 0.3 -w 2 10.1.1.1 | FORMAT_PING], 
[0], [dnl
+7 packets transmitted, 0 received, 100% packet loss, time 0ms
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([conntrack - IPv6 ping])
+CHECK_CONNTRACK()
+OVS_TRAFFIC_VSWITCHD_START()
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH(p0, at_ns0, br0, "fc00::1/96")
+ADD_VETH(p1, at_ns1, br0, "fc00::2/96")
+
+AT_DATA([flows.txt], [dnl
+
+dnl ICMPv6 echo request and reply go to table 1.  The rest of the traffic goes
+dnl through normal action.
+table=0,priority=10,icmp6,icmp_type=128,action=goto_table:1
+table=0,priority=10,icmp6,icmp_type=129,action=goto_table:1
+table=0,priority=1,action=normal
+
+dnl Allow everything from ns0->ns1. Only allow return traffic from ns1->ns0.
+table=1,priority=100,in_port=1,icmp6,action=ct(commit),2
+table=1,priority=100,in_port=2,icmp6,ct_state=-trk,action=ct(table=0)
+table=1,priority=100,in_port=2,icmp6,ct_state=+trk+est,action=1
+table=1,priority=1,action=drop
+])
+
+AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
+
+OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::2])
+
+dnl Pings from ns1->ns0 should fail.
+NS_CHECK_EXEC([at_ns1], [ping6 -q -c 3 -i 0.3 -w 2 fc00::1 | FORMAT_PING], 
[0], [dnl
+7 packets transmitted, 0 received, 100% packet loss, time 0ms
+])
+
+dnl Pings from ns0->ns1 should work fine.
+NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00::2 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(fc00::2)], [0], [dnl
+icmpv6,orig=(src=fc00::1,dst=fc00::2,id=),reply=(src=fc00::2,dst=fc00::1,id=)
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
 AT_SETUP([conntrack - commit, recirc])
 CHECK_CONNTRACK()
 OVS_TRAFFIC_VSWITCHD_START()
-- 
2.8.1

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH v5 09/16] dpif-netdev: Implement conntrack dump functions.

2016-07-26 Thread Daniele Di Proietto
New functions are implemented in the conntrack module to support this.

Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
Acked-by: Flavio Leitner <f...@sysclose.org>
---
 lib/conntrack-private.h |   3 ++
 lib/conntrack-tcp.c |  34 +
 lib/conntrack.c | 123 
 lib/conntrack.h |  15 ++
 lib/dpif-netdev.c   |  60 +--
 5 files changed, 232 insertions(+), 3 deletions(-)

diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
index 5aac938..df32525 100644
--- a/lib/conntrack-private.h
+++ b/lib/conntrack-private.h
@@ -22,6 +22,7 @@
 #include 
 
 #include "conntrack.h"
+#include "ct-dpif.h"
 #include "openvswitch/hmap.h"
 #include "openvswitch/list.h"
 #include "openvswitch/types.h"
@@ -76,6 +77,8 @@ struct ct_l4_proto {
   struct conntrack_bucket *,
   struct dp_packet *pkt, bool reply,
   long long now);
+void (*conn_get_protoinfo)(const struct conn *,
+   struct ct_dpif_protoinfo *);
 };
 
 extern struct ct_l4_proto ct_proto_tcp;
diff --git a/lib/conntrack-tcp.c b/lib/conntrack-tcp.c
index 7edcce3..ea22400 100644
--- a/lib/conntrack-tcp.c
+++ b/lib/conntrack-tcp.c
@@ -457,8 +457,42 @@ tcp_new_conn(struct conntrack_bucket *ctb, struct dp_packet *pkt,
 return >up;
 }
 
+static uint8_t
+tcp_peer_to_protoinfo_flags(const struct tcp_peer *peer)
+{
+uint8_t res = 0;
+
+if (peer->wscale & CT_WSCALE_FLAG) {
+res |= CT_DPIF_TCPF_WINDOW_SCALE;
+}
+
+if (peer->wscale & CT_WSCALE_UNKNOWN) {
+res |= CT_DPIF_TCPF_BE_LIBERAL;
+}
+
+return res;
+}
+
+static void
+tcp_conn_get_protoinfo(const struct conn *conn_,
+   struct ct_dpif_protoinfo *protoinfo)
+{
+const struct conn_tcp *conn = conn_tcp_cast(conn_);
+
+protoinfo->proto = IPPROTO_TCP;
+protoinfo->tcp.state_orig = conn->peer[0].state;
+protoinfo->tcp.state_reply = conn->peer[1].state;
+
+protoinfo->tcp.wscale_orig = conn->peer[0].wscale & CT_WSCALE_MASK;
+protoinfo->tcp.wscale_reply = conn->peer[1].wscale & CT_WSCALE_MASK;
+
+protoinfo->tcp.flags_orig = tcp_peer_to_protoinfo_flags(&conn->peer[0]);
+protoinfo->tcp.flags_reply = tcp_peer_to_protoinfo_flags(&conn->peer[1]);
+}
+
 struct ct_l4_proto ct_proto_tcp = {
 .new_conn = tcp_new_conn,
 .valid_new = tcp_valid_new,
 .conn_update = tcp_conn_update,
+.conn_get_protoinfo = tcp_conn_get_protoinfo,
 };
diff --git a/lib/conntrack.c b/lib/conntrack.c
index 094a230..47214e1 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -26,6 +26,7 @@
 #include "conntrack-private.h"
 #include "coverage.h"
 #include "csum.h"
+#include "ct-dpif.h"
 #include "dp-packet.h"
 #include "flow.h"
 #include "netdev.h"
@@ -1050,3 +1051,125 @@ delete_conn(struct conn *conn)
 {
 free(conn);
 }
+
+static void
+ct_endpoint_to_ct_dpif_inet_addr(const struct ct_addr *a,
+ union ct_dpif_inet_addr *b,
+ ovs_be16 dl_type)
+{
+if (dl_type == htons(ETH_TYPE_IP)) {
+b->ip = a->ipv4_aligned;
+} else if (dl_type == htons(ETH_TYPE_IPV6)){
+b->in6 = a->ipv6_aligned;
+}
+}
+
+static void
+conn_key_to_tuple(const struct conn_key *key, struct ct_dpif_tuple *tuple)
+{
+if (key->dl_type == htons(ETH_TYPE_IP)) {
+tuple->l3_type = AF_INET;
+} else if (key->dl_type == htons(ETH_TYPE_IPV6)) {
+tuple->l3_type = AF_INET6;
+}
+tuple->ip_proto = key->nw_proto;
+ct_endpoint_to_ct_dpif_inet_addr(&key->src.addr, &tuple->src,
+ key->dl_type);
+ct_endpoint_to_ct_dpif_inet_addr(&key->dst.addr, &tuple->dst,
+ key->dl_type);
+
+if (key->nw_proto == IPPROTO_ICMP || key->nw_proto == IPPROTO_ICMPV6) {
+tuple->icmp_id = key->src.port;
+/* ICMP type and code are not tracked */
+tuple->icmp_type = 0;
+tuple->icmp_code = 0;
+} else {
+tuple->src_port = key->src.port;
+tuple->dst_port = key->dst.port;
+}
+}
+
+static void
+conn_to_ct_dpif_entry(const struct conn *conn, struct ct_dpif_entry *entry,
+  long long now)
+{
+struct ct_l4_proto *class;
+long long expiration;
+memset(entry, 0, sizeof *entry);
+conn_key_to_tuple(&conn->key, &entry->tuple_orig);
+conn_key_to_tuple(&conn->rev_key, &entry->tuple_reply);
+
+entry->zone = conn->key.zone;
+entry->mark = conn->mark;
+
+memcpy(&entry->labels, &conn->label, sizeof(entry->labels));
+/* Not implemented yet */
+

[ovs-dev] [PATCH v5 11/16] flow: Generate checksum and udp_len in flow_compose().

2016-07-26 Thread Daniele Di Proietto
This is useful to test the connection tracker, which performs checksum
and udp length verification.
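
For readers who want to see what such verification involves, the following is
a minimal standalone sketch of filling in the UDP length and the checksum over
an IPv4 pseudo-header.  It does not use the OVS csum helpers; the names
csum_add_bytes, csum_fold, udp4_sum, udp4_fill_len_and_csum and udp4_csum_ok
are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Adds 'len' bytes at 'data' to a running one's-complement sum, reading
 * the bytes as big-endian 16-bit words.  Hypothetical helper, not OVS API. */
static uint32_t
csum_add_bytes(uint32_t sum, const uint8_t *data, size_t len)
{
    size_t i;

    for (i = 0; i + 1 < len; i += 2) {
        sum += ((uint32_t) data[i] << 8) | data[i + 1];
    }
    if (len & 1) {
        sum += (uint32_t) data[len - 1] << 8;   /* Pad odd byte with zero. */
    }
    return sum;
}

/* Folds the carries back in and returns the complemented 16-bit result. */
static uint16_t
csum_fold(uint32_t sum)
{
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return (uint16_t) ~sum;
}

/* Sums the IPv4 pseudo-header (addresses, zero, protocol 17, UDP length)
 * followed by the 'udp_len' bytes of UDP header and payload. */
static uint32_t
udp4_sum(const uint8_t *udp, uint16_t udp_len,
         const uint8_t src[4], const uint8_t dst[4])
{
    const uint8_t pseudo[4] = { 0, 17, (uint8_t) (udp_len >> 8),
                                (uint8_t) udp_len };
    uint32_t sum = 0;

    sum = csum_add_bytes(sum, src, 4);
    sum = csum_add_bytes(sum, dst, 4);
    sum = csum_add_bytes(sum, pseudo, sizeof pseudo);
    return csum_add_bytes(sum, udp, udp_len);
}

/* Writes the length and checksum fields of the datagram at 'udp'. */
void
udp4_fill_len_and_csum(uint8_t *udp, uint16_t udp_len,
                       const uint8_t src[4], const uint8_t dst[4])
{
    uint16_t csum;

    assert(udp_len >= 8);               /* At least a full UDP header. */
    udp[4] = (uint8_t) (udp_len >> 8);  /* Length field. */
    udp[5] = (uint8_t) udp_len;
    udp[6] = udp[7] = 0;                /* Checksum is zero while summing. */

    csum = csum_fold(udp4_sum(udp, udp_len, src, dst));
    if (csum == 0) {
        csum = 0xffff;                  /* Zero on the wire means "none". */
    }
    udp[6] = (uint8_t) (csum >> 8);
    udp[7] = (uint8_t) csum;
}

/* Returns nonzero if the stored checksum verifies. */
int
udp4_csum_ok(const uint8_t *udp, uint16_t udp_len,
             const uint8_t src[4], const uint8_t dst[4])
{
    return csum_fold(udp4_sum(udp, udp_len, src, dst)) == 0;
}
```

A receiver that runs udp4_csum_ok() over a datagram whose checksum field was
left at zero by the sender would reject it unless it special-cases zero, which
is why a packet generator used for conntrack tests needs to fill both fields.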

Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
Acked-by: Joe Stringer <j...@ovn.org>
---
 lib/flow.c|  62 ++--
 tests/ofproto-dpif.at | 198 +-
 2 files changed, 153 insertions(+), 107 deletions(-)

diff --git a/lib/flow.c b/lib/flow.c
index 8cf707b..ba4f8c7 100644
--- a/lib/flow.c
+++ b/lib/flow.c
@@ -2238,6 +2238,7 @@ flow_compose_l4(struct dp_packet *p, const struct flow *flow)
 udp = dp_packet_put_zeros(p, l4_len);
 udp->udp_src = flow->tp_src;
 udp->udp_dst = flow->tp_dst;
+udp->udp_len = htons(l4_len);
 } else if (flow->nw_proto == IPPROTO_SCTP) {
 struct sctp_header *sctp;
 
@@ -2252,8 +2253,6 @@ flow_compose_l4(struct dp_packet *p, const struct flow *flow)
 icmp = dp_packet_put_zeros(p, l4_len);
 icmp->icmp_type = ntohs(flow->tp_src);
 icmp->icmp_code = ntohs(flow->tp_dst);
-/* Checksum has already been zeroed by put_zeros call. */
-icmp->icmp_csum = csum(icmp, ICMP_HEADER_LEN);
 } else if (flow->nw_proto == IPPROTO_IGMP) {
 struct igmp_header *igmp;
 
@@ -2262,8 +2261,6 @@ flow_compose_l4(struct dp_packet *p, const struct flow *flow)
 igmp->igmp_type = ntohs(flow->tp_src);
 igmp->igmp_code = ntohs(flow->tp_dst);
 put_16aligned_be32(&igmp->group, flow->igmp_group_ip4);
-/* Checksum has already been zeroed by put_zeros call. */
-igmp->igmp_csum = csum(igmp, IGMP_HEADER_LEN);
 } else if (flow->nw_proto == IPPROTO_ICMPV6) {
 struct icmp6_hdr *icmp;
 
@@ -2297,22 +2294,65 @@ flow_compose_l4(struct dp_packet *p, const struct flow *flow)
 nd_opt->nd_opt_mac = flow->arp_tha;
 }
 }
-icmp->icmp6_cksum = (OVS_FORCE uint16_t)
-csum(icmp, (char *)dp_packet_tail(p) - (char *)icmp);
 }
 }
 return l4_len;
 }
 
+static void
+flow_compose_l4_csum(struct dp_packet *p, const struct flow *flow,
+ uint32_t pseudo_hdr_csum)
+{
+size_t l4_len = (char *) dp_packet_tail(p) - (char *) dp_packet_l4(p);
+
+if (!(flow->nw_frag & FLOW_NW_FRAG_ANY)
+|| !(flow->nw_frag & FLOW_NW_FRAG_LATER)) {
+if (flow->nw_proto == IPPROTO_TCP) {
+struct tcp_header *tcp = dp_packet_l4(p);
+
+/* Checksum has already been zeroed by put_zeros call in
+ * flow_compose_l4(). */
+tcp->tcp_csum = csum_finish(csum_continue(pseudo_hdr_csum,
+  tcp, l4_len));
+} else if (flow->nw_proto == IPPROTO_UDP) {
+struct udp_header *udp = dp_packet_l4(p);
+
+/* Checksum has already been zeroed by put_zeros call in
+ * flow_compose_l4(). */
+udp->udp_csum = csum_finish(csum_continue(pseudo_hdr_csum,
+  udp, l4_len));
+} else if (flow->nw_proto == IPPROTO_ICMP) {
+struct icmp_header *icmp = dp_packet_l4(p);
+
+/* Checksum has already been zeroed by put_zeros call in
+ * flow_compose_l4(). */
+icmp->icmp_csum = csum(icmp, l4_len);
+} else if (flow->nw_proto == IPPROTO_IGMP) {
+struct igmp_header *igmp = dp_packet_l4(p);
+
+/* Checksum has already been zeroed by put_zeros call in
+ * flow_compose_l4(). */
+igmp->igmp_csum = csum(igmp, l4_len);
+} else if (flow->nw_proto == IPPROTO_ICMPV6) {
+struct icmp6_hdr *icmp = dp_packet_l4(p);
+
+/* Checksum has already been zeroed by put_zeros call in
+ * flow_compose_l4(). */
+icmp->icmp6_cksum = (OVS_FORCE uint16_t)
+csum_finish(csum_continue(pseudo_hdr_csum, icmp, l4_len));
+}
+}
+}
+
 /* Puts into 'b' a packet that flow_extract() would parse as having the given
  * 'flow'.
  *
  * (This is useful only for testing, obviously, and the packet isn't really
- * valid. It hasn't got some checksums filled in, for one, and lots of fields
- * are just zeroed.) */
+ * valid.  Lots of fields are just zeroed.) */
 void
 flow_compose(struct dp_packet *p, const struct flow *flow)
 {
+uint32_t pseudo_hdr_csum;
 size_t l4_len;
 
 /* eth_compose() sets l3 pointer and makes sure it is 32-bit aligned. */
@@ -2353,6 +2393,9 @@ flow_compose(struct dp_packet *p, const struct flow *flow)
 ip->ip_tot_len = htons(p->l4_ofs - p->l3_ofs + l4_len);
 /* Checksum has already been zeroed by put_zeros call. */
 ip->ip_csum = csum(ip, sizeof *ip);

[ovs-dev] [PATCH v5 10/16] dpif-netdev: Implement conntrack flush interface.

2016-07-26 Thread Daniele Di Proietto
New functions are implemented in the conntrack module to support this.
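
The flush interface takes an optional zone: a NULL pointer flushes every
connection, otherwise only connections in that zone are removed.  A minimal
sketch of that filtering, using a toy table rather than the real conntrack
structures (toy_ct, toy_conn, toy_ct_add, toy_ct_flush and N_BUCKETS are
hypothetical names, and the per-bucket locking done by the real code is left
out):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define N_BUCKETS 4     /* Illustrative; not the real bucket count. */

struct toy_conn {
    struct toy_conn *next;
    uint16_t zone;
};

struct toy_ct {
    struct toy_conn *buckets[N_BUCKETS];
    unsigned int n_conn;
};

/* Adds a connection in 'zone' to the bucket selected by 'hash'.
 * Returns 0 on success, -1 if allocation fails. */
int
toy_ct_add(struct toy_ct *ct, uint32_t hash, uint16_t zone)
{
    struct toy_conn *conn = malloc(sizeof *conn);

    if (!conn) {
        return -1;
    }
    conn->zone = zone;
    conn->next = ct->buckets[hash % N_BUCKETS];
    ct->buckets[hash % N_BUCKETS] = conn;
    ct->n_conn++;
    return 0;
}

/* Removes every connection if 'zone' is NULL, otherwise only the
 * connections whose zone equals '*zone'. */
void
toy_ct_flush(struct toy_ct *ct, const uint16_t *zone)
{
    unsigned int i;

    for (i = 0; i < N_BUCKETS; i++) {
        struct toy_conn **prev = &ct->buckets[i];

        while (*prev) {
            struct toy_conn *conn = *prev;

            if (!zone || *zone == conn->zone) {
                *prev = conn->next;     /* Unlink before freeing. */
                free(conn);
                assert(ct->n_conn > 0);
                ct->n_conn--;
            } else {
                prev = &conn->next;     /* Keep it, move on. */
            }
        }
    }
}
```

Passing the zone by pointer rather than by value lets the caller distinguish
"zone 0" from "all zones" without reserving a sentinel value.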

Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
Acked-by: Flavio Leitner <f...@sysclose.org>
---
 lib/conntrack.c   | 23 +++
 lib/conntrack.h   |  2 ++
 lib/dpif-netdev.c | 10 +-
 3 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index 47214e1..15a9582 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -1173,3 +1173,26 @@ conntrack_dump_done(struct conntrack_dump *dump OVS_UNUSED)
 {
 return 0;
 }
+
+int
+conntrack_flush(struct conntrack *ct, const uint16_t *zone)
+{
+unsigned i;
+
+for (i = 0; i < CONNTRACK_BUCKETS; i++) {
+struct conn *conn, *next;
+
+ct_lock_lock(&ct->buckets[i].lock);
+HMAP_FOR_EACH_SAFE(conn, next, node, &ct->buckets[i].connections) {
+if (!zone || *zone == conn->key.zone) {
+ovs_list_remove(&conn->exp_node);
+hmap_remove(&ct->buckets[i].connections, &conn->node);
+atomic_count_dec(&ct->n_conn);
+delete_conn(conn);
+}
+}
+ct_lock_unlock(&ct->buckets[i].lock);
+}
+
+return 0;
+}
diff --git a/lib/conntrack.h b/lib/conntrack.h
index 2f0680e..8802d35 100644
--- a/lib/conntrack.h
+++ b/lib/conntrack.h
@@ -83,6 +83,8 @@ int conntrack_dump_start(struct conntrack *, struct conntrack_dump *,
  const uint16_t *pzone);
 int conntrack_dump_next(struct conntrack_dump *, struct ct_dpif_entry *);
 int conntrack_dump_done(struct conntrack_dump *);
+
+int conntrack_flush(struct conntrack *, const uint16_t *zone);
 
 /* 'struct ct_lock' is a wrapper for an adaptive mutex.  It's useful to try
  * different types of locks (e.g. spinlocks) */
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 48861a2..5793995 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -4309,6 +4309,14 @@ dpif_netdev_ct_dump_done(struct dpif *dpif OVS_UNUSED,
 return err;
 }
 
+static int
+dpif_netdev_ct_flush(struct dpif *dpif, const uint16_t *zone)
+{
+struct dp_netdev *dp = get_dp_netdev(dpif);
+
+return conntrack_flush(&dp->conntrack, zone);
+}
+
 const struct dpif_class dpif_netdev_class = {
 "netdev",
 dpif_netdev_init,
@@ -4352,7 +4360,7 @@ const struct dpif_class dpif_netdev_class = {
 dpif_netdev_ct_dump_start,
 dpif_netdev_ct_dump_next,
 dpif_netdev_ct_dump_done,
-NULL,   /* ct_flush */
+dpif_netdev_ct_flush,
 };
 
 static void
-- 
2.8.1



[ovs-dev] [PATCH v5 08/16] dpif-netdev: Execute conntrack action.

2016-07-26 Thread Daniele Di Proietto
This commit implements the OVS_ACTION_ATTR_CT action in dpif-netdev.

To allow ofproto-dpif to detect the conntrack feature, flow_put will no
longer discard flows with ct_* fields set. We still shouldn't allow
flows with NAT bits set, since there is no support for NAT.
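
The check described above amounts to masking ct_state against the set of
supported bits.  A minimal sketch follows; the TOY_CS_* names and their numeric
values are illustrative assumptions and are not copied from the OVS headers,
and toy_check_ct_state is a hypothetical function:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Illustrative flag values, assumed for this sketch only. */
enum {
    TOY_CS_NEW         = 1 << 0,
    TOY_CS_ESTABLISHED = 1 << 1,
    TOY_CS_RELATED     = 1 << 2,
    TOY_CS_REPLY_DIR   = 1 << 3,
    TOY_CS_INVALID     = 1 << 4,
    TOY_CS_TRACKED     = 1 << 5,
    TOY_CS_SRC_NAT     = 1 << 6,    /* Not supported by this datapath. */
    TOY_CS_DST_NAT     = 1 << 7     /* Not supported by this datapath. */
};

#define TOY_CS_SUPPORTED_MASK (TOY_CS_NEW | TOY_CS_ESTABLISHED \
                               | TOY_CS_RELATED | TOY_CS_INVALID \
                               | TOY_CS_REPLY_DIR | TOY_CS_TRACKED)
#define TOY_CS_UNSUPPORTED_MASK (~(uint32_t) TOY_CS_SUPPORTED_MASK)

/* Returns 0 if every bit set in 'ct_state' is supported, EINVAL if any
 * unsupported bit (for example a NAT bit) is set. */
int
toy_check_ct_state(uint32_t ct_state)
{
    /* The NAT bits must never leak into the supported set. */
    assert(!(TOY_CS_SUPPORTED_MASK & (TOY_CS_SRC_NAT | TOY_CS_DST_NAT)));

    return (ct_state & TOY_CS_UNSUPPORTED_MASK) ? EINVAL : 0;
}
```

Rejecting by "anything outside the supported mask" rather than by listing the
NAT bits means that bits added later are refused by default.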

Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
Acked-by: Flavio Leitner <f...@sysclose.org>
Acked-by: Antonio Fischetti <antonio.fische...@intel.com>
---
 FAQ.md|  2 +-
 NEWS  |  2 ++
 lib/dpif-netdev.c | 63 ---
 tests/dpif-netdev.at  | 16 ++---
 tests/ofproto-dpif.at | 24 ++--
 tests/pmd.at  |  2 +-
 6 files changed, 79 insertions(+), 30 deletions(-)

diff --git a/FAQ.md b/FAQ.md
index 35e1cac..cec420b 100644
--- a/FAQ.md
+++ b/FAQ.md
@@ -193,7 +193,7 @@ A: Open vSwitch supports different datapaths on different platforms.  Each
 Feature   | Linux upstream | Linux OVS tree | Userspace | Hyper-V |
 --|:--:|:--:|:-:|:---:|
 NAT   |  4.6   |   YES  |NO |   NO|
-Connection tracking   |  4.3   |   YES  |NO | PARTIAL |
+Connection tracking   |  4.3   |   YES  |  PARTIAL  | PARTIAL |
 Tunnel - LISP |  NO|   YES  |NO |   NO|
 Tunnel - STT  |  NO|   YES  |NO |   YES   |
 Tunnel - GRE  |  3.11  |   YES  |YES|   YES   |
diff --git a/NEWS b/NEWS
index 73d3fcf..39157b8 100644
--- a/NEWS
+++ b/NEWS
@@ -59,6 +59,8 @@ Post-v2.5.0
  * PMD threads servicing vHost User ports can now come from the NUMA
node that device memory is located on if CONFIG_RTE_LIBRTE_VHOST_NUMA
is enabled in DPDK.
+ * Basic connection tracking for the userspace datapath (no ALG,
+   fragmentation or NAT support yet)
- Increase number of registers to 16.
- ovs-benchmark: This utility has been removed due to lack of use and
  bitrot.
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index f05ca4e..c928ffe 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -33,6 +33,7 @@
 
 #include "bitmap.h"
 #include "cmap.h"
+#include "conntrack.h"
 #include "coverage.h"
 #include "csum.h"
 #include "dp-packet.h"
@@ -89,9 +90,17 @@ static struct shash dp_netdevs OVS_GUARDED_BY(dp_netdev_mutex)
 
 static struct vlog_rate_limit upcall_rl = VLOG_RATE_LIMIT_INIT(600, 600);
 
+#define DP_NETDEV_CS_SUPPORTED_MASK (CS_NEW | CS_ESTABLISHED | CS_RELATED \
+ | CS_INVALID | CS_REPLY_DIR | CS_TRACKED)
+#define DP_NETDEV_CS_UNSUPPORTED_MASK (~(uint32_t)DP_NETDEV_CS_SUPPORTED_MASK)
+
 static struct odp_support dp_netdev_support = {
 .max_mpls_depth = SIZE_MAX,
 .recirc = true,
+.ct_state = true,
+.ct_zone = true,
+.ct_mark = true,
+.ct_label = true,
 };
 
 /* Stores a miniflow with inline values */
@@ -228,6 +237,8 @@ struct dp_netdev {
 char *pmd_cmask;
 
 uint64_t last_tnl_conf_seq;
+
+struct conntrack conntrack;
 };
 
 static struct dp_netdev_port *dp_netdev_lookup_port(const struct dp_netdev *dp,
@@ -934,6 +945,8 @@ create_dp_netdev(const char *name, const struct dpif_class *class,
 dp->upcall_aux = NULL;
 dp->upcall_cb = NULL;
 
+conntrack_init(&dp->conntrack);
+
 cmap_init(&dp->poll_threads);
 ovs_mutex_init_recursive(&dp->non_pmd_mutex);
 ovsthread_key_create(&dp->per_pmd_key, NULL);
@@ -1004,6 +1017,8 @@ dp_netdev_free(struct dp_netdev *dp)
 ovs_mutex_destroy(&dp->non_pmd_mutex);
 ovsthread_key_delete(dp->per_pmd_key);
 
+conntrack_destroy(&dp->conntrack);
+
 ovs_mutex_lock(&dp->port_mutex);
 HMAP_FOR_EACH_SAFE (port, next, node, &dp->ports) {
 do_del_port(dp, port);
@@ -2030,9 +2045,7 @@ dpif_netdev_flow_from_nlattrs(const struct nlattr *key, uint32_t key_len,
 return EINVAL;
 }
 
-/* Userspace datapath doesn't support conntrack. */
-if (flow->ct_state || flow->ct_zone || flow->ct_mark
-|| !ovs_u128_is_zero(flow->ct_label)) {
+if (flow->ct_state & DP_NETDEV_CS_UNSUPPORTED_MASK) {
 return EINVAL;
 }
 
@@ -4172,12 +4185,46 @@ dp_execute_cb(void *aux_, struct dp_packet_batch *packets_,
 VLOG_WARN("Packet dropped. Max recirculation depth exceeded.");
 break;
 
-case OVS_ACTION_ATTR_CT:
-/* If a flow with this action is slow-pathed, datapath assistance is
- * required to implement it. However, we don't support this action
- * in the userspace datapath. */
-VLOG_WARN("Cannot execute conntrack action in userspace.");
+case OVS_ACTION_ATTR_CT: {
+const struct nlattr *b;
+bool commit = false;
+unsigned int left;
+   

[ovs-dev] [PATCH v5 05/16] conntrack: Periodically delete expired connections.

2016-07-26 Thread Daniele Di Proietto
This commit adds a thread that periodically removes expired connections.

The expiration time of a connection can be expressed by:

expiration = now + timeout

For each possible 'timeout' value (there aren't many) we keep a list.
When the expiration is updated, we move the connection to the back of the
corresponding 'timeout' list. This way, the list is always ordered by
'expiration'.

When the cleanup thread iterates through the lists for expired
connections, it can stop at the first non-expired connection.
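
The scheme above can be sketched in isolation as follows.  This is a toy model
under stated assumptions, not the code in this patch: the toy_* names, the two
timeout classes and their values are hypothetical, and the sweep only marks
connections as expired instead of freeing them.

```c
#include <assert.h>
#include <stddef.h>

/* Two illustrative timeout classes; the real tracker has more. */
enum toy_timeout { TOY_TM_SHORT, TOY_TM_LONG, TOY_N_TIMEOUTS };
static const long long toy_timeout_val[TOY_N_TIMEOUTS] = { 30, 300 };

struct toy_node {
    struct toy_node *prev, *next;
};

struct toy_conn {
    struct toy_node exp_node;   /* First member, so node -> conn is a cast. */
    long long expiration;
    int expired;                /* Set by the sweep in this sketch. */
};

struct toy_bucket {
    struct toy_node exp_lists[TOY_N_TIMEOUTS];  /* One list per timeout. */
};

void
toy_bucket_init(struct toy_bucket *b)
{
    int tm;

    for (tm = 0; tm < TOY_N_TIMEOUTS; tm++) {
        b->exp_lists[tm].prev = b->exp_lists[tm].next = &b->exp_lists[tm];
    }
}

static void
toy_node_remove(struct toy_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
}

/* Sets the expiration and appends to the list for 'tm'.  Because 'now' never
 * decreases and the timeout is fixed per list, each list stays sorted. */
void
toy_conn_init_expiration(struct toy_bucket *b, struct toy_conn *conn,
                         enum toy_timeout tm, long long now)
{
    struct toy_node *head;

    assert(tm < TOY_N_TIMEOUTS);
    head = &b->exp_lists[tm];
    conn->expiration = now + toy_timeout_val[tm];
    conn->exp_node.prev = head->prev;
    conn->exp_node.next = head;
    head->prev->next = &conn->exp_node;
    head->prev = &conn->exp_node;
}

/* Moves an already listed connection to the back of the list for 'tm'. */
void
toy_conn_update_expiration(struct toy_bucket *b, struct toy_conn *conn,
                           enum toy_timeout tm, long long now)
{
    toy_node_remove(&conn->exp_node);
    toy_conn_init_expiration(b, conn, tm, now);
}

/* Unlinks expired connections and returns how many were found.  Each list is
 * scanned from the front and abandoned at the first live connection. */
size_t
toy_sweep(struct toy_bucket *b, long long now)
{
    size_t n = 0;
    int tm;

    for (tm = 0; tm < TOY_N_TIMEOUTS; tm++) {
        struct toy_node *head = &b->exp_lists[tm];

        while (head->next != head) {
            struct toy_conn *conn = (struct toy_conn *) head->next;

            if (now < conn->expiration) {
                break;          /* Everything behind it expires later. */
            }
            toy_node_remove(&conn->exp_node);
            conn->expired = 1;
            n++;
        }
    }
    return n;
}
```

The cost of a sweep is then proportional to the number of expired connections
plus the number of lists, not to the total number of connections.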

Suggested-by: Joe Stringer <j...@ovn.org>
Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
---
 lib/conntrack-other.c   |  11 +--
 lib/conntrack-private.h |  21 --
 lib/conntrack-tcp.c |  20 +++---
 lib/conntrack.c | 186 
 lib/conntrack.h |  36 +-
 5 files changed, 243 insertions(+), 31 deletions(-)

diff --git a/lib/conntrack-other.c b/lib/conntrack-other.c
index 295cb2c..2920889 100644
--- a/lib/conntrack-other.c
+++ b/lib/conntrack-other.c
@@ -43,8 +43,8 @@ conn_other_cast(const struct conn *conn)
 }
 
 static enum ct_update_res
-other_conn_update(struct conn *conn_, struct dp_packet *pkt OVS_UNUSED,
-  bool reply, long long now)
+other_conn_update(struct conn *conn_, struct conntrack_bucket *ctb,
+  struct dp_packet *pkt OVS_UNUSED, bool reply, long long now)
 {
 struct conn_other *conn = conn_other_cast(conn_);
 
@@ -54,7 +54,7 @@ other_conn_update(struct conn *conn_, struct dp_packet *pkt OVS_UNUSED,
 conn->state = OTHERS_MULTIPLE;
 }
 
-update_expiration(conn_, other_timeouts[conn->state], now);
+conn_update_expiration(ctb, &conn->up, other_timeouts[conn->state], now);
 
 return CT_UPDATE_VALID;
 }
@@ -66,14 +66,15 @@ other_valid_new(struct dp_packet *pkt OVS_UNUSED)
 }
 
 static struct conn *
-other_new_conn(struct dp_packet *pkt OVS_UNUSED, long long now)
+other_new_conn(struct conntrack_bucket *ctb, struct dp_packet *pkt OVS_UNUSED,
+   long long now)
 {
 struct conn_other *conn;
 
 conn = xzalloc(sizeof *conn);
 conn->state = OTHERS_FIRST;
 
-update_expiration(&conn->up, other_timeouts[conn->state], now);
+conn_init_expiration(ctb, &conn->up, other_timeouts[conn->state], now);
 
 return &conn->up;
 }
diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
index bc32448..5aac938 100644
--- a/lib/conntrack-private.h
+++ b/lib/conntrack-private.h
@@ -69,10 +69,13 @@ enum ct_update_res {
 };
 
 struct ct_l4_proto {
-struct conn *(*new_conn)(struct dp_packet *pkt, long long now);
+struct conn *(*new_conn)(struct conntrack_bucket *, struct dp_packet *pkt,
+ long long now);
 bool (*valid_new)(struct dp_packet *pkt);
-enum ct_update_res (*conn_update)(struct conn *conn, struct dp_packet *pkt,
-  bool reply, long long now);
+enum ct_update_res (*conn_update)(struct conn *conn,
+  struct conntrack_bucket *,
+  struct dp_packet *pkt, bool reply,
+  long long now);
 };
 
 extern struct ct_l4_proto ct_proto_tcp;
@@ -81,9 +84,19 @@ extern struct ct_l4_proto ct_proto_other;
 extern long long ct_timeout_val[];
 
 static inline void
-update_expiration(struct conn *conn, enum ct_timeout tm, long long now)
+conn_init_expiration(struct conntrack_bucket *ctb, struct conn *conn,
+enum ct_timeout tm, long long now)
 {
 conn->expiration = now + ct_timeout_val[tm];
+ovs_list_push_back(&ctb->exp_lists[tm], &conn->exp_node);
+}
+
+static inline void
+conn_update_expiration(struct conntrack_bucket *ctb, struct conn *conn,
+   enum ct_timeout tm, long long now)
+{
+ovs_list_remove(&conn->exp_node);
+conn_init_expiration(ctb, conn, tm, now);
 }
 
 #endif /* conntrack-private.h */
diff --git a/lib/conntrack-tcp.c b/lib/conntrack-tcp.c
index 6da798d..7edcce3 100644
--- a/lib/conntrack-tcp.c
+++ b/lib/conntrack-tcp.c
@@ -152,8 +152,8 @@ tcp_payload_length(struct dp_packet *pkt)
 }
 
 static enum ct_update_res
-tcp_conn_update(struct conn* conn_, struct dp_packet *pkt, bool reply,
-long long now)
+tcp_conn_update(struct conn *conn_, struct conntrack_bucket *ctb,
+struct dp_packet *pkt, bool reply, long long now)
 {
 struct conn_tcp *conn = conn_tcp_cast(conn_);
 struct tcp_header *tcp = dp_packet_l4(pkt);
@@ -319,18 +319,18 @@ tcp_conn_update(struct conn* conn_, struct dp_packet *pkt, bool reply,
 
 if (src->state >= CT_DPIF_TCPS_FIN_WAIT_2
 && dst->state >= CT_DPIF_TCPS_FIN_WAIT_2) {
-update_expiration(conn_, CT_TM_TCP_CLOSED, now);
+conn_update_expiration(ctb, &conn->up, CT_TM_TCP_CLOSED, now);
 } else if (src->state >= CT_DPIF_TCPS_CLOSING
  

[ovs-dev] [PATCH v5 02/16] flow: Export parse_ipv6_ext_hdrs().

2016-07-26 Thread Daniele Di Proietto
This will be used by a future commit.

Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
Acked-by: Joe Stringer <j...@ovn.org>
Acked-by: Flavio Leitner <f...@sysclose.org>
---
 lib/flow.c | 140 ++---
 lib/flow.h |   3 ++
 2 files changed, 81 insertions(+), 62 deletions(-)

diff --git a/lib/flow.c b/lib/flow.c
index 5775127..f94b1f2 100644
--- a/lib/flow.c
+++ b/lib/flow.c
@@ -440,6 +440,82 @@ invalid:
 arp_buf[1] = eth_addr_zero;
 }
 
+static inline bool
+parse_ipv6_ext_hdrs__(const void **datap, size_t *sizep, uint8_t *nw_proto,
+  uint8_t *nw_frag)
+{
+while (1) {
+if (OVS_LIKELY((*nw_proto != IPPROTO_HOPOPTS)
+   && (*nw_proto != IPPROTO_ROUTING)
+   && (*nw_proto != IPPROTO_DSTOPTS)
+   && (*nw_proto != IPPROTO_AH)
+   && (*nw_proto != IPPROTO_FRAGMENT))) {
+/* It's either a terminal header (e.g., TCP, UDP) or one we
+ * don't understand.  In either case, we're done with the
+ * packet, so use it to fill in 'nw_proto'. */
+return true;
+}
+
+/* We only verify that at least 8 bytes of the next header are
+ * available, but many of these headers are longer.  Ensure that
+ * accesses within the extension header are within those first 8
+ * bytes. All extension headers are required to be at least 8
+ * bytes. */
+if (OVS_UNLIKELY(*sizep < 8)) {
+return false;
+}
+
+if ((*nw_proto == IPPROTO_HOPOPTS)
+|| (*nw_proto == IPPROTO_ROUTING)
+|| (*nw_proto == IPPROTO_DSTOPTS)) {
+/* These headers, while different, have the fields we care
+ * about in the same location and with the same
+ * interpretation. */
+const struct ip6_ext *ext_hdr = *datap;
+*nw_proto = ext_hdr->ip6e_nxt;
+if (OVS_UNLIKELY(!data_try_pull(datap, sizep,
+(ext_hdr->ip6e_len + 1) * 8))) {
+return false;
+}
+} else if (*nw_proto == IPPROTO_AH) {
+/* A standard AH definition isn't available, but the fields
+ * we care about are in the same location as the generic
+ * option header--only the header length is calculated
+ * differently. */
+const struct ip6_ext *ext_hdr = *datap;
+*nw_proto = ext_hdr->ip6e_nxt;
+if (OVS_UNLIKELY(!data_try_pull(datap, sizep,
+(ext_hdr->ip6e_len + 2) * 4))) {
+return false;
+}
+} else if (*nw_proto == IPPROTO_FRAGMENT) {
+const struct ovs_16aligned_ip6_frag *frag_hdr = *datap;
+
+*nw_proto = frag_hdr->ip6f_nxt;
+if (!data_try_pull(datap, sizep, sizeof *frag_hdr)) {
+return false;
+}
+
+/* We only process the first fragment. */
+if (frag_hdr->ip6f_offlg != htons(0)) {
+*nw_frag = FLOW_NW_FRAG_ANY;
+if ((frag_hdr->ip6f_offlg & IP6F_OFF_MASK) != htons(0)) {
+*nw_frag |= FLOW_NW_FRAG_LATER;
+*nw_proto = IPPROTO_FRAGMENT;
+return true;
+}
+}
+}
+}
+}
+
+bool
+parse_ipv6_ext_hdrs(const void **datap, size_t *sizep, uint8_t *nw_proto,
+uint8_t *nw_frag)
+{
+return parse_ipv6_ext_hdrs__(datap, sizep, nw_proto, nw_frag);
+}
+
 /* Initializes 'flow' members from 'packet' and 'md'
  *
  * Initializes 'packet' header l2 pointer to the start of the Ethernet
@@ -642,68 +718,8 @@ miniflow_extract(struct dp_packet *packet, struct miniflow *dst)
 nw_ttl = nh->ip6_hlim;
 nw_proto = nh->ip6_nxt;
 
-while (1) {
-if (OVS_LIKELY((nw_proto != IPPROTO_HOPOPTS)
-   && (nw_proto != IPPROTO_ROUTING)
-   && (nw_proto != IPPROTO_DSTOPTS)
-   && (nw_proto != IPPROTO_AH)
-   && (nw_proto != IPPROTO_FRAGMENT))) {
-/* It's either a terminal header (e.g., TCP, UDP) or one we
- * don't understand.  In either case, we're done with the
- * packet, so use it to fill in 'nw_proto'. */
-break;
-}
-
-/* We only verify that at least 8 bytes of the next header are
- * available, but many of these headers are longer.  Ensure that
- * accesses within the extension header are within those first 8
- * bytes. All extension headers are required to be at least 8
- * bytes. */
-if 

[ovs-dev] [PATCH v5 07/16] tests: Add test-conntrack pcap test.

2016-07-26 Thread Daniele Di Proietto
Simple program that runs the packets in a pcap file through the
connection tracker and prints the 'ct_state' for each packet.

E.g. the line:

`./tests/ovstest test-conntrack pcap capture.pcap 2`

sends the packets in `capture.pcap` to the connection tracker, 2 per
call.

Useful for debugging.

Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
Acked-by: Flavio Leitner <f...@sysclose.org>
---
 tests/test-conntrack.c | 73 ++
 1 file changed, 73 insertions(+)

diff --git a/tests/test-conntrack.c b/tests/test-conntrack.c
index 37c7277..0ff70d1 100644
--- a/tests/test-conntrack.c
+++ b/tests/test-conntrack.c
@@ -23,6 +23,7 @@
 #include "netdev.h"
 #include "ovs-thread.h"
 #include "ovstest.h"
+#include "pcap-file.h"
 #include "timeval.h"
 
 static const char payload[] = "5054000a505400090800451c00"
@@ -145,6 +146,74 @@ test_benchmark(struct ovs_cmdl_context *ctx)
 ovs_barrier_destroy(&barrier);
 free(threads);
 }
+
+static void
+test_pcap(struct ovs_cmdl_context *ctx)
+{
+size_t total_count, i, batch_size;
+FILE *pcap;
+int err;
+
+pcap = ovs_pcap_open(ctx->argv[1], "rb");
+if (!pcap) {
+return;
+}
+
+batch_size = 1;
+if (ctx->argc > 2) {
+batch_size = strtoul(ctx->argv[2], NULL, 0);
+if (batch_size == 0 || batch_size > NETDEV_MAX_BURST) {
+ovs_fatal(0,
+  "batch_size must be between 1 and NETDEV_MAX_BURST(%u)",
+  NETDEV_MAX_BURST);
+}
+}
+
+fatal_signal_init();
+
+conntrack_init(&ct);
+total_count = 0;
+for (;;) {
+struct dp_packet_batch pkt_batch;
+
+dp_packet_batch_init(&pkt_batch);
+
+for (i = 0; i < batch_size; i++) {
+struct flow dummy_flow;
+
+err = ovs_pcap_read(pcap, &pkt_batch.packets[i], NULL);
+if (err) {
+break;
+}
+flow_extract(pkt_batch.packets[i], &dummy_flow);
+}
+
+pkt_batch.count = i;
+if (pkt_batch.count == 0) {
+break;
+}
+
+conntrack_execute(&ct, &pkt_batch, true, 0, NULL, NULL, NULL);
+
+for (i = 0; i < pkt_batch.count; i++) {
+struct ds ds = DS_EMPTY_INITIALIZER;
+struct dp_packet *pkt = pkt_batch.packets[i];
+
+total_count++;
+
+format_flags(&ds, ct_state_to_string, pkt->md.ct_state, '|');
+printf("%"PRIuSIZE": %s\n", total_count, ds_cstr(&ds));
+
+ds_destroy(&ds);
+}
+
+dp_packet_delete_batch(&pkt_batch, true);
+if (err) {
+break;
+}
+}
+conntrack_destroy(&ct);
+}
 
 static const struct ovs_cmdl_command commands[] = {
 /* Connection tracker tests. */
@@ -154,6 +223,10 @@ static const struct ovs_cmdl_command commands[] = {
  * destination port */
 {"benchmark", "n_threads n_pkts batch_size [change_connection]", 3, 4,
  test_benchmark},
+/* Reads packets from 'file' and sends them to the connection tracker,
+ * 'batch_size' (1 by default) per call, with the commit flag set.
+ * Prints the ct_state of each packet. */
+{"pcap", "file [batch_size]", 1, 2, test_pcap},
 
 {NULL, NULL, 0, 0, NULL},
 };
-- 
2.8.1



[ovs-dev] [PATCH v5 01/16] packets: Define ICMP types.

2016-07-26 Thread Daniele Di Proietto
Linux and FreeBSD have slightly different names for these constants.
Windows doesn't define them.  It is simpler to redefine them from
scratch for OVS.  The new names are different than those used in Linux
and FreeBSD.

These definitions will be used by a future commit.

Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
Acked-by: Joe Stringer <j...@ovn.org>
Acked-by: Flavio Leitner <f...@sysclose.org>
Acked-by: Ryan Moats <rmo...@us.ibm.com>
---
 lib/packets.h | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/lib/packets.h b/lib/packets.h
index 5fd1e51..6ab235a 100644
--- a/lib/packets.h
+++ b/lib/packets.h
@@ -611,9 +611,21 @@ struct ip_header {
 ovs_16aligned_be32 ip_src;
 ovs_16aligned_be32 ip_dst;
 };
-
 BUILD_ASSERT_DECL(IP_HEADER_LEN == sizeof(struct ip_header));
 
+/* ICMPv4 types. */
+#define ICMP4_ECHO_REPLY 0
+#define ICMP4_DST_UNREACH 3
+#define ICMP4_SOURCEQUENCH 4
+#define ICMP4_REDIRECT 5
+#define ICMP4_ECHO_REQUEST 8
+#define ICMP4_TIME_EXCEEDED 11
+#define ICMP4_PARAM_PROB 12
+#define ICMP4_TIMESTAMP 13
+#define ICMP4_TIMESTAMPREPLY 14
+#define ICMP4_INFOREQUEST 15
+#define ICMP4_INFOREPLY 16
+
 #define ICMP_HEADER_LEN 8
 struct icmp_header {
 uint8_t icmp_type;
-- 
2.8.1



[ovs-dev] [PATCH v5 00/16] Userspace (DPDK) connection tracker

2016-07-26 Thread Daniele Di Proietto
This series aims to implement the ct() action for the dpif-netdev datapath.
The bulk of the code is in the new conntrack module: it contains some packet
parsing code, some lookup tables and the logic to implement all the ct bits.

The conntrack module is helped by conntrack-tcp, for TCP window and flags
tracking: the bulk of the code of this submodule is from FreeBSD's pf
subsystem and is therefore BSD licensed.

The rest of the series integrates the connection tracker with the rest of
OVS: the ct() action is implemented in dpif-netdev, and the debugging
interfaces required by dpctl/{dump,flush}-conntrack are implemented.

Besides adding some unit tests, this series ports the existing conntrack
system test to the userspace datapath.  Some small modifications are
required to pass the testsuite, and some tests still have to be skipped.

This can also be downloaded at:

https://github.com/ddiproietto/ovs/tree/userconntrack_20160726

Any feedback is appreciated, thanks.

v4 -> v5:
* Rebase: hmap.h is moved, include ct_* fields in some unit tests,
  skip and adapt the OVN tests to the new ct dump format.
* Style and typo fixes.
* Add coverage counter to detect long cleanup.
* Use ovs_barrier instead of pthread_barrier in test (fix compilation
  on OS X).
* Fix dumping tcp state in the reply direction.
* Squash together flow_compose improvements (checksum and udp_len).

v3 -> v4:
* Rebase: use struct dp_packet_batch, add extra ct_ fields in some
  new tests, use struct hmap_pos, skip some new system NAT tests.
* Style and typo fixes.
* Add OVS_NOT_REACHED() in switch in process_one().
* New commit: use dl_type from flow or matching megaflow.

v2 -> v3:
* Rebased.
* Squashed commits for flushing (in dpif-netdev and conntrack).
* Squashed commits for dumping (in dpif-netdev and conntrack).
* Use adaptive mutex instead of spinlock: this prevents livelock
  if the cleanup thread is executed on the same CPU as a forwarding
  thread.  Performance impact is minimal.
* Validate L3 and L4 checksum.
* Use proper L3 and L4 checksum in hardcoded packets in system and unit
  tests.
* Consider ICMPv6 as well as ICMP in l4_protos and conn_key_to_tuple.
* Mention conntrack in NEWS and FAQ.md.
* Use uint16_t for ct_state.
* Fix possible NULL dereference for conn in process_one().
* Add OVS_U128_MIN, OVS_U128_ZERO.
* Use HMAP_FOR_EACH_POP.
* Check that UDP length is valid.
* Style fix: prefer 'sizeof *object' instead of 'sizeof type'
* Don't accept packets from/to UDP/TCP port 0.
* Use defines for timeouts.
* Check expiration inside lookup loop in conn_key_lookup().
* Limit the number of connections.
* Simplify case in tcp_get_wscale().
* Introduce general INT_MOD_* macros for comparisons in modular arithmetic.
* Improve comments.
* New cleanup mechanism: we keep connections in an ordered list and we have
  a separate thread to perform the cleanup.  This doesn't block the main
  thread for long intervals anymore.
* Correctly fill UDP length and UDP/TCP/ICMP checksums in flow_compose():
  it's useful to write testcases for the connection tracker.
* Added system test with ICMP traffic through the connection tracker.
* Track ICMP type and code.

v1 -> v2:
* Fixed bug in tcp_get_wscale(), related to TCP options parsing.
* Changed names of ICMP constants: now they're different from Linux and
  FreeBSD.
* Fixed bug in parse_ipv6_ext_hdrs().
* Used ALWAYS_INLINE in parse_vlan and parse_ethertype, to avoid a
  performance regression in miniflow_extract().
* Updated copyright info in COPYING and debian/copyright.in.
* Rebased.
* Changed batching strategy in conntrack_execute() to allow a newly
  created connection to be picked up by packets in the same batch.
* Added an ovs-test module to throw pcap files at the connection tracker.
* Added a workaround for the userspace testsuite on new kernels and a tcp
  non-conntrack test.



Daniele Di Proietto (16):
  packets: Define ICMP types.
  flow: Export parse_ipv6_ext_hdrs().
  flow: Introduce parse_dl_type().
  conntrack: New userspace connection tracker.
  conntrack: Periodically delete expired connections.
  tests: Add very simple conntrack benchmark.
  tests: Add test-conntrack pcap test.
  dpif-netdev: Execute conntrack action.
  dpif-netdev: Implement conntrack dump functions.
  dpif-netdev: Implement conntrack flush interface.
  flow: Generate checksum and udp_len in flow_compose().
  tests: Add conntrack ofproto-dpif tests.
  system-tests: Run conntrack tests with userspace.
  system-tests: Add ping through conntrack test.
  conntrack: Track ICMP type and code.
  conntrack: Add 'dl_type' parameter to conntrack_execute().

 COPYING  |1 +
 FAQ.md   |2 +-
 NEWS |2 +
 debian/copyright.in  |4 +
 include/openvswitch/types.h  |4 +
 lib/automake.mk  |6 +
 lib/conntrack-icmp.c |  105 
 lib/conntrack-other.c|   86 +++
 l

Re: [ovs-dev] [PATCH v4] dpif-netdev: XPS (Transmit Packet Steering) implementation.

2016-07-26 Thread Daniele Di Proietto
Thanks for the patch

I think the caller of dp_netdev_execute_actions() should always pass a valid
timestamp.  We can pass aux->now to dp_execute_userspace_actions(), and we can
add a timestamp parameter to fast_path_processing() so that it can be passed
down to handle_packet_upcall().  In the other cases it's fine to call
time_msec(): we're in the slow path anyway.

One more thing: I think we should avoid XPS entirely if there are enough txqs, 
to avoid any possible locks and even writing tx->last_used.
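
To make that last point concrete, here is a minimal sketch (not the patch's
code) of how the choice could look.  All names (`struct txq_state`,
`static_qid`, `least_used_qid`, `pick_txq`) are hypothetical.  The assumption
is that when a port has at least as many tx queues as sending threads, each
thread can keep a static queue id without any locking or `last_used` update;
otherwise the least used queue is picked, as described in the commit message
quoted below.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-port state: how many threads use each tx queue. */
struct txq_state {
    unsigned n_txq;      /* Number of tx queues on the port. */
    unsigned *txq_used;  /* txq_used[i]: threads currently using queue i. */
};

/* Static mapping used when XPS is skipped: the thread index is truncated
 * to the number of queues, no bookkeeping is required. */
static int
static_qid(unsigned thread_idx, unsigned n_txq)
{
    assert(n_txq > 0);
    return (int) (thread_idx % n_txq);
}

/* Returns the index of the least used queue and marks it as used.
 * A real implementation would hold the port's mutex around this. */
static int
least_used_qid(struct txq_state *s)
{
    unsigned best = 0;

    assert(s->n_txq > 0);
    for (unsigned i = 1; i < s->n_txq; i++) {
        if (s->txq_used[i] < s->txq_used[best]) {
            best = i;
        }
    }
    s->txq_used[best]++;
    return (int) best;
}

/* Picks a queue for 'thread_idx' out of 'n_threads' sending threads. */
static int
pick_txq(struct txq_state *s, unsigned thread_idx, unsigned n_threads)
{
    if (s->n_txq >= n_threads) {
        return static_qid(thread_idx, s->n_txq);    /* XPS avoided. */
    }
    return least_used_qid(s);
}
```

In the quoted example the four threads polling dpdk0 have static ids 0, 1, 4
and 5, which truncate to queues 0, 1, 0 and 1.  With the least-used choice the
same four threads end up on four different queues.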


Thanks,

Daniele

On 13/07/2016 05:34, "Ilya Maximets"  wrote:

>If CPU number in pmd-cpu-mask is not divisible by the number of queues and
>in a few more complex situations there may be unfair distribution of TX
>queue-ids between PMD threads.
>
>For example, if we have 2 ports with 4 queues and 6 CPUs in pmd-cpu-mask
>such distribution is possible:
><>
>pmd thread numa_id 0 core_id 13:
>port: vhost-user1   queue-id: 1
>port: dpdk0 queue-id: 3
>pmd thread numa_id 0 core_id 14:
>port: vhost-user1   queue-id: 2
>pmd thread numa_id 0 core_id 16:
>port: dpdk0 queue-id: 0
>pmd thread numa_id 0 core_id 17:
>port: dpdk0 queue-id: 1
>pmd thread numa_id 0 core_id 12:
>port: vhost-user1   queue-id: 0
>port: dpdk0 queue-id: 2
>pmd thread numa_id 0 core_id 15:
>port: vhost-user1   queue-id: 3
><>
>
>As we can see above dpdk0 port polled by threads on cores:
>   12, 13, 16 and 17.
>
>By design of dpif-netdev, there is only one TX queue-id assigned to each
>pmd thread. These queue-ids are sequential, similar to core-ids, and a
>thread will send packets to the queue with exactly this queue-id regardless
>of port.
>
>In previous example:
>
>   pmd thread on core 12 will send packets to tx queue 0
>   pmd thread on core 13 will send packets to tx queue 1
>   ...
>   pmd thread on core 17 will send packets to tx queue 5
>
>So, for dpdk0 port after truncating in netdev-dpdk:
>
>   core 12 --> TX queue-id 0 % 4 == 0
>   core 13 --> TX queue-id 1 % 4 == 1
>   core 16 --> TX queue-id 4 % 4 == 0
>   core 17 --> TX queue-id 5 % 4 == 1
>
>As a result only 2 of 4 queues used.
>
>To fix this issue some kind of XPS implemented in following way:
>
>   * TX queue-ids are allocated dynamically.
>   * When a PMD thread first tries to send packets to a new port,
> it allocates the least used TX queue for this port.
>   * PMD threads periodically perform revalidation of
> allocated TX queue-ids. If a queue wasn't used in the last
> XPS_TIMEOUT_MS milliseconds, it will be freed during revalidation.
>
>Reported-by: Zhihong Wang 
>Signed-off-by: Ilya Maximets 
>---
> lib/dpif-netdev.c | 170 +-
> 1 file changed, 117 insertions(+), 53 deletions(-)
>
>diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>index e0107b7..6345944 100644
>--- a/lib/dpif-netdev.c
>+++ b/lib/dpif-netdev.c
>@@ -248,6 +248,8 @@ enum pmd_cycles_counter_type {
> PMD_N_CYCLES
> };
> 
>+#define XPS_TIMEOUT_MS 500LL
>+
> /* A port in a netdev-based datapath. */
> struct dp_netdev_port {
> odp_port_t port_no;
>@@ -256,6 +258,8 @@ struct dp_netdev_port {
> struct netdev_saved_flags *sf;
> unsigned n_rxq; /* Number of elements in 'rxq' */
> struct netdev_rxq **rxq;
>+unsigned *txq_used; /* Number of threads that uses each tx queue. */
>+struct ovs_mutex txq_used_mutex;
> char *type; /* Port type as requested by user. */
> };
> 
>@@ -384,8 +388,9 @@ struct rxq_poll {
> 
> /* Contained by struct dp_netdev_pmd_thread's 'port_cache' or 'tx_ports'. */
> struct tx_port {
>-odp_port_t port_no;
>-struct netdev *netdev;
>+struct dp_netdev_port *port;
>+int qid;
>+long long last_used;
> struct hmap_node node;
> };
> 
>@@ -498,7 +503,8 @@ static void dp_netdev_execute_actions(struct 
>dp_netdev_pmd_thread *pmd,
>   struct dp_packet_batch *,
>   bool may_steal,
>   const struct nlattr *actions,
>-  size_t actions_len);
>+  size_t actions_len,
>+  long long now);
> static void dp_netdev_input(struct dp_netdev_pmd_thread *,
> struct dp_packet_batch *, odp_port_t port_no);
> static void dp_netdev_recirculate(struct dp_netdev_pmd_thread *,
>@@ -541,6 +547,12 @@ static void dp_netdev_pmd_flow_flush(struct 
>dp_netdev_pmd_thread *pmd);
> static void pmd_load_cached_ports(struct dp_netdev_pmd_thread *pmd)
> OVS_REQUIRES(pmd->port_mutex);
> 
>+static void

Re: [ovs-dev] [PATCH v3 3/3] dpif-netdev: Introduce pmd-rxq-affinity.

2016-07-26 Thread Daniele Di Proietto
Looks mostly good to me, a couple more comments inline

Thanks,

Daniele


On 26/07/2016 06:48, "Ilya Maximets" <i.maxim...@samsung.com> wrote:

>On 26.07.2016 04:46, Daniele Di Proietto wrote:
>> Thanks for the patch.
>> 
>> I haven't been able to apply this without the XPS patch.
>
>That was the original idea. Using of this patch with current
>tx queue management may lead to performance issues on multiqueue
>configurations.

Ok, in this case it should be part of the same series.

>
>> This looks like a perfect chance to add more tests to pmd.at.  I can do it 
>> if you want
>
>Sounds good.
>
>> I started taking a look at this patch and I have a few comments inline.  
>> I'll keep looking at it tomorrow
>> 
>> Thanks,
>> 
>> Daniele
>> 
>> 
>> On 15/07/2016 04:54, "Ilya Maximets" <i.maxim...@samsung.com> wrote:
>> 
>>> New 'other_config:pmd-rxq-affinity' field for Interface table to
>>> perform manual pinning of RX queues to desired cores.
>>>
>>> This functionality is required to achieve maximum performance because
>>> all kinds of ports have different cost of rx/tx operations and
>>> only user can know about expected workload on different ports.
>>>
>>> Example:
>>> # ./bin/ovs-vsctl set interface dpdk0 options:n_rxq=4 \
>>>   other_config:pmd-rxq-affinity="0:3,1:7,3:8"
>>> Queue #0 pinned to core 3;
>>> Queue #1 pinned to core 7;
>>> Queue #2 not pinned.
>>> Queue #3 pinned to core 8;
>>>
>>> It's decided to automatically isolate cores that have rxq explicitly
>>> assigned to them because it's useful to keep constant polling rate on
>>> some performance critical ports while adding/deleting other ports
>>> without explicit pinning of all ports.
>>>
>>> Signed-off-by: Ilya Maximets <i.maxim...@samsung.com>
>>> ---
>>> INSTALL.DPDK.md  |  49 +++-
>>> NEWS |   2 +
>>> lib/dpif-netdev.c| 218 
>>> ++-
>>> tests/pmd.at |   6 ++
>>> vswitchd/vswitch.xml |  23 ++
>>> 5 files changed, 257 insertions(+), 41 deletions(-)
>>>
>>> diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
>>> index 5407794..7609aa7 100644
>>> --- a/INSTALL.DPDK.md
>>> +++ b/INSTALL.DPDK.md
>>> @@ -289,14 +289,57 @@ advanced install guide [INSTALL.DPDK-ADVANCED.md]
>>>  # Check current stats
>>>ovs-appctl dpif-netdev/pmd-stats-show
>>>
>>> + # Clear previous stats
>>> +   ovs-appctl dpif-netdev/pmd-stats-clear
>>> + ```
>>> +
>>> +  7. Port/rxq assigment to PMD threads
>>> +
>>> + ```
>>>  # Show port/rxq assignment
>>>ovs-appctl dpif-netdev/pmd-rxq-show
>>> + ```
>>>
>>> - # Clear previous stats
>>> -   ovs-appctl dpif-netdev/pmd-stats-clear
>>> + To change default rxq assignment to pmd threads rxqs may be manually
>>> + pinned to desired cores using:
>>> +
>>> + ```
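>>> + ovs-vsctl set Interface <iface> \
>>> +   other_config:pmd-rxq-affinity=<rxq-affinity-list>
>>>  ```
>>> + where:
>>> +
>>> + ```
>>> + <rxq-affinity-list> ::= NULL | <non-empty-list>
>>> + <non-empty-list> ::= <affinity-pair> |
>>> +                      <affinity-pair> , <non-empty-list>
>>> + <affinity-pair> ::= <queue-id> : <core-id>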
>>> + ```
>>> +
>>> + Example:
>>> +
>>> + ```
>>> + ovs-vsctl set interface dpdk0 options:n_rxq=4 \
>>> +   other_config:pmd-rxq-affinity="0:3,1:7,3:8"
>>> +
>>> + Queue #0 pinned to core 3;
>>> + Queue #1 pinned to core 7;
>>> + Queue #2 not pinned.
>>> + Queue #3 pinned to core 8;
>>> + ```
>>> +
>>> + After that PMD threads on cores where RX queues was pinned will become
>>> + `isolated`. This means that this thread will poll only pinned RX 
>>> queues.
>>> +
>>> + WARNING: If there are no `non-isolated` PMD threads, `non-pinned` RX 
>>> queues
>>> + will not be polled. Also, if provided `core_id` is not available (ex. 
>>> this
>>> + `core_id` not in `pmd-cpu-mask`), RX queue will not be polled by any
>>> + PMD thread.
>>> +
>>> + Isolation of PMD threads also can be checked using
>>> +   

Re: [ovs-dev] [RFC 4/5] dpctl: uses open_type when calling netdev_open

2016-07-26 Thread Daniele Di Proietto
2016-07-26 11:29 GMT-07:00 Thadeu Lima de Souza Cascardo <
casca...@redhat.com>:

> On Tue, Jul 26, 2016 at 11:20:38AM -0700, Daniele Di Proietto wrote:
> > Hi Cascardo,
> >
> > thanks for your input on this.  It's quite messy right now, but I believe
> > we have a chance
> > to fix this up.
> >
> > Replies inline
> >
> > 2016-07-26 7:33 GMT-07:00 Thadeu Lima de Souza Cascardo <
> casca...@redhat.com
> > >:
> >
> > > On Mon, Jul 25, 2016 at 11:03:29AM -0700, Daniele Di Proietto wrote:
> > > > 2016-07-25 9:57 GMT-07:00 Thadeu Lima de Souza Cascardo <
> > > casca...@redhat.com
> > > > >:
> > > >
> > > > > On Fri, Jul 22, 2016 at 02:49:39PM -0700, Daniele Di Proietto
> wrote:
> > > > > > I would prefer if dpctl kept using the datapath types.  The
> > > translation
> > > > > > from database types to datapath type should happen in ofproto,
> dpctl
> > > is
> > > > > > supposed to be used to interact with the datapath directly.
> > > > > >
> > > > > > What do you guys think?
> > > > > >
> > > > > > The rest of the series looks good to me as well.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Daniele
> > > > > >
> > > > >
> > > > >
> > > > Hi Cascardo,
> > > >
> > > > Thanks for the detailed analysis.  The problem is that there are
> three
> > > > types:
> > > >
> > > > a) the database type
> > > > b) the port type in dpif-netdev
> > > > c) the netdev type
> > > >
> > > > I was assuming that b and c are always equal, but they're not.  The
> only
> > > > case
> > > > when they're not equal is the "ovs-netdev" (or "ovs-dummy") port.
> > > >
> > > > I think we can easily remove this case and make b and c always equal
> > > > with the following changes:
> > >
> > > Well, we also have ofproto type.
> > >
> > >
> > If I'm not mistaken, ofproto type is always equal to b) and therefore to
> c).
> >
>
> I didn't think so, I thought that a would equal the ofproto type. But it
> seems
> you are right, and we can just have two types: database type and netdev
> type,
> and make sure dpif and ofproto types match the netdev type.
>
> >
> > > I had a different approach, in which I would use the netdev_type when
> > > doing the
> > > query. That broke tests too. The affected tests were just dpctl output
> > > shown to
> > > the user. But I would expect some breakage when ofproto_query also used
> > > the same
> > > type and vswitchd would see the database type and ofproto type as
> > > different and
> > > try to reconfigure the port.
> > >
> > >
> > I don't think so. In bridge_delete_or_reconfigure() we compare the
> ofproto
> > type with
> > iface->netdev_type:
> >
> > if (strcmp(ofproto_port.type, iface->netdev_type)
> > || netdev_set_config(iface->netdev, &iface->cfg->options,
>
> You are right. It was just some confusion because of the related problem I
> found
> (and this code is a recent fix from myself because we were using the
> database
> type).
>
> The problem was that we were comparing the database type to "system" and
> my mind
> was thinking "system" was the database type and the ofproto_port.type was
> different. I just didn't look back at the code and made a quick assumption.
>
> > NULL)) {
> > /* The interface is the wrong type or can't be configured.
> >  * Delete it. */
> > goto delete;
> > }
> >
> >
> > > Then, I looked at your patch below and noticed that you do the
> opposite,
> > > you
> > > eliminate the open_type and only use it for the internal type. Then I
> > > thought
> > > that would break other cases. But dpif_netdev_port_add uses the
> netdev_type
> > > already. Hey...
> > >
> > > So, dpctl does see it as tap when I add an internal port. Which
> probably
> > > means
> > > ofproto_type is also tap. I guess we will have to fix that too.
> > >
> >
> > To sum up, why do we have to fix that?  The translation between the
> database
> > type and the netdev type happ

Re: [ovs-dev] [PATCH v3 1/3] bridge: Pass interface's configuration to datapath.

2016-07-26 Thread Daniele Di Proietto





On 26/07/2016 06:19, "Ilya Maximets" <i.maxim...@samsung.com> wrote:

>On 26.07.2016 04:45, Daniele Di Proietto wrote:
>> Thanks for the patch
>> 
>> It looks good to me, a few minor comments inline
>> 
>> 
>> On 15/07/2016 04:54, "Ilya Maximets" <i.maxim...@samsung.com> wrote:
>> 
>>> This commit adds functionality to pass value of 'other_config' column
>>> of 'Interface' table to datapath.
>>>
>>> This may be used to pass not directly connected with netdev options and
>>> configure behaviour of the datapath for different ports.
>>> For example: pinning of rx queues to polling threads in dpif-netdev.
>>>
>>> Signed-off-by: Ilya Maximets <i.maxim...@samsung.com>
>>> ---
>>> lib/dpif-netdev.c  |  1 +
>>> lib/dpif-netlink.c |  1 +
>>> lib/dpif-provider.h|  5 +
>>> lib/dpif.c | 17 +
>>> lib/dpif.h |  1 +
>>> ofproto/ofproto-dpif.c | 16 
>>> ofproto/ofproto-provider.h |  5 +
>>> ofproto/ofproto.c  | 29 +
>>> ofproto/ofproto.h  |  2 ++
>>> vswitchd/bridge.c  |  2 ++
>>> 10 files changed, 79 insertions(+)
>>>
>>> [...]
>>>
>>> diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
>>> index ce9383a..97510a9 100644
>>> --- a/ofproto/ofproto-dpif.c
>>> +++ b/ofproto/ofproto-dpif.c
>>> @@ -3542,6 +3542,21 @@ port_del(struct ofproto *ofproto_, ofp_port_t 
>>> ofp_port)
>>> }
>>>
>>> static int
>>> +port_set_config(struct ofproto *ofproto_, ofp_port_t ofp_port,
>>> +        const struct smap *cfg)
>> 
>> Can we change this to directly take struct ofport_dpif *ofport instead of 
>> ofp_port_t?
>
>We can't get 'struct ofport_dpif *' because ofproto layer knows nothing
>about 'ofport_dpif' structure. All that we can is to get 'struct ofport *'
>and cast it.

You're right, using 'struct ofport *' seems better, thanks.
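
For readers less familiar with the layering: the cast works because the
provider's port structure embeds the generic one.  A minimal, self-contained
sketch of the pattern follows.  The two structures are toy stand-ins, and
`ofport_dpif_cast_sketch` is a hypothetical name, not the OVS definition.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the generic port known to the ofproto layer. */
struct ofport {
    int ofp_port;
};

/* Toy stand-in for the provider-private port; it embeds the generic one. */
struct ofport_dpif {
    int odp_port;
    struct ofport up;       /* Generic part, visible to upper layers. */
};

/* Recovers the enclosing ofport_dpif from a pointer to its 'up' member by
 * subtracting the member's offset inside the enclosing structure. */
static struct ofport_dpif *
ofport_dpif_cast_sketch(const struct ofport *ofport)
{
    assert(ofport != NULL);
    return (struct ofport_dpif *)
        ((char *) ofport - offsetof(struct ofport_dpif, up));
}
```

The upper layer only ever hands around `struct ofport *`, so the provider
callback can take that pointer and recover its private data without a lookup
by port number.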

With the below diff

Acked-by: Daniele Di Proietto <diproiet...@vmware.com>

>
>How about following fixup to this patch:
>--
>diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
>index 3a13326..79f2aa0 100644
>--- a/ofproto/ofproto-dpif.c
>+++ b/ofproto/ofproto-dpif.c
>@@ -3543,14 +3543,13 @@ port_del(struct ofproto *ofproto_, ofp_port_t ofp_port)
> }
> 
> static int
>-port_set_config(struct ofproto *ofproto_, ofp_port_t ofp_port,
>-const struct smap *cfg)
>+port_set_config(const struct ofport *ofport_, const struct smap *cfg)
> {
>-struct ofproto_dpif *ofproto = ofproto_dpif_cast(ofproto_);
>-struct ofport_dpif *ofport = ofp_port_to_ofport(ofproto, ofp_port);
>+struct ofport_dpif *ofport = ofport_dpif_cast(ofport_);
>+struct ofproto_dpif *ofproto = ofproto_dpif_cast(ofport->up.ofproto);
> 
>-if (!ofport || sset_contains(&ofproto->ghost_ports,
>- netdev_get_name(ofport->up.netdev))) {
>+if (sset_contains(&ofproto->ghost_ports,
>+  netdev_get_name(ofport->up.netdev))) {
> return 0;
> }
> 
>diff --git a/ofproto/ofproto-provider.h b/ofproto/ofproto-provider.h
>index 2fc7452..7156814 100644
>--- a/ofproto/ofproto-provider.h
>+++ b/ofproto/ofproto-provider.h
>@@ -972,10 +972,9 @@ struct ofproto_class {
>  * convenient. */
> int (*port_del)(struct ofproto *ofproto, ofp_port_t ofp_port);
> 
>-/* Refreshes dtapath configuration of port number 'ofp_port' in 'ofproto'.
>+/* Refreshes datapath configuration of 'port'.
>  * Returns 0 if successful, otherwise a positive errno value. */
>-int (*port_set_config)(struct ofproto *ofproto, ofp_port_t ofp_port,
>-   const struct smap *cfg);
>+int (*port_set_config)(const struct ofport *port, const struct smap *cfg);
> 
> /* Get port stats */
> int (*port_get_stats)(const struct ofport *port,
>diff --git a/ofproto/ofproto.c b/ofproto/ofproto.c
>index c66c866..6cd2600 100644
>--- a/ofproto/ofproto.c
>+++ b/ofproto/ofproto.c
>@@ -2079,7 +2079,7 @@ ofproto_port_del(struct ofproto *ofproto, ofp_port_t 
>ofp_port)
> return error;
> }
> 
>-/* Refreshes dtapath configuration of port number 'ofp_port' in 'ofproto'.
>+/* Refreshes datapath configuration of port number 'ofp_port' in 'ofproto'.
>  *
>  * This function has no effect if 'ofproto' does not have a port 'ofp_port'. 
> */
>

Re: [ovs-dev] [RFC 4/5] dpctl: uses open_type when calling netdev_open

2016-07-26 Thread Daniele Di Proietto
Hi Cascardo,

Thanks for your input on this.  It's quite messy right now, but I believe
we have a chance to fix this up.

Replies inline

2016-07-26 7:33 GMT-07:00 Thadeu Lima de Souza Cascardo <casca...@redhat.com
>:

> On Mon, Jul 25, 2016 at 11:03:29AM -0700, Daniele Di Proietto wrote:
> > 2016-07-25 9:57 GMT-07:00 Thadeu Lima de Souza Cascardo <
> casca...@redhat.com
> > >:
> >
> > > On Fri, Jul 22, 2016 at 02:49:39PM -0700, Daniele Di Proietto wrote:
> > > > I would prefer if dpctl kept using the datapath types.  The
> translation
> > > > from database types to datapath type should happen in ofproto, dpctl
> is
> > > > supposed to be used to interact with the datapath directly.
> > > >
> > > > What do you guys think?
> > > >
> > > > The rest of the series looks good to me as well.
> > > >
> > > > Thanks,
> > > >
> > > > Daniele
> > > >
> > >
> > >
> > Hi Cascardo,
> >
> > Thanks for the detailed analysis.  The problem is that there are three
> > types:
> >
> > a) the database type
> > b) the port type in dpif-netdev
> > c) the netdev type
> >
> > I was assuming that b and c are always equal, but they're not.  The only
> > case
> > when they're not equal is the "ovs-netdev" (or "ovs-dummy") port.
> >
> > I think we can easily remove this case and make b and c always equal
> > with the following changes:
>
> Well, we also have ofproto type.
>
>
If I'm not mistaken, ofproto type is always equal to b) and therefore to c).


> I had a different approach, in which I would use the netdev_type when
> doing the
> query. That broke tests too. The affected tests were just dpctl output
> shown to
> the user. But I would expect some breakage when ofproto_query also used
> the same
> type and vswitchd would see the database type and ofproto type as
> different and
> try to reconfigure the port.
>
>
I don't think so.  In bridge_delete_or_reconfigure() we compare the ofproto
type with iface->netdev_type:

    if (strcmp(ofproto_port.type, iface->netdev_type)
        || netdev_set_config(iface->netdev, &iface->cfg->options,
                             NULL)) {
        /* The interface is the wrong type or can't be configured.
         * Delete it. */
        goto delete;
    }


> Then, I looked at your patch below and noticed that you do the opposite,
> you
> eliminate the open_type and only use it for the internal type. Then I
> thought
> that would break other cases. But dpif_netdev_port_add uses the netdev_type
> already. Hey...
>
> So, dpctl does see it as tap when I add an internal port. Which probably
> means
> ofproto_type is also tap. I guess we will have to fix that too.
>

To sum up, why do we have to fix that?  The translation between the database
type and the netdev type happens in vswitchd/bridge.c.  The layers below,
ofproto and dpif-netdev, deal with the netdev type directly.

Is there a problem with this approach?

The few changes I suggested fix the confusion for the "ovs-netdev" port.
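
To illustrate the layering I have in mind, here is a small sketch.  It is not
the bridge.c code: `db_type_to_netdev_type` and `port_type_mismatch` are
hypothetical helpers, and only the one case discussed in this thread is
modelled (an "internal" port on the userspace datapath, assumed here to be
named "netdev", is seen as "tap").

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: maps a database interface type to the netdev type
 * used by the layers below, for a given datapath type.  Only the case
 * discussed in this thread is modelled: an "internal" port on the userspace
 * ("netdev") datapath is backed by a "tap" netdev. */
static const char *
db_type_to_netdev_type(const char *datapath_type, const char *db_type)
{
    assert(datapath_type && db_type);
    if (!strcmp(datapath_type, "netdev") && !strcmp(db_type, "internal")) {
        return "tap";
    }
    return db_type;     /* Every other type is passed through unchanged. */
}

/* Mirrors the comparison quoted above: a port has to be re-created when the
 * type reported by ofproto differs from the interface's netdev type. */
static int
port_type_mismatch(const char *ofproto_port_type, const char *netdev_type)
{
    return strcmp(ofproto_port_type, netdev_type) != 0;
}
```

If the translation is done once, at the top, the comparison further down never
sees a database type, and an internal port reported as "tap" is not treated as
a mismatch.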


>
> I am attaching my version of the patch here as well. Which of the 3
> versions do
> you think I should send? The original one I sent will not require changes
> in the
> tests and also doesn't change behavior for the user output, so I would
> stick to
> it for now.
>

Why is the code below necessary?  ofproto already deals with the netdev
type only, right?  (I'm probably missing something here.)


>
> Cascardo.
>
> ---8<---
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 37c2631..19b1f87 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -1437,11 +1437,12 @@ do_del_port(struct dp_netdev *dp, struct
> dp_netdev_port *port)
>  }
>
>  static void
> -answer_port_query(const struct dp_netdev_port *port,
> +answer_port_query(struct dp_netdev *dp OVS_UNUSED, const struct
> dp_netdev_port *port,
>struct dpif_port *dpif_port)
>  {
> +const char *type = dpif_netdev_port_open_type(dp->class, port->type);
>  dpif_port->name = xstrdup(netdev_get_name(port->netdev));
> -dpif_port->type = xstrdup(port->type);
> +dpif_port->type = xstrdup(type);
>  dpif_port->port_no = port->port_no;
>  }
>
> @@ -1456,7 +1457,7 @@ dpif_netdev_port_query_by_number(const struct dpif
> *dpif, odp_port_t port_no,
>  ovs_mutex_lock(&dp->port_mutex);
>  error = get_port_by_number(dp, port_no, &port);
>  if (!error && dpif_port) {
> -answer_port_query(port, dpif_port);
> +answer_port_query(dp, p

Re: [ovs-dev] [PATCH v3 3/3] dpif-netdev: Introduce pmd-rxq-affinity.

2016-07-25 Thread Daniele Di Proietto
Thanks for the patch.

I haven't been able to apply this without the XPS patch.

This looks like a perfect chance to add more tests to pmd.at.  I can do it if
you want.

I started taking a look at this patch and I have a few comments inline.  I'll
keep looking at it tomorrow.
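
As a reference point for tests, the affinity string from the example in the
commit message ("0:3,1:7,3:8") could be parsed roughly as follows.  This is a
sketch under my own assumptions, not the patch's parser: the function name
`parse_rxq_affinity`, the `CORE_UNPINNED` marker and the error convention are
all hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CORE_UNPINNED (-1)

/* Hypothetical parser for a "queue:core,queue:core" string.
 * Fills core_ids[q] for q < n_rxq; unlisted queues stay CORE_UNPINNED.
 * Returns 0 on success, -1 on a malformed pair or out-of-range queue. */
static int
parse_rxq_affinity(const char *list, int *core_ids, int n_rxq)
{
    assert(core_ids && n_rxq >= 0);
    for (int i = 0; i < n_rxq; i++) {
        core_ids[i] = CORE_UNPINNED;
    }
    if (!list || !*list) {
        return 0;                       /* Empty list: nothing pinned. */
    }

    /* strtok() modifies its input, so work on a private copy. */
    char *copy = malloc(strlen(list) + 1);
    if (!copy) {
        return -1;
    }
    strcpy(copy, list);

    int error = 0;
    for (char *pair = strtok(copy, ","); pair; pair = strtok(NULL, ",")) {
        int queue, core;
        char extra;

        /* Exactly two integers separated by ':' and nothing after them. */
        if (sscanf(pair, "%d:%d%c", &queue, &core, &extra) != 2
            || queue < 0 || queue >= n_rxq || core < 0) {
            error = -1;
            break;
        }
        core_ids[queue] = core;
    }
    free(copy);
    return error;
}
```

With n_rxq=4 and the example string, queues 0, 1 and 3 end up pinned to cores
3, 7 and 8, and queue 2 stays unpinned, matching the description in the commit
message.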

Thanks,

Daniele


On 15/07/2016 04:54, "Ilya Maximets"  wrote:

>New 'other_config:pmd-rxq-affinity' field for Interface table to
>perform manual pinning of RX queues to desired cores.
>
>This functionality is required to achieve maximum performance because
>all kinds of ports have different cost of rx/tx operations and
>only user can know about expected workload on different ports.
>
>Example:
>   # ./bin/ovs-vsctl set interface dpdk0 options:n_rxq=4 \
> other_config:pmd-rxq-affinity="0:3,1:7,3:8"
>   Queue #0 pinned to core 3;
>   Queue #1 pinned to core 7;
>   Queue #2 not pinned.
>   Queue #3 pinned to core 8;
>
>It's decided to automatically isolate cores that have rxq explicitly
>assigned to them because it's useful to keep constant polling rate on
>some performance critical ports while adding/deleting other ports
>without explicit pinning of all ports.
>
>Signed-off-by: Ilya Maximets 
>---
> INSTALL.DPDK.md  |  49 +++-
> NEWS |   2 +
> lib/dpif-netdev.c| 218 ++-
> tests/pmd.at |   6 ++
> vswitchd/vswitch.xml |  23 ++
> 5 files changed, 257 insertions(+), 41 deletions(-)
>
>diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
>index 5407794..7609aa7 100644
>--- a/INSTALL.DPDK.md
>+++ b/INSTALL.DPDK.md
>@@ -289,14 +289,57 @@ advanced install guide [INSTALL.DPDK-ADVANCED.md]
>  # Check current stats
>ovs-appctl dpif-netdev/pmd-stats-show
> 
>+ # Clear previous stats
>+   ovs-appctl dpif-netdev/pmd-stats-clear
>+ ```
>+
>+  7. Port/rxq assigment to PMD threads
>+
>+ ```
>  # Show port/rxq assignment
>ovs-appctl dpif-netdev/pmd-rxq-show
>+ ```
> 
>- # Clear previous stats
>-   ovs-appctl dpif-netdev/pmd-stats-clear
>+ To change default rxq assignment to pmd threads rxqs may be manually
>+ pinned to desired cores using:
>+
>+ ```
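>+ ovs-vsctl set Interface <iface> \
>+   other_config:pmd-rxq-affinity=<rxq-affinity-list>
>  ```
>+ where:
>+
>+ ```
>+ <rxq-affinity-list> ::= NULL | <non-empty-list>
>+ <non-empty-list> ::= <affinity-pair> |
>+                      <affinity-pair> , <non-empty-list>
>+ <affinity-pair> ::= <queue-id> : <core-id>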
>+ ```
>+
>+ Example:
>+
>+ ```
>+ ovs-vsctl set interface dpdk0 options:n_rxq=4 \
>+   other_config:pmd-rxq-affinity="0:3,1:7,3:8"
>+
>+ Queue #0 pinned to core 3;
>+ Queue #1 pinned to core 7;
>+ Queue #2 not pinned.
>+ Queue #3 pinned to core 8;
>+ ```
>+
>+ After that PMD threads on cores where RX queues was pinned will become
>+ `isolated`. This means that this thread will poll only pinned RX queues.
>+
>+ WARNING: If there are no `non-isolated` PMD threads, `non-pinned` RX 
>queues
>+ will not be polled. Also, if provided `core_id` is not available (ex. 
>this
>+ `core_id` not in `pmd-cpu-mask`), RX queue will not be polled by any
>+ PMD thread.
>+
>+ Isolation of PMD threads also can be checked using
>+ `ovs-appctl dpif-netdev/pmd-rxq-show` command.
> 
>-  7. Stop vswitchd & Delete bridge
>+  8. Stop vswitchd & Delete bridge
> 
>  ```
>  ovs-appctl -t ovs-vswitchd exit
>diff --git a/NEWS b/NEWS
>index 6496dc1..9ccc1f5 100644
>--- a/NEWS
>+++ b/NEWS
>@@ -44,6 +44,8 @@ Post-v2.5.0
>Old 'other_config:n-dpdk-rxqs' is no longer supported.
>Not supported by vHost interfaces. For them number of rx and tx queues
>is applied from connected virtio device.
>+ * New 'other_config:pmd-rxq-affinity' field for PMD interfaces, that
>+   allows to pin port's rx queues to desired cores.
>  * New appctl command 'dpif-netdev/pmd-rxq-show' to check the port/rxq
>assignment.
>  * Type of log messages from PMD threads changed from INFO to DBG.
>diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>index 18ce316..e5a8dec 100644
>--- a/lib/dpif-netdev.c
>+++ b/lib/dpif-netdev.c
>@@ -63,6 +63,7 @@
> #include "random.h"
> #include "seq.h"
> #include "shash.h"
>+#include "smap.h"
> #include "sset.h"
> #include "timeval.h"
> #include "tnl-neigh-cache.h"
>@@ -250,6 +251,12 @@ enum pmd_cycles_counter_type {
> 
> #define XPS_TIMEOUT_MS 500LL
> 
>+/* Contained by struct dp_netdev_port's 'rxqs' member.  */
>+struct dp_netdev_rxq {
>+struct netdev_rxq *rxq;
>+unsigned core_id;   /* Core to which this queue is pinned. */
>+};
>+
> /* A port in a netdev-based datapath. */
> struct dp_netdev_port {
> odp_port_t port_no;
>@@ -257,10 +264,11 @@ struct dp_netdev_port {
> struct hmap_node node;  /* Node in dp_netdev's 'ports'. */
> struct netdev_saved_flags *sf;
> unsigned n_rxq; /* Number of elements in 'rxq' */
>-

Re: [ovs-dev] [PATCH v3 2/3] util: Expose function nullable_string_is_equal.

2016-07-25 Thread Daniele Di Proietto
Nice!

Applied to master, thanks



On 15/07/2016 04:54, "Ilya Maximets"  wrote:

>Implementation of 'nullable_string_is_equal()' moved to util.c and
>reused inside dpif-netdev.
>
>Signed-off-by: Ilya Maximets 
>---
> lib/dpif-netdev.c| 14 ++
> lib/util.c   |  6 ++
> lib/util.h   |  1 +
> ofproto/ofproto-dpif-ipfix.c |  6 --
> ofproto/ofproto-dpif-sflow.c |  6 --
> 5 files changed, 9 insertions(+), 24 deletions(-)
>
>diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>index 4643cce..18ce316 100644
>--- a/lib/dpif-netdev.c
>+++ b/lib/dpif-netdev.c
>@@ -2524,16 +2524,6 @@ dpif_netdev_operate(struct dpif *dpif, struct dpif_op 
>**ops, size_t n_ops)
> }
> }
> 
>-static bool
>-cmask_equals(const char *a, const char *b)
>-{
>-if (a && b) {
>-return !strcmp(a, b);
>-}
>-
>-return a == NULL && b == NULL;
>-}
>-
> /* Changes the number or the affinity of pmd threads.  The changes are 
> actually
>  * applied in dpif_netdev_run(). */
> static int
>@@ -2541,7 +2531,7 @@ dpif_netdev_pmd_set(struct dpif *dpif, const char *cmask)
> {
> struct dp_netdev *dp = get_dp_netdev(dpif);
> 
>-if (!cmask_equals(dp->requested_pmd_cmask, cmask)) {
>+if (!nullable_string_is_equal(dp->requested_pmd_cmask, cmask)) {
> free(dp->requested_pmd_cmask);
> dp->requested_pmd_cmask = nullable_xstrdup(cmask);
> }
>@@ -2756,7 +2746,7 @@ dpif_netdev_run(struct dpif *dpif)
> 
> dp_netdev_pmd_unref(non_pmd);
> 
>-if (!cmask_equals(dp->pmd_cmask, dp->requested_pmd_cmask)
>+if (!nullable_string_is_equal(dp->pmd_cmask, dp->requested_pmd_cmask)
> || ports_require_restart(dp)) {
> reconfigure_pmd_threads(dp);
> }
>diff --git a/lib/util.c b/lib/util.c
>index e1dc3d2..241a7f1 100644
>--- a/lib/util.c
>+++ b/lib/util.c
>@@ -157,6 +157,12 @@ nullable_xstrdup(const char *s)
> return s ? xstrdup(s) : NULL;
> }
> 
>+bool
>+nullable_string_is_equal(const char *a, const char *b)
>+{
>+return a ? b && !strcmp(a, b) : !b;
>+}
>+
> char *
> xvasprintf(const char *format, va_list args)
> {
>diff --git a/lib/util.h b/lib/util.h
>index e738c9f..6a61dde 100644
>--- a/lib/util.h
>+++ b/lib/util.h
>@@ -113,6 +113,7 @@ void *xmemdup(const void *, size_t) MALLOC_LIKE;
> char *xmemdup0(const char *, size_t) MALLOC_LIKE;
> char *xstrdup(const char *) MALLOC_LIKE;
> char *nullable_xstrdup(const char *) MALLOC_LIKE;
>+bool nullable_string_is_equal(const char *a, const char *b);
> char *xasprintf(const char *format, ...) OVS_PRINTF_FORMAT(1, 2) MALLOC_LIKE;
> char *xvasprintf(const char *format, va_list) OVS_PRINTF_FORMAT(1, 0) 
> MALLOC_LIKE;
> void *x2nrealloc(void *p, size_t *n, size_t s);
>diff --git a/ofproto/ofproto-dpif-ipfix.c b/ofproto/ofproto-dpif-ipfix.c
>index 5744abb..d9069cb 100644
>--- a/ofproto/ofproto-dpif-ipfix.c
>+++ b/ofproto/ofproto-dpif-ipfix.c
>@@ -464,12 +464,6 @@ static void get_export_time_now(uint64_t *, uint32_t *);
> static void dpif_ipfix_cache_expire_now(struct dpif_ipfix_exporter *, bool);
> 
> static bool
>-nullable_string_is_equal(const char *a, const char *b)
>-{
>-return a ? b && !strcmp(a, b) : !b;
>-}
>-
>-static bool
> ofproto_ipfix_bridge_exporter_options_equal(
> const struct ofproto_ipfix_bridge_exporter_options *a,
> const struct ofproto_ipfix_bridge_exporter_options *b)
>diff --git a/ofproto/ofproto-dpif-sflow.c b/ofproto/ofproto-dpif-sflow.c
>index 7d0aa36..8ede492 100644
>--- a/ofproto/ofproto-dpif-sflow.c
>+++ b/ofproto/ofproto-dpif-sflow.c
>@@ -92,12 +92,6 @@ static void dpif_sflow_del_port__(struct dpif_sflow *,
> static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
> 
> static bool
>-nullable_string_is_equal(const char *a, const char *b)
>-{
>-return a ? b && !strcmp(a, b) : !b;
>-}
>-
>-static bool
> ofproto_sflow_options_equal(const struct ofproto_sflow_options *a,
>  const struct ofproto_sflow_options *b)
> {
>-- 
>2.7.4
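
For anyone checking that the replacement is behaviour-preserving, the two
helpers from the patch can be compared side by side.  The bodies below are
copied from the diff above; the `_sketch` suffixes and the `check_same`
helper are mine and only mark these as standalone copies for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Same one-line body as the function the patch moves to util.c. */
static bool
nullable_string_is_equal_sketch(const char *a, const char *b)
{
    return a ? b && !strcmp(a, b) : !b;
}

/* The cmask_equals() helper removed from dpif-netdev.c. */
static bool
cmask_equals_sketch(const char *a, const char *b)
{
    if (a && b) {
        return !strcmp(a, b);
    }
    return a == NULL && b == NULL;
}

/* Asserts that both helpers agree on one pair of inputs. */
static void
check_same(const char *a, const char *b)
{
    assert(nullable_string_is_equal_sketch(a, b)
           == cmask_equals_sketch(a, b));
}
```

Both treat two NULL pointers as equal, a NULL and a non-NULL string as
different, and otherwise fall back to strcmp().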


Re: [ovs-dev] [PATCH v3 1/3] bridge: Pass interface's configuration to datapath.

2016-07-25 Thread Daniele Di Proietto
Thanks for the patch

It looks good to me, a few minor comments inline


On 15/07/2016 04:54, "Ilya Maximets"  wrote:

>This commit adds functionality to pass value of 'other_config' column
>of 'Interface' table to datapath.
>
>This may be used to pass not directly connected with netdev options and
>configure behaviour of the datapath for different ports.
>For example: pinning of rx queues to polling threads in dpif-netdev.
>
>Signed-off-by: Ilya Maximets 
>---
> lib/dpif-netdev.c  |  1 +
> lib/dpif-netlink.c |  1 +
> lib/dpif-provider.h|  5 +
> lib/dpif.c | 17 +
> lib/dpif.h |  1 +
> ofproto/ofproto-dpif.c | 16 
> ofproto/ofproto-provider.h |  5 +
> ofproto/ofproto.c  | 29 +
> ofproto/ofproto.h  |  2 ++
> vswitchd/bridge.c  |  2 ++
> 10 files changed, 79 insertions(+)
>
>diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>index 6345944..4643cce 100644
>--- a/lib/dpif-netdev.c
>+++ b/lib/dpif-netdev.c
>@@ -4295,6 +4295,7 @@ const struct dpif_class dpif_netdev_class = {
> dpif_netdev_get_stats,
> dpif_netdev_port_add,
> dpif_netdev_port_del,
>+NULL,   /* port_set_config */
> dpif_netdev_port_query_by_number,
> dpif_netdev_port_query_by_name,
> NULL,   /* port_get_pid */
>diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
>index e2bea23..2f939ae 100644
>--- a/lib/dpif-netlink.c
>+++ b/lib/dpif-netlink.c
>@@ -2348,6 +2348,7 @@ const struct dpif_class dpif_netlink_class = {
> dpif_netlink_get_stats,
> dpif_netlink_port_add,
> dpif_netlink_port_del,
>+NULL,   /* port_set_config */
> dpif_netlink_port_query_by_number,
> dpif_netlink_port_query_by_name,
> dpif_netlink_port_get_pid,
>diff --git a/lib/dpif-provider.h b/lib/dpif-provider.h
>index 25f4280..21fb0ba 100644
>--- a/lib/dpif-provider.h
>+++ b/lib/dpif-provider.h
>@@ -167,6 +167,11 @@ struct dpif_class {
> /* Removes port numbered 'port_no' from 'dpif'. */
> int (*port_del)(struct dpif *dpif, odp_port_t port_no);
> 
>+/* Refreshes configuration of 'dpif's port. The implementation might
>+ * postpone applying the changes until run() is called. */
>+int (*port_set_config)(struct dpif *dpif, odp_port_t port_no,
>+   const struct smap *cfg);
>+
> /* Queries 'dpif' for a port with the given 'port_no' or 'devname'.
>  * If 'port' is not null, stores information about the port into
>  * '*port' if successful.
>diff --git a/lib/dpif.c b/lib/dpif.c
>index 5f1be41..f6e5338 100644
>--- a/lib/dpif.c
>+++ b/lib/dpif.c
>@@ -610,6 +610,23 @@ dpif_port_exists(const struct dpif *dpif, const char 
>*devname)
> return !error;
> }
> 
>+/* Refreshes configuration of 'dpif's port. */
>+int
>+dpif_port_set_config(struct dpif *dpif, odp_port_t port_no,
>+ const struct smap *cfg)
>+{
>+int error = 0;
>+
>+if (dpif->dpif_class->port_set_config) {
>+error = dpif->dpif_class->port_set_config(dpif, port_no, cfg);
>+if (error) {
>+log_operation(dpif, "port_set_config", error);
>+}
>+}
>+
>+return error;
>+}
>+
> /* Looks up port number 'port_no' in 'dpif'.  On success, returns 0 and
>  * initializes '*port' appropriately; on failure, returns a positive errno
>  * value.
>diff --git a/lib/dpif.h b/lib/dpif.h
>index 981868c..a7c5097 100644
>--- a/lib/dpif.h
>+++ b/lib/dpif.h
>@@ -839,6 +839,7 @@ void dpif_register_upcall_cb(struct dpif *, 
>upcall_callback *, void *aux);
> int dpif_recv_set(struct dpif *, bool enable);
> int dpif_handlers_set(struct dpif *, uint32_t n_handlers);
> int dpif_poll_threads_set(struct dpif *, const char *cmask);
>+int dpif_port_set_config(struct dpif *, odp_port_t, const struct smap *cfg);
> int dpif_recv(struct dpif *, uint32_t handler_id, struct dpif_upcall *,
>   struct ofpbuf *);
> void dpif_recv_purge(struct dpif *);
>diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
>index ce9383a..97510a9 100644
>--- a/ofproto/ofproto-dpif.c
>+++ b/ofproto/ofproto-dpif.c
>@@ -3542,6 +3542,21 @@ port_del(struct ofproto *ofproto_, ofp_port_t ofp_port)
> }
> 
> static int
>+port_set_config(struct ofproto *ofproto_, ofp_port_t ofp_port,
>+const struct smap *cfg)

Can we change this to directly take struct ofport_dpif *ofport instead of 
ofp_port_t?

>+{
>+struct ofproto_dpif *ofproto = ofproto_dpif_cast(ofproto_);
>+struct ofport_dpif *ofport = ofp_port_to_ofport(ofproto, ofp_port);
>+
>+if (!ofport || sset_contains(&ofproto->ghost_ports,
>+ netdev_get_name(ofport->up.netdev))) {
>+return 0;
>+}
>+
>+return dpif_port_set_config(ofproto->backer->dpif, ofport->odp_port, cfg);
>+}
>+
>+static int
> port_get_stats(const struct ofport *ofport_, 

Re: [ovs-dev] [PATCH v3] netdev-dpdk: Set pmd thread priority

2016-07-25 Thread Daniele Di Proietto
I agree with Mark's comments, other than that this looks good to me.

If you agree with the comments would you mind sending an updated version?

Thanks,

Daniele

2016-07-19 7:04 GMT-07:00 Kavanagh, Mark B :

> >
>
> Hi Bhanu,
>
> Thanks for the patch - some comments inline.
>
> Cheers,
> Mark
>
> >Set the DPDK pmd thread scheduling policy to SCHED_RR and static
> >priority to highest priority value of the policy. This is to deal with
> >pmd thread starvation case where another cpu hogging process can get
> >scheduled/affinitized on to the same core the pmd thread is running
> >there by significantly impacting the datapath performance.
> >
> >Setting the realtime scheduling policy to the pmd threads is one step
> >towards Fastpath Service Assurance in OVS DPDK.
> >
> >The realtime scheduling policy is applied only when CPU mask is passed
> >to 'pmd-cpu-mask'. For example:
> >
> >* In the absence of pmd-cpu-mask, one pmd thread shall be created
> >  and default scheduling policy and prority gets applied.
>
> Typo above - 'prority'
>
> >
> >* If pmd-cpu-mask is specified, one ore more pmd threads shall be
>
> Typo above - 'ore'
>
> >  spawned on the corresponding core(s) in the mask and real time
> >  scheduling policy SCHED_RR and highest priority of the policy is
> >  applied to the pmd thread(s).
> >
> >To reproduce the pmd thread starvation case:
> >
> >ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=6
> >taskset 0x2 cat /dev/zero > /dev/null &
> >
> >With this commit OVS control threads and pmd threads can't have same
> >affinity ('dpdk-lcore-mask','pmd-cpu-mask' should be non-overlapping).
> >Also other processes with same affinity as PMD thread will be
> unresponsive.
> >
> >Signed-off-by: Bhanuprakash Bodireddy 
> >---
> >
> >v2->v3:
> >* Move set_priority() function to lib/ovs-numa.c
> >* Apply realtime scheduling policy and priority to pmd thread only if
> >  pmd-cpu-mask is passed.
> >* Update INSTALL.DPDK-ADVANCED.
> >
> >v1->v2:
> >* Removed #ifdef and introduced dummy function "pmd_thread_setpriority"
> >  in netdev-dpdk.h
> >* Rebase
> >
> >
> > INSTALL.DPDK-ADVANCED.md | 15 +++
> > lib/dpif-netdev.c|  9 +
> > lib/ovs-numa.c   | 18 ++
> > lib/ovs-numa.h   |  1 +
> > 4 files changed, 39 insertions(+), 4 deletions(-)
> >
> >diff --git a/INSTALL.DPDK-ADVANCED.md b/INSTALL.DPDK-ADVANCED.md
> >index 9ae536d..d828290 100644
> >--- a/INSTALL.DPDK-ADVANCED.md
> >+++ b/INSTALL.DPDK-ADVANCED.md
> >@@ -205,8 +205,10 @@ needs to be affinitized accordingly.
> > pmd thread is CPU bound, and needs to be affinitized to isolated
> > cores for optimum performance.
> >
> >-By setting a bit in the mask, a pmd thread is created and pinned
> >-to the corresponding CPU core. e.g. to run a pmd thread on core 2
> >+By setting a bit in the mask, a pmd thread is created, pinned
> >+to the corresponding CPU core and the scheduling policy SCHED_RR
> >+along with maximum priority of the policy applied to the pmd thread.
> >+e.g. to pin a pmd thread on core 2
> >
> > `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=4`
> >
> >@@ -234,8 +236,10 @@ needs to be affinitized accordingly.
> >   responsible for different ports/rxq's. Assignment of ports/rxq's to
> >   pmd threads is done automatically.
> >
> >-  A set bit in the mask means a pmd thread is created and pinned
> >-  to the corresponding CPU core. e.g. to run pmd threads on core 1 and 2
> >+  A set bit in the mask means a pmd thread is created, pinned to the
> >+  corresponding CPU core and the scheduling policy SCHED_RR with highest
> >+  priority of the scheduling policy applied to pmd thread.
> >+  e.g. to run pmd threads on core 1 and 2
>
> There's some repetition in the last paragraph - I'm reviewing this patch
> in isolation, so the text may make sense/be required in the full document.
>
> >
> >   `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=6`
> >
> >@@ -246,6 +250,9 @@ needs to be affinitized accordingly.
> >
> >   NIC port0 <-> OVS <-> VM <-> OVS <-> NIC port 1
> >
> >+  Note: 'dpdk-lcore-mask' and 'pmd-cpu-mask' cpu mask settings should be
> >+  non-overlapping.
>
> Although it's mentioned in the commit message, it might be worth
> mentioning here the consequences of attempting to pin non-PMD processes to
> a pmd-cpu-mask core (i.e. CPU starvation)
>
> >+
> > ### 4.3 DPDK physical port Rx Queues
> >
> >   `ovs-vsctl set Interface  options:n_rxq=`
> >diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> >index e0107b7..805d0ae 100644
> >--- a/lib/dpif-netdev.c
> >+++ b/lib/dpif-netdev.c
> >@@ -2851,6 +2851,15 @@ pmd_thread_main(void *f_)
> > ovs_numa_thread_setaffinity_core(pmd->core_id);
> > dpdk_set_lcore_id(pmd->core_id);
> > poll_cnt = pmd_load_queues_and_ports(pmd, _list);
> >+
> >+/* When cpu affinity mask explicitly set using 

Re: [ovs-dev] [PATCH 3/4] netdev-dpdk: Add vHost User PMD

2016-07-25 Thread Daniele Di Proietto
Thanks for the patch

This needs a little bit of rebasing, I did it myself to review, but it'd be
nice to have an updated version.

I like the simplification that this brings especially to the fast path.

If we merge this before we merge the DPDK 16.07 patch, we won't have to deal
with the vid change.

Thanks,

Daniele

2016-07-15 7:26 GMT-07:00 Ciara Loftus :

> DPDK 16.04 introduces the vHost PMD which allows 'dpdkvhostuser' ports
> to be controlled by the librte_ether API, like physical 'dpdk' ports and
> IVSHM 'dpdkr' ports. This commit integrates this PMD into OVS and
> removes direct calls to the librte_vhost DPDK library.
>
> This commit removes extended statistics support for vHost User ports
> until such a time that this becomes available in the vHost PMD in a
> DPDK release supported by OVS.
>
> Signed-off-by: Ciara Loftus 
> ---
>  INSTALL.DPDK.md   |  10 +
>  NEWS  |   2 +
>  lib/netdev-dpdk.c | 856
> ++
>  3 files changed, 302 insertions(+), 566 deletions(-)
>
> diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
> index 5407794..29b6f91 100644
> --- a/INSTALL.DPDK.md
> +++ b/INSTALL.DPDK.md
> @@ -561,6 +561,16 @@ can be found in [Vhost Walkthrough].
>
>  http://dpdk.org/doc/guides/rel_notes/release_16_04.html
>
> +  - dpdk, dpdkr and dpdkvhostuser ports are 'eth' type ports in the
> context of
> +DPDK as they are all managed by the rte_ether API. This means that
> they
> +adhere to the DPDK configuration option CONFIG_RTE_MAX_ETHPORTS which
> by
> +default is set to 32. This means by default the combined total number
> of
> +dpdk, dpdkr and dpdkvhostuser ports allowable in OVS with DPDK is 32.
> This
> +value can be changed if desired by modifying the configuration file in
> +DPDK, or by overriding the default value on the command line when
> building
> +DPDK. eg.
> +
> +`make install CONFIG_RTE_MAX_ETHPORTS=64`
>

Again, I hope this doesn't cause problems for a lot of users.  I'd like to
see the limit increased by default, but I think we can merge this patch as
it is.


>
>  Bug Reporting:
>  --
> diff --git a/NEWS b/NEWS
> index aa1b915..b3791ed 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -59,6 +59,8 @@ Post-v2.5.0
> node that device memory is located on if
> CONFIG_RTE_LIBRTE_VHOST_NUMA
> is enabled in DPDK.
>   * Remove dpdkvhostcuse port type.
> + * vHost PMD integration brings vhost-user ports under control of the
> +   rte_ether DPDK API.
> - Increase number of registers to 16.
> - ovs-benchmark: This utility has been removed due to lack of use and
>   bitrot.
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index b4f82af..5de806a 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -56,6 +56,7 @@
>  #include "unixctl.h"
>
>  #include "rte_config.h"
> +#include "rte_eth_vhost.h"
>  #include "rte_mbuf.h"
>  #include "rte_meter.h"
>  #include "rte_virtio_net.h"
> @@ -141,6 +142,11 @@ static char *vhost_sock_dir = NULL;   /* Location of
> vhost-user sockets */
>
>  #define VHOST_ENQ_RETRY_NUM 8
>
> +/* Array that tracks the used & unused vHost user driver IDs */
> +static unsigned int vhost_drv_ids[RTE_MAX_ETHPORTS];
> +/* Maximum string length allowed to provide to rte_eth_attach function */
> +#define DEVARGS_MAX (RTE_ETH_NAME_MAX_LEN + PATH_MAX + 18)
> +
>

I think this is not needed if we use xasprintf() below.
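
To illustrate the point, here is a minimal sketch (not the patch itself).
demo_xasprintf() is a hypothetical stand-in for OVS's xasprintf(), and the
"eth_vhost%u,iface=%s" devargs layout is only an assumption for the example.
Because the buffer is sized from the formatted output, no DEVARGS_MAX-style
upper bound has to be computed by hand:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for OVS's xasprintf(): formats into a freshly
 * allocated buffer of exactly the required size.  The caller frees it. */
static char *
demo_xasprintf(const char *format, ...)
{
    va_list args, args2;
    int needed;
    char *s;

    va_start(args, format);
    va_copy(args2, args);
    /* First pass only measures the formatted length. */
    needed = vsnprintf(NULL, 0, format, args);
    va_end(args);
    assert(needed >= 0);

    s = malloc((size_t) needed + 1);
    assert(s != NULL);
    /* Second pass writes the string, including the terminating NUL. */
    vsnprintf(s, (size_t) needed + 1, format, args2);
    va_end(args2);
    return s;
}

/* Builds a devargs string for a vhost port.  The layout is illustrative
 * only; 'pmd_id' and 'sock_path' are hypothetical parameters. */
static char *
demo_vhost_devargs(unsigned int pmd_id, const char *sock_path)
{
    return demo_xasprintf("eth_vhost%u,iface=%s", pmd_id, sock_path);
}
```

The caller would free() the returned string once the attach call has
consumed it.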


>  static const struct rte_eth_conf port_conf = {
>  .rxmode = {
>  .mq_mode = ETH_MQ_RX_RSS,
> @@ -353,12 +359,15 @@ struct netdev_dpdk {
>   * always true.  */
>  bool txq_needs_locking;
>
> -/* virtio-net structure for vhost device */
> -OVSRCU_TYPE(struct virtio_net *) virtio_dev;
> +/* Number of virtqueue pairs reported by the guest */
> +uint32_t vhost_qp_nb;
>
>  /* Identifier used to distinguish vhost devices from each other */
>  char vhost_id[PATH_MAX];
>
> +/* ID of vhost user port given to the PMD driver */
> +unsigned int vhost_pmd_id;
> +
>  /* In dpdk_list. */
>  struct ovs_list list_node OVS_GUARDED_BY(dpdk_mutex);
>
> @@ -389,16 +398,25 @@ struct netdev_rxq_dpdk {
>  static bool dpdk_thread_is_pmd(void);
>
>  static int netdev_dpdk_construct(struct netdev *);
> +static int netdev_dpdk_vhost_construct(struct netdev *);
>
>  struct virtio_net * netdev_dpdk_get_virtio(const struct netdev_dpdk *dev);
>
>  struct ingress_policer *
>  netdev_dpdk_get_ingress_policer(const struct netdev_dpdk *dev);
>
> +void link_status_changed_callback(uint8_t port_id,
> +enum rte_eth_event_type type OVS_UNUSED, void *param OVS_UNUSED);
> +void vring_state_changed_callback(uint8_t port_id,
> +enum rte_eth_event_type type OVS_UNUSED, void *param OVS_UNUSED);
>

Minor: I think we can avoid OVS_UNUSED on the declaration and keep it only
on the definition.

Also, these two functions can be static
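
Something along these lines is what I have in mind.  This is only a sketch:
DEMO_UNUSED stands in for OVS_UNUSED (assuming a GCC/Clang-compatible
compiler), enum demo_event_type stands in for enum rte_eth_event_type, and
demo_last_port exists just to give the body something to do:

```c
#include <stdint.h>

/* Stand-in for OVS_UNUSED; assumes a GCC/Clang-compatible compiler. */
#define DEMO_UNUSED __attribute__((__unused__))

/* Stand-in for enum rte_eth_event_type. */
enum demo_event_type { DEMO_EVENT_INTR_LSC };

/* Records the last port seen, so that the callback has an effect. */
static unsigned int demo_last_port;

/* Declaration: file-local (static) and without any unused markers. */
static void demo_link_status_changed_callback(uint8_t port_id,
                                              enum demo_event_type type,
                                              void *param);

/* Definition: the unused markers appear here only. */
static void
demo_link_status_changed_callback(uint8_t port_id,
                                  enum demo_event_type type DEMO_UNUSED,
                                  void *param DEMO_UNUSED)
{
    demo_last_port = port_id;
}
```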


> +static void 

Re: [ovs-dev] [PATCH 2/4] netdev-dpdk: Consistent naming for vhost functions

2016-07-25 Thread Daniele Di Proietto
Looks like every caller of NETDEV_DPDK_CLASS() passes NULL as INIT param.
Maybe we can remove that?

Acked-by: Daniele Di Proietto <diproiet...@vmware.com>

2016-07-15 7:26 GMT-07:00 Ciara Loftus <ciara.lof...@intel.com>:

> A mix of vhost_user_ and vhost_ is used when naming vhost functions. The
> 'user_' has been dropped for consistency. Also remove empty
> 'vhost_user_class_init' function.
>
> Signed-off-by: Ciara Loftus <ciara.lof...@intel.com>
> ---
>  lib/netdev-dpdk.c | 20 +++-
>  1 file changed, 7 insertions(+), 13 deletions(-)
>
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index faf93e0..b4f82af 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -811,7 +811,7 @@ dpdk_dev_parse_name(const char dev_name[], const char
> prefix[],
>  }
>
>  static int
> -netdev_dpdk_vhost_user_construct(struct netdev *netdev)
> +netdev_dpdk_vhost_construct(struct netdev *netdev)
>  {
>  struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>  const char *name = netdev->name;
> @@ -2408,12 +2408,6 @@ dpdk_vhost_class_init(void)
>  return 0;
>  }
>
> -static int
> -dpdk_vhost_user_class_init(void)
> -{
> -return 0;
> -}
> -
>  static void
>  dpdk_common_init(void)
>  {
> @@ -2809,7 +2803,7 @@ out:
>  }
>
>  static int
> -netdev_dpdk_vhost_user_reconfigure(struct netdev *netdev)
> +netdev_dpdk_vhost_reconfigure(struct netdev *netdev)
>  {
>  struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>  struct virtio_net *virtio_dev = netdev_dpdk_get_virtio(dev);
> @@ -3326,11 +3320,11 @@ static const struct netdev_class dpdk_ring_class =
>  netdev_dpdk_reconfigure,
>  netdev_dpdk_rxq_recv);
>
> -static const struct netdev_class OVS_UNUSED dpdk_vhost_user_class =
> +static const struct netdev_class OVS_UNUSED dpdk_vhost_class =
>  NETDEV_DPDK_CLASS(
>  "dpdkvhostuser",
> -dpdk_vhost_user_class_init,
> -netdev_dpdk_vhost_user_construct,
> +NULL,
> +netdev_dpdk_vhost_construct,
>  netdev_dpdk_vhost_destruct,
>  NULL,
>  NULL,
> @@ -3339,7 +3333,7 @@ static const struct netdev_class OVS_UNUSED
> dpdk_vhost_user_class =
>  netdev_dpdk_vhost_get_stats,
>  NULL,
>  NULL,
> -netdev_dpdk_vhost_user_reconfigure,
> +netdev_dpdk_vhost_reconfigure,
>  netdev_dpdk_vhost_rxq_recv);
>
>  void
> @@ -3348,7 +3342,7 @@ netdev_dpdk_register(void)
>  dpdk_common_init();
>  netdev_register_provider(&dpdk_class);
>  netdev_register_provider(&dpdk_ring_class);
> -netdev_register_provider(&dpdk_vhost_user_class);
> +netdev_register_provider(&dpdk_vhost_class);
>  }
>
>  void
> --
> 2.4.3
>


Re: [ovs-dev] [PATCH 1/4] netdev-dpdk: Remove dpdkvhostcuse ports

2016-07-25 Thread Daniele Di Proietto
Acked-by: Daniele Di Proietto <diproiet...@vmware.com>

2016-07-15 7:26 GMT-07:00 Ciara Loftus <ciara.lof...@intel.com>:

> This commit removes the 'dpdkvhostcuse' port type from the userspace
> datapath. vhost-cuse ports are quickly becoming obsolete as the
> vhost-user port type begins to support a greater feature-set thanks to
> the addition of things like vhost-user multiqueue and potential
> upcoming features like vhost-user client-mode and vhost-user reconnect.
> The feature is also expected to be removed from DPDK soon.
>
> One potential drawback of the removal of this support is that a
> userspace vHost port type is not available in OVS for use with older
> versions of QEMU (pre v2.2). Considering v2.2 is nearly two years old
> this should however be a low impact change.
>
> Signed-off-by: Ciara Loftus <ciara.lof...@intel.com>
> Acked-by: Flavio Leitner <f...@sysclose.org>
> ---
>  INSTALL.DPDK-ADVANCED.md | 241 -
>  NEWS |   1 +
>  acinclude.m4 |  12 --
>  lib/netdev-dpdk.c| 101 +---
>  rhel/README.RHEL |   2 -
>  utilities/automake.mk|   1 -
>  utilities/qemu-wrap.py   | 389
> ---
>  vswitchd/vswitch.xml |  12 --
>  8 files changed, 5 insertions(+), 754 deletions(-)
>  delete mode 100755 utilities/qemu-wrap.py
>
> diff --git a/INSTALL.DPDK-ADVANCED.md b/INSTALL.DPDK-ADVANCED.md
> index 9ae536d..fb37584 100644
> --- a/INSTALL.DPDK-ADVANCED.md
> +++ b/INSTALL.DPDK-ADVANCED.md
> @@ -374,12 +374,6 @@ For users wanting to do packet forwarding using
> kernel stack below are the steps
>
>  ##  6. Vhost Walkthrough
>
> -DPDK 16.04 supports two types of vhost:
> -
> -1. vhost-user - enabled default
> -
> -2. vhost-cuse - Legacy, disabled by default
> -
>  ### 6.1 vhost-user
>
>- Prerequisites:
> @@ -533,241 +527,6 @@ DPDK 16.04 supports two types of vhost:
>
>Note: For information on libvirt and further tuning refer [libvirt].
>
> -### 6.2 vhost-cuse
> -
> -  - Prerequisites:
> -
> -QEMU version >= 2.2
> -
> -  - Enable vhost-cuse support
> -
> -1. Enable vhost cuse support in DPDK
> -
> -   Set `CONFIG_RTE_LIBRTE_VHOST_USER=n` in config/common_linuxapp and
> follow the
> -   steps in 2.2 section of INSTALL.DPDK guide to build DPDK with cuse
> support.
> -   OVS will detect that DPDK has vhost-cuse libraries compiled and in
> turn will enable
> -   support for it in the switch and disable vhost-user support.
> -
> -2. Insert the Cuse module
> -
> -   `modprobe cuse`
> -
> -3. Build and insert the `eventfd_link` module
> -
> -   ```
> -   cd $DPDK_DIR/lib/librte_vhost/eventfd_link/
> -   make
> -   insmod $DPDK_DIR/lib/librte_vhost/eventfd_link.ko
> -   ```
> -
> -  - Adding vhost-cuse ports to Switch
> -
> -Unlike DPDK ring ports, DPDK vhost-cuse ports can have arbitrary
> names.
> -For vhost-cuse, the name of the port type is `dpdkvhostcuse`
> -
> -```
> -ovs-vsctl add-port br0 vhost-cuse-1 -- set Interface vhost-cuse-1
> -type=dpdkvhostcuse
> -```
> -
> -When attaching vhost-cuse ports to QEMU, the name provided during the
> -add-port operation must match the ifname parameter on the QEMU cmd
> line.
> -
> -  - Adding vhost-cuse ports to VM
> -
> -vhost-cuse ports use a Linux* character device to communicate with
> QEMU.
> -By default it is set to `/dev/vhost-net`. It is possible to reuse this
> -standard device for DPDK vhost, which makes setup a little simpler
> but it
> -is better practice to specify an alternative character device in
> order to
> -avoid any conflicts if kernel vhost is to be used in parallel.
> -
> -1. This step is only needed if using an alternative character device.
> -
> -   ```
> -   ./utilities/ovs-vsctl --no-wait set Open_vSwitch . \
> -other_config:cuse-dev-name=my-vhost-net
> -   ```
> -
> -   In the example above, the character device to be used will be
> -   `/dev/my-vhost-net`.
> -
> -2. In case of reusing kernel vhost character device, there would be
> conflict
> -   user should remove it.
> -
> -   `rm -rf /dev/vhost-net`
> -
> -3. Configure virtio-net adapters
> -
> -   The following parameters must be passed to the QEMU binary, repeat
> -   the below parameters for multiple devices.
> -
> -   ```
> -   -netdev tap,id=,script=no,downscript=no,ifname=,vhost=on
> -   -device virtio-net-pci,netdev=net1,mac=
> -   ```
> -

Re: [ovs-dev] [PATACH]netdev-dpdk: remove duplicated code in netdev_dpdk_get_status

2016-07-25 Thread Daniele Di Proietto
Thanks for the patch!

I had to manually unwrap some long lines.  Could you maybe use git
send-email next time?

Anyway, I added you to AUTHORS and applied this to master

Thanks,

Daniele

2016-07-25 1:37 GMT-07:00 :

> From caeb84217c38ccd0b2076689fd36b578c00678ad Mon Sep 17 00:00:00 2001
> From: xubinbin 
> Date: Thu, 21 Jul 2016 21:52:29 +0800
> Subject: [PATCH] netdev-dpdk: remove duplicated code in
> netdev_dpdk_get_status
>
> Put "driver_name" into "args" twice, that's meaninglessness.
> So need to remove duplicated code.
>
> Signed-off-by: Binbin Xu 
> ---
>  lib/netdev-dpdk.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 85b18fd..b515bee 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -2091,8 +2091,6 @@ netdev_dpdk_get_status(const struct netdev *netdev,
> struct smap *args)
>  rte_eth_dev_info_get(dev->port_id, &dev_info);
>  ovs_mutex_unlock(&dev->mutex);
>
> -smap_add_format(args, "driver_name", "%s", dev_info.driver_name);
> -
>  smap_add_format(args, "port_no", "%d", dev->port_id);
>  smap_add_format(args, "numa_id", "%d",
> rte_eth_dev_socket_id(dev->port_id));
>  smap_add_format(args, "driver_name", "%s", dev_info.driver_name);
> --
> 1.8.3.1
> 


Re: [ovs-dev] [PATCH] netdev-dpdk: Apply batch truncation API.

2016-07-25 Thread Daniele Di Proietto
Applied to master, thanks

2016-07-25 8:14 GMT-07:00 William Tu :

> Instead of looping into each packet and check whether to truncate, the
> patch moves it out of the loop and uses batch API.  If truncation is
> not set, checking 'trunc' in 'struct dp_packet_batch' at per-batch basis
> can skip the per-packet checking overhead.
>
> Signed-off-by: William Tu 
> ---
>  lib/netdev-dpdk.c | 11 ---
>  1 file changed, 4 insertions(+), 7 deletions(-)
>
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 7fb6457..22d547f 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -1411,6 +1411,8 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid,
> struct dp_packet_batch *batch)
>  ovs_mutex_lock(_mempool_mutex);
>  }
>
> +dp_packet_batch_apply_cutlen(batch);
> +
>  for (i = 0; i < batch->count; i++) {
>  int size = dp_packet_size(batch->packets[i]);
>
> @@ -1429,10 +1431,6 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid,
> struct dp_packet_batch *batch)
>  break;
>  }
>
> -/* Cut the size so only the truncated size is copied. */
> -size -= dp_packet_get_cutlen(batch->packets[i]);
> -dp_packet_reset_cutlen(batch->packets[i]);
> -
>  /* We have to do a copy for now */
>  memcpy(rte_pktmbuf_mtod(mbufs[newcnt], void *),
> dp_packet_data(batch->packets[i]), size);
> @@ -1506,12 +1504,11 @@ netdev_dpdk_send__(struct netdev_dpdk *dev, int
> qid,
>  unsigned int temp_cnt = 0;
>  int cnt = batch->count;
>
> +dp_packet_batch_apply_cutlen(batch);
> +
>  for (int i = 0; i < cnt; i++) {
>  int size = dp_packet_size(batch->packets[i]);
>
> -size -= dp_packet_get_cutlen(batch->packets[i]);
> -dp_packet_set_size(batch->packets[i], size);
> -
>  if (OVS_UNLIKELY(size > dev->max_packet_len)) {
>  if (next_tx_idx != i) {
>  temp_cnt = i - next_tx_idx;
> --
> 2.5.0
>


Re: [ovs-dev] Backport Request: dpif-netdev: Remove PMD latency on seq_mutex

2016-07-25 Thread Daniele Di Proietto
Fair point, I guess this looks more like a bug fix than a new feature.

After discussing offline with Ben and having another set of eyes on the
changes, I backported this to branch-2.5.

Thanks,

Daniele


On 25/07/2016 02:29, "Loftus, Ciara" <ciara.lof...@intel.com> wrote:

>> 
>> Thanks Flavio for checking and Daniel for your consideration.
>> Indeed the issue exists in 2.5 branch.
>> 
>> We are treating this more in the bucket of a performance bug fix than a
>> feature.
>> 
>> Any specific testing that you would like to see run to help reduce
>> your concern related to changes to the core modules ?
>> 
>> Ciara, what's your opinion on these changes for a backport ?
>
>I'm of the same opinion as yourself and Flavio that this is more a fix than a 
>feature. I'd like to see it backported.
>But I understand there may be some risk associated with due to the nature of 
>the changes.
>
>Thanks,
>Ciara
>
>> 
>> Thanks
>> Vinod
>> 
>> 
>> 
>> -Original Message-
>> From: Daniele Di Proietto [mailto:diproiet...@vmware.com]
>> Sent: Friday, July 22, 2016 3:49 PM
>> To: Flavio Leitner <f...@redhat.com>; Vinod, Chegu
>> <chegu_vi...@hpe.com>
>> Cc: kris...@redhat.com; ovs-dev <dev@openvswitch.org>; Loftus, Ciara
>> <ciara.lof...@intel.com>
>> Subject: Re: Backport Request: dpif-netdev: Remove PMD latency on
>> seq_mutex
>> 
>> I'm not sure I'm 100% comfortable back porting this to branch-2.5
>> 
>> I see the change more as a feature rather than a bugfix.
>> 
>> Also it touches some core modules (seq and rcu) in a non trivial way.
>> 
>> 
>> What do you guys think?
>> 
>> Thanks,
>> 
>> Daniele
>> 
>> On 22/07/2016 15:03, "Flavio Leitner" <f...@redhat.com> wrote:
>> 
>> >(adding ovs-dev mailing list and more people interesting on the
>> >backport to CC)
>> >
>> >On Mon, Jul 18, 2016 at 05:31:52AM +, Vinod, Chegu wrote:
>> >> Hi Flavio, Karl,
>> >>
>> >> Is there a version of the following fix available that is compatible with 
>> >> OVS
>> 2.5?
>> >>
>> >>
>> https://github.com/openvswitch/ovs/commit/9dede5cff553d7c4e074f04c52
>> 5
>> >> c1417eb209363
>> >>
>> >> If yes can it backported to the 2.5 branch ?
>> >
>> >branch-2.5 is affected by the same issue.  I tested the patch from
>> >branch master (cherry-pick) and it solved the issue.
>> >
>> >Daniele,
>> >
>> >What do you think? If you agree, do you need me to post the backported
>> >patch or is it enough for you to cherry-pick?
>> >
>> >Thanks,
>> >--
>> >fbl
>


Re: [ovs-dev] [RFC 4/5] dpctl: uses open_type when calling netdev_open

2016-07-25 Thread Daniele Di Proietto
2016-07-25 9:57 GMT-07:00 Thadeu Lima de Souza Cascardo <casca...@redhat.com>:

> On Fri, Jul 22, 2016 at 02:49:39PM -0700, Daniele Di Proietto wrote:
> > I would prefer if dpctl kept using the datapath types.  The translation
> > from database types to datapath type should happen in ofproto, dpctl is
> > supposed to be used to interact with the datapath directly.
> >
> > What do you guys think?
> >
> > The rest of the series looks good to me as well.
> >
> > Thanks,
> >
> > Daniele
> >
>
>
Hi Cascardo,

Thanks for the detailed analysis.  The problem is that there are three
types:

a) the database type
b) the port type in dpif-netdev
c) the netdev type

I was assuming that b and c are always equal, but they're not.  The only
case when they're not equal is the "ovs-netdev" (or "ovs-dummy") port.

I think we can easily remove this case and make b and c always equal
with the following changes:

---8<---
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 787851d..effa7e0 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -941,7 +941,9 @@ create_dp_netdev(const char *name, const struct
dpif_class *class,
 ovs_mutex_lock(&dp->port_mutex);
 dp_netdev_set_nonpmd(dp);

-error = do_add_port(dp, name, "internal", ODPP_LOCAL);
+error = do_add_port(dp, name, dpif_netdev_port_open_type(dp->class,
+ "internal"),
+ODPP_LOCAL);
 ovs_mutex_unlock(>port_mutex);
 if (error) {
 dp_netdev_free(dp);
@@ -1129,7 +1131,7 @@ hash_port_no(odp_port_t port_no)
 }

 static int
-port_create(const char *devname, const char *open_type, const char *type,
+port_create(const char *devname, const char *type,
 odp_port_t port_no, struct dp_netdev_port **portp)
 {
 struct netdev_saved_flags *sf;
@@ -1142,7 +1144,7 @@ port_create(const char *devname, const char
*open_type, const char *type,
 *portp = NULL;

 /* Open and validate network device. */
-error = netdev_open(devname, open_type, );
+error = netdev_open(devname, type, );
 if (error) {
 return error;
 }
@@ -1233,8 +1235,7 @@ do_add_port(struct dp_netdev *dp, const char
*devname, const char *type,
 return EEXIST;
 }

-error = port_create(devname, dpif_netdev_port_open_type(dp->class,
type),
-type, port_no, );
+error = port_create(devname, type, port_no, );
 if (error) {
 return error;
 }
---8<---

With this incremental, case 2 is covered: dpctl/show always shows the
datapath type.
(The incremental also requires some testsuite changes.)
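
For reference, the translation from the port type (b) to the netdev type (c)
can be sketched roughly as below.  This is a hedged illustration, not the
real dpif_netdev_port_open_type(): demo_port_open_type() and its
'is_dummy_class' parameter are hypothetical, and the "internal" to
"tap"/"dummy" mapping is my assumption about how the netdev datapath backs
internal ports.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical mapping from a port type to the netdev type that would be
 * opened for it.  Assumption: "internal" ports are backed by "tap" devices
 * on a real datapath and by "dummy" devices on a dummy one; every other
 * type is passed through unchanged. */
static const char *
demo_port_open_type(bool is_dummy_class, const char *type)
{
    if (strcmp(type, "internal") != 0) {
        return type;
    }
    return is_dummy_class ? "dummy" : "tap";
}
```

With the incremental applied, this translation would happen once at the
do_add_port() call site, so the type stored for the port and the type of
the opened netdev would be the same string.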

For cases 1 and 3, it is just a matter of deciding whether we want to
support database types (in addition to netdev types) in dpctl.  I would
lean towards no: I find it confusing that both types are acceptable.

That said, I feel like I'm nitpicking; dpctl is a tool used for debugging
and I think we should do whatever comes easier.

Thanks,

Daniele

> Hi, Daniele.
>
> Thanks for the comment.
>
> The best example that breaks currently is the internal type on the netdev
> datapath.
>
> There are three scenarios in dpctl, and two of them are changed with this
> patch.
>
>

Re: [ovs-dev] releasing 2.6: branch Aug 1, release Sep 15

2016-07-25 Thread Daniele Di Proietto





On 25/07/2016 09:16, "Ben Pfaff"  wrote:

>On Mon, Jul 25, 2016 at 03:44:01PM +, Gray, Mark D wrote:
>> 
>> 
>> > -Original Message-
>> > From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Ben Pfaff
>> > Sent: Saturday, July 23, 2016 5:00 PM
>> > To: dev@openvswitch.org
>> > Subject: [ovs-dev] releasing 2.6: branch Aug 1, release Sep 15
>> > 
>> > The proposed Open vSwitch release schedule calls for branching 2.6 from
>> > master on Aug. 1, followed by a period of bug fixes and stabilization, with
>> > release on Sep. 15.  The proposed release schedule is posted here for
>> > review:
>> > https://patchwork.ozlabs.org/patch/650319/
>> > 
>> > I don't yet know of a reason to modify this schedule.
>> > 
>> > If you know of reasons to change it, now is an appropriate time to bring 
>> > it up
>> > for discussion.  In addition, if you have features planned for 2.6 that 
>> > risk
>> > hitting master somewhat late for the branch, it is also a good time to 
>> > bring
>> > these up for discussion, so that we can plan to backport them to the branch
>> > early on, or to delay the branch by a few days.
>> 
>> DPDK 16.07 should be released by early next week. Ciara has a patch to
>> enable it, could this be backported to the branch?
>
>I think that Daniele and Pravin should weigh in on that since they
>understand DPDK and its relationship to OVS much better.

I think we should use DPDK 16.07 for branch-2.6.  I hope to merge the patch
before we branch, otherwise we can always backport it.

Thanks,

Daniele
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH v2] ovs-lib: Keep internal interface ip during upgrade.

2016-07-22 Thread Daniele Di Proietto





On 22/07/2016 12:54, "Ben Pfaff" <b...@ovn.org> wrote:

>On Tue, Jun 21, 2016 at 07:27:30PM -0700, Daniele Di Proietto wrote:
>> Commit 9b5422a98f81("ovs-lib: Try to call exit before killing.")
>> introduced a problem where internal interfaces are destroyed and
>> recreated, losing their IP address.
>
>Acked-by: Ben Pfaff <b...@ovn.org>

Thanks for the review, pushed to master


Re: [ovs-dev] [PATCH v2] ovs-lib: Keep internal interface ip during upgrade.

2016-07-22 Thread Daniele Di Proietto
2016-07-22 16:02 GMT-07:00 Daniele Di Proietto <diproiet...@ovn.org>:

>
>
> 2016-06-22 9:52 GMT-07:00 Darrell Ball <dlu...@gmail.com>:
>
>> On Tue, Jun 21, 2016 at 7:27 PM, Daniele Di Proietto <
>> diproiet...@vmware.com
>> > wrote:
>>
>> > Commit 9b5422a98f81("ovs-lib: Try to call exit before killing.")
>> > introduced a problem where internal interfaces are destroyed and
>> > recreated, losing their IP address.
>> >
>> > Commit 9aad5a5a96ba("ovs-vswitchd: Preserve datapath ports across
>> > graceful shutdown.") fixed the problem by changing ovs-vswitchd
>> > to preserve the ports on `ovs-appctl exit`.  Unfortunately, this fix is
>> > not enough during upgrade from <= 2.5.0, where an old ovs-vswitchd is
>> > running (without the fix) and a new ovs-lib script is performing the
>> > restart.
>> >
>> > The problem seems to affect both RHEL and Ubuntu.
>> >
>> > This commit fixes the upgrade by looking at the running daemon
>> > version and avoiding `ovs-appctl exit` if it's < 2.5.90.
>> >
>> > Suggested-by: Gurucharan Shetty <g...@ovn.org>
>> > Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
>>
>>
>> 1) Is it normal in this code base to embed specific version numbers in a
>> generic library file ?
>>
>
> v1 of this patch had the check in ovs-ctl, but we thought that other
> daemons might be affected (every daemon used to be killed during restart),
> so we moved it to ovs-lib
>
>
>> 2) If coming from < 2.5.90 then the problem that Commit 9b5422a98f81 was
>> trying to fix
>> will exist ?
>>
>
> On < 2.5.90 the problem already existed on restart and with this patch it
> will show once more during upgrade.
>
>
>>
>>In general, do you need to document this somewhere at user level (
>> install.md or somewhere else) ?
>>
>
> IMHO this is a problem of the system scripts, and with this commit the
> system scripts are fixed, so that the user doesn't need to worry about this.
>
>
We discussed this offline and I misunderstood what you meant, sorry.

You suggested documenting that the problem fixed by 9b5422a98f81 also
affects upgrades.

We agreed that's not important, given that in 2.5.90 we made some
backwards-incompatible changes for DPDK.

Thanks,

Daniele


Re: [ovs-dev] [PATCH v2] ovs-lib: Keep internal interface ip during upgrade.

2016-07-22 Thread Daniele Di Proietto
2016-06-22 9:52 GMT-07:00 Darrell Ball <dlu...@gmail.com>:

> On Tue, Jun 21, 2016 at 7:27 PM, Daniele Di Proietto <
> diproiet...@vmware.com
> > wrote:
>
> > Commit 9b5422a98f81("ovs-lib: Try to call exit before killing.")
> > introduced a problem where internal interfaces are destroyed and
> > recreated, losing their IP address.
> >
> > Commit 9aad5a5a96ba("ovs-vswitchd: Preserve datapath ports across
> > graceful shutdown.") fixed the problem by changing ovs-vswitchd
> > to preserve the ports on `ovs-appctl exit`.  Unfortunately, this fix is
> > not enough during upgrade from <= 2.5.0, where an old ovs-vswitchd is
> > running (without the fix) and a new ovs-lib script is performing the
> > restart.
> >
> > The problem seems to affect both RHEL and Ubuntu.
> >
> > This commit fixes the upgrade by looking at the running daemon
> > version and avoiding `ovs-appctl exit` if it's < 2.5.90.
> >
> > Suggested-by: Gurucharan Shetty <g...@ovn.org>
> > Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
>
>
> 1) Is it normal in this code base to embed specific version numbers in a
> generic library file ?
>

v1 of this patch had the check in ovs-ctl, but we thought that other
daemons might be affected (every daemon used to be killed during restart),
so we moved it to ovs-lib.


> 2) If coming from < 2.5.90 then the problem that Commit 9b5422a98f81 was
> trying to fix
> will exist ?
>

On < 2.5.90 the problem already existed on restart and with this patch it
will show once more during upgrade.


>
>In general, do you need to document this somewhere at user level (
> install.md or somewhere else) ?
>

IMHO this is a problem of the system scripts, and with this commit the
system scripts are fixed, so that the user doesn't need to worry about this.

Thanks,

Daniele


Re: [ovs-dev] Backport Request: dpif-netdev: Remove PMD latency on seq_mutex

2016-07-22 Thread Daniele Di Proietto
I'm not sure I'm 100% comfortable backporting this to branch-2.5.

I see the change more as a feature than a bugfix.

Also, it touches some core modules (seq and rcu) in a non-trivial way.


What do you guys think?

Thanks,

Daniele

On 22/07/2016 15:03, "Flavio Leitner"  wrote:

>(adding the ovs-dev mailing list and more people interested in
>the backport to CC)
>
>On Mon, Jul 18, 2016 at 05:31:52AM +, Vinod, Chegu wrote:
>> Hi Flavio, Karl,
>> 
>> Is there a version of the following fix available that is compatible with 
>> OVS 2.5?
>> 
>> https://github.com/openvswitch/ovs/commit/9dede5cff553d7c4e074f04c525c1417eb209363
>> 
>> If yes, can it be backported to the 2.5 branch?
>
>branch-2.5 is affected by the same issue.  I tested the patch from
>branch master (cherry-pick) and it solved the issue.
>
>Daniele,
>
>What do you think? If you agree, do you need me to post the
>backported patch or is it enough for you to cherry-pick?
>
>Thanks,
>-- 
>fbl


Re: [ovs-dev] [PATCH RFC v3 1/1] netdev-dpdk: Add support for DPDK 16.07

2016-07-22 Thread Daniele Di Proietto
Thanks for the patch.

I have another concern with this.  If we're still going to rely on RCU to
protect the vhost device (and as pointed out by Ilya, I think we should) we
need to use RCU-like semantics on the vid array index. I'm not sure a
boolean flag is going to be enough.

CCing Jarno:

We have this int, which is an index into an array of vhost devices (the
array is inside the DPDK library).  We want to make sure that when
ovsrcu_synchronize() returns nobody is using the old index anymore.

Should we introduce an RCU type for indexing into arrays?  I found some
negative opinions here:

https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/tree/Documentation/RCU/arrayRCU.txt?id=refs/tags/next-20160722#n13

but I think using atomics should prevent the compiler from playing tricks
with the index.

How about something like the code below?

Thanks,

Daniele


diff --git a/lib/ovs-rcu.h b/lib/ovs-rcu.h
index dc75749..d1a57f6 100644
--- a/lib/ovs-rcu.h
+++ b/lib/ovs-rcu.h
@@ -130,6 +130,41 @@
 #include "compiler.h"
 #include "ovs-atomic.h"

+typedef struct { atomic_int v; } ovsrcu_int;
+
+static inline int ovsrcu_int_get__(const ovsrcu_int *i, memory_order order)
+{
+int ret;
+atomic_read_explicit(CONST_CAST(atomic_int *, &i->v), &ret, order);
+return ret;
+}
+
+static inline int ovsrcu_int_get(const ovsrcu_int *i)
+{
+return ovsrcu_int_get__(i, memory_order_consume);
+}
+
+static inline int ovsrcu_int_get_protected(const ovsrcu_int *i)
+{
+return ovsrcu_int_get__(i, memory_order_relaxed);
+}
+
+static inline void ovsrcu_int_set__(ovsrcu_int *i, int value,
+memory_order order)
+{
+atomic_store_explicit(&i->v, value, order);
+}
+
+static inline void ovsrcu_int_set(ovsrcu_int *i, int value)
+{
+ovsrcu_int_set__(i, value, memory_order_release);
+}
+
+static inline void ovsrcu_int_set_protected(ovsrcu_int *i, int value)
+{
+ovsrcu_int_set__(i, value, memory_order_relaxed);
+}
+
 #if __GNUC__
 #define OVSRCU_TYPE(TYPE) struct { ATOMIC(TYPE) p; }
 #define OVSRCU_INITIALIZER(VALUE) { ATOMIC_VAR_INIT(VALUE) }
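
For readers without the OVS tree at hand, the same idea can be sketched with
plain C11 atomics.  This is only an illustration under assumptions: the
slots[] table, cur_index, index_publish() and slot_read() are hypothetical
names standing in for the vhost device array inside DPDK and the vid field,
and memory_order_acquire is used where the proposal above uses
memory_order_consume.  The grace period itself (waiting with
ovsrcu_synchronize() before an old index may be reused) is not shown.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the device table that lives inside DPDK. */
#define N_SLOTS 4
static int slots[N_SLOTS] = { 10, 11, 12, 13 };

/* Index published to readers; -1 means "no device attached". */
static atomic_int cur_index = -1;

/* Writer side: publish a new index.  The release store makes everything
 * written to the slot before this call visible to a reader that observes
 * the new index. */
static void
index_publish(int idx)
{
    atomic_store_explicit(&cur_index, idx, memory_order_release);
}

/* Reader side: load the index exactly once into a local and use only that
 * copy, so the compiler cannot re-read a value that changed in between. */
static int
slot_read(void)
{
    int idx = atomic_load_explicit(&cur_index, memory_order_acquire);

    return idx >= 0 && idx < N_SLOTS ? slots[idx] : -1;
}
```

The point of the single atomic load is that the bounds check and the array
access are guaranteed to use the same index value.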



2016-07-22 8:55 GMT-07:00 Ciara Loftus :

> This commit introduces support for DPDK 16.07 and consequently breaks
> compatibility with DPDK 16.04.
>
> DPDK 16.07 introduces some changes to various APIs. These have been
> updated in OVS, including:
> * xstats API: changes to structure of xstats
> * vhost API:  replace virtio-net references with 'vid'
>
> Signed-off-by: Ciara Loftus 
> Tested-by: Maxime Coquelin 
>
> v3:
> - fixed style issues
> - fixed & simplified xstats frees
> - use xcalloc & free instead of rte_mzalloc & rte_free for stats
> - remove libnuma include
> - fixed & simplified vHost NUMA set
> - added flag to indicate device reconfigured at least once
> - re-add call to rcu synchronise in destroy_device
> - define IF_NAME_SZ and use instead of PATH_MAX
>
> v2:
> - rebase with DPDK rc2
> - rebase with OVS master
> - fix vhost cuse compilation
> ---
>  .travis/linux-build.sh   |   2 +-
>  INSTALL.DPDK-ADVANCED.md |   8 +-
>  INSTALL.DPDK.md  |  20 ++---
>  NEWS |   1 +
>  lib/netdev-dpdk.c| 220
> +++
>  5 files changed, 126 insertions(+), 125 deletions(-)
>
> diff --git a/.travis/linux-build.sh b/.travis/linux-build.sh
> index 065de39..1b3d43d 100755
> --- a/.travis/linux-build.sh
> +++ b/.travis/linux-build.sh
> @@ -68,7 +68,7 @@ fi
>
>  if [ "$DPDK" ]; then
>  if [ -z "$DPDK_VER" ]; then
> -DPDK_VER="16.04"
> +DPDK_VER="16.07"
>  fi
>  install_dpdk $DPDK_VER
>  if [ "$CC" = "clang" ]; then
> diff --git a/INSTALL.DPDK-ADVANCED.md b/INSTALL.DPDK-ADVANCED.md
> index 9ae536d..ec1de29 100644
> --- a/INSTALL.DPDK-ADVANCED.md
> +++ b/INSTALL.DPDK-ADVANCED.md
> @@ -43,7 +43,7 @@ for DPDK and OVS.
>  For IVSHMEM case, set `export DPDK_TARGET=x86_64-ivshmem-linuxapp-gcc`
>
>  ```
> -export DPDK_DIR=/usr/src/dpdk-16.04
> +export DPDK_DIR=/usr/src/dpdk-16.07
>  export DPDK_BUILD=$DPDK_DIR/$DPDK_TARGET
>  make install T=$DPDK_TARGET DESTDIR=install
>  ```
> @@ -339,7 +339,7 @@ For users wanting to do packet forwarding using kernel
> stack below are the steps
> cd /usr/src/cmdline_generator
> wget
> https://raw.githubusercontent.com/netgroup-polito/un-orchestrator/master/orchestrator/compute_controller/plugins/kvm-libvirt/cmdline_generator/cmdline_generator.c
> wget
> https://raw.githubusercontent.com/netgroup-polito/un-orchestrator/master/orchestrator/compute_controller/plugins/kvm-libvirt/cmdline_generator/Makefile
> -   export RTE_SDK=/usr/src/dpdk-16.04
> +   export RTE_SDK=/usr/src/dpdk-16.07
> export RTE_TARGET=x86_64-ivshmem-linuxapp-gcc
> make
> ./build/cmdline_generator -m -p dpdkr0 XXX
> @@ -363,7 +363,7 @@ For users wanting to do packet forwarding using kernel

Re: [ovs-dev] [RFC 4/5] dpctl: uses open_type when calling netdev_open

2016-07-22 Thread Daniele Di Proietto
I would prefer if dpctl kept using the datapath types.  The translation
from database types to datapath types should happen in ofproto; dpctl is
supposed to be used to interact with the datapath directly.

What do you guys think?

The rest of the series looks good to me as well.
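
To make the distinction concrete, here is a rough sketch of what a
database-type to datapath-type translation looks like.  It is not the OVS
implementation: example_port_open_type() is a hypothetical helper, and the
single rule it encodes ("internal" on the "netdev" datapath is opened as
"tap") is only the case mentioned in the patch description below.

```c
#include <assert.h>
#include <string.h>

/* Illustrative only: map a database port type to the type the netdev
 * layer would open for a given datapath type.  The one rule here is an
 * assumption taken from the discussion, not the real OVS mapping. */
static const char *
example_port_open_type(const char *datapath_type, const char *port_type)
{
    if (!strcmp(datapath_type, "netdev") && !strcmp(port_type, "internal")) {
        return "tap";
    }
    /* Every other combination is passed through unchanged. */
    return port_type;
}
```

If dpctl only ever accepts datapath types, a helper like this stays in the
layer above it and dpctl passes the type through untouched.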

Thanks,

Daniele

2016-07-18 10:00 GMT-07:00 Thadeu Lima de Souza Cascardo <
casca...@redhat.com>:

> dpctl uses a user or database defined type when calling netdev_open.
> Instead, it
> should use the type from dpif_port_open_type. Otherwise, when using the
> internal
> type, it could open it with that type instead of the correct one, which
> would be
> tap or dummy.
> ---
>  lib/dpctl.c | 17 -
>  1 file changed, 12 insertions(+), 5 deletions(-)
>
> diff --git a/lib/dpctl.c b/lib/dpctl.c
> index 003602a..f896161 100644
> --- a/lib/dpctl.c
> +++ b/lib/dpctl.c
> @@ -274,7 +274,8 @@ dpctl_add_if(int argc OVS_UNUSED, const char *argv[],
>  }
>  }
>
> -error = netdev_open(name, type, );
> +error = netdev_open(name, dpif_port_open_type(dpif_type(dpif),
> type),
> +);
>  if (error) {
>  dpctl_error(dpctl_p, error, "%s: failed to open network
> device",
>  name);
> @@ -356,7 +357,8 @@ dpctl_set_if(int argc, const char *argv[], struct
> dpctl_params *dpctl_p)
>  dpif_port_destroy(_port);
>
>  /* Retrieve its existing configuration. */
> -error = netdev_open(name, type, );
> +error = netdev_open(name, dpif_port_open_type(dpif_type(dpif),
> type),
> +);
>  if (error) {
>  dpctl_error(dpctl_p, error, "%s: failed to open network
> device",
>  name);
> @@ -558,10 +560,13 @@ show_dpif(struct dpif *dpif, struct dpctl_params
> *dpctl_p)
>  qsort(port_nos, n_port_nos, sizeof *port_nos, compare_port_nos);
>
>  for (int i = 0; i < n_port_nos; i++) {
> +const char *type;
>  if (dpif_port_query_by_number(dpif, port_nos[i], _port)) {
>  continue;
>  }
>
> +type = dpif_port_open_type(dpif_type(dpif), dpif_port.type);
> +
>  dpctl_print(dpctl_p, "\tport %u: %s",
>  dpif_port.port_no, dpif_port.name);
>
> @@ -570,7 +575,7 @@ show_dpif(struct dpif *dpif, struct dpctl_params
> *dpctl_p)
>
>  dpctl_print(dpctl_p, " (%s", dpif_port.type);
>
> -error = netdev_open(dpif_port.name, dpif_port.type, );
> +error = netdev_open(dpif_port.name, type, );
>  if (!error) {
>  struct smap config;
>
> @@ -603,7 +608,7 @@ show_dpif(struct dpif *dpif, struct dpctl_params
> *dpctl_p)
>  struct netdev_stats s;
>  int error;
>
> -error = netdev_open(dpif_port.name, dpif_port.type, );
> +error = netdev_open(dpif_port.name, type, );
>  if (error) {
>  dpctl_print(dpctl_p, ", open failed (%s)",
>  ovs_strerror(error));
> @@ -891,6 +896,7 @@ get_in_port_netdev_from_key(struct dpif *dpif, const
> struct ofpbuf *key)
>  struct dpif_port dpif_port;
>  odp_port_t port_no;
>  int error;
> +const char *type;
>
>  port_no = ODP_PORT_C(nl_attr_get_u32(in_port_nla));
>  error = dpif_port_query_by_number(dpif, port_no, _port);
> @@ -898,7 +904,8 @@ get_in_port_netdev_from_key(struct dpif *dpif, const
> struct ofpbuf *key)
>  goto out;
>  }
>
> -netdev_open(dpif_port.name, dpif_port.type, );
> +type = dpif_port_open_type(dpif_type(dpif), dpif_port.type);
> +netdev_open(dpif_port.name, type, );
>  dpif_port_destroy(_port);
>  }
>
> --
> 2.7.4
>
>


Re: [ovs-dev] [RFC PATCH v2 1/1] netdev-dpdk: Add support for DPDK 16.07

2016-07-22 Thread Daniele Di Proietto
[...]

> @@ -1776,7 +1764,8 @@ netdev_dpdk_get_stats(const struct netdev
> > *netdev, struct netdev_stats *stats)
> >  netdev_dpdk_get_carrier(netdev, );
> >  ovs_mutex_lock(>mutex);
> >
> > -struct rte_eth_xstats *rte_xstats;
> > +struct rte_eth_xstat *rte_xstats;
> > +struct rte_eth_xstat_name *rte_xstats_names;
> >  int rte_xstats_len, rte_xstats_ret;
> >
> >  if (rte_eth_stats_get(dev->port_id, _stats)) {
> > @@ -1785,20 +1774,51 @@ netdev_dpdk_get_stats(const struct netdev
> > *netdev, struct netdev_stats *stats)
> >  return EPROTO;
> >  }
> >
> > -rte_xstats_len = rte_eth_xstats_get(dev->port_id, NULL, 0);
> > -if (rte_xstats_len > 0) {
> > -rte_xstats = dpdk_rte_mzalloc(sizeof(*rte_xstats) *
> rte_xstats_len);
> > -memset(rte_xstats, 0xff, sizeof(*rte_xstats) * rte_xstats_len);
> > -rte_xstats_ret = rte_eth_xstats_get(dev->port_id, rte_xstats,
> > -rte_xstats_len);
> > -if (rte_xstats_ret > 0 && rte_xstats_ret <= rte_xstats_len) {
> > -netdev_dpdk_convert_xstats(stats, rte_xstats,
> rte_xstats_ret);
> > -}
> > +/* Get length of statistics */
> > +rte_xstats_len = rte_eth_xstats_get_names(dev->port_id, NULL, 0);
> > +if (rte_xstats_len < 0) {
> > +VLOG_WARN("Cannot get XSTATS values for port: %i",
> dev->port_id);
> > +goto out;
> > +}
> > +/* Reserve memory for xstats names */
> > +rte_xstats_names = dpdk_rte_mzalloc(sizeof(*rte_xstats_names)
> > +* rte_xstats_len);
> > +if (rte_xstats_names == NULL) {
> >
> > Minor: how about !rte_xstats_names?
> >
> > +VLOG_WARN("Cannot allocate memory for XSTATS names for port %i",
> > +  dev->port_id);
> > +rte_free(rte_xstats_names);
> >
> > This rte_free seems unnecessary
> >
> > +goto out;
> > +}
> > +/* Reserve memory for xstats values */
> > +rte_xstats = dpdk_rte_mzalloc(sizeof(*rte_xstats) * rte_xstats_len);
> > +if (rte_xstats == NULL) {
> >
> > Minor: how about !rte_xstats?
> >
> > +VLOG_WARN("Cannot allocate memory for XSTATS values for port
> %i",
> > +  dev->port_id);
> >  rte_free(rte_xstats);
> >
> > This rte_free seems unnecessary.
> >
> > +goto out;
> >
> > I think here we would leak rte_xstats_names.
> >
> > +}
> > +/* Retreive xstats names */
> > +if (rte_xstats_len != rte_eth_xstats_get_names(dev->port_id,
> > +   rte_xstats_names,
> rte_xstats_len)) {
> > +VLOG_WARN("Cannot get XSTATS names for port: %i.",
> dev->port_id);
> > +rte_free(rte_xstats);
> > +rte_free(rte_xstats_names);
> > +goto out;
> > +}
> > +/* Retreive xstats values */
> > +memset(rte_xstats, 0xff, sizeof(*rte_xstats) * rte_xstats_len);
> > +rte_xstats_ret = rte_eth_xstats_get(dev->port_id, rte_xstats,
> > +rte_xstats_len);
> > +if (rte_xstats_ret > 0 && rte_xstats_ret <= rte_xstats_len) {
> > +netdev_dpdk_convert_xstats(stats, rte_xstats, rte_xstats_names,
> > +   rte_xstats_len);
> >
> > Who frees rte_xstats_names and rte_xstats?
> >
> >  } else {
> > -VLOG_WARN("Can't get XSTATS counters for port: %i.",
> dev->port_id);
> > +VLOG_WARN("Cannot get XSTATS values for port: %i.",
> dev->port_id);
> > +rte_free(rte_xstats);
> > +rte_free(rte_xstats_names);
> >  }
> >
> > +out:
> >
> > Perhaps it's easier to always free rte_xstats and rte_xstats_names here.
> I've fixed this in the v3 and moved the free as suggested.
>
> > Also, is there a reason not to use xcalloc() and free(), instead?
> > This is not the fastpath, as far as I understand there's no reason for
> it to be in
> > hugepages.
> Just following what was already there. But good point - I've changed this
> to use xcalloc and free.
>
>
Ok, thanks
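
For reference, the single-exit cleanup suggested above can be sketched as
follows.  This is a standalone illustration, not the netdev-dpdk code: the
struct names, collect_xstats() and its fail_values switch are hypothetical,
and plain calloc() is used so the failure path can be shown (xcalloc() in
OVS aborts on allocation failure, so the NULL checks would go away there).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the DPDK xstats name and value records. */
struct xstat_name { char name[64]; };
struct xstat { uint64_t value; };

/* Allocates both arrays, fills in dummy values and sums them.
 * 'fail_values' simulates a failure of the second allocation.
 * Returns 0 on success, -1 on failure.  Both arrays are released at the
 * single 'out' label on every path. */
static int
collect_xstats(size_t n, int fail_values, uint64_t *sum)
{
    struct xstat_name *names = NULL;
    struct xstat *values = NULL;
    int ret = -1;

    names = calloc(n, sizeof *names);
    if (!names) {
        goto out;
    }

    values = fail_values ? NULL : calloc(n, sizeof *values);
    if (!values) {
        goto out;           /* 'names' is still freed below. */
    }

    *sum = 0;
    for (size_t i = 0; i < n; i++) {
        values[i].value = i + 1;
        *sum += values[i].value;
    }
    ret = 0;

out:
    free(values);           /* free(NULL) is a no-op. */
    free(names);
    return ret;
}
```

Because both pointers start as NULL, the error branches need no frees of
their own, which removes the leak and the redundant rte_free() calls noted
in the review.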


> >
> >  stats->rx_packets = rte_stats.ipackets;
> >  stats->tx_packets = rte_stats.opackets;
> >  stats->rx_bytes = rte_stats.ibytes;
>

[...]


> > @@ -2233,26 +2250,27 @@ netdev_dpdk_remap_txqs(struct netdev_dpdk
> > *dev)
> >   * A new virtio-net device is added to a vhost port.
> >   */
> >  static int
> > -new_device(struct virtio_net *virtio_dev)
> > +new_device(int vid)
> >  {
> >  struct netdev_dpdk *dev;
> >  bool exists = false;
> >  int newnode = 0;
> > -long err = 0;
> > +char ifname[PATH_MAX];
> > +
> > +rte_vhost_get_ifname(vid, ifname, sizeof(ifname));
> >
> >  ovs_mutex_lock(_mutex);
> >  /* Add device to the vhost port with the same name as that passed
> down.
> > */
> >  LIST_FOR_EACH(dev, list_node, _list) {
> > -if (strncmp(virtio_dev->ifname, dev->vhost_id, IF_NAME_SZ) ==
> 0) {
> > -uint32_t qp_num = virtio_dev->virt_qp_nb;
> > +if 
