from:"Jianbo Liu"

[dpdk-dev] Clarification for eth_driver changes

2016-11-10 Thread Jianbo Liu

Hi Thomas,

On 10 November 2016 at 16:58, Thomas Monjalon  
wrote:
> 2016-11-10 14:12, Shreyansh Jain:
>> On Thursday 10 November 2016 01:33 PM, Thomas Monjalon wrote:
>> > 2016-11-10 15:51, Jianbo Liu:
>> >> On 10 November 2016 at 15:26, Shreyansh Jain  
>> >> wrote:
>> >>> This is what the current outline of eth_driver is:
>> >>>
>> >>> ++
>> >>> | eth_driver |
>> >>> | +-+|
>> >>> | | rte_pci_driver  ||
>> >>> | | +--+||
>> >>> | | | rte_driver   |||
>> >>> | | |  name[]  |||
>> >>> | | |  ... |||
>> >>> | | +--+||
>> >>> | |  .probe ||
>> >>> | |  .remove||
>> >>> | |  ...||
>> >>> | +-+|
>> >>> |  .eth_dev_init |
>> >>> |  .eth_dev_uninit   |
>> >>> ++
>> >>>
>> >>> This is what I was thinking:
>> >>>
>> >>> +-++--+
>> >>> | rte_pci_driver  ||eth_driver|
>> >>> | +--+|   _|_struct rte_driver *p |
>> >>> | | rte_driver   <---/ | .eth_dev_init|
>> >>> | |  ... ||| .eth_dev_uninit  |
>> >>> | |  name||+--+
>> >>> | |||
>> >>> | +--+|
>> >>> |  |
>> >>> +-+
>> >>>
>> >>> ::Impact::
>> >>> Various drivers use the rte_pci_driver embedded in the eth_driver object 
>> >>> for
>> >>> device initialization.
>> >>>  == They assume that rte_pci_driver is directly embedded and hence simply
>> >>> dereference.
>> >>>  == e.g. eth_igb_dev_init() in drivers/net/e1000/igb_ethdev.c file
>> >>>
>> >>> With the above change, such drivers would have to access rte_driver and 
>> >>> then
>> >>> perform container_of to obtain their respective rte_xxx_driver.
>> >>>  == this would be useful in case there is a non-PCI driver
>> >>>
>> >>> ::Problem::
>> >>> I am not sure of reason as to why eth_driver embedded rte_pci_driver in
>> >>> first place - other than a convenient way to define it before PCI driver
>> >>> registration.
>> >>>
>> >>> As all the existing PMDs are impacted - am I missing something here in
>> >>> making the above change?
>> >>>
>> >>
>> >> How do you know eth_driver->p is pointing to a rte_pci_driver or 
>> >> rte_soc_driver?
>> >> Maybe you need to add a type/flag in rte_driver.
>> >
>> > Why do you need any bus information at ethdev level?
>>
>> AFAIK, we don't need it. Above text is not stating anything on that
>> grounds either, I think. Isn't it?
>
> No, I was replying to Jianbo.
> Anyway, David made a more interesting comment.

Indeed, no need as I checked the code.
It's not even a issue if using David's design.

Thanks!
Jianbo

[dpdk-dev] Clarification for eth_driver changes

2016-11-10 Thread Jianbo Liu

On 10 November 2016 at 15:26, Shreyansh Jain  wrote:
> Hello David, list,
>
> I need some help and clarification regarding some changes I am doing to
> cleanup the EAL code.
>
> There are some changes which should be done for eth_driver/rte_eth_device
> structures:
>
> 1. most obvious, eth_driver should be renamed to rte_eth_driver.
> 2. eth_driver currently has rte_pci_driver embedded in it
>  - there can be ethernet devices which are _not_ PCI
>  - in which case, this structure should be removed.
> 3. Similarly, rte_eth_dev has rte_pci_device which should be replaced with
> rte_device.
>
> This is what the current outline of eth_driver is:
>
> ++
> | eth_driver |
> | +-+|
> | | rte_pci_driver  ||
> | | +--+||
> | | | rte_driver   |||
> | | |  name[]  |||
> | | |  ... |||
> | | +--+||
> | |  .probe ||
> | |  .remove||
> | |  ...||
> | +-+|
> |  .eth_dev_init |
> |  .eth_dev_uninit   |
> ++
>
> This is what I was thinking:
>
> +-++--+
> | rte_pci_driver  ||eth_driver|
> | +--+|   _|_struct rte_driver *p |
> | | rte_driver   <---/ | .eth_dev_init|
> | |  ... ||| .eth_dev_uninit  |
> | |  name||+--+
> | |||
> | +--+|
> |  |
> +-+
>
> ::Impact::
> Various drivers use the rte_pci_driver embedded in the eth_driver object for
> device initialization.
>  == They assume that rte_pci_driver is directly embedded and hence simply
> dereference.
>  == e.g. eth_igb_dev_init() in drivers/net/e1000/igb_ethdev.c file
>
> With the above change, such drivers would have to access rte_driver and then
> perform container_of to obtain their respective rte_xxx_driver.
>  == this would be useful in case there is a non-PCI driver
>
> ::Problem::
> I am not sure of reason as to why eth_driver embedded rte_pci_driver in
> first place - other than a convenient way to define it before PCI driver
> registration.
>
> As all the existing PMDs are impacted - am I missing something here in
> making the above change?
>

How do you know eth_driver->p is pointing to a rte_pci_driver or rte_soc_driver?
Maybe you need to add a type/flag in rte_driver.

> Probably, similar is the case for rte_eth_dev.
>
> -
> Shreyansh

[dpdk-dev] [PATCH v7 11/21] eal/soc: implement probing of drivers

2016-11-10 Thread Jianbo Liu

On 10 November 2016 at 14:10, Shreyansh Jain  wrote:
> On Thursday 10 November 2016 09:00 AM, Jianbo Liu wrote:
>>
>> On 28 October 2016 at 20:26, Shreyansh Jain 
>> wrote:
>>>
>>> Each SoC PMD registers a set of callback for scanning its own bus/infra
>>> and
>>> matching devices to drivers when probe is called.
>>> This patch introduces the infra for calls to SoC scan on
>>> rte_eal_soc_init()
>>> and match on rte_eal_soc_probe().
>>>
>>> Patch also adds test case for scan and probe.
>>>
>>> Signed-off-by: Jan Viktorin 
>>> Signed-off-by: Shreyansh Jain 
>>> Signed-off-by: Hemant Agrawal 
>>> --
>>> v4:
>>>  - Update test_soc for descriptive test function names
>>>  - Comments over test functions
>>>  - devinit and devuninint --> probe/remove
>>>  - RTE_VERIFY at some places
>>> ---
>>>  app/test/test_soc.c | 205
>>> ++-
>>>  lib/librte_eal/bsdapp/eal/rte_eal_version.map   |   4 +
>>>  lib/librte_eal/common/eal_common_soc.c  | 213
>>> +++-
>>>  lib/librte_eal/common/include/rte_soc.h |  75 -
>>>  lib/librte_eal/linuxapp/eal/eal.c   |   5 +
>>>  lib/librte_eal/linuxapp/eal/eal_soc.c   |  21 ++-
>>>  lib/librte_eal/linuxapp/eal/rte_eal_version.map |   4 +
>>>  7 files changed, 519 insertions(+), 8 deletions(-)
>>>
>> ..
>>
>>>  /**
>>> + * SoC device scan callback, called from rte_eal_soc_init.
>>> + * For various SoC, the bus on which devices are attached maynot be
>>> compliant
>>> + * to a standard platform (or platform bus itself). In which case, extra
>>> + * steps are implemented by PMD to scan over the bus and add devices to
>>> SoC
>>> + * device list.
>>> + */
>>> +typedef void (soc_scan_t)(void);
>>
>>
>> I'm still not sure about the purpose of soc_scan, and how to use it.
>
>
> For each device to be used by DPDK, which cannot be scanned/identified using
> the existing PCI/VDEV methods (sysfs/bus/pci), 'soc_scan_t' provides a way
> for driver to make those devices part of device lists.
>
> Ideally, 'scan' is not a function of a driver. It is a bus function - which
> is missing in this case.
>
>> If it's for each driver, it should at least struct rte_soc_driver * as
>> its parameter.
>
>
> Its for each driver - assuming that each non-PCI driver which implements it
> knows how to find devices which it can control (for example, special area in
> sysfs, or even platform bus).
>

Considering there are several drivers in a platform bus, each driver
call the scan function, like the rte_eal_soc_scan_platform_bus() you
implemented.
The first will add soc devices to the list, but the remaining calls
are redundant.

The other issue is adding the driver parameter. Do you need extra
information from driver to scan the bus?

>> If it's for each bus, why it is in rte_soc_driver?
>
>
> Short answer - lack of a better place. It should be in dev.h probably
> (rte_device/driver) but it would look out of place (as that represents PCI
> devices also which cannot implement it - all PCI devices are scanned in one
> go irrespective of driver)
>
>> I know you will implement bus driver in the future, but we need to
>> make it clear for current simplified implementation.
>
>
> Current implementation makes only a single assumption - that rather than
> relying on EAL for identifying devices (as being done now), next best option
> in existing framework (driver) should have control of finding devices.
>
> This is primarily to make the SoC work parallel to PCI implementation
> without much top-down changes in EAL.
>
> Bus model, improvises it by moving this implementation a little above in
> hierarchy - in rte_bus<-rte_driver<-PMD.
>
> I understand your apprehension - 'driver-scanning-for-devices' is indeed not
> correct real world analogy. It is just a place holder for enabling those
> drivers/PMDs which cannot work in absence of the right model.
> And that is still work in progress.
>
>
>>
>>> +
>>> +/**
>>> + * Custom device<=>driver match callback for SoC
>>> + * Unlike PCI, SoC devices don't have a fixed definition of device
>>> + * identification. PMDs can implement a specific matching function in
>>> which
>>> + * driver and device objects are provided to perform custom match.
>>> + */
>>> +typedef int (soc_match_t)(struct rte_soc_driver *, str

[dpdk-dev] [PATCH v7 06/21] eal/soc: introduce very essential SoC infra definitions

2016-11-10 Thread Jianbo Liu

On 28 October 2016 at 20:26, Shreyansh Jain  wrote:
> From: Jan Viktorin 
>
> Define initial structures and functions for the SoC infrastructure.
> This patch supports only a very minimal functions for now.
> More features will be added in the following commits.
>
> Includes rte_device/rte_driver inheritance of
> rte_soc_device/rte_soc_driver.
>
> Signed-off-by: Jan Viktorin 
> Signed-off-by: Shreyansh Jain 
> Signed-off-by: Hemant Agrawal 
> ---
>  app/test/Makefile   |   1 +
>  app/test/test_soc.c |  90 +
>  lib/librte_eal/common/Makefile  |   2 +-
>  lib/librte_eal/common/eal_private.h |   4 +
>  lib/librte_eal/common/include/rte_soc.h | 138 
> 
>  5 files changed, 234 insertions(+), 1 deletion(-)
>  create mode 100644 app/test/test_soc.c
>  create mode 100644 lib/librte_eal/common/include/rte_soc.h
>
...


> +/**
> + * Utility function to write a SoC device name, this device name can later be
> + * used to retrieve the corresponding rte_soc_addr using above functions.
> + *
> + * @param addr
> + * The SoC address
> + * @param output
> + * The output buffer string
> + * @param size
> + * The output buffer size
> + * @return
> + *  0 on success, negative on error.
> + */
> +static inline void
> +rte_eal_soc_device_name(const struct rte_soc_addr *addr,
> +   char *output, size_t size)
> +{
> +   int ret;
> +
> +   RTE_VERIFY(addr != NULL);
> +   RTE_VERIFY(size >= strlen(addr->name));

Is it better to use (size > strlen(addr->name)?

> +   ret = snprintf(output, size, "%s", addr->name);
> +   RTE_VERIFY(ret >= 0);
> +}
> +
> +static inline int
> +rte_eal_compare_soc_addr(const struct rte_soc_addr *a0,
> +const struct rte_soc_addr *a1)
> +{
> +   if (a0 == NULL || a1 == NULL)
> +   return -1;
> +
> +   RTE_VERIFY(a0->name != NULL);
> +   RTE_VERIFY(a1->name != NULL);
> +
> +   return strcmp(a0->name, a1->name);
> +}
> +
> +#endif
> --
> 2.7.4
>

[dpdk-dev] [PATCH v7 11/21] eal/soc: implement probing of drivers

2016-11-10 Thread Jianbo Liu

On 28 October 2016 at 20:26, Shreyansh Jain  wrote:
> Each SoC PMD registers a set of callback for scanning its own bus/infra and
> matching devices to drivers when probe is called.
> This patch introduces the infra for calls to SoC scan on rte_eal_soc_init()
> and match on rte_eal_soc_probe().
>
> Patch also adds test case for scan and probe.
>
> Signed-off-by: Jan Viktorin 
> Signed-off-by: Shreyansh Jain 
> Signed-off-by: Hemant Agrawal 
> --
> v4:
>  - Update test_soc for descriptive test function names
>  - Comments over test functions
>  - devinit and devuninint --> probe/remove
>  - RTE_VERIFY at some places
> ---
>  app/test/test_soc.c | 205 ++-
>  lib/librte_eal/bsdapp/eal/rte_eal_version.map   |   4 +
>  lib/librte_eal/common/eal_common_soc.c  | 213 
> +++-
>  lib/librte_eal/common/include/rte_soc.h |  75 -
>  lib/librte_eal/linuxapp/eal/eal.c   |   5 +
>  lib/librte_eal/linuxapp/eal/eal_soc.c   |  21 ++-
>  lib/librte_eal/linuxapp/eal/rte_eal_version.map |   4 +
>  7 files changed, 519 insertions(+), 8 deletions(-)
>
.

>  /**
> + * SoC device scan callback, called from rte_eal_soc_init.
> + * For various SoC, the bus on which devices are attached maynot be compliant
> + * to a standard platform (or platform bus itself). In which case, extra
> + * steps are implemented by PMD to scan over the bus and add devices to SoC
> + * device list.
> + */
> +typedef void (soc_scan_t)(void);

I'm still not sure about the purpose of soc_scan, and how to use it.
If it's for each driver, it should at least struct rte_soc_driver * as
its parameter.
If it's for each bus, why it is in rte_soc_driver?
I know you will implement bus driver in the future, but we need to
make it clear for current simplified implementation.

> +
> +/**
> + * Custom device<=>driver match callback for SoC
> + * Unlike PCI, SoC devices don't have a fixed definition of device
> + * identification. PMDs can implement a specific matching function in which
> + * driver and device objects are provided to perform custom match.
> + */
> +typedef int (soc_match_t)(struct rte_soc_driver *, struct rte_soc_device *);
> +
> +/**
>   * A structure describing a SoC driver.
>   */
>  struct rte_soc_driver {
> @@ -104,6 +120,8 @@ struct rte_soc_driver {
> struct rte_driver driver;  /**< Inherit core driver. */
> soc_probe_t *probe;/**< Device probe */
> soc_remove_t *remove;  /**< Device remove */
> +   soc_scan_t *scan_fn;   /**< Callback for scanning SoC 
> bus*/
> +   soc_match_t *match_fn; /**< Callback to match dev<->drv */
> const struct rte_soc_id *id_table; /**< ID table, NULL terminated */
>  };
>
> @@ -146,12 +164,63 @@ rte_eal_compare_soc_addr(const struct rte_soc_addr *a0,
>  }
>
>  /**
> + * Default function for matching the Soc driver with device. Each driver can
> + * either use this function or define their own soc matching function.
> + * This function relies on the compatible string extracted from sysfs. But,
> + * a SoC might have different way of identifying its devices. Such SoC can
> + * override match_fn.
> + *
> + * @return
> + *  0 on success
> + * -1 when no match found
> +  */
> +int
> +rte_eal_soc_match_compat(struct rte_soc_driver *drv,
> +struct rte_soc_device *dev);
> +
> +/**
> + * Probe SoC devices for registered drivers.
> + *
> + * @return
> + * 0 on success
> + * !0 in case of any failure in probe
> + */
> +int rte_eal_soc_probe(void);
> +
> +/**
> + * Probe the single SoC device.
> + */
> +int rte_eal_soc_probe_one(const struct rte_soc_addr *addr);
> +
> +/**
> + * Close the single SoC device.
> + *
> + * Scan the SoC devices and find the SoC device specified by the SoC
> + * address, then call the remove() function for registered driver
> + * that has a matching entry in its id_table for discovered device.
> + *
> + * @param addr
> + * The SoC address to close.
> + * @return
> + *   - 0 on success.
> + *   - Negative on error.
> + */
> +int rte_eal_soc_detach(const struct rte_soc_addr *addr);
> +
> +/**
>   * Dump discovered SoC devices.
> + *
> + * @param f
> + * File to dump device info in.
>   */
>  void rte_eal_soc_dump(FILE *f);
>
>  /**
>   * Register a SoC driver.
> + *
> + * @param driver
> + * Object for SoC driver to register
> + * @return void
>   */
>  void rte_eal_soc_register(struct rte_soc_driver *driver);
>
> @@ -167,6 +236,10 @@ RTE_PMD_EXPORT_NAME(nm, __COUNTER__)
>
>  /**
>   * Unregister a SoC driver.
> + *
> + * @param driver
> + * Object for SoC driver to unregister
> + * @return void
>   */
>  void rte_eal_soc_unregister(struct rte_soc_driver *driver);
>
> diff --git a/lib/librte_eal/linuxapp/eal/eal.c 
> b/lib/librte_eal/linuxapp/eal/eal.c
> index 098ba02..bd775f3 100644
> --- a/lib/librte_eal/linuxapp/eal/eal.c
> +++

[dpdk-dev] [PATCH v7 08/21] eal/soc: implement SoC device list and dump

2016-11-10 Thread Jianbo Liu

On 28 October 2016 at 20:26, Shreyansh Jain  wrote:
> From: Jan Viktorin 
>
> SoC devices would be linked in a separate list (from PCI). This is used for
> probe function.
> A helper for dumping the device list is added.
>
> Signed-off-by: Jan Viktorin 
> Signed-off-by: Shreyansh Jain 
> Signed-off-by: Hemant Agrawal 
> ---
>  lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  2 ++
>  lib/librte_eal/common/eal_common_soc.c  | 34 
> +
>  lib/librte_eal/common/include/rte_soc.h |  9 +++
>  lib/librte_eal/linuxapp/eal/rte_eal_version.map |  2 ++
>  4 files changed, 47 insertions(+)
>
> diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map 
> b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
> index cf6fb8e..86e3cfd 100644
> --- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
> +++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
> @@ -171,11 +171,13 @@ DPDK_16.11 {
> rte_eal_dev_attach;
> rte_eal_dev_detach;
> rte_eal_map_resource;
> +   rte_eal_soc_dump;
> rte_eal_soc_register;
> rte_eal_soc_unregister;
> rte_eal_unmap_resource;
> rte_eal_vdrv_register;
> rte_eal_vdrv_unregister;
> +   soc_device_list;
> soc_driver_list;
>
>  } DPDK_16.07;
> diff --git a/lib/librte_eal/common/eal_common_soc.c 
> b/lib/librte_eal/common/eal_common_soc.c
> index 56135ed..5dcddc5 100644
> --- a/lib/librte_eal/common/eal_common_soc.c
> +++ b/lib/librte_eal/common/eal_common_soc.c
> @@ -31,6 +31,8 @@
>   *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
>   */
>
> +#include 
> +#include 
>  #include 
>
>  #include 
> @@ -40,6 +42,38 @@
>  /* Global SoC driver list */
>  struct soc_driver_list soc_driver_list =
> TAILQ_HEAD_INITIALIZER(soc_driver_list);
> +struct soc_device_list soc_device_list =
> +   TAILQ_HEAD_INITIALIZER(soc_device_list);
> +
> +/* dump one device */
> +static int
> +soc_dump_one_device(FILE *f, struct rte_soc_device *dev)
> +{
> +   int i;
> +
> +   fprintf(f, "%s", dev->addr.name);
> +   fprintf(f, " - fdt_path: %s\n",
> +   dev->addr.fdt_path ? dev->addr.fdt_path : "(none)");
> +
> +   for (i = 0; dev->id && dev->id[i].compatible; ++i)
> +   fprintf(f, "   %s\n", dev->id[i].compatible);
> +
> +   return 0;
> +}
> +
> +/* dump devices on the bus to an output stream */
> +void
> +rte_eal_soc_dump(FILE *f)
> +{
> +   struct rte_soc_device *dev = NULL;
> +
> +   if (!f)
> +   return;
> +
> +   TAILQ_FOREACH(dev, _device_list, next) {
> +   soc_dump_one_device(f, dev);
> +   }
> +}
>
>  /* register a driver */
>  void
> diff --git a/lib/librte_eal/common/include/rte_soc.h 
> b/lib/librte_eal/common/include/rte_soc.h
> index 23b06a9..347e611 100644
> --- a/lib/librte_eal/common/include/rte_soc.h
> +++ b/lib/librte_eal/common/include/rte_soc.h
> @@ -56,8 +56,12 @@ extern "C" {
>
>  extern struct soc_driver_list soc_driver_list;
>  /**< Global list of SoC Drivers */
> +extern struct soc_device_list soc_device_list;
> +/**< Global list of SoC Devices */
>
>  TAILQ_HEAD(soc_driver_list, rte_soc_driver); /**< SoC drivers in D-linked Q. 
> */
> +TAILQ_HEAD(soc_device_list, rte_soc_device); /**< SoC devices in D-linked Q. 
> */
> +
>
>  struct rte_soc_id {
> const char *compatible; /**< OF compatible specification */
> @@ -142,6 +146,11 @@ rte_eal_compare_soc_addr(const struct rte_soc_addr *a0,
>  }
>
>  /**
> + * Dump discovered SoC devices.
> + */
> +void rte_eal_soc_dump(FILE *f);

If it is to dump device information (not driver), is it proper to
rename it rte_eal_soc_device_dump()?

> +
> +/**
>   * Register a SoC driver.
>   */
>  void rte_eal_soc_register(struct rte_soc_driver *driver);
> diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map 
> b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
> index ab6b985..0155025 100644
> --- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
> +++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
> @@ -175,11 +175,13 @@ DPDK_16.11 {
> rte_eal_dev_attach;
> rte_eal_dev_detach;
> rte_eal_map_resource;
> +   rte_eal_soc_dump;
> rte_eal_soc_register;
> rte_eal_soc_unregister;
> rte_eal_unmap_resource;
> rte_eal_vdrv_register;
> rte_eal_vdrv_unregister;
> +   soc_device_list;
> soc_driver_list;
>
>  } DPDK_16.07;
> --
> 2.7.4
>

[dpdk-dev] [PATCH v7 03/21] eal/linux: generalize PCI kernel unbinding driver to EAL

2016-11-10 Thread Jianbo Liu

On 28 October 2016 at 20:26, Shreyansh Jain  wrote:
> From: Jan Viktorin 
>
> Generalize the PCI-specific pci_unbind_kernel_driver. It is now divided
> into two parts. First, determination of the path and string identification
> of the device to be unbound. Second, the actual unbind operation which is
> generic.
>
> BSD implementation updated as ENOTSUP
>
> Signed-off-by: Jan Viktorin 
> Signed-off-by: Shreyansh Jain 
> --
> Changes since v2:
>  - update BSD support for unbind kernel driver
> ---
>  lib/librte_eal/bsdapp/eal/eal.c   |  7 +++
>  lib/librte_eal/bsdapp/eal/eal_pci.c   |  4 ++--
>  lib/librte_eal/common/eal_private.h   | 13 +
>  lib/librte_eal/linuxapp/eal/eal.c | 26 ++
>  lib/librte_eal/linuxapp/eal/eal_pci.c | 33 +
>  5 files changed, 57 insertions(+), 26 deletions(-)
>
> diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
> index 35e3117..5271fc2 100644
> --- a/lib/librte_eal/bsdapp/eal/eal.c
> +++ b/lib/librte_eal/bsdapp/eal/eal.c
> @@ -633,3 +633,10 @@ rte_eal_process_type(void)
>  {
> return rte_config.process_type;
>  }
> +
> +int
> +rte_eal_unbind_kernel_driver(const char *devpath __rte_unused,
> +const char *devid __rte_unused)
> +{
> +   return -ENOTSUP;
> +}
> diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c 
> b/lib/librte_eal/bsdapp/eal/eal_pci.c
> index 7ed0115..703f034 100644
> --- a/lib/librte_eal/bsdapp/eal/eal_pci.c
> +++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
> @@ -89,11 +89,11 @@
>
>  /* unbind kernel driver for this device */
>  int
> -pci_unbind_kernel_driver(struct rte_pci_device *dev __rte_unused)
> +pci_unbind_kernel_driver(struct rte_pci_device *dev)
>  {
> RTE_LOG(ERR, EAL, "RTE_PCI_DRV_FORCE_UNBIND flag is not implemented "
> "for BSD\n");
> -   return -ENOTSUP;
> +   return rte_eal_unbind_kernel_driver(dev);

Missing the second parameter for devid.

>  }
>
>  /* Map pci device */
> diff --git a/lib/librte_eal/common/eal_private.h 
> b/lib/librte_eal/common/eal_private.h
> index 9e7d8f6..b0c208a 100644
> --- a/lib/librte_eal/common/eal_private.h
> +++ b/lib/librte_eal/common/eal_private.h
> @@ -256,6 +256,19 @@ int rte_eal_alarm_init(void);
>  int rte_eal_check_module(const char *module_name);
>
>  /**
> + * Unbind kernel driver bound to the device specified by the given devpath,
> + * and its string identification.
> + *
> + * @param devpath  path to the device directory ("/sys/.../devices/")
> + * @param devididentification of the device ()
> + *
> + * @return
> + *  -1  unbind has failed
> + *   0  module has been unbound
> + */
> +int rte_eal_unbind_kernel_driver(const char *devpath, const char *devid);
> +
> +/**
>   * Get cpu core_id.
>   *
>   * This function is private to the EAL.
> diff --git a/lib/librte_eal/linuxapp/eal/eal.c 
> b/lib/librte_eal/linuxapp/eal/eal.c
> index 2075282..5f6676d 100644
> --- a/lib/librte_eal/linuxapp/eal/eal.c
> +++ b/lib/librte_eal/linuxapp/eal/eal.c
> @@ -943,3 +943,29 @@ rte_eal_check_module(const char *module_name)
> /* Module has been found */
> return 1;
>  }
> +
> +int
> +rte_eal_unbind_kernel_driver(const char *devpath, const char *devid)
> +{
> +   char filename[PATH_MAX];
> +   FILE *f;
> +
> +   snprintf(filename, sizeof(filename),
> +"%s/driver/unbind", devpath);
> +
> +   f = fopen(filename, "w");
> +   if (f == NULL) /* device was not bound */
> +   return 0;
> +
> +   if (fwrite(devid, strlen(devid), 1, f) == 0) {
> +   RTE_LOG(ERR, EAL, "%s(): could not write to %s\n", __func__,
> +   filename);
> +   goto error;
> +   }
> +
> +   fclose(f);
> +   return 0;
> +error:
> +   fclose(f);
> +   return -1;
> +}
> diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c 
> b/lib/librte_eal/linuxapp/eal/eal_pci.c
> index 876ba38..a03553f 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_pci.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
> @@ -59,38 +59,23 @@ int
>  pci_unbind_kernel_driver(struct rte_pci_device *dev)
>  {
> int n;
> -   FILE *f;
> -   char filename[PATH_MAX];
> -   char buf[BUFSIZ];
> +   char devpath[PATH_MAX];
> +   char devid[BUFSIZ];
> struct rte_pci_addr *loc = >addr;
>
> -   /* open /sys/bus/pci/devices/:BB:CC.D/driver */
> -   snprintf(filename, sizeof(filename),
> -   "%s/" PCI_PRI_FMT "/driver/unbind", pci_get_sysfs_path(),
> +   /* devpath /sys/bus/pci/devices/:BB:CC.D */
> +   snprintf(devpath, sizeof(devpath),
> +   "%s/" PCI_PRI_FMT, pci_get_sysfs_path(),
> loc->domain, loc->bus, loc->devid, loc->function);
>
> -   f = fopen(filename, "w");
> -   if (f == NULL) /* device was not bound */
> -   return 0;
> -
> -   n = snprintf(buf, sizeof(buf),

[dpdk-dev] [PATCH v3] doc: arm64: document DPDK application profiling methods

2016-11-08 Thread Jianbo Liu

On 8 November 2016 at 11:32, Jerin Jacob  
wrote:
> Signed-off-by: Jerin Jacob 
> Signed-off-by: John McNamara 
> ---
> v3:
> Fixed formatting issues:
> - Remove the introduction heading and put intro text under the main 
> heading(Thomas)
> - Fixed RST formatting issues such as enclosing technical terms in 
> backquotes(John)
> Thanks, John for providing the updated version
> v2:
> -Addressed ARM64 specific review comments(Suggested by Thomas)
> http://dpdk.org/dev/patchwork/patch/16362/
> ---
>  doc/guides/prog_guide/profile_app.rst | 64 
> ++-
>  1 file changed, 63 insertions(+), 1 deletion(-)
>

Acked-by: Jianbo Liu

[dpdk-dev] [PATCH 1/2] arch/arm: fix file descriptors leakage when getting CPU features

2016-11-04 Thread Jianbo Liu

Hi Jan,

On 4 November 2016 at 15:24,   wrote:
> Hello Jianbo Liu,
>
> thank you, a good catch!
>
> Can you please git blame for the commit introducing the issue and add
> the "Fixes:" tag as described in [1]?
>
> Same for ppc.
>

I will send v2 soon.

Thanks!


> Regards
> Jan
>
> [1] http://dpdk.org/doc/guides/contributing/patches.html#commit-messages-body
>
> On Fri,  4 Nov 2016 11:59:08 +0530
> Jianbo Liu  wrote:
>
>> Signed-off-by: Jianbo Liu 
>
> Acked-by: Jan Viktorin

[dpdk-dev] [PATCH v2 2/2] arch/ppc: fix file descriptor leakage when getting CPU features

2016-11-04 Thread Jianbo Liu

close the file descriptor after finish using it.

Fixes: 9ae15538 (eal/ppc: cpu flag checks for IBM Power)

Signed-off-by: Jianbo Liu 
---
 lib/librte_eal/common/arch/ppc_64/rte_cpuflags.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/librte_eal/common/arch/ppc_64/rte_cpuflags.c 
b/lib/librte_eal/common/arch/ppc_64/rte_cpuflags.c
index a8147c8..fcf96e0 100644
--- a/lib/librte_eal/common/arch/ppc_64/rte_cpuflags.c
+++ b/lib/librte_eal/common/arch/ppc_64/rte_cpuflags.c
@@ -116,6 +116,7 @@ rte_cpu_get_features(hwcap_registers_t out)
else if (auxv.a_type == AT_HWCAP2)
out[REG_HWCAP2] = auxv.a_un.a_val;
}
+   close(auxv_fd);
 }

 /*
-- 
2.4.11

[dpdk-dev] [PATCH v2 1/2] arch/arm: fix file descriptor leakage when getting CPU features

2016-11-04 Thread Jianbo Liu

close the file descriptor after finish using it.

Fixes: b94e5c94 (eal/arm: add CPU flags for ARMv7)
Fixes: 97523f82 (eal/arm: add CPU flags for ARMv8)

Signed-off-by: Jianbo Liu 
---
 lib/librte_eal/common/arch/arm/rte_cpuflags.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/librte_eal/common/arch/arm/rte_cpuflags.c 
b/lib/librte_eal/common/arch/arm/rte_cpuflags.c
index 23240ef..79160a6 100644
--- a/lib/librte_eal/common/arch/arm/rte_cpuflags.c
+++ b/lib/librte_eal/common/arch/arm/rte_cpuflags.c
@@ -148,6 +148,7 @@ rte_cpu_get_features(hwcap_registers_t out)
out[REG_PLATFORM] = 0x0001;
}
}
+   close(auxv_fd);
 }

 /*
-- 
2.4.11

[dpdk-dev] [PATCH 2/2] arch/ppc: fix file descriptors leakage when getting CPU features

2016-11-04 Thread Jianbo Liu

Signed-off-by: Jianbo Liu 
---
 lib/librte_eal/common/arch/ppc_64/rte_cpuflags.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/librte_eal/common/arch/ppc_64/rte_cpuflags.c 
b/lib/librte_eal/common/arch/ppc_64/rte_cpuflags.c
index a8147c8..fcf96e0 100644
--- a/lib/librte_eal/common/arch/ppc_64/rte_cpuflags.c
+++ b/lib/librte_eal/common/arch/ppc_64/rte_cpuflags.c
@@ -116,6 +116,7 @@ rte_cpu_get_features(hwcap_registers_t out)
else if (auxv.a_type == AT_HWCAP2)
out[REG_HWCAP2] = auxv.a_un.a_val;
}
+   close(auxv_fd);
 }

 /*
-- 
2.4.11

[dpdk-dev] [PATCH 1/2] arch/arm: fix file descriptors leakage when getting CPU features

2016-11-04 Thread Jianbo Liu

Signed-off-by: Jianbo Liu 
---
 lib/librte_eal/common/arch/arm/rte_cpuflags.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/librte_eal/common/arch/arm/rte_cpuflags.c 
b/lib/librte_eal/common/arch/arm/rte_cpuflags.c
index 23240ef..79160a6 100644
--- a/lib/librte_eal/common/arch/arm/rte_cpuflags.c
+++ b/lib/librte_eal/common/arch/arm/rte_cpuflags.c
@@ -148,6 +148,7 @@ rte_cpu_get_features(hwcap_registers_t out)
out[REG_PLATFORM] = 0x0001;
}
}
+   close(auxv_fd);
 }

 /*
-- 
2.4.11

[dpdk-dev] [PATCH v7 0/7] vhost: optimize mergeable Rx path

2016-10-18 Thread Jianbo Liu

On 14 October 2016 at 17:34, Yuanhan Liu  wrote:
> This is a new set of patches to optimize the mergeable Rx code path.
> No refactoring (rewrite) was made this time. It just applies some
> findings from Zhihong (kudos to him!) that could improve the mergeable
> Rx path on the old code.
..

> ---
> Yuanhan Liu (4):
>   vhost: simplify mergeable Rx vring reservation
>   vhost: use last avail idx for avail ring reservation
>   vhost: prefetch avail ring
>   vhost: retrieve avail head once
>
> Zhihong Wang (3):
>   vhost: remove useless volatile
>   vhost: optimize cache access
>   vhost: shadow used ring update
>
>  lib/librte_vhost/vhost.c  |  13 ++-
>  lib/librte_vhost/vhost.h  |   5 +-
>  lib/librte_vhost/vhost_user.c |  23 +++--
>  lib/librte_vhost/virtio_net.c | 193 
> +-
>  4 files changed, 149 insertions(+), 85 deletions(-)
>

Reviewed-by: Jianbo Liu

[dpdk-dev] [PATCH v2 5/5] maintainers: claim i40e vector PMD on ARM

2016-10-14 Thread Jianbo Liu

Signed-off-by: Jianbo Liu 
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 8f5fa82..621bda6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -151,6 +151,7 @@ F: lib/librte_acl/acl_run_neon.*
 F: lib/librte_lpm/rte_lpm_neon.h
 F: lib/librte_hash/rte*_arm64.h
 F: drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
+F: drivers/net/i40e/i40e_rxtx_vec_neon.c
 F: drivers/net/virtio/virtio_rxtx_simple_neon.c

 EZchip TILE-Gx
-- 
2.4.11

[dpdk-dev] [PATCH v2 4/5] i40e: make vector driver filenames consistent

2016-10-14 Thread Jianbo Liu

To be consistent with the naming for ARM NEON implementation,
i40e_rxtx_vec.c is renamed to i40e_rxtx_vec_sse.c.

Signed-off-by: Jianbo Liu 
---
 drivers/net/i40e/Makefile | 4 ++--
 drivers/net/i40e/{i40e_rxtx_vec.c => i40e_rxtx_vec_sse.c} | 0
 2 files changed, 2 insertions(+), 2 deletions(-)
 rename drivers/net/i40e/{i40e_rxtx_vec.c => i40e_rxtx_vec_sse.c} (100%)

diff --git a/drivers/net/i40e/Makefile b/drivers/net/i40e/Makefile
index 9e92b38..13085fb 100644
--- a/drivers/net/i40e/Makefile
+++ b/drivers/net/i40e/Makefile
@@ -100,7 +100,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_rxtx.c
 ifeq ($(CONFIG_RTE_ARCH_ARM64),y)
 SRCS-$(CONFIG_RTE_LIBRTE_I40E_INC_VECTOR) += i40e_rxtx_vec_neon.c
 else
-SRCS-$(CONFIG_RTE_LIBRTE_I40E_INC_VECTOR) += i40e_rxtx_vec.c
+SRCS-$(CONFIG_RTE_LIBRTE_I40E_INC_VECTOR) += i40e_rxtx_vec_sse.c
 endif
 SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_ethdev_vf.c
 SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_pf.c
@@ -108,7 +108,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_fdir.c

 # vector PMD driver needs SSE4.1 support
 ifeq ($(findstring RTE_MACHINE_CPUFLAG_SSE4_1,$(CFLAGS)),)
-CFLAGS_i40e_rxtx_vec.o += -msse4.1
+CFLAGS_i40e_rxtx_vec_sse.o += -msse4.1
 endif


diff --git a/drivers/net/i40e/i40e_rxtx_vec.c 
b/drivers/net/i40e/i40e_rxtx_vec_sse.c
similarity index 100%
rename from drivers/net/i40e/i40e_rxtx_vec.c
rename to drivers/net/i40e/i40e_rxtx_vec_sse.c
-- 
2.4.11

[dpdk-dev] [PATCH v2 3/5] i40e: enable i40e vector PMD on ARMv8a platform

2016-10-14 Thread Jianbo Liu

Signed-off-by: Jianbo Liu 
---
 config/defconfig_arm64-armv8a-linuxapp-gcc | 1 -
 doc/guides/nics/features/i40e_vec.ini  | 1 +
 doc/guides/nics/features/i40e_vf_vec.ini   | 1 +
 3 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/config/defconfig_arm64-armv8a-linuxapp-gcc 
b/config/defconfig_arm64-armv8a-linuxapp-gcc
index a0f4473..6321884 100644
--- a/config/defconfig_arm64-armv8a-linuxapp-gcc
+++ b/config/defconfig_arm64-armv8a-linuxapp-gcc
@@ -45,6 +45,5 @@ CONFIG_RTE_TOOLCHAIN_GCC=y
 CONFIG_RTE_EAL_IGB_UIO=n

 CONFIG_RTE_LIBRTE_FM10K_PMD=n
-CONFIG_RTE_LIBRTE_I40E_INC_VECTOR=n

 CONFIG_RTE_SCHED_VECTOR=n
diff --git a/doc/guides/nics/features/i40e_vec.ini 
b/doc/guides/nics/features/i40e_vec.ini
index 0953d84..edd6b71 100644
--- a/doc/guides/nics/features/i40e_vec.ini
+++ b/doc/guides/nics/features/i40e_vec.ini
@@ -37,3 +37,4 @@ Linux UIO= Y
 Linux VFIO   = Y
 x86-32   = Y
 x86-64   = Y
+ARMv8= Y
diff --git a/doc/guides/nics/features/i40e_vf_vec.ini 
b/doc/guides/nics/features/i40e_vf_vec.ini
index 2a44bf6..d6674f7 100644
--- a/doc/guides/nics/features/i40e_vf_vec.ini
+++ b/doc/guides/nics/features/i40e_vf_vec.ini
@@ -26,3 +26,4 @@ Linux UIO= Y
 Linux VFIO   = Y
 x86-32   = Y
 x86-64   = Y
+ARMv8= Y
-- 
2.4.11

[dpdk-dev] [PATCH v2 2/5] i40e: implement vector PMD for ARM architecture

2016-10-14 Thread Jianbo Liu

Use ARM NEON intrinsic to implement i40e vPMD

Signed-off-by: Jianbo Liu 
---
 drivers/net/i40e/Makefile |   4 +
 drivers/net/i40e/i40e_rxtx_vec_neon.c | 614 ++
 2 files changed, 618 insertions(+)
 create mode 100644 drivers/net/i40e/i40e_rxtx_vec_neon.c

diff --git a/drivers/net/i40e/Makefile b/drivers/net/i40e/Makefile
index 53fe145..9e92b38 100644
--- a/drivers/net/i40e/Makefile
+++ b/drivers/net/i40e/Makefile
@@ -97,7 +97,11 @@ SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_dcb.c

 SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_ethdev.c
 SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_rxtx.c
+ifeq ($(CONFIG_RTE_ARCH_ARM64),y)
+SRCS-$(CONFIG_RTE_LIBRTE_I40E_INC_VECTOR) += i40e_rxtx_vec_neon.c
+else
 SRCS-$(CONFIG_RTE_LIBRTE_I40E_INC_VECTOR) += i40e_rxtx_vec.c
+endif
 SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_ethdev_vf.c
 SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_pf.c
 SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_fdir.c
diff --git a/drivers/net/i40e/i40e_rxtx_vec_neon.c 
b/drivers/net/i40e/i40e_rxtx_vec_neon.c
new file mode 100644
index 000..011c54e
--- /dev/null
+++ b/drivers/net/i40e/i40e_rxtx_vec_neon.c
@@ -0,0 +1,614 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2016, Linaro Limited
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+#include 
+
+#include "base/i40e_prototype.h"
+#include "base/i40e_type.h"
+#include "i40e_ethdev.h"
+#include "i40e_rxtx.h"
+#include "i40e_rxtx_vec_common.h"
+
+#include 
+
+#pragma GCC diagnostic ignored "-Wcast-qual"
+
+static inline void
+i40e_rxq_rearm(struct i40e_rx_queue *rxq)
+{
+   int i;
+   uint16_t rx_id;
+   volatile union i40e_rx_desc *rxdp;
+   struct i40e_rx_entry *rxep = >sw_ring[rxq->rxrearm_start];
+   struct rte_mbuf *mb0, *mb1;
+   uint64x2_t dma_addr0, dma_addr1;
+   uint64x2_t zero = vdupq_n_u64(0);
+   uint64_t paddr;
+   uint8x8_t p;
+
+   rxdp = rxq->rx_ring + rxq->rxrearm_start;
+
+   /* Pull 'n' more MBUFs into the software ring */
+   if (unlikely(rte_mempool_get_bulk(rxq->mp,
+ (void *)rxep,
+ RTE_I40E_RXQ_REARM_THRESH) < 0)) {
+   if (rxq->rxrearm_nb + RTE_I40E_RXQ_REARM_THRESH >=
+   rxq->nb_rx_desc) {
+   for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) {
+   rxep[i].mbuf = >fake_mbuf;
+   vst1q_u64((uint64_t *)[i].read, zero);
+   }
+   }
+   rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed +=
+   RTE_I40E_RXQ_REARM_THRESH;
+   return;
+   }
+
+   p = vld1_u8((uint8_t *)>mbuf_initializer);
+
+   /* Initialize the mbufs in vector, process 2 mbufs in one loop */
+   for (i = 0; i < RTE_I40E_RXQ_REARM_THRESH; i += 2, rxep += 2) {
+   mb0 = rxep[0].mbuf;
+   mb1 = rxep[1].mbuf;
+
+/* Flush mbuf with pkt template.
+* Data to be rearmed is 6 bytes long.
+* Though, RX will overwrite ol_flags that are coming next
+* anyway

[dpdk-dev] [PATCH v2 1/5] i40e: extract non-x86 specific code from vector driver

2016-10-14 Thread Jianbo Liu

move scalar code which does not use x86 intrinsic functions to new file
"i40e_rxtx_vec_common.h", while keeping x86 code in i40e_rxtx_vec.c.
This allows the scalar code to to be shared among vector drivers for
different platforms.

Signed-off-by: Jianbo Liu 
---
 drivers/net/i40e/i40e_rxtx_vec.c| 196 +
 drivers/net/i40e/i40e_rxtx_vec_common.h | 251 
 2 files changed, 255 insertions(+), 192 deletions(-)
 create mode 100644 drivers/net/i40e/i40e_rxtx_vec_common.h

diff --git a/drivers/net/i40e/i40e_rxtx_vec.c b/drivers/net/i40e/i40e_rxtx_vec.c
index 0ee0241..3607312 100644
--- a/drivers/net/i40e/i40e_rxtx_vec.c
+++ b/drivers/net/i40e/i40e_rxtx_vec.c
@@ -39,6 +39,7 @@
 #include "base/i40e_type.h"
 #include "i40e_ethdev.h"
 #include "i40e_rxtx.h"
+#include "i40e_rxtx_vec_common.h"

 #include 

@@ -445,68 +446,6 @@ i40e_recv_pkts_vec(void *rx_queue, struct rte_mbuf 
**rx_pkts,
return _recv_raw_pkts_vec(rx_queue, rx_pkts, nb_pkts, NULL);
 }

-static inline uint16_t
-reassemble_packets(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_bufs,
-  uint16_t nb_bufs, uint8_t *split_flags)
-{
-   struct rte_mbuf *pkts[RTE_I40E_VPMD_RX_BURST]; /*finished pkts*/
-   struct rte_mbuf *start = rxq->pkt_first_seg;
-   struct rte_mbuf *end =  rxq->pkt_last_seg;
-   unsigned pkt_idx, buf_idx;
-
-   for (buf_idx = 0, pkt_idx = 0; buf_idx < nb_bufs; buf_idx++) {
-   if (end != NULL) {
-   /* processing a split packet */
-   end->next = rx_bufs[buf_idx];
-   rx_bufs[buf_idx]->data_len += rxq->crc_len;
-
-   start->nb_segs++;
-   start->pkt_len += rx_bufs[buf_idx]->data_len;
-   end = end->next;
-
-   if (!split_flags[buf_idx]) {
-   /* it's the last packet of the set */
-   start->hash = end->hash;
-   start->ol_flags = end->ol_flags;
-   /* we need to strip crc for the whole packet */
-   start->pkt_len -= rxq->crc_len;
-   if (end->data_len > rxq->crc_len) {
-   end->data_len -= rxq->crc_len;
-   } else {
-   /* free up last mbuf */
-   struct rte_mbuf *secondlast = start;
-
-   while (secondlast->next != end)
-   secondlast = secondlast->next;
-   secondlast->data_len -= (rxq->crc_len -
-   end->data_len);
-   secondlast->next = NULL;
-   rte_pktmbuf_free_seg(end);
-   end = secondlast;
-   }
-   pkts[pkt_idx++] = start;
-   start = end = NULL;
-   }
-   } else {
-   /* not processing a split packet */
-   if (!split_flags[buf_idx]) {
-   /* not a split packet, save and skip */
-   pkts[pkt_idx++] = rx_bufs[buf_idx];
-   continue;
-   }
-   end = start = rx_bufs[buf_idx];
-   rx_bufs[buf_idx]->data_len += rxq->crc_len;
-   rx_bufs[buf_idx]->pkt_len += rxq->crc_len;
-   }
-   }
-
-   /* save the partial packet for next time */
-   rxq->pkt_first_seg = start;
-   rxq->pkt_last_seg = end;
-   memcpy(rx_bufs, pkts, pkt_idx * (sizeof(*pkts)));
-   return pkt_idx;
-}
-
  /* vPMD receive routine that reassembles scattered packets
  * Notice:
  * - nb_pkts < RTE_I40E_DESCS_PER_LOOP, just return no packet
@@ -572,73 +511,6 @@ vtx(volatile struct i40e_tx_desc *txdp,
vtx1(txdp, *pkt, flags);
 }

-static inline int __attribute__((always_inline))
-i40e_tx_free_bufs(struct i40e_tx_queue *txq)
-{
-   struct i40e_tx_entry *txep;
-   uint32_t n;
-   uint32_t i;
-   int nb_free = 0;
-   struct rte_mbuf *m, *free[RTE_I40E_TX_MAX_FREE_BUF_SZ];
-
-   /* check DD bits on threshold descriptor */
-   if ((txq->tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
-   rte_cpu_to_le_64(I40E_TXD_QW1_DTYPE_MASK)) !=
-   rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE))
-   return 0;
-
-   n = txq->tx_rs_thresh;
-
-/

[dpdk-dev] [PATCH v2 0/5] i40e: vector poll-mode driver on ARM64

2016-10-14 Thread Jianbo Liu

This patch set is to implement i40e vector PMD on ARM64.
For x86, vPMD is only reorganized, there should be no performance loss.

v1 -> v2
- rebase to dpdk-next-net/rel_16_11

Jianbo Liu (5):
  i40e: extract non-x86 specific code from vector driver
  i40e: implement vector PMD for ARM architecture
  i40e: enable i40e vector PMD on ARMv8a platform
  i40e: make vector driver filenames consistent
  maintainers: claim i40e vector PMD on ARM

 MAINTAINERS|   1 +
 config/defconfig_arm64-armv8a-linuxapp-gcc |   1 -
 doc/guides/nics/features/i40e_vec.ini  |   1 +
 doc/guides/nics/features/i40e_vf_vec.ini   |   1 +
 drivers/net/i40e/Makefile  |   8 +-
 drivers/net/i40e/i40e_rxtx_vec_common.h| 251 +
 drivers/net/i40e/i40e_rxtx_vec_neon.c  | 614 +
 .../i40e/{i40e_rxtx_vec.c => i40e_rxtx_vec_sse.c}  | 196 +--
 8 files changed, 878 insertions(+), 195 deletions(-)
 create mode 100644 drivers/net/i40e/i40e_rxtx_vec_common.h
 create mode 100644 drivers/net/i40e/i40e_rxtx_vec_neon.c
 rename drivers/net/i40e/{i40e_rxtx_vec.c => i40e_rxtx_vec_sse.c} (78%)

-- 
2.4.11

[dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue

2016-10-13 Thread Jianbo Liu

Hi Thomas,

On 12 October 2016 at 23:31, Thomas Monjalon  
wrote:
> Sorry guys, you lost me in the discussion.
>
> Is there some regression only on ARM?
> Does it need some work specifically on memcpy for ARM,

I don't know if there is common way to improve memcpy on different ARM
hardware.  Even there is, it could take times.
I have tried do that using neon (like sse) instructions, but without success.

> or vhost for ARM?
> Who can work on ARM optimization?
>

[dpdk-dev] [PATCH 1/5] i40e: extract non-x86 specific code from vector driver

2016-10-13 Thread Jianbo Liu

On 12 October 2016 at 10:55, Zhang, Qi Z  wrote:
> Hi Jianbo
>
>> -Original Message-
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jianbo Liu
>> Sent: Wednesday, August 24, 2016 5:54 PM
>> To: Zhang, Helin ; Wu, Jingjing
>> ; jerin.jacob at caviumnetworks.com; dev at 
>> dpdk.org
>> Cc: Jianbo Liu 
>> Subject: [dpdk-dev] [PATCH 1/5] i40e: extract non-x86 specific code from 
>> vector
>> driver
>>
>> move scalar code which does not use x86 intrinsic functions to new file
>> "i40e_rxtx_vec_common.h", while keeping x86 code in i40e_rxtx_vec.c.
>> This allows the scalar code to to be shared among vector drivers for 
>> different
>> platforms.
>>
>> Signed-off-by: Jianbo Liu 
>> ---
...
>
> Should we rename the function "_40e_rx_queue_release_mbufs_vec" to
> "i40e_rx_queue_release_mbufs_vec_default", so functions be wrapped can follow 
> a consistent rule?

I think these two ways are different.
For func/_func, _func implements what func needs to do, they are same.
We needs _func inline, to be called in different ARCHs.
But for func/func_default, func_default is the default behavior, but
you can use or not-use it in func.

[dpdk-dev] [PATCH 2/5] i40e: implement vector PMD for ARM architecture

2016-10-13 Thread Jianbo Liu

On 12 October 2016 at 10:46, Zhang, Qi Z  wrote:
> Hi Jianbo:
>
>> -Original Message-
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jianbo Liu
>> Sent: Wednesday, August 24, 2016 5:54 PM
>> To: Zhang, Helin ; Wu, Jingjing
>> ; jerin.jacob at caviumnetworks.com; dev at 
>> dpdk.org
>> Cc: Jianbo Liu 
>> Subject: [dpdk-dev] [PATCH 2/5] i40e: implement vector PMD for ARM
>> architecture
>>
>> Use ARM NEON intrinsic to implement i40e vPMD
>>
>> Signed-off-by: Jianbo Liu 
>> ---
>>  drivers/net/i40e/Makefile |   4 +
>>  drivers/net/i40e/i40e_rxtx_vec_neon.c | 581
>> ++
>>  2 files changed, 585 insertions(+)
>>  create mode 100644 drivers/net/i40e/i40e_rxtx_vec_neon.c
..
>
> ptype and bad checksum offload is enabled with below patches
> http://dpdk.org/dev/patchwork/patch/16394
> http://dpdk.org/dev/patchwork/patch/16395
> You may take a look to see if it's necessary to enable them for ARM also.
>

Yes, I'll update in the next version. Thanks!

[dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue

2016-10-10 Thread Jianbo Liu

On 10 October 2016 at 14:22, Wang, Zhihong  wrote:
>
>
>> -Original Message-----
>> From: Jianbo Liu [mailto:jianbo.liu at linaro.org]
>> Sent: Monday, October 10, 2016 1:32 PM
>> To: Yuanhan Liu 
>> Cc: Wang, Zhihong ; Maxime Coquelin
>> ; dev at dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue
>>
>> On 10 October 2016 at 10:44, Yuanhan Liu 
>> wrote:
>> > On Sun, Oct 09, 2016 at 12:09:07PM +, Wang, Zhihong wrote:
>> >> > > > Tested with testpmd, host: txonly, guest: rxonly
>> >> > > > size (bytes) improvement (%)
>> >> > > > 644.12
>> >> > > > 128   6
>> >> > > > 256   2.65
>> >> > > > 512   -1.12
>> >> > > > 1024 -7.02
>> >> > >
>> >> > > There is a difference between Zhihong's code and the old I spotted in
>> >> > > the first time: Zhihong removed the avail_idx prefetch. I understand
>> >> > > the prefetch becomes a bit tricky when mrg-rx code path is
>> considered;
>> >> > > thus, I didn't comment on that.
>> >> > >
>> >> > > That's one of the difference that, IMO, could drop a regression. I 
>> >> > > then
>> >> > > finally got a chance to add it back.
>> >> > >
>> >> > > A rough test shows it improves the performance of 1400B packet size
>> >> > greatly
>> >> > > in the "txonly in host and rxonly in guest" case: +33% is the number I
>> get
>> >> > > with my test server (Ivybridge).
>> >> >
>> >> > Thanks Yuanhan! I'll validate this on x86.
>> >>
>> >> Hi Yuanhan,
>> >>
>> >> Seems your code doesn't perform correctly. I write a new version
>> >> of avail idx prefetch but didn't see any perf benefit.
>> >>
>> >> To be honest I doubt the benefit of this idea. The previous mrg_off
>> >> code has this method but doesn't give any benefits.
>> >
>> > Good point. I thought of that before, too. But you know that I made it
>> > in rush, that I didn't think further and test more.
>> >
>> > I looked the code a bit closer this time, and spotted a bug: the prefetch
>> > actually didn't happen, due to following code piece:
>> >
>> > if (vq->next_avail_idx >= NR_AVAIL_IDX_PREFETCH) {
>> > prefetch_avail_idx(vq);
>> > ...
>> > }
>> >
>> > Since vq->next_avail_idx is set to 0 at the entrance of enqueue path,
>> > prefetch_avail_idx() will be called. The fix is easy though: just put
>> > prefetch_avail_idx before invoking enqueue_packet.
>> >
>> > In summary, Zhihong is right, I see no more gains with that fix :(
>> >
>> > However, as stated, that's kind of the only difference I found between
>> > yours and the old code, that maybe it's still worthwhile to have a
>> > test on ARM, Jianbo?
>> >
>> I haven't tested it, but I think it could be no improvement for ARM either.
>>
>> A smalll suggestion for enqueue_packet:
>>
>> .
>> +   /* start copy from mbuf to desc */
>> +   while (mbuf_avail || mbuf->next) {
>> .
>>
>> Considering pkt_len is in the first cache line (same as data_len),
>> while next pointer is in the second cache line,
>> is it better to check the total packet len, instead of the last mbuf's
>> next pointer to jump out of while loop and avoid possible cache miss?
>
> Jianbo,
>
> Thanks for the reply!
>
> This idea sounds good, but it won't help the general perf in my
> opinion, since the 2nd cache line is accessed anyway prior in
> virtio_enqueue_offload.
>
Yes, you are right. I'm thinking of prefetching beforehand.
And if it's a chained mbuf, virtio_enqueue_offload will not be called
in next loop.

> Also this would bring a NULL check when actually access mbuf->next.
>
> BTW, could you please publish the number of:
>
>  1. mrg_rxbuf=on, comparison between original and original + this patch
>
>  2. mrg_rxbuf=off, comparison between original and original + this patch
>
> So we can have a whole picture of how this patch impact on ARM platform.
>
I think you already have got many results in my previous emails.
Sorry I can't test right now and busy with other things.

[dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue

2016-10-10 Thread Jianbo Liu

On 10 October 2016 at 10:44, Yuanhan Liu  wrote:
> On Sun, Oct 09, 2016 at 12:09:07PM +, Wang, Zhihong wrote:
>> > > > Tested with testpmd, host: txonly, guest: rxonly
>> > > > size (bytes) improvement (%)
>> > > > 644.12
>> > > > 128   6
>> > > > 256   2.65
>> > > > 512   -1.12
>> > > > 1024 -7.02
>> > >
>> > > There is a difference between Zhihong's code and the old I spotted in
>> > > the first time: Zhihong removed the avail_idx prefetch. I understand
>> > > the prefetch becomes a bit tricky when mrg-rx code path is considered;
>> > > thus, I didn't comment on that.
>> > >
>> > > That's one of the difference that, IMO, could drop a regression. I then
>> > > finally got a chance to add it back.
>> > >
>> > > A rough test shows it improves the performance of 1400B packet size
>> > greatly
>> > > in the "txonly in host and rxonly in guest" case: +33% is the number I 
>> > > get
>> > > with my test server (Ivybridge).
>> >
>> > Thanks Yuanhan! I'll validate this on x86.
>>
>> Hi Yuanhan,
>>
>> Seems your code doesn't perform correctly. I write a new version
>> of avail idx prefetch but didn't see any perf benefit.
>>
>> To be honest I doubt the benefit of this idea. The previous mrg_off
>> code has this method but doesn't give any benefits.
>
> Good point. I thought of that before, too. But you know that I made it
> in rush, that I didn't think further and test more.
>
> I looked the code a bit closer this time, and spotted a bug: the prefetch
> actually didn't happen, due to following code piece:
>
> if (vq->next_avail_idx >= NR_AVAIL_IDX_PREFETCH) {
> prefetch_avail_idx(vq);
> ...
> }
>
> Since vq->next_avail_idx is set to 0 at the entrance of enqueue path,
> prefetch_avail_idx() will be called. The fix is easy though: just put
> prefetch_avail_idx before invoking enqueue_packet.
>
> In summary, Zhihong is right, I see no more gains with that fix :(
>
> However, as stated, that's kind of the only difference I found between
> yours and the old code, that maybe it's still worthwhile to have a
> test on ARM, Jianbo?
>
I haven't tested it, but I think it could be no improvement for ARM either.

A smalll suggestion for enqueue_packet:

.
+   /* start copy from mbuf to desc */
+   while (mbuf_avail || mbuf->next) {
.

Considering pkt_len is in the first cache line (same as data_len),
while next pointer is in the second cache line,
is it better to check the total packet len, instead of the last mbuf's
next pointer to jump out of while loop and avoid possible cache miss?

[dpdk-dev] [PATCH v3 00/15] Introduce SoC device/driver framework for EAL

2016-09-18 Thread Jianbo Liu

On 18 September 2016 at 15:22, Jan Viktorin  wrote:
> On Sun, 18 Sep 2016 13:58:50 +0800
> Jianbo Liu  wrote:
>
>> On 9 September 2016 at 16:43, Shreyansh Jain  
>> wrote:
>> > Introduction:
>> > =
>> >
>> > This patch set is direct derivative of Jan's original series [1],[2].
>> >
>> >  - As this deviates substantially from original series, if need be I can
>> >post it as a separate patch rather than v2. Please suggest.
>> >  - Also, there are comments on original v1 ([4]) which are _not_
>> >incorporated in this series as they refer to section no more in new
>> >version.
>> >  - This v3 version is based on the rte_driver/device patchset v9 [10].
>> >That series introduced device structures (rte_driver/rte_device)
>> >generalizing devices into PCI, VDEV, XXX. For the purpose of this
>> >patchset, XXX=>SOC.
>
> [...]
>
>> >
>> > 5) Design considerations that are different from PCI:
>> >  - Each driver implements its own scan and match function. PCI uses the BDF
>> >format to read the device from sysfs, but this _may_not_ be a case for a
>> >SoC ethernet device.
>> >= This is an important change from initial proposal by Jan in [2]. 
>> > Unlike
>> >his attempt to use /sys/bus/platform, this patch relies on the PMD to
>>
>> It could be many redundant code if Each PMD driver has the scan
>> function if its own.
>> I think Jan's implementation is common to many platform drivers.
>
> I personally can find a use case for having a custom scan function.
> However, we should at least provide a default implementation. Probably,
> both the scan and match functions should be used to _override_ a default
> behaviour. So, only drivers that require to scan devices in a specific
> way would provide a custom function for this.
>
And for each platform/product

> I agree, that this can sometimes lead to code duplication. Moreover, it
> opens door for a very non-standard, unsecure and wrong-by-design
> approaches. I'd like more to provide one or more scan implementations
> in EAL and do not put this responsibility on PMDs.
>
>>
>> >detect the devices. This is because SoC may require specific or
>> >additional info for device detection. Further, SoC may have embedded
>
> Can you provide an example for "additional info for device detection"?
>
>>
>> Can you give us more precise definition about SoC driver? Does it
>> include the driver in ARM server?
>
> I am sorry but I don't understand this question.
>
> What you mean by a "driver in ARM server"? Do you mean a kernel driver?
>
> There is no "SoC driver" in the text so what definition are asking for?
>
This patchset introduces rte_soc_driver, which is inheriting from rte_driver.
I want to know what devices can use this SoC driver/device framework.
Is it for the devices from ARM servers, or embedded systems of
different vendors?
And this framework is too generalized, if we don't try to understand
"soc" in rte_soc_driver, we can use it for PCI devices. :)

Thanks!
Jianbo

[dpdk-dev] [PATCH v3 00/15] Introduce SoC device/driver framework for EAL

2016-09-18 Thread Jianbo Liu

On 9 September 2016 at 16:43, Shreyansh Jain  wrote:
> Introduction:
> =
>
> This patch set is direct derivative of Jan's original series [1],[2].
>
>  - As this deviates substantially from original series, if need be I can
>post it as a separate patch rather than v2. Please suggest.
>  - Also, there are comments on original v1 ([4]) which are _not_
>incorporated in this series as they refer to section no more in new
>version.
>  - This v3 version is based on the rte_driver/device patchset v9 [10].
>That series introduced device structures (rte_driver/rte_device)
>generalizing devices into PCI, VDEV, XXX. For the purpose of this
>patchset, XXX=>SOC.
>
> Aim:
> 
>
> As of now EAL is primarly focused on PCI initialization/probing.
>
>  rte_eal_init()
>   |- rte_eal_pci_init(): Find PCI devices from sysfs
>   |- ...
>   |- rte_eal_memzone_init()
>   |- ...
>   `- rte_eal_pci_probe(): Driver<=>Device initialization
>
> This patchset introduces SoC framework which would enable SoC drivers and
> drivers to be plugged into EAL, very similar to how PCI drivers/devices are
> done today.
>
> This is a stripped down version of PCI framework which allows the SoC PMDs
> to implement their own routines for detecting devices and linking devices to
> drivers.
>
> 1) Changes to EAL
>  rte_eal_init()
>   |- rte_eal_pci_init(): Find PCI devices from sysfs
>   |- rte_eal_soc_init(): Calls PMDs->scan_fn
>   |- ...
>   |- rte_eal_memzone_init()
>   |- ...
>   |- rte_eal_pci_probe(): Driver<=>Device initialization, PMD->devinit()
>   `- rte_eal_soc_probe(): Calls PMDs->match_fn and PMDs->devinit();
>
> 2) New device/driver structures:
>   - rte_soc_driver (inheriting rte_driver)
>   - rte_soc_device (inheriting rte_device)
>   - rte_eth_dev and eth_driver embedded rte_soc_device and rte_soc_driver,
> respectively.
>
> 3) The SoC PMDs need to:
>  - define rte_soc_driver with necessary scan and match callbacks
>  - Register themselves using DRIVER_REGISTER_SOC()
>  - Implement respective bus scanning in the scan callbacks to add necessary
>devices to SoC device list
>  - Implement necessary eth_dev_init/uninint for ethernet instances
>
> 4) Design considerations that are same as PCI:
>  - SoC initialization is being done through rte_eal_init(), just after PCI
>initialization is done.
>  - As in case of PCI, probe is done after rte_eal_pci_probe() to link the
>devices detected with the drivers registered.
>  - Device attach/detach functions are available and have been designed on
>the lines of PCI framework.
>  - PMDs register using DRIVER_REGISTER_SOC, very similar to
>DRIVER_REGISTER_PCI for PCI devices.
>  - Linked list of SoC driver and devices exists independent of the other
>driver/device list, but inheriting rte_driver/rte_driver, these are also
>part of a global list.
>
> 5) Design considerations that are different from PCI:
>  - Each driver implements its own scan and match function. PCI uses the BDF
>format to read the device from sysfs, but this _may_not_ be a case for a
>SoC ethernet device.
>= This is an important change from initial proposal by Jan in [2]. Unlike
>his attempt to use /sys/bus/platform, this patch relies on the PMD to

It could be many redundant code if Each PMD driver has the scan
function if its own.
I think Jan's implementation is common to many platform drivers.

>detect the devices. This is because SoC may require specific or
>additional info for device detection. Further, SoC may have embedded

Can you give us more precise definition about SoC driver? Does it
include the driver in ARM server?

>devices/MACs which require initialization which cannot be covered through
>sysfs parsing.

I think it can be done in devinit, not in scan function. devinit can
be different for each driver.

>= PCI based PMDs rely on EAL's capability to detect devices. This
>proposal puts the onus on PMD to detect devices, add to soc_device_list
>and wait for Probe. Matching, of device<=>driver is again PMD's callback.
>

[dpdk-dev] [PATCH 5/5] maintainers: claim i40e vector PMD on ARM

2016-08-24 Thread Jianbo Liu

Signed-off-by: Jianbo Liu 
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 6536c6b..5d6ecba 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -150,6 +150,7 @@ F: lib/librte_acl/acl_run_neon.*
 F: lib/librte_lpm/rte_lpm_neon.h
 F: lib/librte_hash/rte*_arm64.h
 F: drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
+F: drivers/net/i40e/i40e_rxtx_vec_neon.c

 EZchip TILE-Gx
 M: Zhigang Lu 
-- 
2.4.11

[dpdk-dev] [PATCH 4/5] i40e: make vector driver filenames consistent

2016-08-24 Thread Jianbo Liu

To be consistent with the naming for ARM NEON implementation,
i40e_rxtx_vec.c is renamed to i40e_rxtx_vec_sse.c.

Signed-off-by: Jianbo Liu 
---
 drivers/net/i40e/Makefile | 4 ++--
 drivers/net/i40e/{i40e_rxtx_vec.c => i40e_rxtx_vec_sse.c} | 0
 2 files changed, 2 insertions(+), 2 deletions(-)
 rename drivers/net/i40e/{i40e_rxtx_vec.c => i40e_rxtx_vec_sse.c} (100%)

diff --git a/drivers/net/i40e/Makefile b/drivers/net/i40e/Makefile
index 9e92b38..13085fb 100644
--- a/drivers/net/i40e/Makefile
+++ b/drivers/net/i40e/Makefile
@@ -100,7 +100,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_rxtx.c
 ifeq ($(CONFIG_RTE_ARCH_ARM64),y)
 SRCS-$(CONFIG_RTE_LIBRTE_I40E_INC_VECTOR) += i40e_rxtx_vec_neon.c
 else
-SRCS-$(CONFIG_RTE_LIBRTE_I40E_INC_VECTOR) += i40e_rxtx_vec.c
+SRCS-$(CONFIG_RTE_LIBRTE_I40E_INC_VECTOR) += i40e_rxtx_vec_sse.c
 endif
 SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_ethdev_vf.c
 SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_pf.c
@@ -108,7 +108,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_fdir.c

 # vector PMD driver needs SSE4.1 support
 ifeq ($(findstring RTE_MACHINE_CPUFLAG_SSE4_1,$(CFLAGS)),)
-CFLAGS_i40e_rxtx_vec.o += -msse4.1
+CFLAGS_i40e_rxtx_vec_sse.o += -msse4.1
 endif


diff --git a/drivers/net/i40e/i40e_rxtx_vec.c 
b/drivers/net/i40e/i40e_rxtx_vec_sse.c
similarity index 100%
rename from drivers/net/i40e/i40e_rxtx_vec.c
rename to drivers/net/i40e/i40e_rxtx_vec_sse.c
-- 
2.4.11

[dpdk-dev] [PATCH 3/5] i40e: enable i40e vector PMD on ARMv8a platform

2016-08-24 Thread Jianbo Liu

Signed-off-by: Jianbo Liu 
---
 config/defconfig_arm64-armv8a-linuxapp-gcc | 1 -
 doc/guides/nics/features/i40e_vec.ini  | 1 +
 doc/guides/nics/features/i40e_vf_vec.ini   | 1 +
 3 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/config/defconfig_arm64-armv8a-linuxapp-gcc 
b/config/defconfig_arm64-armv8a-linuxapp-gcc
index 1a17126..d10e1fd 100644
--- a/config/defconfig_arm64-armv8a-linuxapp-gcc
+++ b/config/defconfig_arm64-armv8a-linuxapp-gcc
@@ -46,6 +46,5 @@ CONFIG_RTE_EAL_IGB_UIO=n

 CONFIG_RTE_LIBRTE_IVSHMEM=n
 CONFIG_RTE_LIBRTE_FM10K_PMD=n
-CONFIG_RTE_LIBRTE_I40E_PMD=n

 CONFIG_RTE_SCHED_VECTOR=n
diff --git a/doc/guides/nics/features/i40e_vec.ini 
b/doc/guides/nics/features/i40e_vec.ini
index 0953d84..edd6b71 100644
--- a/doc/guides/nics/features/i40e_vec.ini
+++ b/doc/guides/nics/features/i40e_vec.ini
@@ -37,3 +37,4 @@ Linux UIO= Y
 Linux VFIO   = Y
 x86-32   = Y
 x86-64   = Y
+ARMv8= Y
diff --git a/doc/guides/nics/features/i40e_vf_vec.ini 
b/doc/guides/nics/features/i40e_vf_vec.ini
index 2a44bf6..d6674f7 100644
--- a/doc/guides/nics/features/i40e_vf_vec.ini
+++ b/doc/guides/nics/features/i40e_vf_vec.ini
@@ -26,3 +26,4 @@ Linux UIO= Y
 Linux VFIO   = Y
 x86-32   = Y
 x86-64   = Y
+ARMv8= Y
-- 
2.4.11

[dpdk-dev] [PATCH 2/5] i40e: implement vector PMD for ARM architecture

2016-08-24 Thread Jianbo Liu

Use ARM NEON intrinsic to implement i40e vPMD

Signed-off-by: Jianbo Liu 
---
 drivers/net/i40e/Makefile |   4 +
 drivers/net/i40e/i40e_rxtx_vec_neon.c | 581 ++
 2 files changed, 585 insertions(+)
 create mode 100644 drivers/net/i40e/i40e_rxtx_vec_neon.c

diff --git a/drivers/net/i40e/Makefile b/drivers/net/i40e/Makefile
index 53fe145..9e92b38 100644
--- a/drivers/net/i40e/Makefile
+++ b/drivers/net/i40e/Makefile
@@ -97,7 +97,11 @@ SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_dcb.c

 SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_ethdev.c
 SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_rxtx.c
+ifeq ($(CONFIG_RTE_ARCH_ARM64),y)
+SRCS-$(CONFIG_RTE_LIBRTE_I40E_INC_VECTOR) += i40e_rxtx_vec_neon.c
+else
 SRCS-$(CONFIG_RTE_LIBRTE_I40E_INC_VECTOR) += i40e_rxtx_vec.c
+endif
 SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_ethdev_vf.c
 SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_pf.c
 SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_fdir.c
diff --git a/drivers/net/i40e/i40e_rxtx_vec_neon.c 
b/drivers/net/i40e/i40e_rxtx_vec_neon.c
new file mode 100644
index 000..015fa9f
--- /dev/null
+++ b/drivers/net/i40e/i40e_rxtx_vec_neon.c
@@ -0,0 +1,581 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2016, Linaro Limited
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+#include 
+
+#include "base/i40e_prototype.h"
+#include "base/i40e_type.h"
+#include "i40e_ethdev.h"
+#include "i40e_rxtx.h"
+#include "i40e_rxtx_vec_common.h"
+
+#include 
+
+#pragma GCC diagnostic ignored "-Wcast-qual"
+
+static inline void
+i40e_rxq_rearm(struct i40e_rx_queue *rxq)
+{
+   int i;
+   uint16_t rx_id;
+   volatile union i40e_rx_desc *rxdp;
+   struct i40e_rx_entry *rxep = >sw_ring[rxq->rxrearm_start];
+   struct rte_mbuf *mb0, *mb1;
+   uint64x2_t dma_addr0, dma_addr1;
+   uint64x2_t zero = vdupq_n_u64(0);
+   uint64_t paddr;
+   uint8x8_t p;
+
+   rxdp = rxq->rx_ring + rxq->rxrearm_start;
+
+   /* Pull 'n' more MBUFs into the software ring */
+   if (unlikely(rte_mempool_get_bulk(rxq->mp,
+ (void *)rxep,
+ RTE_I40E_RXQ_REARM_THRESH) < 0)) {
+   if (rxq->rxrearm_nb + RTE_I40E_RXQ_REARM_THRESH >=
+   rxq->nb_rx_desc) {
+   for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) {
+   rxep[i].mbuf = >fake_mbuf;
+   vst1q_u64((uint64_t *)[i].read, zero);
+   }
+   }
+   rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed +=
+   RTE_I40E_RXQ_REARM_THRESH;
+   return;
+   }
+
+   p = vld1_u8((uint8_t *)>mbuf_initializer);
+
+   /* Initialize the mbufs in vector, process 2 mbufs in one loop */
+   for (i = 0; i < RTE_I40E_RXQ_REARM_THRESH; i += 2, rxep += 2) {
+   mb0 = rxep[0].mbuf;
+   mb1 = rxep[1].mbuf;
+
+/* Flush mbuf with pkt template.
+* Data to be rearmed is 6 bytes long.
+* Though, RX will overwrite ol_flags that are coming next
+* anyway

[dpdk-dev] [PATCH 1/5] i40e: extract non-x86 specific code from vector driver

2016-08-24 Thread Jianbo Liu

move scalar code which does not use x86 intrinsic functions to new file
"i40e_rxtx_vec_common.h", while keeping x86 code in i40e_rxtx_vec.c.
This allows the scalar code to to be shared among vector drivers for
different platforms.

Signed-off-by: Jianbo Liu 
---
 drivers/net/i40e/i40e_rxtx_vec.c| 184 +---
 drivers/net/i40e/i40e_rxtx_vec_common.h | 239 
 2 files changed, 243 insertions(+), 180 deletions(-)
 create mode 100644 drivers/net/i40e/i40e_rxtx_vec_common.h

diff --git a/drivers/net/i40e/i40e_rxtx_vec.c b/drivers/net/i40e/i40e_rxtx_vec.c
index 51fb282..f847469 100644
--- a/drivers/net/i40e/i40e_rxtx_vec.c
+++ b/drivers/net/i40e/i40e_rxtx_vec.c
@@ -39,6 +39,7 @@
 #include "base/i40e_type.h"
 #include "i40e_ethdev.h"
 #include "i40e_rxtx.h"
+#include "i40e_rxtx_vec_common.h"

 #include 

@@ -421,68 +422,6 @@ i40e_recv_pkts_vec(void *rx_queue, struct rte_mbuf 
**rx_pkts,
return _recv_raw_pkts_vec(rx_queue, rx_pkts, nb_pkts, NULL);
 }

-static inline uint16_t
-reassemble_packets(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_bufs,
-  uint16_t nb_bufs, uint8_t *split_flags)
-{
-   struct rte_mbuf *pkts[RTE_I40E_VPMD_RX_BURST]; /*finished pkts*/
-   struct rte_mbuf *start = rxq->pkt_first_seg;
-   struct rte_mbuf *end =  rxq->pkt_last_seg;
-   unsigned pkt_idx, buf_idx;
-
-   for (buf_idx = 0, pkt_idx = 0; buf_idx < nb_bufs; buf_idx++) {
-   if (end != NULL) {
-   /* processing a split packet */
-   end->next = rx_bufs[buf_idx];
-   rx_bufs[buf_idx]->data_len += rxq->crc_len;
-
-   start->nb_segs++;
-   start->pkt_len += rx_bufs[buf_idx]->data_len;
-   end = end->next;
-
-   if (!split_flags[buf_idx]) {
-   /* it's the last packet of the set */
-   start->hash = end->hash;
-   start->ol_flags = end->ol_flags;
-   /* we need to strip crc for the whole packet */
-   start->pkt_len -= rxq->crc_len;
-   if (end->data_len > rxq->crc_len) {
-   end->data_len -= rxq->crc_len;
-   } else {
-   /* free up last mbuf */
-   struct rte_mbuf *secondlast = start;
-
-   while (secondlast->next != end)
-   secondlast = secondlast->next;
-   secondlast->data_len -= (rxq->crc_len -
-   end->data_len);
-   secondlast->next = NULL;
-   rte_pktmbuf_free_seg(end);
-   end = secondlast;
-   }
-   pkts[pkt_idx++] = start;
-   start = end = NULL;
-   }
-   } else {
-   /* not processing a split packet */
-   if (!split_flags[buf_idx]) {
-   /* not a split packet, save and skip */
-   pkts[pkt_idx++] = rx_bufs[buf_idx];
-   continue;
-   }
-   end = start = rx_bufs[buf_idx];
-   rx_bufs[buf_idx]->data_len += rxq->crc_len;
-   rx_bufs[buf_idx]->pkt_len += rxq->crc_len;
-   }
-   }
-
-   /* save the partial packet for next time */
-   rxq->pkt_first_seg = start;
-   rxq->pkt_last_seg = end;
-   memcpy(rx_bufs, pkts, pkt_idx * (sizeof(*pkts)));
-   return pkt_idx;
-}
-
  /* vPMD receive routine that reassembles scattered packets
  * Notice:
  * - nb_pkts < RTE_I40E_DESCS_PER_LOOP, just return no packet
@@ -548,73 +487,6 @@ vtx(volatile struct i40e_tx_desc *txdp,
vtx1(txdp, *pkt, flags);
 }

-static inline int __attribute__((always_inline))
-i40e_tx_free_bufs(struct i40e_tx_queue *txq)
-{
-   struct i40e_tx_entry *txep;
-   uint32_t n;
-   uint32_t i;
-   int nb_free = 0;
-   struct rte_mbuf *m, *free[RTE_I40E_TX_MAX_FREE_BUF_SZ];
-
-   /* check DD bits on threshold descriptor */
-   if ((txq->tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
-   rte_cpu_to_le_64(I40E_TXD_QW1_DTYPE_MASK)) !=
-   rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE))
-   return 0;
-
-   n = txq->tx_rs_thresh;
-
-/

[dpdk-dev] [PATCH 0/5] i40e: vector poll-mode driver on ARM64

2016-08-24 Thread Jianbo Liu

This patch set is to implement i40e vector PMD on ARM64.
For x86, vPMD is only reorganized, there should be no performance loss.

Jianbo Liu (5):
  i40e: extract non-x86 specific code from vector driver
  i40e: implement vector PMD for ARM architecture
  i40e: enable i40e vector PMD on ARMv8a platform
  i40e: make vector driver filenames consistent
  maintainers: claim i40e vector PMD on ARM

 MAINTAINERS|   1 +
 config/defconfig_arm64-armv8a-linuxapp-gcc |   1 -
 doc/guides/nics/features/i40e_vec.ini  |   1 +
 doc/guides/nics/features/i40e_vf_vec.ini   |   1 +
 drivers/net/i40e/Makefile  |   8 +-
 drivers/net/i40e/i40e_rxtx_vec_common.h| 239 +
 drivers/net/i40e/i40e_rxtx_vec_neon.c  | 581 +
 .../i40e/{i40e_rxtx_vec.c => i40e_rxtx_vec_sse.c}  | 184 +--
 8 files changed, 833 insertions(+), 183 deletions(-)
 create mode 100644 drivers/net/i40e/i40e_rxtx_vec_common.h
 create mode 100644 drivers/net/i40e/i40e_rxtx_vec_neon.c
 rename drivers/net/i40e/{i40e_rxtx_vec.c => i40e_rxtx_vec_sse.c} (78%)

-- 
2.4.11

[dpdk-dev] [PATCH v3] i40e: enable i40e pmd on ARM platform

2016-08-05 Thread Jianbo Liu

And add read memory barrier to avoid status inconsistency
between two RX descriptors readings.

Signed-off-by: Jianbo Liu 
---
 config/defconfig_arm64-armv8a-linuxapp-gcc | 2 +-
 doc/guides/nics/features/i40e.ini  | 1 +
 drivers/net/i40e/i40e_rxtx.c   | 2 ++
 3 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/config/defconfig_arm64-armv8a-linuxapp-gcc 
b/config/defconfig_arm64-armv8a-linuxapp-gcc
index 1a17126..08f282b 100644
--- a/config/defconfig_arm64-armv8a-linuxapp-gcc
+++ b/config/defconfig_arm64-armv8a-linuxapp-gcc
@@ -46,6 +46,6 @@ CONFIG_RTE_EAL_IGB_UIO=n

 CONFIG_RTE_LIBRTE_IVSHMEM=n
 CONFIG_RTE_LIBRTE_FM10K_PMD=n
-CONFIG_RTE_LIBRTE_I40E_PMD=n
+CONFIG_RTE_LIBRTE_I40E_INC_VECTOR=n

 CONFIG_RTE_SCHED_VECTOR=n
diff --git a/doc/guides/nics/features/i40e.ini 
b/doc/guides/nics/features/i40e.ini
index fb3fb60..0d143bc 100644
--- a/doc/guides/nics/features/i40e.ini
+++ b/doc/guides/nics/features/i40e.ini
@@ -45,3 +45,4 @@ Linux UIO= Y
 Linux VFIO   = Y
 x86-32   = Y
 x86-64   = Y
+ARMv8= Y
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 554d167..57825fb 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -994,6 +994,8 @@ i40e_rx_scan_hw_ring(struct i40e_rx_queue *rxq)
I40E_RXD_QW1_STATUS_SHIFT;
}

+   rte_smp_rmb();
+
/* Compute how many status bits were set */
for (j = 0, nb_dd = 0; j < I40E_LOOK_AHEAD; j++)
nb_dd += s[j] & (1 << I40E_RX_DESC_STATUS_DD_SHIFT);
-- 
2.4.11

[dpdk-dev] [PATCH] i40e: enable i40e pmd on ARM platform

2016-08-03 Thread Jianbo Liu

On 3 August 2016 at 16:29, Ananyev, Konstantin
 wrote:
>
> Hi Jianbo,
>
>> > Hi, Jianbo
>> >
>> > I have tested you patch on my X86 platform,  the single core performance 
>> > for Non-vector PMD will have about 1Mpps drop
>> > Non-vector PMD single core performance with patch   :  ~33.9 
>> > Mpps
>> > Non-vector PMD single core performance without patch:  ~35.1 Mpps
>> > Is there any way to avoid such performance drop on X86? Thanks.
>> >
>>
>> I think we can place a compiling condition before rte_rmb() to avoid 
>> performance decrease on x86.
>> For example:  #if defined(RTE_ARCH_ARM) || defined(RTE_ARCH_ARM64)
>
> I suppose you can use rte_smp_rmb() here?

Great. Thank Konstantin.
I'll send v2.

Jianbo

[dpdk-dev] [PATCH] i40e: enable i40e pmd on ARM platform

2016-08-03 Thread Jianbo Liu

Hi Thomas,

On 3 August 2016 at 15:58, Thomas Monjalon  wrote:
> 2016-08-03 14:02, Jianbo Liu:
>> I think we can place a compiling condition before rte_rmb() to avoid
>> performance decrease on x86.
>> For example:  #if defined(RTE_ARCH_ARM) || defined(RTE_ARCH_ARM64)
>
> Please could you explain why a memory barrier would be needed on ARM but

The reason is that ARM is weealy ordered processor, and data access
will be executed out of order to improve performance.
In this case, we have to read 2 times, 8 descriptors each. The read
statuses could be wrong if no memory barrier.
I also got the outdated status for some descriptors in my testing.

> not on x86? What about other architectures?

I think Konstantin gave me a good solution, by using rte_smp_rmb :)

Jianbo

[dpdk-dev] [PATCH v2] i40e: enable i40e pmd on ARM platform

2016-08-03 Thread Jianbo Liu

And add read memory barrier to avoid status inconsistency
between two RX descriptors readings.

Signed-off-by: Jianbo Liu 
---
 config/defconfig_arm64-armv8a-linuxapp-gcc | 2 +-
 doc/guides/nics/overview.rst   | 2 +-
 drivers/net/i40e/i40e_rxtx.c   | 2 ++
 3 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/config/defconfig_arm64-armv8a-linuxapp-gcc 
b/config/defconfig_arm64-armv8a-linuxapp-gcc
index 1a17126..08f282b 100644
--- a/config/defconfig_arm64-armv8a-linuxapp-gcc
+++ b/config/defconfig_arm64-armv8a-linuxapp-gcc
@@ -46,6 +46,6 @@ CONFIG_RTE_EAL_IGB_UIO=n

 CONFIG_RTE_LIBRTE_IVSHMEM=n
 CONFIG_RTE_LIBRTE_FM10K_PMD=n
-CONFIG_RTE_LIBRTE_I40E_PMD=n
+CONFIG_RTE_LIBRTE_I40E_INC_VECTOR=n

 CONFIG_RTE_SCHED_VECTOR=n
diff --git a/doc/guides/nics/overview.rst b/doc/guides/nics/overview.rst
index 6abbae6..5175591 100644
--- a/doc/guides/nics/overview.rst
+++ b/doc/guides/nics/overview.rst
@@ -138,7 +138,7 @@ Most of these differences are summarized below.
Linux VFIO Y Y   Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y  
   Y   Y Y
Other kdrv Y Y  
 Y
ARMv7   
 Y Y Y
-   ARMv8  Y Y Y Y  
 Y Y   Y Y
+   ARMv8  Y   Y Y Y Y  
 Y Y   Y Y
Power8 Y Y  
 Y
TILE-Gx 
 Y
x86-32 Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y  
 Y   Y Y Y
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 554d167..19cfec4 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -994,6 +994,8 @@ i40e_rx_scan_hw_ring(struct i40e_rx_queue *rxq)
I40E_RXD_QW1_STATUS_SHIFT;
}

+   rte_rmb_smp();
+
/* Compute how many status bits were set */
for (j = 0, nb_dd = 0; j < I40E_LOOK_AHEAD; j++)
nb_dd += s[j] & (1 << I40E_RX_DESC_STATUS_DD_SHIFT);
-- 
2.4.11

[dpdk-dev] [PATCH] i40e: enable i40e pmd on ARM platform

2016-08-03 Thread Jianbo Liu

On 3 August 2016 at 11:26, Yao, Lei A  wrote:
> Hi, Jianbo
>
> I have tested you patch on my X86 platform,  the single core performance for 
> Non-vector PMD will have about 1Mpps drop
> Non-vector PMD single core performance with patch   :  ~33.9 Mpps
> Non-vector PMD single core performance without patch:  ~35.1 Mpps
> Is there any way to avoid such performance drop on X86? Thanks.
>

I think we can place a compiling condition before rte_rmb() to avoid
performance decrease on x86.
For example:  #if defined(RTE_ARCH_ARM) || defined(RTE_ARCH_ARM64)

Thanks!
Jianbo

> BRs
> Lei
>
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jianbo Liu
> Sent: Tuesday, August 2, 2016 2:58 PM
> To: dev at dpdk.org; Zhang, Helin ; Wu, Jingjing 
> 
> Cc: Jianbo Liu 
> Subject: [dpdk-dev] [PATCH] i40e: enable i40e pmd on ARM platform
>
> And add read memory barrier to avoid status inconsistency between two RX 
> descriptors readings.
>
> Signed-off-by: Jianbo Liu 
> ---
>  config/defconfig_arm64-armv8a-linuxapp-gcc | 2 +-
>  doc/guides/nics/overview.rst   | 2 +-
>  drivers/net/i40e/i40e_rxtx.c   | 2 ++
>  3 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/config/defconfig_arm64-armv8a-linuxapp-gcc 
> b/config/defconfig_arm64-armv8a-linuxapp-gcc
> index 1a17126..08f282b 100644
> --- a/config/defconfig_arm64-armv8a-linuxapp-gcc
> +++ b/config/defconfig_arm64-armv8a-linuxapp-gcc
> @@ -46,6 +46,6 @@ CONFIG_RTE_EAL_IGB_UIO=n
>
>  CONFIG_RTE_LIBRTE_IVSHMEM=n
>  CONFIG_RTE_LIBRTE_FM10K_PMD=n
> -CONFIG_RTE_LIBRTE_I40E_PMD=n
> +CONFIG_RTE_LIBRTE_I40E_INC_VECTOR=n
>
>  CONFIG_RTE_SCHED_VECTOR=n
> diff --git a/doc/guides/nics/overview.rst b/doc/guides/nics/overview.rst 
> index 6abbae6..5175591 100644
> --- a/doc/guides/nics/overview.rst
> +++ b/doc/guides/nics/overview.rst
> @@ -138,7 +138,7 @@ Most of these differences are summarized below.
> Linux VFIO Y Y   Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y
>  Y   Y Y
> Other kdrv Y Y
>Y
> ARMv7 
>Y Y Y
> -   ARMv8  Y Y Y Y
>Y Y   Y Y
> +   ARMv8  Y   Y Y Y Y
>Y Y   Y Y
> Power8 Y Y
>Y
> TILE-Gx   
>Y
> x86-32 Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y
>Y   Y Y Y
> diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c 
> index 554d167..4004b8e 100644
> --- a/drivers/net/i40e/i40e_rxtx.c
> +++ b/drivers/net/i40e/i40e_rxtx.c
> @@ -994,6 +994,8 @@ i40e_rx_scan_hw_ring(struct i40e_rx_queue *rxq)
> I40E_RXD_QW1_STATUS_SHIFT;
> }
>
> +   rte_rmb();
> +
> /* Compute how many status bits were set */
> for (j = 0, nb_dd = 0; j < I40E_LOOK_AHEAD; j++)
> nb_dd += s[j] & (1 << I40E_RX_DESC_STATUS_DD_SHIFT);
> --
> 2.4.11
>

[dpdk-dev] [PATCH] i40e: enable i40e pmd on ARM platform

2016-08-02 Thread Jianbo Liu

And add read memory barrier to avoid status inconsistency
between two RX descriptors readings.

Signed-off-by: Jianbo Liu 
---
 config/defconfig_arm64-armv8a-linuxapp-gcc | 2 +-
 doc/guides/nics/overview.rst   | 2 +-
 drivers/net/i40e/i40e_rxtx.c   | 2 ++
 3 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/config/defconfig_arm64-armv8a-linuxapp-gcc 
b/config/defconfig_arm64-armv8a-linuxapp-gcc
index 1a17126..08f282b 100644
--- a/config/defconfig_arm64-armv8a-linuxapp-gcc
+++ b/config/defconfig_arm64-armv8a-linuxapp-gcc
@@ -46,6 +46,6 @@ CONFIG_RTE_EAL_IGB_UIO=n

 CONFIG_RTE_LIBRTE_IVSHMEM=n
 CONFIG_RTE_LIBRTE_FM10K_PMD=n
-CONFIG_RTE_LIBRTE_I40E_PMD=n
+CONFIG_RTE_LIBRTE_I40E_INC_VECTOR=n

 CONFIG_RTE_SCHED_VECTOR=n
diff --git a/doc/guides/nics/overview.rst b/doc/guides/nics/overview.rst
index 6abbae6..5175591 100644
--- a/doc/guides/nics/overview.rst
+++ b/doc/guides/nics/overview.rst
@@ -138,7 +138,7 @@ Most of these differences are summarized below.
Linux VFIO Y Y   Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y  
   Y   Y Y
Other kdrv Y Y  
 Y
ARMv7   
 Y Y Y
-   ARMv8  Y Y Y Y  
 Y Y   Y Y
+   ARMv8  Y   Y Y Y Y  
 Y Y   Y Y
Power8 Y Y  
 Y
TILE-Gx 
 Y
x86-32 Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y  
 Y   Y Y Y
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 554d167..4004b8e 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -994,6 +994,8 @@ i40e_rx_scan_hw_ring(struct i40e_rx_queue *rxq)
I40E_RXD_QW1_STATUS_SHIFT;
}

+   rte_rmb();
+
/* Compute how many status bits were set */
for (j = 0, nb_dd = 0; j < I40E_LOOK_AHEAD; j++)
nb_dd += s[j] & (1 << I40E_RX_DESC_STATUS_DD_SHIFT);
-- 
2.4.11

[dpdk-dev] [PATCH v3 4/4] virtio: add neon support

2016-07-06 Thread Jianbo Liu

On 5 July 2016 at 20:49, Jerin Jacob  wrote:
> Added neon based Rx vector implementation.
> Selection of the new handler based neon availability at runtime.
> Updated the release notes and MAINTAINERS file.
>
> Signed-off-by: Jerin Jacob 
> ---
>  MAINTAINERS  |   1 +
>  doc/guides/rel_notes/release_16_07.rst   |   2 +
>  drivers/net/virtio/Makefile  |   2 +
>  drivers/net/virtio/virtio_rxtx.c |   3 +
>  drivers/net/virtio/virtio_rxtx_simple_neon.c | 235 
> +++
>  5 files changed, 243 insertions(+)
>  create mode 100644 drivers/net/virtio/virtio_rxtx_simple_neon.c
>

Acked-by: Jianbo Liu

[dpdk-dev] [PATCH 3/4] virtio: move SSE based Rx implementation to separate file

2016-06-28 Thread Jianbo Liu

On 27 June 2016 at 19:54, Jerin Jacob  wrote:
> split out SSE instruction based virtio simple rx
> implementation to a separate file
>
> Signed-off-by: Jerin Jacob 
> ---
>  drivers/net/virtio/virtio_rxtx_simple.c | 166 +---
>  drivers/net/virtio/virtio_rxtx_simple_sse.h | 225 
> 
>  2 files changed, 226 insertions(+), 165 deletions(-)
>  create mode 100644 drivers/net/virtio/virtio_rxtx_simple_sse.h
>
I think it's better to move sse implementation to a C file,
as Bruce pointed out at
http://www.dpdk.org/ml/archives/dev/2016-April/037937.html

Jianbo

[dpdk-dev] [PATCH] arm64: change rte_memcpy to inline function

2016-06-22 Thread Jianbo Liu

On 17 June 2016 at 18:30, Thomas Monjalon  wrote:
> 2016-05-19 17:56, Thomas Monjalon:
>> 2016-05-19 21:48, Jianbo Liu:
>> > On 13 May 2016 at 23:49, Thomas Monjalon  
>> > wrote:
>> > > 2016-05-10 14:01, Jianbo Liu:
>> > >> Other APP may call rte_memcpy by function pointer,
>> > >> so change it to an inline function.
>> > >
>> > > Any example in mind?
>> > >
>> > It's for ODP-DPDK.
>>
>> Given that ODP is open (dataplane), you should also consider ppc64 and tile.
>>
>> > >> --- a/lib/librte_eal/common/include/arch/arm/rte_memcpy_64.h
>> > >> +++ b/lib/librte_eal/common/include/arch/arm/rte_memcpy_64.h
>> > >> -#define rte_memcpy(d, s, n)  memcpy((d), (s), (n))
>> > >> +static inline void *
>> > >> +rte_memcpy(void *dst, const void *src, size_t n)
>> > >> +{
>> > >> + return memcpy(dst, src, n);
>> > >> +}
>> > >
>> > > It has no sense if other archs (arm32, ppc64, tile) are not updated.
>> > >
>> > But it also an inline function on x86.
>>
>> In x86, it was implemented as a function because there is some code.
>> If you want to make sure it is always a function, even in the case
>> of just calling memcpy from libc, you should put a doxygen comment in
>> the generic part and adapt every archs.
>
> no news?
> a v2 would be welcome

Hi Thomas,
Please close it, since there is already a solution to this issue in odp-dpdk.

Thanks!
Jianbo

[dpdk-dev] [PATCH] ixgbe: use rte_mbuf_prefetch_part2 for cacheline1 access

2016-06-20 Thread Jianbo Liu

On 17 June 2016 at 22:06, Jerin Jacob  wrote:
> made second cache line access behavior same as IA
>
> Signed-off-by: Jerin Jacob 
> ---
>  drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c 
> b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
> index 9c1d124..64a329e 100644
> --- a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
> +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
> @@ -280,10 +280,10 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct 
> rte_mbuf **rx_pkts,
> vst1q_u64((uint64_t *)_pkts[pos + 2], mbp2);
>
> if (split_packet) {
> -   rte_prefetch_non_temporal(_pkts[pos]->cacheline1);
> -   rte_prefetch_non_temporal(_pkts[pos + 
> 1]->cacheline1);
> -   rte_prefetch_non_temporal(_pkts[pos + 
> 2]->cacheline1);
> -   rte_prefetch_non_temporal(_pkts[pos + 
> 3]->cacheline1);
> +   rte_mbuf_prefetch_part2(rx_pkts[pos]);
> +   rte_mbuf_prefetch_part2(rx_pkts[pos + 1]);
> +   rte_mbuf_prefetch_part2(rx_pkts[pos + 2]);
> +   rte_mbuf_prefetch_part2(rx_pkts[pos + 3]);
> }
>
>     /* D.1 pkt 3,4 convert format from desc to pktmbuf */
> --
> 2.5.5
>

Reviewed-by: Jianbo Liu

[dpdk-dev] [PATCH] mbuf: extend rte_mbuf_prefetch_part* to support more prefetching methods

2016-06-02 Thread Jianbo Liu

On 2 June 2016 at 15:10, Olivier MATZ  wrote:
> Hi Jianbo,
>
> On 06/01/2016 05:29 AM, Jianbo Liu wrote:
>>> enum rte_mbuf_prefetch_type {
>>> > PREFETCH0,
>>> > PREFETCH1,
>>> > ...
>>> > };
>>> >
>>> > static inline void
>>> > rte_mbuf_prefetch_part1(enum rte_mbuf_prefetch_type type,
>>> > struct rte_mbuf *m)
>>> > {
>>> > switch (type) {
>>> > case PREFETCH0:
>>> > rte_prefetch0(>cacheline0);
>>> > break;
>>> > case PREFETCH1:
>>> > rte_prefetch1(>cacheline0);
>>> > break;
>>> > ...
>>> > }
>>> >
>> How about adding these to forbid the illegal use of this macro?
>> enum rte_mbuf_prefetch_type {
>>  ENUM_prefetch0,
>>  ENUM_prefetch1,
>>  ...
>> };
>>
>> #define RTE_MBUF_PREFETCH_PART1(type, m) \
>> if (ENUM_##type == ENUM_prefretch0) \
>> rte_prefetch0(&(m)->cacheline0);   \
>> else if (ENUM_##type == ENUM_prefetch1) \
>> rte_prefetch1(&(m)->cacheline0); \
>> 
>>
>
> As Stephen stated, a static inline is better than a macro, mainly
> because it is understood by the compiler instead of beeing a dumb
> code replacement.
>
> Any reason why you would prefer a macro in that case?
>
For the simplicity reason. If not, we may have to write several
similar functions for different prefetchings.

[dpdk-dev] [PATCH] mbuf: extend rte_mbuf_prefetch_part* to support more prefetching methods

2016-06-02 Thread Jianbo Liu

On 1 June 2016 at 14:00, Jerin Jacob  wrote:
> On Wed, Jun 01, 2016 at 11:29:47AM +0800, Jianbo Liu wrote:
>> On 1 June 2016 at 03:28, Olivier MATZ  wrote:
>> > Hi Jianbo,
>> >
>> > On 05/31/2016 05:06 AM, Jianbo Liu wrote:
>> >> Change the inline function to macro with parameters
>> >>
>> >> Signed-off-by: Jianbo Liu 
>> >>
>> >> [...]
[...]
>> It's for performance consideration, and only on armv8a platform.
>
> Strictly it is not armv8 specific, IA also implemented this API with
> _MM_HINT_NTA hint.

I mean this patch is only for ixgbe vector PMD on armv8 platform.

>
> Do we really need non-temporal/transient version of prefetch for ixgbe?

Strictly speaking, we don't have to since we don't know how APPs use
the mbuf header.
But, is it high possibility that the second part is used only once or
short period because prefetching is done only when split_packet is not
NULL?

> If so, for x86 also it makes sense to keep it? Right?
>
> The primary use case for transient version would be use with pipe line
> line mode where the same cpu wont consume the packet.
>
> /**
>  * Prefetch a cache line into all cache levels (non-temporal/transient
>  * version)
>  *
>  * The non-temporal prefetch is intended as a prefetch hint that
>  * processor will
>  * use the prefetched data only once or short period, unlike the
>  * rte_prefetch0() function which imply that prefetched data to use
>  * repeatedly.
>  *
>  * @param p
>  *   Address to prefetch
>  */
> static inline void rte_prefetch_non_temporal(const volatile void *p);
>
>>
>> >
>> > By the way, I did not try to apply the patch, but it looks
>> > it's on top of dpdk-next-net/rel_16_07, right?
>> >
>> Yes

[dpdk-dev] [PATCH] mbuf: extend rte_mbuf_prefetch_part* to support more prefetching methods

2016-06-01 Thread Jianbo Liu

On 1 June 2016 at 03:28, Olivier MATZ  wrote:
> Hi Jianbo,
>
> On 05/31/2016 05:06 AM, Jianbo Liu wrote:
>> Change the inline function to macro with parameters
>>
>> Signed-off-by: Jianbo Liu 
>>
>> [...]
>> --- a/lib/librte_mbuf/rte_mbuf.h
>> +++ b/lib/librte_mbuf/rte_mbuf.h
>> @@ -849,14 +849,15 @@ struct rte_mbuf {
>>   * in the receive path. If the cache line of the architecture is higher than
>>   * 64B, the second part will also be prefetched.
>>   *
>> + * @param method
>> + *   The prefetch method: prefetch0, prefetch1, prefetch2 or
>> + *prefetch_non_temporal.
>> + *
>>   * @param m
>>   *   The pointer to the mbuf.
>>   */
>> -static inline void
>> -rte_mbuf_prefetch_part1(struct rte_mbuf *m)
>> -{
>> - rte_prefetch0(>cacheline0);
>> -}
>> +#define RTE_MBUF_PREFETCH_PART1(method, m)   \
>> + rte_##method(&(m)->cacheline0)
>
> I'm not very fan of this macro, because it allows to
> really do everything):
>
>   RTE_MBUF_PREFETCH_PART1(pktmbuf_free, m)
>
> would expand as:
>
>   rte_pktmbuf_free(m)
>
>
> I'd prefer to have a switch case like this, almost similar
> to what Keith proposed in the initial discussion for my
> patch:
>
> enum rte_mbuf_prefetch_type {
> PREFETCH0,
> PREFETCH1,
> ...
> };
>
> static inline void
> rte_mbuf_prefetch_part1(enum rte_mbuf_prefetch_type type,
> struct rte_mbuf *m)
> {
> switch (type) {
> case PREFETCH0:
> rte_prefetch0(>cacheline0);
> break;
> case PREFETCH1:
> rte_prefetch1(>cacheline0);
> break;
> ...
> }
>
How about adding these to forbid the illegal use of this macro?
enum rte_mbuf_prefetch_type {
 ENUM_prefetch0,
 ENUM_prefetch1,
 ...
};

#define RTE_MBUF_PREFETCH_PART1(type, m) \
if (ENUM_##type == ENUM_prefretch0) \
rte_prefetch0(&(m)->cacheline0);   \
else if (ENUM_##type == ENUM_prefetch1) \
rte_prefetch1(&(m)->cacheline0); \


>
> Some questions: could you give some details about the use
> of non-temporal prefetch in ixgbe_vec_neon? What are the
> pros and cons, and would it be useful in other drivers?
> Currently all drivers are doing prefetch0 when they prefetch
> the mbuf structure. Some drivers use prefetch1 for data.
>
It's for performance consideration, and only on armv8a platform.

>
> By the way, I did not try to apply the patch, but it looks
> it's on top of dpdk-next-net/rel_16_07, right?
>
Yes

[dpdk-dev] [PATCH] mbuf: extend rte_mbuf_prefetch_part* to support more prefetching methods

2016-05-31 Thread Jianbo Liu

Change the inline function to macro with parameters

Signed-off-by: Jianbo Liu 
---
 drivers/net/fm10k/fm10k_rxtx_vec.c  |  8 
 drivers/net/i40e/i40e_rxtx_vec.c|  8 
 drivers/net/ixgbe/ixgbe_rxtx_vec.c  |  8 
 drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 12 
 drivers/net/mlx4/mlx4.c |  4 ++--
 drivers/net/mlx5/mlx5_rxtx.c|  4 ++--
 examples/ipsec-secgw/ipsec-secgw.c  |  2 +-
 lib/librte_mbuf/rte_mbuf.h  | 25 +
 8 files changed, 38 insertions(+), 33 deletions(-)

diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c 
b/drivers/net/fm10k/fm10k_rxtx_vec.c
index ef256a5..0e4c91c 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -487,10 +487,10 @@ fm10k_recv_raw_pkts_vec(void *rx_queue, struct rte_mbuf 
**rx_pkts,
rte_compiler_barrier();

if (split_packet) {
-   rte_mbuf_prefetch_part2(rx_pkts[pos]);
-   rte_mbuf_prefetch_part2(rx_pkts[pos + 1]);
-   rte_mbuf_prefetch_part2(rx_pkts[pos + 2]);
-   rte_mbuf_prefetch_part2(rx_pkts[pos + 3]);
+   RTE_MBUF_PREFETCH_PART2(prefetch0, rx_pkts[pos]);
+   RTE_MBUF_PREFETCH_PART2(prefetch0, rx_pkts[pos + 1]);
+   RTE_MBUF_PREFETCH_PART2(prefetch0, rx_pkts[pos + 2]);
+   RTE_MBUF_PREFETCH_PART2(prefetch0, rx_pkts[pos + 3]);
}

/* D.1 pkt 3,4 convert format from desc to pktmbuf */
diff --git a/drivers/net/i40e/i40e_rxtx_vec.c b/drivers/net/i40e/i40e_rxtx_vec.c
index eef80d9..a5c4847 100644
--- a/drivers/net/i40e/i40e_rxtx_vec.c
+++ b/drivers/net/i40e/i40e_rxtx_vec.c
@@ -297,10 +297,10 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct 
rte_mbuf **rx_pkts,
_mm_storeu_si128((__m128i *)_pkts[pos+2], mbp2);

if (split_packet) {
-   rte_mbuf_prefetch_part2(rx_pkts[pos]);
-   rte_mbuf_prefetch_part2(rx_pkts[pos + 1]);
-   rte_mbuf_prefetch_part2(rx_pkts[pos + 2]);
-   rte_mbuf_prefetch_part2(rx_pkts[pos + 3]);
+   RTE_MBUF_PREFETCH_PART2(prefetch0, rx_pkts[pos]);
+   RTE_MBUF_PREFETCH_PART2(prefetch0, rx_pkts[pos + 1]);
+   RTE_MBUF_PREFETCH_PART2(prefetch0, rx_pkts[pos + 2]);
+   RTE_MBUF_PREFETCH_PART2(prefetch0, rx_pkts[pos + 3]);
}

/* avoid compiler reorder optimization */
diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c 
b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
index 09f4892..55adb56 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
@@ -308,10 +308,10 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct 
rte_mbuf **rx_pkts,
_mm_storeu_si128((__m128i *)_pkts[pos+2], mbp2);

if (split_packet) {
-   rte_mbuf_prefetch_part2(rx_pkts[pos]);
-   rte_mbuf_prefetch_part2(rx_pkts[pos + 1]);
-   rte_mbuf_prefetch_part2(rx_pkts[pos + 2]);
-   rte_mbuf_prefetch_part2(rx_pkts[pos + 3]);
+   RTE_MBUF_PREFETCH_PART2(prefetch0, rx_pkts[pos]);
+   RTE_MBUF_PREFETCH_PART2(prefetch0, rx_pkts[pos + 1]);
+   RTE_MBUF_PREFETCH_PART2(prefetch0, rx_pkts[pos + 2]);
+   RTE_MBUF_PREFETCH_PART2(prefetch0, rx_pkts[pos + 3]);
}

/* avoid compiler reorder optimization */
diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c 
b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
index 9c1d124..941b2d5 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
@@ -280,10 +280,14 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct 
rte_mbuf **rx_pkts,
vst1q_u64((uint64_t *)_pkts[pos + 2], mbp2);

if (split_packet) {
-   rte_prefetch_non_temporal(_pkts[pos]->cacheline1);
-   rte_prefetch_non_temporal(_pkts[pos + 
1]->cacheline1);
-   rte_prefetch_non_temporal(_pkts[pos + 
2]->cacheline1);
-   rte_prefetch_non_temporal(_pkts[pos + 
3]->cacheline1);
+   RTE_MBUF_PREFETCH_PART2(prefetch_non_temporal,
+   rx_pkts[pos]);
+   RTE_MBUF_PREFETCH_PART2(prefetch_non_temporal,
+   rx_pkts[pos + 1]);
+   RTE_MBUF_PREFETCH_PART2(prefetch_non_temporal,
+   rx_pkts[pos + 2]);
+   RTE_MBUF_PREFETCH_PART2(prefetch_non_temporal,
+   rx_

[dpdk-dev] [PATCH v3 0/4] ixgbe: enable ixgbe vector PMD on ARM

2016-05-27 Thread Jianbo Liu

On 25 May 2016 at 00:12, Bruce Richardson  wrote:
> On Tue, May 24, 2016 at 05:10:01PM +0100, Bruce Richardson wrote:
>> On Fri, May 06, 2016 at 11:55:44AM +0530, Jianbo Liu wrote:
>> > Implement ixgbe vPMD on ARM with NEON intrinsic.
>> >
>> > v3:
>> >  - rebase to rel_16_07 branch on dpdk-next-net.
>> >
>> > v2:
>> >  - move the common code to new header file.
>> >
>> > Jianbo Liu (4):
>> >   ixgbe: rearrange vector PMD code for x86
>> >   ixgbe: implement vector PMD for arm architecture
>> >   ixgbe: enable ixgbe vector PMD on ARMv8a platform
>> >   maintainers: claim responsibility for ixgbe vector PMD on ARM
>> >
>> Acked-by: Bruce Richardson 
>>
> Applied to dpdk-next-net/rel_16_07
>
> Jianbo, I've fixed some checkpatch issues in patch 2, and updated the NIC 
> features
> overview table as part of patch 3 when applying them. Please verify all is ok
> with you on the 16.07 branch, since I don't have ARM platforms to check things
> on.
>
Thanks Bruce.
No need to change that list. I have verified the ixgbe VF PMD and vPMD
on ARMv8a platform.

Jianbo

[dpdk-dev] [PATCH v3 2/4] ixgbe: implement vector PMD for arm architecture

2016-05-26 Thread Jianbo Liu

On 25 May 2016 at 20:29, Jerin Jacob  wrote:
> On Fri, May 06, 2016 at 11:55:46AM +0530, Jianbo Liu wrote:
>> use ARM NEON intrinsic to implement ixgbe vPMD
>>
>> Signed-off-by: Jianbo Liu 
>> ---
>>  drivers/net/ixgbe/Makefile  |   4 +
>>  drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 561 
>> 
>>  2 files changed, 565 insertions(+)
>>  create mode 100644 drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c

>> + /* Read desc statuses backwards to avoid race condition */
>> + /* A.1 load 4 pkts desc */
>> + descs[3] =  vld1q_u64((uint64_t *)(rxdp + 3));
>> + rte_rmb();
>
> Any specific reason to add rte_rmb() here, If there is no performance
> drop then it makes sense to add before descs[3] uses it.i.e
> at rte_compiler_barrier() place in x86 code.
>
To avoid desc statuses inconsistent since they are read backwards.

>> +
>> + /* B.2 copy 2 mbuf point into rx_pkts  */
>> + vst1q_u64((uint64_t *)_pkts[pos], mbp1);
>> +
>> + /* B.1 load 1 mbuf point */
>> + mbp2 = vld1q_u64((uint64_t *)_ring[pos + 2]);
>> +
>> + descs[2] =  vld1q_u64((uint64_t *)(rxdp + 2));
>> + /* B.1 load 2 mbuf point */
>> + descs[1] =  vld1q_u64((uint64_t *)(rxdp + 1));
>> + descs[0] =  vld1q_u64((uint64_t *)(rxdp));
>> +
>> + /* B.2 copy 2 mbuf point into rx_pkts  */
>> + vst1q_u64((uint64_t *)_pkts[pos + 2], mbp2);
>> +
>> + if (split_packet) {
>> + rte_prefetch_non_temporal(_pkts[pos]->cacheline1);
>> + rte_prefetch_non_temporal(_pkts[pos+1]->cacheline1);
>> + rte_prefetch_non_temporal(_pkts[pos+2]->cacheline1);
>> + rte_prefetch_non_temporal(_pkts[pos+3]->cacheline1);
>
> replace with rte_mbuf_prefetch_part2 or equivalent
>
rte_mbuf_prefetch_part2 is new functions after this patchset, so it's
better to submit a new patch as Bruce said.

[dpdk-dev] [PATCH] arm64: change rte_memcpy to inline function

2016-05-19 Thread Jianbo Liu

On 13 May 2016 at 23:49, Thomas Monjalon  wrote:
> 2016-05-10 14:01, Jianbo Liu:
>> Other APP may call rte_memcpy by function pointer,
>> so change it to an inline function.
>
> Any example in mind?
>
It's for ODP-DPDK.
>> --- a/lib/librte_eal/common/include/arch/arm/rte_memcpy_64.h
>> +++ b/lib/librte_eal/common/include/arch/arm/rte_memcpy_64.h
>> -#define rte_memcpy(d, s, n)  memcpy((d), (s), (n))
>> +static inline void *
>> +rte_memcpy(void *dst, const void *src, size_t n)
>> +{
>> + return memcpy(dst, src, n);
>> +}
>
> It has no sense if other archs (arm32, ppc64, tile) are not updated.
>
But it also an inline function on x86.
Sorry for my late reply...

[dpdk-dev] [PATCHv3 1/2] config/armv8a: disable igb_uio

2016-05-13 Thread Jianbo Liu

On 13 May 2016 at 15:47, Jerin Jacob  wrote:
> On Fri, May 13, 2016 at 03:37:01AM +, Hemant Agrawal wrote:
>>
>>
>> > -Original Message-----
>> > From: Jianbo Liu [mailto:jianbo.liu at linaro.org]
>> > Sent: Friday, May 13, 2016 7:13 AM
>> > To: Santosh Shukla 
>> > Cc: Stephen Hemminger ; Jerin Jacob
>> > ; Hemant Agrawal
>> > ; dev at dpdk.org; Thomas Monjalon
>> > 
>> > Subject: Re: [dpdk-dev] [PATCHv3 1/2] config/armv8a: disable igb_uio
>> >
>> > On 12 May 2016 at 18:31, Santosh Shukla
>> >  wrote:
>> > > On Thu, May 12, 2016 at 05:52:54PM +0800, Jianbo Liu wrote:
>> > >> On 12 May 2016 at 16:57, Santosh Shukla
>> > >>  wrote:
>> > >> > On Thu, May 12, 2016 at 01:54:13PM +0800, Jianbo Liu wrote:
>> > >> >> On 12 May 2016 at 13:06, Santosh Shukla
>> > >> >>  wrote:
>> > >> >> > On Thu, May 12, 2016 at 11:42:26AM +0800, Jianbo Liu wrote:
>> > >> >> >> On 12 May 2016 at 11:17, Santosh Shukla
>> > >> >> >>  wrote:
>> > >> >> >> > On Thu, May 12, 2016 at 10:01:05AM +0800, Jianbo Liu wrote:
>> > >> >> >> >> On 12 May 2016 at 02:25, Stephen Hemminger
>> >  wrote:
>> > >> >> >> >> > On Wed, 11 May 2016 22:32:16 +0530 Jerin Jacob
>> > >> >> >> >> >  wrote:
>> > >> >> >> >> >
>> > >> >> >> >> >> On Wed, May 11, 2016 at 08:22:59AM -0700, Stephen
>> > Hemminger wrote:
>> > >> >> >> >> >> > On Wed, 11 May 2016 19:17:58 +0530 Hemant Agrawal
>> > >> >> >> >> >> >  wrote:
>> > >> >> >> >> >> >
>> > >> >> >> >> >> > > IGB_UIO not supported for arm64 arch in kernel so 
>> > >> >> >> >> >> > > disable.
>> > >> >> >> >> >> > >
>> > >> >> >> >> >> > > Signed-off-by: Hemant Agrawal
>> > >> >> >> >> >> > > 
>> > >> >> >> >> >> > > Reviewed-by: Santosh Shukla
>> > >> >> >> >> >> > > 
>> > >> >> >> >> >> >
>> > >> >> >> >> >> > Really, I have use IGB_UIO on ARM64
>> > >> >> >> >> >>
>> > >> >> >> >> >> May I know what is the technical use case for igb_uio on
>> > >> >> >> >> >> arm64 which cannot be addressed through vfio or vfioionommu.
>> > >> >> >> >> >
>> > >> >> >> >> > I was running on older kernel which did not support 
>> > >> >> >> >> > vfioionommu
>> > mode.
>> > >> >> >> >>
>> > >> >> >> >> As I said, most of DPDK developers are not kernel
>> > >> >> >> >> developers. They may have their own kernel tree, and
>> > >> >> >> >> couldn't like to upgrade to latest kernel.
>> > >> >> >> >> They can choose to use or not use igb_uio when binding the
>> > >> >> >> >> driver. But blindly disabling it in the base config seems 
>> > >> >> >> >> unreasonable.
>> > >> >> >> >
>> > >> >> >> > if user keeping his own kernel so they could also keep
>> > >> >> >> > IGB_UIO=y in their local
>> > >> >> >> Most likely they don't have local dpdk tree. They write their
>> > >> >> >> own applications, complie and link to dpdk lib, then done.
>> > >> >> >>
>> > >> >> >> > dpdk tree. Why are you imposing user-x custome depedancy on
>> > >> >> >> > upstream dpdk base
>> > >> >> >> Customer requiremnts is important. I want they can choose the way
>> > they like.
>> > >> >> >>
>> > >> >> >
>> > >> >> > so you choose to keep igb_uio option, provided arch doesn't 
>> > >> >> > support

[dpdk-dev] [PATCH v1 09/28] eal: introduce --no-soc option

2016-05-13 Thread Jianbo Liu

On 6 May 2016 at 21:47, Jan Viktorin  wrote:
> This option has the same meaning for the SoC infra as the --no-pci
> for the PCI infra.
>
> Signed-off-by: Jan Viktorin 
> ---
>  lib/librte_eal/common/eal_common_options.c | 5 +
>  lib/librte_eal/common/eal_internal_cfg.h   | 1 +
>  lib/librte_eal/common/eal_options.h| 2 ++
>  3 files changed, 8 insertions(+)
>
> diff --git a/lib/librte_eal/common/eal_common_options.c 
> b/lib/librte_eal/common/eal_common_options.c
> index 3efc90f..09d64f7 100644
> --- a/lib/librte_eal/common/eal_common_options.c
> +++ b/lib/librte_eal/common/eal_common_options.c
> @@ -85,6 +85,7 @@ eal_long_options[] = {
> {OPT_NO_HPET,   0, NULL, OPT_NO_HPET_NUM  },
> {OPT_NO_HUGE,   0, NULL, OPT_NO_HUGE_NUM  },
> {OPT_NO_PCI,0, NULL, OPT_NO_PCI_NUM   },
> +   {OPT_NO_SOC,0, NULL, OPT_NO_SOC_NUM   },
> {OPT_NO_SHCONF, 0, NULL, OPT_NO_SHCONF_NUM},
> {OPT_PCI_BLACKLIST, 1, NULL, OPT_PCI_BLACKLIST_NUM},
> {OPT_PCI_WHITELIST, 1, NULL, OPT_PCI_WHITELIST_NUM},
> @@ -841,6 +842,10 @@ eal_parse_common_option(int opt, const char *optarg,
> conf->no_pci = 1;
> break;
>
> +   case OPT_NO_SOC_NUM:
> +   conf->no_soc = 1;

Could it be better to rename to enable_soc, and disable soc by default?

> +   break;
> +
> case OPT_NO_HPET_NUM:
> conf->no_hpet = 1;
> break;
> diff --git a/lib/librte_eal/common/eal_internal_cfg.h 
> b/lib/librte_eal/common/eal_internal_cfg.h
> index 5f1367e..3a98e94 100644
> --- a/lib/librte_eal/common/eal_internal_cfg.h
> +++ b/lib/librte_eal/common/eal_internal_cfg.h
> @@ -67,6 +67,7 @@ struct internal_config {
> unsigned hugepage_unlink; /**< true to unlink backing files */
> volatile unsigned xen_dom0_support; /**< support app running on Xen 
> Dom0*/
> volatile unsigned no_pci; /**< true to disable PCI */
> +   volatile unsigned no_soc; /**< true to disable SoC */
> volatile unsigned no_hpet;/**< true to disable HPET */
> volatile unsigned vmware_tsc_map; /**< true to use VMware TSC mapping
>   
>   * instead of native TSC */
> diff --git a/lib/librte_eal/common/eal_options.h 
> b/lib/librte_eal/common/eal_options.h
> index a881c62..ba1e704 100644
> --- a/lib/librte_eal/common/eal_options.h
> +++ b/lib/librte_eal/common/eal_options.h
> @@ -69,6 +69,8 @@ enum {
> OPT_NO_HUGE_NUM,
>  #define OPT_NO_PCI"no-pci"
> OPT_NO_PCI_NUM,
> +#define OPT_NO_SOC"no-soc"
> +   OPT_NO_SOC_NUM,
>  #define OPT_NO_SHCONF "no-shconf"
> OPT_NO_SHCONF_NUM,
>  #define OPT_SOCKET_MEM"socket-mem"
> --
> 2.8.0
>

[dpdk-dev] [PATCHv3 1/2] config/armv8a: disable igb_uio

2016-05-13 Thread Jianbo Liu

On 12 May 2016 at 18:31, Santosh Shukla
 wrote:
> On Thu, May 12, 2016 at 05:52:54PM +0800, Jianbo Liu wrote:
>> On 12 May 2016 at 16:57, Santosh Shukla
>>  wrote:
>> > On Thu, May 12, 2016 at 01:54:13PM +0800, Jianbo Liu wrote:
>> >> On 12 May 2016 at 13:06, Santosh Shukla
>> >>  wrote:
>> >> > On Thu, May 12, 2016 at 11:42:26AM +0800, Jianbo Liu wrote:
>> >> >> On 12 May 2016 at 11:17, Santosh Shukla
>> >> >>  wrote:
>> >> >> > On Thu, May 12, 2016 at 10:01:05AM +0800, Jianbo Liu wrote:
>> >> >> >> On 12 May 2016 at 02:25, Stephen Hemminger > >> >> >> networkplumber.org> wrote:
>> >> >> >> > On Wed, 11 May 2016 22:32:16 +0530
>> >> >> >> > Jerin Jacob  wrote:
>> >> >> >> >
>> >> >> >> >> On Wed, May 11, 2016 at 08:22:59AM -0700, Stephen Hemminger 
>> >> >> >> >> wrote:
>> >> >> >> >> > On Wed, 11 May 2016 19:17:58 +0530
>> >> >> >> >> > Hemant Agrawal  wrote:
>> >> >> >> >> >
>> >> >> >> >> > > IGB_UIO not supported for arm64 arch in kernel so disable.
>> >> >> >> >> > >
>> >> >> >> >> > > Signed-off-by: Hemant Agrawal 
>> >> >> >> >> > > Reviewed-by: Santosh Shukla > >> >> >> >> > > caviumnetworks.com>
>> >> >> >> >> >
>> >> >> >> >> > Really, I have use IGB_UIO on ARM64
>> >> >> >> >>
>> >> >> >> >> May I know what is the technical use case for igb_uio on arm64
>> >> >> >> >> which cannot be addressed through vfio or vfioionommu.
>> >> >> >> >
>> >> >> >> > I was running on older kernel which did not support vfioionommu 
>> >> >> >> > mode.
>> >> >> >>
>> >> >> >> As I said, most of DPDK developers are not kernel developers. They 
>> >> >> >> may
>> >> >> >> have their own kernel tree, and couldn't like to upgrade to latest
>> >> >> >> kernel.
>> >> >> >> They can choose to use or not use igb_uio when binding the driver. 
>> >> >> >> But
>> >> >> >> blindly disabling it in the base config seems unreasonable.
>> >> >> >
>> >> >> > if user keeping his own kernel so they could also keep IGB_UIO=y in 
>> >> >> > their local
>> >> >> Most likely they don't have local dpdk tree. They write their own
>> >> >> applications, complie and link to dpdk lib, then done.
>> >> >>
>> >> >> > dpdk tree. Why are you imposing user-x custome depedancy on upstream 
>> >> >> > dpdk base
>> >> >> Customer requiremnts is important. I want they can choose the way they 
>> >> >> like.
>> >> >>
>> >> >
>> >> > so you choose to keep igb_uio option, provided arch doesn't support?
>> >> > new user did reported issues with igb_uio for arm64, refer this thread 
>> >> > [1], as
>> >> > well hemanth too faced issues. we want to avoid that.
>> >> >
>> >> > If customer maintaing out-of-tree kernel then he can also switch to 
>> >> > vfio-way.
>> >> > isn;t it?
>> >> >
>> >> >> > config. Is it not enough for explanation that - Base config ie.. 
>> >> >> > armv8 doesn;t
>> >> >> > support pci mmap, so igb_uio is n/a. New user wont able to build/run 
>> >> >> > dpdk/arm64
>> >> >> > in igb_uio-way, He'll prefer to use upstream stuff. I think, you are 
>> >> >> > not making
>> >> >> You are wrong, he can build dpdk. If he like to use upstream without
>> >> >> patching, he can use vfio.
>> >> >
>> >> > I disagree, we want to avoid [1] for new user.
>> >> >
>> >> >> But you can't ignore the need from old user which is more comfortable
>> >> >> with older kernel.
>> >> >>
>> >> > arm/arm64 dpdk support recently adde

[dpdk-dev] [PATCH v1 03/28] eal/linux: extract function rte_eal_unbind_kernel_driver

2016-05-13 Thread Jianbo Liu

On 6 May 2016 at 21:47, Jan Viktorin  wrote:
> Generalize the PCI-specific pci_unbind_kernel_driver. It is now divided into
> two parts. First, determination of the path and string identification of the
> device to be unbound. Second, the actual unbind operation which is generic.
>
> Signed-off-by: Jan Viktorin 
> ---
>  lib/librte_eal/common/eal_private.h   | 13 +
>  lib/librte_eal/linuxapp/eal/eal.c | 26 ++
>  lib/librte_eal/linuxapp/eal/eal_pci.c | 33 +
>  3 files changed, 48 insertions(+), 24 deletions(-)
>
> diff --git a/lib/librte_eal/common/eal_private.h 
> b/lib/librte_eal/common/eal_private.h
> index 81816a6..3fb8353 100644
> --- a/lib/librte_eal/common/eal_private.h
> +++ b/lib/librte_eal/common/eal_private.h
> @@ -289,6 +289,19 @@ int rte_eal_alarm_init(void);
>  int rte_eal_check_module(const char *module_name);
>
>  /**
> + * Unbind kernel driver bound to the device specified by the given devpath,
> + * and its string identification.
> + *
> + * @param devpath  path to the device directory ("/sys/.../devices/")
> + * @param devididentification of the device ()
> + *
> + * @return
> + *  -1  unbind has failed
> + *   0  module has been unbound
> + */
> +int rte_eal_unbind_kernel_driver(const char *devpath, const char *devid);
> +
> +/**
>   * Get cpu core_id.
>   *
>   * This function is private to the EAL.
> diff --git a/lib/librte_eal/linuxapp/eal/eal.c 
> b/lib/librte_eal/linuxapp/eal/eal.c
> index e8fce6b..844f958 100644
> --- a/lib/librte_eal/linuxapp/eal/eal.c
> +++ b/lib/librte_eal/linuxapp/eal/eal.c
> @@ -949,3 +949,29 @@ rte_eal_check_module(const char *module_name)
> /* Module has been found */
> return 1;
>  }
> +
> +int
> +rte_eal_unbind_kernel_driver(const char *devpath, const char *devid)
> +{
> +   char filename[PATH_MAX];
> +   FILE *f;
> +
> +   snprintf(filename, sizeof(filename),
> +"%s/driver/unbind", devpath);
> +
> +   f = fopen(filename, "w");
> +   if (f == NULL) /* device was not bound */
> +   return 0;
> +
> +   if (fwrite(devid, strlen(devid), 1, f) == 0) {
> +   RTE_LOG(ERR, EAL, "%s(): could not write to %s\n", __func__,
> +   filename);
> +   goto error;
> +   }
> +
> +   fclose(f);
> +   return 0;
> +error:
> +   fclose(f);
> +   return -1;
> +}
> diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c 
> b/lib/librte_eal/linuxapp/eal/eal_pci.c
> index fd7e34f..312cb14 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_pci.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
> @@ -59,38 +59,23 @@ int
>  pci_unbind_kernel_driver(struct rte_pci_device *dev)
>  {
> int n;
> -   FILE *f;
> -   char filename[PATH_MAX];
> -   char buf[BUFSIZ];
> +   char devpath[PATH_MAX];
> +   char devid[BUFSIZ];
> struct rte_pci_addr *loc = >addr;
>
> -   /* open /sys/bus/pci/devices/:BB:CC.D/driver */
> -   snprintf(filename, sizeof(filename),
> -SYSFS_PCI_DEVICES "/" PCI_PRI_FMT "/driver/unbind",
> +   /* devpath /sys/bus/pci/devices/:BB:CC.D */
> +   snprintf(devpath, sizeof(devpath),
> +SYSFS_PCI_DEVICES "/" PCI_PRI_FMT,
>  loc->domain, loc->bus, loc->devid, loc->function);
>
> -   f = fopen(filename, "w");
> -   if (f == NULL) /* device was not bound */
> -   return 0;
> -
> -   n = snprintf(buf, sizeof(buf), PCI_PRI_FMT "\n",
> +   n = snprintf(devid, sizeof(devid), PCI_PRI_FMT "\n",
>  loc->domain, loc->bus, loc->devid, loc->function);
> -   if ((n < 0) || (n >= (int)sizeof(buf))) {
> +   if ((n < 0) || (n >= (int)sizeof(devid))) {

Is it better to move "(n >= (int)sizeof(devid))" before snprintf and
it has different reason from "n < 0"?

> RTE_LOG(ERR, EAL, "%s(): snprintf failed\n", __func__);
> -   goto error;
> -   }
> -   if (fwrite(buf, n, 1, f) == 0) {
> -   RTE_LOG(ERR, EAL, "%s(): could not write to %s\n", __func__,
> -   filename);
> -   goto error;
> +   return -1;
> }
>
> -   fclose(f);
> -   return 0;
> -
> -error:
> -   fclose(f);
> -   return -1;
> +   return rte_eal_unbind_kernel_driver(devpath, devid);
>  }
>
>  static int
> --
> 2.8.0
>

[dpdk-dev] [PATCHv3 1/2] config/armv8a: disable igb_uio

2016-05-12 Thread Jianbo Liu

On 12 May 2016 at 16:57, Santosh Shukla
 wrote:
> On Thu, May 12, 2016 at 01:54:13PM +0800, Jianbo Liu wrote:
>> On 12 May 2016 at 13:06, Santosh Shukla
>>  wrote:
>> > On Thu, May 12, 2016 at 11:42:26AM +0800, Jianbo Liu wrote:
>> >> On 12 May 2016 at 11:17, Santosh Shukla
>> >>  wrote:
>> >> > On Thu, May 12, 2016 at 10:01:05AM +0800, Jianbo Liu wrote:
>> >> >> On 12 May 2016 at 02:25, Stephen Hemminger > >> >> networkplumber.org> wrote:
>> >> >> > On Wed, 11 May 2016 22:32:16 +0530
>> >> >> > Jerin Jacob  wrote:
>> >> >> >
>> >> >> >> On Wed, May 11, 2016 at 08:22:59AM -0700, Stephen Hemminger wrote:
>> >> >> >> > On Wed, 11 May 2016 19:17:58 +0530
>> >> >> >> > Hemant Agrawal  wrote:
>> >> >> >> >
>> >> >> >> > > IGB_UIO not supported for arm64 arch in kernel so disable.
>> >> >> >> > >
>> >> >> >> > > Signed-off-by: Hemant Agrawal 
>> >> >> >> > > Reviewed-by: Santosh Shukla > >> >> >> > > caviumnetworks.com>
>> >> >> >> >
>> >> >> >> > Really, I have use IGB_UIO on ARM64
>> >> >> >>
>> >> >> >> May I know what is the technical use case for igb_uio on arm64
>> >> >> >> which cannot be addressed through vfio or vfioionommu.
>> >> >> >
>> >> >> > I was running on older kernel which did not support vfioionommu mode.
>> >> >>
>> >> >> As I said, most of DPDK developers are not kernel developers. They may
>> >> >> have their own kernel tree, and couldn't like to upgrade to latest
>> >> >> kernel.
>> >> >> They can choose to use or not use igb_uio when binding the driver. But
>> >> >> blindly disabling it in the base config seems unreasonable.
>> >> >
>> >> > if user keeping his own kernel so they could also keep IGB_UIO=y in 
>> >> > their local
>> >> Most likely they don't have local dpdk tree. They write their own
>> >> applications, complie and link to dpdk lib, then done.
>> >>
>> >> > dpdk tree. Why are you imposing user-x custome depedancy on upstream 
>> >> > dpdk base
>> >> Customer requiremnts is important. I want they can choose the way they 
>> >> like.
>> >>
>> >
>> > so you choose to keep igb_uio option, provided arch doesn't support?
>> > new user did reported issues with igb_uio for arm64, refer this thread 
>> > [1], as
>> > well hemanth too faced issues. we want to avoid that.
>> >
>> > If customer maintaing out-of-tree kernel then he can also switch to 
>> > vfio-way.
>> > isn;t it?
>> >
>> >> > config. Is it not enough for explanation that - Base config ie.. armv8 
>> >> > doesn;t
>> >> > support pci mmap, so igb_uio is n/a. New user wont able to build/run 
>> >> > dpdk/arm64
>> >> > in igb_uio-way, He'll prefer to use upstream stuff. I think, you are 
>> >> > not making
>> >> You are wrong, he can build dpdk. If he like to use upstream without
>> >> patching, he can use vfio.
>> >
>> > I disagree, we want to avoid [1] for new user.
>> >
>> >> But you can't ignore the need from old user which is more comfortable
>> >> with older kernel.
>> >>
>> > arm/arm64 dpdk support recently added and I am guessing, most likely 
>> > customer
>> > using near latest kernel, switching to vfio won't be so difficult.
>> >
>> > Or can you take up responsibility of upstreaming pci mmap patch, then we 
>> > don't
>> > need this patch.
>> >
>> > [1] http://dpdk.org/ml/archives/dev/2016-January/031313.html
>>
>> Can you read carefully about the guide at
>> http://dpdk.org/doc/guides/linux_gsg/build_dpdk.html? It says to use
>> uio_pci_generic, igb_uio or vfio-pci.
>
> *** applicable and works for x86 only, not for arm64: because pci mmap support
> not present for arm64, in that case we should update the doc.
>
>> Could it be possible that the user in that thread has already read and
>> tried them all and found that he can't enable vifo with his kernel,
>> and igb_uio is the easy way for him and asked for help from community?
>> If so, we have no choice but keeping igb_uio enabled.
>
> By then vfionoiommu support was wip progress in dpdk/linux. but now it merged
> and it works. So no need to retain igb_uio in base config for which to work -
> user need to use mmap patch at linux side.

We can't decide which kernel user will use.

>
> Or can you maintain out-of-tree pci mmap patch/ kerne source and make it
> explicit somewhere in dpdk build doc that - if user want igb_uio way then
> use kernel/mmap patch from x location.

The patch is in the kernel maillist, and user google it.
And isn't funny to ask someone to do something again and again (3
times) in this thread?

>
>> He use lsmod to show us the modules, most likely he know vifo-pci.
>>
>> Below are the details on modules, hugepages and device binding.
>> root at arm64:~# lsmod
>> Module  Size  Used by
>> rte_kni   292795  0
>> igb_uio 4338  0
>> ixgbe 184456  0

[dpdk-dev] [PATCHv3 1/2] config/armv8a: disable igb_uio

2016-05-12 Thread Jianbo Liu

On 12 May 2016 at 13:06, Santosh Shukla
 wrote:
> On Thu, May 12, 2016 at 11:42:26AM +0800, Jianbo Liu wrote:
>> On 12 May 2016 at 11:17, Santosh Shukla
>>  wrote:
>> > On Thu, May 12, 2016 at 10:01:05AM +0800, Jianbo Liu wrote:
>> >> On 12 May 2016 at 02:25, Stephen Hemminger > >> networkplumber.org> wrote:
>> >> > On Wed, 11 May 2016 22:32:16 +0530
>> >> > Jerin Jacob  wrote:
>> >> >
>> >> >> On Wed, May 11, 2016 at 08:22:59AM -0700, Stephen Hemminger wrote:
>> >> >> > On Wed, 11 May 2016 19:17:58 +0530
>> >> >> > Hemant Agrawal  wrote:
>> >> >> >
>> >> >> > > IGB_UIO not supported for arm64 arch in kernel so disable.
>> >> >> > >
>> >> >> > > Signed-off-by: Hemant Agrawal 
>> >> >> > > Reviewed-by: Santosh Shukla 
>> >> >> >
>> >> >> > Really, I have use IGB_UIO on ARM64
>> >> >>
>> >> >> May I know what is the technical use case for igb_uio on arm64
>> >> >> which cannot be addressed through vfio or vfioionommu.
>> >> >
>> >> > I was running on older kernel which did not support vfioionommu mode.
>> >>
>> >> As I said, most of DPDK developers are not kernel developers. They may
>> >> have their own kernel tree, and couldn't like to upgrade to latest
>> >> kernel.
>> >> They can choose to use or not use igb_uio when binding the driver. But
>> >> blindly disabling it in the base config seems unreasonable.
>> >
>> > if user keeping his own kernel so they could also keep IGB_UIO=y in their 
>> > local
>> Most likely they don't have local dpdk tree. They write their own
>> applications, complie and link to dpdk lib, then done.
>>
>> > dpdk tree. Why are you imposing user-x custome depedancy on upstream dpdk 
>> > base
>> Customer requiremnts is important. I want they can choose the way they like.
>>
>
> so you choose to keep igb_uio option, provided arch doesn't support?
> new user did reported issues with igb_uio for arm64, refer this thread [1], as
> well hemanth too faced issues. we want to avoid that.
>
> If customer maintaing out-of-tree kernel then he can also switch to vfio-way.
> isn;t it?
>
>> > config. Is it not enough for explanation that - Base config ie.. armv8 
>> > doesn;t
>> > support pci mmap, so igb_uio is n/a. New user wont able to build/run 
>> > dpdk/arm64
>> > in igb_uio-way, He'll prefer to use upstream stuff. I think, you are not 
>> > making
>> You are wrong, he can build dpdk. If he like to use upstream without
>> patching, he can use vfio.
>
> I disagree, we want to avoid [1] for new user.
>
>> But you can't ignore the need from old user which is more comfortable
>> with older kernel.
>>
> arm/arm64 dpdk support recently added and I am guessing, most likely customer
> using near latest kernel, switching to vfio won't be so difficult.
>
> Or can you take up responsibility of upstreaming pci mmap patch, then we don't
> need this patch.
>
> [1] http://dpdk.org/ml/archives/dev/2016-January/031313.html

Can you read carefully about the guide at
http://dpdk.org/doc/guides/linux_gsg/build_dpdk.html? It says to use
uio_pci_generic, igb_uio or vfio-pci.
Could it be possible that the user in that thread has already read and
tried them all and found that he can't enable vifo with his kernel,
and igb_uio is the easy way for him and asked for help from community?
If so, we have no choice but keeping igb_uio enabled.

He use lsmod to show us the modules, most likely he know vifo-pci.

Below are the details on modules, hugepages and device binding.
root at arm64:~# lsmod
Module  Size  Used by
rte_kni   292795  0
igb_uio 4338  0
ixgbe 184456  0

[dpdk-dev] [PATCHv3 1/2] config/armv8a: disable igb_uio

2016-05-12 Thread Jianbo Liu

On 12 May 2016 at 11:17, Santosh Shukla
 wrote:
> On Thu, May 12, 2016 at 10:01:05AM +0800, Jianbo Liu wrote:
>> On 12 May 2016 at 02:25, Stephen Hemminger  
>> wrote:
>> > On Wed, 11 May 2016 22:32:16 +0530
>> > Jerin Jacob  wrote:
>> >
>> >> On Wed, May 11, 2016 at 08:22:59AM -0700, Stephen Hemminger wrote:
>> >> > On Wed, 11 May 2016 19:17:58 +0530
>> >> > Hemant Agrawal  wrote:
>> >> >
>> >> > > IGB_UIO not supported for arm64 arch in kernel so disable.
>> >> > >
>> >> > > Signed-off-by: Hemant Agrawal 
>> >> > > Reviewed-by: Santosh Shukla 
>> >> >
>> >> > Really, I have use IGB_UIO on ARM64
>> >>
>> >> May I know what is the technical use case for igb_uio on arm64
>> >> which cannot be addressed through vfio or vfioionommu.
>> >
>> > I was running on older kernel which did not support vfioionommu mode.
>>
>> As I said, most of DPDK developers are not kernel developers. They may
>> have their own kernel tree, and couldn't like to upgrade to latest
>> kernel.
>> They can choose to use or not use igb_uio when binding the driver. But
>> blindly disabling it in the base config seems unreasonable.
>
> if user keeping his own kernel so they could also keep IGB_UIO=y in their 
> local
Most likely they don't have local dpdk tree. They write their own
applications, complie and link to dpdk lib, then done.

> dpdk tree. Why are you imposing user-x custome depedancy on upstream dpdk base
Customer requiremnts is important. I want they can choose the way they like.

> config. Is it not enough for explanation that - Base config ie.. armv8 doesn;t
> support pci mmap, so igb_uio is n/a. New user wont able to build/run 
> dpdk/arm64
> in igb_uio-way, He'll prefer to use upstream stuff. I think, you are not 
> making
You are wrong, he can build dpdk. If he like to use upstream without
patching, he can use vfio.
But you can't ignore the need from old user which is more comfortable
with older kernel.

> sense.
>

[dpdk-dev] [PATCHv3 1/2] config/armv8a: disable igb_uio

2016-05-12 Thread Jianbo Liu

On 12 May 2016 at 02:25, Stephen Hemminger  
wrote:
> On Wed, 11 May 2016 22:32:16 +0530
> Jerin Jacob  wrote:
>
>> On Wed, May 11, 2016 at 08:22:59AM -0700, Stephen Hemminger wrote:
>> > On Wed, 11 May 2016 19:17:58 +0530
>> > Hemant Agrawal  wrote:
>> >
>> > > IGB_UIO not supported for arm64 arch in kernel so disable.
>> > >
>> > > Signed-off-by: Hemant Agrawal 
>> > > Reviewed-by: Santosh Shukla 
>> >
>> > Really, I have use IGB_UIO on ARM64
>>
>> May I know what is the technical use case for igb_uio on arm64
>> which cannot be addressed through vfio or vfioionommu.
>
> I was running on older kernel which did not support vfioionommu mode.

As I said, most of DPDK developers are not kernel developers. They may
have their own kernel tree, and couldn't like to upgrade to latest
kernel.
They can choose to use or not use igb_uio when binding the driver. But
blindly disabling it in the base config seems unreasonable.

[dpdk-dev] [PATCH v3 2/4] ixgbe: implement vector PMD for arm architecture

2016-05-11 Thread Jianbo Liu

On 10 May 2016 at 22:49, Bruce Richardson  wrote:
> On Fri, May 06, 2016 at 11:55:46AM +0530, Jianbo Liu wrote:
>> use ARM NEON intrinsic to implement ixgbe vPMD
>>
>> Signed-off-by: Jianbo Liu 
>> ---
>>  drivers/net/ixgbe/Makefile  |   4 +
>>  drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 561 
>> 
>>  2 files changed, 565 insertions(+)
>>  create mode 100644 drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
>>
>> diff --git a/drivers/net/ixgbe/Makefile b/drivers/net/ixgbe/Makefile
>> index 50bf51c..b1c7a60 100644
>> --- a/drivers/net/ixgbe/Makefile
>> +++ b/drivers/net/ixgbe/Makefile
>> @@ -108,7 +108,11 @@ SRCS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += ixgbe_rxtx.c
>>  SRCS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += ixgbe_ethdev.c
>>  SRCS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += ixgbe_fdir.c
>>  SRCS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += ixgbe_pf.c
>> +ifeq ($(CONFIG_RTE_ARCH_ARM64),y)
>> +SRCS-$(CONFIG_RTE_IXGBE_INC_VECTOR) += ixgbe_rxtx_vec_neon.c
>> +else
>>  SRCS-$(CONFIG_RTE_IXGBE_INC_VECTOR) += ixgbe_rxtx_vec.c
>> +endif
>>
> Since you are adding ixgbe_rxtx_vec_neon.c here, it might be worthwhile adding
> in an extra patch to rename ixgbe_rxtx_vec.c to ixgbe_rxtx_vec_sse.c for
> consistency.
>
OK, I'll do that.

[dpdk-dev] [PATCH] ixgbe: rename ixgbe_rxtx_vec.c to ixgbe_rxtx_vec_sse.c

2016-05-11 Thread Jianbo Liu

To be consistent with the naming for ARM NEON implementation,
ixgbe_rxtx_vec.c is renamed to ixgbe_rxtx_vec_sse.c.

Signed-off-by: Jianbo Liu 
---
 drivers/net/ixgbe/Makefile   | 2 +-
 drivers/net/ixgbe/{ixgbe_rxtx_vec.c => ixgbe_rxtx_vec_sse.c} | 0
 2 files changed, 1 insertion(+), 1 deletion(-)
 rename drivers/net/ixgbe/{ixgbe_rxtx_vec.c => ixgbe_rxtx_vec_sse.c} (100%)

diff --git a/drivers/net/ixgbe/Makefile b/drivers/net/ixgbe/Makefile
index b1c7a60..12b63b4 100644
--- a/drivers/net/ixgbe/Makefile
+++ b/drivers/net/ixgbe/Makefile
@@ -111,7 +111,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += ixgbe_pf.c
 ifeq ($(CONFIG_RTE_ARCH_ARM64),y)
 SRCS-$(CONFIG_RTE_IXGBE_INC_VECTOR) += ixgbe_rxtx_vec_neon.c
 else
-SRCS-$(CONFIG_RTE_IXGBE_INC_VECTOR) += ixgbe_rxtx_vec.c
+SRCS-$(CONFIG_RTE_IXGBE_INC_VECTOR) += ixgbe_rxtx_vec_sse.c
 endif

 ifeq ($(CONFIG_RTE_NIC_BYPASS),y)
diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c 
b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c
similarity index 100%
rename from drivers/net/ixgbe/ixgbe_rxtx_vec.c
rename to drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c
-- 
2.4.11

[dpdk-dev] [PATCH] arm64: change rte_memcpy to inline function

2016-05-10 Thread Jianbo Liu

Other APP may call rte_memcpy by function pointer,
so change it to an inline function.

Signed-off-by: Jianbo Liu 
---
 lib/librte_eal/common/include/arch/arm/rte_memcpy_64.h | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/include/arch/arm/rte_memcpy_64.h 
b/lib/librte_eal/common/include/arch/arm/rte_memcpy_64.h
index 917cdc1..3abe7cd 100644
--- a/lib/librte_eal/common/include/arch/arm/rte_memcpy_64.h
+++ b/lib/librte_eal/common/include/arch/arm/rte_memcpy_64.h
@@ -78,7 +78,11 @@ rte_mov256(uint8_t *dst, const uint8_t *src)
memcpy(dst, src, 256);
 }

-#define rte_memcpy(d, s, n)memcpy((d), (s), (n))
+static inline void *
+rte_memcpy(void *dst, const void *src, size_t n)
+{
+   return memcpy(dst, src, n);
+}

 static inline void *
 rte_memcpy_func(void *dst, const void *src, size_t n)
-- 
2.4.11

[dpdk-dev] [PATCH] mk: Introduce NXP dpaa2 architecture based on armv8-a

2016-05-10 Thread Jianbo Liu

On 10 May 2016 at 00:17, Jerin Jacob  wrote:
> On Mon, May 09, 2016 at 11:22:15PM +0800, Jianbo Liu wrote:
>> On 9 May 2016 at 20:11, Jerin Jacob  
>> wrote:
>> > On Mon, May 09, 2016 at 07:02:36PM +0800, Jianbo Liu wrote:
>> >> On 9 May 2016 at 17:06, Jerin Jacob  
>> >> wrote:
>> >> > On Mon, May 09, 2016 at 07:18:22PM +0530, Hemant Agrawal wrote:
>> >> >> This patch introduces dpaa2 machine target to address difference
>> >> >> in cpu parameter, number of core to 8 and no numa support
>> >> >> w.r.t default armv8-a machine
>> >> >>
>> >> >> Signed-off-by: Hemant Agrawal 
>> >> >> ---
>
> Snip
>
>> >> >> +#
>> >> >> +# Compile Environment Abstraction Layer
>> >> >> +#
>> >> >> +CONFIG_RTE_MAX_LCORE=8
>> >> >> +CONFIG_RTE_MAX_NUMA_NODES=1
>> >> >> +CONFIG_RTE_EAL_IGB_UIO=n
>> >> >
>> >> > I think it makes sense to move this option to generic arm64 config
>> >> > as upstream arm64 kernel does not have support for sysfs based PCI mmap
>> >> > resource file,(/sys/bus/pci/devices/B:D:F/resource[_wc]X) need for
>> >> > CONFIG_RTE_EAL_IGB_UIO to work) and use VFIO for all cases.
>> >> >
>> >> > Any objections?
>> >> >
>> >> Is there any conflict to keep both?
>> >
>> > I would like to avoid the case like below in dpdk.org ml.
>> > http://dpdk.org/ml/archives/dev/2016-January/031313.html
>> >
>> So no conflict to enable both.
>
> IMO, Conflict part comes secondary, It does not even work with upstream 
> kernel.
> Why keep the broken configuration? Two main reasons I think it makes
> sense to disable
> - It is broken, I don't think arm64 kernel developers likes non VFIO approach
I don't think DPDK user is kernel developer in most cases. They maybe
like the traditional way.

> now. So mostly likely it will be broken
> - Trying to avoid out of tree patches wherever is possible as
> distribution folks like to work with upstream version.
Agree. But there is possible that people/company maintain their own kernel tree.

>
>> I'd rather keep as it is for armv8a defconfig, becasue it's the base,
>> any change may affect existing user.
> IMO, It makes sense to disable at armv8a defconfig otherwise all armv8
> variants need add CONFIG_RTE_EAL_IGB_UIO=n in all the configs and its
> arch specific issue.
We don't have to do that.
You didn't explictly disable this config in your current
defconfig_arm64-thunderx-linuxapp-gcc, but you know which module to
bind.

[dpdk-dev] [PATCH] mk: Introduce NXP dpaa2 architecture based on armv8-a

2016-05-10 Thread Jianbo Liu

On 9 May 2016 at 20:11, Jerin Jacob  wrote:
> On Mon, May 09, 2016 at 07:02:36PM +0800, Jianbo Liu wrote:
>> On 9 May 2016 at 17:06, Jerin Jacob  
>> wrote:
>> > On Mon, May 09, 2016 at 07:18:22PM +0530, Hemant Agrawal wrote:
>> >> This patch introduces dpaa2 machine target to address difference
>> >> in cpu parameter, number of core to 8 and no numa support
>> >> w.r.t default armv8-a machine
>> >>
>> >> Signed-off-by: Hemant Agrawal 
>> >> ---
>> >>  config/defconfig_arm64-dpaa2-linuxapp-gcc | 44 +++
>> >>  mk/machine/dpaa2/rte.vars.mk  | 60 
>> >> +++
>> >>  mk/rte.module.mk  |  5 +++
>> >>  3 files changed, 109 insertions(+)
>> >>  create mode 100644 config/defconfig_arm64-dpaa2-linuxapp-gcc
>> >>  create mode 100644 mk/machine/dpaa2/rte.vars.mk
>> >>
>> >> diff --git a/config/defconfig_arm64-dpaa2-linuxapp-gcc 
>> >> b/config/defconfig_arm64-dpaa2-linuxapp-gcc
>> >> new file mode 100644
>> >> index 000..80bda26
>> >> --- /dev/null
>> >> +++ b/config/defconfig_arm64-dpaa2-linuxapp-gcc
>> >> @@ -0,0 +1,44 @@
>> >> +#   BSD LICENSE
>> >> +#
>> >> +#   Copyright(c) 2016 Freescale Semiconductor, Inc. All rights reserved.
>> >> +#
>> >> +#   Redistribution and use in source and binary forms, with or without
>> >> +#   modification, are permitted provided that the following conditions
>> >> +#   are met:
>> >> +#
>> >> +# * Redistributions of source code must retain the above copyright
>> >> +#   notice, this list of conditions and the following disclaimer.
>> >> +# * Redistributions in binary form must reproduce the above copyright
>> >> +#   notice, this list of conditions and the following disclaimer in
>> >> +#   the documentation and/or other materials provided with the
>> >> +#   distribution.
>> >> +# * Neither the name of Freescale Semiconductor nor the names of its
>> >> +#   contributors may be used to endorse or promote products derived
>> >> +#   from this software without specific prior written permission.
>> >> +#
>> >> +#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
>> >> +#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
>> >> +#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
>> >> +#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
>> >> +#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
>> >> +#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
>> >> +#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
>> >> +#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
>> >> +#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
>> >> +#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
>> >> +#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
>> >> +#
>> >> +
>> >> +#include "defconfig_arm64-armv8a-linuxapp-gcc"
>> >> +
>> >> +# NXP (Freescale) - Soc Architecture with WRIOP and QBMAN support
>> >> +CONFIG_RTE_MACHINE="dpaa2"
>> >> +CONFIG_RTE_ARCH_ARM_TUNE="cortex-a57+fp+simd"
>> >> +
>> >> +#
>> >> +# Compile Environment Abstraction Layer
>> >> +#
>> >> +CONFIG_RTE_MAX_LCORE=8
>> >> +CONFIG_RTE_MAX_NUMA_NODES=1
>> >> +CONFIG_RTE_EAL_IGB_UIO=n
>> >
>> > I think it makes sense to move this option to generic arm64 config
>> > as upstream arm64 kernel does not have support for sysfs based PCI mmap
>> > resource file,(/sys/bus/pci/devices/B:D:F/resource[_wc]X) need for
>> > CONFIG_RTE_EAL_IGB_UIO to work) and use VFIO for all cases.
>> >
>> > Any objections?
>> >
>> Is there any conflict to keep both?
>
> I would like to avoid the case like below in dpdk.org ml.
> http://dpdk.org/ml/archives/dev/2016-January/031313.html
>
So no conflict to enable both.
I'd rather keep as it is for armv8a defconfig, becasue it's the base,
any change may affect existing user.

[dpdk-dev] [PATCH] mk: Introduce NXP dpaa2 architecture based on armv8-a

2016-05-09 Thread Jianbo Liu

On 9 May 2016 at 17:06, Jerin Jacob  wrote:
> On Mon, May 09, 2016 at 07:18:22PM +0530, Hemant Agrawal wrote:
>> This patch introduces dpaa2 machine target to address difference
>> in cpu parameter, number of core to 8 and no numa support
>> w.r.t default armv8-a machine
>>
>> Signed-off-by: Hemant Agrawal 
>> ---
>>  config/defconfig_arm64-dpaa2-linuxapp-gcc | 44 +++
>>  mk/machine/dpaa2/rte.vars.mk  | 60 
>> +++
>>  mk/rte.module.mk  |  5 +++
>>  3 files changed, 109 insertions(+)
>>  create mode 100644 config/defconfig_arm64-dpaa2-linuxapp-gcc
>>  create mode 100644 mk/machine/dpaa2/rte.vars.mk
>>
>> diff --git a/config/defconfig_arm64-dpaa2-linuxapp-gcc 
>> b/config/defconfig_arm64-dpaa2-linuxapp-gcc
>> new file mode 100644
>> index 000..80bda26
>> --- /dev/null
>> +++ b/config/defconfig_arm64-dpaa2-linuxapp-gcc
>> @@ -0,0 +1,44 @@
>> +#   BSD LICENSE
>> +#
>> +#   Copyright(c) 2016 Freescale Semiconductor, Inc. All rights reserved.
>> +#
>> +#   Redistribution and use in source and binary forms, with or without
>> +#   modification, are permitted provided that the following conditions
>> +#   are met:
>> +#
>> +# * Redistributions of source code must retain the above copyright
>> +#   notice, this list of conditions and the following disclaimer.
>> +# * Redistributions in binary form must reproduce the above copyright
>> +#   notice, this list of conditions and the following disclaimer in
>> +#   the documentation and/or other materials provided with the
>> +#   distribution.
>> +# * Neither the name of Freescale Semiconductor nor the names of its
>> +#   contributors may be used to endorse or promote products derived
>> +#   from this software without specific prior written permission.
>> +#
>> +#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
>> +#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
>> +#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
>> +#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
>> +#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
>> +#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
>> +#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
>> +#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
>> +#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
>> +#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
>> +#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
>> +#
>> +
>> +#include "defconfig_arm64-armv8a-linuxapp-gcc"
>> +
>> +# NXP (Freescale) - Soc Architecture with WRIOP and QBMAN support
>> +CONFIG_RTE_MACHINE="dpaa2"
>> +CONFIG_RTE_ARCH_ARM_TUNE="cortex-a57+fp+simd"
>> +
>> +#
>> +# Compile Environment Abstraction Layer
>> +#
>> +CONFIG_RTE_MAX_LCORE=8
>> +CONFIG_RTE_MAX_NUMA_NODES=1
>> +CONFIG_RTE_EAL_IGB_UIO=n
>
> I think it makes sense to move this option to generic arm64 config
> as upstream arm64 kernel does not have support for sysfs based PCI mmap
> resource file,(/sys/bus/pci/devices/B:D:F/resource[_wc]X) need for
> CONFIG_RTE_EAL_IGB_UIO to work) and use VFIO for all cases.
>
> Any objections?
>
Is there any conflict to keep both?

[dpdk-dev] [PATCH v3 4/4] maintainers: claim responsibility for ixgbe vector PMD on ARM

2016-05-06 Thread Jianbo Liu

Signed-off-by: Jianbo Liu 
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index ba4053a..78b46e2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -142,6 +142,7 @@ F: lib/librte_eal/common/include/arch/arm/*_64.h
 F: lib/librte_acl/acl_run_neon.*
 F: lib/librte_lpm/rte_lpm_neon.h
 F: lib/librte_hash/rte*_arm64.h
+F: drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c

 EZchip TILE-Gx
 M: Zhigang Lu 
-- 
2.4.11

[dpdk-dev] [PATCH v3 3/4] ixgbe: enable ixgbe vector PMD on ARMv8a platform

2016-05-06 Thread Jianbo Liu

Signed-off-by: Jianbo Liu 
---
 config/defconfig_arm64-armv8a-linuxapp-gcc | 1 -
 1 file changed, 1 deletion(-)

diff --git a/config/defconfig_arm64-armv8a-linuxapp-gcc 
b/config/defconfig_arm64-armv8a-linuxapp-gcc
index 9abeca4..98cc054 100644
--- a/config/defconfig_arm64-armv8a-linuxapp-gcc
+++ b/config/defconfig_arm64-armv8a-linuxapp-gcc
@@ -42,7 +42,6 @@ CONFIG_RTE_FORCE_INTRINSICS=y
 CONFIG_RTE_TOOLCHAIN="gcc"
 CONFIG_RTE_TOOLCHAIN_GCC=y

-CONFIG_RTE_IXGBE_INC_VECTOR=n
 CONFIG_RTE_LIBRTE_IVSHMEM=n
 CONFIG_RTE_LIBRTE_FM10K_PMD=n
 CONFIG_RTE_LIBRTE_I40E_PMD=n
-- 
2.4.11

[dpdk-dev] [PATCH v3 2/4] ixgbe: implement vector PMD for arm architecture

2016-05-06 Thread Jianbo Liu

use ARM NEON intrinsic to implement ixgbe vPMD

Signed-off-by: Jianbo Liu 
---
 drivers/net/ixgbe/Makefile  |   4 +
 drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 561 
 2 files changed, 565 insertions(+)
 create mode 100644 drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c

diff --git a/drivers/net/ixgbe/Makefile b/drivers/net/ixgbe/Makefile
index 50bf51c..b1c7a60 100644
--- a/drivers/net/ixgbe/Makefile
+++ b/drivers/net/ixgbe/Makefile
@@ -108,7 +108,11 @@ SRCS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += ixgbe_rxtx.c
 SRCS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += ixgbe_ethdev.c
 SRCS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += ixgbe_fdir.c
 SRCS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += ixgbe_pf.c
+ifeq ($(CONFIG_RTE_ARCH_ARM64),y)
+SRCS-$(CONFIG_RTE_IXGBE_INC_VECTOR) += ixgbe_rxtx_vec_neon.c
+else
 SRCS-$(CONFIG_RTE_IXGBE_INC_VECTOR) += ixgbe_rxtx_vec.c
+endif

 ifeq ($(CONFIG_RTE_NIC_BYPASS),y)
 SRCS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += ixgbe_bypass.c
diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c 
b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
new file mode 100644
index 000..11a6115
--- /dev/null
+++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
@@ -0,0 +1,561 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+#include 
+
+#include "ixgbe_ethdev.h"
+#include "ixgbe_rxtx.h"
+#include "ixgbe_rxtx_vec_common.h"
+
+#include 
+
+#pragma GCC diagnostic ignored "-Wcast-qual"
+
+static inline void
+ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq)
+{
+   int i;
+   uint16_t rx_id;
+   volatile union ixgbe_adv_rx_desc *rxdp;
+   struct ixgbe_rx_entry *rxep = >sw_ring[rxq->rxrearm_start];
+   struct rte_mbuf *mb0, *mb1;
+   uint64x2_t dma_addr0, dma_addr1;
+   uint64x2_t zero = vdupq_n_u64(0);
+   uint64_t paddr;
+   uint8x8_t p;
+
+   rxdp = rxq->rx_ring + rxq->rxrearm_start;
+
+   /* Pull 'n' more MBUFs into the software ring */
+   if (unlikely(rte_mempool_get_bulk(rxq->mb_pool,
+ (void *)rxep,
+ RTE_IXGBE_RXQ_REARM_THRESH) < 0)) {
+   if (rxq->rxrearm_nb + RTE_IXGBE_RXQ_REARM_THRESH >=
+   rxq->nb_rx_desc) {
+   for (i = 0; i < RTE_IXGBE_DESCS_PER_LOOP; i++) {
+   rxep[i].mbuf = >fake_mbuf;
+   vst1q_u64((uint64_t *)[i].read,
+ zero);
+   }
+   }
+   rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed +=
+   RTE_IXGBE_RXQ_REARM_THRESH;
+   return;
+   }
+
+   p = vld1_u8((uint8_t *)>mbuf_initializer);
+
+   /* Initialize the mbufs in vector, process 2 mbufs in one loop */
+   for (i = 0; i < RTE_IXGBE_RXQ_REARM_THRESH; i += 2, rxep += 2) {
+   mb0 = rxep[0].mbuf;
+   mb1 = rxep[1].mbuf;
+
+   /*
+* Flush mbuf with pkt template.
+* Data to be rearmed is 6 bytes long.
+* Though, RX will overwrite ol_flags that are coming next
+* anyway. So overwrite whole 8 bytes with one load:
+* 6

[dpdk-dev] [PATCH v3 1/4] ixgbe: rearrange vector PMD code for x86

2016-05-06 Thread Jianbo Liu

move common code to new file "ixgbe_rxtx_vec_common.h",
and vPMD for x86 is implemented in ixgbe_rxtx_vec.c

Signed-off-by: Jianbo Liu 
Suggested-by: Bruce Richardson 
---
 drivers/net/ixgbe/ixgbe_rxtx_vec.c| 258 +--
 drivers/net/ixgbe/ixgbe_rxtx_vec_common.h | 327 ++
 2 files changed, 335 insertions(+), 250 deletions(-)
 create mode 100644 drivers/net/ixgbe/ixgbe_rxtx_vec_common.h

diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c 
b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
index c4d709b..5e2d621 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
@@ -37,6 +37,7 @@

 #include "ixgbe_ethdev.h"
 #include "ixgbe_rxtx.h"
+#include "ixgbe_rxtx_vec_common.h"

 #include 

@@ -420,69 +421,6 @@ ixgbe_recv_pkts_vec(void *rx_queue, struct rte_mbuf 
**rx_pkts,
return _recv_raw_pkts_vec(rx_queue, rx_pkts, nb_pkts, NULL);
 }

-static inline uint16_t
-reassemble_packets(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_bufs,
-  uint16_t nb_bufs, uint8_t *split_flags)
-{
-   struct rte_mbuf *pkts[nb_bufs]; /*finished pkts*/
-   struct rte_mbuf *start = rxq->pkt_first_seg;
-   struct rte_mbuf *end =  rxq->pkt_last_seg;
-   unsigned int pkt_idx, buf_idx;
-
-   for (buf_idx = 0, pkt_idx = 0; buf_idx < nb_bufs; buf_idx++) {
-   if (end != NULL) {
-   /* processing a split packet */
-   end->next = rx_bufs[buf_idx];
-   rx_bufs[buf_idx]->data_len += rxq->crc_len;
-
-   start->nb_segs++;
-   start->pkt_len += rx_bufs[buf_idx]->data_len;
-   end = end->next;
-
-   if (!split_flags[buf_idx]) {
-   /* it's the last packet of the set */
-   start->hash = end->hash;
-   start->ol_flags = end->ol_flags;
-   /* we need to strip crc for the whole packet */
-   start->pkt_len -= rxq->crc_len;
-   if (end->data_len > rxq->crc_len)
-   end->data_len -= rxq->crc_len;
-   else {
-   /* free up last mbuf */
-   struct rte_mbuf *secondlast = start;
-
-   start->nb_segs--;
-   while (secondlast->next != end)
-   secondlast = secondlast->next;
-   secondlast->data_len -= (rxq->crc_len -
-   end->data_len);
-   secondlast->next = NULL;
-   rte_pktmbuf_free_seg(end);
-   end = secondlast;
-   }
-   pkts[pkt_idx++] = start;
-   start = end = NULL;
-   }
-   } else {
-   /* not processing a split packet */
-   if (!split_flags[buf_idx]) {
-   /* not a split packet, save and skip */
-   pkts[pkt_idx++] = rx_bufs[buf_idx];
-   continue;
-   }
-   end = start = rx_bufs[buf_idx];
-   rx_bufs[buf_idx]->data_len += rxq->crc_len;
-   rx_bufs[buf_idx]->pkt_len += rxq->crc_len;
-   }
-   }
-
-   /* save the partial packet for next time */
-   rxq->pkt_first_seg = start;
-   rxq->pkt_last_seg = end;
-   memcpy(rx_bufs, pkts, pkt_idx * (sizeof(*pkts)));
-   return pkt_idx;
-}
-
 /*
  * vPMD receive routine that reassembles scattered packets
  *
@@ -546,73 +484,6 @@ vtx(volatile union ixgbe_adv_tx_desc *txdp,
vtx1(txdp, *pkt, flags);
 }

-static inline int __attribute__((always_inline))
-ixgbe_tx_free_bufs(struct ixgbe_tx_queue *txq)
-{
-   struct ixgbe_tx_entry_v *txep;
-   uint32_t status;
-   uint32_t n;
-   uint32_t i;
-   int nb_free = 0;
-   struct rte_mbuf *m, *free[RTE_IXGBE_TX_MAX_FREE_BUF_SZ];
-
-   /* check DD bit on threshold descriptor */
-   status = txq->tx_ring[txq->tx_next_dd].wb.status;
-   if (!(status & IXGBE_ADVTXD_STAT_DD))
-   return 0;
-
-   n = txq->tx_rs_thresh;
-
-   /*
-* first buffer to free from S/W ring is at index
-* tx_next_dd - (tx_rs_thresh-1)
-*/
-   txep = >sw_ring_v[txq->tx_next_dd - (n - 1)];
-   m = __rte_p

[dpdk-dev] [PATCH v3 0/4] ixgbe: enable ixgbe vector PMD on ARM

2016-05-06 Thread Jianbo Liu

Implement ixgbe vPMD on ARM with NEON intrinsic.

v3:
 - rebase to rel_16_07 branch on dpdk-next-net.

v2:
 - move the common code to new header file.

Jianbo Liu (4):
  ixgbe: rearrange vector PMD code for x86
  ixgbe: implement vector PMD for arm architecture
  ixgbe: enable ixgbe vector PMD on ARMv8a platform
  maintainers: claim responsibility for ixgbe vector PMD on ARM

 MAINTAINERS|   1 +
 config/defconfig_arm64-armv8a-linuxapp-gcc |   1 -
 drivers/net/ixgbe/Makefile |   4 +
 drivers/net/ixgbe/ixgbe_rxtx_vec.c | 258 +
 drivers/net/ixgbe/ixgbe_rxtx_vec_common.h  | 327 +
 drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c| 561 +
 6 files changed, 901 insertions(+), 251 deletions(-)
 create mode 100644 drivers/net/ixgbe/ixgbe_rxtx_vec_common.h
 create mode 100644 drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c

-- 
2.4.11

[dpdk-dev] [PATCH v2 4/4] maintainers: claim responsibility for ixgbe vector PMD on ARM

2016-04-26 Thread Jianbo Liu

Signed-off-by: Jianbo Liu 
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 1953ea2..20158e3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -142,6 +142,7 @@ F: lib/librte_eal/common/include/arch/arm/*_64.h
 F: lib/librte_acl/acl_run_neon.*
 F: lib/librte_lpm/rte_lpm_neon.h
 F: lib/librte_hash/rte*_arm64.h
+F: drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c

 EZchip TILE-Gx
 M: Zhigang Lu 
-- 
1.8.3.1

[dpdk-dev] [PATCH v2 3/4] ixgbe: enable ixgbe vector PMD on ARMv8a platform

2016-04-26 Thread Jianbo Liu

Signed-off-by: Jianbo Liu 
---
 config/defconfig_arm64-armv8a-linuxapp-gcc | 1 -
 1 file changed, 1 deletion(-)

diff --git a/config/defconfig_arm64-armv8a-linuxapp-gcc 
b/config/defconfig_arm64-armv8a-linuxapp-gcc
index 9abeca4..98cc054 100644
--- a/config/defconfig_arm64-armv8a-linuxapp-gcc
+++ b/config/defconfig_arm64-armv8a-linuxapp-gcc
@@ -42,7 +42,6 @@ CONFIG_RTE_FORCE_INTRINSICS=y
 CONFIG_RTE_TOOLCHAIN="gcc"
 CONFIG_RTE_TOOLCHAIN_GCC=y

-CONFIG_RTE_IXGBE_INC_VECTOR=n
 CONFIG_RTE_LIBRTE_IVSHMEM=n
 CONFIG_RTE_LIBRTE_FM10K_PMD=n
 CONFIG_RTE_LIBRTE_I40E_PMD=n
-- 
1.8.3.1

[dpdk-dev] [PATCH v2 2/4] ixgbe: implement vector PMD for arm architecture

2016-04-26 Thread Jianbo Liu

use ARM NEON intrinsic to implement ixgbe vPMD

Signed-off-by: Jianbo Liu 
---
 drivers/net/ixgbe/Makefile  |   4 +
 drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 556 
 2 files changed, 560 insertions(+)
 create mode 100644 drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c

diff --git a/drivers/net/ixgbe/Makefile b/drivers/net/ixgbe/Makefile
index 50bf51c..b1c7a60 100644
--- a/drivers/net/ixgbe/Makefile
+++ b/drivers/net/ixgbe/Makefile
@@ -108,7 +108,11 @@ SRCS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += ixgbe_rxtx.c
 SRCS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += ixgbe_ethdev.c
 SRCS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += ixgbe_fdir.c
 SRCS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += ixgbe_pf.c
+ifeq ($(CONFIG_RTE_ARCH_ARM64),y)
+SRCS-$(CONFIG_RTE_IXGBE_INC_VECTOR) += ixgbe_rxtx_vec_neon.c
+else
 SRCS-$(CONFIG_RTE_IXGBE_INC_VECTOR) += ixgbe_rxtx_vec.c
+endif

 ifeq ($(CONFIG_RTE_NIC_BYPASS),y)
 SRCS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += ixgbe_bypass.c
diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c 
b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
new file mode 100644
index 000..2d63490
--- /dev/null
+++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
@@ -0,0 +1,556 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+#include 
+
+#include "ixgbe_ethdev.h"
+#include "ixgbe_rxtx.h"
+#include "ixgbe_rxtx_vec_common.h"
+
+#include 
+
+#pragma GCC diagnostic ignored "-Wcast-qual"
+
+static inline void
+ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq)
+{
+   int i;
+   uint16_t rx_id;
+   volatile union ixgbe_adv_rx_desc *rxdp;
+   struct ixgbe_rx_entry *rxep = >sw_ring[rxq->rxrearm_start];
+   struct rte_mbuf *mb0, *mb1;
+   uint64x2_t dma_addr0, dma_addr1;
+   uint64x2_t zero = vdupq_n_u64(0);
+   uint64_t paddr;
+   uint8x8_t p;
+
+   rxdp = rxq->rx_ring + rxq->rxrearm_start;
+
+   /* Pull 'n' more MBUFs into the software ring */
+   if (unlikely(rte_mempool_get_bulk(rxq->mb_pool,
+ (void *)rxep,
+ RTE_IXGBE_RXQ_REARM_THRESH) < 0)) {
+   if (rxq->rxrearm_nb + RTE_IXGBE_RXQ_REARM_THRESH >=
+   rxq->nb_rx_desc) {
+   for (i = 0; i < RTE_IXGBE_DESCS_PER_LOOP; i++) {
+   rxep[i].mbuf = >fake_mbuf;
+   vst1q_u64((uint64_t *)[i].read,
+ zero);
+   }
+   }
+   rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed +=
+   RTE_IXGBE_RXQ_REARM_THRESH;
+   return;
+   }
+
+   p = vld1_u8((uint8_t *)>mbuf_initializer);
+
+   /* Initialize the mbufs in vector, process 2 mbufs in one loop */
+   for (i = 0; i < RTE_IXGBE_RXQ_REARM_THRESH; i += 2, rxep += 2) {
+   mb0 = rxep[0].mbuf;
+   mb1 = rxep[1].mbuf;
+
+   /*
+* Flush mbuf with pkt template.
+* Data to be rearmed is 6 bytes long.
+* Though, RX will overwrite ol_flags that are coming next
+* anyway. So overwrite whole 8 bytes with one load:
+* 6

[dpdk-dev] [PATCH v2 1/4] ixgbe: rearrange vector PMD code for x86

2016-04-26 Thread Jianbo Liu

move common code to new file "ixgbe_rxtx_vec_common.h",
and vPMD for x86 is implemented in ixgbe_rxtx_vec.c

Signed-off-by: Jianbo Liu 
Suggested-by: Bruce Richardson 
---
 drivers/net/ixgbe/ixgbe_rxtx_vec.c| 256 +--
 drivers/net/ixgbe/ixgbe_rxtx_vec_common.h | 325 ++
 2 files changed, 333 insertions(+), 248 deletions(-)
 create mode 100644 drivers/net/ixgbe/ixgbe_rxtx_vec_common.h

diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c 
b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
index 5040704..b704a57 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
@@ -37,6 +37,7 @@

 #include "ixgbe_ethdev.h"
 #include "ixgbe_rxtx.h"
+#include "ixgbe_rxtx_vec_common.h"

 #include 

@@ -414,69 +415,6 @@ ixgbe_recv_pkts_vec(void *rx_queue, struct rte_mbuf 
**rx_pkts,
return _recv_raw_pkts_vec(rx_queue, rx_pkts, nb_pkts, NULL);
 }

-static inline uint16_t
-reassemble_packets(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_bufs,
-   uint16_t nb_bufs, uint8_t *split_flags)
-{
-   struct rte_mbuf *pkts[nb_bufs]; /*finished pkts*/
-   struct rte_mbuf *start = rxq->pkt_first_seg;
-   struct rte_mbuf *end =  rxq->pkt_last_seg;
-   unsigned pkt_idx, buf_idx;
-
-   for (buf_idx = 0, pkt_idx = 0; buf_idx < nb_bufs; buf_idx++) {
-   if (end != NULL) {
-   /* processing a split packet */
-   end->next = rx_bufs[buf_idx];
-   rx_bufs[buf_idx]->data_len += rxq->crc_len;
-
-   start->nb_segs++;
-   start->pkt_len += rx_bufs[buf_idx]->data_len;
-   end = end->next;
-
-   if (!split_flags[buf_idx]) {
-   /* it's the last packet of the set */
-   start->hash = end->hash;
-   start->ol_flags = end->ol_flags;
-   /* we need to strip crc for the whole packet */
-   start->pkt_len -= rxq->crc_len;
-   if (end->data_len > rxq->crc_len)
-   end->data_len -= rxq->crc_len;
-   else {
-   /* free up last mbuf */
-   struct rte_mbuf *secondlast = start;
-
-   start->nb_segs--;
-   while (secondlast->next != end)
-   secondlast = secondlast->next;
-   secondlast->data_len -= (rxq->crc_len -
-   end->data_len);
-   secondlast->next = NULL;
-   rte_pktmbuf_free_seg(end);
-   end = secondlast;
-   }
-   pkts[pkt_idx++] = start;
-   start = end = NULL;
-   }
-   } else {
-   /* not processing a split packet */
-   if (!split_flags[buf_idx]) {
-   /* not a split packet, save and skip */
-   pkts[pkt_idx++] = rx_bufs[buf_idx];
-   continue;
-   }
-   end = start = rx_bufs[buf_idx];
-   rx_bufs[buf_idx]->data_len += rxq->crc_len;
-   rx_bufs[buf_idx]->pkt_len += rxq->crc_len;
-   }
-   }
-
-   /* save the partial packet for next time */
-   rxq->pkt_first_seg = start;
-   rxq->pkt_last_seg = end;
-   memcpy(rx_bufs, pkts, pkt_idx * (sizeof(*pkts)));
-   return pkt_idx;
-}
-
 /*
  * vPMD receive routine that reassembles scattered packets
  *
@@ -539,72 +477,6 @@ vtx(volatile union ixgbe_adv_tx_desc *txdp,
vtx1(txdp, *pkt, flags);
 }

-static inline int __attribute__((always_inline))
-ixgbe_tx_free_bufs(struct ixgbe_tx_queue *txq)
-{
-   struct ixgbe_tx_entry_v *txep;
-   uint32_t status;
-   uint32_t n;
-   uint32_t i;
-   int nb_free = 0;
-   struct rte_mbuf *m, *free[RTE_IXGBE_TX_MAX_FREE_BUF_SZ];
-
-   /* check DD bit on threshold descriptor */
-   status = txq->tx_ring[txq->tx_next_dd].wb.status;
-   if (!(status & IXGBE_ADVTXD_STAT_DD))
-   return 0;
-
-   n = txq->tx_rs_thresh;
-
-   /*
-* first buffer to free from S/W ring is at index
-* tx_next_dd - (tx_rs_thresh-1)
-*/
-   txep = >sw_ring_v[txq->tx_next_dd - (n - 1)];
-   m = __rte_pktmbuf_prefree_seg(txep[0].mbuf

[dpdk-dev] [PATCH 1/4] ixgbe: rearrange vector PMD code for x86

2016-04-26 Thread Jianbo Liu

On 26 April 2016 at 00:35, Bruce Richardson  
wrote:
> On Wed, Apr 20, 2016 at 09:44:59PM +0800, Jianbo Liu wrote:
>> move SSE-dependent code to new file "ixgbe_rxtx_vec_sse.h"
>>
>> Signed-off-by: Jianbo Liu 
>> ---
>>  drivers/net/ixgbe/ixgbe_rxtx_vec.c | 369 +
>>  drivers/net/ixgbe/ixgbe_rxtx_vec_sse.h | 408 
>> +
>>  2 files changed, 409 insertions(+), 368 deletions(-)
>>  create mode 100644 drivers/net/ixgbe/ixgbe_rxtx_vec_sse.h
>>
> Hi Jianbo,
>
> functionally I've given this a quick sanity test and see no issues with 
> performance
> on the x86(_64) side of things.
>
> However, in terms of how the driver split in done in this set of patches, I 
> think
> it might be better to reverse what goes in the header files and in the .c 
> files.
> Rather than having the common code in the .c file and the arch specific code 
> in
> the header file, I think the common code should be in a header file and the
> arch specific code in a .c file.
>
> The reason for this is the need for possibly different compiler flags to be
> passed for the vector drivers from the makefile e.g. as is done by my patchset
> for i40e [http://dpdk.org/dev/patchwork/patch/12082/]. This would be a bit 
> more
> awkward if that one C file is shared by multiple architectures, as we'd have
> architecture specific branches in both makefile and C file. As well as that,
> the possibility exists of multiple vector drivers for one architecture, e.g.
> an SSE and AVX driver for x86_64 with selection of code patch at runtime as 
> done
> by the ACL library. In that case, you want multiple vector code paths compiled
> with different CFLAG overrides, which necessitates different C files.
>
> Therefore, I think using a C file per instruction set/architecture, rather 
> than
> a header file per arch may be more expandable in future.
>

Good suggestion. I will submit v2 later.

Thanks!
Jianbo

[dpdk-dev] [PATCH 4/4] maintainers: claim responsibility for ixgbe vector PMD on ARM

2016-04-20 Thread Jianbo Liu

Signed-off-by: Jianbo Liu 
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 1953ea2..07a9a44 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -142,6 +142,7 @@ F: lib/librte_eal/common/include/arch/arm/*_64.h
 F: lib/librte_acl/acl_run_neon.*
 F: lib/librte_lpm/rte_lpm_neon.h
 F: lib/librte_hash/rte*_arm64.h
+F: drivers/net/ixgbe/ixgbe_rxtx_vec_neon.h

 EZchip TILE-Gx
 M: Zhigang Lu 
-- 
1.8.3.1

[dpdk-dev] [PATCH 3/4] ixgbe: enable ixgbe vector PMD on ARMv8a platform

2016-04-20 Thread Jianbo Liu

Signed-off-by: Jianbo Liu 
---
 config/defconfig_arm64-armv8a-linuxapp-gcc | 1 -
 1 file changed, 1 deletion(-)

diff --git a/config/defconfig_arm64-armv8a-linuxapp-gcc 
b/config/defconfig_arm64-armv8a-linuxapp-gcc
index 9abeca4..98cc054 100644
--- a/config/defconfig_arm64-armv8a-linuxapp-gcc
+++ b/config/defconfig_arm64-armv8a-linuxapp-gcc
@@ -42,7 +42,6 @@ CONFIG_RTE_FORCE_INTRINSICS=y
 CONFIG_RTE_TOOLCHAIN="gcc"
 CONFIG_RTE_TOOLCHAIN_GCC=y

-CONFIG_RTE_IXGBE_INC_VECTOR=n
 CONFIG_RTE_LIBRTE_IVSHMEM=n
 CONFIG_RTE_LIBRTE_FM10K_PMD=n
 CONFIG_RTE_LIBRTE_I40E_PMD=n
-- 
1.8.3.1

[dpdk-dev] [PATCH 2/4] ixgbe: implement vector PMD for arm architecture

2016-04-20 Thread Jianbo Liu

use ARM NEON intrinsic to implement ixgbe vPMD

Signed-off-by: Jianbo Liu 
---
 drivers/net/ixgbe/ixgbe_rxtx_vec.c  |   4 +
 drivers/net/ixgbe/ixgbe_rxtx_vec_neon.h | 371 
 2 files changed, 375 insertions(+)
 create mode 100644 drivers/net/ixgbe/ixgbe_rxtx_vec_neon.h

diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c 
b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
index 064a00b..9fcc956 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
@@ -38,7 +38,11 @@
 #include "ixgbe_ethdev.h"
 #include "ixgbe_rxtx.h"

+#ifdef RTE_ARCH_ARM64
+#include "ixgbe_rxtx_vec_neon.h"
+#else
 #include "ixgbe_rxtx_vec_sse.h"
+#endif

 /*
  * vPMD receive routine, only accept(nb_pkts >= RTE_IXGBE_DESCS_PER_LOOP)
diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.h 
b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.h
new file mode 100644
index 000..2f1e1ce
--- /dev/null
+++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.h
@@ -0,0 +1,371 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+#include 
+
+#include "ixgbe_ethdev.h"
+#include "ixgbe_rxtx.h"
+
+#include 
+
+#pragma GCC diagnostic ignored "-Wcast-qual"
+
+static inline void
+ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq)
+{
+   int i;
+   uint16_t rx_id;
+   volatile union ixgbe_adv_rx_desc *rxdp;
+   struct ixgbe_rx_entry *rxep = >sw_ring[rxq->rxrearm_start];
+   struct rte_mbuf *mb0, *mb1;
+   uint64x2_t dma_addr0, dma_addr1;
+   uint64x2_t zero = vdupq_n_u64(0);
+   uint64_t paddr;
+   uint8x8_t p;
+
+   rxdp = rxq->rx_ring + rxq->rxrearm_start;
+
+   /* Pull 'n' more MBUFs into the software ring */
+   if (unlikely(rte_mempool_get_bulk(rxq->mb_pool,
+ (void *)rxep,
+ RTE_IXGBE_RXQ_REARM_THRESH) < 0)) {
+   if (rxq->rxrearm_nb + RTE_IXGBE_RXQ_REARM_THRESH >=
+   rxq->nb_rx_desc) {
+   for (i = 0; i < RTE_IXGBE_DESCS_PER_LOOP; i++) {
+   rxep[i].mbuf = >fake_mbuf;
+   vst1q_u64((uint64_t *)[i].read,
+ zero);
+   }
+   }
+   rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed +=
+   RTE_IXGBE_RXQ_REARM_THRESH;
+   return;
+   }
+
+   p = vld1_u8((uint8_t *)>mbuf_initializer);
+
+   /* Initialize the mbufs in vector, process 2 mbufs in one loop */
+   for (i = 0; i < RTE_IXGBE_RXQ_REARM_THRESH; i += 2, rxep += 2) {
+   mb0 = rxep[0].mbuf;
+   mb1 = rxep[1].mbuf;
+
+   /*
+* Flush mbuf with pkt template.
+* Data to be rearmed is 6 bytes long.
+* Though, RX will overwrite ol_flags that are coming next
+* anyway. So overwrite whole 8 bytes with one load:
+* 6 bytes of rearm_data plus first 2 bytes of ol_flags.
+*/
+   vst1_u8((uint8_t *)>rearm_data, p);
+   paddr = mb0->buf_physaddr + RTE_PKTMBUF_HEADROOM;
+

[dpdk-dev] [PATCH 1/4] ixgbe: rearrange vector PMD code for x86

2016-04-20 Thread Jianbo Liu

move SSE-dependent code to new file "ixgbe_rxtx_vec_sse.h"

Signed-off-by: Jianbo Liu 
---
 drivers/net/ixgbe/ixgbe_rxtx_vec.c | 369 +
 drivers/net/ixgbe/ixgbe_rxtx_vec_sse.h | 408 +
 2 files changed, 409 insertions(+), 368 deletions(-)
 create mode 100644 drivers/net/ixgbe/ixgbe_rxtx_vec_sse.h

diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c 
b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
index 5040704..064a00b 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
@@ -38,364 +38,7 @@
 #include "ixgbe_ethdev.h"
 #include "ixgbe_rxtx.h"

-#include 
-
-#ifndef __INTEL_COMPILER
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
-
-static inline void
-ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq)
-{
-   int i;
-   uint16_t rx_id;
-   volatile union ixgbe_adv_rx_desc *rxdp;
-   struct ixgbe_rx_entry *rxep = >sw_ring[rxq->rxrearm_start];
-   struct rte_mbuf *mb0, *mb1;
-   __m128i hdr_room = _mm_set_epi64x(RTE_PKTMBUF_HEADROOM,
-   RTE_PKTMBUF_HEADROOM);
-   __m128i dma_addr0, dma_addr1;
-
-   const __m128i hba_msk = _mm_set_epi64x(0, UINT64_MAX);
-
-   rxdp = rxq->rx_ring + rxq->rxrearm_start;
-
-   /* Pull 'n' more MBUFs into the software ring */
-   if (rte_mempool_get_bulk(rxq->mb_pool,
-(void *)rxep,
-RTE_IXGBE_RXQ_REARM_THRESH) < 0) {
-   if (rxq->rxrearm_nb + RTE_IXGBE_RXQ_REARM_THRESH >=
-   rxq->nb_rx_desc) {
-   dma_addr0 = _mm_setzero_si128();
-   for (i = 0; i < RTE_IXGBE_DESCS_PER_LOOP; i++) {
-   rxep[i].mbuf = >fake_mbuf;
-   _mm_store_si128((__m128i *)[i].read,
-   dma_addr0);
-   }
-   }
-   rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed +=
-   RTE_IXGBE_RXQ_REARM_THRESH;
-   return;
-   }
-
-   /* Initialize the mbufs in vector, process 2 mbufs in one loop */
-   for (i = 0; i < RTE_IXGBE_RXQ_REARM_THRESH; i += 2, rxep += 2) {
-   __m128i vaddr0, vaddr1;
-   uintptr_t p0, p1;
-
-   mb0 = rxep[0].mbuf;
-   mb1 = rxep[1].mbuf;
-
-   /*
-* Flush mbuf with pkt template.
-* Data to be rearmed is 6 bytes long.
-* Though, RX will overwrite ol_flags that are coming next
-* anyway. So overwrite whole 8 bytes with one load:
-* 6 bytes of rearm_data plus first 2 bytes of ol_flags.
-*/
-   p0 = (uintptr_t)>rearm_data;
-   *(uint64_t *)p0 = rxq->mbuf_initializer;
-   p1 = (uintptr_t)>rearm_data;
-   *(uint64_t *)p1 = rxq->mbuf_initializer;
-
-   /* load buf_addr(lo 64bit) and buf_physaddr(hi 64bit) */
-   vaddr0 = _mm_loadu_si128((__m128i *)&(mb0->buf_addr));
-   vaddr1 = _mm_loadu_si128((__m128i *)&(mb1->buf_addr));
-
-   /* convert pa to dma_addr hdr/data */
-   dma_addr0 = _mm_unpackhi_epi64(vaddr0, vaddr0);
-   dma_addr1 = _mm_unpackhi_epi64(vaddr1, vaddr1);
-
-   /* add headroom to pa values */
-   dma_addr0 = _mm_add_epi64(dma_addr0, hdr_room);
-   dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room);
-
-   /* set Header Buffer Address to zero */
-   dma_addr0 =  _mm_and_si128(dma_addr0, hba_msk);
-   dma_addr1 =  _mm_and_si128(dma_addr1, hba_msk);
-
-   /* flush desc with pa dma_addr */
-   _mm_store_si128((__m128i *)++->read, dma_addr0);
-   _mm_store_si128((__m128i *)++->read, dma_addr1);
-   }
-
-   rxq->rxrearm_start += RTE_IXGBE_RXQ_REARM_THRESH;
-   if (rxq->rxrearm_start >= rxq->nb_rx_desc)
-   rxq->rxrearm_start = 0;
-
-   rxq->rxrearm_nb -= RTE_IXGBE_RXQ_REARM_THRESH;
-
-   rx_id = (uint16_t) ((rxq->rxrearm_start == 0) ?
-(rxq->nb_rx_desc - 1) : (rxq->rxrearm_start - 1));
-
-   /* Update the tail pointer on the NIC */
-   IXGBE_PCI_REG_WRITE(rxq->rdt_reg_addr, rx_id);
-}
-
-/* Handling the offload flags (olflags) field takes computation
- * time when receiving packets. Therefore we provide a flag to disable
- * the processing of the olflags field when they are not needed. This
- * gives improved performance, at the cost of losing the offload info
- * in the received packet
- */
-#ifdef RTE_IXGBE_RX_OLFLAGS_ENABLE
-
-#define VTAG_SHIFT (3)
-
-static inline void
-desc_to_olflags_v(__m128i descs[4], struct rte_mbuf **rx_pkt

[dpdk-dev] [PATCH] ixgbe: avoid unnessary break when checking at the tail of rx hwring

2016-03-28 Thread Jianbo Liu

Hi Qian,

On 28 March 2016 at 10:30, Xu, Qian Q  wrote:
> Jianbo
> Could you tell me the case that can reproduce the issue? We can help evaluate 
> the impact of performance on ixgbe, but I'm not sure how to check if your 
> patch really fix a problem because I don?t know how to reproduce the problem! 
> Could you first teach me on how to reproduce your issue? Or you may not 
> reproduce it by yourself?
>
It is more an refactoring to original design than fixing an issue. So
I don't know how to reproduce either.
Can you use your usual performance testing cases first, and see if
there is any impact or improvement?

Thanks!
Jianbo

[dpdk-dev] [RFC 0/6] Flattened Device Tree access from DPDK

2016-03-28 Thread Jianbo Liu

On 26 March 2016 at 09:12, Jan Viktorin  wrote:
> Hello,
>
> while extending the DPDK by a kind of platform devices (for the 16.07), an
> access to the FDT might be necessary (or at least very helpful). This patch
> series for 16.07 introduces an approach to solve this topic.
>
> The API is designed from scratch and there is only the Linux backend for it.
> The Linux backend can read and traverse the /proc/device-tree structure. The
> API, however, stays as independent as possible. It is possible to:
>
> * open the FDT in a platform independent way (rte_fdt_open/close)
> * define a path in the FDT in an abstract way (rte_fdt_path)
> * read strings, 32 and 64 bit values, a binary content (rte_fdt_path_readX)
> * walk the FDT structure from a selected point (rte_fdt_path_walk)
>
> I've included unit tests of the API and of the Linux implemention. Some basic
> API tests are introduced in the patch 3. Then a simplified device-tree file
> structure is added together with more tests testing the Linux backend (4,5).
> I've left those 3 patches separated for now but I think they can be aggregated
> into a single patch later.
>
> Here, I've encounter an issue. The testing FDT files (app/test/linux-fdt) need
> to be copied (or linked) to the working directory of the _test_ executable. I
> have no idea, how to integrate such logic into the build system.
>
Why not store FDT files in the code, for example, as a group of binary arrays?
When test is executed, it firstly creates the files in the working
directory from those arrays.

Jianbo

[dpdk-dev] [PATCH] ixgbe: avoid unnessary break when checking at the tail of rx hwring

2016-03-25 Thread Jianbo Liu

On 22 March 2016 at 22:27, Ananyev, Konstantin
 wrote:
>
>
>> -Original Message-----
>> From: Jianbo Liu [mailto:jianbo.liu at linaro.org]
>> Sent: Monday, March 21, 2016 2:27 AM
>> To: Richardson, Bruce
>> Cc: Lu, Wenzhuo; Zhang, Helin; Ananyev, Konstantin; dev at dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH] ixgbe: avoid unnessary break when checking 
>> at the tail of rx hwring
>>
>> On 18 March 2016 at 18:03, Bruce Richardson  
>> wrote:
>> > On Thu, Mar 17, 2016 at 10:20:01AM +0800, Jianbo Liu wrote:
>> >> On 16 March 2016 at 19:14, Bruce Richardson > >> intel.com> wrote:
>> >> > On Wed, Mar 16, 2016 at 03:51:53PM +0800, Jianbo Liu wrote:
>> >> >> Hi Wenzhuo,
>> >> >>
>> >> >> On 16 March 2016 at 14:06, Lu, Wenzhuo  wrote:
>> >> >> > HI Jianbo,
>> >> >> >
>> >> >> >
>> >> >> >> -Original Message-
>> >> >> >> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jianbo Liu
>> >> >> >> Sent: Monday, March 14, 2016 10:26 PM
>> >> >> >> To: Zhang, Helin; Ananyev, Konstantin; dev at dpdk.org
>> >> >> >> Cc: Jianbo Liu
>> >> >> >> Subject: [dpdk-dev] [PATCH] ixgbe: avoid unnessary break when 
>> >> >> >> checking at the
>> >> >> >> tail of rx hwring
>> >> >> >>
>> >> >> >> When checking rx ring queue, it's possible that loop will break at 
>> >> >> >> the tail while
>> >> >> >> there are packets still in the queue header.
>> >> >> > Would you like to give more details about in what scenario this 
>> >> >> > issue will be hit? Thanks.
>> >> >> >
>> >> >>
>> >> >> vPMD will place extra RTE_IXGBE_DESCS_PER_LOOP - 1 number of empty
>> >> >> descriptiors at the end of hwring to avoid overflow when do checking
>> >> >> on rx side.
>> >> >>
>> >> >> For the loop in _recv_raw_pkts_vec(), we check 4 descriptors each
>> >> >> time. If all 4 DD are set, and all 4 packets are received.That's OK in
>> >> >> the middle.
>> >> >> But if come to the end of hwring, and less than 4 descriptors left, we
>> >> >> still need to check 4 descriptors at the same time, so the extra empty
>> >> >> descriptors are checked with them.
>> >> >> This time, the number of received packets is apparently less than 4,
>> >> >> and we break out of the loop because of the condition "var !=
>> >> >> RTE_IXGBE_DESCS_PER_LOOP".
>> >> >> So the problem arises. It is possible that there could be more packets
>> >> >> at the hwring beginning that still waiting for being received.
>> >> >> I think this fix can avoid this situation, and at least reduce the
>> >> >> latency for the packets in the header.
>> >> >>
>> >> > Packets are always received in order from the NIC, so no packets ever 
>> >> > get left
>> >> > behind or skipped on an RX burst call.
>> >> >
>> >> > /Bruce
>> >> >
>> >>
>> >> I knew packets are received in order, and no packets will be skipped,
>> >> but some will be left behind as I explained above.
>> >> vPMD will not received nb_pkts required by one RX burst call, and
>> >> those at the beginning of hwring are still waiting to be received till
>> >> the next call.
>> >>
>> >> Thanks!
>> >> Jianbo
>> > HI Jianbo,
>> >
>> > ok, I understand now. I'm not sure that this is a significant problem 
>> > though,
>> > since we are working in polling mode. Is there a performance impact to your
>> > change, because I don't think that we can reduce performance just to fix 
>> > this?
>> >
>> > Regards,
>> > /Bruce
>> It will be a problem because the possibility could be high.
>> Considering rx hwring size is 128 and rx burst is 32, the possiblity
>> can be 32/128.
>> I know this change is critical, so I want you (and maintainers) to do
>> full evaluations about throughput/latency..before making conclusion.
>
> I am still not sure what is a problem you are trying to solve here.
> Yes recv_raw_pkts_vec() call wouldn't wrap around HW ring boundary,
> and yes can return less packets that are actually available by the HW.
> Though as Bruce pointed, they'll be returned to the user by next call.
Have you thought of the interval between these two call, how long could it be?
If application is a simple one like l2fwd/testpmd, that's fine.
But if the interval is long because application has more work to do,
they are different.

> Actually recv_pkts_bulk_alloc() works in a similar way.
> Why do you consider that as a problem?
Driver should pull packets out of hardware and give them to APP as
fast as possible.
If not, there is a possibility that overflow the hardware queue by
more incoming packets.

I did some testings with pktgen-dpdk, and it behaves a little better
with this patch (at least not worse).
Sorry I can't provide more concreate evidences because I don't have
ixia/sprint equipment at hand.
That's why I asked you to do full evaluations before reject this patch. :-)

Thanks!

> Konstantin
>
>>
>> Jianbo

[dpdk-dev] [PATCH v3 4/4] eal/arm: introduce CONFIG_RTE_ARCH_ARM_NEON_MEMCPY

2016-03-21 Thread Jianbo Liu

On 20 March 2016 at 03:58, Jan Viktorin  wrote:
> The flag is used to enable memcpy optimizations in EAL. As it is not always
> the performance benefit, the flag allows to disable it.
>
> Signed-off-by: Jan Viktorin 
> ---
>  config/defconfig_arm-armv7a-linuxapp-gcc   | 1 +
>  lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h | 8 ++--
>  2 files changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc 
> b/config/defconfig_arm-armv7a-linuxapp-gcc
> index 96c3343..2c60c2c 100644
> --- a/config/defconfig_arm-armv7a-linuxapp-gcc
> +++ b/config/defconfig_arm-armv7a-linuxapp-gcc
> @@ -36,6 +36,7 @@ CONFIG_RTE_ARCH="arm"
>  CONFIG_RTE_ARCH_ARM=y
>  CONFIG_RTE_ARCH_ARMv7=y
>  CONFIG_RTE_ARCH_ARM_TUNE="cortex-a9"
> +CONFIG_RTE_ARCH_ARM_NEON_MEMCPY=y
>
If it's not always benefit, why not disable here since it is common
armv7a config, and enable in your or other user's own config file?

Thanks!
Jianbo

[dpdk-dev] [PATCH] ixgbe: avoid unnessary break when checking at the tail of rx hwring

2016-03-21 Thread Jianbo Liu

On 18 March 2016 at 18:03, Bruce Richardson  
wrote:
> On Thu, Mar 17, 2016 at 10:20:01AM +0800, Jianbo Liu wrote:
>> On 16 March 2016 at 19:14, Bruce Richardson  
>> wrote:
>> > On Wed, Mar 16, 2016 at 03:51:53PM +0800, Jianbo Liu wrote:
>> >> Hi Wenzhuo,
>> >>
>> >> On 16 March 2016 at 14:06, Lu, Wenzhuo  wrote:
>> >> > HI Jianbo,
>> >> >
>> >> >
>> >> >> -Original Message-
>> >> >> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jianbo Liu
>> >> >> Sent: Monday, March 14, 2016 10:26 PM
>> >> >> To: Zhang, Helin; Ananyev, Konstantin; dev at dpdk.org
>> >> >> Cc: Jianbo Liu
>> >> >> Subject: [dpdk-dev] [PATCH] ixgbe: avoid unnessary break when checking 
>> >> >> at the
>> >> >> tail of rx hwring
>> >> >>
>> >> >> When checking rx ring queue, it's possible that loop will break at the 
>> >> >> tail while
>> >> >> there are packets still in the queue header.
>> >> > Would you like to give more details about in what scenario this issue 
>> >> > will be hit? Thanks.
>> >> >
>> >>
>> >> vPMD will place extra RTE_IXGBE_DESCS_PER_LOOP - 1 number of empty
>> >> descriptiors at the end of hwring to avoid overflow when do checking
>> >> on rx side.
>> >>
>> >> For the loop in _recv_raw_pkts_vec(), we check 4 descriptors each
>> >> time. If all 4 DD are set, and all 4 packets are received.That's OK in
>> >> the middle.
>> >> But if come to the end of hwring, and less than 4 descriptors left, we
>> >> still need to check 4 descriptors at the same time, so the extra empty
>> >> descriptors are checked with them.
>> >> This time, the number of received packets is apparently less than 4,
>> >> and we break out of the loop because of the condition "var !=
>> >> RTE_IXGBE_DESCS_PER_LOOP".
>> >> So the problem arises. It is possible that there could be more packets
>> >> at the hwring beginning that still waiting for being received.
>> >> I think this fix can avoid this situation, and at least reduce the
>> >> latency for the packets in the header.
>> >>
>> > Packets are always received in order from the NIC, so no packets ever get 
>> > left
>> > behind or skipped on an RX burst call.
>> >
>> > /Bruce
>> >
>>
>> I knew packets are received in order, and no packets will be skipped,
>> but some will be left behind as I explained above.
>> vPMD will not received nb_pkts required by one RX burst call, and
>> those at the beginning of hwring are still waiting to be received till
>> the next call.
>>
>> Thanks!
>> Jianbo
> HI Jianbo,
>
> ok, I understand now. I'm not sure that this is a significant problem though,
> since we are working in polling mode. Is there a performance impact to your
> change, because I don't think that we can reduce performance just to fix this?
>
> Regards,
> /Bruce
It will be a problem because the possibility could be high.
Considering rx hwring size is 128 and rx burst is 32, the possiblity
can be 32/128.
I know this change is critical, so I want you (and maintainers) to do
full evaluations about throughput/latency..before making conclusion.

Jianbo

[dpdk-dev] [PATCH] ixgbe: avoid unnessary break when checking at the tail of rx hwring

2016-03-17 Thread Jianbo Liu

On 16 March 2016 at 19:14, Bruce Richardson  
wrote:
> On Wed, Mar 16, 2016 at 03:51:53PM +0800, Jianbo Liu wrote:
>> Hi Wenzhuo,
>>
>> On 16 March 2016 at 14:06, Lu, Wenzhuo  wrote:
>> > HI Jianbo,
>> >
>> >
>> >> -Original Message-
>> >> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jianbo Liu
>> >> Sent: Monday, March 14, 2016 10:26 PM
>> >> To: Zhang, Helin; Ananyev, Konstantin; dev at dpdk.org
>> >> Cc: Jianbo Liu
>> >> Subject: [dpdk-dev] [PATCH] ixgbe: avoid unnessary break when checking at 
>> >> the
>> >> tail of rx hwring
>> >>
>> >> When checking rx ring queue, it's possible that loop will break at the 
>> >> tail while
>> >> there are packets still in the queue header.
>> > Would you like to give more details about in what scenario this issue will 
>> > be hit? Thanks.
>> >
>>
>> vPMD will place extra RTE_IXGBE_DESCS_PER_LOOP - 1 number of empty
>> descriptiors at the end of hwring to avoid overflow when do checking
>> on rx side.
>>
>> For the loop in _recv_raw_pkts_vec(), we check 4 descriptors each
>> time. If all 4 DD are set, and all 4 packets are received.That's OK in
>> the middle.
>> But if come to the end of hwring, and less than 4 descriptors left, we
>> still need to check 4 descriptors at the same time, so the extra empty
>> descriptors are checked with them.
>> This time, the number of received packets is apparently less than 4,
>> and we break out of the loop because of the condition "var !=
>> RTE_IXGBE_DESCS_PER_LOOP".
>> So the problem arises. It is possible that there could be more packets
>> at the hwring beginning that still waiting for being received.
>> I think this fix can avoid this situation, and at least reduce the
>> latency for the packets in the header.
>>
> Packets are always received in order from the NIC, so no packets ever get left
> behind or skipped on an RX burst call.
>
> /Bruce
>

I knew packets are received in order, and no packets will be skipped,
but some will be left behind as I explained above.
vPMD will not received nb_pkts required by one RX burst call, and
those at the beginning of hwring are still waiting to be received till
the next call.

Thanks!
Jianbo

[dpdk-dev] [PATCH] ixgbe: avoid unnessary break when checking at the tail of rx hwring

2016-03-16 Thread Jianbo Liu

Hi Wenzhuo,

On 16 March 2016 at 14:06, Lu, Wenzhuo  wrote:
> HI Jianbo,
>
>
>> -Original Message-
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jianbo Liu
>> Sent: Monday, March 14, 2016 10:26 PM
>> To: Zhang, Helin; Ananyev, Konstantin; dev at dpdk.org
>> Cc: Jianbo Liu
>> Subject: [dpdk-dev] [PATCH] ixgbe: avoid unnessary break when checking at the
>> tail of rx hwring
>>
>> When checking rx ring queue, it's possible that loop will break at the tail 
>> while
>> there are packets still in the queue header.
> Would you like to give more details about in what scenario this issue will be 
> hit? Thanks.
>

vPMD will place extra RTE_IXGBE_DESCS_PER_LOOP - 1 number of empty
descriptiors at the end of hwring to avoid overflow when do checking
on rx side.

For the loop in _recv_raw_pkts_vec(), we check 4 descriptors each
time. If all 4 DD are set, and all 4 packets are received.That's OK in
the middle.
But if come to the end of hwring, and less than 4 descriptors left, we
still need to check 4 descriptors at the same time, so the extra empty
descriptors are checked with them.
This time, the number of received packets is apparently less than 4,
and we break out of the loop because of the condition "var !=
RTE_IXGBE_DESCS_PER_LOOP".
So the problem arises. It is possible that there could be more packets
at the hwring beginning that still waiting for being received.
I think this fix can avoid this situation, and at least reduce the
latency for the packets in the header.

Thanks!
Jianbo

[dpdk-dev] [PATCH] ixgbe: avoid unnessary break when checking at the tail of rx hwring

2016-03-14 Thread Jianbo Liu

When checking rx ring queue, it's possible that loop will break at the tail
while there are packets still in the queue header.

Signed-off-by: Jianbo Liu 
---
 drivers/net/ixgbe/ixgbe_rxtx_vec.c | 68 +-
 1 file changed, 38 insertions(+), 30 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c 
b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
index ccd93c7..611e431 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
@@ -206,10 +206,9 @@ static inline uint16_t
 _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
uint16_t nb_pkts, uint8_t *split_packet)
 {
-   volatile union ixgbe_adv_rx_desc *rxdp;
+   volatile union ixgbe_adv_rx_desc *rxdp, *rxdp_end;
struct ixgbe_rx_entry *sw_ring;
-   uint16_t nb_pkts_recd;
-   int pos;
+   uint16_t rev;
uint64_t var;
__m128i shuf_msk;
__m128i crc_adjust = _mm_set_epi16(
@@ -232,6 +231,7 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct 
rte_mbuf **rx_pkts,
/* Just the act of getting into the function from the application is
 * going to cost about 7 cycles */
rxdp = rxq->rx_ring + rxq->rx_tail;
+   rxdp_end = rxq->rx_ring + rxq->nb_rx_desc;

_mm_prefetch((const void *)rxdp, _MM_HINT_T0);

@@ -275,9 +275,7 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct 
rte_mbuf **rx_pkts,
 * [C*. extract the end-of-packet bit, if requested]
 * D. fill info. from desc to mbuf
 */
-   for (pos = 0, nb_pkts_recd = 0; pos < nb_pkts;
-   pos += RTE_IXGBE_DESCS_PER_LOOP,
-   rxdp += RTE_IXGBE_DESCS_PER_LOOP) {
+   for (rev = 0; rev < nb_pkts; ) {
__m128i descs0[RTE_IXGBE_DESCS_PER_LOOP];
__m128i descs[RTE_IXGBE_DESCS_PER_LOOP];
__m128i pkt_mb1, pkt_mb2, pkt_mb3, pkt_mb4;
@@ -285,17 +283,17 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct 
rte_mbuf **rx_pkts,
__m128i mbp1, mbp2; /* two mbuf pointer in one XMM reg. */

/* B.1 load 1 mbuf point */
-   mbp1 = _mm_loadu_si128((__m128i *)_ring[pos]);
+   mbp1 = _mm_loadu_si128((__m128i *)_ring[0]);

/* Read desc statuses backwards to avoid race condition */
/* A.1 load 4 pkts desc */
descs0[3] = _mm_loadu_si128((__m128i *)(rxdp + 3));

/* B.2 copy 2 mbuf point into rx_pkts  */
-   _mm_storeu_si128((__m128i *)_pkts[pos], mbp1);
+   _mm_storeu_si128((__m128i *)_pkts[rev], mbp1);

/* B.1 load 1 mbuf point */
-   mbp2 = _mm_loadu_si128((__m128i *)_ring[pos+2]);
+   mbp2 = _mm_loadu_si128((__m128i *)_ring[2]);

descs0[2] = _mm_loadu_si128((__m128i *)(rxdp + 2));
/* B.1 load 2 mbuf point */
@@ -303,13 +301,13 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct 
rte_mbuf **rx_pkts,
descs0[0] = _mm_loadu_si128((__m128i *)(rxdp));

/* B.2 copy 2 mbuf point into rx_pkts  */
-   _mm_storeu_si128((__m128i *)_pkts[pos+2], mbp2);
+   _mm_storeu_si128((__m128i *)_pkts[rev + 2], mbp2);

if (split_packet) {
-   rte_prefetch0(_pkts[pos]->cacheline1);
-   rte_prefetch0(_pkts[pos + 1]->cacheline1);
-   rte_prefetch0(_pkts[pos + 2]->cacheline1);
-   rte_prefetch0(_pkts[pos + 3]->cacheline1);
+   rte_prefetch0(_pkts[rev]->cacheline1);
+   rte_prefetch0(_pkts[rev + 1]->cacheline1);
+   rte_prefetch0(_pkts[rev + 2]->cacheline1);
+   rte_prefetch0(_pkts[rev + 3]->cacheline1);
}

/* A* mask out 0~3 bits RSS type */
@@ -333,7 +331,7 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct 
rte_mbuf **rx_pkts,
sterr_tmp1 = _mm_unpackhi_epi32(descs[1], descs[0]);

/* set ol_flags with vlan packet type */
-   desc_to_olflags_v(descs0, _pkts[pos]);
+   desc_to_olflags_v(descs0, _pkts[rev]);

/* D.2 pkt 3,4 set in_port/nb_seg and remove crc */
pkt_mb4 = _mm_add_epi16(pkt_mb4, crc_adjust);
@@ -348,9 +346,9 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct 
rte_mbuf **rx_pkts,
staterr = _mm_unpacklo_epi32(sterr_tmp1, sterr_tmp2);

/* D.3 copy final 3,4 data to rx_pkts */
-   _mm_storeu_si128((void *)_pkts[pos+3]->rx_descriptor_fields1,
+   _mm_storeu_si128((void *)_pkts[rev+3]->rx_descriptor_fields1,
pkt_mb4);
-   _mm_storeu_si128((void *)_pkts[pos+2]->rx_descriptor_fields1,
+   _mm_store

[dpdk-dev] [ [PATCH v2] 06/13] config: armv7/v8: Enable RTE_LIBRTE_VIRTIO_PMD

2015-12-15 Thread Jianbo Liu

On 14 December 2015 at 21:00, Santosh Shukla  wrote:
> Enable RTE_LIBRTE_VIRTIO_PMD for armv7/v8 and setting RTE_VIRTIO_INC_VEC=n.
> Builds successfully for armv7/v8.
>
> Signed-off-by: Santosh Shukla 
> ---
>  config/defconfig_arm-armv7a-linuxapp-gcc   |6 +-
>  config/defconfig_arm64-armv8a-linuxapp-gcc |6 +-
>  2 files changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc 
> b/config/defconfig_arm-armv7a-linuxapp-gcc
> index cbebd64..d840dc2 100644
> --- a/config/defconfig_arm-armv7a-linuxapp-gcc
> +++ b/config/defconfig_arm-armv7a-linuxapp-gcc
> @@ -43,6 +43,11 @@ CONFIG_RTE_FORCE_INTRINSICS=y
>  CONFIG_RTE_TOOLCHAIN="gcc"
>  CONFIG_RTE_TOOLCHAIN_GCC=y
>
> +# VIRTIO support for ARM
> +CONFIG_RTE_LIBRTE_VIRTIO_PMD=y

I don't think the above line is needed since already enabled in
config/common_linuxapp.

> +# Disable VIRTIO VECTOR support
> +CONFIG_RTE_VIRTIO_INC_VECTOR=n
> +
>  # ARM doesn't have support for vmware TSC map
>  CONFIG_RTE_LIBRTE_EAL_VMWARE_TSC_MAP_SUPPORT=n
>
> @@ -70,7 +75,6 @@ CONFIG_RTE_LIBRTE_I40E_PMD=n
>  CONFIG_RTE_LIBRTE_IXGBE_PMD=n
>  CONFIG_RTE_LIBRTE_MLX4_PMD=n
>  CONFIG_RTE_LIBRTE_MPIPE_PMD=n
> -CONFIG_RTE_LIBRTE_VIRTIO_PMD=n
>  CONFIG_RTE_LIBRTE_VMXNET3_PMD=n
>  CONFIG_RTE_LIBRTE_PMD_XENVIRT=n
>  CONFIG_RTE_LIBRTE_PMD_BNX2X=n
> diff --git a/config/defconfig_arm64-armv8a-linuxapp-gcc 
> b/config/defconfig_arm64-armv8a-linuxapp-gcc
> index 504f3ed..b3a4b28 100644
> --- a/config/defconfig_arm64-armv8a-linuxapp-gcc
> +++ b/config/defconfig_arm64-armv8a-linuxapp-gcc
> @@ -45,8 +45,12 @@ CONFIG_RTE_TOOLCHAIN_GCC=y
>
>  CONFIG_RTE_CACHE_LINE_SIZE=64
>
> +# Enable VIRTIO support for ARM
> +CONFIG_RTE_LIBRTE_VIRTIO_PMD=y
> +# Disable VIRTIO VECTOR support
> +CONFIG_RTE_VIRTIO_INC_VECTOR=n
> +
>  CONFIG_RTE_IXGBE_INC_VECTOR=n
> -CONFIG_RTE_LIBRTE_VIRTIO_PMD=n
>  CONFIG_RTE_LIBRTE_IVSHMEM=n
>  CONFIG_RTE_LIBRTE_FM10K_PMD=n
>  CONFIG_RTE_LIBRTE_I40E_PMD=n
> --
> 1.7.9.5
>

[dpdk-dev] [PATCH v2 2/3] eal/acl: enable acl for armv7-a

2015-12-08 Thread Jianbo Liu

On 8 December 2015 at 18:03, Thomas Monjalon  
wrote:
> 2015-12-08 15:56, Jianbo Liu:
>> On 8 December 2015 at 10:23, Thomas Monjalon  
>> wrote:
>> > 2015-12-08 09:50, Jianbo Liu:
>> >> On 8 December 2015 at 09:18, Thomas Monjalon > >> 6wind.com> wrote:
>> >> > 2015-12-03 23:02, Jianbo Liu:
>> >> >> -ifeq ($(CONFIG_RTE_ARCH_ARM64),y)
>> >> >> +ifneq ($(filter y,$(CONFIG_RTE_ARCH_ARM) $(CONFIG_RTE_ARCH_ARM64)),)
>> >> > [...]
>> >> >> +#ifdef RTE_ARCH_ARM
>> >> >> +/* NEON intrinsic vqtbl1q_u8() is not supported in ARMv7-A(AArch32) */
>> >> >
>> >> > I'm convinced there is a good reason why ARMv8 is also called 
>> >> > ARCH_ARM64,
>> >> > and ARMv7 may be called AArch32 or ARCH_ARM. But I don't know why?
>> >> >
>> >> https://lkml.org/lkml/2012/7/15/133
>> >>
>> >> > Is ARCH_ARM32 or ARCH_ARMv7 too simple?
>> >> > Is it possible to have a 32-bit ARMv8?
>> >> Yes, ARMv8-R/M
>> >
>> > So what does mean CONFIG_RTE_ARCH_ARM?
>> > ARMv7? ARM32?
>> > Please consider a renaming.
>>
>> I'd rather not renaming becase it can be both ARMv7 and AARCH32, which
>> are ISA compatibility.
>> If further differentiation is needed, CONFIG_RTE_ARCH_ARMv7 is added
>> in the config, just like Jan Viktorin did.
>
> I don't understand.
> You say CONFIG_RTE_ARCH_ARM is for ARMv7 and AARCH32, right?
> Both are 32-bit right?
> Why not rename it to CONFIG_RTE_ARCH_ARM32?

I understand that you want to make the naming more clear.
But arm/arm64 are used in Linux kernel, I think it's better to stay the same.

[dpdk-dev] [PATCH v2 2/3] eal/acl: enable acl for armv7-a

2015-12-08 Thread Jianbo Liu

On 8 December 2015 at 10:23, Thomas Monjalon  
wrote:
> 2015-12-08 09:50, Jianbo Liu:
>> On 8 December 2015 at 09:18, Thomas Monjalon  
>> wrote:
>> > 2015-12-03 23:02, Jianbo Liu:
>> >> -ifeq ($(CONFIG_RTE_ARCH_ARM64),y)
>> >> +ifneq ($(filter y,$(CONFIG_RTE_ARCH_ARM) $(CONFIG_RTE_ARCH_ARM64)),)
>> > [...]
>> >> +#ifdef RTE_ARCH_ARM
>> >> +/* NEON intrinsic vqtbl1q_u8() is not supported in ARMv7-A(AArch32) */
>> >
>> > I'm convinced there is a good reason why ARMv8 is also called ARCH_ARM64,
>> > and ARMv7 may be called AArch32 or ARCH_ARM. But I don't know why?
>> >
>> https://lkml.org/lkml/2012/7/15/133
>>
>> > Is ARCH_ARM32 or ARCH_ARMv7 too simple?
>> > Is it possible to have a 32-bit ARMv8?
>> Yes, ARMv8-R/M
>
> So what does mean CONFIG_RTE_ARCH_ARM?
> ARMv7? ARM32?
> Please consider a renaming.

I'd rather not renaming becase it can be both ARMv7 and AARCH32, which
are ISA compatibility.
If further differentiation is needed, CONFIG_RTE_ARCH_ARMv7 is added
in the config, just like Jan Viktorin did.

[dpdk-dev] [PATCH v2 2/3] eal/acl: enable acl for armv7-a

2015-12-08 Thread Jianbo Liu

On 8 December 2015 at 09:18, Thomas Monjalon  
wrote:
> 2015-12-03 23:02, Jianbo Liu:
>> -ifeq ($(CONFIG_RTE_ARCH_ARM64),y)
>> +ifneq ($(filter y,$(CONFIG_RTE_ARCH_ARM) $(CONFIG_RTE_ARCH_ARM64)),)
> [...]
>> +#ifdef RTE_ARCH_ARM
>> +/* NEON intrinsic vqtbl1q_u8() is not supported in ARMv7-A(AArch32) */
>
> I'm convinced there is a good reason why ARMv8 is also called ARCH_ARM64,
> and ARMv7 may be called AArch32 or ARCH_ARM. But I don't know why?
>
https://lkml.org/lkml/2012/7/15/133

> Is ARCH_ARM32 or ARCH_ARMv7 too simple?
> Is it possible to have a 32-bit ARMv8?
Yes, ARMv8-R/M

[dpdk-dev] [PATCH v2 1/3] lpm: make rte_lpm_lookupx4 API definition architecture agnostic

2015-12-07 Thread Jianbo Liu

On 4 December 2015 at 23:14, Jerin Jacob  
wrote:
> -Used architecture agnostic xmm_t to represent 128 bit SIMD variable
>
> -Introduced vect_* API abstraction in app/test to test rte_lpm_lookupx4
> API in  architecture agnostic way
>
> -Moved rte_lpm_lookupx4 SSE implementation to architecture specific
> rte_lpm_sse.h file to accommodate new rte_lpm_lookupx4 implementation
> for a different architecture.
>
> Signed-off-by: Jerin Jacob 
> ---
>  app/test/test_lpm.c  |  21 ---
>  app/test/test_xmmt_ops.h |  47 ++
>  lib/librte_lpm/Makefile  |   2 +
>  lib/librte_lpm/rte_lpm.h |  93 +---
>  lib/librte_lpm/rte_lpm_sse.h | 143 
> +++
>  5 files changed, 206 insertions(+), 100 deletions(-)
>  create mode 100644 app/test/test_xmmt_ops.h
>  create mode 100644 lib/librte_lpm/rte_lpm_sse.h
>
> diff --git a/app/test/test_lpm.c b/app/test/test_lpm.c
> index 8b4ded9..59674f1 100644
> --- a/app/test/test_lpm.c
> +++ b/app/test/test_lpm.c
> @@ -49,6 +49,7 @@
>
>  #include "rte_lpm.h"
>  #include "test_lpm_routes.h"
> +#include "test_xmmt_ops.h"
>
>  #define TEST_LPM_ASSERT(cond) do {   
>  \
> if (!(cond)) {
> \
> @@ -308,7 +309,7 @@ test6(void)
>  int32_t
>  test7(void)
>  {
> -   __m128i ipx4;
> +   xmm_t ipx4;
> uint16_t hop[4];
> struct rte_lpm *lpm = NULL;
> uint32_t ip = IPv4(0, 0, 0, 0);
> @@ -324,7 +325,7 @@ test7(void)
> status = rte_lpm_lookup(lpm, ip, _hop_return);
> TEST_LPM_ASSERT((status == 0) && (next_hop_return == next_hop_add));
>
> -   ipx4 = _mm_set_epi32(ip, ip + 0x100, ip - 0x100, ip);
> +   ipx4 = vect_set_epi32(ip, ip + 0x100, ip - 0x100, ip);
> rte_lpm_lookupx4(lpm, ipx4, hop, UINT16_MAX);
> TEST_LPM_ASSERT(hop[0] == next_hop_add);
> TEST_LPM_ASSERT(hop[1] == UINT16_MAX);
> @@ -354,7 +355,7 @@ test7(void)
>  int32_t
>  test8(void)
>  {
> -   __m128i ipx4;
> +   xmm_t ipx4;
> uint16_t hop[4];
> struct rte_lpm *lpm = NULL;
> uint32_t ip1 = IPv4(127, 255, 255, 255), ip2 = IPv4(128, 0, 0, 0);
> @@ -380,7 +381,7 @@ test8(void)
> TEST_LPM_ASSERT((status == 0) &&
> (next_hop_return == next_hop_add));
>
> -   ipx4 = _mm_set_epi32(ip2, ip1, ip2, ip1);
> +   ipx4 = vect_set_epi32(ip2, ip1, ip2, ip1);
> rte_lpm_lookupx4(lpm, ipx4, hop, UINT16_MAX);
> TEST_LPM_ASSERT(hop[0] == UINT16_MAX);
> TEST_LPM_ASSERT(hop[1] == next_hop_add);
> @@ -408,7 +409,7 @@ test8(void)
> status = rte_lpm_lookup(lpm, ip1, _hop_return);
> TEST_LPM_ASSERT(status == -ENOENT);
>
> -   ipx4 = _mm_set_epi32(ip1, ip1, ip2, ip2);
> +   ipx4 = vect_set_epi32(ip1, ip1, ip2, ip2);
> rte_lpm_lookupx4(lpm, ipx4, hop, UINT16_MAX);
> if (depth != 1) {
> TEST_LPM_ASSERT(hop[0] == next_hop_add);
> @@ -850,7 +851,7 @@ test11(void)
>  int32_t
>  test12(void)
>  {
> -   __m128i ipx4;
> +   xmm_t ipx4;
> uint16_t hop[4];
> struct rte_lpm *lpm = NULL;
> uint32_t ip, i;
> @@ -872,7 +873,7 @@ test12(void)
> TEST_LPM_ASSERT((status == 0) &&
> (next_hop_return == next_hop_add));
>
> -   ipx4 = _mm_set_epi32(ip, ip + 1, ip, ip - 1);
> +   ipx4 = vect_set_epi32(ip, ip + 1, ip, ip - 1);
> rte_lpm_lookupx4(lpm, ipx4, hop, UINT16_MAX);
> TEST_LPM_ASSERT(hop[0] == UINT16_MAX);
> TEST_LPM_ASSERT(hop[1] == next_hop_add);
> @@ -1289,10 +1290,10 @@ perf_test(void)
> begin = rte_rdtsc();
> for (j = 0; j < BATCH_SIZE; j += RTE_DIM(next_hops)) {
> unsigned k;
> -   __m128i ipx4;
> +   xmm_t ipx4;
>
> -   ipx4 = _mm_loadu_si128((__m128i *)(ip_batch + j));
> -   ipx4 = *(__m128i *)(ip_batch + j);
> +   ipx4 = vect_loadu_sil128((xmm_t *)(ip_batch + j));
> +   ipx4 = *(xmm_t *)(ip_batch + j);
> rte_lpm_lookupx4(lpm, ipx4, next_hops, UINT16_MAX);
> for (k = 0; k < RTE_DIM(next_hops); k++)
> if (unlikely(next_hops[k] == UINT16_MAX))
> diff --git a/app/test/test_xmmt_ops.h b/app/test/test_xmmt_ops.h
> new file mode 100644
> index 000..c055912
> --- /dev/null
> +++ b/app/test/test_xmmt_ops.h
Why add this new file under app/test, which is only for test app?
Should vect_loadu_sil128/vect_set_epi32 be in each ARCH's rte_vect.h?

> @@ -0,0 +1,47 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2015 Cavium Networks. All rights reserved.
>

[dpdk-dev] [PATCH v2 3/3] maintainers: claim resposibility for ARMv7 and ARMv8

2015-12-03 Thread Jianbo Liu

Signed-off-by: Jianbo Liu 
---
 MAINTAINERS | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 4478862..f859985 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -124,10 +124,12 @@ F: doc/guides/sample_app_ug/multi_process.rst

 ARM v7
 M: Jan Viktorin 
+M: Jianbo Liu 
 F: lib/librte_eal/common/include/arch/arm/

 ARM v8
 M: Jerin Jacob 
+M: Jianbo Liu 
 F: lib/librte_eal/common/include/arch/arm/*_64.h
 F: lib/librte_acl/acl_run_neon.*

-- 
1.8.3.1

[dpdk-dev] [PATCH v2 2/3] eal/acl: enable acl for armv7-a

2015-12-03 Thread Jianbo Liu

Implement vqtbl1q_u8 intrinsic function, which is not support in armv7-a.

Signed-off-by: Jianbo Liu 
---
 config/defconfig_arm-armv7a-linuxapp-gcc  |  1 -
 lib/librte_acl/Makefile   |  2 +-
 lib/librte_acl/rte_acl.c  |  5 -
 lib/librte_eal/common/include/arch/arm/rte_vect.h | 23 +++
 4 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc 
b/config/defconfig_arm-armv7a-linuxapp-gcc
index 9924ff9..cbebd64 100644
--- a/config/defconfig_arm-armv7a-linuxapp-gcc
+++ b/config/defconfig_arm-armv7a-linuxapp-gcc
@@ -53,7 +53,6 @@ CONFIG_RTE_LIBRTE_KNI=n
 CONFIG_RTE_EAL_IGB_UIO=n

 # fails to compile on ARM
-CONFIG_RTE_LIBRTE_ACL=n
 CONFIG_RTE_LIBRTE_LPM=n
 CONFIG_RTE_LIBRTE_TABLE=n
 CONFIG_RTE_LIBRTE_PIPELINE=n
diff --git a/lib/librte_acl/Makefile b/lib/librte_acl/Makefile
index 897237d..2e394c9 100644
--- a/lib/librte_acl/Makefile
+++ b/lib/librte_acl/Makefile
@@ -49,7 +49,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_bld.c
 SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_gen.c
 SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_run_scalar.c

-ifeq ($(CONFIG_RTE_ARCH_ARM64),y)
+ifneq ($(filter y,$(CONFIG_RTE_ARCH_ARM) $(CONFIG_RTE_ARCH_ARM64)),)
 SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_run_neon.c
 CFLAGS_acl_run_neon.o += -flax-vector-conversions -Wno-maybe-uninitialized
 else
diff --git a/lib/librte_acl/rte_acl.c b/lib/librte_acl/rte_acl.c
index e2fdebd..4ba9786 100644
--- a/lib/librte_acl/rte_acl.c
+++ b/lib/librte_acl/rte_acl.c
@@ -114,8 +114,11 @@ rte_acl_init(void)
 {
enum rte_acl_classify_alg alg = RTE_ACL_CLASSIFY_DEFAULT;

-#ifdef RTE_ARCH_ARM64
+#if defined(RTE_ARCH_ARM64)
alg =  RTE_ACL_CLASSIFY_NEON;
+#elif defined(RTE_ARCH_ARM)
+   if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_NEON))
+   alg =  RTE_ACL_CLASSIFY_NEON;
 #else
 #ifdef CC_AVX2_SUPPORT
if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
diff --git a/lib/librte_eal/common/include/arch/arm/rte_vect.h 
b/lib/librte_eal/common/include/arch/arm/rte_vect.h
index 21cdb4d..a33c054 100644
--- a/lib/librte_eal/common/include/arch/arm/rte_vect.h
+++ b/lib/librte_eal/common/include/arch/arm/rte_vect.h
@@ -53,6 +53,29 @@ typedef union rte_xmm {
double   pd[XMM_SIZE / sizeof(double)];
 } __attribute__((aligned(16))) rte_xmm_t;

+#ifdef RTE_ARCH_ARM
+/* NEON intrinsic vqtbl1q_u8() is not supported in ARMv7-A(AArch32) */
+static __inline uint8x16_t
+vqtbl1q_u8(uint8x16_t a, uint8x16_t b)
+{
+   uint8_t i, pos;
+   rte_xmm_t rte_a, rte_b, rte_ret;
+
+   vst1q_u8(rte_a.u8, a);
+   vst1q_u8(rte_b.u8, b);
+
+   for (i = 0; i < 16; i++) {
+   pos = rte_b.u8[i];
+   if (pos < 16)
+   rte_ret.u8[i] = rte_a.u8[pos];
+   else
+   rte_ret.u8[i] = 0;
+   }
+
+   return vld1q_u8(rte_ret.u8);
+}
+#endif
+
 #ifdef __cplusplus
 }
 #endif
-- 
1.8.3.1

[dpdk-dev] [PATCH v2 1/3] eal/arm: use RTE_ARM_EAL_RDTSC_USE_PMU in rte_cycle_32.h

2015-12-03 Thread Jianbo Liu

CONFIG_* from config files can not be used in code.

Signed-off-by: Jianbo Liu 
Acked-by: Jan Viktorin 
---
 lib/librte_eal/common/include/arch/arm/rte_cycles_32.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/include/arch/arm/rte_cycles_32.h 
b/lib/librte_eal/common/include/arch/arm/rte_cycles_32.h
index 6c6098e..9c1be71 100644
--- a/lib/librte_eal/common/include/arch/arm/rte_cycles_32.h
+++ b/lib/librte_eal/common/include/arch/arm/rte_cycles_32.h
@@ -54,7 +54,7 @@ extern "C" {
  * @return
  *   The time base for this lcore.
  */
-#ifndef CONFIG_RTE_ARM_EAL_RDTSC_USE_PMU
+#ifndef RTE_ARM_EAL_RDTSC_USE_PMU

 /**
  * This call is easily portable to any ARM architecture, however,
-- 
1.8.3.1

[dpdk-dev] [PATCH v2 0/3] support acl lib for armv7-a and a small fix

2015-12-03 Thread Jianbo Liu

This patchset includes a small fix in rte_cycle_32.h,
and support ACL for armv7-a platform.

v2:
- select alg as RTE_ACL_CLASSIFY_NEON only when NEON is checked in cpuflags.
- remove lpm/table/pipeline patch, and part of change will be merged into 
Jerin's.

Jianbo Liu (3):
  eal/arm: use RTE_ARM_EAL_RDTSC_USE_PMU in rte_cycle_32.h
  eal/acl: enable acl for armv7-a
  maintainers: claim resposibility for ARMv7 and ARMv8

 MAINTAINERS|  2 ++
 config/defconfig_arm-armv7a-linuxapp-gcc   |  1 -
 lib/librte_acl/Makefile|  2 +-
 lib/librte_acl/rte_acl.c   |  5 -
 .../common/include/arch/arm/rte_cycles_32.h|  2 +-
 lib/librte_eal/common/include/arch/arm/rte_vect.h  | 23 ++
 6 files changed, 31 insertions(+), 4 deletions(-)

-- 
1.8.3.1

[dpdk-dev] [PATCH 3/4] eal/arm: Enable lpm/table/pipeline libs

2015-12-02 Thread Jianbo Liu

On 2 December 2015 at 18:39, Jerin Jacob  
wrote:
> On Wed, Dec 02, 2015 at 05:49:41PM +0800, Jianbo Liu wrote:
>> On 2 December 2015 at 16:03, Jerin Jacob  
>> wrote:
>> > On Wed, Dec 02, 2015 at 02:54:52PM +0800, Jianbo Liu wrote:
>> >> On 2 December 2015 at 00:41, Jerin Jacob > >> caviumnetworks.com> wrote:
>> >> > On Tue, Dec 01, 2015 at 01:41:15PM -0500, Jianbo Liu wrote:
>> >> >> Adds ARM NEON support for lpm.
>> >> >> And enables table/pipeline libraries which depend on lpm.
>> >> >
>> >> > I already sent the patch on the same yesterday.
>> >> > We can converge the patches after the discussion.
>> >> > Please check "[dpdk-dev] [PATCH 0/3] add lpm support for NEON" on ml
>> >> >
>> >> Yes, I have read your patch. But there are many differences, so I sent
>> >> mine for your reviewing :)
>> >>
>> >> >
>> >> >>
>> >> >> Signed-off-by: Jianbo Liu 
>> >> >> ---
>> >> >>  config/defconfig_arm-armv7a-linuxapp-gcc  |  3 -
>> >> >>  config/defconfig_arm64-armv8a-linuxapp-gcc|  3 -
>> >> >>  lib/librte_eal/common/include/arch/arm/rte_vect.h | 28 ++
>> >> >>  lib/librte_lpm/rte_lpm.h  | 68 
>> >> >> ---
>> >> >>  4 files changed, 77 insertions(+), 25 deletions(-)
>> >> >>
>> >> >> diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc 
>> >> >> b/config/defconfig_arm-armv7a-linuxapp-gcc
>> >> >> index cbebd64..efffa1f 100644
>> >> >> --- a/config/defconfig_arm-armv7a-linuxapp-gcc
>> >> >> +++ b/config/defconfig_arm-armv7a-linuxapp-gcc
>> >> >> @@ -53,9 +53,6 @@ CONFIG_RTE_LIBRTE_KNI=n
>> >> >>  CONFIG_RTE_EAL_IGB_UIO=n
>> >> >>
>> >> >>  # fails to compile on ARM
>> >> >> -CONFIG_RTE_LIBRTE_LPM=n
>> >> >> -CONFIG_RTE_LIBRTE_TABLE=n
>> >> >> -CONFIG_RTE_LIBRTE_PIPELINE=n
>> >> >>  CONFIG_RTE_SCHED_VECTOR=n
>> >> >>
>> >> >>  # cannot use those on ARM
>> >> >> diff --git a/config/defconfig_arm64-armv8a-linuxapp-gcc 
>> >> >> b/config/defconfig_arm64-armv8a-linuxapp-gcc
>> >> >> index 504f3ed..57f7941 100644
>> >> >> --- a/config/defconfig_arm64-armv8a-linuxapp-gcc
>> >> >> +++ b/config/defconfig_arm64-armv8a-linuxapp-gcc
>> >> >> @@ -51,7 +51,4 @@ CONFIG_RTE_LIBRTE_IVSHMEM=n
>> >> >>  CONFIG_RTE_LIBRTE_FM10K_PMD=n
>> >> >>  CONFIG_RTE_LIBRTE_I40E_PMD=n
>> >> >>
>> >> >> -CONFIG_RTE_LIBRTE_LPM=n
>> >> >> -CONFIG_RTE_LIBRTE_TABLE=n
>> >> >> -CONFIG_RTE_LIBRTE_PIPELINE=n
>> >> >>  CONFIG_RTE_SCHED_VECTOR=n
>> >> >> diff --git a/lib/librte_eal/common/include/arch/arm/rte_vect.h 
>> >> >> b/lib/librte_eal/common/include/arch/arm/rte_vect.h
>> >> >> index a33c054..7437711 100644
>> >> >> --- a/lib/librte_eal/common/include/arch/arm/rte_vect.h
>> >> >> +++ b/lib/librte_eal/common/include/arch/arm/rte_vect.h
>> >> >> @@ -41,6 +41,8 @@ extern "C" {
>> >> >>
>> >> >>  typedef int32x4_t xmm_t;
>> >> >>
>> >> >> +typedef int32x4_t __m128i;
>> >> >> +
>> >> >>  #define  XMM_SIZE(sizeof(xmm_t))
>> >> >>  #define  XMM_MASK(XMM_SIZE - 1)
>> >> >>
>> >> >> @@ -53,6 +55,32 @@ typedef union rte_xmm {
>> >> >>   double   pd[XMM_SIZE / sizeof(double)];
>> >> >>  } __attribute__((aligned(16))) rte_xmm_t;
>> >> >>
>> >> >> +static __inline __m128i
>> >> >> +_mm_set_epi32(int i3, int i2, int i1, int i0)
>> >> >> +{
>> >> >> + int32_t r[4] = {i0, i1, i2, i3};
>> >> >> +
>> >> >> + return vld1q_s32(r);
>> >> >> +}
>> >> >> +
>> >> >> +static __inline __m128i
>> >> >> +_mm_loadu_si128(__m128i *p)
>> >> >> +{
>> >> >> + return vld1q_s32((int32_t *)p);
>> >> >> +}

[dpdk-dev] [PATCH 3/4] eal/arm: Enable lpm/table/pipeline libs

2015-12-02 Thread Jianbo Liu

On 2 December 2015 at 16:03, Jerin Jacob  
wrote:
> On Wed, Dec 02, 2015 at 02:54:52PM +0800, Jianbo Liu wrote:
>> On 2 December 2015 at 00:41, Jerin Jacob  
>> wrote:
>> > On Tue, Dec 01, 2015 at 01:41:15PM -0500, Jianbo Liu wrote:
>> >> Adds ARM NEON support for lpm.
>> >> And enables table/pipeline libraries which depend on lpm.
>> >
>> > I already sent the patch on the same yesterday.
>> > We can converge the patches after the discussion.
>> > Please check "[dpdk-dev] [PATCH 0/3] add lpm support for NEON" on ml
>> >
>> Yes, I have read your patch. But there are many differences, so I sent
>> mine for your reviewing :)
>>
>> >
>> >>
>> >> Signed-off-by: Jianbo Liu 
>> >> ---
>> >>  config/defconfig_arm-armv7a-linuxapp-gcc  |  3 -
>> >>  config/defconfig_arm64-armv8a-linuxapp-gcc|  3 -
>> >>  lib/librte_eal/common/include/arch/arm/rte_vect.h | 28 ++
>> >>  lib/librte_lpm/rte_lpm.h  | 68 
>> >> ---
>> >>  4 files changed, 77 insertions(+), 25 deletions(-)
>> >>
>> >> diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc 
>> >> b/config/defconfig_arm-armv7a-linuxapp-gcc
>> >> index cbebd64..efffa1f 100644
>> >> --- a/config/defconfig_arm-armv7a-linuxapp-gcc
>> >> +++ b/config/defconfig_arm-armv7a-linuxapp-gcc
>> >> @@ -53,9 +53,6 @@ CONFIG_RTE_LIBRTE_KNI=n
>> >>  CONFIG_RTE_EAL_IGB_UIO=n
>> >>
>> >>  # fails to compile on ARM
>> >> -CONFIG_RTE_LIBRTE_LPM=n
>> >> -CONFIG_RTE_LIBRTE_TABLE=n
>> >> -CONFIG_RTE_LIBRTE_PIPELINE=n
>> >>  CONFIG_RTE_SCHED_VECTOR=n
>> >>
>> >>  # cannot use those on ARM
>> >> diff --git a/config/defconfig_arm64-armv8a-linuxapp-gcc 
>> >> b/config/defconfig_arm64-armv8a-linuxapp-gcc
>> >> index 504f3ed..57f7941 100644
>> >> --- a/config/defconfig_arm64-armv8a-linuxapp-gcc
>> >> +++ b/config/defconfig_arm64-armv8a-linuxapp-gcc
>> >> @@ -51,7 +51,4 @@ CONFIG_RTE_LIBRTE_IVSHMEM=n
>> >>  CONFIG_RTE_LIBRTE_FM10K_PMD=n
>> >>  CONFIG_RTE_LIBRTE_I40E_PMD=n
>> >>
>> >> -CONFIG_RTE_LIBRTE_LPM=n
>> >> -CONFIG_RTE_LIBRTE_TABLE=n
>> >> -CONFIG_RTE_LIBRTE_PIPELINE=n
>> >>  CONFIG_RTE_SCHED_VECTOR=n
>> >> diff --git a/lib/librte_eal/common/include/arch/arm/rte_vect.h 
>> >> b/lib/librte_eal/common/include/arch/arm/rte_vect.h
>> >> index a33c054..7437711 100644
>> >> --- a/lib/librte_eal/common/include/arch/arm/rte_vect.h
>> >> +++ b/lib/librte_eal/common/include/arch/arm/rte_vect.h
>> >> @@ -41,6 +41,8 @@ extern "C" {
>> >>
>> >>  typedef int32x4_t xmm_t;
>> >>
>> >> +typedef int32x4_t __m128i;
>> >> +
>> >>  #define  XMM_SIZE(sizeof(xmm_t))
>> >>  #define  XMM_MASK(XMM_SIZE - 1)
>> >>
>> >> @@ -53,6 +55,32 @@ typedef union rte_xmm {
>> >>   double   pd[XMM_SIZE / sizeof(double)];
>> >>  } __attribute__((aligned(16))) rte_xmm_t;
>> >>
>> >> +static __inline __m128i
>> >> +_mm_set_epi32(int i3, int i2, int i1, int i0)
>> >> +{
>> >> + int32_t r[4] = {i0, i1, i2, i3};
>> >> +
>> >> + return vld1q_s32(r);
>> >> +}
>> >> +
>> >> +static __inline __m128i
>> >> +_mm_loadu_si128(__m128i *p)
>> >> +{
>> >> + return vld1q_s32((int32_t *)p);
>> >> +}
>> >> +
>> >> +static __inline __m128i
>> >> +_mm_set1_epi32(int i)
>> >> +{
>> >> + return vdupq_n_s32(i);
>> >> +}
>> >> +
>> >> +static __inline __m128i
>> >> +_mm_and_si128(__m128i a, __m128i b)
>> >> +{
>> >> + return vandq_s32(a, b);
>> >> +}
>> >> +
>
> IMO, it's not always good to emulate GCC defined intrinsics of
> other architecture. What if a legacy DPDK application has such mappings
> then BOOM, multiple definition, which one is correct? which one
> to comment it out? Integration pain starts for DPDK library consumer:-(
>
They can include rte_vect.h in build/include directly, which is linked correctly
to the one for that ARCH, so there is no need to worry about.


>> >
>> > IMO, it makes sense to

[dpdk-dev] [PATCH 3/4] eal/arm: Enable lpm/table/pipeline libs

2015-12-02 Thread Jianbo Liu

On 2 December 2015 at 00:41, Jerin Jacob  
wrote:
> On Tue, Dec 01, 2015 at 01:41:15PM -0500, Jianbo Liu wrote:
>> Adds ARM NEON support for lpm.
>> And enables table/pipeline libraries which depend on lpm.
>
> I already sent the patch on the same yesterday.
> We can converge the patches after the discussion.
> Please check "[dpdk-dev] [PATCH 0/3] add lpm support for NEON" on ml
>
Yes, I have read your patch. But there are many differences, so I sent
mine for your reviewing :)

>
>>
>> Signed-off-by: Jianbo Liu 
>> ---
>>  config/defconfig_arm-armv7a-linuxapp-gcc  |  3 -
>>  config/defconfig_arm64-armv8a-linuxapp-gcc|  3 -
>>  lib/librte_eal/common/include/arch/arm/rte_vect.h | 28 ++
>>  lib/librte_lpm/rte_lpm.h  | 68 
>> ---
>>  4 files changed, 77 insertions(+), 25 deletions(-)
>>
>> diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc 
>> b/config/defconfig_arm-armv7a-linuxapp-gcc
>> index cbebd64..efffa1f 100644
>> --- a/config/defconfig_arm-armv7a-linuxapp-gcc
>> +++ b/config/defconfig_arm-armv7a-linuxapp-gcc
>> @@ -53,9 +53,6 @@ CONFIG_RTE_LIBRTE_KNI=n
>>  CONFIG_RTE_EAL_IGB_UIO=n
>>
>>  # fails to compile on ARM
>> -CONFIG_RTE_LIBRTE_LPM=n
>> -CONFIG_RTE_LIBRTE_TABLE=n
>> -CONFIG_RTE_LIBRTE_PIPELINE=n
>>  CONFIG_RTE_SCHED_VECTOR=n
>>
>>  # cannot use those on ARM
>> diff --git a/config/defconfig_arm64-armv8a-linuxapp-gcc 
>> b/config/defconfig_arm64-armv8a-linuxapp-gcc
>> index 504f3ed..57f7941 100644
>> --- a/config/defconfig_arm64-armv8a-linuxapp-gcc
>> +++ b/config/defconfig_arm64-armv8a-linuxapp-gcc
>> @@ -51,7 +51,4 @@ CONFIG_RTE_LIBRTE_IVSHMEM=n
>>  CONFIG_RTE_LIBRTE_FM10K_PMD=n
>>  CONFIG_RTE_LIBRTE_I40E_PMD=n
>>
>> -CONFIG_RTE_LIBRTE_LPM=n
>> -CONFIG_RTE_LIBRTE_TABLE=n
>> -CONFIG_RTE_LIBRTE_PIPELINE=n
>>  CONFIG_RTE_SCHED_VECTOR=n
>> diff --git a/lib/librte_eal/common/include/arch/arm/rte_vect.h 
>> b/lib/librte_eal/common/include/arch/arm/rte_vect.h
>> index a33c054..7437711 100644
>> --- a/lib/librte_eal/common/include/arch/arm/rte_vect.h
>> +++ b/lib/librte_eal/common/include/arch/arm/rte_vect.h
>> @@ -41,6 +41,8 @@ extern "C" {
>>
>>  typedef int32x4_t xmm_t;
>>
>> +typedef int32x4_t __m128i;
>> +
>>  #define  XMM_SIZE(sizeof(xmm_t))
>>  #define  XMM_MASK(XMM_SIZE - 1)
>>
>> @@ -53,6 +55,32 @@ typedef union rte_xmm {
>>   double   pd[XMM_SIZE / sizeof(double)];
>>  } __attribute__((aligned(16))) rte_xmm_t;
>>
>> +static __inline __m128i
>> +_mm_set_epi32(int i3, int i2, int i1, int i0)
>> +{
>> + int32_t r[4] = {i0, i1, i2, i3};
>> +
>> + return vld1q_s32(r);
>> +}
>> +
>> +static __inline __m128i
>> +_mm_loadu_si128(__m128i *p)
>> +{
>> + return vld1q_s32((int32_t *)p);
>> +}
>> +
>> +static __inline __m128i
>> +_mm_set1_epi32(int i)
>> +{
>> + return vdupq_n_s32(i);
>> +}
>> +
>> +static __inline __m128i
>> +_mm_and_si128(__m128i a, __m128i b)
>> +{
>> + return vandq_s32(a, b);
>> +}
>> +
>
> IMO, it makes sense to not emulate the SSE intrinsics with NEON
> Let's create the rte_vect_* as required. look at the existing patch.
>
I thought of creating a layer of SIMD over all the platforms before.
But can't you see it make things complicated, considering there are
only few simple intrinsic to implement?
If do so, we also need to explain to others how to use these interfaces.
Besides, this patch did the smallest changes to the original code, and
more likely to be accepted by others.

>
>>  #ifdef RTE_ARCH_ARM
>>  /* NEON intrinsic vqtbl1q_u8() is not supported in ARMv7-A(AArch32) */
>>  static __inline uint8x16_t
>> diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h
>> index c299ce2..c76c07d 100644
>> --- a/lib/librte_lpm/rte_lpm.h
>> +++ b/lib/librte_lpm/rte_lpm.h
>> @@ -361,6 +361,47 @@ rte_lpm_lookup_bulk_func(const struct rte_lpm *lpm, 
>> const uint32_t * ips,
>>  /* Mask four results. */
>>  #define   RTE_LPM_MASKX4_RES UINT64_C(0x00ff00ff00ff00ff)
>>
>> +#if defined(RTE_ARCH_ARM) || defined(RTE_ARCH_ARM64)
>
> Separate out arm implementation to the different header file.
> Too many ifdef looks odd in the header file and difficult to manage.
>
But there are many ifdefs already.
And It seems unreasonable to add a new file only for one small function.

>
>> +static inline void
>> +rte_lpm_t

[dpdk-dev] [PATCH 2/4] eal/acl: enable acl for armv7-a

2015-12-02 Thread Jianbo Liu

On 1 December 2015 at 22:46, Jan Viktorin  wrote:
> On Tue, 1 Dec 2015 20:13:49 +0530
> Jerin Jacob  wrote:
>
>> > enum rte_acl_classify_alg alg = RTE_ACL_CLASSIFY_DEFAULT;
>> >
>> > -#ifdef RTE_ARCH_ARM64
>> > +#if defined(RTE_ARCH_ARM) || defined(RTE_ARCH_ARM64)
>> > alg =  RTE_ACL_CLASSIFY_NEON;
>>
>> I believe SIMD is optional in armv7. If true, select alg as
>> RTE_ACL_CLASSIFY_NEON only when cpufeature NEON enabled.
>
> Yes. Or, probably, we can be happy with
>
> #if defined(__ARM_NEON_FP)
> ...
> #endif
>
> as it is currently done in rte_memcpy_32.h.
>
> Regards
> Jan

Athough optional for armv7, I believe there is NEON in most of the
popular armv7a chips.
Anyway, I will add the checking...

Thanks!

[dpdk-dev] [PATCH 0/4] support acl/lpm/table/pipeline libs for armv7 and armv8

2015-12-01 Thread Jianbo Liu

On Tue, Dec 01, 2015 at 01:47:23PM +0100, Jan Viktorin wrote:
> On Tue,  1 Dec 2015 13:41:12 -0500
> Jianbo Liu  wrote:
> 
> > Hi,
> > I'm from Linaro.org, and will work on DPDK to make it better
> > runing on different ARM Platforms.
> > 
> > This patchset includes a small fix in rte_cycle_32.h,
> > and enables acl/lpm/table/pipeline libs for armv7 and armv8.
> > Please apply it after [PATCH v4 0/2] disable CONFIG_RTE_SCHED_VECTOR for 
> > arm.
> 
> Would it avoid some merge conflicts or is there some other dependency?
> 
There is no conflicts, but please apply Jerin's patch first since this
patchset is based on that.

> Jan
> 
> > 
> > Thanks!
> > Jianbo
> > 
> > 
> > Jianbo Liu (4):
> >   eal/arm: use RTE_ARM_EAL_RDTSC_USE_PMU in rte_cycle_32.h
> >   eal/acl: enable acl for armv7-a
> >   eal/arm: Enable lpm/table/pipeline libs
> >   maintainers: claim resposibility for ARMv7 and ARMv8
> > 
> >  MAINTAINERS|  2 +
> >  config/defconfig_arm-armv7a-linuxapp-gcc   |  4 --
> >  config/defconfig_arm64-armv8a-linuxapp-gcc |  3 -
> >  lib/librte_acl/Makefile|  2 +-
> >  lib/librte_acl/rte_acl.c   |  2 +-
> >  .../common/include/arch/arm/rte_cycles_32.h|  2 +-
> >  lib/librte_eal/common/include/arch/arm/rte_vect.h  | 51 
> >  lib/librte_lpm/rte_lpm.h   | 68 
> > --
> >  8 files changed, 105 insertions(+), 29 deletions(-)
> > 
> 
> 
> 
> -- 
>Jan Viktorin  E-mail: Viktorin at RehiveTech.com
>System Architect  Web:www.RehiveTech.com
>RehiveTech
>Brno, Czech Republic

1 2 >

1 - 100 of 109 matches

Mail list logo