[dpdk-dev] [RFC 0/4] Use Google Test as DPDK unit test framework

2016-08-04 Thread Wiles, Keith

> On Aug 4, 2016, at 2:47 PM, Jim Murphy  wrote:
> 
> Hi,
> 
> We are looking at using our existing test environment for our DPDK
> applications that will run on our build servers. Hugepages are therefore an
> issue. What is involved in running DPDK without huge pages?

The command-line option --no-huge should work. Note the two dashes in front.
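
For example (illustrative core list and memory size; any EAL-based application takes the same option):

  ./testpmd -l 0-1 -n 4 --no-huge -m 256 -- -i

With --no-huge the EAL backs its memory with ordinary anonymous mappings instead of hugepages, which is typically what you want for unit tests on build servers.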

> 
> Thanks,
> 
> Jim
> 
> 
> On Wed, Aug 3, 2016 at 1:46 PM, Ming Zhao 
> wrote:
> 
>> googletest is a very nice test framework and we use it very
>> extensively in our company(Luminate Wireless), together with gmock.
>> 
>> I understand the resistance from the maintainers that are concerned
>> about introducing a C++ dependency to a pure C code base. The approach
>> we take doesn't require any change to the dpdk core, instead we just
>> use things like a mock PMD (through the gmock framework) to allow mocking
>> the RX/TX code path, disabling huge page usage in test so that the
>> test can be easily launched without worrying about huge page
>> collision, etc.
>> 
>> Personally I highly recommend using googletest plus some basic test
>> cases, which removes a lot of boilerplate and lets the developers focus on
>> the test itself.
>> 
>> On Wed, Aug 3, 2016 at 2:57 AM, Doherty, Declan
>>  wrote:
>>> 
>>> 
 -Original Message-
>>> ...
 You are not only advocating a C++ test framework, but also that the unit tests must be written in C++.
 I don't think it is a good idea to force people to write and maintain
>> the tests
 in a different language than the code it tests.
>>> 
>>> I know where you are coming from on this point, and in general would
>> agree if
>>> it were not for the advantages you get from a C++ test framework. Having
>> worked with
>>> multiple C and C++ frameworks, I've found that one of the biggest
>> advantages of the
>>> C++ frameworks is the amount of boilerplate code they can save you from
>> writing. Also
>>> nearly all of the C frameworks I've used make use of macros to the point that
>> they look more like
>>> Objective-C than C. In general I feel that even if the test code is
>> written in C++ the code itself
>>> should be simple enough that someone with even a passing knowledge of
>> C++ could easily
>>> understand the intent of the test code.
>>> 
> Some of the major advantages of google test that I see over
>> continuing to use
 the
> current test include giving a consistent feel to all tests, a powerful
>> test
> execution framework which allows individual test suites or tests to be
>> specified
> from the command line, support for a standard xunit output which can
>> be
 integrated
> into continuous build systems, and a very powerful mocking library
> which allows much more control over testing failure conditions.
 
 It would be interesting to better describe in detail what is missing
>> currently
 and what such a framework can bring.
 (I agree there is a huge room for improvements on unit tests)
>>> 
>>> Some of the things I've come across include:
>>> No standard output format to integrate with continuous regression
>> systems
>>> No ability to specify specific unit tests or groups of tests to run from
>> the command line
>>> No standard set of test assertions used across the test suites.
>>> No standard setup and teardown functions across test suites; state from a
>> previous test
>>> suite can break the current one.
>>> Requirement to use a python script to orchestrate test runs.
>>> No support for mocking functionality.
>>> 
>>> I know that all of the above could be fixed in our current test
>> application, but I would
>>> question whether that effort is worthwhile when we can take an off-the-shelf
>> framework, which does all
>>> those things and a whole lot more, and which has been tested and used in a
>> huge variety of
>>> projects.
>>> 
>>> I am certainly willing to look at other frameworks, both C and C++, but I have yet
>> to find a C framework
>>> which comes close to the usability and flexibility of the popular C++
>> ones.
>>> 
>>> 
>>> 
>> 



[dpdk-dev] [PATCH] doc: announce ivshmem support removal

2016-08-04 Thread Thomas Monjalon
2016-08-04 17:00, Yasufumi Ogawa:
> Hi Thomas,
> 
> I'm going to develop an NFV-based carrier system with SPP because it's able
> to chain VMs with high performance, and I think it might be the best solution
> for service function chaining. Without ivshmem, throughput between VMs is
> largely decreased. So we are not so happy if ivshmem is obsoleted.

We are not obsoleting IVSHMEM which is a QEMU feature.
We are just dropping the automatic allocation of DPDK objects in the guest
through IVSHMEM.
It is a weird design that nobody really wants to maintain/redesign.
If someone wants to do the required work or reimplement it differently,
it may be accepted.

> As you mentioned, we can use v16.07, but it's an unwelcome situation for us, and
> our possible users cannot gain improvements from future versions of DPDK.
> I would appreciate it if you could kindly keep ivshmem maintained.

Nobody wants to really work on it. That's a fact, sorry.

> Or, is there
> any idea for high-performance inter-VM communication like ivshmem?

Yes you can implement an IVSHMEM driver in DPDK.
There was an attempt to do so which is also unmaintained:
http://dpdk.org/browse/old/memnic/
I think it would be a better approach than what exists currently.


[dpdk-dev] rte_eth_dev_attach returns 0, although device is not attached

2016-08-04 Thread Igor Ryzhov

> On 4 Aug 2016, at 16:21, Ferruh Yigit wrote:
> 
> On 8/4/2016 12:51 PM, Igor Ryzhov wrote:
>> Hello Ferruh,
>> 
>>> On 4 Aug 2016, at 14:33, Ferruh Yigit wrote:
>>> 
>>> Hi Igor,
>>> 
>>> On 8/3/2016 5:58 PM, Igor Ryzhov wrote:
 Hello.
 
 Function rte_eth_dev_attach can return false positive result.
 It happens because rte_eal_pci_probe_one returns zero if no driver is 
 found for the device:
 ret = pci_probe_all_drivers(dev);
 if (ret < 0)
goto err_return;
 return 0;
 (pci_probe_all_drivers returns 1 in that case)
 
 For example, it can be easily reproduced by trying to attach virtio 
 device, managed by kernel driver.
>>> 
>>> You are right, and I was able to reproduce this issue with virtio as you
>>> suggested.
>>> 
>>> But I wonder why rte_eth_dev_get_port_by_addr() is not catching this.
>>> Perhaps a dev->attached check needs to be added into this function.
> 
> With a second check, rte_eth_dev_get_port_by_addr() catches it if the
> driver is missing.
> 
> But for the virtio case, the problem is not a missing driver.
> The problem is that eth_virtio_dev_init() returns a positive value on failure.
> 
> Call stack is:
> rte_eal_pci_probe_one
>pci_probe_all_drivers
>rte_eal_pci_probe_one_driver
>rte_eth_dev_init
>   eth_virtio_dev_init
> 
> So rte_eal_pci_probe_one_driver() also returns a positive value, as no
> driver is found, and rte_eth_dev_get_port_by_addr() returns a valid
> port_id, since rte_eth_dev_init() allocated an eth_dev.
> 
> Briefly, this can be fixed in virtio pmd, instead of eal pci.
> 
>>> 
 
 I think it should be:
 ret = pci_probe_all_drivers(dev);
 if (ret)
goto err_return;
 return 0;
>>> 
>>> Your proposal looks good to me. Will you send a patch?
>> 
> 
> The original code silently ignores the case where the driver is missing for that dev;
> although it is still questionable, I think we can keep this as it is.
> 
>> Patch sent.
> 
> Sorry for this, but can you please test with the following modification in
> virtio:
> index 07d6449..c74 100644
> --- a/drivers/net/virtio/virtio_ethdev.c
> +++ b/drivers/net/virtio/virtio_ethdev.c
> @@ -1156,7 +1156,7 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
>if (pci_dev) {
>ret = vtpci_init(pci_dev, hw, _flags);
>if (ret)
> -   return ret;
> +   return -1;
>}
> 
>/* Reset the device although not necessary at startup */

I think it's not a good change, because it will break the idea of this patch - 
http://dpdk.org/browse/dpdk/commit/?id=ac5e1d83 


Also, with your patch the application will not start, because rte_eal_pci_probe 
will fail:

if (ret < 0)
rte_exit(EXIT_FAILURE, "Requested device " PCI_PRI_FMT
 " cannot be used\n", dev->addr.domain, dev->addr.bus,
 dev->addr.devid, dev->addr.function);

And now I think that maybe we should change the way rte_eal_pci_probe works.
I think we shouldn't stop the application if just one of the PCI devices is not 
probed successfully.
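
A minimal sketch of that idea, assuming the rte_exit() shown above is simply demoted to a log message inside rte_eal_pci_probe() so the loop keeps probing the remaining devices:

    ret = pci_probe_all_drivers(dev);
    if (ret < 0)
            /* hypothetical: warn and continue instead of aborting the app */
            RTE_LOG(WARNING, EAL, "Requested device " PCI_PRI_FMT
                    " cannot be used\n", dev->addr.domain, dev->addr.bus,
                    dev->addr.devid, dev->addr.function);

Whether the failure should still be reported to the caller (e.g. via a non-zero return once the loop finishes) is a separate question.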

> 
> 
>> 
>>> 
 Best regards,
 Igor



[dpdk-dev] [PATCH] ethtool: remove triple license information

2016-08-04 Thread Ferruh Yigit
On 8/1/2016 1:17 PM, Christian Ehrhardt wrote:
> License information is already in LICENSE.GPL.
> Remove two extra copies and change referred filename in the files.
> 
> Signed-off-by: Christian Ehrhardt 

In patch subject, I think it is better to use "kni: " tag instead of
"ethtool:", apart from this:

Acked-by: Ferruh Yigit 




[dpdk-dev] dpdk 16.07, issues with rte_mempool_create and rte_kni_alloc()

2016-08-04 Thread Ferruh Yigit
On 8/1/2016 10:19 PM, Gopakumar Choorakkot Edakkunni wrote:
> Well, for my purpose I just ended up creating a separate/smaller pool
> earlier during bootup to try to guarantee it's from one memseg.
> 
> But I am assuming that this KNI restriction is something that's "currently"
> not fixed and is "fixable"? 


> Any ideas on what the reason
> for this restriction is? I was going to check if I can fix that.

KNI expects all mbufs to come from physically contiguous memory. This is
because of the current address translation implementation.

mbufs are allocated in userspace and accessed from both user and kernel
space, so an mbuf's userspace virtual address needs to be converted into a
kernel-space virtual address.

Currently this address translation is done by first calculating an offset
between virtual addresses using the first field of the mempool, and later
applying the same offset to all mbufs. This is why all mbufs should be in
physically contiguous memory.
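
A simplified sketch of that scheme (illustrative, not the exact kni kernel module code): one offset is computed from the first (and only) mempool memory chunk and then applied to every mbuf pointer, which is only valid if all mbufs live in that single physically contiguous chunk.

  #include <stdint.h>

  /* chunk_user_va / chunk_kernel_va: addresses of the same mbuf memory chunk
   * as seen from userspace and from the kernel mapping respectively */
  static inline void *
  kni_user_to_kernel_va(void *user_va, uintptr_t chunk_user_va,
                        uintptr_t chunk_kernel_va)
  {
          /* the same offset is assumed to hold for every mbuf in the pool */
          return (void *)((uintptr_t)user_va - (chunk_user_va - chunk_kernel_va));
  }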

I think this address translation can be done in a different way which would
remove the restriction, but I am not sure about the effect on performance.
I will send a patch for this.

Regards,
ferruh




[dpdk-dev] [PATCH] crypto/qat: optimisation of request copy

2016-08-04 Thread John Griffin
On 04/08/16 13:00, Fiona Trahe wrote:
> From: Fiona Trahe 
>
> using rte_mov128 instead of structure assignment to copy
> template request from session context into request
>
> Signed-off-by: Fiona Trahe 
>
> ---
>   drivers/crypto/qat/qat_crypto.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>

Acked-by: John Griffin 



[dpdk-dev] [PATCH] doc: announce ivshmem support removal

2016-08-04 Thread 小川泰文
Hi Thomas,

I'm going to develop an NFV-based carrier system with SPP because it's able
to chain VMs with high performance, and I think it might be the best solution
for service function chaining. Without ivshmem, throughput between VMs is
largely decreased. So we are not so happy if ivshmem is obsoleted.

As you mentioned, we can use v16.07, but it's an unwelcome situation for us, and
our possible users cannot gain improvements from future versions of DPDK.
I would appreciate it if you could kindly keep ivshmem maintained. Or, is there
any idea for high-performance inter-VM communication like ivshmem?

Regards,

Yasufumi Ogawa
Research Engineer
NTT Network Service Systems Labs


[dpdk-dev] [PATCH] kni: memzone info not required to get mbuf address

2016-08-04 Thread Ferruh Yigit
Originally mempool->mz was used to get the address of the mbufs, but now the
address is obtained directly from the mempool, so the mempool->mz information
is no longer required.

Fixes: d1d914ebbc25 ("mempool: allocate in several memory chunks by default")

Signed-off-by: Ferruh Yigit 
---
 lib/librte_kni/rte_kni.c | 13 +++--
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/lib/librte_kni/rte_kni.c b/lib/librte_kni/rte_kni.c
index 3028fd4..f48b72b 100644
--- a/lib/librte_kni/rte_kni.c
+++ b/lib/librte_kni/rte_kni.c
@@ -321,9 +321,7 @@ rte_kni_alloc(struct rte_mempool *pktmbuf_pool,
struct rte_kni_device_info dev_info;
struct rte_kni *ctx;
char intf_name[RTE_KNI_NAMESIZE];
-   char mz_name[RTE_MEMZONE_NAMESIZE];
const struct rte_memzone *mz;
-   const struct rte_mempool *mp;
struct rte_kni_memzone_slot *slot = NULL;

if (!pktmbuf_pool || !conf || !conf->name[0])
@@ -416,17 +414,12 @@ rte_kni_alloc(struct rte_mempool *pktmbuf_pool,


/* MBUF mempool */
-   snprintf(mz_name, sizeof(mz_name), RTE_MEMPOOL_MZ_FORMAT,
-   pktmbuf_pool->name);
-   mz = rte_memzone_lookup(mz_name);
-   KNI_MEM_CHECK(mz == NULL);
-   mp = (struct rte_mempool *)mz->addr;
/* KNI currently requires to have only one memory chunk */
-   if (mp->nb_mem_chunks != 1)
+   if (pktmbuf_pool->nb_mem_chunks != 1)
goto kni_fail;

-   dev_info.mbuf_va = STAILQ_FIRST(&mp->mem_list)->addr;
-   dev_info.mbuf_phys = STAILQ_FIRST(&mp->mem_list)->phys_addr;
+   dev_info.mbuf_va = STAILQ_FIRST(&pktmbuf_pool->mem_list)->addr;
+   dev_info.mbuf_phys = STAILQ_FIRST(&pktmbuf_pool->mem_list)->phys_addr;
ctx->pktmbuf_pool = pktmbuf_pool;
ctx->group_id = conf->group_id;
ctx->slot_id = slot->id;
-- 
2.7.4



[dpdk-dev] rte_eth_dev_attach returns 0, although device is not attached

2016-08-04 Thread Ferruh Yigit
On 8/4/2016 3:54 PM, Igor Ryzhov wrote:
> 
>> On 4 Aug 2016, at 16:21, Ferruh Yigit wrote:
>>
>> On 8/4/2016 12:51 PM, Igor Ryzhov wrote:
>>> Hello Ferruh,
>>>
 On 4 Aug 2016, at 14:33, Ferruh Yigit wrote:

 Hi Igor,

 On 8/3/2016 5:58 PM, Igor Ryzhov wrote:
> Hello.
>
> Function rte_eth_dev_attach can return false positive result.
> It happens because rte_eal_pci_probe_one returns zero if no driver
> is found for the device:
> ret = pci_probe_all_drivers(dev);
> if (ret < 0)
> goto err_return;
> return 0;
> (pci_probe_all_drivers returns 1 in that case)
>
> For example, it can be easily reproduced by trying to attach virtio
> device, managed by kernel driver.

 You are right, and I was able to reproduce this issue with virtio as you
 suggested.

 But I wonder why rte_eth_dev_get_port_by_addr() is not catching this.
 Perhaps a dev->attached check needs to be added into this function.
>>
>> With a second check, rte_eth_dev_get_port_by_addr() catches it if the
>> driver is missing.
>>
>> But for the virtio case, the problem is not a missing driver.
>> The problem is that eth_virtio_dev_init() returns a positive value on failure.
>>
>> Call stack is:
>> rte_eal_pci_probe_one
>>pci_probe_all_drivers
>>rte_eal_pci_probe_one_driver
>>rte_eth_dev_init
>>   eth_virtio_dev_init
>>
>> So rte_eal_pci_probe_one_driver() also returns a positive value, as no
>> driver is found, and rte_eth_dev_get_port_by_addr() returns a valid
>> port_id, since rte_eth_dev_init() allocated an eth_dev.
>>
>> Briefly, this can be fixed in virtio pmd, instead of eal pci.
>>

>
> I think it should be:
> ret = pci_probe_all_drivers(dev);
> if (ret)
> goto err_return;
> return 0;

 Your proposal looks good to me. Will you send a patch?
>>>
>>
>> The original code silently ignores the case where the driver is missing for that dev;
>> although it is still questionable, I think we can keep this as it is.
>>
>>> Patch sent.
>>
>> Sorry for this, but can you please test with the following modification in
>> virtio:
>> index 07d6449..c74 100644
>> --- a/drivers/net/virtio/virtio_ethdev.c
>> +++ b/drivers/net/virtio/virtio_ethdev.c
>> @@ -1156,7 +1156,7 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
>>if (pci_dev) {
>>ret = vtpci_init(pci_dev, hw, _flags);
>>if (ret)
>> -   return ret;
>> +   return -1;
>>}
>>
>>/* Reset the device although not necessary at startup */
> 
> I think it's not a good change, because it will break the idea of this
> patch - http://dpdk.org/browse/dpdk/commit/?id=ac5e1d83

Yes, it breaks this one; I wasn't aware of this patch. But in this patch, the
commit log says: "return 1 to tell the upper layer we
don't take over this device.", and I am not sure the upper layer is designed for this.

> 
> Also, with your patch the application will not start, because
> rte_eal_pci_probe will fail:
> 
> if (ret < 0)
> rte_exit(EXIT_FAILURE, "Requested device " PCI_PRI_FMT
>  " cannot be used\n", dev->addr.domain, dev->addr.bus,
>  dev->addr.devid, dev->addr.function);

Yes, it fails, and this looks like the intended behavior. This failure is
correct according to the code.

> 
> And now I think that maybe we should change the way rte_eal_pci_probe works.
> I think we shouldn't stop the application if just one of the PCI devices is
> not probed successfully.

Agreed. Overall, rte_exit() usage has already been discussed a few times.

I think the best option is:
- don't exit the app if rte_eal_pci_probe() fails, only print an error.
- eth_virtio_dev_init() returns a negative error value for all error cases
(including a device managed by the kernel)

Or perhaps the RTE_KDRV_UNKNOWN check can be moved from the virtio PMD into a
higher level and done for all devices. For example,
pci_probe_one_driver() could fail if the device driver is RTE_KDRV_UNKNOWN.
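
A hypothetical sketch of that check in the generic probe path (log wording illustrative):

    /* in pci_probe_one_driver(), before handing the device to the PMD */
    if (dev->kdrv == RTE_KDRV_UNKNOWN) {
            RTE_LOG(DEBUG, EAL,
                    "  device is bound to an unsupported kernel driver, skipping\n");
            return 1;       /* tell the caller we do not take over this device */
    }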

Any comments?


> 
>>
>>
>>>

> Best regards,
> Igor
> 



[dpdk-dev] [PATCH v1] crypto/qat: make the session struct variable in size

2016-08-04 Thread John Griffin
This patch changes the qat firmware session data structure from a fixed
size to a variable size which is dependent on the size of the chosen
algorithm.
This reduces the number of bytes which are transferred across
PCIe and thus helps to increase qat performance when the
accelerator is bound by PCIe.

Signed-off-by: John Griffin 
---
v1:
* Fixed a compile issue with icc.

 drivers/crypto/qat/qat_adf/qat_algs.h|   5 +-
 drivers/crypto/qat/qat_adf/qat_algs_build_desc.c | 463 +--
 drivers/crypto/qat/qat_crypto.c  |  15 +-
 3 files changed, 184 insertions(+), 299 deletions(-)

diff --git a/drivers/crypto/qat/qat_adf/qat_algs.h 
b/drivers/crypto/qat/qat_adf/qat_algs.h
index 243c1b4..6a86053 100644
--- a/drivers/crypto/qat/qat_adf/qat_algs.h
+++ b/drivers/crypto/qat/qat_adf/qat_algs.h
@@ -87,8 +87,10 @@ struct qat_session {
enum icp_qat_hw_cipher_mode qat_mode;
enum icp_qat_hw_auth_algo qat_hash_alg;
struct qat_alg_cd cd;
+   uint8_t *cd_cur_ptr;
phys_addr_t cd_paddr;
struct icp_qat_fw_la_bulk_req fw_req;
+   uint8_t aad_len;
struct qat_crypto_instance *inst;
uint8_t salt[ICP_QAT_HW_AES_BLK_SZ];
rte_spinlock_t lock;/* protects this struct */
@@ -115,7 +117,8 @@ int qat_alg_aead_session_create_content_desc_auth(struct 
qat_session *cdesc,
uint32_t digestsize,
unsigned int operation);

-void qat_alg_init_common_hdr(struct icp_qat_fw_comn_req_hdr *header);
+void qat_alg_init_common_hdr(struct icp_qat_fw_comn_req_hdr *header,
+   uint16_t proto);

 void qat_alg_ablkcipher_init_enc(struct qat_alg_ablkcipher_cd *cd,
int alg, const uint8_t *key,
diff --git a/drivers/crypto/qat/qat_adf/qat_algs_build_desc.c 
b/drivers/crypto/qat/qat_adf/qat_algs_build_desc.c
index 185bb33..c658f6e 100644
--- a/drivers/crypto/qat/qat_adf/qat_algs_build_desc.c
+++ b/drivers/crypto/qat/qat_adf/qat_algs_build_desc.c
@@ -344,7 +344,8 @@ static int qat_alg_do_precomputes(enum icp_qat_hw_auth_algo 
hash_alg,
return 0;
 }

-void qat_alg_init_common_hdr(struct icp_qat_fw_comn_req_hdr *header)
+void qat_alg_init_common_hdr(struct icp_qat_fw_comn_req_hdr *header,
+   uint16_t proto)
 {
PMD_INIT_FUNC_TRACE();
header->hdr_flags =
@@ -358,7 +359,7 @@ void qat_alg_init_common_hdr(struct icp_qat_fw_comn_req_hdr 
*header)
ICP_QAT_FW_LA_CIPH_IV_FLD_FLAG_SET(header->serv_specif_flags,
   ICP_QAT_FW_CIPH_IV_16BYTE_DATA);
ICP_QAT_FW_LA_PROTO_SET(header->serv_specif_flags,
-   ICP_QAT_FW_LA_NO_PROTO);
+   proto);
ICP_QAT_FW_LA_UPDATE_STATE_SET(header->serv_specif_flags,
   ICP_QAT_FW_LA_NO_UPDATE_STATE);
 }
@@ -375,127 +376,88 @@ int 
qat_alg_aead_session_create_content_desc_cipher(struct qat_session *cdesc,
struct icp_qat_fw_cipher_cd_ctrl_hdr *cipher_cd_ctrl = ptr;
struct icp_qat_fw_auth_cd_ctrl_hdr *hash_cd_ctrl = ptr;
enum icp_qat_hw_cipher_convert key_convert;
+   uint32_t total_key_size;
uint16_t proto = ICP_QAT_FW_LA_NO_PROTO;/* no CCM/GCM/Snow3G */
-   uint16_t cipher_offset = 0;
+   uint16_t cipher_offset, cd_size;

PMD_INIT_FUNC_TRACE();

-   if (cdesc->qat_cmd == ICP_QAT_FW_LA_CMD_HASH_CIPHER &&
-   cdesc->qat_hash_alg != ICP_QAT_HW_AUTH_ALGO_SNOW_3G_UIA2) {
-   cipher =
-   (struct icp_qat_hw_cipher_algo_blk *)((char *)>cd +
-   sizeof(struct icp_qat_hw_auth_algo_blk));
-   cipher_offset = sizeof(struct icp_qat_hw_auth_algo_blk);
-   } else {
-   cipher = (struct icp_qat_hw_cipher_algo_blk *)>cd;
-   cipher_offset = 0;
-   }
-   /* CD setup */
-   if (cdesc->qat_dir == ICP_QAT_HW_CIPHER_ENCRYPT) {
-   ICP_QAT_FW_LA_RET_AUTH_SET(header->serv_specif_flags,
-   ICP_QAT_FW_LA_RET_AUTH_RES);
-   ICP_QAT_FW_LA_CMP_AUTH_SET(header->serv_specif_flags,
-   ICP_QAT_FW_LA_NO_CMP_AUTH_RES);
-   } else {
+   if (cdesc->qat_cmd == ICP_QAT_FW_LA_CMD_CIPHER) {
+   cd_pars->u.s.content_desc_addr = cdesc->cd_paddr;
+   ICP_QAT_FW_COMN_CURR_ID_SET(cipher_cd_ctrl,
+   ICP_QAT_FW_SLICE_CIPHER);
+   ICP_QAT_FW_COMN_NEXT_ID_SET(cipher_cd_ctrl,
+   ICP_QAT_FW_SLICE_DRAM_WR);
ICP_QAT_FW_LA_RET_AUTH_SET(header->serv_specif_flags,
ICP_QAT_FW_LA_NO_RET_AUTH_RES);
ICP_QAT_FW_LA_CMP_AUTH_SET(header->serv_specif_flags,
-

[dpdk-dev] [PATCH] ivshmem: remove integration in dpdk

2016-08-04 Thread Panu Matilainen
On 07/29/2016 03:28 PM, David Marchand wrote:
> Following discussions on the mailing list [1] and since nobody stood up to
> implement the necessary cleanups, here is the ivshmem integration removal.
>
> There is not much to say about this patch, a lot of code is being removed.
> The default configuration file for packet_ordering example is replaced with
> the "native" x86 file.
> The only tricky part is in eal_memory with the memseg index stuff.
>
> More cleanups can be done after this but will come in subsequent patchsets.
>
> [1]: http://dpdk.org/ml/archives/dev/2016-June/040844.html
>
> Signed-off-by: David Marchand 
> ---
>  MAINTAINERS  |   8 -
>  app/test/Makefile|   1 -
>  app/test/autotest_data.py|   6 -
>  app/test/test.c  |   3 -
>  app/test/test.h  |   1 -
>  app/test/test_ivshmem.c  | 433 
>  config/defconfig_arm64-armv8a-linuxapp-gcc   |   1 -
>  config/defconfig_x86_64-ivshmem-linuxapp-gcc |  49 --
>  config/defconfig_x86_64-ivshmem-linuxapp-icc |  49 --
>  doc/api/doxy-api-index.md|   1 -
>  doc/api/doxy-api.conf|   1 -
>  doc/api/examples.dox |   2 -
>  doc/guides/linux_gsg/build_dpdk.rst  |   2 +-
>  doc/guides/linux_gsg/quick_start.rst |  14 +-
>  doc/guides/prog_guide/img/ivshmem.png| Bin 44920 -> 0 bytes
>  doc/guides/prog_guide/index.rst  |   1 -
>  doc/guides/prog_guide/ivshmem_lib.rst| 160 -
>  doc/guides/prog_guide/source_org.rst |   1 -
>  doc/guides/rel_notes/deprecation.rst |   3 -
>  doc/guides/rel_notes/release_16_11.rst   |   3 +
>  examples/Makefile|   1 -
>  examples/l2fwd-ivshmem/Makefile  |  43 --
>  examples/l2fwd-ivshmem/guest/Makefile|  50 --
>  examples/l2fwd-ivshmem/guest/guest.c | 452 -
>  examples/l2fwd-ivshmem/host/Makefile |  50 --
>  examples/l2fwd-ivshmem/host/host.c   | 895 -
>  examples/l2fwd-ivshmem/include/common.h  | 111 
>  examples/packet_ordering/Makefile|   2 +-
>  lib/Makefile |   1 -
>  lib/librte_eal/common/eal_common_memzone.c   |  12 -
>  lib/librte_eal/common/eal_private.h  |  22 -
>  lib/librte_eal/common/include/rte_memory.h   |   3 -
>  lib/librte_eal/common/include/rte_memzone.h  |   7 +-
>  lib/librte_eal/common/malloc_heap.c  |   8 -
>  lib/librte_eal/linuxapp/eal/Makefile |   9 -
>  lib/librte_eal/linuxapp/eal/eal.c|  10 -
>  lib/librte_eal/linuxapp/eal/eal_ivshmem.c| 954 
> ---
>  lib/librte_eal/linuxapp/eal/eal_memory.c |  30 +-
>  lib/librte_ivshmem/Makefile  |  54 --
>  lib/librte_ivshmem/rte_ivshmem.c | 919 --
>  lib/librte_ivshmem/rte_ivshmem.h | 165 -
>  lib/librte_ivshmem/rte_ivshmem_version.map   |  12 -
>  mk/rte.app.mk|   1 -
>  43 files changed, 13 insertions(+), 4537 deletions(-)
>  delete mode 100644 app/test/test_ivshmem.c
>  delete mode 100644 config/defconfig_x86_64-ivshmem-linuxapp-gcc
>  delete mode 100644 config/defconfig_x86_64-ivshmem-linuxapp-icc
>  delete mode 100644 doc/guides/prog_guide/img/ivshmem.png
>  delete mode 100644 doc/guides/prog_guide/ivshmem_lib.rst
>  delete mode 100644 examples/l2fwd-ivshmem/Makefile
>  delete mode 100644 examples/l2fwd-ivshmem/guest/Makefile
>  delete mode 100644 examples/l2fwd-ivshmem/guest/guest.c
>  delete mode 100644 examples/l2fwd-ivshmem/host/Makefile
>  delete mode 100644 examples/l2fwd-ivshmem/host/host.c
>  delete mode 100644 examples/l2fwd-ivshmem/include/common.h
>  delete mode 100644 lib/librte_eal/linuxapp/eal/eal_ivshmem.c
>  delete mode 100644 lib/librte_ivshmem/Makefile
>  delete mode 100644 lib/librte_ivshmem/rte_ivshmem.c
>  delete mode 100644 lib/librte_ivshmem/rte_ivshmem.h
>  delete mode 100644 lib/librte_ivshmem/rte_ivshmem_version.map
>
[...]

Ooh, what a nice "welcome back from vacation" message in my inbox :)
FWIW,

Acked-by: Panu Matilainen 

- Panu -


[dpdk-dev] Mbuf leak issue with IXGBE in vector mode

2016-08-04 Thread Ori Zakin
Hi,


  1.  When calling rte_eth_dev_stop, the mbuf pool is depleted.
There appears to be a race condition that occurs when RTE_IXGBE_INC_VECTOR is 
defined:
ixgbe_reset_rx_queue(struct ixgbe_adapter *adapter, struct ixgbe_rx_queue *rxq)
{
...

#ifdef RTE_IXGBE_INC_VECTOR
rxq->rxrearm_start = 0;
/* adding a sleep here appears to resolve the issue */
rxq->rxrearm_nb = 0;
#endif


Behaviour also described here:
http://dpdk.org/ml/archives/users/2016-April/000488.html


  2.  Steps to recreate issue:
 *   rte_mempool_free_count
 *   rte_eth_dev_stop
 *   rte_mempool_free_count - should see a spike in allocated mbufs from the
mempool (see the sketch after this list).

  3.  2 workarounds that appear to work:
 *   Set CONFIG_RTE_IXGBE_INC_VECTOR=n.
 *   Add sleep in ixgbe_reset_rx_queue
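
A minimal sketch of the reproduction in step 2 (pool and port variables are placeholders; rte_mempool_free_count() is the name used above, newer DPDK releases call it rte_mempool_avail_count()):

  unsigned int before, after;

  before = rte_mempool_free_count(mbuf_pool);    /* free entries before stop */
  rte_eth_dev_stop(port_id);
  after = rte_mempool_free_count(mbuf_pool);     /* noticeably lower if mbufs leaked */
  printf("mbufs not returned to the pool: %u\n", before - after);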


Regards.
Ori Zakin


[dpdk-dev] [RFC] Generic flow director/filtering/classification API

2016-08-04 Thread Adrien Mazarguil
On Wed, Aug 03, 2016 at 12:11:56PM -0700, John Fastabend wrote:
> [...]
> 
> >> The proposal looks very good.  It satisfies most of the features
> >> supported by Chelsio NICs.  We are looking for suggestions on exposing
> >> more additional features supported by Chelsio NICs via this API.
> >>
> >> Chelsio NICs have two regions in which filters can be placed -
> >> Maskfull and Maskless regions.  As their names imply, maskfull region
> >> can accept masks to match a range of values; whereas, maskless region
> >> don't accept any masks and hence perform a more strict exact-matches.
> >> Filters without masks can also be placed in maskfull region.  By
> >> default, maskless region have higher priority over the maskfull region.
> >> However, the priority between the two regions is configurable.
> >
> > I understand this configuration affects the entire device. Just to be 
> > clear,
> > assuming some filters are already configured, are they affected by a 
> > change
> > of region priority later?
> >
> 
>  Both the regions exist at the same time in the device.  Each filter can
>  either belong to maskfull or the maskless region.
> 
>  The priority is configured at time of filter creation for every
>  individual filter and cannot be changed while the filter is still
>  active. If priority needs to be changed for a particular filter then,
>  it needs to be deleted first and re-created.
> >>>
> >>> Could you model this as two tables and add a table_id to the API? This
> >>> way user space could populate the table it chooses. We would have to add
> >>> some capabilities attributes to "learn" if tables support masks or not
> >>> though.
> >>>
> >>
> >> This approach sounds interesting.
> > 
> > Now I understand the idea behind these tables, however from an application
> > point of view I still think it's better if the PMD could take care of flow
> > rules optimizations automatically. Think about it, PMDs have exactly a
> > single kind of device they know perfectly well to manage, while applications
> > want the best possible performance out of any device in the most generic
> > fashion.
> 
> The problem is keeping priorities in order and/or possibly breaking
> rules apart (e.g. you have an L2 table and an L3 table) becomes very
> complex to manage at driver level. I think its easier for the
> application which has some context to do this. The application "knows"
> if its a router for example will likely be able to pack rules better
> than a PMD will.

I don't think most applications know they are L2 or L3 routers. They may not
know more than the pattern provided to the PMD, which may indeed end at a L2
or L3 protocol. If the application simply chooses a table based on this
information, then the PMD could have easily done the same.

I understand the issue is what happens when applications really want to
define e.g. L2/L3/L2 rules in this specific order (or any ordering that
cannot be satisfied by HW due to table constraints).

By exposing tables, in such a case applications should move all rules from
L2 to a L3 table themselves (assuming this is even supported) to guarantee
ordering between rules, or fail to add them. This is basically what the PMD
could have done, possibly in a more efficient manner in my opinion.

Let's assume two opposite scenarios for this discussion:

- App #1 is a command-line interface directly mapped to flow rules, which
  basically gets slow random input from users depending on how they want to
  configure their traffic. All rules differ considerably (L2, L3, L4, some
  with incomplete bit-masks, etc). All in all, few but complex rules with
  specific priorities.

- App #2 is something like OVS, creating and deleting a large number of very
  specific (without incomplete bit-masks) and mostly identical
  single-priority rules automatically and very frequently.

Actual applications will certainly be a mix of both.

For app #1, users would have to be aware of these tables and base their
filtering decisions according to them. Reporting tables capabilities, making
sure priorities between tables are well configured will be their
responsibility. Obviously applications may take care of these details for
them, but the end result will be the same. At some point, some combination
won't be possible. Getting there was only more complicated from
users/applications point of view.

For app #2 if the first rule can be created then subsequent rules shouldn't
be a problem until their number reaches device limits. Selecting the proper
table to use for these can easily be done by the PMD.

> >>> I don't see how the PMD can sort this out in any meaningful way and it
> >>> has to be exposed to the application that has the intelligence to 'know'
> >>> priorities between masks and non-masks filters. I'm sure you could come
> >>> up with something but it would be less than ideal in many cases I would
> >>> guess 

[dpdk-dev] [RFC] Generic flow director/filtering/classification API

2016-08-04 Thread Adrien Mazarguil
On Wed, Aug 03, 2016 at 11:10:49AM -0700, John Fastabend wrote:
> [...]
> 
>  Considering that allowed pattern/actions combinations cannot be known in
>  advance and would result in an unpractically large number of 
>  capabilities to
>  expose, a method is provided to validate a given rule from the current
>  device configuration state without actually adding it (akin to a "dry 
>  run"
>  mode).
> >>>
> >>> Rather than have a query/validate process why did we jump over having an
> >>> intermediate representation of the capabilities? Here you state it is
> >>> unpractical but we know how to represent parse graphs and the drivers
> >>> could report their supported parse graph via a single query to a middle
> >>> layer.
> >>>
> >>> This will actually reduce the msg chatter imagine many applications at
> >>> init time or in boundary cases where a large set of applications come
> >>> online at once and start banging on the interface all at once seems less
> >>> than ideal.
> > 
> > Well, I also thought about a kind of graph to represent capabilities but
> > feared the extra complexity would not be worth the trouble, thus settled on
> > the query idea. A couple more reasons:
> > 
> > - Capabilities evolve at the same time as devices are configured. For
> >   example, if a device supports a single RSS context, then a single rule
> >   with a RSS action may be created. The graph would have to be rewritten
> >   accordingly and thus queried/parsed again by the application.
> 
> The graph would not help here because this is an action
> restriction not a parsing restriction. This is yet another query to see
> what actions are supported and how many of each action are supported.
> 
>get_parse_graph - report the parsable fields
>get_actions - report the supported actions and possible num of each

OK, now I understand your idea, in my mind the graph was indeed supposed to
represent complete flow rules.

> > - Expressing capabilities at bit granularity (say, for a matching pattern
> >   item mask) is complex, there is no way to simplify the representation of
> >   capabilities without either losing information or making the graph more
> >   complex to parse than simply providing a flow rule from an application
> >   point of view.
> > 
> 
> I'm not sure I understand 'bit granularity' here. I would say we have
> devices now that have rather strange restrictions due to hardware
> implementation. Going forward we should get better hardware and a lot
> of this will go away in my view. Yes this is a long term view and
> doesn't help the current state. The overall point you are making is
> the sum off all these strange/odd bits in the hardware implementation
> means capabilities queries are very difficult to guarantee. On existing
> hardware and I think you've convinced me. Thanks ;)

Precisely. By "bit granularity" I meant that while it is fairly easy to
report whether bit-masking is supported on protocol fields such as MAC
addresses at all, devices may have restrictions on the possible bit-masks,
like they may only have an effect at byte level (0xff), may not allow
specific bits (broadcast) or there even may be a fixed set of bit-masks to
choose from.

[...]
> > I understand, however I think this approach may be too low-level to express
> > all the possible combinations. This graph would have to include possible
> > actions for each possible pattern, all while considering that some actions
> > are not possible with some patterns and that there are exclusive actions.
> > 
> 
> Really? You have hardware that has dependencies between the parser and
> the supported actions? Ugh...

Not that I know of actually, even though we cannot rule out this
possibility.

Here are the possible cases I have in mind with existing HW:

- Too many actions specified for a single rule, even though each of them is
  otherwise supported.

- Performing several encap/decap actions. None are defined in the initial
  specification but these are already planned.

- Assuming there is a single table from the application point of view
  (separate discussion for the other thread), some actions may only be
  possible with the right pattern item or meta item. Asking HW to perform
  tunnel decap may only be safe if the pattern specifically matches that
  protocol.

> If the hardware has separate tables then we shouldn't try to have the
> PMD flatten those into a single table because we will have no way of
> knowing how to do that. (I'll respond to the other thread on this in
> an attempt to not get to scattered).

OK, will reply there as well.

> > Also while memory consumption is not really an issue, such a graph may be
> > huge. It could take a while for the PMD to update it when adding a rule
> > impacting capabilities.
> 
> Ugh... I wouldn't suggest updating the capabilities at runtime like
> this. But I see your point if the graph has to _guarantee_ correctness
> how does it represent limited number of masks and other 

[dpdk-dev] [PATCH] pci: fix one device probing

2016-08-04 Thread Igor Ryzhov
The rte_eal_pci_probe_one function could return a false positive result if
no driver is found for the device.

Signed-off-by: Igor Ryzhov 
---
 lib/librte_eal/common/eal_common_pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/eal_common_pci.c 
b/lib/librte_eal/common/eal_common_pci.c
index 7248c38..bfb6fd2 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -344,7 +344,7 @@ rte_eal_pci_probe_one(const struct rte_pci_addr *addr)
continue;

ret = pci_probe_all_drivers(dev);
-   if (ret < 0)
+   if (ret)
goto err_return;
return 0;
}
-- 
2.6.4



[dpdk-dev] rte_eth_dev_attach returns 0, although device is not attached

2016-08-04 Thread Ferruh Yigit
On 8/4/2016 12:51 PM, Igor Ryzhov wrote:
> Hello Ferruh,
> 
>> On 4 Aug 2016, at 14:33, Ferruh Yigit wrote:
>>
>> Hi Igor,
>>
>> On 8/3/2016 5:58 PM, Igor Ryzhov wrote:
>>> Hello.
>>>
>>> Function rte_eth_dev_attach can return false positive result.
>>> It happens because rte_eal_pci_probe_one returns zero if no driver is found 
>>> for the device:
>>> ret = pci_probe_all_drivers(dev);
>>> if (ret < 0)
>>> goto err_return;
>>> return 0;
>>> (pci_probe_all_drivers returns 1 in that case)
>>>
>>> For example, it can be easily reproduced by trying to attach virtio device, 
>>> managed by kernel driver.
>>
>> You are right, and I was able to reproduce this issue with virtio as you
>> suggested.
>>
>> But I wonder why rte_eth_dev_get_port_by_addr() is not catching this.
>> Perhaps a dev->attached check needs to be added into this function.

With a second check, rte_eth_dev_get_port_by_addr() catches it if the
driver is missing.

But for the virtio case, the problem is not a missing driver.
The problem is that eth_virtio_dev_init() returns a positive value on failure.

Call stack is:
rte_eal_pci_probe_one
pci_probe_all_drivers
rte_eal_pci_probe_one_driver
rte_eth_dev_init
   eth_virtio_dev_init

So rte_eal_pci_probe_one_driver() also returns a positive value, as no
driver is found, and rte_eth_dev_get_port_by_addr() returns a valid
port_id, since rte_eth_dev_init() allocated an eth_dev.

Briefly, this can be fixed in virtio pmd, instead of eal pci.

>>
>>>
>>> I think it should be:
>>> ret = pci_probe_all_drivers(dev);
>>> if (ret)
>>> goto err_return;
>>> return 0;
>>
>> Your proposal looks good to me. Will you send a patch?
> 

The original code silently ignores the case where the driver is missing for that dev;
although it is still questionable, I think we can keep this as it is.

> Patch sent.

Sorry for this, but can you please test with the following modification in
virtio:
index 07d6449..c74 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -1156,7 +1156,7 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
if (pci_dev) {
ret = vtpci_init(pci_dev, hw, _flags);
if (ret)
-   return ret;
+   return -1;
}

/* Reset the device although not necessary at startup */


> 
>>
>>> Best regards,
>>> Igor
>>>
>>
> 



[dpdk-dev] how to design high performance QoS support for a large amount of subscribers

2016-08-04 Thread Yuyong Zhang
Thank you very much Cristian for the insightful response. 

Very much appreciated.

Regards,

Yuyong

-Original Message-
From: Dumitrescu, Cristian [mailto:cristian.dumitre...@intel.com] 
Sent: Thursday, August 4, 2016 9:01 AM
To: Yuyong Zhang ; dev at dpdk.org; users at 
dpdk.org
Subject: RE: how to design high performance QoS support for a large amount of 
subscribers

Hi Yuyong,

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Yuyong Zhang
> Sent: Tuesday, August 2, 2016 4:26 PM
> To: dev at dpdk.org; users at dpdk.org
> Subject: [dpdk-dev] how to design high performance QoS support for a 
> large amount of subscribers
> 
> Hi,
> 
> I am trying to add QoS support for a high-performance VNF with a large 
> number of subscribers (millions).

Welcome to the world of DPDK QoS users!

> It requires support for a guaranteed bit rate
> for different service levels of subscribers, i.e. four service levels
> need to be
> supported:
> 
> * Diamond, 500M
> 
> * Gold, 100M
> 
> * Silver, 50M
> 
> * Bronze, 10M

Service levels translate to pipe profiles in our DPDK implementation. The set 
of pipe profiles is defined per port.
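
For example, the four service levels above could be sketched as a per-port pipe profile table along these lines (rates converted to bytes/second; token bucket sizes, tc_period and WRR weights are illustrative assumptions, not tuned values):

  #include <rte_sched.h>

  static struct rte_sched_pipe_params pipe_profiles[] = {
          { /* Diamond: 500 Mbps */
            .tb_rate = 62500000, .tb_size = 1000000,
            .tc_rate = {62500000, 62500000, 62500000, 62500000}, .tc_period = 40,
            .wrr_weights = {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1} },
          { /* Gold: 100 Mbps */
            .tb_rate = 12500000, .tb_size = 1000000,
            .tc_rate = {12500000, 12500000, 12500000, 12500000}, .tc_period = 40,
            .wrr_weights = {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1} },
          { /* Silver: 50 Mbps */
            .tb_rate = 6250000, .tb_size = 1000000,
            .tc_rate = {6250000, 6250000, 6250000, 6250000}, .tc_period = 40,
            .wrr_weights = {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1} },
          { /* Bronze: 10 Mbps */
            .tb_rate = 1250000, .tb_size = 1000000,
            .tc_rate = {1250000, 1250000, 1250000, 1250000}, .tc_period = 40,
            .wrr_weights = {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1} },
  };

Each subscriber (pipe) is then pointed at one of these profiles with rte_sched_pipe_config().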

> 
> Here is the current pipeline design using DPDK:
> 
> 
> * 4 RX threads, does packet classification and load balancing
> 
> * 10-20 worker thread, does application subscriber management
> 
> * 4 TX threads, sends packets to TX NICs.
> 
> * Ring buffers used among RX threads, Worker threads, and TX threads
> 
> I read the DPDK programmer's guide for the QoS framework regarding the hierarchical
> scheduler: Port, sub-port, pipe, TC and queues. I am looking for 
> advice on how to design the QoS scheduler to support millions of 
> subscribers (pipes) whose traffic is processed in tens of worker 
> threads where subscriber management processing is handled?

Having millions of pipes per port poses some challenges:
1. Does it actually make sense? Assuming the port rate is 10GbE, looking at the 
smallest user rate you mention above (Bronze, 10Mbps/user), this means that 
fully provisioning all users (i.e. making sure you can fully handle each user 
in worst case scenario) results in a maximum of 1000 users per port. Assuming 
overprovisioning of 50:1, this means a maximum of 50K users per port.
2. Memory challenge. The number of pipes per port is configurable -- hey, this 
is SW! :) -- but each of these pipes has 16 queues. For 4K pipes per port, this 
is 64K queues per port; for typical value of 64 packets per queue, this is 4M 
packets per port, so worst case scenario we need to provision 4M packets in the 
buffer pool for each output port that has hierarchical scheduler enabled; for 
buffer size of ~2KB each, this means ~8GB of memory for each output port. If 
you go from 4k pipes per port to 4M pipes per port, this means 8TB of memory 
per port. Do you have enough memory in your system? :)

One thing to realize is that even for millions of users in your system, not all 
of them are active at the same time. So maybe have a smaller number of pipes 
and only map the active users (those that have any packets to send now) to them 
(a fraction of the total set of users), with the set of active users changing 
over time.

You can also consider mapping several users to the same pipe.

> 
> One design thought is as the following:
> 
> 8 ports (each one is associated with one physical port), 16-20 
> sub-ports (each is used by one Worker thread), each sub-port supports 
> 250K pipes for subscribers. Each worker thread manages one sub-port 
> and does metering for the sub-port to get color, and after identity 
> subscriber flow pick a unused pipe, and do sched enqueuer/de-queue and 
> then put into TX rings to TX threads, and TX threads send the packets to TX 
> NICs.
> 

In the current implementation, each port scheduler object has to be owned by a 
single thread, i.e. you cannot split a port across multiple threads, therefore 
it is not straightforward to have different sub-ports handled by different 
threads. The workaround is to split the physical NIC port yourself into 
multiple port scheduler objects: for example, create 8 port scheduler objects, 
set the rate of each to 1/8 of 10GbE, and have each of them feed a different NIC TX 
queue of the same physical NIC port.

You can probably get this scenario (or very similar) up pretty quickly just by 
handcrafting yourself a configuration file for examples/ip_pipeline application.
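
A rough sketch of that workaround, reusing the pipe_profiles table sketched earlier in this mail (all numbers illustrative; subport/pipe configuration and error handling omitted for brevity):

  #include <rte_sched.h>

  #define N_SCHED 8                      /* 8 scheduler objects for one 10GbE port */
  #define PORT_RATE_BYTES 1250000000u    /* 10 Gbps expressed in bytes/second */

  static struct rte_sched_port *sched[N_SCHED];

  static int
  setup_split_schedulers(void)
  {
          uint32_t i;

          for (i = 0; i < N_SCHED; i++) {
                  struct rte_sched_port_params pp = {
                          .name = "sched",
                          .socket = 0,
                          .rate = PORT_RATE_BYTES / N_SCHED,
                          .mtu = 1522,
                          .frame_overhead = 24,
                          .n_subports_per_port = 1,
                          .n_pipes_per_subport = 4096,
                          .qsize = {64, 64, 64, 64},
                          .pipe_profiles = pipe_profiles,
                          .n_pipe_profiles = 4,
                  };

                  sched[i] = rte_sched_port_config(&pp);
                  if (sched[i] == NULL)
                          return -1;
                  /* worker i owns sched[i]: it calls rte_sched_port_enqueue()/
                   * rte_sched_port_dequeue() on it and sends the output to NIC
                   * TX queue i of the same physical port */
          }
          return 0;
  }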

> Are there functional and performance issues with above approach?
> 
> Any advice and input are appreciated.
> 
> Regards,
> 
> Yuyong
> 
> 
> 

Regards,
Cristian



[dpdk-dev] [PATCH 4/4] doc: make the devbind man page be part of section 8

2016-08-04 Thread Christian Ehrhardt
As a root only program in sbin it should belong to section 8
"8   System administration commands (usually only for root)"

Signed-off-by: Christian Ehrhardt 
---
 doc/guides/conf.py   | 2 +-
 mk/rte.sdkinstall.mk | 5 +
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/doc/guides/conf.py b/doc/guides/conf.py
index ad8e8b3..52e2acf 100644
--- a/doc/guides/conf.py
+++ b/doc/guides/conf.py
@@ -102,7 +102,7 @@ man_pages = [("testpmd_app_ug/run_app", "testpmd",
  ("sample_app_ug/pmdinfo", "dpdk-pmdinfo",
   "dump a PMDs hardware support info", "", 1),
  ("sample_app_ug/devbind", "dpdk-devbind",
-  "check device status and bind/unbind them from drivers", "", 1)]
+  "check device status and bind/unbind them from drivers", "", 8)]

  :numref: fallback 
 # The following hook functions add some simple handling for the :numref:
diff --git a/mk/rte.sdkinstall.mk b/mk/rte.sdkinstall.mk
index 533d369..b1faf28 100644
--- a/mk/rte.sdkinstall.mk
+++ b/mk/rte.sdkinstall.mk
@@ -139,6 +139,11 @@ ifneq ($(wildcard $O/doc/man/*/*.1),)
$(Q)$(call rte_mkdir, $(DESTDIR)$(mandir)/man1)
$(Q)cp -a $O/doc/man/*/*.1 $(DESTDIR)$(mandir)/man1
 endif
+ifneq ($(wildcard $O/doc/man/*/*.8),)
+   $(Q)$(call rte_mkdir, $(DESTDIR)$(mandir))
+   $(Q)$(call rte_mkdir, $(DESTDIR)$(mandir)/man8)
+   $(Q)cp -a $O/doc/man/*/*.8 $(DESTDIR)$(mandir)/man8
+endif

 install-kmod:
 ifneq ($(wildcard $O/kmod/*),)
-- 
2.7.4



[dpdk-dev] [PATCH 3/4] doc: add basic invocation info for dpdk-devbind

2016-08-04 Thread Christian Ehrhardt
This summarizes the "how to call dpdk-devbind" in one place to be
picked up by html/pdf/man-page docs.

That knowledge was available before but spread in various docs along
examples (which are great and have to be kept) as well as in the
--usage/--help option of the tool itself.

Signed-off-by: Christian Ehrhardt 
---
 doc/guides/conf.py   |   4 +-
 doc/guides/sample_app_ug/devbind.rst | 150 +++
 doc/guides/sample_app_ug/index.rst   |   1 +
 3 files changed, 154 insertions(+), 1 deletion(-)
 create mode 100644 doc/guides/sample_app_ug/devbind.rst

diff --git a/doc/guides/conf.py b/doc/guides/conf.py
index 48fe890..ad8e8b3 100644
--- a/doc/guides/conf.py
+++ b/doc/guides/conf.py
@@ -100,7 +100,9 @@ man_pages = [("testpmd_app_ug/run_app", "testpmd",
  ("sample_app_ug/proc_info", "dpdk-procinfo",
   "access dpdk port stats and memory info", "", 1),
  ("sample_app_ug/pmdinfo", "dpdk-pmdinfo",
-  "dump a PMDs hardware support info", "", 1)]
+  "dump a PMDs hardware support info", "", 1),
+ ("sample_app_ug/devbind", "dpdk-devbind",
+  "check device status and bind/unbind them from drivers", "", 1)]

  :numref: fallback 
 # The following hook functions add some simple handling for the :numref:
diff --git a/doc/guides/sample_app_ug/devbind.rst 
b/doc/guides/sample_app_ug/devbind.rst
new file mode 100644
index 000..297e1b7
--- /dev/null
+++ b/doc/guides/sample_app_ug/devbind.rst
@@ -0,0 +1,150 @@
+
+..  BSD LICENSE
+Copyright(c) 2016 Canonical Limited. All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions
+are met:
+
+* Redistributions of source code must retain the above copyright
+notice, this list of conditions and the following disclaimer.
+* Redistributions in binary form must reproduce the above copyright
+notice, this list of conditions and the following disclaimer in
+the documentation and/or other materials provided with the
+distribution.
+* Neither the name of Intel Corporation nor the names of its
+contributors may be used to endorse or promote products derived
+from this software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+
+dpdk-devbind Application
+
+
+The ``dpdk-devbind`` tool is a Data Plane Development Kit (DPDK) tool that 
helps bind and unbind devices from specific drivers,
+as well as check their status in that regard.
+
+
+Running the Application
+---
+
+The tool has a number of command line options:
+
+.. code-block:: console
+
+   dpdk-devbind [options] DEVICE1 DEVICE2 
+
+OPTIONS
+---
+
+* ``--help, --usage``
+
+Display usage information and quit
+
+* ``-s, --status``
+
+Print the current status of all known network interfaces.
+For each device, it displays the PCI domain, bus, slot and function,
+along with a text description of the device. Depending upon whether the
+device is being used by a kernel driver, the igb_uio driver, or no
+driver, other relevant information will be displayed:
+* the Linux interface name e.g. if=eth0
+* the driver being used e.g. drv=igb_uio
+* any suitable drivers not currently using that device
+e.g. unused=igb_uio
+NOTE: if this flag is passed along with a bind/unbind option, the
+status display will always occur after the other operations have taken
+place.
+
+* ``-b driver, --bind=driver``
+
+Select the driver to use or "none" to unbind the device
+
+* ``-u, --unbind``
+
+Unbind a device (Equivalent to "-b none")
+
+* ``--force``
+
+By default, devices which are used by Linux - as indicated by having
+routes in the routing table - cannot be modified. Using the --force
+flag overrides this behavior, allowing active links to be forcibly
+unbound.
+WARNING: This can lead to loss of network connection and should be used
+with 

[dpdk-dev] [PATCH 2/4] doc: add basic invocation info for dpdk-pmdinfo

2016-08-04 Thread Christian Ehrhardt
This summarizes the "how to call dpdk-pmdinfo" in one place to be picked
up by html/pdf/man-page docs.

Signed-off-by: Christian Ehrhardt 
---
 doc/guides/conf.py   |  4 ++-
 doc/guides/sample_app_ug/index.rst   |  1 +
 doc/guides/sample_app_ug/pmdinfo.rst | 62 
 3 files changed, 66 insertions(+), 1 deletion(-)
 create mode 100644 doc/guides/sample_app_ug/pmdinfo.rst

diff --git a/doc/guides/conf.py b/doc/guides/conf.py
index 4435974..48fe890 100644
--- a/doc/guides/conf.py
+++ b/doc/guides/conf.py
@@ -98,7 +98,9 @@ man_pages = [("testpmd_app_ug/run_app", "testpmd",
  ("sample_app_ug/pdump", "dpdk-pdump",
   "enable packet capture on dpdk ports", "", 1),
  ("sample_app_ug/proc_info", "dpdk-procinfo",
-  "access dpdk port stats and memory info", "", 1)]
+  "access dpdk port stats and memory info", "", 1),
+ ("sample_app_ug/pmdinfo", "dpdk-pmdinfo",
+  "dump a PMDs hardware support info", "", 1)]

  :numref: fallback 
 # The following hook functions add some simple handling for the :numref:
diff --git a/doc/guides/sample_app_ug/index.rst 
b/doc/guides/sample_app_ug/index.rst
index 96bb317..7801688 100644
--- a/doc/guides/sample_app_ug/index.rst
+++ b/doc/guides/sample_app_ug/index.rst
@@ -77,6 +77,7 @@ Sample Applications User Guide
 performance_thread
 ipsec_secgw
 pdump
+pmdinfo

 **Figures**

diff --git a/doc/guides/sample_app_ug/pmdinfo.rst 
b/doc/guides/sample_app_ug/pmdinfo.rst
new file mode 100644
index 000..6bbf7e2
--- /dev/null
+++ b/doc/guides/sample_app_ug/pmdinfo.rst
@@ -0,0 +1,62 @@
+
+..  BSD LICENSE
+Copyright(c) 2016 Canonical Limited. All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions
+are met:
+
+* Redistributions of source code must retain the above copyright
+notice, this list of conditions and the following disclaimer.
+* Redistributions in binary form must reproduce the above copyright
+notice, this list of conditions and the following disclaimer in
+the documentation and/or other materials provided with the
+distribution.
+* Neither the name of Intel Corporation nor the names of its
+contributors may be used to endorse or promote products derived
+from this software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+
+dpdk-pmdinfo Application
+
+
+The ``dpdk-pmdinfo`` tool is a Data Plane Development Kit (DPDK) tool that can
+dump a PMD's hardware support info.
+
+   .. Note::
+
+  * The actual data is stored in the object files as PMD_INFO_STRING
+
+
+Running the Application
+---
+
+The tool has a number of command line options:
+
+.. code-block:: console
+
+
+   dpdk-pmdinfo [-hrtp] [-d 
+
+   -h, --helpshow a short help message and exit
+   -r, --raw Dump as raw json strings
+   -d FILE, --pcidb=FILE
+ specify a pci database to get vendor names from
+   -t, --table   output information on hw support as a hex table
+   -p, --plugindir   scan dpdk for autoload plugins
+
+.. Note::
+
+   * Parameters inside the square brackets represent optional parameters.
-- 
2.7.4



[dpdk-dev] [PATCH 1/4] doc: rendering and installation of man pages

2016-08-04 Thread Christian Ehrhardt
This enables the rendering of rst into man pages as well as installing
them (if built) along the binaries. To do so there is a new make target
"doc-guides-man" which will render the rst files into man format.

Currently these three tools have docs that are compatible "enough" to
make up a reasonable man page:
- testpmd
- dpdk-pdump
- dpdk-procinfo

Since a man page should be installed alongside its binary, they are not
installed in install-doc but in install-runtime instead. If not explicitly
built by the "doc-guides-man" target before calling install-runtime, there is no
change to the old behaviour.

Signed-off-by: Christian Ehrhardt 
---
 doc/guides/conf.py   | 8 
 mk/rte.sdkdoc.mk | 2 +-
 mk/rte.sdkinstall.mk | 6 ++
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/doc/guides/conf.py b/doc/guides/conf.py
index 2c5610f..4435974 100644
--- a/doc/guides/conf.py
+++ b/doc/guides/conf.py
@@ -92,6 +92,14 @@ class CustomLatexFormatter(LatexFormatter):
 # Replace the default latex formatter.
 PygmentsBridge.latex_formatter = CustomLatexFormatter

+# Configuration for man pages
+man_pages = [("testpmd_app_ug/run_app", "testpmd",
+  "tests for dpdk pmds", "", 1),
+ ("sample_app_ug/pdump", "dpdk-pdump",
+  "enable packet capture on dpdk ports", "", 1),
+ ("sample_app_ug/proc_info", "dpdk-procinfo",
+  "access dpdk port stats and memory info", "", 1)]
+
  :numref: fallback 
 # The following hook functions add some simple handling for the :numref:
 # directive for Sphinx versions prior to 1.3.1. The functions replace the
diff --git a/mk/rte.sdkdoc.mk b/mk/rte.sdkdoc.mk
index 9952f25..21d9bdf 100644
--- a/mk/rte.sdkdoc.mk
+++ b/mk/rte.sdkdoc.mk
@@ -63,7 +63,7 @@ help:
 all: api-html guides-html guides-pdf

 .PHONY: clean
-clean: api-html-clean guides-html-clean guides-pdf-clean
+clean: api-html-clean guides-html-clean guides-pdf-clean guides-man-clean

 .PHONY: api-html
 api-html: api-html-clean
diff --git a/mk/rte.sdkinstall.mk b/mk/rte.sdkinstall.mk
index 5217063..533d369 100644
--- a/mk/rte.sdkinstall.mk
+++ b/mk/rte.sdkinstall.mk
@@ -66,6 +66,7 @@ includedir  ?=  $(prefix)/include/dpdk
 datarootdir ?=  $(prefix)/share
 docdir  ?=   $(datarootdir)/doc/dpdk
 datadir ?=   $(datarootdir)/dpdk
+mandir  ?=   $(datarootdir)/man
 sdkdir  ?=$(datadir)
 targetdir   ?=$(datadir)/$(RTE_TARGET)

@@ -133,6 +134,11 @@ install-runtime:
   $(DESTDIR)$(sbindir)/dpdk-devbind)
$(Q)$(call rte_symlink,$(DESTDIR)$(datadir)/tools/dpdk-pmdinfo.py, \
   $(DESTDIR)$(bindir)/dpdk-pmdinfo)
+ifneq ($(wildcard $O/doc/man/*/*.1),)
+   $(Q)$(call rte_mkdir, $(DESTDIR)$(mandir))
+   $(Q)$(call rte_mkdir, $(DESTDIR)$(mandir)/man1)
+   $(Q)cp -a $O/doc/man/*/*.1 $(DESTDIR)$(mandir)/man1
+endif

 install-kmod:
 ifneq ($(wildcard $O/kmod/*),)
-- 
2.7.4



[dpdk-dev] [PATCH 0/4] provide man pages for binaries provided by DPDK

2016-08-04 Thread Christian Ehrhardt
Hi,
this is about providing manpages for the binaries installed by DPDK.
Eventually people using commands expect at least something reasonable available
behind "man command".

Still it is a try to stick to the rst/sphinx based doc creation.
I found that for three of the 5 binaries that are usually installed the current
rst files are sufficient to make a meaningful man page:
- testpmd
- dpdk-pdump
- dpdk-procinfo

To be clear, this is only meant for the binaries installed by DPDK;
there is no reason to render all the guides and howtos as one huge man page.
Also this series doesn't strive to render the API doc as man pages,
though this certainly might be possible and even reasonable for section
"3   Library calls (functions within program libraries)".

Finally I must beg your pardon - I'm no makefile magician and sometimes even
prefer things that work over long cryptic lines with many special chars.
Yet if someone has something reasonable to unify the copy commands in patch #4,
please let me know.

Christian Ehrhardt (4):
  doc: rendering and installation of man pages
  doc: add basic invocation info for dpdk-pmdinfo
  doc: add basic invocation info for dpdk-devbind
  doc: make the devbind man page be part of section 8

 doc/guides/conf.py   |  12 +++
 doc/guides/sample_app_ug/devbind.rst | 150 +++
 doc/guides/sample_app_ug/index.rst   |   2 +
 doc/guides/sample_app_ug/pmdinfo.rst |  62 +++
 mk/rte.sdkdoc.mk |   2 +-
 mk/rte.sdkinstall.mk |  11 +++
 6 files changed, 238 insertions(+), 1 deletion(-)
 create mode 100644 doc/guides/sample_app_ug/devbind.rst
 create mode 100644 doc/guides/sample_app_ug/pmdinfo.rst

-- 
2.7.4



[dpdk-dev] how to design high performance QoS support for a large amount of subscribers

2016-08-04 Thread Dumitrescu, Cristian
Hi Yuyong,

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Yuyong Zhang
> Sent: Tuesday, August 2, 2016 4:26 PM
> To: dev at dpdk.org; users at dpdk.org
> Subject: [dpdk-dev] how to design high performance QoS support for a large
> amount of subscribers
> 
> Hi,
> 
> I am trying to add QoS support for a high performance VNF with large
> amount of subscribers (millions).

Welcome to the world of DPDK QoS users!

> It requires supporting guaranteed bit rates for different service levels of
> subscribers. I.e. four service levels need to be supported:
> 
> * Diamond, 500M
> 
> * Gold, 100M
> 
> * Silver, 50M
> 
> * Bronze, 10M

Service levels translate to pipe profiles in our DPDK implementation. The set 
of pipe profiles is defined per port.
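
For reference, the four service levels above could be sketched as pipe
profiles roughly as follows (rates are in bytes per second, so 500 Mbps is
62,500,000 B/s; the tb_size, tc_rate, tc_period and weight values are
illustrative assumptions only, not a tuned configuration):

#include <rte_sched.h>

struct rte_sched_pipe_params pipe_profiles[4] = {
    { /* Diamond, 500 Mbps */
        .tb_rate = 62500000, .tb_size = 1000000,
        .tc_rate = {62500000, 62500000, 62500000, 62500000},
        .tc_period = 40,
        .wrr_weights = {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1},
    },
    { /* Gold, 100 Mbps */
        .tb_rate = 12500000, .tb_size = 1000000,
        .tc_rate = {12500000, 12500000, 12500000, 12500000},
        .tc_period = 40,
        .wrr_weights = {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1},
    },
    { /* Silver, 50 Mbps */
        .tb_rate = 6250000, .tb_size = 1000000,
        .tc_rate = {6250000, 6250000, 6250000, 6250000},
        .tc_period = 40,
        .wrr_weights = {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1},
    },
    { /* Bronze, 10 Mbps */
        .tb_rate = 1250000, .tb_size = 1000000,
        .tc_rate = {1250000, 1250000, 1250000, 1250000},
        .tc_period = 40,
        .wrr_weights = {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1},
    },
};

The array is then referenced from rte_sched_port_params.pipe_profiles, and each
subscriber pipe is bound to the profile matching its service level with
rte_sched_pipe_config().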

> 
> Here is the current pipeline design using DPDK:
> 
> 
> * 4 RX threads, does packet classification and load balancing
> 
> * 10-20 worker thread, does application subscriber management
> 
> * 4 TX threads, sends packets to TX NICs.
> 
> * Ring buffers used among RX threads, Worker threads, and TX threads
> 
> I read DPDK program guide for QoS framework regarding  hierarchical
> scheduler: Port, sub-port, pipe, TC and queues, I am looking for advice on
> how to design QoS scheduler to support millions of subscribers (pipes) which
> traffic are processed in tens of worker threads where subscriber
> management processing are handled?

Having millions of pipes per port poses some challenges:
1. Does it actually make sense? Assuming the port rate is 10GbE, looking at the 
smallest user rate you mention above (Bronze, 10Mbps/user), this means that 
fully provisioning all users (i.e. making sure you can fully handle each user 
in worst case scenario) results in a maximum of 1000 users per port. Assuming 
overprovisioning of 50:1, this means a maximum of 50K users per port.
2. Memory challenge. The number of pipes per port is configurable -- hey, this 
is SW! :) -- but each of these pipes has 16 queues. For 4K pipes per port, this 
is 64K queues per port; for typical value of 64 packets per queue, this is 4M 
packets per port, so worst case scenario we need to provision 4M packets in the 
buffer pool for each output port that has hierarchical scheduler enabled; for 
buffer size of ~2KB each, this means ~8GB of memory for each output port. If 
you go from 4k pipes per port to 4M pipes per port, this means 8TB of memory 
per port. Do you have enough memory in your system? :)
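
A quick back-of-the-envelope check of those numbers (standalone snippet, only
to make the arithmetic explicit; 16 queues per pipe, 64 packets per queue and
~2KB buffers as assumed above):

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void)
{
    uint64_t pipes_per_port = 4096;  /* set to 4 * 1024 * 1024 for the 8TB case */
    uint64_t queues_per_pipe = 16;
    uint64_t pkts_per_queue = 64;
    uint64_t mbuf_size = 2048;       /* ~2KB per buffer */

    uint64_t pkts = pipes_per_port * queues_per_pipe * pkts_per_queue;

    printf("packets to provision per port: %" PRIu64 "\n", pkts);
    printf("mempool memory per port: %" PRIu64 " MB\n",
           pkts * mbuf_size / (1024 * 1024));
    return 0;
}

For 4K pipes this prints 4194304 packets and 8192 MB per port, matching the
figures above; multiplying the pipe count by 1024 scales the memory to ~8TB.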

One thing to realize is that even for millions of users in your system, not all 
of them are active at the same time. So maybe have a smaller number of pipes 
and only map the active users (those that have any packets to send now) to them 
(a fraction of the total set of users), with the set of active users changing 
over time.

You can also consider mapping several users to the same pipe.
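
A hypothetical sketch of the "several users per pipe" idea (the helper name and
the use of the CRC hash are made up for illustration):

#include <stdint.h>
#include <rte_hash_crc.h>

#define PIPES_PER_SUBPORT 4096

/* Fold a large subscriber id space onto a limited number of pipes. A real
 * implementation would more likely track currently active subscribers in a
 * hash table and hand out free pipes on demand. */
static inline uint32_t
subscriber_to_pipe(uint32_t subscriber_id)
{
    return rte_hash_crc_4byte(subscriber_id, 0) % PIPES_PER_SUBPORT;
}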

> 
> One design thought is as the following:
> 
> 8 ports (each one is associated with one physical port), 16-20 sub-ports (each
> is used by one Worker thread), each sub-port supports 250K pipes for
> subscribers. Each worker thread manages one sub-port and does metering
> for the sub-port to get color, and after identity subscriber flow pick a 
> unused
> pipe, and do sched enqueuer/de-queue and then put into TX rings to TX
> threads, and TX threads send the packets to TX NICs.
> 

In the current implementation, each port scheduler object has to be owned by a 
single thread, i.e. you cannot split a port across multiple threads, therefore 
it is not straightforward to have different sub-ports handled by different 
threads. The workaround is to split the physical NIC port yourself into 
multiple port scheduler objects: for example, create 8 port scheduler objects, 
set the rate of each to 1/8 of 10GbE, and have each of them feed a different 
NIC TX queue of the same physical NIC port.
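
A rough sketch of that workaround, assuming 8 scheduler objects on a 10GbE port
(all parameter values are illustrative; pipe_profiles refers to a profile array
such as the one sketched earlier):

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <rte_sched.h>

#define N_SCHEDULERS 8

extern struct rte_sched_pipe_params pipe_profiles[4];

struct rte_sched_port *sched[N_SCHEDULERS];

static int
setup_schedulers(int socket_id)
{
    unsigned int i;
    char name[32];

    for (i = 0; i < N_SCHEDULERS; i++) {
        struct rte_sched_port_params p;

        memset(&p, 0, sizeof(p));
        snprintf(name, sizeof(name), "sched_%u", i);
        p.name = name;
        p.socket = socket_id;
        p.rate = 1250000000 / N_SCHEDULERS;   /* 1/8 of 10GbE, in bytes/s */
        p.mtu = 1522;
        p.frame_overhead = RTE_SCHED_FRAME_OVERHEAD_DEFAULT;
        p.n_subports_per_port = 1;
        p.n_pipes_per_subport = 4096;
        p.qsize[0] = p.qsize[1] = p.qsize[2] = p.qsize[3] = 64;
        p.pipe_profiles = pipe_profiles;
        p.n_pipe_profiles = 4;

        sched[i] = rte_sched_port_config(&p);
        if (sched[i] == NULL)
            return -1;
    }
    return 0;
}

Worker thread i then owns sched[i]: it calls rte_sched_port_enqueue() and
rte_sched_port_dequeue() on it and drains the output with
rte_eth_tx_burst(port_id, i, ...) on TX queue i of the physical port.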

You can probably get this scenario (or something very similar) up pretty quickly 
just by handcrafting a configuration file for the examples/ip_pipeline application.

> Are there functional and performance issues with above approach?
> 
> Any advice and input are appreciated.
> 
> Regards,
> 
> Yuyong
> 
> 
> 

Regards,
Cristian



[dpdk-dev] [PATCH] crypto/qat: optimisation of request copy

2016-08-04 Thread Fiona Trahe (fiona.tr...@intel.com)
From: Fiona Trahe 

Using rte_mov128() instead of structure assignment to copy the
template request from the session context into the request.

Signed-off-by: Fiona Trahe 

---
 drivers/crypto/qat/qat_crypto.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/crypto/qat/qat_crypto.c b/drivers/crypto/qat/qat_crypto.c
index e2a501b..ff0c691 100644
--- a/drivers/crypto/qat/qat_crypto.c
+++ b/drivers/crypto/qat/qat_crypto.c
@@ -832,7 +832,7 @@ qat_write_hw_desc_entry(struct rte_crypto_op *op, uint8_t 
*out_msg)

ctx = (struct qat_session *)op->sym->session->_private;
qat_req = (struct icp_qat_fw_la_bulk_req *)out_msg;
-   *qat_req = ctx->fw_req;
+   rte_mov128((uint8_t *)qat_req, (const uint8_t *)&(ctx->fw_req));
qat_req->comn_mid.opaque_data = (uint64_t)(uintptr_t)op;

qat_req->comn_mid.dst_length =
-- 
2.1.0



[dpdk-dev] [RFC 0/4] Use Google Test as DPDK unit test framework

2016-08-04 Thread Jim Murphy
Hi,

We are looking at using our existing test environment for our DPDK
applications that will run on our build servers. Hugepages therefore are an
issue. What is involved in running DPDK without hugepages?

Thanks,

Jim


On Wed, Aug 3, 2016 at 1:46 PM, Ming Zhao(??) 
wrote:

> googletest is a very nice test framework and we use it very
> extensively in our company(Luminate Wireless), together with gmock.
>
> I understand the resistance from the maintainers that are concerned
> about introducing a C++ dependency to a pure C code base. The approach
> we take doesn't require any change to the dpdk core, instead we just
> use things like a mock PMD(through gmock framework) to allow mocking
> the RX/TX code path, disabling huge page usage in test so that the
> test can be easily launched without worrying about huge page
> collision, etc.
>
> Personally I highly recommend using googletest plus some basic test
> cases, which removes a lot of boilerplate and let the developers focus
> the test itself.
>
> On Wed, Aug 3, 2016 at 2:57 AM, Doherty, Declan
>  wrote:
> >
> >
> >> -Original Message-
> > ...
> >> You are not advocating but the unit test must be written in C++.
> >> I don't think it is a good idea to force people to write and maintain
> the tests
> >> in a different language than the code it tests.
> >
> > I know where you are coming from on this point, and I general would
> agree if
> > it were not for the advantages you get from C++ test framework. Having
> worked with
> > multiple C and C++ frameworks, I've found that one of the biggest
> advantages of the
> > C++ frameworks is the amount of boilerplate code they can save you from
> writing. Also
> > nearly all of C frameworks I've used make use macros to the point that
> they look more like
> > objective C than C. In general I feel that even if the test code is
> written in C++ the code itself
> > should be simple enough that someone with even a passing knowledge of
> C++ could easily
> > understand the intent of the test code.
> >
> >> > Some of the major advantages of google test that I see over
> continuing to use
> >> the
> >> > current test include giving a consist feel to all tests, a powerful
> test
> >> > execution framework which allow individual test suites or tests to be
> specified
> >> > from the command line, support for a standard xunit output which can
> be
> >> integrated
> >> > into a continuous build systems, and a very powerful mocking library
> >> > which allows much more control over testing failure conditions.
> >>
> >> It would be interesting to better describe in details what is missing
> currently
> >> and what such a framework can bring.
> >> (I agree there is a huge room for improvements on unit tests)
> >
> > Some of the things I've come across include:
> > No standard output format to integrated with continuous regression
> systems
> > No ability to specify specific unit tests or groups of tests to run from
> the command line
> > No standard set of test assertions used across the test suites.
> > No standard setup and teardown functions across test suites, state from
> previous test
> > suite can break current
> > Requirement to use a python script to orchestrate test runs.
> > No support for mocking functionality.
> >
> > I know that none of the above couldn't be fixed in our current test
> application, but I would
> > question if it is effort worthwhile when we take an off the shelf
> framework, which does all
> > those things and a whole lot more, which has been test and used in a
> huge variety of
> > projects.
> >
> > I certainly willing to look at other frameworks both C and C++ but I yet
> to find a C framework
> > which come close to the usability and flexibility of the popular C++
> ones.
> >
> >
> >
>


[dpdk-dev] rte_eth_dev_attach returns 0, although device is not attached

2016-08-04 Thread Ferruh Yigit
Hi Igor,

On 8/3/2016 5:58 PM, Igor Ryzhov wrote:
> Hello.
> 
> Function rte_eth_dev_attach can return false positive result.
> It happens because rte_eal_pci_probe_one returns zero if no driver is found 
> for the device:
> ret = pci_probe_all_drivers(dev);
> if (ret < 0)
>   goto err_return;
> return 0;
> (pci_probe_all_drivers returns 1 in that case)
> 
> For example, it can be easily reproduced by trying to attach virtio device, 
> managed by kernel driver.

You are right, and I was able to reproduce this issue with virtio as you
suggested.

But I wonder why rte_eth_dev_get_port_by_addr() is not catching this.
Perhaps a dev->attached check needs to be added into this function.
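
For what it's worth, a hypothetical sketch of such a check inside
lib/librte_ether/rte_ethdev.c (assuming the current layout of struct
rte_eth_dev and the DEV_ATTACHED flag; not a tested patch):

static int
rte_eth_dev_get_port_by_addr(const struct rte_pci_addr *addr, uint8_t *port_id)
{
    int i;
    struct rte_pci_device *pci_dev = NULL;

    for (i = 0; i < RTE_MAX_ETHPORTS; i++) {
        /* Skip ports that are not attached, so a probe that found no
         * driver is not reported back as a valid port. */
        if (rte_eth_devices[i].attached != DEV_ATTACHED)
            continue;

        pci_dev = rte_eth_devices[i].pci_dev;
        if (pci_dev && !rte_eal_compare_pci_addr(&pci_dev->addr, addr)) {
            *port_id = i;
            return 0;
        }
    }
    return -ENODEV;
}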

> 
> I think it should be:
> ret = pci_probe_all_drivers(dev);
> if (ret)
>   goto err_return;
> return 0;

Your proposal looks good to me. Will you send a patch?

> Best regards,
> Igor
> 



[dpdk-dev] limited tx performance with VMware VM + PCI pass-through

2016-08-04 Thread martin_curran-g...@keysight.com
Hello Grisha,

I'm not sure which OS you are using.

We encountered a problem moving from Sandy Bridge based machines to Haswell 
based machines when using pass-through to VMs.

We had to change to a newer version of our OS since the one we used was too old 
to understand Haswell.

We initially saw a huge drop in DPDK performance in the VMs; native was fine.

There were changes required in the configuration of the VMs, to do with huge 
pages and anonymous huge pages, and VM backing with huge pages.

Not sure if this helps

Martin


Martin Curran-Gray
HW/FPGA/SW Engineer
Network Monitoring

Keysight Technologies UK Ltd

web -> http://www.keysight.com






[dpdk-dev] [PATCH] qat: change the session structure to be variable sized

2016-08-04 Thread John Griffin
On 02/08/16 09:30, John Griffin wrote:
> This patch changes the qat session data structure sent to qat from a
> fixed size to a variable size which is dependent on the size of
> the chosen algorithm.
> This reduces the amount of bytes which are transferred across
> PCIe and thus helps to increase qat performance when the
> accelerator is bound by PCIe.
>
> Signed-off-by: John Griffin 
> ---
>   drivers/crypto/qat/qat_adf/qat_algs.h|   5 +-
>   drivers/crypto/qat/qat_adf/qat_algs_build_desc.c | 462 
> +--
>   drivers/crypto/qat/qat_crypto.c  |  15 +-
>   3 files changed, 183 insertions(+), 299 deletions(-)
>

Self-Nack compile issue on icc - will fix and send again.



[dpdk-dev] Using qemu-system-x86_64 of KVM sourceforge project versus qemu-system-x86_64 of http://www.qemu.org/

2016-08-04 Thread Kevin Wilson
Hi Mauricio,
Thanks!

>be sure that kvm is enabled by
> setting "accel=kvm" in the qemu command line.

Isn't "--enable-kvm" intended for that ?

Kevin


[dpdk-dev] [PATCH 2/2] net/bonding: enable slave VLAN filter

2016-08-04 Thread Eric Kinzie
SR-IOV virtual functions cannot rely on promiscuous mode for the reception
of VLAN tagged frames.  Program the vlan filter for each slave when a
vlan is configured for the bonding master.
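
For illustration, an application-side usage sketch (bond_port_id is assumed to
be an already created bonded device; queue setup and error handling trimmed):

#include <rte_ethdev.h>

static int
bond_vlan_example(uint8_t bond_port_id)
{
    struct rte_eth_conf port_conf = {
        .rxmode = { .hw_vlan_filter = 1 },
    };
    int ret;

    /* Enable HW VLAN filtering on the bonded device. */
    ret = rte_eth_dev_configure(bond_port_id, 1, 1, &port_conf);
    if (ret < 0)
        return ret;

    /* Add VLAN 100 on the bond; with this patch the VLAN is recorded in the
     * bitmap and the filter is also programmed on each slave. */
    return rte_eth_dev_vlan_filter(bond_port_id, 100, 1);
}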

Signed-off-by: Eric Kinzie 
---
 drivers/net/bonding/rte_eth_bond_api.c |   68 
 drivers/net/bonding/rte_eth_bond_pmd.c |   36 +++
 drivers/net/bonding/rte_eth_bond_private.h |4 ++
 3 files changed, 108 insertions(+)

diff --git a/drivers/net/bonding/rte_eth_bond_api.c 
b/drivers/net/bonding/rte_eth_bond_api.c
index 3c16973..a556f7b 100644
--- a/drivers/net/bonding/rte_eth_bond_api.c
+++ b/drivers/net/bonding/rte_eth_bond_api.c
@@ -166,6 +166,7 @@ rte_eth_bond_create(const char *name, uint8_t mode, uint8_t 
socket_id)
 {
struct bond_dev_private *internals = NULL;
struct rte_eth_dev *eth_dev = NULL;
+   uint32_t vlan_filter_bmp_size;

/* now do all data allocation - for eth_dev structure, dummy pci driver
 * and internal (private) data
@@ -260,6 +261,27 @@ rte_eth_bond_create(const char *name, uint8_t mode, 
uint8_t socket_id)
goto err;
}

+   vlan_filter_bmp_size =
+   rte_bitmap_get_memory_footprint(ETHER_MAX_VLAN_ID+1);
+   internals->vlan_filter_bmpmem = rte_malloc(name, vlan_filter_bmp_size,
+  RTE_CACHE_LINE_SIZE);
+   if (internals->vlan_filter_bmpmem == NULL) {
+   RTE_BOND_LOG(ERR,
+"Failed to allocate vlan bitmap for bonded device 
%u\n",
+eth_dev->data->port_id);
+   goto err;
+   }
+
+   internals->vlan_filter_bmp = rte_bitmap_init(ETHER_MAX_VLAN_ID+1,
+   internals->vlan_filter_bmpmem, vlan_filter_bmp_size);
+   if (internals->vlan_filter_bmp == NULL) {
+   RTE_BOND_LOG(ERR,
+"Failed to init vlan bitmap for bonded device 
%u\n",
+eth_dev->data->port_id);
+   rte_free(internals->vlan_filter_bmpmem);
+   goto err;
+   }
+
return eth_dev->data->port_id;

 err:
@@ -299,6 +321,9 @@ rte_eth_bond_free(const char *name)
eth_dev->rx_pkt_burst = NULL;
eth_dev->tx_pkt_burst = NULL;

+   internals = eth_dev->data->dev_private;
+   rte_bitmap_free(internals->vlan_filter_bmp);
+   rte_free(internals->vlan_filter_bmpmem);
rte_free(eth_dev->data->dev_private);
rte_free(eth_dev->data->mac_addrs);

@@ -308,6 +333,46 @@ rte_eth_bond_free(const char *name)
 }

 static int
+slave_vlan_filter_set(uint8_t bonded_port_id, uint8_t slave_port_id)
+{
+   struct rte_eth_dev *bonded_eth_dev;
+   struct bond_dev_private *internals;
+   int found;
+   int res = 0;
+   uint64_t slab = 0;
+   uint32_t pos = 0;
+   uint16_t first;
+
+   bonded_eth_dev = &rte_eth_devices[bonded_port_id];
+   if (bonded_eth_dev->data->dev_conf.rxmode.hw_vlan_filter == 0)
+   return 0;
+
+   internals = bonded_eth_dev->data->dev_private;
+   found = rte_bitmap_scan(internals->vlan_filter_bmp, &pos, &slab);
+   first = pos;
+
+   if (!found)
+   return 0;
+
+   do {
+   uint32_t i;
+   uint64_t mask;
+
+   for (i = 0, mask = 1;
+i < RTE_BITMAP_SLAB_BIT_SIZE;
+i ++, mask <<= 1) {
+   if (unlikely(slab & mask))
+   res = rte_eth_dev_vlan_filter(slave_port_id,
+ (uint16_t)pos, 1);
+   }
+   found = rte_bitmap_scan(internals->vlan_filter_bmp,
+   &pos, &slab);
+   } while (found && first != pos && res == 0);
+
+   return res;
+}
+
+static int
 __eth_bond_slave_add_lock_free(uint8_t bonded_port_id, uint8_t slave_port_id)
 {
struct rte_eth_dev *bonded_eth_dev, *slave_eth_dev;
@@ -427,6 +492,9 @@ __eth_bond_slave_add_lock_free(uint8_t bonded_port_id, 
uint8_t slave_port_id)
activate_slave(bonded_eth_dev, slave_port_id);
}
}
+
+   slave_vlan_filter_set(bonded_port_id, slave_port_id);
+
return 0;

 }
diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c 
b/drivers/net/bonding/rte_eth_bond_pmd.c
index 25fe00a..0b6caf6 100644
--- a/drivers/net/bonding/rte_eth_bond_pmd.c
+++ b/drivers/net/bonding/rte_eth_bond_pmd.c
@@ -1335,6 +1335,9 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
bonded_eth_dev->data->dev_conf.rxmode.mq_mode;
}

+   slave_eth_dev->data->dev_conf.rxmode.hw_vlan_filter =
+   bonded_eth_dev->data->dev_conf.rxmode.hw_vlan_filter;
+
/* Configure device */
errval = rte_eth_dev_configure(slave_eth_dev->data->port_id,

[dpdk-dev] [PATCH 1/2] net/bonding: validate speed after link up

2016-08-04 Thread Eric Kinzie
It's possible for the bonding driver to mistakenly reject an interface
based on its, as yet, unnegotiated link speed and duplex.  Always allow
the interface to be added to the bonding interface but require link
properties validation to succeed before the slave is activated.

Fixes: 2efb58cbab6e ("bond: new link bonding library")

Signed-off-by: Eric Kinzie 
---
 drivers/net/bonding/rte_eth_bond_api.c |   15 ---
 drivers/net/bonding/rte_eth_bond_pmd.c |   10 ++
 2 files changed, 10 insertions(+), 15 deletions(-)

diff --git a/drivers/net/bonding/rte_eth_bond_api.c 
b/drivers/net/bonding/rte_eth_bond_api.c
index 203ebe9..3c16973 100644
--- a/drivers/net/bonding/rte_eth_bond_api.c
+++ b/drivers/net/bonding/rte_eth_bond_api.c
@@ -373,21 +373,6 @@ __eth_bond_slave_add_lock_free(uint8_t bonded_port_id, 
uint8_t slave_port_id)
internals->candidate_max_rx_pktlen = dev_info.max_rx_pktlen;

} else {
-   /* Check slave link properties are supported if props are set,
-* all slaves must be the same */
-   if (internals->link_props_set) {
-   if 
(link_properties_valid(&(bonded_eth_dev->data->dev_link),
- 
&(slave_eth_dev->data->dev_link))) {
-   slave_eth_dev->data->dev_flags &= 
(~RTE_ETH_DEV_BONDED_SLAVE);
-   RTE_BOND_LOG(ERR,
-   "Slave port %d link 
speed/duplex not supported",
-   slave_port_id);
-   return -1;
-   }
-   } else {
-   link_properties_set(bonded_eth_dev,
-   &(slave_eth_dev->data->dev_link));
-   }
internals->rx_offload_capa &= dev_info.rx_offload_capa;
internals->tx_offload_capa &= dev_info.tx_offload_capa;
internals->flow_type_rss_offloads &= 
dev_info.flow_type_rss_offloads;
diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c 
b/drivers/net/bonding/rte_eth_bond_pmd.c
index b20a272..25fe00a 100644
--- a/drivers/net/bonding/rte_eth_bond_pmd.c
+++ b/drivers/net/bonding/rte_eth_bond_pmd.c
@@ -1985,6 +1985,16 @@ bond_ethdev_lsc_event_callback(uint8_t port_id, enum 
rte_eth_event_type type,
/* Inherit eth dev link properties from first active 
slave */
link_properties_set(bonded_eth_dev,
&(slave_eth_dev->data->dev_link));
+   } else {
+   if (link_properties_valid(
+   &bonded_eth_dev->data->dev_link, &link) != 0) {
+   slave_eth_dev->data->dev_flags &=
+   (~RTE_ETH_DEV_BONDED_SLAVE);
+   RTE_LOG(ERR, PMD,
+   "port %u invalid speed/duplex\n",
+   port_id);
+   return;
+   }
}

activate_slave(bonded_eth_dev, port_id);
-- 
1.7.10.4



[dpdk-dev] [PATCH 0/2] bonding link validation and vlan filters

2016-08-04 Thread Eric Kinzie
This series contains a fix to the validation of interfaces added to a
bond group and an enhancement to allow reception of tagged frames on
slave interfaces that are virtual functions.

Eric Kinzie (2):
  net/bonding: validate speed after link up
  net/bonding: enable slave VLAN filter

 drivers/net/bonding/rte_eth_bond_api.c |   83 +++-
 drivers/net/bonding/rte_eth_bond_pmd.c |   46 +++
 drivers/net/bonding/rte_eth_bond_private.h |4 ++
 3 files changed, 118 insertions(+), 15 deletions(-)

-- 
1.7.10.4



[dpdk-dev] Using qemu-system-x86_64 of KVM sourceforge project versus qemu-system-x86_64 of http://www.qemu.org/

2016-08-04 Thread Mauricio Vasquez


On 08/04/2016 10:27 AM, Kevin Wilson wrote:
> Hi Mauricio,
> Thanks!
>
>> be sure that kvm is enabled by
>> setting "accel=kvm" in the qemu command line.
> Isn't "--enable-kvm" intended for that ?

Sorry, I meant "-machine accel=kvm".
I don't know the exact difference, but --enable-kvm should also work.
>
> Kevin
>



[dpdk-dev] Running DPDK in a VM

2016-08-04 Thread Vaibhav Sood
Hi!

I am looking at running DPDK in a VM and would like to know if there are any 
limitations when doing this, in terms of DPDK features that do not work in a 
VM.

The only post I came across for running DPDK in a VM is this: 
http://dpdk.org/ml/archives/dev/2013-September/000441.html

As an example, I tried to run the DPDK test suite (DTS, 
http://dpdk.org/doc/dts/gsg/ ) and saw some tests fail saying virtio NICs are 
not supported (specifically, the ethertype_filter and five_tuple_filter tests 
give the error: FAILED ''virtio nic not support syn filter''). I would like to 
know if there are any limitations along these lines when running DPDK in VMs as 
compared to physical machines/NICs.

Thanks!
Vaibhav





[dpdk-dev] Using qemu-system-x86_64 of KVM sourceforge project versus qemu-system-x86_64 of http://www.qemu.org/

2016-08-04 Thread Mauricio Vasquez
Hi Kevin,

On 08/04/2016 08:55 AM, Kevin Wilson wrote:
> Hi,
> I am trying to use DPDK  SRIOV passthrough with DPDK on Intel NICs.
> I am following the instructions in "Network Interface Controller Drivers", in
> http://fast.dpdk.org/doc/pdf-guides/nics-16.07.pdf
>
> I saw in "11.2 Setting Up a KVM Virtual Machine Monitor" in this pdf
> that it says to
> download qemu-kvm-0.14.0 from
> http://sourceforge.net/projects/kvm/files/qemu-kvm/
> ,build it and run /usr/local/kvm/bin/qemu-system-x86_64
>
> My question is:
> there is also qemu-system-x86_64 executable in qemu Fedora rpms.
> It is built from the qemu project which is hosted on http://www.qemu.org.
> I made a brief comparison between the trees of both these projects
> (qemu and KVM qemu-kvm) and
> they seem different.
>
> Should SRIOV passthrough, as described in the aforementioned DPDK
> nics-16.07.pdf, work also with this qemu-system-x86_64 from the qemu
> project?
> or is using the qemu-system-x86_64 of the qemu project is not good
> enough, and using
> KVM qemu-system-x86_64 from
> http://sourceforge.net/projects/kvm/files/qemu-kvm/ is a must ?
I have used qemu-system-x86_64 (directly downloaded and compiled from 
http://www.qemu.org) without any issue; be sure that KVM is enabled by 
setting "accel=kvm" on the qemu command line.

> Regards,
> Kevin
>
Regards,

Mauricio V,


[dpdk-dev] FW: DPDK Community Survey - about to close

2016-08-04 Thread Glynn, Michael J
Hi all

Final reminder that the survey closes today
Thanks to those who have already provided their input!

Regards
Mike


-Original Message-
From: Thomas Monjalon [mailto:thomas.monja...@6wind.com] 
Sent: Tuesday, August 2, 2016 1:45 PM
To: users at dpdk.org; dev at dpdk.org
Cc: Glynn, Michael J 
Subject: Re: DPDK Community Survey - about to close

Just 2 minutes for DPDK ;)
http://surveymonkey.com/r/DPDK_Community_Survey


2016-08-02 14:39, Thomas Monjalon:
> Hi all,
> 
> That's the first time a DPDK survey is published.
> It will help us in our future progress to decide what are the most 
> important stuff to work on.
> 
> Please do not wait to fill it out. It closes on August 4.
> Thanks for taking 2 minutes now to give your feedback.
> When you will have done your duty, you might peacefully do your next 
> task or just enjoy a nice August month :)
> 
> Each voice counts! Thanks
> 
> 
> 2016-07-28 16:45, Glynn, Michael J:
> > Hi all
> > 
> > As part of our ongoing efforts to improve DPDK, we'd like to hear 
> > your feedback!
> > 
> > We have created a number of DPDK-related questions here 
> > https://www.surveymonkey.com/r/DPDK_Community_Survey
> > and want to hear your views!!
> > 
> > The survey will close at midnight GMT on Thursday August 4th
> > 
> > Thanks in advance for your feedback - the more responses we get the 
> > more data we have to drive further features, improvements, etc...
> > so please respond!!
> > 
> > Regards
> > Mike
> 




[dpdk-dev] [PATCH] examples/exception_path: fix shift operation in lcore setup

2016-08-04 Thread Ferruh Yigit
On 8/3/2016 12:44 PM, Daniel Mrzyglod wrote:
> The operation may have undefined behavior or yield an unexpected result.
> A bit shift operation has a shift amount which is too large or has a negative 
> value.
> 
> Coverity issue: 30688
> Fixes: ea977ff1cb0b ("examples/exception_path: fix shift operation in lcore 
> setup")
> The previous patch forgot to also fix the values for input_cores_mask.
> 
> Signed-off-by: Daniel Mrzyglod 
> ---
>  examples/exception_path/main.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/examples/exception_path/main.c b/examples/exception_path/main.c
> index e5eedcc..88e7708 100644
> --- a/examples/exception_path/main.c
> +++ b/examples/exception_path/main.c
> @@ -341,7 +341,7 @@ setup_port_lcore_affinities(void)
>  
>   /* Setup port_ids[] array, and check masks were ok */
>   RTE_LCORE_FOREACH(i) {
> - if (input_cores_mask & (1ULL << i)) {
> + if (input_cores_mask & (1ULL << (i & 0x3f))) {

I guess 0x3f is because "unsigned long long" is 64 bits long; not sure if
we should hardcode this assumption. ULL can be >= 64 bits.

RTE_LCORE_FOREACH(i) already makes sure "i" < RTE_MAX_LCORE, and
RTE_MAX_LCORE is 128 with the current default config. So if the user provides a
core value > 64, it is valid but will be ignored because of this check.

Another thing is "input_cores_mask" is also 64bits long, so even this
fixed application will not able to use this setting correctly.

I think it is good to
a) add flexible variable-size set_bit/clear_bit/test_bit functions, like the
Linux ones
b) make "input_cores_mask" an array that is large enough to hold
RTE_MAX_LCORE bits (a rough sketch follows below)

Although not sure if that is too much effort for this fix.
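
A rough sketch of option b), with hypothetical names (this is not the fix
actually submitted for the Coverity issue):

#include <stdint.h>
#include <rte_lcore.h>

#define MASK_BITS_PER_SLOT 64

/* One 64-bit slot per 64 lcores, sized from RTE_MAX_LCORE. */
static uint64_t input_cores_mask2[(RTE_MAX_LCORE + MASK_BITS_PER_SLOT - 1) /
                                  MASK_BITS_PER_SLOT];

static inline void
core_mask_set(unsigned int lcore)
{
    input_cores_mask2[lcore / MASK_BITS_PER_SLOT] |=
        1ULL << (lcore % MASK_BITS_PER_SLOT);
}

static inline int
core_mask_test(unsigned int lcore)
{
    return (input_cores_mask2[lcore / MASK_BITS_PER_SLOT] >>
        (lcore % MASK_BITS_PER_SLOT)) & 1;
}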

>   /* Skip ports that are not enabled */
>   while ((ports_mask & (1 << rx_port)) == 0) {
>   rx_port++;
> 



[dpdk-dev] Using qemu-system-x86_64 of KVM sourceforge project versus qemu-system-x86_64 of http://www.qemu.org/

2016-08-04 Thread Kevin Wilson
Hi,
I am trying to use SR-IOV passthrough with DPDK on Intel NICs.
I am following the instructions in "Network Interface Controller Drivers", in
http://fast.dpdk.org/doc/pdf-guides/nics-16.07.pdf

I saw in "11.2 Setting Up a KVM Virtual Machine Monitor" in this pdf
that it says to
download qemu-kvm-0.14.0 from
http://sourceforge.net/projects/kvm/files/qemu-kvm/
,build it and run /usr/local/kvm/bin/qemu-system-x86_64

My question is:
there is also qemu-system-x86_64 executable in qemu Fedora rpms.
It is built from the qemu project which is hosted on http://www.qemu.org.
I made a brief comparison between the trees of both these projects
(qemu and KVM qemu-kvm) and
they seem different.

Should SRIOV passthrough, as described in the aforementioned DPDK
nics-16.07.pdf, work also with this qemu-system-x86_64 from the qemu
project?
Or is the qemu-system-x86_64 of the qemu project not good enough, and is
using the KVM qemu-system-x86_64 from
http://sourceforge.net/projects/kvm/files/qemu-kvm/ a must?

Regards,
Kevin


[dpdk-dev] [PATCH] app/testpmd: fix RSS-hash-key size

2016-08-04 Thread Ananyev, Konstantin
Hi Awal,

As I said in the offline discussion, here are few nits
that I think need to be addressed.
See below. 
Konstantin

> 
> RSS hash-key-size is retrieved from device configuration instead of using a 
> fixed size of 40 bytes.
> 
> Fixes: f79959ea1504 ("app/testpmd: allow to configure RSS hash key")
> 
> Signed-off-by: Mohammad Abdul Awal 
> ---
>  app/test-pmd/cmdline.c | 24 +---  app/test-pmd/config.c  
> | 17 ++---
>  2 files changed, 31 insertions(+), 10 deletions(-)
> 
> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c index 
> f90befc..14412b4 100644
> --- a/app/test-pmd/cmdline.c
> +++ b/app/test-pmd/cmdline.c
> @@ -1608,7 +1608,6 @@ struct cmd_config_rss_hash_key {
>   cmdline_fixed_string_t key;
>  };
> 
> -#define RSS_HASH_KEY_LENGTH 40
>  static uint8_t
>  hexa_digit_to_value(char hexa_digit)
>  {
> @@ -1640,20 +1639,30 @@ cmd_config_rss_hash_key_parsed(void *parsed_result,
>  __attribute__((unused)) void *data)  {
>   struct cmd_config_rss_hash_key *res = parsed_result;
> - uint8_t hash_key[RSS_HASH_KEY_LENGTH];
> + uint8_t hash_key[16 * 4];

No need for hard-coded constants.
I'd suggest keeping RSS_HASH_KEY_LENGTH and just increasing it to 52 (or maybe
an even bigger value).
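
Put differently, the two checks could be combined along these lines (a sketch
with a hypothetical helper name, not the final patch):

#include <stdio.h>
#include <string.h>
#include <rte_ethdev.h>

#define RSS_HASH_KEY_LENGTH 52  /* large enough for current NICs */

/* Return the key size reported by the port, or 0 if it is missing or larger
 * than the buffer the caller intends to use. */
static uint8_t
get_rss_hash_key_size(uint8_t port_id, size_t buf_size)
{
    struct rte_eth_dev_info dev_info;

    memset(&dev_info, 0, sizeof(dev_info));
    rte_eth_dev_info_get(port_id, &dev_info);

    if (dev_info.hash_key_size == 0 || dev_info.hash_key_size > buf_size) {
        printf("dev_info did not provide a valid hash key size\n");
        return 0;
    }
    return dev_info.hash_key_size;
}
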

>   uint8_t xdgt0;
>   uint8_t xdgt1;
>   int i;
> + struct rte_eth_dev_info dev_info;
> + uint8_t hash_key_size;
> 
> + memset(&dev_info, 0, sizeof(dev_info));
> + rte_eth_dev_info_get(res->port_id, &dev_info);
> + if (dev_info.hash_key_size > 0) {

&& dev_info.hash_key_size <= sizeof(hash_key) {

> + hash_key_size = dev_info.hash_key_size;
> + } else {
> + printf("dev_info did not provide a valid hash key size\n");
> + return;
> + }
>   /* Check the length of the RSS hash key */
> - if (strlen(res->key) != (RSS_HASH_KEY_LENGTH * 2)) {
> + if (strlen(res->key) != (hash_key_size * 2)) {
>   printf("key length: %d invalid - key must be a string of %d"
>  "hexa-decimal numbers\n", (int) strlen(res->key),
> -RSS_HASH_KEY_LENGTH * 2);
> +hash_key_size * 2);
>   return;
>   }
>   /* Translate RSS hash key into binary representation */
> - for (i = 0; i < RSS_HASH_KEY_LENGTH; i++) {
> + for (i = 0; i < hash_key_size; i++) {
>   xdgt0 = parse_and_check_key_hexa_digit(res->key, (i * 2));
>   if (xdgt0 == 0xFF)
>   return;
> @@ -1663,7 +1672,7 @@ cmd_config_rss_hash_key_parsed(void *parsed_result,
>   hash_key[i] = (uint8_t) ((xdgt0 * 16) + xdgt1);
>   }
>   port_rss_hash_key_update(res->port_id, res->rss_type, hash_key,
> -  RSS_HASH_KEY_LENGTH);
> + hash_key_size);
>  }
> 
>  cmdline_parse_token_string_t cmd_config_rss_hash_key_port = @@ -1692,7 
> +1701,8 @@ cmdline_parse_inst_t
> cmd_config_rss_hash_key = {
>   "port config X rss-hash-key ipv4|ipv4-frag|ipv4-tcp|ipv4-udp|"
>   "ipv4-sctp|ipv4-other|ipv6|ipv6-frag|ipv6-tcp|ipv6-udp|"
>   "ipv6-sctp|ipv6-other|l2-payload|"
> - "ipv6-ex|ipv6-tcp-ex|ipv6-udp-ex 80 hexa digits\n",
> + "ipv6-ex|ipv6-tcp-ex|ipv6-udp-ex "
> + "80 hexa digits (104 hexa digits for fortville)\n",

No need to mention a particular NIC (Fortville) here.
I'd rather say: 'array of hex digits (variable size, NIC dependent)' or so.

>   .tokens = {
>   (void *)&cmd_config_rss_hash_key_port,
>   (void *)&cmd_config_rss_hash_key_config,
> diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c index 
> bfcbff9..851408b 100644
> --- a/app/test-pmd/config.c
> +++ b/app/test-pmd/config.c
> @@ -1012,14 +1012,25 @@ void
>  port_rss_hash_conf_show(portid_t port_id, char rss_info[], int show_rss_key) 
>  {
>   struct rte_eth_rss_conf rss_conf;
> - uint8_t rss_key[10 * 4] = "";
> + uint8_t rss_key[16 * 4] = "";

Better rss_key[RSS_HASH_KEY_LENGTH], and I think there is no need to put '0' 
into the first element.

>   uint64_t rss_hf;
>   uint8_t i;
>   int diag;
> + struct rte_eth_dev_info dev_info;
> + uint8_t hash_key_size;
> 
>   if (port_id_is_invalid(port_id, ENABLED_WARN))
>   return;
> 
> + memset(&dev_info, 0, sizeof(dev_info));
> + rte_eth_dev_info_get(port_id, &dev_info);
> + if (dev_info.hash_key_size > 0) {
> + hash_key_size = dev_info.hash_key_size;
> + } else {
> + printf("dev_info did not provide a valid hash key size\n");
> + return;
> + }
> +
>   rss_conf.rss_hf = 0;
>   for (i = 0; i < RTE_DIM(rss_type_table); i++) {
>   if (!strcmp(rss_info, rss_type_table[i].str)) @@ -1028,7 
> +1039,7 @@ port_rss_hash_conf_show(portid_t port_id, char
> rss_info[], int show_rss_key)
> 
>   /* Get RSS hash key if asked to display it 

[dpdk-dev] [PATCH 2/2] examples/tep_term: fix inner L4 checksum failure

2016-08-04 Thread Jianfeng Tan
When sending packets from a virtual machine that need TSO by the
hardware NIC, the inner L4 checksum is not correct on the other
side of the cable.

It's because get_psd_sum() depends on PKT_TX_TCP_SEG to calculate the
pseudo-header checksum, but currently this bit is set after
get_psd_sum() is called. The fix is straightforward:
move the bit setting to before get_psd_sum() is called.
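
For context, the helper in question is essentially the following (simplified
from the sample code); rte_ipv4_phdr_cksum()/rte_ipv6_phdr_cksum() inspect
PKT_TX_TCP_SEG, which is why the flag has to be set before the call:

#include <rte_ether.h>
#include <rte_ip.h>
#include <rte_mbuf.h>

static uint16_t
get_psd_sum(void *l3_hdr, uint16_t ethertype, uint64_t ol_flags)
{
    if (ethertype == ETHER_TYPE_IPv4)
        return rte_ipv4_phdr_cksum(l3_hdr, ol_flags);
    else /* assume ethertype == ETHER_TYPE_IPv6 */
        return rte_ipv6_phdr_cksum(l3_hdr, ol_flags);
}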

Fixes: a50245ede72a ("examples/tep_term: initialize VXLAN sample")

Signed-off-by: Jianfeng Tan 
---
 examples/tep_termination/vxlan.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/examples/tep_termination/vxlan.c b/examples/tep_termination/vxlan.c
index 4bad33d..155415c 100644
--- a/examples/tep_termination/vxlan.c
+++ b/examples/tep_termination/vxlan.c
@@ -141,14 +141,17 @@ process_inner_cksums(struct ether_hdr *eth_hdr, union 
tunnel_offload_info *info)
ethertype, ol_flags);
} else if (l4_proto == IPPROTO_TCP) {
tcp_hdr = (struct tcp_hdr *)((char *)l3_hdr + info->l3_len);
-   ol_flags |= PKT_TX_TCP_CKSUM;
-   tcp_hdr->cksum = get_psd_sum(l3_hdr, ethertype,
-   ol_flags);
+   /* Put PKT_TX_TCP_SEG bit setting before get_psd_sum(), because
+* it depends on PKT_TX_TCP_SEG to calculate pseudo-header
+* checksum.
+*/
if (tso_segsz != 0) {
ol_flags |= PKT_TX_TCP_SEG;
info->tso_segsz = tso_segsz;
info->l4_len = sizeof(struct tcp_hdr);
}
+   ol_flags |= PKT_TX_TCP_CKSUM;
+   tcp_hdr->cksum = get_psd_sum(l3_hdr, ethertype, ol_flags);

} else if (l4_proto == IPPROTO_SCTP) {
sctp_hdr = (struct sctp_hdr *)((char *)l3_hdr + info->l3_len);
-- 
2.7.4



[dpdk-dev] [PATCH 1/2] examples/tep_term: fix offload on VXLAN failure

2016-08-04 Thread Jianfeng Tan
Based on the previous fix of offload on VXLAN using i40e, applications
need to set the proper tunneling type in ol_flags so that the i40e driver
can pass it to the NIC.

Fixes: a50245ede72a ("examples/tep_term: initialize VXLAN sample")

Signed-off-by: Jianfeng Tan 
---
 examples/tep_termination/vxlan.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/examples/tep_termination/vxlan.c b/examples/tep_termination/vxlan.c
index 5ee1f95..4bad33d 100644
--- a/examples/tep_termination/vxlan.c
+++ b/examples/tep_termination/vxlan.c
@@ -237,6 +237,8 @@ encapsulation(struct rte_mbuf *m, uint8_t queue_id)
m->outer_l2_len = sizeof(struct ether_hdr);
m->outer_l3_len = sizeof(struct ipv4_hdr);

+   ol_flags |= PKT_TX_TUNNEL_VXLAN;
+
m->ol_flags |= ol_flags;
m->tso_segsz = tx_offload.tso_segsz;

-- 
2.7.4



[dpdk-dev] [PATCH 0/2] Two offloading issues of tep_term

2016-08-04 Thread Jianfeng Tan
This patch set depends on:
 - http://dpdk.org/ml/archives/dev/2016-August/044924.html

Patch 1: fill tunneling type.
Patch 2: inner L4 checksum error.

Signed-off-by: Jianfeng Tan 

Jianfeng Tan (2):
  examples/tep_term: fix offload on VXLAN failure
  examples/tep_term: fix inner L4 checksum failure

 examples/tep_termination/vxlan.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

-- 
2.7.4