[dpdk-dev] [dpdk-users] rte_zmalloc() returning non-zeroed memory on FreeBSD

2016-11-04 Thread Sergio Gonzalez Monroy
On 03/11/2016 20:04, Lewis Donzis wrote:
> I?m curious about how/whether this got resolved.  The 16.07.1 code doesn?t 
> appear to have this fixed.
>
> Is it still forthcoming?
>
> Thanks,
> lew

It should have been fixed in 16.07.1.
http://dpdk.org/browse/dpdk-stable/commit/?id=82f931805506efbb8b5046e9045bec8f04bbabf6

Do you have an easy test to reproduce? can you reproduce it with any of 
the app/examples?

Sergio


[dpdk-dev] [PATCH v5 0/2] app/testpmd: improve multiprocess support

2016-10-18 Thread Sergio Gonzalez Monroy
On 30/09/2016 15:00, Marcin Kerlin wrote:
> This patch ensure not overwrite device data in the multiprocess application.
>
> 1)Changes in the library introduces continuity in array rte_eth_dev_data[]
> shared between all processes. Secondary process adds new entries in free
> space instead of overwriting existing entries.
>
> 2)Changes in application testpmd allow secondary process to attach the
> mempool created by primary process rather than create new and in the case of
> quit or force quit to free devices data from shared array rte_eth_dev_data[].
>
> -
> How to reproduce the bug:
>
> 1) Run primary process:
> ./testpmd -c 0xf -n 4 --socket-mem='512,0' -w 03:00.1 -w 03:00.0
> --proc-type=primary --file-prefix=xz1 -- -i
>
> (gdb) print rte_eth_devices[0].data.name
> $52 = "3:0.1"
> (gdb) print rte_eth_devices[1].data.name
> $53 = "3:0.0"
>
> 2) Run secondary process:
> ./testpmd -c 0xf0 --socket-mem='512,0' -n 4 -v -b 03:00.1 -b 03:00.0
> --vdev 'eth_pcap0,rx_pcap=/var/log/device1.pcap, 
> tx_pcap=/var/log/device2.pcap'
> --proc-type=secondary --file-prefix=xz1 -- -i
>
> (gdb) print rte_eth_devices[0].data.name
> $52 = "eth_pcap0"
> (gdb) print rte_eth_devices[1].data.name
> $53 = "eth_pcap1"
>
> 3) Go back to the primary and re-check:
> (gdb) print rte_eth_devices[0].data.name
> $54 = "eth_pcap0"
> (gdb) print rte_eth_devices[1].data.name
> $55 = "eth_pcap1"
>
> It means that secondary process overwrite data of primary process.
>
> This patch fix it and now if we go back to the primary and re-check then
> everything is fine:
> (gdb) print rte_eth_devices[0].data.name
> $56 = "3:0.1"
> (gdb) print rte_eth_devices[1].data.name
> $57 = "3:0.0"
>
> So after this fix structure rte_eth_dev_data[] will keep all data one after
> the other instead of overwriting:
> (gdb) print rte_eth_dev_data[0].name
> $52 = "3:0.1"
> (gdb) print rte_eth_dev_data[1].name
> $53 = "3:0.0"
> (gdb) print rte_eth_dev_data[2].name
> $54 = "eth_pcap0"
> (gdb) print rte_eth_dev_data[3].name
> $55 = "eth_pcap1"
> and so on will be append in the next indexes
>
> If secondary process will be turned off then also will be deleted from array:
> (gdb) print rte_eth_dev_data[0].name
> $52 = "3:0.1"
> (gdb) print rte_eth_dev_data[1].name
> $53 = "3:0.0"
> (gdb) print rte_eth_dev_data[2].name
> $54 = ""
> (gdb) print rte_eth_dev_data[3].name
> $55 = ""
> this also allows re-use index 2 and 3 for next another process
> -
>
> Breaking ABI:
> Changes in the library librte_ether causes extending existing structure
> rte_eth_dev_data with a new field lock. The reason is that this structure
> is sharing between all the processes so it should be protected against
> attempting to write from two different processes.
>
> Tomasz Kulasek sent announce ABI change in librte_ether on 21 July 2016.
> I would like to join to this breaking ABI, if it is possible.
>
> v2:
> * fix syntax error in version script
> v3:
> * changed scope of function
> * improved description
> v4:
> * fix syntax error in version script
> v5:
> * fix header file
>
> Marcin Kerlin (2):
>librte_ether: add protection against overwrite device data
>app/testpmd: improve handling of multiprocess
>
>   app/test-pmd/testpmd.c | 37 +-
>   app/test-pmd/testpmd.h |  1 +
>   lib/librte_ether/rte_ethdev.c  | 90 
> +++---
>   lib/librte_ether/rte_ethdev.h  | 12 +
>   lib/librte_ether/rte_ether_version.map |  6 +++
>   5 files changed, 136 insertions(+), 10 deletions(-)
>

NACK series for 16.11

The patch would break the use case where primary and secondary share 
same PCI device.

Overall, I think Thomas has already mentioned it, we need further 
discussion on the use cases
and scope of DPDK multi-process.
This could be a good topic for the upcoming DPDK Userspace event.

Sergio


[dpdk-dev] [PATCH] mempool: fix search of maximum contiguous pages

2016-10-13 Thread Sergio Gonzalez Monroy
+Olivier

On 13/10/2016 10:37, Wei Dai wrote:
> paddr[i] + pg_sz always points to the start physical address of the
> 2nd page after pddr[i], so only up to 2 pages can be combinded to
> be used. With this revision, more than 2 pages can be used.
>
> Fixes: 84121f197187 ("mempool: store memory chunks in a list")
>
> Signed-off-by: Wei Dai 
> ---
>   lib/librte_mempool/rte_mempool.c | 5 -
>   1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/lib/librte_mempool/rte_mempool.c 
> b/lib/librte_mempool/rte_mempool.c
> index 71017e1..e3e254a 100644
> --- a/lib/librte_mempool/rte_mempool.c
> +++ b/lib/librte_mempool/rte_mempool.c
> @@ -426,9 +426,12 @@ rte_mempool_populate_phys_tab(struct rte_mempool *mp, 
> char *vaddr,
>   
>   for (i = 0; i < pg_num && mp->populated_size < mp->size; i += n) {
>   
> + phys_addr_t paddr_next;
> + paddr_next = paddr[i] + pg_sz;
> +
>   /* populate with the largest group of contiguous pages */
>   for (n = 1; (i + n) < pg_num &&
> -  paddr[i] + pg_sz == paddr[i+n]; n++)
> +  paddr_next == paddr[i+n]; n++, paddr_next += pg_sz)
>   ;
>   
>   ret = rte_mempool_populate_phys(mp, vaddr + i * pg_sz,




[dpdk-dev] [PATCH] examples/ipsec-secgw: Update checksum while decrementing ttl

2016-10-10 Thread Sergio Gonzalez Monroy
On 07/10/2016 21:53, De Lara Guarch, Pablo wrote:
>> -Original Message-
>> From: Akhil Goyal [mailto:akhil.goyal at nxp.com]
>> Sent: Tuesday, October 04, 2016 11:33 PM
>> To: De Lara Guarch, Pablo; Gonzalez Monroy, Sergio; dev at dpdk.org
>> Subject: Re: [PATCH] examples/ipsec-secgw: Update checksum while
>> decrementing ttl
>>
>> On 10/5/2016 6:04 AM, De Lara Guarch, Pablo wrote:
>>>
>>>> -Original Message-
>>>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Sergio Gonzalez
>>>> Monroy
>>>> Sent: Monday, September 26, 2016 6:28 AM
>>>> To: akhil.goyal at nxp.com; dev at dpdk.org
>>>> Subject: Re: [dpdk-dev] [PATCH] examples/ipsec-secgw: Update checksum
>>>> while decrementing ttl
>>>>
>>>> Hi Akhil,
>>>>
>>>> This application relies on checksum offload in both outbound and
>> inbound
>>>> paths (PKT_TX_IP_CKSUM flag).
>> [Akhil]Agreed that the application relies on checksum offload, but here
>> we are talking about the inner ip header. Inner IP checksum will be
>> updated on the next end point after decryption. This would expect that
>> the next end point must have checksum offload capability. What if we are
>> capturing the encrypted packets on wireshark or say send it to some
>> other machine which does not run DPDK and do not know about checksum
>> offload, then wireshark/other machine will not be able to get the
>> correct the checksum and will show error.

Understood, we need to have a valid inner checksum.
RFC1624 states that the computation would be incorrect in 
corner/boundary case.
I reckon you are basing your incremental update on RFC1141?

Also I think you should take care of endianess and increment the 
checksum with
host_to_be(0x0100) instead of +1.

>>>> Because we assume that we always forward the packet in both paths, we
>>>> decrement the ttl in both inbound and outbound.
>>>> You seem to only increment (recalculate) the checksum of the inner IP
>>>> header in the outbound path but not the inbound path.
>> [Akhil]Correct I missed out the inbound path.
>>>> Also, in the inbound path you have to consider a possible ECN value
>> update.
>> [Akhil]If I take care of the ECN then it would mean I need to calculate
>> the checksum completely, incremental checksum wont give correct results.
>> This would surely impact performance. Any suggestion on how should we
>> take care of ECN update. Should I recalculate the checksum and send the
>> patch for ECN update? Or do we have a better solution.

If I am understanding the RFCs mentioned above correctly, you should be 
able to do
incremental checksum update for any 16bit field/value of the IP header.
I don't see no reason why you couldn't do something like that, except 
that you would
have to follow the full equation instead of just adding 0x0100, which 
would be always
the case when decrementing TTL.

What do you think?

Sergio

>>> Any further comments here, Akhil?
>>>
>>> Thanks,
>>> Pablo
>>>
>> [Akhil] Sorry I missed out the previous reply from Sergio.
> Any more comments, Sergio?
>
> Pablo
>> Thanks,
>> Akhil




[dpdk-dev] [PATCH v2 1/3] mem: fix hugepage mapping error messages

2016-10-04 Thread Sergio Gonzalez Monroy
On 04/10/2016 18:17, Jean Tourrilhes wrote:
> Running secondary is tricky due to the need to map the memory region
> at the right place in VM, which is whatever primary has chosen. If the
> base address for primary happens to by already mapped in the
> secondary, we will hit precisely these error messages (depending if we
> fail on the config region or the hugepages). This is why there is
> already a comment about ASLR.
>
> The issue is that in most cases, remapping does not happen and "errno"
> is not changed and therefore stale. In our case, we got a "permission
> denied", which sent us down the wrong track. It's such a common error
> for secondary that I feel this error message should be unambiguous and
> helpful.
> The call to close was also moved because close() may override errno.
>
> Signed-off-by: Jean Tourrilhes 
> ---
>   lib/librte_eal/linuxapp/eal/eal.c| 14 +++---
>   lib/librte_eal/linuxapp/eal/eal_memory.c | 16 ----
>   2 files changed, 23 insertions(+), 7 deletions(-)

Acked-by: Sergio Gonzalez Monroy 


[dpdk-dev] [PATCH 1/1] eal: Don't fail secondary if primary is missing tailqs

2016-10-04 Thread Sergio Gonzalez Monroy
Hi Jean,

As with your other patch, commit title needs fixing and also missing 
Signed-off-by line.

On 22/09/2016 22:17, Jean Tourrilhes wrote:
>   lib/librte_eal/common/eal_common_tailqs.c | 15 ---
>   1 file changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/lib/librte_eal/common/eal_common_tailqs.c 
> b/lib/librte_eal/common/eal_common_tailqs.c
> index bb08ec8..6960d06 100644
> --- a/lib/librte_eal/common/eal_common_tailqs.c
> +++ b/lib/librte_eal/common/eal_common_tailqs.c
> @@ -143,6 +143,8 @@ rte_eal_tailq_update(struct rte_tailq_elem *t)
>   t->head = rte_eal_tailq_create(t->name);
>   } else {
>   t->head = rte_eal_tailq_lookup(t->name);
> + if (t->head != NULL)
> + rte_tailqs_count++;
>   }
>   }
>   
> @@ -188,9 +190,16 @@ rte_eal_tailqs_init(void)
>   if (t->head == NULL) {
>   RTE_LOG(ERR, EAL,
>   "Cannot initialize tailq: %s\n", t->name);
> - /* no need to TAILQ_REMOVE, we are going to panic in
> -  * rte_eal_init() */
> - goto fail;
> + if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
> + /* no need to TAILQ_REMOVE, we are going
> +  * to panic in rte_eal_init() */
> + goto fail;
> + } else {
> + /* This means our list of constructor is
> +  * no the same as primary. Just remove
> +  * that missing tailq and continue */
> + TAILQ_REMOVE(_tailq_elem_head, t, next);
> + }
>   }
>   }
>   
I might be missing something here so bear with me.
The case you are trying to fix is, as an example, when your secondary 
app is using LPM but your primary is not.
So basically with this patch, you are removing the tailq for LPM on 
secondary and continuing as normal, is that the case?

I am not convinced about this approach.
I guess the assumption here is that all the TAILQ infrastructure works 
even when the tailq list itself is NULL?
I say assumption because I don't have an easy way to trigger this use 
case with default DPDK sample/test apps.

What about letting the secondary process create a tailq if it doesn't 
exists?
We would need to lock protect at least the register function to avoid 
race conditions when having multiple secondary processes.

David, what do you think?

Sergio




[dpdk-dev] [PATCH 1/1 v2] eal: Fix misleading error messages, errno can't be trusted.

2016-10-04 Thread Sergio Gonzalez Monroy
On 03/10/2016 21:46, Thomas Monjalon wrote:
> 2016-10-03 08:55, Jean Tourrilhes:
>> On Mon, Oct 03, 2016 at 02:25:40PM +0100, Sergio Gonzalez Monroy wrote:
>>> Hi Jean,
>>>
>>> There are some format issues with the patch:
>>>
>>> You can run scripts/check-git-log.sh to check them:
>>> Wrong headline format:
>>>  eal: Fix misleading error messages, errno can't be trusted.
>>> Wrong headline uppercase:
>>>  eal: Fix misleading error messages, errno can't be trusted.
>>> Missing 'Fixes' tag:
>>>  eal: Fix misleading error messages, errno can't be trusted.
>>>
>>> The script's output highlights the different issues.
>>  SOrry about that, I casually read the page on
>> http://dpdk.org/dev, but obviously I need to look at it again.
> No problem. This guide is more oriented towards regular contributors.
> You come with a bug and its fix, we can make some effort to format
> the patch :)
>
> The title could be "mem: fix hugepage mapping error messages"
>
>>> On 21/09/2016 22:10, Jean Tourrilhes wrote:
>>>> @@ -263,9 +264,16 @@ rte_eal_config_reattach(void)
>>>>mem_config = (struct rte_mem_config *) mmap(rte_mem_cfg_addr,
>>>>sizeof(*mem_config), PROT_READ | PROT_WRITE, MAP_SHARED,
>>>>mem_cfg_fd, 0);
>>>> +  if (mem_config == MAP_FAILED || mem_config != rte_mem_cfg_addr) {
>>>> +  if (mem_config != MAP_FAILED)
>>>> +  /* errno is stale, don't use */
>>>> +  rte_panic("Cannot mmap memory for rte_config at [%p], 
>>>> got [%p] - please use '--base-virtaddr' option\n",
>>>> +rte_mem_cfg_addr, mem_config);
>>>> +  else
>>>> +  rte_panic("Cannot mmap memory for rte_config! error %i 
>>>> (%s)\n",
>>>> +errno, strerror(errno));
>>>> +  }
>>>>close(mem_cfg_fd);
>>>> -  if (mem_config == MAP_FAILED || mem_config != rte_mem_cfg_addr)
>>>> -  rte_panic("Cannot mmap memory for rte_config\n");
>>> NIT but any reason you moved the check before closing the file descriptor?
>>> (not that it matters with current code as we panic anyway)
>>  "close()" may change "errno" according to its man page.
> Sergio, do you have more comments?

Nop.

> Should we wait another version or is it OK?
> Maybe you'd prefer to rework it yourself?

AFAICS it really is just the commit message formatting, so maybe you can 
fix that when applying it?

Sergio



[dpdk-dev] [PATCH 1/1 v2] eal: Fix misleading error messages, errno can't be trusted.

2016-10-03 Thread Sergio Gonzalez Monroy
Hi Jean,

There are some format issues with the patch:

You can run scripts/check-git-log.sh to check them:
Wrong headline format:
 eal: Fix misleading error messages, errno can't be trusted.
Wrong headline uppercase:
 eal: Fix misleading error messages, errno can't be trusted.
Missing 'Fixes' tag:
 eal: Fix misleading error messages, errno can't be trusted.

The script's output highlights the different issues.


On 21/09/2016 22:10, Jean Tourrilhes wrote:
>   lib/librte_eal/linuxapp/eal/eal.c| 14 +++---
>   lib/librte_eal/linuxapp/eal/eal_memory.c | 16 
>   2 files changed, 23 insertions(+), 7 deletions(-)
>
> diff --git a/lib/librte_eal/linuxapp/eal/eal.c 
> b/lib/librte_eal/linuxapp/eal/eal.c
> index 3fb2188..5df9f6a 100644
> --- a/lib/librte_eal/linuxapp/eal/eal.c
> +++ b/lib/librte_eal/linuxapp/eal/eal.c
> @@ -238,7 +238,8 @@ rte_eal_config_attach(void)
>   mem_config = (struct rte_mem_config *) mmap(NULL, sizeof(*mem_config),
>   PROT_READ, MAP_SHARED, mem_cfg_fd, 0);
>   if (mem_config == MAP_FAILED)
> - rte_panic("Cannot mmap memory for rte_config\n");
> + rte_panic("Cannot mmap memory for rte_config! error %i (%s)\n",
> +   errno, strerror(errno));
>   
>   rte_config.mem_config = mem_config;
>   }
> @@ -263,9 +264,16 @@ rte_eal_config_reattach(void)
>   mem_config = (struct rte_mem_config *) mmap(rte_mem_cfg_addr,
>   sizeof(*mem_config), PROT_READ | PROT_WRITE, MAP_SHARED,
>   mem_cfg_fd, 0);
> + if (mem_config == MAP_FAILED || mem_config != rte_mem_cfg_addr) {
> + if (mem_config != MAP_FAILED)
> + /* errno is stale, don't use */
> + rte_panic("Cannot mmap memory for rte_config at [%p], 
> got [%p] - please use '--base-virtaddr' option\n",
> +   rte_mem_cfg_addr, mem_config);
> + else
> + rte_panic("Cannot mmap memory for rte_config! error %i 
> (%s)\n",
> +   errno, strerror(errno));
> + }
>   close(mem_cfg_fd);
> - if (mem_config == MAP_FAILED || mem_config != rte_mem_cfg_addr)
> - rte_panic("Cannot mmap memory for rte_config\n");
>   

NIT but any reason you moved the check before closing the file 
descriptor? (not that it matters with current code as we panic anyway)

Thanks,
Sergio

>   rte_config.mem_config = mem_config;
>   }
> diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c 
> b/lib/librte_eal/linuxapp/eal/eal_memory.c
> index 41e0a92..b036ffc 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
> @@ -1615,10 +1615,18 @@ rte_eal_hugepage_attach(void)
>PROT_READ, MAP_PRIVATE, fd_zero, 0);
>   if (base_addr == MAP_FAILED ||
>   base_addr != mcfg->memseg[s].addr) {
> - RTE_LOG(ERR, EAL, "Could not mmap %llu bytes "
> - "in /dev/zero to requested address [%p]: 
> '%s'\n",
> - (unsigned long long)mcfg->memseg[s].len,
> - mcfg->memseg[s].addr, strerror(errno));
> + if (base_addr != MAP_FAILED)
> + /* errno is stale, don't use */
> + RTE_LOG(ERR, EAL, "Could not mmap %llu bytes "
> + "in /dev/zero at [%p], got [%p] - "
> + "please use '--base-virtaddr' option\n",
> + (unsigned long long)mcfg->memseg[s].len,
> + mcfg->memseg[s].addr, base_addr);
> + else
> + RTE_LOG(ERR, EAL, "Could not mmap %llu bytes "
> + "in /dev/zero at [%p]: '%s'\n",
> + (unsigned long long)mcfg->memseg[s].len,
> + mcfg->memseg[s].addr, strerror(errno));
>   if (aslr_enabled() > 0) {
>   RTE_LOG(ERR, EAL, "It is recommended to "
>   "disable ASLR in the kernel "




[dpdk-dev] [PATCH] eal: fix crash on mmap error in rte_eal_hugepage_attach()

2016-10-03 Thread Sergio Gonzalez Monroy
On 28/09/2016 11:52, maciej.czekaj at caviumnetworks.com wrote:
> From: Maciej Czekaj 
>
> In ASLR-enabled system, it is possible that selected
> virtual space is occupied by program segments. Therefore,
> error path should not blindly unmap all memmory segments
> but only those already mapped.
>
> Steps that lead to crash:
> 1. memeseg 0 in secondary process overlaps
> with libc.so
> 2. mmap of /dev/zero fails for virtual space of memseg 0
> 3. munmap of memseg 0 leads to unmapping libc.so itself
> 4. app gets SIGSEGV after returning from syscall to libc
>
> Fixes: ea329d7f8e34 ("mem: fix leak after mapping failure")
>
> Signed-off-by: Maciej Czekaj 
> ---
>   lib/librte_eal/linuxapp/eal/eal_memory.c | 11 ++-----
>   1 file changed, 6 insertions(+), 5 deletions(-)

Acked-by: Sergio Gonzalez Monroy 



[dpdk-dev] [PATCH 3/3] examples/ipsec-secgw: add AES-GCM support

2016-08-23 Thread Sergio Gonzalez Monroy
Add support for AES-GCM (Galois-Counter Mode).

RFC4106: The Use of Galois-Counter Mode (GCM) in IPSec ESP.

Signed-off-by: Sergio Gonzalez Monroy 
---
 examples/ipsec-secgw/esp.c   | 59 ++--
 examples/ipsec-secgw/ipsec.h |  9 +++
 examples/ipsec-secgw/sa.c| 36 +--
 3 files changed, 90 insertions(+), 14 deletions(-)

diff --git a/examples/ipsec-secgw/esp.c b/examples/ipsec-secgw/esp.c
index d640ab2..3852c7e 100644
--- a/examples/ipsec-secgw/esp.c
+++ b/examples/ipsec-secgw/esp.c
@@ -90,6 +90,8 @@ esp_inbound(struct rte_mbuf *m, struct ipsec_sa *sa,
sa->iv_len;
sym_cop->cipher.data.length = payload_len;

+   struct cnt_blk *nonce;
+   uint8_t *aad;
uint8_t *iv = RTE_PTR_ADD(ip4, ip_hdr_len + sizeof(struct esp_hdr));

switch (sa->cipher_algo) {
@@ -99,14 +101,41 @@ esp_inbound(struct rte_mbuf *m, struct ipsec_sa *sa,
sym_cop->cipher.iv.phys_addr = rte_pktmbuf_mtophys_offset(m,
 ip_hdr_len + sizeof(struct esp_hdr));
sym_cop->cipher.iv.length = sa->iv_len;
+   break;
+   case RTE_CRYPTO_CIPHER_AES_GCM:
+   nonce = get_cnt_blk(m);
+   nonce->salt = sa->salt;
+   memcpy(>iv, iv, 8);
+   sym_cop->cipher.iv.data = (uint8_t *)nonce;
+   sym_cop->cipher.iv.phys_addr = rte_pktmbuf_mtophys_offset(m,
+(uint8_t *)nonce - rte_pktmbuf_mtod(m, uint8_t *));
+   /* FIXME should be 12 for GCM, but API needs 16 */
+   sym_cop->cipher.iv.length = 16;
+   break;
+   default:
+   RTE_LOG(ERR, IPSEC_ESP, "unsupported cipher algorithm %u\n",
+   sa->cipher_algo);
+   return -EINVAL;
+   }

+   switch (sa->auth_algo) {
+   case RTE_CRYPTO_AUTH_NULL:
+   case RTE_CRYPTO_AUTH_SHA1_HMAC:
sym_cop->auth.data.offset = ip_hdr_len;
sym_cop->auth.data.length = sizeof(struct esp_hdr) +
sa->iv_len + payload_len;
break;
+   case RTE_CRYPTO_AUTH_AES_GCM:
+   aad = get_aad(m);
+   memcpy(aad, iv - sizeof(struct esp_hdr), 8);
+   sym_cop->auth.aad.data = aad;
+   sym_cop->auth.aad.phys_addr = rte_pktmbuf_mtophys_offset(m,
+   aad - rte_pktmbuf_mtod(m, uint8_t *));
+   sym_cop->auth.aad.length = 8;
+   break;
default:
-   RTE_LOG(ERR, IPSEC_ESP, "unsupported cipher algorithm %u\n",
-   sa->cipher_algo);
+   RTE_LOG(ERR, IPSEC_ESP, "unsupported auth algorithm %u\n",
+   sa->auth_algo);
return -EINVAL;
}

@@ -291,6 +320,12 @@ esp_outbound(struct rte_mbuf *m, struct ipsec_sa *sa,
sizeof(struct esp_hdr);
sym_cop->cipher.data.length = pad_payload_len + sa->iv_len;
break;
+   case RTE_CRYPTO_CIPHER_AES_GCM:
+   *iv = sa->seq;
+   sym_cop->cipher.data.offset = ip_hdr_len +
+   sizeof(struct esp_hdr) + sa->iv_len;
+   sym_cop->cipher.data.length = pad_payload_len;
+   break;
default:
RTE_LOG(ERR, IPSEC_ESP, "unsupported cipher algorithm %u\n",
sa->cipher_algo);
@@ -312,16 +347,26 @@ esp_outbound(struct rte_mbuf *m, struct ipsec_sa *sa,
/* FIXME should be 12 for GCM, but API needs 16 */
sym_cop->cipher.iv.length = 16;

-   switch (sa->cipher_algo) {
-   case RTE_CRYPTO_CIPHER_NULL:
-   case RTE_CRYPTO_CIPHER_AES_CBC:
+   uint8_t *aad;
+
+   switch (sa->auth_algo) {
+   case RTE_CRYPTO_AUTH_NULL:
+   case RTE_CRYPTO_AUTH_SHA1_HMAC:
sym_cop->auth.data.offset = ip_hdr_len;
sym_cop->auth.data.length = sizeof(struct esp_hdr) +
sa->iv_len + pad_payload_len;
break;
+   case RTE_CRYPTO_AUTH_AES_GCM:
+   aad = get_aad(m);
+   memcpy(aad, esp, 8);
+   sym_cop->auth.aad.data = aad;
+   sym_cop->auth.aad.phys_addr = rte_pktmbuf_mtophys_offset(m,
+   aad - rte_pktmbuf_mtod(m, uint8_t *));
+   sym_cop->auth.aad.length = 8;
+   break;
default:
-   RTE_LOG(ERR, IPSEC_ESP, "unsupported cipher algorithm %u\n",
-   sa->cipher_algo);
+   RTE_LOG(ERR, IPSEC_ESP, "unsupported auth algorithm %u\n",
+   sa->auth_algo);
return -EINVAL;
   

[dpdk-dev] [PATCH 2/3] examples/ipsec-secgw: reset crypto operation status

2016-08-23 Thread Sergio Gonzalez Monroy
Signed-off-by: Sergio Gonzalez Monroy 
---
 examples/ipsec-secgw/ipsec.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/examples/ipsec-secgw/ipsec.c b/examples/ipsec-secgw/ipsec.c
index 1e87d0d..f49143b 100644
--- a/examples/ipsec-secgw/ipsec.c
+++ b/examples/ipsec-secgw/ipsec.c
@@ -124,6 +124,7 @@ ipsec_enqueue(ipsec_xform_fn xform_func, struct ipsec_ctx 
*ipsec_ctx,
priv->sa = sa;

priv->cop.type = RTE_CRYPTO_OP_TYPE_SYMMETRIC;
+   priv->cop.status = RTE_CRYPTO_OP_STATUS_NOT_PROCESSED;

rte_prefetch0(>sym_cop);
priv->cop.sym = >sym_cop;
-- 
2.5.5



[dpdk-dev] [PATCH 1/3] examples/ipsec-secgw: change CBC IV generation

2016-08-23 Thread Sergio Gonzalez Monroy
NIST SP800-38A recommends two methods to generate unpredictable IVs
(Initilisation Vector) for CBC mode:
1) Apply the forward function to a nonce (ie. counter)
2) Use a FIPS-approved random number generator

This patch implements the first recommended method by using the forward
function to generate the IV.

Signed-off-by: Sergio Gonzalez Monroy 
---
 examples/ipsec-secgw/esp.c   | 99 +---
 examples/ipsec-secgw/ipsec.h | 33 ++-
 2 files changed, 88 insertions(+), 44 deletions(-)

diff --git a/examples/ipsec-secgw/esp.c b/examples/ipsec-secgw/esp.c
index 05caa77..d640ab2 100644
--- a/examples/ipsec-secgw/esp.c
+++ b/examples/ipsec-secgw/esp.c
@@ -50,21 +50,6 @@
 #include "esp.h"
 #include "ipip.h"

-static inline void
-random_iv_u64(uint64_t *buf, uint16_t n)
-{
-   uint32_t left = n & 0x7;
-   uint32_t i;
-
-   RTE_ASSERT((n & 0x3) == 0);
-
-   for (i = 0; i < (n >> 3); i++)
-   buf[i] = rte_rand();
-
-   if (left)
-   *((uint32_t *)[i]) = (uint32_t)lrand48();
-}
-
 int
 esp_inbound(struct rte_mbuf *m, struct ipsec_sa *sa,
struct rte_crypto_op *cop)
@@ -98,22 +83,32 @@ esp_inbound(struct rte_mbuf *m, struct ipsec_sa *sa,
return -EINVAL;
}

-   sym_cop = (struct rte_crypto_sym_op *)(cop + 1);
+   sym_cop = get_sym_cop(cop);

sym_cop->m_src = m;
sym_cop->cipher.data.offset =  ip_hdr_len + sizeof(struct esp_hdr) +
sa->iv_len;
sym_cop->cipher.data.length = payload_len;

-   sym_cop->cipher.iv.data = rte_pktmbuf_mtod_offset(m, void*,
-ip_hdr_len + sizeof(struct esp_hdr));
-   sym_cop->cipher.iv.phys_addr = rte_pktmbuf_mtophys_offset(m,
-ip_hdr_len + sizeof(struct esp_hdr));
-   sym_cop->cipher.iv.length = sa->iv_len;
+   uint8_t *iv = RTE_PTR_ADD(ip4, ip_hdr_len + sizeof(struct esp_hdr));
+
+   switch (sa->cipher_algo) {
+   case RTE_CRYPTO_CIPHER_NULL:
+   case RTE_CRYPTO_CIPHER_AES_CBC:
+   sym_cop->cipher.iv.data = iv;
+   sym_cop->cipher.iv.phys_addr = rte_pktmbuf_mtophys_offset(m,
+ip_hdr_len + sizeof(struct esp_hdr));
+   sym_cop->cipher.iv.length = sa->iv_len;

-   sym_cop->auth.data.offset = ip_hdr_len;
-   sym_cop->auth.data.length = sizeof(struct esp_hdr) +
-   sa->iv_len + payload_len;
+   sym_cop->auth.data.offset = ip_hdr_len;
+   sym_cop->auth.data.length = sizeof(struct esp_hdr) +
+   sa->iv_len + payload_len;
+   break;
+   default:
+   RTE_LOG(ERR, IPSEC_ESP, "unsupported cipher algorithm %u\n",
+   sa->cipher_algo);
+   return -EINVAL;
+   }

sym_cop->auth.digest.data = rte_pktmbuf_mtod_offset(m, void*,
rte_pktmbuf_pkt_len(m) - sa->digest_len);
@@ -282,10 +277,25 @@ esp_outbound(struct rte_mbuf *m, struct ipsec_sa *sa,

sa->seq++;
esp->spi = rte_cpu_to_be_32(sa->spi);
-   esp->seq = rte_cpu_to_be_32(sa->seq);
+   esp->seq = rte_cpu_to_be_32((uint32_t)sa->seq);

-   if (sa->cipher_algo == RTE_CRYPTO_CIPHER_AES_CBC)
-   random_iv_u64((uint64_t *)(esp + 1), sa->iv_len);
+   uint64_t *iv = (uint64_t *)(esp + 1);
+
+   sym_cop = get_sym_cop(cop);
+   sym_cop->m_src = m;
+   switch (sa->cipher_algo) {
+   case RTE_CRYPTO_CIPHER_NULL:
+   case RTE_CRYPTO_CIPHER_AES_CBC:
+   memset(iv, 0, sa->iv_len);
+   sym_cop->cipher.data.offset = ip_hdr_len +
+   sizeof(struct esp_hdr);
+   sym_cop->cipher.data.length = pad_payload_len + sa->iv_len;
+   break;
+   default:
+   RTE_LOG(ERR, IPSEC_ESP, "unsupported cipher algorithm %u\n",
+   sa->cipher_algo);
+   return -EINVAL;
+   }

/* Fill pad_len using default sequential scheme */
for (i = 0; i < pad_len - 2; i++)
@@ -293,22 +303,27 @@ esp_outbound(struct rte_mbuf *m, struct ipsec_sa *sa,
padding[pad_len - 2] = pad_len - 2;
padding[pad_len - 1] = nlp;

-   sym_cop = (struct rte_crypto_sym_op *)(cop + 1);
-
-   sym_cop->m_src = m;
-   sym_cop->cipher.data.offset = ip_hdr_len + sizeof(struct esp_hdr) +
-   sa->iv_len;
-   sym_cop->cipher.data.length = pad_payload_len;
-
-   sym_cop->cipher.iv.data = rte_pktmbuf_mtod_offset(m, uint8_t *,
-ip_hdr_len + sizeof(struct esp_hdr));
+   struct cnt_blk *nonce = get_cnt_blk(m);
+   nonce->salt = sa->salt;
+   nonce->iv = sa->seq

[dpdk-dev] [PATCH 0/3] IPSec AES-GCM support

2016-08-23 Thread Sergio Gonzalez Monroy
This patch set mainly introduces support for AES-GCM.

It also changes the IV generation for AES-CBC mode using the forward
function instead of randomly generating the IV.

Dependencies:
examples/ipsec-secgw: add configuration file support
http://dpdk.org/dev/patchwork/patch/15269/

examples/ipsec-secgw: add sample configuration files
http://dpdk.org/dev/patchwork/patch/15270/

Sergio Gonzalez Monroy (3):
  examples/ipsec-secgw: change CBC IV generation
  examples/ipsec-secgw: reset crypto operation status
  examples/ipsec-secgw: add AES-GCM support

 examples/ipsec-secgw/esp.c   | 144 ++-
 examples/ipsec-secgw/ipsec.c |   1 +
 examples/ipsec-secgw/ipsec.h |  42 -
 examples/ipsec-secgw/sa.c|  36 ---
 4 files changed, 172 insertions(+), 51 deletions(-)

-- 
2.5.5



[dpdk-dev] [PATCH] contigmem: zero all pages during mmap

2016-08-16 Thread Sergio Gonzalez Monroy
On 15/08/2016 19:17, Jim Harris wrote:
> On Linux, all huge pages are zeroed by the kernel before
> first access by the DPDK application.  But on FreeBSD,
> the contigmem driver would only zero the contiguous
> memory regions during initial driver load.
>
> DPDK commit b78c91751 eliminated the explicit memset()
> operation for rte_zmalloc(), which was OK on Linux
> because the kernel zeroes the pages during app start,
> but this broke FreeBSD.  So this patch explicitly
> zeroes the pages before they are mmap'd, to ensure
> equivalent behavior to Linux
>
> Reported-by: Daniel Verkamp 
> Tested-by: Daniel Verkamp 
> Signed-off-by: Jim Harris 
> ---
>   lib/librte_eal/bsdapp/contigmem/contigmem.c |8 ++--
>   1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/lib/librte_eal/bsdapp/contigmem/contigmem.c 
> b/lib/librte_eal/bsdapp/contigmem/contigmem.c
> index c6ca3b9..da971de 100644
> --- a/lib/librte_eal/bsdapp/contigmem/contigmem.c
> +++ b/lib/librte_eal/bsdapp/contigmem/contigmem.c
> @@ -216,15 +216,19 @@ static int
>   contigmem_mmap_single(struct cdev *cdev, vm_ooffset_t *offset, vm_size_t 
> size,
>   struct vm_object **obj, int nprot)
>   {
> + uint64_t buffer_index;
> +
>   /*
>* The buffer index is encoded in the offset.  Divide the offset by
>*  PAGE_SIZE to get the index of the buffer requested by the user
>*  app.
>*/
> - if ((*offset/PAGE_SIZE) >= contigmem_num_buffers)
> + buffer_index = *offset / PAGE_SIZE;
> + if (buffer_index >= contigmem_num_buffers)
>   return EINVAL;
>   
> - *offset = (vm_ooffset_t)vtophys(contigmem_buffers[*offset/PAGE_SIZE]);
> + memset(contigmem_buffers[buffer_index], 0, contigmem_buffer_size);
> + *offset = (vm_ooffset_t)vtophys(contigmem_buffers[buffer_index]);
>   *obj = vm_pager_allocate(OBJT_DEVICE, cdev, size, nprot, *offset,
>   curthread->td_ucred);
>   
>
>

I think you should include the Fixes line:
Fixes: b78c9175118f ("mem: do not zero out memory on zmalloc")

Other than that:

Acked-by: Sergio Gonzalez Monroy 




[dpdk-dev] [dpdk-users] rte_zmalloc() returning non-zeroed memory on FreeBSD

2016-08-16 Thread Sergio Gonzalez Monroy
On 15/08/2016 18:23, Harris, James R wrote:
>
>> -Original Message-
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Thomas Monjalon
>> Sent: Thursday, August 11, 2016 12:05 AM
>> To: users at dpdk.org; dev at dpdk.org; Gonzalez Monroy, Sergio; Richardson,
>> Bruce
>> Cc: Verkamp, Daniel
>> Subject: Re: [dpdk-dev] [dpdk-users] rte_zmalloc() returning non-zeroed
>> memory on FreeBSD
>>
>> Hi,
>>
>> 2016-08-10 23:30, Verkamp, Daniel:
>>> It seems that with DPDK 16.07, rte_zmalloc() and related functions no
>>> longer return zeroed memory reliably on FreeBSD.
>>>
>>> I notice that commit b78c9175118f7d61022ddc5c62ce54a1bd73cea5 ("mem:
>> do
>>> not zero out memory on zmalloc") removed the explicit memset() that
>> used
>>> to ensure the buffer was zeroed; its log message says:
>>>
>>> "Zeroing out memory on rte_zmalloc_socket is not required anymore since
>>> all allocated memory is already zeroed."
>> On Linux, the memory is zeroed by the kernel.
>> Then the zero value is maintained in the rte_malloc pool by rte_free.
>>
>>> However, I don't see how this is guaranteed (at least for FreeBSD), and
>>> it is not true in practice.  I've attached a minimized reproducer program -
>>> running it twice in a row fails reliably for me.
>>>
>>> Is there a missing step in FreeBSD, or is it a more general problem for
>>> other platforms?
>> I guess the initial value from the kernel has been verified only on Linux.
>> We could re-add a memset for FreeBSD.
> The problem is that the FreeBSD contigmem driver does not re-zero the huge
> pages each time they are mmap'd - they are only zeroed when contigmem
> initially loads.  I will push a patch for this shortly.

So that is the case where we run the app more than once, right?
I missed that, I only ran it once.

Sergio


[dpdk-dev] [PATCH v2] eal: fix check number of bytes from read function

2016-07-22 Thread Sergio Gonzalez Monroy
On 22/07/2016 16:24, Thomas Monjalon wrote:
> 2016-07-22 16:33, Michal Jastrzebski:
>> v2:
>> -moved close(fd) just after read.
>> -when read() from fd we expect 8 bytes, so PFN_MASK_SIZE macro
>> was introduced instead sizeof(uint64_t).
>> -removed errno print when read returns less than 8 bytes
> Looks better.
> Note: this changelog should be below --- to avoid appearing in
> the commit.
>
>> In rte_mem_virt2phy: Value returned from a function and indicating the
>> number of bytes was ignored. This could cause a wrong pfn (page frame
>> number) mask read from pagemap file.
>> When read returns less than the number of sizeof(uint64_t) bytes,
>> function rte_mem_virt2phy returns error.
> Better title to explain the issue:
> mem: fix check of physical address retrieval
>
>> +#define PFN_MASK_SIZE   8
>> +
>>   #ifdef RTE_LIBRTE_XEN_DOM0
>>   int rte_xen_dom0_supported(void)
>>   {
>> @@ -158,7 +160,7 @@ rte_mem_lock_page(const void *virt)
>>   phys_addr_t
>>   rte_mem_virt2phy(const void *virtaddr)
>>   {
>> -int fd;
>> +int fd, retval;
>>  uint64_t page, physaddr;
>>  unsigned long virt_pfn;
>>  int page_size;
>> @@ -209,10 +211,19 @@ rte_mem_virt2phy(const void *virtaddr)
>>  close(fd);
>>  return RTE_BAD_PHYS_ADDR;
>>  }
>> -if (read(fd, , sizeof(uint64_t)) < 0) {
>> +
>> +retval = read(fd, , sizeof(uint64_t));
> We could use PFN_MASK_SIZE here
>
>> +close(fd);
>> +if (retval < 0) {
>>  RTE_LOG(ERR, EAL, "%s(): cannot read /proc/self/pagemap: %s\n",
>>  __func__, strerror(errno));
>> -close(fd);
>> +
> useless whitespace
>
>> +return RTE_BAD_PHYS_ADDR;
>> +} else if (retval != PFN_MASK_SIZE) {
>> +RTE_LOG(ERR, EAL, "%s(): read %d bytes from /proc/self/pagemap "
>> +"but expected %d:\n",
>> +__func__, retval, PFN_MASK_SIZE);
>> +
> useless whitespace
>
>>  return RTE_BAD_PHYS_ADDR;
>>  }
>>   
>> @@ -222,7 +233,7 @@ rte_mem_virt2phy(const void *virtaddr)
>>   */
>>  physaddr = ((page & 0x7fULL) * page_size)
>>  + ((unsigned long)virtaddr % page_size);
>> -close(fd);
>> +
>>  return physaddr;
>>   }
> If you and Sergio agree, I can make the minor changes before applying.

Go for it!

Sergio


[dpdk-dev] [PATCH 2/2] mempool: fix unsafe tailq element removal

2016-07-22 Thread Sergio Gonzalez Monroy
Potentially user provided function could remove/free tailq elements.
Doing so within a TAILQ_FOREACH loop is not safe.

Use _SAFE versions of _FOREACH macros.

Signed-off-by: Sergio Gonzalez Monroy 
---
 lib/librte_mempool/rte_mempool.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 8806633..394154a 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -157,10 +157,10 @@ rte_mempool_obj_iter(struct rte_mempool *mp,
rte_mempool_obj_cb_t *obj_cb, void *obj_cb_arg)
 {
struct rte_mempool_objhdr *hdr;
-   void *obj;
+   void *obj, *temp;
unsigned n = 0;

-   STAILQ_FOREACH(hdr, >elt_list, next) {
+   STAILQ_FOREACH_SAFE(hdr, >elt_list, next, temp) {
obj = (char *)hdr + sizeof(*hdr);
obj_cb(mp, obj_cb_arg, obj, n);
n++;
@@ -176,8 +176,9 @@ rte_mempool_mem_iter(struct rte_mempool *mp,
 {
struct rte_mempool_memhdr *hdr;
unsigned n = 0;
+   void *temp;

-   STAILQ_FOREACH(hdr, >mem_list, next) {
+   STAILQ_FOREACH_SAFE(hdr, >mem_list, next, temp) {
mem_cb(mp, mem_cb_arg, hdr, n);
n++;
}
@@ -1283,12 +1284,13 @@ void rte_mempool_walk(void (*func)(struct rte_mempool 
*, void *),
 {
struct rte_tailq_entry *te = NULL;
struct rte_mempool_list *mempool_list;
+   void *temp;

mempool_list = RTE_TAILQ_CAST(rte_mempool_tailq.head, rte_mempool_list);

rte_rwlock_read_lock(RTE_EAL_MEMPOOL_RWLOCK);

-   TAILQ_FOREACH(te, mempool_list, next) {
+   TAILQ_FOREACH_SAFE(te, mempool_list, next, temp) {
(*func)((struct rte_mempool *) te->data, arg);
}

-- 
2.4.11



[dpdk-dev] [PATCH 1/2] eal: add stailq safe iterator macro

2016-07-22 Thread Sergio Gonzalez Monroy
Removing/freeing elements elements within a STAILQ_FOREACH loop
is not safe. FreeBSD defines STAILQ_FOREACH_SAFE macro, which permits
these operations safely.

This patch defines this macro for Linux systems, where it is not defined.

Signed-off-by: Sergio Gonzalez Monroy 
---

NOTE: This patch is based on top of:
http://dpdk.org/dev/patchwork/patch/14995/

 lib/librte_eal/common/include/rte_tailq.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_tailq.h 
b/lib/librte_eal/common/include/rte_tailq.h
index cc3c0f1..bba2835 100644
--- a/lib/librte_eal/common/include/rte_tailq.h
+++ b/lib/librte_eal/common/include/rte_tailq.h
@@ -163,6 +163,13 @@ void __attribute__((constructor, used)) tailqinitfn_ 
##t(void) \
(var) = (tvar))
 #endif

+#ifndef SLIST_FOREACH_SAFE
+#define SLIST_FOREACH_SAFE(var, head, field, tvar)  \
+   for ((var) = SLIST_FIRST((head));   \
+   (var) && ((tvar) = SLIST_NEXT((var), field), 1);\
+   (var) = (tvar))
+#endif
+
 #ifdef __cplusplus
 }
 #endif
-- 
2.4.11



[dpdk-dev] [PATCH] eal: fix check number of bytes from read function

2016-07-21 Thread Sergio Gonzalez Monroy
On 20/07/2016 15:24, Michal Jastrzebski wrote:
> In rte_mem_virt2phy: Value returned from a function and indicating the
> number of bytes was ignored. This could cause a wrong pfn (page frame
> number) mask read from pagemap file.
> When read returns less than the number of sizeof(uint64_t) bytes,
> function rte_mem_virt2phy returns error.
>
> Coverity issue: 13212
> Fixes: 40b966a211ab ("ivshmem: library changes for mmaping using
> ivshmem").
>
> Signed-off-by: Michal Jastrzebski 
> ---
>   lib/librte_eal/linuxapp/eal/eal_memory.c |   12 ++--
>   1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c 
> b/lib/librte_eal/linuxapp/eal/eal_memory.c
> index 42a29fa..05769fb 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
> @@ -158,7 +158,7 @@ rte_mem_lock_page(const void *virt)
>   phys_addr_t
>   rte_mem_virt2phy(const void *virtaddr)
>   {
> - int fd;
> + int fd, retval;
>   uint64_t page, physaddr;
>   unsigned long virt_pfn;
>   int page_size;
> @@ -209,11 +209,19 @@ rte_mem_virt2phy(const void *virtaddr)
>   close(fd);
>   return RTE_BAD_PHYS_ADDR;
>   }
> - if (read(fd, , sizeof(uint64_t)) < 0) {
> +
> + retval = read(fd, , sizeof(uint64_t));
> + if (retval < 0) {
>   RTE_LOG(ERR, EAL, "%s(): cannot read /proc/self/pagemap: %s\n",
>   __func__, strerror(errno));
>   close(fd);
>   return RTE_BAD_PHYS_ADDR;
> + }   else if (retval >= 0 && retval < (int)sizeof(uint64_t)) {

Just a couple of nits, retval >= 0 it's already implicit, no need to do 
that check.

> + RTE_LOG(ERR, EAL, "%s(): read %d bytes from /proc/self/pagemap "
> + "but expected %d: %s\n",
> + __func__, retval, (int)sizeof(uint64_t), 
> strerror(errno));
> + close(fd);

Another nit, we could just close(fd) right after read, regardless of 
read being success or error as
we close(fd) also on success just before exiting the function.

Other than that:

Acked-by: Sergio Gonzalez Monroy 

> + return RTE_BAD_PHYS_ADDR;
>   }
>   
>   /*



[dpdk-dev] [PATCH v3] mk: fix FreeBSD build

2016-07-19 Thread Sergio Gonzalez Monroy
The sed syntax of '0,/regexp/' is GNU specific and fails with
non GNU sed in FreeBSD.

To solve the issue we can use awk instead to remove duplicates.

The awk script basically keeps the last config value, while
maintaining order and comments from original config file.

Fixes: b2063f104db7 ("mk: filter duplicate configuration entries")

Signed-off-by: Sergio Gonzalez Monroy 
---
 mk/rte.sdkconfig.mk | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/mk/rte.sdkconfig.mk b/mk/rte.sdkconfig.mk
index e93237f..5c94edf 100644
--- a/mk/rte.sdkconfig.mk
+++ b/mk/rte.sdkconfig.mk
@@ -88,11 +88,13 @@ $(RTE_OUTPUT)/.config: $(RTE_CONFIG_TEMPLATE) FORCE | 
$(RTE_OUTPUT)
$(CPP) -undef -P -x assembler-with-cpp \
-ffreestanding \
-o $(RTE_OUTPUT)/.config_tmp $(RTE_CONFIG_TEMPLATE) ; \
-   for config in $$(grep -v "^#" $(RTE_OUTPUT)/.config_tmp | cut 
-d"=" -f1 | sort | uniq -d); do \
-   while [ $$(grep "^$${config}=" 
$(RTE_OUTPUT)/.config_tmp -c ) -gt 1 ]; do \
-   sed -i "0,/^$${config}=/{//d}" 
$(RTE_OUTPUT)/.config_tmp; \
-   done; \
-   done; \
+   config=$$(cat $(RTE_OUTPUT)/.config_tmp) ; \
+   echo "$$config" | awk -F '=' 'BEGIN {i=1} \
+   /^#/ {pos[i++]=$$0} \
+   !/^#/ {if (!s[$$1]) {pos[i]=$$0; s[$$1]=i++} \
+   else {pos[s[$$1]]=$$0}} END \
+   {for (j=1; j<i; j++) print pos[j]}' \
+   > $(RTE_OUTPUT)/.config_tmp ; \
if ! cmp -s $(RTE_OUTPUT)/.config_tmp $(RTE_OUTPUT)/.config; 
then \
cp $(RTE_OUTPUT)/.config_tmp $(RTE_OUTPUT)/.config ; \
cp $(RTE_OUTPUT)/.config_tmp $(RTE_OUTPUT)/.config.orig 
; \
-- 
2.4.11



[dpdk-dev] [PATCH v2] mk: fix FreeBSD build

2016-07-19 Thread Sergio Gonzalez Monroy
Sorry Christian, I did miss your previous patches.

On 19/07/2016 12:01, Christian Ehrhardt wrote:
> Hi,
> I haven't tested the new suggested way, just went into explaining what 
> formerly were the reasons.
> But I'd strongly vote against reordering and dropping comments.
>
Fair enough.
We would  still have some not-order options if they are duplicated as we 
would just keep the last one but nevertheless most of them would be 
properly grouped and ordered.

I'll rework something based on your v3 patch.

Sergio

> Sergio - v3 had still awk for some parts.
> It doesn't have the "0,..." you mentioned.
> Could you check if that is already using GNU-sed only syntax - 
> http://dpdk.org/dev/patchwork/patch/14592/ ?
>
> If this would be ok - AND - if it creates the same .config as the 
> current code I'd think that is the way to go.
>
>
> Christian Ehrhardt
> Software Engineer, Ubuntu Server
> Canonical Ltd
>
> On Tue, Jul 19, 2016 at 12:32 PM, Ferruh Yigit  <mailto:ferruh.yigit at intel.com>> wrote:
>
> On 7/18/2016 5:06 PM, Sergio Gonzalez Monroy wrote:
> > The sed syntax of '0,/regexp/' is GNU specific and fails with
> > non GNU sed in FreeBSD.
> >
> > To solve the issue we can use awk instead to remove duplicates.
> >
> > Fixes: b2063f104db7 ("mk: filter duplicate configuration entries")
> >
> > Signed-off-by: Sergio Gonzalez Monroy
>  <mailto:sergio.gonzalez.monroy at intel.com>>
> > ---
> >
> > v2:
> > - Use temp var instead of temp file
> >
> >  mk/rte.sdkconfig.mk <http://rte.sdkconfig.mk> | 7 ++-
> >  1 file changed, 2 insertions(+), 5 deletions(-)
> >
> > diff --git a/mk/rte.sdkconfig.mk <http://rte.sdkconfig.mk>
> b/mk/rte.sdkconfig.mk <http://rte.sdkconfig.mk>
> > index e93237f..c2b6e13 100644
> > --- a/mk/rte.sdkconfig.mk <http://rte.sdkconfig.mk>
> > +++ b/mk/rte.sdkconfig.mk <http://rte.sdkconfig.mk>
> > @@ -88,11 +88,8 @@ $(RTE_OUTPUT)/.config: $(RTE_CONFIG_TEMPLATE)
> FORCE | $(RTE_OUTPUT)
> >   $(CPP) -undef -P -x assembler-with-cpp \
> >   -ffreestanding \
> >   -o $(RTE_OUTPUT)/.config_tmp
> $(RTE_CONFIG_TEMPLATE) ; \
> > - for config in $$(grep -v "^#"
> $(RTE_OUTPUT)/.config_tmp | cut -d"=" -f1 | sort | uniq -d); do \
> > - while [ $$(grep "^$${config}="
> $(RTE_OUTPUT)/.config_tmp -c ) -gt 1 ]; do \
> > - sed -i "0,/^$${config}=/{//d}"
> $(RTE_OUTPUT)/.config_tmp; \
> > - done; \
> > - done; \
> > + config=$$(grep -v "^#" $(RTE_OUTPUT)/.config_tmp) ; \
> > + echo "$$config" | awk -F'=' '{a[$$1]=$$0} END {for
> (i in a) print a[i]}' > $(RTE_OUTPUT)/.config_tmp ; \
> This is another nice awk command.
>
> A few comments about new command:
> - Removes all comments from final config
> - Spreads config option all over the file, logical grouping of options
> removed.
>
> When both happens at the same time, I have a concern that this may
> lead
> missing some config options when somebody wants to update local config
> file, but I am OK if everybody is OK.
>
>
> >   if ! cmp -s $(RTE_OUTPUT)/.config_tmp
> $(RTE_OUTPUT)/.config; then \
> >   cp $(RTE_OUTPUT)/.config_tmp
> $(RTE_OUTPUT)/.config ; \
> >   cp $(RTE_OUTPUT)/.config_tmp
> $(RTE_OUTPUT)/.config.orig ; \
> >
>
>



[dpdk-dev] [PATCH] examples/ipsec-secgw: fix GCC 4.5.x build error

2016-07-19 Thread Sergio Gonzalez Monroy
GCC 4.5.x does not handle well initializing anonymous union and/or
structs.

To make the compiler happy we name those anonymous union/struct.

Fixes: 906257e965b7 ("examples/ipsec-secgw: support IPv6")

Signed-off-by: Sergio Gonzalez Monroy 
---
 examples/ipsec-secgw/ipip.h  |  4 +--
 examples/ipsec-secgw/ipsec.h |  4 +--
 examples/ipsec-secgw/sa.c| 60 ++--
 3 files changed, 34 insertions(+), 34 deletions(-)

diff --git a/examples/ipsec-secgw/ipip.h b/examples/ipsec-secgw/ipip.h
index ce25a2e..ff1dccd 100644
--- a/examples/ipsec-secgw/ipip.h
+++ b/examples/ipsec-secgw/ipip.h
@@ -100,8 +100,8 @@ ipip_outbound(struct rte_mbuf *m, uint32_t offset, uint32_t 
is_ipv6,
outip4->ip_ttl = IPDEFTTL;
outip4->ip_p = IPPROTO_ESP;

-   outip4->ip_src.s_addr = src->ip4;
-   outip4->ip_dst.s_addr = dst->ip4;
+   outip4->ip_src.s_addr = src->ip.ip4;
+   outip4->ip_dst.s_addr = dst->ip.ip4;

return outip4;
 }
diff --git a/examples/ipsec-secgw/ipsec.h b/examples/ipsec-secgw/ipsec.h
index 0d2ee25..a442a74 100644
--- a/examples/ipsec-secgw/ipsec.h
+++ b/examples/ipsec-secgw/ipsec.h
@@ -86,8 +86,8 @@ struct ip_addr {
union {
uint64_t ip6[2];
uint8_t ip6_b[16];
-   };
-   };
+   } ip6;
+   } ip;
 };

 struct ipsec_sa {
diff --git a/examples/ipsec-secgw/sa.c b/examples/ipsec-secgw/sa.c
index ab18b81..4439e0f 100644
--- a/examples/ipsec-secgw/sa.c
+++ b/examples/ipsec-secgw/sa.c
@@ -53,8 +53,8 @@
 const struct ipsec_sa sa_out[] = {
{
.spi = 5,
-   .src.ip4 = IPv4(172, 16, 1, 5),
-   .dst.ip4 = IPv4(172, 16, 2, 5),
+   .src.ip.ip4 = IPv4(172, 16, 1, 5),
+   .dst.ip.ip4 = IPv4(172, 16, 2, 5),
.cipher_algo = RTE_CRYPTO_CIPHER_AES_CBC,
.auth_algo = RTE_CRYPTO_AUTH_SHA1_HMAC,
.digest_len = 12,
@@ -64,8 +64,8 @@ const struct ipsec_sa sa_out[] = {
},
{
.spi = 6,
-   .src.ip4 = IPv4(172, 16, 1, 6),
-   .dst.ip4 = IPv4(172, 16, 2, 6),
+   .src.ip.ip4 = IPv4(172, 16, 1, 6),
+   .dst.ip.ip4 = IPv4(172, 16, 2, 6),
.cipher_algo = RTE_CRYPTO_CIPHER_AES_CBC,
.auth_algo = RTE_CRYPTO_AUTH_SHA1_HMAC,
.digest_len = 12,
@@ -93,8 +93,8 @@ const struct ipsec_sa sa_out[] = {
},
{
.spi = 15,
-   .src.ip4 = IPv4(172, 16, 1, 5),
-   .dst.ip4 = IPv4(172, 16, 2, 5),
+   .src.ip.ip4 = IPv4(172, 16, 1, 5),
+   .dst.ip.ip4 = IPv4(172, 16, 2, 5),
.cipher_algo = RTE_CRYPTO_CIPHER_NULL,
.auth_algo = RTE_CRYPTO_AUTH_NULL,
.digest_len = 0,
@@ -104,8 +104,8 @@ const struct ipsec_sa sa_out[] = {
},
{
.spi = 16,
-   .src.ip4 = IPv4(172, 16, 1, 6),
-   .dst.ip4 = IPv4(172, 16, 2, 6),
+   .src.ip.ip4 = IPv4(172, 16, 1, 6),
+   .dst.ip.ip4 = IPv4(172, 16, 2, 6),
.cipher_algo = RTE_CRYPTO_CIPHER_NULL,
.auth_algo = RTE_CRYPTO_AUTH_NULL,
.digest_len = 0,
@@ -115,9 +115,9 @@ const struct ipsec_sa sa_out[] = {
},
{
.spi = 25,
-   .src.ip6_b = { 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11,
+   .src.ip.ip6.ip6_b = { 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11,
0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x55, 0x55 },
-   .dst.ip6_b = { 0x22, 0x22, 0x22, 0x22, 0x22, 0x22, 0x22, 0x22,
+   .dst.ip.ip6.ip6_b = { 0x22, 0x22, 0x22, 0x22, 0x22, 0x22, 0x22, 0x22,
0x22, 0x22, 0x22, 0x22, 0x22, 0x22, 0x55, 0x55 },
.cipher_algo = RTE_CRYPTO_CIPHER_AES_CBC,
.auth_algo = RTE_CRYPTO_AUTH_SHA1_HMAC,
@@ -128,9 +128,9 @@ const struct ipsec_sa sa_out[] = {
},
{
.spi = 26,
-   .src.ip6_b = { 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11,
+   .src.ip.ip6.ip6_b = { 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11,
0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x66, 0x66 },
-   .dst.ip6_b = { 0x22, 0x22, 0x22, 0x22, 0x22, 0x22, 0x22, 0x22,
+   .dst.ip.ip6.ip6_b = { 0x22, 0x22, 0x22, 0x22, 0x22, 0x22, 0x22, 0x22,
0x22, 0x22, 0x22, 0x22, 0x22, 0x22, 0x66, 0x66 },
.cipher_algo = RTE_CRYPTO_CIPHER_AES_CBC,
.auth_algo = RTE_CRYPTO_AUTH_SHA1_HMAC,
@@ -145,8 +145,8 @@ const struct ipsec_sa sa_out[] = {
 const struct ipsec_sa sa_in[] = {
{
.spi = 105,
-   .src.ip4 = IPv4(172, 16, 2, 5),
-   .dst.ip4 = IPv4(172, 16, 1, 5),
+   .src.ip.ip4 = IPv4(172, 16, 2, 5),
+   .dst.ip.ip4 = IPv4(172, 16, 1, 5),
.cipher_algo = RTE_CRYPTO_CIPHER_AES_CBC,
.auth_algo = RTE_CRYPTO_AUTH_SHA1_HMAC,
.digest_len = 12,
@@ -156,8 +156,8 @@ const struct ipsec_sa sa_in[] = {
},
{
.spi = 106,
-   .src.ip4 = IPv4(172, 16, 2, 6),
-   .dst.ip4 = IPv4(172, 16, 1, 6),
+   .src.ip.ip4 = IPv4(172, 16, 2, 6),
+   .dst.ip.ip4 = I

[dpdk-dev] [PATCH v3 1/2] examples/ipsec-secgw: add configuration file support

2016-07-19 Thread Sergio Gonzalez Monroy
On 12/07/2016 10:44, Fan Zhang wrote:
> This patch adds the configuration file support to ipsec_secgw
> sample application. Instead of hard-coded rules, the users can
> specify their own SP, SA, and routing rules in the configuration
> file. An command line option "-f" is added to pass the
> configuration file location to the application.
>
> Configuration item formats:
>
> SP rule format:
> sp   esp \
>   
>
> SA rule format:
> sa   

I think we should be able to set the key also on config file for both 
cipher and auth.
Then we can check that the key size is the expected by the chosen 
cipher/auth algo.

I think we should also create and set the xforms dynamically instead of 
static as
they currently are (sa_add_rules function).

Sergio

> Routing rule format:
> rt
>
> Signed-off-by: Fan Zhang 
> ---


[dpdk-dev] [PATCH v2] mk: fix FreeBSD build

2016-07-18 Thread Sergio Gonzalez Monroy
The sed syntax of '0,/regexp/' is GNU specific and fails with
non GNU sed in FreeBSD.

To solve the issue we can use awk instead to remove duplicates.

Fixes: b2063f104db7 ("mk: filter duplicate configuration entries")

Signed-off-by: Sergio Gonzalez Monroy 
---

v2:
- Use temp var instead of temp file

 mk/rte.sdkconfig.mk | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/mk/rte.sdkconfig.mk b/mk/rte.sdkconfig.mk
index e93237f..c2b6e13 100644
--- a/mk/rte.sdkconfig.mk
+++ b/mk/rte.sdkconfig.mk
@@ -88,11 +88,8 @@ $(RTE_OUTPUT)/.config: $(RTE_CONFIG_TEMPLATE) FORCE | 
$(RTE_OUTPUT)
$(CPP) -undef -P -x assembler-with-cpp \
-ffreestanding \
-o $(RTE_OUTPUT)/.config_tmp $(RTE_CONFIG_TEMPLATE) ; \
-   for config in $$(grep -v "^#" $(RTE_OUTPUT)/.config_tmp | cut 
-d"=" -f1 | sort | uniq -d); do \
-   while [ $$(grep "^$${config}=" 
$(RTE_OUTPUT)/.config_tmp -c ) -gt 1 ]; do \
-   sed -i "0,/^$${config}=/{//d}" 
$(RTE_OUTPUT)/.config_tmp; \
-   done; \
-   done; \
+   config=$$(grep -v "^#" $(RTE_OUTPUT)/.config_tmp) ; \
+   echo "$$config" | awk -F'=' '{a[$$1]=$$0} END {for (i in a) 
print a[i]}' > $(RTE_OUTPUT)/.config_tmp ; \
if ! cmp -s $(RTE_OUTPUT)/.config_tmp $(RTE_OUTPUT)/.config; 
then \
cp $(RTE_OUTPUT)/.config_tmp $(RTE_OUTPUT)/.config ; \
cp $(RTE_OUTPUT)/.config_tmp $(RTE_OUTPUT)/.config.orig 
; \
-- 
2.4.11



[dpdk-dev] [PATCH] mk: fix FreeBSD build

2016-07-18 Thread Sergio Gonzalez Monroy
On 18/07/2016 15:07, Thomas Monjalon wrote:
> 2016-07-18 15:54, Christian Ehrhardt:
>> Hi Sergio,
>> you might have seen that I had a similar version with awk in v2 IIRC. I
>> also had the secondary tmp file just like you now.
>> So, since it is so close to my old submission I wont object :-)
>>
>> Back then the discussion went for reduced build time dependencies and
>> avoiding a second temp file, which was ok for me - so sed was chosen.
> I don't see "awk" as a real dependency. I think it is as much common
> as "sed". Isn't it?

I would agree.

So v2 using temp var instead?

Sergio


[dpdk-dev] ip_chksum not updated in ipsec-secgw application

2016-07-18 Thread Sergio Gonzalez Monroy
On 18/07/2016 15:09, Thomas Monjalon wrote:
> 2016-07-18 14:57, Sergio Gonzalez Monroy:
>> On 18/07/2016 14:53, Akhil Goyal wrote:
>>> On 7/18/2016 6:50 PM, Thomas Monjalon wrote:
>>>> 2016-07-18 13:57, Sergio Gonzalez Monroy:
>>>>> On 18/07/2016 13:41, Akhil Goyal wrote:
>>>>>> In Ipsec-secgw application, while adding the outer IP header,
>>>>>> it seems that the application does not update the checksum value
>>>>>> for outbound packets. This result in incorrect ip->checksum in
>>>>>> the encrypted packet.
>>>> [...]
>>>>> It is intentional. The application is using IP checksum offload
>>>> The correct behaviour is to have a software fallback (using rte_ip.h)
>>>> for drivers which do not support checksum offload.
>>>> But given it is just an example, it is normal to have this kind of
>>>> constraint. However I think it should be explained in its doc.
>>>> And a list of tested NICs would be nice to have.
>>>>
>>> Agreed. The driver that I was using did not enable checksum offload.
>>> It is good to have a fallback option.
>> That's a good point.
>> So would it be enough to call out in the sample app guide that we use IP
>> checksum offload and
>> show a warning in case the Driver does not support such offload?
> Yes
> and a list of tested NICs would make it perfect :)

There is no mention of specific tested hardware in the example guides, 
is there?

I would prefer to just point to doc/guides/nics/overview.rst to check if 
the NIC supports
IP checksum offload and in the application itself check for such 
capability and display a
warning in case it is not supported.

Sergio


[dpdk-dev] [PATCH] mk: fix FreeBSD build

2016-07-18 Thread Sergio Gonzalez Monroy
On 18/07/2016 14:54, Christian Ehrhardt wrote:
> Hi Sergio,
> you might have seen that I had a similar version with awk in v2 IIRC. 
> I also had the secondary tmp file just like you now.
> So, since it is so close to my old submission I wont object :-)
>
> Back then the discussion went for reduced build time dependencies and 
> avoiding a second temp file, which was ok for me - so sed was chosen.
>

I haven't seen a noticeable performance impact by using second temp file.
Thomas has suggested using a temp var instead of second temp file, what 
do you think about that?

> I see that breaking on BSD causes us to rework it again, sorry that I 
> was unable to test there.
>

No worries.

> If you could come up with a Solution "sed + no-temp2 + noGNUspecifics" 
> that would be great and solve everybodies needs.
> If not, it is a call up to the participants of the old discussion if 
> not working on BSD outweighs their old feedback (I guess so).
>

Any reason of sed over awk? I reckon awk is simpler syntax for this job.
 From what I have seen most of the time is spent on resolving 
directory/library dependencies not on the .config itself.

Sergio

> Most active in the discussion back then was Ferruh IIRC - setting to CC.
>
>
>
> Christian Ehrhardt
> Software Engineer, Ubuntu Server
> Canonical Ltd
>
> On Mon, Jul 18, 2016 at 3:39 PM, Sergio Gonzalez Monroy 
>  <mailto:sergio.gonzalez.monroy at intel.com>> wrote:
>
> On 18/07/2016 14:25, Thomas Monjalon wrote:
>
> 2016-07-18 14:11, Sergio Gonzalez Monroy:
>
> The sed syntax of '0,/regexp/' is GNU specific and fails with
> non GNU sed in FreeBSD.
>
> To solve the issue we can use awk instead to remove
> duplicates.
>
> Christian, an opinion please?
>
>
> Sorry, forgot to CC him.
>
> Fixes: b2063f104db7 ("mk: filter duplicate configuration
> entries")
>
> Signed-off-by: Sergio Gonzalez Monroy
>  <mailto:sergio.gonzalez.monroy at intel.com>>
>
> [...]
>
> -   for config in $$(grep -v "^#"
> $(RTE_OUTPUT)/.config_tmp | cut -d"=" -f1 | sort | uniq
> -d); do \
> -   while [ $$(grep "^$${config}="
> $(RTE_OUTPUT)/.config_tmp -c ) -gt 1 ]; do \
> -   sed -i
> "0,/^$${config}=/{//d}" $(RTE_OUTPUT)/.config_tmp; \
> -   done; \
> -   done; \
> +   grep -v "^#" $(RTE_OUTPUT)/.config_tmp |
> awk -F'=' '{a[$$1]=$$0} END {for (i in a) print a[i]}' >
> $(RTE_OUTPUT)/.config_tmp2 ; \
> +   mv $(RTE_OUTPUT)/.config_tmp2
> $(RTE_OUTPUT)/.config_tmp ; \
> +   rm -f $(RTE_OUTPUT)/.config_tmp2 ; \
>
> You can avoid creating/deleting the file .config_tmp2 by using
> a variable:
> config=$(grep -v '^#' $(RTE_OUTPUT)/.config_tmp)
> echo "$config" | awk ... > $(RTE_OUTPUT)/.config_tmp
>
>
> Sure.
>
> Sergio
>
>



[dpdk-dev] ip_chksum not updated in ipsec-secgw application

2016-07-18 Thread Sergio Gonzalez Monroy
On 18/07/2016 14:49, Akhil Goyal wrote:
> On 7/18/2016 6:27 PM, Sergio Gonzalez Monroy wrote:
>> Hi,
>>
>> On 18/07/2016 13:41, Akhil Goyal wrote:
>>> Hi,
>>>
>>> In Ipsec-secgw application, while adding the outer IP header, it seems
>>> that the application does not update the checksum value for outbound
>>> packets. This result in incorrect ip->checksum in the encrypted packet.
>>>
>>> Please let me know if the checksum value is updated somewhere else or
>>> not.
>>>
>>> Also In case of inner ip header also the TTL value is decremented by
>>> one but the checksum value is not updated. Is it intentional or it is
>>> done somewhere else?
>>
>> It is intentional. The application is using IP checksum offload but just
>> looking now at the code there is a bug for IPv6 packets where the flag
>> does not get setup.
>> Is it only for IPv6 traffic that you are having this issue?
>>

Duh! moment here.
No checksum for IPv6, that's why the flag is not setup so the code is 
correct
as it is, it just needs IPv4 checksum offload support.

Sergio

>> For IPv4 traffic the PKT_TX_IP_CKSUM flag is setup in 'prepare_tx_pkt'
>> function in ipsec-secgw.c
>>
>> Sergio
>>
>
> Thanks Sergio, got your point. I missed the flag. I was using it for 
> IPv4.
>
> Akhil
>
>



[dpdk-dev] ip_chksum not updated in ipsec-secgw application

2016-07-18 Thread Sergio Gonzalez Monroy
On 18/07/2016 14:53, Akhil Goyal wrote:
> On 7/18/2016 6:50 PM, Thomas Monjalon wrote:
>> 2016-07-18 13:57, Sergio Gonzalez Monroy:
>>> On 18/07/2016 13:41, Akhil Goyal wrote:
>>>> In Ipsec-secgw application, while adding the outer IP header,
>>>> it seems that the application does not update the checksum value
>>>> for outbound packets. This result in incorrect ip->checksum in
>>>> the encrypted packet.
>> [...]
>>>
>>> It is intentional. The application is using IP checksum offload
>>
>> The correct behaviour is to have a software fallback (using rte_ip.h)
>> for drivers which do not support checksum offload.
>> But given it is just an example, it is normal to have this kind of
>> constraint. However I think it should be explained in its doc.
>> And a list of tested NICs would be nice to have.
>>
> Agreed. The driver that I was using did not enable checksum offload. 
> It is good to have a fallback option.
>

That's a good point.
So would it be enough to call out in the sample app guide that we use IP 
checksum offload and
show a warning in case the Driver does not support such offload?

Sergio




[dpdk-dev] [PATCH] mk: fix FreeBSD build

2016-07-18 Thread Sergio Gonzalez Monroy
On 18/07/2016 14:25, Thomas Monjalon wrote:
> 2016-07-18 14:11, Sergio Gonzalez Monroy:
>> The sed syntax of '0,/regexp/' is GNU specific and fails with
>> non GNU sed in FreeBSD.
>>
>> To solve the issue we can use awk instead to remove duplicates.
> Christian, an opinion please?

Sorry, forgot to CC him.

>> Fixes: b2063f104db7 ("mk: filter duplicate configuration entries")
>>
>> Signed-off-by: Sergio Gonzalez Monroy 
> [...]
>> -for config in $$(grep -v "^#" $(RTE_OUTPUT)/.config_tmp | cut 
>> -d"=" -f1 | sort | uniq -d); do \
>> -while [ $$(grep "^$${config}=" 
>> $(RTE_OUTPUT)/.config_tmp -c ) -gt 1 ]; do \
>> -sed -i "0,/^$${config}=/{//d}" 
>> $(RTE_OUTPUT)/.config_tmp; \
>> -done; \
>> -done; \
>> +grep -v "^#" $(RTE_OUTPUT)/.config_tmp | awk -F'=' 
>> '{a[$$1]=$$0} END {for (i in a) print a[i]}' > $(RTE_OUTPUT)/.config_tmp2 ; \
>> +mv $(RTE_OUTPUT)/.config_tmp2 $(RTE_OUTPUT)/.config_tmp ; \
>> +rm -f $(RTE_OUTPUT)/.config_tmp2 ; \
> You can avoid creating/deleting the file .config_tmp2 by using a variable:
>   config=$(grep -v '^#' $(RTE_OUTPUT)/.config_tmp)
>   echo "$config" | awk ... > $(RTE_OUTPUT)/.config_tmp

Sure.

Sergio


[dpdk-dev] mutli process C/S model example init failed on xen dom0 with dpdk-16.07 rc2 package

2016-07-18 Thread Sergio Gonzalez Monroy
On 18/07/2016 12:49, Olivier Matz wrote:
> Hi Sergio,
>
> On 07/18/2016 01:33 PM, Sergio Gonzalez Monroy wrote:
>> On 12/07/2016 12:30, Olivier MATZ wrote:
>>> On 07/12/2016 11:22 AM, Xu, HuilongX wrote:
>>>> /examples/multi_process/client_server_mp/mp_server/mp_server/x86_64-native-linuxapp-gcc/mp_server
>>>>
>>>> -c f -n 4 --xen-dom0 -- -p 0x3 -n 2
>>>> EAL: Detected 72 lcore(s)
>>>> EAL: Probing VFIO support...
>>>> PMD: bnxt_rte_pmd_init() called for (null)
>>>> EAL: PCI device :01:00.0 on NUMA socket 0
>>>> EAL: probe driver: 8086:1521 rte_igb_pmd
>>>> EAL: PCI device :01:00.1 on NUMA socket 0
>>>> EAL: probe driver: 8086:1521 rte_igb_pmd
>>>> EAL: PCI device :04:00.0 on NUMA socket 0
>>>> EAL: probe driver: 8086:10fb rte_ixgbe_pmd
>>>> EAL: PCI device :04:00.1 on NUMA socket 0
>>>> EAL: probe driver: 8086:10fb rte_ixgbe_pmd
>>>> Creating mbuf pool 'MProc_pktmbuf_pool' [6144 mbufs] ...
>>>> Port 0 init ... Segmentation fault (core dumped)
>>>>
>>> I reproduced the issue on my platform. In my case, the crash occurs in
>>> rx_queue_setup():
>>>
>>>  /* Free memory prior to re-allocation if needed. */
>>>  if (dev->data->rx_queues[queue_idx] != NULL) {
>>> => em_rx_queue_release(dev->data->rx_queues[queue_idx]);
>>>  dev->data->rx_queues[queue_idx] = NULL;
>>>  }
>>>
>>> I don't this we should go in that area for the first rx queue
>>> initialization. I suspect it could be related to this commit:
>>> http://dpdk.org/browse/dpdk/commit/?id=ea0bddbd14e68f
>>>
>>> I think we cannot expect that memory is initialized at 0 when using
>>> Xen dom0. If I add the following (dirty) patch, I don't see a crash
>>> anymore:
>> I don't have a Xen system available right now, but I'm not sure I follow
>> here.
>> Are you saying that when we allocate pages/hugepages from Xen they are
>> not zeroed?
> I did not check it, but from the tests I've done, I suppose they're not.

If that is the case then I would suggest to zero all memory on EAL init 
(only for Xen) so
all memory is zeroed after init for both Linux and Xen.

What do you think about that?

Regards,
Sergio

>
>>> --- a/lib/librte_eal/common/eal_common_memzone.c
>>> +++ b/lib/librte_eal/common/eal_common_memzone.c
>>> @@ -258,6 +258,8 @@ memzone_reserve_aligned_thread_unsafe(const char
>>> *name, size_t len,
>>>  mz->flags = 0;
>>>  mz->memseg_id = elem->ms -
>>> rte_eal_get_configuration()->mem_config->memseg;
>>>
>>> +   memset(mz->addr, 0, mz->len);
>>> +
>>>  return mz;
>>>   }
>>>
>> The commit you are referring to does not touch the memzone reserve APIs,
>> only changes zmalloc and related APIs.
> I just did a quick test, adding the memset() at the places where I
> thought it could be required. Maybe the patch is a bit overkill and only
> the zmalloc part fixes the issue.
>
>
> Regards,
> Olivier




[dpdk-dev] [PATCH] mk: fix FreeBSD build

2016-07-18 Thread Sergio Gonzalez Monroy
The sed syntax of '0,/regexp/' is GNU specific and fails with
non GNU sed in FreeBSD.

To solve the issue we can use awk instead to remove duplicates.

Fixes: b2063f104db7 ("mk: filter duplicate configuration entries")

Signed-off-by: Sergio Gonzalez Monroy 
---
 mk/rte.sdkconfig.mk | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/mk/rte.sdkconfig.mk b/mk/rte.sdkconfig.mk
index e93237f..5c28b7b 100644
--- a/mk/rte.sdkconfig.mk
+++ b/mk/rte.sdkconfig.mk
@@ -88,11 +88,9 @@ $(RTE_OUTPUT)/.config: $(RTE_CONFIG_TEMPLATE) FORCE | 
$(RTE_OUTPUT)
$(CPP) -undef -P -x assembler-with-cpp \
-ffreestanding \
-o $(RTE_OUTPUT)/.config_tmp $(RTE_CONFIG_TEMPLATE) ; \
-   for config in $$(grep -v "^#" $(RTE_OUTPUT)/.config_tmp | cut 
-d"=" -f1 | sort | uniq -d); do \
-   while [ $$(grep "^$${config}=" 
$(RTE_OUTPUT)/.config_tmp -c ) -gt 1 ]; do \
-   sed -i "0,/^$${config}=/{//d}" 
$(RTE_OUTPUT)/.config_tmp; \
-   done; \
-   done; \
+   grep -v "^#" $(RTE_OUTPUT)/.config_tmp | awk -F'=' 
'{a[$$1]=$$0} END {for (i in a) print a[i]}' > $(RTE_OUTPUT)/.config_tmp2 ; \
+   mv $(RTE_OUTPUT)/.config_tmp2 $(RTE_OUTPUT)/.config_tmp ; \
+   rm -f $(RTE_OUTPUT)/.config_tmp2 ; \
if ! cmp -s $(RTE_OUTPUT)/.config_tmp $(RTE_OUTPUT)/.config; 
then \
cp $(RTE_OUTPUT)/.config_tmp $(RTE_OUTPUT)/.config ; \
cp $(RTE_OUTPUT)/.config_tmp $(RTE_OUTPUT)/.config.orig 
; \
-- 
2.4.11



[dpdk-dev] ip_chksum not updated in ipsec-secgw application

2016-07-18 Thread Sergio Gonzalez Monroy
Hi,

On 18/07/2016 13:41, Akhil Goyal wrote:
> Hi,
>
> In Ipsec-secgw application, while adding the outer IP header, it seems that 
> the application does not update the checksum value for outbound packets. This 
> result in incorrect ip->checksum in the encrypted packet.
>
> Please let me know if the checksum value is updated somewhere else or not.
>
> Also In case of inner ip header also the TTL value is decremented by one but 
> the checksum value is not updated. Is it intentional or it is done somewhere 
> else?

It is intentional. The application is using IP checksum offload but just 
looking now at the code there is a bug for IPv6 packets where the flag 
does not get setup.
Is it only for IPv6 traffic that you are having this issue?

For IPv4 traffic the PKT_TX_IP_CKSUM flag is setup in 'prepare_tx_pkt' 
function in ipsec-secgw.c

Sergio

> After addition of following code, the checksum looks good and the encrypted 
> packets are good.
>
> diff --git a/examples/ipsec-secgw/ipip.h b/examples/ipsec-secgw/ipip.h
> index 322076c..0f7b60f 100644
> --- a/examples/ipsec-secgw/ipip.h
> +++ b/examples/ipsec-secgw/ipip.h
> @@ -41,6 +41,24 @@
> #include 
>
> #define IPV6_VERSION (6)
> +static inline uint16_t
> +ip_sum(const unaligned_uint16_t *hdr, int hdr_len)
> +{
> +   uint32_t sum = 0;
> +
> +   while (hdr_len > 1)
> +   {
> +   sum += *hdr++;
> +   if (sum & 0x8000)
> +   sum = (sum & 0x) + (sum >> 16);
> +   hdr_len -= 2;
> +   }
> +
> +   while (sum >> 16)
> +   sum = (sum & 0x) + (sum >> 16);
> +
> +   return ~sum;
> +}
>
> static inline  struct ip *
> ip4ip_outbound(struct rte_mbuf *m, uint32_t offset, uint32_t src, uint32_t 
> dst)
> @@ -71,7 +89,8 @@ ip4ip_outbound(struct rte_mbuf *m, uint32_t offset, 
> uint32_t src, uint32_t dst)
>
>  outip->ip_src.s_addr = src;
>  outip->ip_dst.s_addr = dst;
> -
> +   outip->ip_sum = 0;
> +   outip->ip_sum = ip_sum((const unaligned_uint16_t *)outip, 
> sizeof(struct ip));
>  return outip;
> }
>
> Regards,
> Akhil
>



[dpdk-dev] mutli process C/S model example init failed on xen dom0 with dpdk-16.07 rc2 package

2016-07-18 Thread Sergio Gonzalez Monroy
Hi,

On 12/07/2016 12:30, Olivier MATZ wrote:
> Hi Huilong,
>
>
> On 07/12/2016 11:22 AM, Xu, HuilongX wrote:
>> Hi all,
>>
>> I run mutli procee C/S model example failed on xen dom0. Does anyone
>> give me some suggest how to debug it?
>>
>> Thanks a lot
>>
>> test environment:
>>
>>OS: 3.17.4-301.fc21.x86_64
>>
>> Gcc version: gcc version 4.9.2 20141101 (Red Hat 4.9.2-1) (GCC)
>>
>> Package :dpdk.16.07-rc1.tar.gz
>>
>> Target: x86_64-native-linuxapp-gcc
>>
>> Compile switch: enable CONFIG_RTE_LIBRTE_XEN_DOM0
>>
>> Xen version:4.4.1
>>
>> Test cmdline and result:
>>
>> /examples/multi_process/client_server_mp/mp_server/mp_server/x86_64-native-linuxapp-gcc/mp_server
>>  
>>
>> -c f -n 4 --xen-dom0 -- -p 0x3 -n 2
>> EAL: Detected 72 lcore(s)
>> EAL: Probing VFIO support...
>> PMD: bnxt_rte_pmd_init() called for (null)
>> EAL: PCI device :01:00.0 on NUMA socket 0
>> EAL: probe driver: 8086:1521 rte_igb_pmd
>> EAL: PCI device :01:00.1 on NUMA socket 0
>> EAL: probe driver: 8086:1521 rte_igb_pmd
>> EAL: PCI device :04:00.0 on NUMA socket 0
>> EAL: probe driver: 8086:10fb rte_ixgbe_pmd
>> EAL: PCI device :04:00.1 on NUMA socket 0
>> EAL: probe driver: 8086:10fb rte_ixgbe_pmd
>> Creating mbuf pool 'MProc_pktmbuf_pool' [6144 mbufs] ...
>> Port 0 init ... Segmentation fault (core dumped)
>>
>
> I reproduced the issue on my platform. In my case, the crash occurs in 
> rx_queue_setup():
>
> /* Free memory prior to re-allocation if needed. */
> if (dev->data->rx_queues[queue_idx] != NULL) {
> => em_rx_queue_release(dev->data->rx_queues[queue_idx]);
> dev->data->rx_queues[queue_idx] = NULL;
> }
>
> I don't this we should go in that area for the first rx queue 
> initialization. I suspect it could be related to this commit:
> http://dpdk.org/browse/dpdk/commit/?id=ea0bddbd14e68f
>
> I think we cannot expect that memory is initialized at 0 when using 
> Xen dom0. If I add the following (dirty) patch, I don't see a crash 
> anymore:

I don't have a Xen system available right now, but I'm not sure I follow 
here.
Are you saying that when we allocate pages/hugepages from Xen they are 
not zeroed?

>
> --- a/lib/librte_eal/common/eal_common_memzone.c
> +++ b/lib/librte_eal/common/eal_common_memzone.c
> @@ -258,6 +258,8 @@ memzone_reserve_aligned_thread_unsafe(const char 
> *name, size_t len,
> mz->flags = 0;
> mz->memseg_id = elem->ms - 
> rte_eal_get_configuration()->mem_config->memseg;
>
> +   memset(mz->addr, 0, mz->len);
> +
> return mz;
>  }
>

The commit you are referring to does not touch the memzone reserve APIs, 
only changes zmalloc and related APIs.

> --- a/lib/librte_eal/common/rte_malloc.c
> +++ b/lib/librte_eal/common/rte_malloc.c
> @@ -123,7 +123,13 @@ rte_malloc(const char *type, size_t size, 
> unsigned align)
>  void *
>  rte_zmalloc_socket(const char *type, size_t size, unsigned align, int 
> socket)
>  {
> -   return rte_malloc_socket(type, size, align, socket);
> +   void *x = rte_malloc_socket(type, size, align, socket);
> +
> +   if (x == NULL)
> +   return NULL;
> +
> +   memset(x, 0, size);
> +   return x;
>  }
>
>  /*
>
>
> Sergio, could you have a look at it?
>
> Regards,
> Olivier




[dpdk-dev] [PATCH] examples/ipsec-secgw: fix inbound segfault

2016-07-12 Thread Sergio Gonzalez Monroy
When sending Inbound non IPSec traffic that matches an Inbound Security
Policy set to Protect, the code will check that the SPI of the packet
and the associated Security Association match.

That check should only be done for IPSec packets and results in SEGFAULT
when done on non IPSec packets.

Fixes: 906257e965b7 ("examples/ipsec-secgw: support IPv6")

Signed-off-by: Sergio Gonzalez Monroy 
---
 examples/ipsec-secgw/ipsec-secgw.c | 24 +++-
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/examples/ipsec-secgw/ipsec-secgw.c 
b/examples/ipsec-secgw/ipsec-secgw.c
index f78743d..1ca144b 100644
--- a/examples/ipsec-secgw/ipsec-secgw.c
+++ b/examples/ipsec-secgw/ipsec-secgw.c
@@ -384,7 +384,8 @@ send_single_packet(struct rte_mbuf *m, uint8_t port)
 }

 static inline void
-inbound_sp_sa(struct sp_ctx *sp, struct sa_ctx *sa, struct traffic_type *ip)
+inbound_sp_sa(struct sp_ctx *sp, struct sa_ctx *sa, struct traffic_type *ip,
+   uint16_t lim)
 {
struct rte_mbuf *m;
uint32_t i, j, res, sa_idx;
@@ -399,15 +400,15 @@ inbound_sp_sa(struct sp_ctx *sp, struct sa_ctx *sa, 
struct traffic_type *ip)
for (i = 0; i < ip->num; i++) {
m = ip->pkts[i];
res = ip->res[i];
-   if (res & DISCARD) {
-   rte_pktmbuf_free(m);
-   continue;
-   }
if (res & BYPASS) {
ip->pkts[j++] = m;
continue;
}
-   /* Check return SA SPI matches pkt SPI */
+   if (res & DISCARD || i < lim) {
+   rte_pktmbuf_free(m);
+   continue;
+   }
+   /* Only check SPI match for processed IPSec packets */
sa_idx = ip->res[i] & PROTECT_MASK;
if (sa_idx == 0 || !inbound_sa_check(sa, m, sa_idx)) {
rte_pktmbuf_free(m);
@@ -423,11 +424,14 @@ process_pkts_inbound(struct ipsec_ctx *ipsec_ctx,
struct ipsec_traffic *traffic)
 {
struct rte_mbuf *m;
-   uint16_t idx, nb_pkts_in, i;
+   uint16_t idx, nb_pkts_in, i, n_ip4, n_ip6;

nb_pkts_in = ipsec_inbound(ipsec_ctx, traffic->ipsec.pkts,
traffic->ipsec.num, MAX_PKT_BURST);

+   n_ip4 = traffic->ip4.num;
+   n_ip6 = traffic->ip6.num;
+
/* SP/ACL Inbound check ipsec and ip4 */
for (i = 0; i < nb_pkts_in; i++) {
m = traffic->ipsec.pkts[i];
@@ -447,9 +451,11 @@ process_pkts_inbound(struct ipsec_ctx *ipsec_ctx,
rte_pktmbuf_free(m);
}

-   inbound_sp_sa(ipsec_ctx->sp4_ctx, ipsec_ctx->sa_ctx, >ip4);
+   inbound_sp_sa(ipsec_ctx->sp4_ctx, ipsec_ctx->sa_ctx, >ip4,
+   n_ip4);

-   inbound_sp_sa(ipsec_ctx->sp6_ctx, ipsec_ctx->sa_ctx, >ip6);
+   inbound_sp_sa(ipsec_ctx->sp6_ctx, ipsec_ctx->sa_ctx, >ip6,
+   n_ip6);
 }

 static inline void
-- 
2.4.11



[dpdk-dev] Ignoring number of bytes read in eal

2016-07-11 Thread Sergio Gonzalez Monroy
On 11/07/2016 05:48, David Marchand wrote:
> Hello,
>
> On Tue, Jun 28, 2016 at 10:40 AM, Kobylinski, MichalX
>  wrote:
>> CID 13212 - Ignoring number of bytes read:
>> The number of bytes copied into the buffer can be smaller than the requested 
>> number and the buffer can potentially be accessed out of range.
>> In rte_mem_virt2phy: Value returned from a function and indicating the 
>> number of bytes read is ignored.
>>
>> File: /lib/librte_eal/linuxapp/eal/eal_memory.c
>> Line: 187
>>
>> Can I mark this error as "False Positive"?
>>
>> Because return from read function is checked in "if" condition. If return 
>> from read is less than 0 function rte_mem_virt2phy is aborted and return: 
>> log message, RTE_BAD_PHYS_ADDR.
> ?
>
> Coverity is complaining because (in theory) read can return less than
> sizeof(uint64_t).
> This most likely can't happen, but still coverity is right from my pov.
>
> I'd rather fix this than mark this as false positive, Sergio ?

I agree with David, let's fix this.

Sergio


[dpdk-dev] [PATCH 2/2] malloc: no need to zero out memory on zmalloc

2016-07-05 Thread Sergio Gonzalez Monroy
Zeroing out memory on rte_zmalloc_socket is not required anymore since all
allocated memory is already zeroed.

Signed-off-by: Sergio Gonzalez Monroy 
---
 lib/librte_eal/common/rte_malloc.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/lib/librte_eal/common/rte_malloc.c 
b/lib/librte_eal/common/rte_malloc.c
index 47deb00..f4a8835 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -123,11 +123,7 @@ rte_malloc(const char *type, size_t size, unsigned align)
 void *
 rte_zmalloc_socket(const char *type, size_t size, unsigned align, int socket)
 {
-   void *ptr = rte_malloc_socket(type, size, align, socket);
-
-   if (ptr != NULL)
-   memset(ptr, 0, size);
-   return ptr;
+   return rte_malloc_socket(type, size, align, socket);
 }

 /*
-- 
2.4.11



[dpdk-dev] [PATCH 1/2] mem: zero out memory on free

2016-07-05 Thread Sergio Gonzalez Monroy
Since [1] memzones are not guaranteed to be zeroed out.
This could potentially cause issues as applications might have been
relying on the allocated memory being zeroed out.

On init all allocated memory is zeroed by the kernel, so by zeroing out
memory on free, all available dpdk memory is always zeroed.

[1] fafcc11985a2 ("mem: rework memzone to be allocated by malloc")

Signed-off-by: Sergio Gonzalez Monroy 
---
 lib/librte_eal/common/malloc_elem.c | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/lib/librte_eal/common/malloc_elem.c 
b/lib/librte_eal/common/malloc_elem.c
index 27e9925..42568e1 100644
--- a/lib/librte_eal/common/malloc_elem.c
+++ b/lib/librte_eal/common/malloc_elem.c
@@ -275,11 +275,14 @@ malloc_elem_free(struct malloc_elem *elem)
return -1;

rte_spinlock_lock(&(elem->heap->lock));
+   size_t sz = elem->size - sizeof(*elem);
+   uint8_t *ptr = (uint8_t *)[1];
struct malloc_elem *next = RTE_PTR_ADD(elem, elem->size);
if (next->state == ELEM_FREE){
/* remove from free list, join to this one */
elem_free_list_remove(next);
join_elem(elem, next);
+   sz += sizeof(*elem);
}

/* check if previous element is free, if so join with it and return,
@@ -288,15 +291,17 @@ malloc_elem_free(struct malloc_elem *elem)
if (elem->prev != NULL && elem->prev->state == ELEM_FREE) {
elem_free_list_remove(elem->prev);
join_elem(elem->prev, elem);
-   malloc_elem_free_list_insert(elem->prev);
-   }
-   /* otherwise add ourselves to the free list */
-   else {
-   malloc_elem_free_list_insert(elem);
-   elem->pad = 0;
+   sz += sizeof(*elem);
+   ptr -= sizeof(*elem);
+   elem = elem->prev;
}
+   malloc_elem_free_list_insert(elem);
+
/* decrease heap's count of allocated elements */
elem->heap->alloc_count--;
+
+   memset(ptr, 0, sz);
+
rte_spinlock_unlock(&(elem->heap->lock));

return 0;
-- 
2.4.11



[dpdk-dev] [PATCH v4 2/2] mk: fix acl library static linking

2016-07-01 Thread Sergio Gonzalez Monroy
Since below commit, ACL library is outside the scope of --whole-archive
and ACL autotest fails.

  RTE>>acl_autotest
  ACL: allocation of 25166728 bytes on socket 9 for ACL_acl_ctx failed
  ACL: rte_acl_add_rules(acl_ctx): rule #1 is invalid
  Line 1584: SSE classify with zero categories failed!
  Test Failed

This is the result of the linker picking weak over non-weak functions.

Fixes: 95dc3c3cf31c ("mk: reduce scope of whole-archive static linking")

Signed-off-by: Sergio Gonzalez Monroy 
---
v4:
- keep order of acl in _LDLIBS and wrap it with --whole-archive

v3:
- patch version history
- carry Acked-by

v2:
- add comment in makefile

 mk/rte.app.mk | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index bf8bcf9..ea22961 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -76,7 +76,10 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_IP_FRAG)+= -lrte_ip_frag
 _LDLIBS-$(CONFIG_RTE_LIBRTE_METER)  += -lrte_meter
 _LDLIBS-$(CONFIG_RTE_LIBRTE_SCHED)  += -lrte_sched
 _LDLIBS-$(CONFIG_RTE_LIBRTE_LPM)+= -lrte_lpm
+# librte_acl needs --whole-archive because of weak functions
+_LDLIBS-$(CONFIG_RTE_LIBRTE_ACL)+= --whole-archive
 _LDLIBS-$(CONFIG_RTE_LIBRTE_ACL)+= -lrte_acl
+_LDLIBS-$(CONFIG_RTE_LIBRTE_ACL)+= --no-whole-archive
 _LDLIBS-$(CONFIG_RTE_LIBRTE_JOBSTATS)   += -lrte_jobstats
 _LDLIBS-$(CONFIG_RTE_LIBRTE_POWER)  += -lrte_power

-- 
2.4.11



[dpdk-dev] [PATCH v4 1/2] mk: allow duplicate linker flags in libraries list

2016-07-01 Thread Sergio Gonzalez Monroy
Since [1] duplicates in LDLIBS are removed. The side effect is that it
does not distinguish between libraries or linker flags.

This patch allows multiple linker flags in LDLIBS, such as
--whole-archive.

[1] Commit: edf4d331dcdb ("mk: eliminate duplicates from libraries list")

Signed-off-by: Sergio Gonzalez Monroy 
---
 mk/rte.app.mk | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 83314ca..bf8bcf9 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -166,7 +166,8 @@ LDLIBS += $(_LDLIBS-y) $(CPU_LDLIBS) $(EXTRA_LDLIBS)

 # Eliminate duplicates without sorting
 LDLIBS := $(shell echo $(LDLIBS) | \
-   awk '{for (i = 1; i <= NF; i++) { if (!seen[$$i]++) print $$i }}')
+   awk '{for (i = 1; i <= NF; i++) { \
+   if ($$i !~ /^-l.*/ || !seen[$$i]++) print $$i }}')

 ifeq ($(RTE_DEVEL_BUILD)$(CONFIG_RTE_BUILD_SHARED_LIB),yy)
 LDFLAGS += -rpath=$(RTE_SDK_BIN)/lib
-- 
2.4.11



[dpdk-dev] [PATCH v2] mk: fix acl library static linking

2016-07-01 Thread Sergio Gonzalez Monroy
On 01/07/2016 11:05, Thomas Monjalon wrote:
> 2016-07-01 09:05, Sergio Gonzalez Monroy:
>> On 30/06/2016 17:22, Thomas Monjalon wrote:
>>> 2016-06-30 17:14, Sergio Gonzalez Monroy:
>>>> On 30/06/2016 17:10, Thomas Monjalon wrote:
>>>>> 2016-06-30 17:01, Sergio Gonzalez Monroy:
>>>>>> --- a/mk/rte.app.mk
>>>>>> +++ b/mk/rte.app.mk
>>>>>> @@ -76,12 +76,13 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_IP_FRAG)+= 
>>>>>> -lrte_ip_frag
>>>>>> _LDLIBS-$(CONFIG_RTE_LIBRTE_METER)  += -lrte_meter
>>>>>> _LDLIBS-$(CONFIG_RTE_LIBRTE_SCHED)  += -lrte_sched
>>>>>> _LDLIBS-$(CONFIG_RTE_LIBRTE_LPM)+= -lrte_lpm
>>>>>> -_LDLIBS-$(CONFIG_RTE_LIBRTE_ACL)+= -lrte_acl
>>>>>> _LDLIBS-$(CONFIG_RTE_LIBRTE_JOBSTATS)   += -lrte_jobstats
>>>>>> _LDLIBS-$(CONFIG_RTE_LIBRTE_POWER)  += -lrte_power
>>>>>> 
>>>>>> _LDLIBS-y += --whole-archive
>>>>>> 
>>>>>> +# librte_acl needs --whole-archive because of weak functions
>>>>>> +_LDLIBS-$(CONFIG_RTE_LIBRTE_ACL)+= -lrte_acl
>>>>>> _LDLIBS-$(CONFIG_RTE_LIBRTE_TIMER)  += -lrte_timer
>>>>>> _LDLIBS-$(CONFIG_RTE_LIBRTE_HASH)   += -lrte_hash
>>>>>> _LDLIBS-$(CONFIG_RTE_LIBRTE_VHOST)  += -lrte_vhost
>>>>> I was suggesting to keep -lrte_acl at the same place in the group of
>>>>> algorithms libraries, in order to keep an order satisfying this comment:
>>>>> # Order is important: from higher level to lower level
>>>>>
>>>>> But I have not tested if --whole-archive -lrte_acl --no-whole-archive 
>>>>> works.
>>>>>
>>>> Sorry, I missed that.
>>>>
>>>> Why is important being before jobstats and power?
>>> It is not.
>>> But I think we need to have some groups.
>>> And ACL is probably at the same layer level as lpm, sched, etc.
>> I guess I just don't see the groups you are mentioning :)
> I define groups as separated by blank line :)
>
>> How are timer, hash and vhost in the same group?
> It is far from perfect and subject to improvements :)
>
>> Wouldn't hash be in the same group as acl and lpm?
> It makes sense to use hash in drivers (example: enic).

You have not convinced me, but I'm not going to argue more over this.

You would just prefer to do the following, right?

+_LDLIBS-$(CONFIG_RTE_LIBRTE_ACL)+= --whole-archive
_LDLIBS-$(CONFIG_RTE_LIBRTE_ACL)+= -lrte_acl
+_LDLIBS-$(CONFIG_RTE_LIBRTE_ACL)+= --no-whole-archive



[dpdk-dev] weak functions in some drivers

2016-07-01 Thread Sergio Gonzalez Monroy
On 01/07/2016 11:16, Elo, Matias (Nokia - FI/Espoo) wrote:
>> -Original Message-
>> From: Sergio Gonzalez Monroy [mailto:sergio.gonzalez.monroy at intel.com]
>> Sent: Friday, July 01, 2016 1:05 PM
>> To: Elo, Matias (Nokia - FI/Espoo) ;
>> dev at dpdk.org
>> Cc: ferruh.yigit at intel.com; damarion at cisco.com
>> Subject: Re: [dpdk-dev] weak functions in some drivers
>>
>> On 01/07/2016 10:42, Elo, Matias (Nokia - FI/Espoo) wrote:
>>>>>>>> What is not clear to me is motivation to use weak here instead of 
>>>>>>>> simply
>>>>> using >CONFIG_RTE_I40E_INC_VECTOR
>>>>>>>> macro to exclude stubs in i40e_rxtx.c. It will make library smaller and
>> avoid
>>>>> issues like this one
>>>>>>>> which are quite hard to troubleshoot.
>>>>>>> Since this issue seen in fd.io, I didn't investigated more, but I don't
>>>>>>> want to clock your valid question, this is an attempt to resurrect the
>>>>>>> question ...
>>>>>> Hi,
>>>>>>
>>>>>> We are having exactly the same problem. For us the aforementioned
>>>>> workaround doesn't seem to work and vector mode is always disabled with
>> the
>>>>> i40e drivers. If I modify i40e_rxtx.c and exclude the stub functions using
>>>>> CONFIG_RTE_I40E_INC_VECTOR everything works as expected.
>>>>>> We are building DPDK with the CONFIG_RTE_BUILD_COMBINE_LIBS option
>>>>> enabled and link DPDK library to our application.
>>>>>> Any other ideas how this could be fixed?
>>>>>>
>>>>>> Regards,
>>>>>> Matias
>>>>>>
>>>>> So you have tried to link a combined static lib with --whole-archive
>>>>> -ldpdk --no-whole-archive and still get the wrong/weak function 
>>>>> definition?
>>>>>
>>>>> Sergio
>>>> I actually just managed to fix the problem. In our case I had to add
>>>> '-Wl,--whole-archive,-ldpdk,--no-whole-archive'  to the end of AM_LDFLAGS.
>>>>
>>> It turned out that the problem actually wasn't fixed.
>>>
>>> DPDK is built with CONFIG_RTE_BUILD_COMBINE_LIBS=y and
>> EXTRA_CFLAGS="-fPIC"
>>> What we are linking originally:
>>> -l:libdpdk.a
>>>
>>> This works otherwise but vector mode i40e is not enabled.
>>>
>>> When trying:
>>> -Wl,--whole-archive,-l:libdpdk.a,--no-whole-archive
>>>
>>> Linking fails with ' undefined reference' errors to several dpdk functions
>> (rte_eal_init, rte_cpu_get_flag_enabled, rte_eth_stats_get...).
>>> Btw. there seems to be a Stack Overflow question related to this:
>> http://stackoverflow.com/questions/38064021/dpdk-include-libraries-in-dpdk-
>> application-compiled-as-shared-library
>>> -Matias
>> What DPDK version are you using?
> v16.04

Ok. I was asking because there is no CONFIG_RTE_BUILD_COMBINE_LIBS in 16.04.


Could you provide full link/compile command line? I'm not able to 
reproduce the issue so far


[dpdk-dev] weak functions in some drivers

2016-07-01 Thread Sergio Gonzalez Monroy
On 01/07/2016 10:42, Elo, Matias (Nokia - FI/Espoo) wrote:
>> What is not clear to me is motivation to use weak here instead of simply
>>> using >CONFIG_RTE_I40E_INC_VECTOR
>> macro to exclude stubs in i40e_rxtx.c. It will make library smaller and 
>> avoid
>>> issues like this one
>> which are quite hard to troubleshoot.
> Since this issue seen in fd.io, I didn't investigated more, but I don't
> want to clock your valid question, this is an attempt to resurrect the
> question ...
 Hi,

 We are having exactly the same problem. For us the aforementioned
>>> workaround doesn't seem to work and vector mode is always disabled with the
>>> i40e drivers. If I modify i40e_rxtx.c and exclude the stub functions using
>>> CONFIG_RTE_I40E_INC_VECTOR everything works as expected.
 We are building DPDK with the CONFIG_RTE_BUILD_COMBINE_LIBS option
>>> enabled and link DPDK library to our application.
 Any other ideas how this could be fixed?

 Regards,
 Matias

>>> So you have tried to link a combined static lib with --whole-archive
>>> -ldpdk --no-whole-archive and still get the wrong/weak function definition?
>>>
>>> Sergio
>> I actually just managed to fix the problem. In our case I had to add
>> '-Wl,--whole-archive,-ldpdk,--no-whole-archive'  to the end of AM_LDFLAGS.
>>
> It turned out that the problem actually wasn't fixed.
>
> DPDK is built with CONFIG_RTE_BUILD_COMBINE_LIBS=y and EXTRA_CFLAGS="-fPIC"
>
> What we are linking originally:
>   -l:libdpdk.a
>
> This works otherwise but vector mode i40e is not enabled.
>
> When trying:
>   -Wl,--whole-archive,-l:libdpdk.a,--no-whole-archive
>
> Linking fails with ' undefined reference' errors to several dpdk functions 
> (rte_eal_init, rte_cpu_get_flag_enabled, rte_eth_stats_get...).
>
> Btw. there seems to be a Stack Overflow question related to this: 
> http://stackoverflow.com/questions/38064021/dpdk-include-libraries-in-dpdk-application-compiled-as-shared-library
>
> -Matias

What DPDK version are you using?


[dpdk-dev] [PATCH v2] mk: fix acl library static linking

2016-07-01 Thread Sergio Gonzalez Monroy
On 30/06/2016 17:22, Thomas Monjalon wrote:
> 2016-06-30 17:14, Sergio Gonzalez Monroy:
>> On 30/06/2016 17:10, Thomas Monjalon wrote:
>>> 2016-06-30 17:01, Sergio Gonzalez Monroy:
>>>> --- a/mk/rte.app.mk
>>>> +++ b/mk/rte.app.mk
>>>> @@ -76,12 +76,13 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_IP_FRAG)+= 
>>>> -lrte_ip_frag
>>>>_LDLIBS-$(CONFIG_RTE_LIBRTE_METER)  += -lrte_meter
>>>>_LDLIBS-$(CONFIG_RTE_LIBRTE_SCHED)  += -lrte_sched
>>>>_LDLIBS-$(CONFIG_RTE_LIBRTE_LPM)+= -lrte_lpm
>>>> -_LDLIBS-$(CONFIG_RTE_LIBRTE_ACL)+= -lrte_acl
>>>>_LDLIBS-$(CONFIG_RTE_LIBRTE_JOBSTATS)   += -lrte_jobstats
>>>>_LDLIBS-$(CONFIG_RTE_LIBRTE_POWER)  += -lrte_power
>>>>
>>>>_LDLIBS-y += --whole-archive
>>>>
>>>> +# librte_acl needs --whole-archive because of weak functions
>>>> +_LDLIBS-$(CONFIG_RTE_LIBRTE_ACL)+= -lrte_acl
>>>>_LDLIBS-$(CONFIG_RTE_LIBRTE_TIMER)  += -lrte_timer
>>>>_LDLIBS-$(CONFIG_RTE_LIBRTE_HASH)   += -lrte_hash
>>>>_LDLIBS-$(CONFIG_RTE_LIBRTE_VHOST)  += -lrte_vhost
>>> I was suggesting to keep -lrte_acl at the same place in the group of
>>> algorithms libraries, in order to keep an order satisfying this comment:
>>> # Order is important: from higher level to lower level
>>>
>>> But I have not tested if --whole-archive -lrte_acl --no-whole-archive works.
>>>
>> Sorry, I missed that.
>>
>> Why is important being before jobstats and power?
> It is not.
> But I think we need to have some groups.
> And ACL is probably at the same layer level as lpm, sched, etc.

I guess I just don't see the groups you are mentioning :)

How are timer, hash and vhost in the same group?
Wouldn't hash be in the same group as acl and lpm?


[dpdk-dev] [PATCH v2] mk: fix acl library static linking

2016-06-30 Thread Sergio Gonzalez Monroy
On 30/06/2016 17:10, Thomas Monjalon wrote:
> 2016-06-30 17:01, Sergio Gonzalez Monroy:
>> --- a/mk/rte.app.mk
>> +++ b/mk/rte.app.mk
>> @@ -76,12 +76,13 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_IP_FRAG)+= 
>> -lrte_ip_frag
>>   _LDLIBS-$(CONFIG_RTE_LIBRTE_METER)  += -lrte_meter
>>   _LDLIBS-$(CONFIG_RTE_LIBRTE_SCHED)  += -lrte_sched
>>   _LDLIBS-$(CONFIG_RTE_LIBRTE_LPM)+= -lrte_lpm
>> -_LDLIBS-$(CONFIG_RTE_LIBRTE_ACL)+= -lrte_acl
>>   _LDLIBS-$(CONFIG_RTE_LIBRTE_JOBSTATS)   += -lrte_jobstats
>>   _LDLIBS-$(CONFIG_RTE_LIBRTE_POWER)  += -lrte_power
>>   
>>   _LDLIBS-y += --whole-archive
>>   
>> +# librte_acl needs --whole-archive because of weak functions
>> +_LDLIBS-$(CONFIG_RTE_LIBRTE_ACL)+= -lrte_acl
>>   _LDLIBS-$(CONFIG_RTE_LIBRTE_TIMER)  += -lrte_timer
>>   _LDLIBS-$(CONFIG_RTE_LIBRTE_HASH)   += -lrte_hash
>>   _LDLIBS-$(CONFIG_RTE_LIBRTE_VHOST)  += -lrte_vhost
> I was suggesting to keep -lrte_acl at the same place in the group of
> algorithms libraries, in order to keep an order satisfying this comment:
> # Order is important: from higher level to lower level
>
> But I have not tested if --whole-archive -lrte_acl --no-whole-archive works.
>

Sorry, I missed that.

Why is important being before jobstats and power?


[dpdk-dev] [PATCH v3] mk: fix acl library static linking

2016-06-30 Thread Sergio Gonzalez Monroy
Since below commit, ACL library is outside the scope of --whole-archive
and ACL autotest fails.

  RTE>>acl_autotest
  ACL: allocation of 25166728 bytes on socket 9 for ACL_acl_ctx failed
  ACL: rte_acl_add_rules(acl_ctx): rule #1 is invalid
  Line 1584: SSE classify with zero categories failed!
  Test Failed

This is the result of the linker picking weak over non-weak functions.

Fixes: 95dc3c3cf31c ("mk: reduce scope of whole-archive static linking")

Signed-off-by: Sergio Gonzalez Monroy 
Acked-by: Konstantin Ananyev 
---
v3:
- patch version history
- carry Acked-by

v2:
- add comment in makefile

 mk/rte.app.mk | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 83314ca..48d5e73 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -76,12 +76,13 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_IP_FRAG)+= -lrte_ip_frag
 _LDLIBS-$(CONFIG_RTE_LIBRTE_METER)  += -lrte_meter
 _LDLIBS-$(CONFIG_RTE_LIBRTE_SCHED)  += -lrte_sched
 _LDLIBS-$(CONFIG_RTE_LIBRTE_LPM)+= -lrte_lpm
-_LDLIBS-$(CONFIG_RTE_LIBRTE_ACL)+= -lrte_acl
 _LDLIBS-$(CONFIG_RTE_LIBRTE_JOBSTATS)   += -lrte_jobstats
 _LDLIBS-$(CONFIG_RTE_LIBRTE_POWER)  += -lrte_power

 _LDLIBS-y += --whole-archive

+# librte_acl needs --whole-archive because of weak functions
+_LDLIBS-$(CONFIG_RTE_LIBRTE_ACL)+= -lrte_acl
 _LDLIBS-$(CONFIG_RTE_LIBRTE_TIMER)  += -lrte_timer
 _LDLIBS-$(CONFIG_RTE_LIBRTE_HASH)   += -lrte_hash
 _LDLIBS-$(CONFIG_RTE_LIBRTE_VHOST)  += -lrte_vhost
-- 
2.4.11



[dpdk-dev] [PATCH v2] mk: fix acl library static linking

2016-06-30 Thread Sergio Gonzalez Monroy
Since below commit, ACL library is outside the scope of --whole-archive
and ACL autotest fails.

  RTE>>acl_autotest
  ACL: allocation of 25166728 bytes on socket 9 for ACL_acl_ctx failed
  ACL: rte_acl_add_rules(acl_ctx): rule #1 is invalid
  Line 1584: SSE classify with zero categories failed!
  Test Failed

This is the result of the linker picking weak over non-weak functions.

Fixes: 95dc3c3cf31c ("mk: reduce scope of whole-archive static linking")

Signed-off-by: Sergio Gonzalez Monroy 
---
 mk/rte.app.mk | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 83314ca..48d5e73 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -76,12 +76,13 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_IP_FRAG)+= -lrte_ip_frag
 _LDLIBS-$(CONFIG_RTE_LIBRTE_METER)  += -lrte_meter
 _LDLIBS-$(CONFIG_RTE_LIBRTE_SCHED)  += -lrte_sched
 _LDLIBS-$(CONFIG_RTE_LIBRTE_LPM)+= -lrte_lpm
-_LDLIBS-$(CONFIG_RTE_LIBRTE_ACL)+= -lrte_acl
 _LDLIBS-$(CONFIG_RTE_LIBRTE_JOBSTATS)   += -lrte_jobstats
 _LDLIBS-$(CONFIG_RTE_LIBRTE_POWER)  += -lrte_power

 _LDLIBS-y += --whole-archive

+# librte_acl needs --whole-archive because of weak functions
+_LDLIBS-$(CONFIG_RTE_LIBRTE_ACL)+= -lrte_acl
 _LDLIBS-$(CONFIG_RTE_LIBRTE_TIMER)  += -lrte_timer
 _LDLIBS-$(CONFIG_RTE_LIBRTE_HASH)   += -lrte_hash
 _LDLIBS-$(CONFIG_RTE_LIBRTE_VHOST)  += -lrte_vhost
-- 
2.4.11



[dpdk-dev] [PATCH] mk: fix acl library static linking

2016-06-30 Thread Sergio Gonzalez Monroy
On 30/06/2016 16:28, Thomas Monjalon wrote:
> 2016-06-30 15:02, Sergio Gonzalez Monroy:
>> On 30/06/2016 13:44, Thomas Monjalon wrote:
>>> 2016-06-30 13:04, Sergio Gonzalez Monroy:
>>>> On 30/06/2016 12:38, Thomas Monjalon wrote:
>>>>> Does it need to be commented in rte.app.mk?
>>>>> The other libs are in whole-archive to support dlopen of drivers.
>>>>> But the problem here is not because of a driver use.
>>>> There seem to be a bunch of libraries under --whole-archive scope that
>>>> are not
>>>> PMDs, ie. cfgfile, cmdline...
>>>>
>>>> What is the criteria?
>>> The criteria is a bit vague. We must try to include only libs which can
>>> be used by a driver.
>>> cmdline should probably not be there.
>>> Does it make sense to use cfgfile in a driver? maybe yes.
>> So as it is, ACL autotest is broken when building static libs
>> (non-combined).
> I think the --whole-archive option must be set specifically for ACL
> with a comment explaining it is required because of weak functions:
>
> # librte_acl needs --whole-archive because of weak functions
> _LDLIBS-$(CONFIG_RTE_LIBRTE_ACL) += --whole-archive -lrte_acl 
> --no-whole-archive

Will do.

>> For combined libs we usually wrap libdpdk.a with --whole-archive, thus it is
>> not an issue.
>>
>> Just thinking a bit more about the 'dlopen of drivers' case you
>> mentioned before,
>> shouldn't the driver have proper dependencies and therefore need shared
>> DPDK libraries?
> It is possible to build a .so, without any DT_NEEDED entries, which will
> find the required symbols in the static linked binary.

Of course! All DPDK libraries were like that until recently.
That doesn't mean it was right though.

>> What does happen if binary/app and driver are built against different
>> library versions?
> Bad things :)
>
>> Where does it say that we do support this use case?
> It is maybe not written. But I know it is used by people wanting to load
> some PMD.so on demand while having the rest statically compiled.
> I agree it needs to be documented and probably better managed and tested.
>

Note that this only applies to apps built with DPDK build system.

In my opinion, I don't think we should be supporting such case.
But if we were to, we are probably just better of whole-archiving all 
libraries into the
application. For example, what if there was a driver wanting to use ACL 
or any
other DPDK lib not currently in the set of libs we "consider" should be 
use by drivers?

Also, from what I have seen in the list, most folks do end up using 
combined lib and
wrapping it with --whole-archive.

Sergio


[dpdk-dev] [PATCH] mk: fix acl library static linking

2016-06-30 Thread Sergio Gonzalez Monroy
On 30/06/2016 13:44, Thomas Monjalon wrote:
> 2016-06-30 13:04, Sergio Gonzalez Monroy:
>> On 30/06/2016 12:38, Thomas Monjalon wrote:
>>> Does it need to be commented in rte.app.mk?
>>> The other libs are in whole-archive to support dlopen of drivers.
>>> But the problem here is not because of a driver use.
>> There seem to be a bunch of libraries under --whole-archive scope that
>> are not
>> PMDs, ie. cfgfile, cmdline...
>>
>> What is the criteria?
> The criteria is a bit vague. We must try to include only libs which can
> be used by a driver.
> cmdline should probably not be there.
> Does it make sense to use cfgfile in a driver? maybe yes.

So as it is, ACL autotest is broken when building static libs 
(non-combined).
For combined libs we usually wrap libdpdk.a with --whole-archive, thus it is
not an issue.

Just thinking a bit more about the 'dlopen of drivers' case you 
mentioned before,
shouldn't the driver have proper dependencies and therefore need shared 
DPDK libraries?
What does happen if binary/app and driver are built against different 
library versions?
Where does it say that we do support this use case?

Sergio




[dpdk-dev] [PATCH v3] eal/linuxapp: fix resource leak

2016-06-30 Thread Sergio Gonzalez Monroy
On 22/06/2016 17:50, Daniel Mrzyglod wrote:
> This patch fix all cases to do proper handle all munmap if pointer
> of hugepage is not NULL which prohibits resource leak.

I would reword the above commit to something like:

Current code does not munmap 'hugepage' mapping (hugepage info file) on 
function
exit, leaking resources.

> Coverity issue: 97920
> Fixes: b6a468ad41d5 ("memory: add --socket-mem option")
>
> Signed-off-by: Daniel Mrzyglod 
> ---
>   lib/librte_eal/linuxapp/eal/eal_memory.c | 6 --
>   1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c 
> b/lib/librte_eal/linuxapp/eal/eal_memory.c
> index 9251a5b..9b0d39a 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
> @@ -1051,7 +1051,7 @@ int
>   rte_eal_hugepage_init(void)
>   {
>   struct rte_mem_config *mcfg;
> - struct hugepage_file *hugepage, *tmp_hp = NULL;
> + struct hugepage_file *hugepage = NULL, *tmp_hp = NULL;
>   struct hugepage_info used_hp[MAX_HUGEPAGE_SIZES];
>   
>   uint64_t memory[RTE_MAX_NUMA_NODES];
> @@ -1367,13 +1367,15 @@ rte_eal_hugepage_init(void)
>   "of memory.\n",
>   i, nr_hugefiles, RTE_STR(CONFIG_RTE_MAX_MEMSEG),
>   RTE_MAX_MEMSEG);
> - return -ENOMEM;
> + goto fail;
>   }
>   

I missed this but we should also munmap 'hugepage' on successful 
function exit.

Sergio

>   return 0;
>   
>   fail:
>   free(tmp_hp);
> + if (hugepage != NULL)
> + munmap(hugepage, nr_hugefiles * sizeof(struct hugepage_file));
>   return -1;
>   }
>   




[dpdk-dev] [PATCH] mk: fix acl library static linking

2016-06-30 Thread Sergio Gonzalez Monroy
On 30/06/2016 12:38, Thomas Monjalon wrote:
> 2016-06-30 12:10, Sergio Gonzalez Monroy:
>> Since below commit, ACL library is outside the scope of --whole-archive
>> and ACL autotest fails.
>>
>>RTE>>acl_autotest
>>ACL: allocation of 25166728 bytes on socket 9 for ACL_acl_ctx failed
>>ACL: rte_acl_add_rules(acl_ctx): rule #1 is invalid
>>Line 1584: SSE classify with zero categories failed!
>>Test Failed
>>
>> This is the result of the linker picking weak over non-weak functions.
>>
>> Fixes: 95dc3c3cf31c ("mk: reduce scope of whole-archive static linking")
>>
>> Signed-off-by: Sergio Gonzalez Monroy 
> Please could you detail which symbol is missing?

It is not missing a symbol, it is picking weak over non-weak symbol.
It just happens that the only other using weak symbols are PMDs and
they are under the scope of --whole-archive already.

> Does it need to be commented in rte.app.mk?
> The other libs are in whole-archive to support dlopen of drivers.
> But the problem here is not because of a driver use.

There seem to be a bunch of libraries under --whole-archive scope that 
are not
PMDs, ie. cfgfile, cmdline...

What is the criteria?



[dpdk-dev] [PATCH] mk: fix acl library static linking

2016-06-30 Thread Sergio Gonzalez Monroy
Since below commit, ACL library is outside the scope of --whole-archive
and ACL autotest fails.

  RTE>>acl_autotest
  ACL: allocation of 25166728 bytes on socket 9 for ACL_acl_ctx failed
  ACL: rte_acl_add_rules(acl_ctx): rule #1 is invalid
  Line 1584: SSE classify with zero categories failed!
  Test Failed

This is the result of the linker picking weak over non-weak functions.

Fixes: 95dc3c3cf31c ("mk: reduce scope of whole-archive static linking")

Signed-off-by: Sergio Gonzalez Monroy 
---
 mk/rte.app.mk | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 83314ca..7f89fd4 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -76,12 +76,12 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_IP_FRAG)+= -lrte_ip_frag
 _LDLIBS-$(CONFIG_RTE_LIBRTE_METER)  += -lrte_meter
 _LDLIBS-$(CONFIG_RTE_LIBRTE_SCHED)  += -lrte_sched
 _LDLIBS-$(CONFIG_RTE_LIBRTE_LPM)+= -lrte_lpm
-_LDLIBS-$(CONFIG_RTE_LIBRTE_ACL)+= -lrte_acl
 _LDLIBS-$(CONFIG_RTE_LIBRTE_JOBSTATS)   += -lrte_jobstats
 _LDLIBS-$(CONFIG_RTE_LIBRTE_POWER)  += -lrte_power

 _LDLIBS-y += --whole-archive

+_LDLIBS-$(CONFIG_RTE_LIBRTE_ACL)+= -lrte_acl
 _LDLIBS-$(CONFIG_RTE_LIBRTE_TIMER)  += -lrte_timer
 _LDLIBS-$(CONFIG_RTE_LIBRTE_HASH)   += -lrte_hash
 _LDLIBS-$(CONFIG_RTE_LIBRTE_VHOST)  += -lrte_vhost
-- 
2.4.11



[dpdk-dev] weak functions in some drivers

2016-06-30 Thread Sergio Gonzalez Monroy
On 29/06/2016 14:26, Elo, Matias (Nokia - FI/Espoo) wrote:
>>> What is not clear to me is motivation to use weak here instead of simply 
>>> using >CONFIG_RTE_I40E_INC_VECTOR
>>> macro to exclude stubs in i40e_rxtx.c. It will make library smaller and 
>>> avoid issues like this one
>>> which are quite hard to troubleshoot.
>> Since this issue seen in fd.io, I didn't investigated more, but I don't
>> want to clock your valid question, this is an attempt to resurrect the
>> question ...
> Hi,
>
> We are having exactly the same problem. For us the aforementioned workaround 
> doesn't seem to work and vector mode is always disabled with the i40e 
> drivers. If I modify i40e_rxtx.c and exclude the stub functions using 
> CONFIG_RTE_I40E_INC_VECTOR everything works as expected.
>
> We are building DPDK with the CONFIG_RTE_BUILD_COMBINE_LIBS option enabled 
> and link DPDK library to our application.
>
> Any other ideas how this could be fixed?
>
> Regards,
> Matias
>

So you have tried to link a combined static lib with --whole-archive 
-ldpdk --no-whole-archive and still get the wrong/weak function definition?

Sergio


[dpdk-dev] [PATCH v5] eal: out-of-bounds write

2016-06-20 Thread Sergio Gonzalez Monroy
On 20/06/2016 11:09, Thomas Monjalon wrote:
> 2016-06-20 10:38, Sergio Gonzalez Monroy:
>> On 20/06/2016 10:14, Thomas Monjalon wrote:
>>>> +  RTE_LOG(ERR, EAL,
>>>> +  "All memory segments exhausted by IVSHMEM. "
>>> There is no evidence that it is related to IVSHMEM.
>>> "Not enough memory segments." would be more appropriate.
>> Actually we would hit this issue when all memsegs have been used by IVSHMEM.
>> So I think the message is accurate.
> I think it's saner to avoid mixing "potential root cause of a use case" and
> "accurate description of the error".
> One day, the root cause could be different and the message will become wrong.
> Here there is not enough memory segment.
>

Right.
So the whole point of doing the check before the loop was to display
the error message with its specific cause.

I would think that if the code changes and the message is not accurate then
it should also be updated.

So if folks prefer a more generic error message probably we don't need 
the check
before the loop and just change the check condition inside the loop that 
would
end up printing the generic error message (after the loop).

Basically v1 would do that.
http://dpdk.org/dev/patchwork/patch/12241/

Sergio



[dpdk-dev] [PATCH v5] eal: out-of-bounds write

2016-06-20 Thread Sergio Gonzalez Monroy
On 20/06/2016 10:14, Thomas Monjalon wrote:
>> +RTE_LOG(ERR, EAL,
>> +"All memory segments exhausted by IVSHMEM. "
> There is no evidence that it is related to IVSHMEM.
> "Not enough memory segments." would be more appropriate.

Actually we would hit this issue when all memsegs have been used by IVSHMEM.
So I think the message is accurate.

Am I missing something?

Sergio



[dpdk-dev] [PATCH v4 1/1] eal: fix resource leak of mapped memory

2016-06-16 Thread Sergio Gonzalez Monroy
On 15/06/2016 13:25, Marcin Kerlin wrote:
> Patch fixes resource leak in rte_eal_hugepage_attach() where mapped files
> were not freed back to the OS in case of failure. Patch uses the behavior
> of Linux munmap: "It is not an error if the indicated range does not
> contain any mapped pages".
>
> v4:
> 1)removed keyword const from pointer and dependent on that casting (void *)
> v3:
> 1)removed redundant casting
> 2)removed update error message
> v2:
> 1)unmapping also previous addresses

The patch version history should be after the triple dash below so it 
won't show up
on git log.

> Coverity issue: 13295, 13296, 13303
> Fixes: af75078fece3 ("first public release")
>
> Signed-off-by: Marcin Kerlin 
> ---

Insert here patch version history.

>   lib/librte_eal/linuxapp/eal/eal_memory.c | 13 ++---
>   1 file changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c 
> b/lib/librte_eal/linuxapp/eal/eal_memory.c
> index 79d1d2d..c935765 100644

Thomas, are you ok to update the commit message? Otherwise, please 
Mercin do v5 with changes and keep my ack.

Acked-by: Sergio Gonzalez Monroy 


[dpdk-dev] [PATCH v4] eal: out-of-bounds write

2016-06-16 Thread Sergio Gonzalez Monroy
On 15/06/2016 14:25, Slawomir Mrozowicz wrote:
> Overrunning array mcfg->memseg of 256 44-byte elements
> at element index 257 using index j.
> Fixed by add condition with message information.
>
> Fixes: af75078fece3 ("first public release")
> Coverity ID 13282
>
> Signed-off-by: Slawomir Mrozowicz 
> ---

Acked-by: Sergio Gonzalez Monroy 


[dpdk-dev] [PATCH v2 1/1] eal: fix resource leak of mapped memory

2016-06-15 Thread Sergio Gonzalez Monroy
Hi Marcin,

On 14/06/2016 16:33, Marcin Kerlin wrote:
> Patch fixes resource leak in rte_eal_hugepage_attach() where mapped files
> were not freed back to the OS in case of failure. Patch uses the behavior
> of Linux munmap: "It is not an error if the indicated range does not
> contain any mapped pages".
>
> Coverity issue: 13295, 13296, 13303
> Fixes: af75078fece3 ("first public release")
>
> Signed-off-by: Marcin Kerlin 
> ---
>   lib/librte_eal/linuxapp/eal/eal_memory.c | 11 +--
>   1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c 
> b/lib/librte_eal/linuxapp/eal/eal_memory.c
> index 79d1d2d..1834fa0 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
> @@ -1417,7 +1417,7 @@ rte_eal_hugepage_attach(void)
>   if (internal_config.xen_dom0_support) {
>   #ifdef RTE_LIBRTE_XEN_DOM0
>   if (rte_xen_dom0_memory_attach() < 0) {
> - RTE_LOG(ERR, EAL,"Failed to attach memory setments of 
> primay "
> + RTE_LOG(ERR, EAL, "Failed to attach memory segments of 
> primary "
>   "process\n");

If you want to fix spelling of error message it probably should go in a 
different patch.

>   return -1;
>   }
> @@ -1481,7 +1481,7 @@ rte_eal_hugepage_attach(void)
>   
>   size = getFileSize(fd_hugepage);
>   hp = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd_hugepage, 0);
> - if (hp == NULL) {
> + if (hp == MAP_FAILED) {
>   RTE_LOG(ERR, EAL, "Could not mmap %s\n", 
> eal_hugepage_info_path());
>   goto error;
>   }
> @@ -1551,6 +1551,13 @@ rte_eal_hugepage_attach(void)
>   return 0;
>   
>   error:
> + s = 0;
> + while (s < RTE_MAX_MEMSEG && mcfg->memseg[s].len > 0) {
> + munmap(mcfg->memseg[s].addr, mcfg->memseg[s].len);
> + s++;
> + }
> + if (hp != NULL && hp != MAP_FAILED)
> + munmap((void *)(uintptr_t)hp, size);

No need for double casting, nor to cast to (void *).

Sergio

>   if (fd_zero >= 0)
>   close(fd_zero);
>   if (fd_hugepage >= 0)




[dpdk-dev] [PATCH] mem: fix possible memzone integer overflow

2016-06-14 Thread Sergio Gonzalez Monroy
It is possible to get an integer overflow if we try to reserve a memzone
with len = 0 (meaning the maximum contiguous space available) and the
maximum available elem size is less than (MALLOC_ELEM_OVERHEAD + align).

Issue reported by Coverity:

   >>> 10. overflow: Subtract operation overflows on operands len and
   >>> 64UL.
   >>> CID 107111 (#1 of 1): Overflowed return value (INTEGER_OVERFLOW)
   >>> 11. overflow_sink: Overflowed or truncated value (or a value
   >>> computed from an overflowed or truncated value)
   >>> len - 64UL - align used as return value.
   122 return len - MALLOC_ELEM_OVERHEAD - align;

Fixes: fafcc11985a2 ("mem: rework memzone to be allocated by malloc")

Signed-off-by: Sergio Gonzalez Monroy 
---
 lib/librte_eal/common/eal_common_memzone.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/eal_common_memzone.c 
b/lib/librte_eal/common/eal_common_memzone.c
index 452679e..5d28341 100644
--- a/lib/librte_eal/common/eal_common_memzone.c
+++ b/lib/librte_eal/common/eal_common_memzone.c
@@ -119,6 +119,9 @@ find_heap_max_free_elem(int *s, unsigned align)
}
}

+   if (len < MALLOC_ELEM_OVERHEAD + align)
+   return 0;
+
return len - MALLOC_ELEM_OVERHEAD - align;
 }

@@ -197,8 +200,13 @@ memzone_reserve_aligned_thread_unsafe(const char *name, 
size_t len,
if (len == 0) {
if (bound != 0)
requested_len = bound;
-   else
+   else {
requested_len = find_heap_max_free_elem(_id, 
align);
+   if (requested_len == 0) {
+   rte_errno = ENOMEM;
+   return NULL;
+   }
+   }
}

if (socket_id == SOCKET_ID_ANY)
-- 
2.4.11



[dpdk-dev] [PATCH v3] eal: out-of-bounds write

2016-06-14 Thread Sergio Gonzalez Monroy
On 14/06/2016 10:02, Slawomir Mrozowicz wrote:
> Overrunning array mcfg->memseg of 256 44-byte elements
> at element index 257 using index j.
> Fixed by add condition with message information.
>
> Fixes: af75078fece3 ("first public release")
> Coverity ID 13282
>
> Signed-off-by: Slawomir Mrozowicz 
> ---
>   lib/librte_eal/linuxapp/eal/eal_memory.c | 19 +--
>   1 file changed, 17 insertions(+), 2 deletions(-)
>
> diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c 
> b/lib/librte_eal/linuxapp/eal/eal_memory.c
> index 5b9132c..6a2daf5 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
> @@ -1301,6 +1301,15 @@ rte_eal_hugepage_init(void)
>   break;
>   }
>   
> + if (j >= RTE_MAX_MEMSEG) {
> + RTE_LOG(ERR, EAL,
> + "Failed: all memsegs used by ivshmem.\n"
> + "Current %d is not enough.\n"
> + "Please either increase the RTE_MAX_MEMSEG\n",
> + RTE_MAX_MEMSEG);
> + return -ENOMEM;
> + }
> +
>   for (i = 0; i < nr_hugefiles; i++) {
>   new_memseg = 0;
>   
> @@ -1333,8 +1342,14 @@ rte_eal_hugepage_init(void)
>   
>   if (new_memseg) {
>   j += 1;
> - if (j == RTE_MAX_MEMSEG)
> - break;
> + if (j >= RTE_MAX_MEMSEG) {
> + RTE_LOG(ERR, EAL,
> + "Failed: all memsegs used by ivshmem.\n"
> + "Current %d is not enough.\n"
> + "Please either increase the 
> RTE_MAX_MEMSEG\n",
> + RTE_MAX_MEMSEG);
> + return -ENOMEM;
> + }
>   

I don't think you need to change anything inside the for loop.
As it is in the patch, the error message is not accurate.
What I tried to say in my previous review was to not touch the loop and 
just do the check before the loop.
There is an error message after the loop which is correct for all cases 
except the case this patch is
addressing by checking before the loop.

Sergio

>   mcfg->memseg[j].phys_addr = hugepage[i].physaddr;
>   mcfg->memseg[j].addr = hugepage[i].final_va;




[dpdk-dev] [PATCH v3 9/9] doc: update ipsec sample guide

2016-06-09 Thread Sergio Gonzalez Monroy
Signed-off-by: Sergio Gonzalez Monroy 
---
 doc/guides/sample_app_ug/img/ipsec_endpoints.svg | 850 +
 doc/guides/sample_app_ug/ipsec_secgw.rst | 910 ++-
 2 files changed, 1400 insertions(+), 360 deletions(-)
 create mode 100644 doc/guides/sample_app_ug/img/ipsec_endpoints.svg

diff --git a/doc/guides/sample_app_ug/img/ipsec_endpoints.svg 
b/doc/guides/sample_app_ug/img/ipsec_endpoints.svg
new file mode 100644
index 000..e4aba4c
--- /dev/null
+++ b/doc/guides/sample_app_ug/img/ipsec_endpoints.svg
@@ -0,0 +1,850 @@
+
+
+
+http://www.openswatchbook.org/uri/2009/osb;
+   xmlns:dc="http://purl.org/dc/elements/1.1/;
+   xmlns:cc="http://creativecommons.org/ns#;
+   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#;
+   xmlns:svg="http://www.w3.org/2000/svg;
+   xmlns="http://www.w3.org/2000/svg;
+   xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd;
+   xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape;
+   width="155.68507mm"
+   height="76.061203mm"
+   viewBox="0 0 551.64003 269.50821"
+   id="svg2"
+   version="1.1"
+   inkscape:version="0.91 r13725"
+   sodipodi:docname="endpoints.svg">
+  
+
+  
+
+
+  
+
+
+  
+
+
+  
+
+
+  
+
+
+  
+
+
+  
+  
+
+
+  
+
+
+  
+
+
+  
+
+
+  
+
+
+  
+
+
+  
+
+
+  
+
+
+  
+
+
+  
+
+
+
+
+
+
+  
+
+
+  
+
+
+  
+
+
+  
+
+
+  
+
+
+  
+
+
+  
+
+
+  
+
+
+  
+
+
+  
+
+
+  
+
+
+  
+
+
+  
+
+  
+  
+  
+
+  
+image/svg+xml
+http://purl.org/dc/dcmitype/StillImage; />
+
+  
+
+  
+  
+
+
+
+
+ep0
+ep1
+traffic gen
+traffic gen
+
+
+
+
+
+2
+3
+2
+3
+0
+1
+
+0
+1
+
+UNPROTECTEDcipher-text
+PROTECTEDclear-text
+
+
+outbound
+inbound
+
+
+
+
+
+
+  
+
diff --git a/doc/guides/sample_app_ug/ipsec_secgw.rst 
b/doc/guides/sample_app_ug/ipsec_secgw.rst
index c11c7e7..66dd326 100644
--- a/doc/guides/sample_app_ug/ipsec_secgw.rst
+++ b/doc/guides/sample_app_ug/ipsec_secgw.rst
@@ -38,165 +38,171 @@ Overview
 

 The application demonstrates the implementation of a Security Gateway
-(not IPsec compliant, see Constraints bellow) using DPDK based on RFC4301,
+(not IPsec compliant, see the Constraints section below) using DPDK based on 
RFC4301,
 RFC4303, RFC3602 and RFC2404.

 Internet Key Exchange (IKE) is not implemented, so only manual setting of
 Security Policies and Security Associations is supported.

 The Security Policies (SP) are implemented as ACL rules, the Security
-Associations (SA) are stored in a table and the Routing is implemented
+Associations (SA) are stored in a table and the routing is implemented
 using LPM.

-The application classify the ports between Protected and Unprotected.
-Thus, traffic received in an Unprotected or Protected port is consider
+The application classifies the ports as *Protected* and *Unprotected*.
+Thus, traffic received on an Unprotected or Protected port is consider
 Inbound or Outbound respectively.

-Path for IPsec Inbound traffic:
+The Path for IPsec Inbound traffic is:

-*  Read packets from the port
+*  Read packets from the port.
 *  Classify packets between IPv4 and ESP.
-*  Inbound SA lookup for ESP packets based on their SPI
-*  Verification/Decryption
-*  Removal of ESP and outer IP header
-*  Inbound SP check using ACL of decrypted packets and any other IPv4 packet
-   we read.
-*  Routing
-*  Write packet to port
-
-Path for IPsec Outbound traffic:
-
-*  Read packets from the port
-*  Outbound SP check using ACL of all IPv4 traffic
-*  Outbound SA lookup for packets that need IPsec protection
-*  Add ESP and outer IP header
-*  Encryption/Digest
-*  Routing
-*  Write packet to port
+*  Perform Inbound SA lookup for ESP packets based on their SPI.
+*  Perform Verification/Decryption.
+*  Remove ESP and outer IP header
+*  Inbound SP check using ACL of decrypted packets and any other IPv4 packets.
+*  Routing.
+*  Write packet to port.
+
+The Path for the IPsec Outbound traffic is:
+
+*  Read packets from the port.
+*  Perform Outbound SP check using ACL of all IPv4 traffic.
+*  Perform Outbound SA lookup for packets that need IPsec protection.
+*  Add ESP and outer IP header.
+*  Perform Encryption/Digest.
+*  Routing.
+*  Write packet to port.
+

 Constraints
 ---
-*  IPv4 traffic
-*  ESP tunnel mode
-*  EAS-CBC, HMAC-SHA1 and NULL
-*  Each SA must be ha

[dpdk-dev] [PATCH v3 8/9] examples/ipsec-secgw: transport mode support

2016-06-09 Thread Sergio Gonzalez Monroy
IPSec transport mode support.

Signed-off-by: Sergio Gonzalez Monroy 
---
 examples/ipsec-secgw/esp.c   | 124 ++-
 examples/ipsec-secgw/ipsec.h |   1 +
 examples/ipsec-secgw/rt.c|  32 +++
 examples/ipsec-secgw/sa.c|  39 ++
 examples/ipsec-secgw/sp4.c   |  40 ++
 examples/ipsec-secgw/sp6.c   |  48 +
 6 files changed, 248 insertions(+), 36 deletions(-)

diff --git a/examples/ipsec-secgw/esp.c b/examples/ipsec-secgw/esp.c
index e1fde36..05caa77 100644
--- a/examples/ipsec-secgw/esp.c
+++ b/examples/ipsec-secgw/esp.c
@@ -42,7 +42,6 @@
 #include 

 #include 
-#include 
 #include 
 #include 
 #include 
@@ -70,21 +69,24 @@ int
 esp_inbound(struct rte_mbuf *m, struct ipsec_sa *sa,
struct rte_crypto_op *cop)
 {
-   int32_t payload_len, ip_hdr_len;
+   struct ip *ip4;
struct rte_crypto_sym_op *sym_cop;
+   int32_t payload_len, ip_hdr_len;

RTE_ASSERT(m != NULL);
RTE_ASSERT(sa != NULL);
RTE_ASSERT(cop != NULL);

-   ip_hdr_len = 0;
-   switch (sa->flags) {
-   case IP4_TUNNEL:
-   ip_hdr_len = sizeof(struct ip);
-   break;
-   case IP6_TUNNEL:
+   ip4 = rte_pktmbuf_mtod(m, struct ip *);
+   if (likely(ip4->ip_v == IPVERSION))
+   ip_hdr_len = ip4->ip_hl * 4;
+   else if (ip4->ip_v == IP6_VERSION)
+   /* XXX No option headers supported */
ip_hdr_len = sizeof(struct ip6_hdr);
-   break;
+   else {
+   RTE_LOG(ERR, IPSEC_ESP, "invalid IP packet type %d\n",
+   ip4->ip_v);
+   return -EINVAL;
}

payload_len = rte_pktmbuf_pkt_len(m) - ip_hdr_len -
@@ -126,6 +128,8 @@ int
 esp_inbound_post(struct rte_mbuf *m, struct ipsec_sa *sa,
struct rte_crypto_op *cop)
 {
+   struct ip *ip4, *ip;
+   struct ip6_hdr *ip6;
uint8_t *nexthdr, *pad_len;
uint8_t *padding;
uint16_t i;
@@ -135,7 +139,7 @@ esp_inbound_post(struct rte_mbuf *m, struct ipsec_sa *sa,
RTE_ASSERT(cop != NULL);

if (cop->status != RTE_CRYPTO_OP_STATUS_SUCCESS) {
-   RTE_LOG(ERR, IPSEC_ESP, "Failed crypto op\n");
+   RTE_LOG(ERR, IPSEC_ESP, "failed crypto op\n");
return -1;
}

@@ -146,7 +150,7 @@ esp_inbound_post(struct rte_mbuf *m, struct ipsec_sa *sa,
padding = pad_len - *pad_len;
for (i = 0; i < *pad_len; i++) {
if (padding[i] != i + 1) {
-   RTE_LOG(ERR, IPSEC_ESP, "invalid pad_len field\n");
+   RTE_LOG(ERR, IPSEC_ESP, "invalid padding\n");
return -EINVAL;
}
}
@@ -157,7 +161,23 @@ esp_inbound_post(struct rte_mbuf *m, struct ipsec_sa *sa,
return -EINVAL;
}

-   ipip_inbound(m, sizeof(struct esp_hdr) + sa->iv_len);
+   if (unlikely(sa->flags == TRANSPORT)) {
+   ip = rte_pktmbuf_mtod(m, struct ip *);
+   ip4 = (struct ip *)rte_pktmbuf_adj(m,
+   sizeof(struct esp_hdr) + sa->iv_len);
+   if (likely(ip->ip_v == IPVERSION)) {
+   memmove(ip4, ip, ip->ip_hl * 4);
+   ip4->ip_p = *nexthdr;
+   ip4->ip_len = htons(rte_pktmbuf_data_len(m));
+   } else {
+   ip6 = (struct ip6_hdr *)ip4;
+   /* XXX No option headers supported */
+   memmove(ip6, ip, sizeof(struct ip6_hdr));
+   ip6->ip6_nxt = *nexthdr;
+   ip6->ip6_plen = htons(rte_pktmbuf_data_len(m));
+   }
+   } else
+   ipip_inbound(m, sizeof(struct esp_hdr) + sa->iv_len);

return 0;
 }
@@ -166,31 +186,57 @@ int
 esp_outbound(struct rte_mbuf *m, struct ipsec_sa *sa,
struct rte_crypto_op *cop)
 {
-   uint16_t pad_payload_len, pad_len, ip_hdr_len;
struct ip *ip4;
struct ip6_hdr *ip6;
-   struct esp_hdr *esp;
-   int32_t i;
-   char *padding;
+   struct esp_hdr *esp = NULL;
+   uint8_t *padding, *new_ip, nlp;
struct rte_crypto_sym_op *sym_cop;
+   int32_t i;
+   uint16_t pad_payload_len, pad_len, ip_hdr_len;

RTE_ASSERT(m != NULL);
RTE_ASSERT(sa != NULL);
RTE_ASSERT(cop != NULL);

-   /* Payload length */
-   pad_payload_len = RTE_ALIGN_CEIL(rte_pktmbuf_pkt_len(m) + 2,
-   sa->block_size);
-   pad_len = pad_payload_len - rte_pktmbuf_pkt_len(m);
-
ip_hdr_len = 0;
-   switch (sa->flags) {
-   case IP4_TUNNEL:
+
+   ip4 = rte_pktmbuf_mtod(m, struct ip *);
+   if (likely(ip4->ip_v == IPVERSION))

[dpdk-dev] [PATCH v3 6/9] examples/ipsec-secgw: consistent config variable names

2016-06-09 Thread Sergio Gonzalez Monroy
Modify the default SP config variables names to be consistent with SA.

The resulting naming convention is that variables with suffixes _out/_in
are the default for ep0 and the reverse for ep1.

Signed-off-by: Sergio Gonzalez Monroy 
---
 examples/ipsec-secgw/sp.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/examples/ipsec-secgw/sp.c b/examples/ipsec-secgw/sp.c
index 4f16730..6aa377d 100644
--- a/examples/ipsec-secgw/sp.c
+++ b/examples/ipsec-secgw/sp.c
@@ -112,7 +112,7 @@ struct rte_acl_field_def ipv4_defs[NUM_FIELDS_IPV4] = {

 RTE_ACL_RULE_DEF(acl4_rules, RTE_DIM(ipv4_defs));

-const struct acl4_rules acl4_rules_in[] = {
+const struct acl4_rules acl4_rules_out[] = {
{
.data = {.userdata = PROTECT(5), .category_mask = 1, .priority = 1},
/* destination IPv4 */
@@ -175,7 +175,7 @@ const struct acl4_rules acl4_rules_in[] = {
}
 };

-const struct acl4_rules acl4_rules_out[] = {
+const struct acl4_rules acl4_rules_in[] = {
{
.data = {.userdata = PROTECT(5), .category_mask = 1, .priority = 1},
/* destination IPv4 */
@@ -343,15 +343,15 @@ sp_init(struct socket_ctx *ctx, int socket_id, unsigned 
ep)
"initialized\n", socket_id);

if (ep == 0) {
-   rules_out = acl4_rules_in;
-   nb_out_rules = RTE_DIM(acl4_rules_in);
-   rules_in = acl4_rules_out;
-   nb_in_rules = RTE_DIM(acl4_rules_out);
-   } else if (ep == 1) {
rules_out = acl4_rules_out;
nb_out_rules = RTE_DIM(acl4_rules_out);
rules_in = acl4_rules_in;
nb_in_rules = RTE_DIM(acl4_rules_in);
+   } else if (ep == 1) {
+   rules_out = acl4_rules_in;
+   nb_out_rules = RTE_DIM(acl4_rules_in);
+   rules_in = acl4_rules_out;
+   nb_in_rules = RTE_DIM(acl4_rules_out);
} else
rte_exit(EXIT_FAILURE, "Invalid EP value %u. "
"Only 0 or 1 supported.\n", ep);
-- 
2.5.5



[dpdk-dev] [PATCH v3 5/9] examples/ipsec-secgw: fix no sa found case

2016-06-09 Thread Sergio Gonzalez Monroy
The application only ASSERTS that an SA is not NULL (only when debugging
is enabled) without properly dealing with the case of not having an SA
for the processed packet.

Behavior should be such as if no SA is found, drop the packet.

Fixes: d299106e8e31 ("examples/ipsec-secgw: add IPsec sample application")

Signed-off-by: Sergio Gonzalez Monroy 
---
 examples/ipsec-secgw/ipsec.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/examples/ipsec-secgw/ipsec.c b/examples/ipsec-secgw/ipsec.c
index 90a9a86..ccc840f 100644
--- a/examples/ipsec-secgw/ipsec.c
+++ b/examples/ipsec-secgw/ipsec.c
@@ -110,6 +110,11 @@ ipsec_enqueue(ipsec_xform_fn xform_func, struct ipsec_ctx 
*ipsec_ctx,
struct ipsec_sa *sa;

for (i = 0; i < nb_pkts; i++) {
+   if (unlikely(sas[i] == NULL)) {
+   rte_pktmbuf_free(pkts[i]);
+   continue;
+   }
+
rte_prefetch0(sas[i]);
rte_prefetch0(pkts[i]);

@@ -117,8 +122,6 @@ ipsec_enqueue(ipsec_xform_fn xform_func, struct ipsec_ctx 
*ipsec_ctx,
sa = sas[i];
priv->sa = sa;

-   RTE_ASSERT(sa != NULL);
-
priv->cop.type = RTE_CRYPTO_OP_TYPE_SYMMETRIC;

rte_prefetch0(>sym_cop);
-- 
2.5.5



[dpdk-dev] [PATCH v3 4/9] examples/ipsec-secgw: rework ipsec execution loop

2016-06-09 Thread Sergio Gonzalez Monroy
Rework implementation moving from function pointers approach, where each
function implements very specific functionality, to a generic function
approach.

Signed-off-by: Sergio Gonzalez Monroy 
---
 examples/ipsec-secgw/esp.c   |   9 +-
 examples/ipsec-secgw/esp.h   |   9 +-
 examples/ipsec-secgw/ipsec.c |  36 --
 examples/ipsec-secgw/ipsec.h |   2 -
 examples/ipsec-secgw/sa.c| 272 ++-
 5 files changed, 145 insertions(+), 183 deletions(-)

diff --git a/examples/ipsec-secgw/esp.c b/examples/ipsec-secgw/esp.c
index 7dce78c..2ca97ad 100644
--- a/examples/ipsec-secgw/esp.c
+++ b/examples/ipsec-secgw/esp.c
@@ -67,9 +67,8 @@ random_iv_u64(uint64_t *buf, uint16_t n)
*((uint32_t *)[i]) = (uint32_t)lrand48();
 }

-/* IPv4 Tunnel */
 int
-esp4_tunnel_inbound_pre_crypto(struct rte_mbuf *m, struct ipsec_sa *sa,
+esp_inbound(struct rte_mbuf *m, struct ipsec_sa *sa,
struct rte_crypto_op *cop)
 {
int32_t payload_len;
@@ -117,7 +116,7 @@ esp4_tunnel_inbound_pre_crypto(struct rte_mbuf *m, struct 
ipsec_sa *sa,
 }

 int
-esp4_tunnel_inbound_post_crypto(struct rte_mbuf *m, struct ipsec_sa *sa,
+esp_inbound_post(struct rte_mbuf *m, struct ipsec_sa *sa,
struct rte_crypto_op *cop)
 {
uint8_t *nexthdr, *pad_len;
@@ -155,7 +154,7 @@ esp4_tunnel_inbound_post_crypto(struct rte_mbuf *m, struct 
ipsec_sa *sa,
 }

 int
-esp4_tunnel_outbound_pre_crypto(struct rte_mbuf *m, struct ipsec_sa *sa,
+esp_outbound(struct rte_mbuf *m, struct ipsec_sa *sa,
struct rte_crypto_op *cop)
 {
uint16_t pad_payload_len, pad_len;
@@ -234,7 +233,7 @@ esp4_tunnel_outbound_pre_crypto(struct rte_mbuf *m, struct 
ipsec_sa *sa,
 }

 int
-esp4_tunnel_outbound_post_crypto(struct rte_mbuf *m __rte_unused,
+esp_outbound_post(struct rte_mbuf *m __rte_unused,
struct ipsec_sa *sa __rte_unused,
struct rte_crypto_op *cop)
 {
diff --git a/examples/ipsec-secgw/esp.h b/examples/ipsec-secgw/esp.h
index 3101882..fa5cc8a 100644
--- a/examples/ipsec-secgw/esp.h
+++ b/examples/ipsec-secgw/esp.h
@@ -46,21 +46,20 @@ struct esp_hdr {
/* Integrity Check Value - ICV */
 };

-/* IPv4 Tunnel */
 int
-esp4_tunnel_inbound_pre_crypto(struct rte_mbuf *m, struct ipsec_sa *sa,
+esp_inbound(struct rte_mbuf *m, struct ipsec_sa *sa,
struct rte_crypto_op *cop);

 int
-esp4_tunnel_inbound_post_crypto(struct rte_mbuf *m, struct ipsec_sa *sa,
+esp_inbound_post(struct rte_mbuf *m, struct ipsec_sa *sa,
struct rte_crypto_op *cop);

 int
-esp4_tunnel_outbound_pre_crypto(struct rte_mbuf *m, struct ipsec_sa *sa,
+esp_outbound(struct rte_mbuf *m, struct ipsec_sa *sa,
struct rte_crypto_op *cop);

 int
-esp4_tunnel_outbound_post_crypto(struct rte_mbuf *m, struct ipsec_sa *sa,
+esp_outbound_post(struct rte_mbuf *m, struct ipsec_sa *sa,
struct rte_crypto_op *cop);

 #endif /* __RTE_IPSEC_XFORM_ESP_H__ */
diff --git a/examples/ipsec-secgw/ipsec.c b/examples/ipsec-secgw/ipsec.c
index 3ffa77a..90a9a86 100644
--- a/examples/ipsec-secgw/ipsec.c
+++ b/examples/ipsec-secgw/ipsec.c
@@ -42,6 +42,7 @@
 #include 

 #include "ipsec.h"
+#include "esp.h"

 static inline int
 create_session(struct ipsec_ctx *ipsec_ctx __rte_unused, struct ipsec_sa *sa)
@@ -99,15 +100,14 @@ enqueue_cop(struct cdev_qp *cqp, struct rte_crypto_op *cop)
}
 }

-static inline uint16_t
-ipsec_processing(struct ipsec_ctx *ipsec_ctx, struct rte_mbuf *pkts[],
-   struct ipsec_sa *sas[], uint16_t nb_pkts, uint16_t max_pkts)
+static inline void
+ipsec_enqueue(ipsec_xform_fn xform_func, struct ipsec_ctx *ipsec_ctx,
+   struct rte_mbuf *pkts[], struct ipsec_sa *sas[],
+   uint16_t nb_pkts)
 {
-   int ret = 0, i, j, nb_cops;
+   int ret = 0, i;
struct ipsec_mbuf_metadata *priv;
-   struct rte_crypto_op *cops[max_pkts];
struct ipsec_sa *sa;
-   struct rte_mbuf *pkt;

for (i = 0; i < nb_pkts; i++) {
rte_prefetch0(sas[i]);
@@ -133,7 +133,7 @@ ipsec_processing(struct ipsec_ctx *ipsec_ctx, struct 
rte_mbuf *pkts[],
rte_crypto_op_attach_sym_session(>cop,
sa->crypto_session);

-   ret = sa->pre_crypto(pkts[i], sa, >cop);
+   ret = xform_func(pkts[i], sa, >cop);
if (unlikely(ret)) {
rte_pktmbuf_free(pkts[i]);
continue;
@@ -142,8 +142,18 @@ ipsec_processing(struct ipsec_ctx *ipsec_ctx, struct 
rte_mbuf *pkts[],
RTE_ASSERT(sa->cdev_id_qp < ipsec_ctx->nb_qps);
enqueue_cop(_ctx->tbl[sa->cdev_id_qp], >cop);
}
+}
+
+static inline int
+ipsec_dequeue(ipsec_xform_fn xform_func, struct ipsec_ctx *ipsec_ctx,
+   struct rte_mbuf *pkts[], uint16_t max_pkts)
+{
+   int nb_pkts = 0, ret = 0,

[dpdk-dev] [PATCH v3 3/9] examples/ipsec-secgw: add build option and cleanup

2016-06-09 Thread Sergio Gonzalez Monroy
Add support for building the application with DEBUG=1.
This option adds the compiler stack protection flag and enables extra
output in the application.

Also remove unnecessary VPATH setup.

Signed-off-by: Sergio Gonzalez Monroy 
---
 examples/ipsec-secgw/Makefile | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/examples/ipsec-secgw/Makefile b/examples/ipsec-secgw/Makefile
index f9b59c2..6780ad5 100644
--- a/examples/ipsec-secgw/Makefile
+++ b/examples/ipsec-secgw/Makefile
@@ -46,8 +46,9 @@ ifeq ($(CONFIG_RTE_TOOLCHAIN_ICC),y)
 CFLAGS_sa.o += -diag-disable=vec
 endif

-
-VPATH += $(SRCDIR)/librte_ipsec
+ifeq ($(DEBUG),1)
+CFLAGS += -DIPSEC_DEBUG -fstack-protector-all
+endif

 #
 # all source are stored in SRCS-y
-- 
2.5.5



[dpdk-dev] [PATCH v3 2/9] examples/ipsec-secgw: fix stack smashing error

2016-06-09 Thread Sergio Gonzalez Monroy
Building the application with -O3 and -fstack-protection (default in
Ubuntu) results in the following error:

*** stack smashing detected ***: ./build/ipsec-secgw terminated

The error is caused by storing an 8B value in a 4B variable.

Fixes: d299106e8e31 ("examples/ipsec-secgw: add IPsec sample application")

Signed-off-by: Sergio Gonzalez Monroy 
---
 examples/ipsec-secgw/ipsec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/examples/ipsec-secgw/ipsec.c b/examples/ipsec-secgw/ipsec.c
index 4566d38..3ffa77a 100644
--- a/examples/ipsec-secgw/ipsec.c
+++ b/examples/ipsec-secgw/ipsec.c
@@ -46,7 +46,7 @@
 static inline int
 create_session(struct ipsec_ctx *ipsec_ctx __rte_unused, struct ipsec_sa *sa)
 {
-   uint32_t cdev_id_qp = 0;
+   unsigned long cdev_id_qp = 0;
int32_t ret;
struct cdev_key key = { 0 };

-- 
2.5.5



[dpdk-dev] [PATCH v3 1/9] examples/ipsec-secgw: fix esp padding check

2016-06-09 Thread Sergio Gonzalez Monroy
Current code fails to correctly check padding sequence for inbound
packets.
Padding sequence starts on 1 but it checks for 0.

Fixes: d299106e8e31 ("examples/ipsec-secgw: add IPsec sample application")

Signed-off-by: Sergio Gonzalez Monroy 
---
 examples/ipsec-secgw/esp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/examples/ipsec-secgw/esp.c b/examples/ipsec-secgw/esp.c
index 0f6b33e..7dce78c 100644
--- a/examples/ipsec-secgw/esp.c
+++ b/examples/ipsec-secgw/esp.c
@@ -139,7 +139,7 @@ esp4_tunnel_inbound_post_crypto(struct rte_mbuf *m, struct 
ipsec_sa *sa,

padding = pad_len - *pad_len;
for (i = 0; i < *pad_len; i++) {
-   if (padding[i] != i) {
+   if (padding[i] != i + 1) {
RTE_LOG(ERR, IPSEC_ESP, "invalid pad_len field\n");
return -EINVAL;
}
-- 
2.5.5



[dpdk-dev] [PATCH v3 0/9] IPSec Enhancements

2016-06-09 Thread Sergio Gonzalez Monroy
Update IPSec sample app with IPv6 and Transport mode support.

The series contains some bug fixes to facilitate patch merge.

v3:
 - change ipip_inbound function to reurn void
 - update some commit message and comments

v2:
 - rebase code
 - doc improvements
 - add missing image file

Sergio Gonzalez Monroy (9):
  examples/ipsec-secgw: fix esp padding check
  examples/ipsec-secgw: fix stack smashing error
  examples/ipsec-secgw: add build option and cleanup
  examples/ipsec-secgw: rework ipsec execution loop
  examples/ipsec-secgw: fix no sa found case
  examples/ipsec-secgw: consistent config variable names
  examples/ipsec-secgw: ipv6 support
  examples/ipsec-secgw: transport mode support
  doc: update ipsec sample guide

 doc/guides/sample_app_ug/img/ipsec_endpoints.svg | 850 +
 doc/guides/sample_app_ug/ipsec_secgw.rst | 910 ++-
 examples/ipsec-secgw/Makefile|   8 +-
 examples/ipsec-secgw/esp.c   | 210 --
 examples/ipsec-secgw/esp.h   |   9 +-
 examples/ipsec-secgw/ipip.h  | 149 +++-
 examples/ipsec-secgw/ipsec-secgw.c   | 332 ++---
 examples/ipsec-secgw/ipsec.c |  50 +-
 examples/ipsec-secgw/ipsec.h |  63 +-
 examples/ipsec-secgw/rt.c| 229 --
 examples/ipsec-secgw/sa.c| 466 +++-
 examples/ipsec-secgw/sp.c| 366 -
 examples/ipsec-secgw/sp4.c   | 447 +++
 examples/ipsec-secgw/sp6.c   | 448 +++
 14 files changed, 3313 insertions(+), 1224 deletions(-)
 create mode 100644 doc/guides/sample_app_ug/img/ipsec_endpoints.svg
 delete mode 100644 examples/ipsec-secgw/sp.c
 create mode 100644 examples/ipsec-secgw/sp4.c
 create mode 100644 examples/ipsec-secgw/sp6.c

-- 
2.5.5



[dpdk-dev] [PATCH v2] mempool: fix local cache initialization

2016-06-09 Thread Sergio Gonzalez Monroy
The mempool local cache was not initialized properly leading to
undefined behavior in cases where the allocated memory was used
previously and left with data.

Fixes: 213af31e0960 ("mempool: reduce structure size if no cache needed")

Signed-off-by: Sergio Gonzalez Monroy 
---
 lib/librte_mempool/rte_mempool.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index b54de43..216514c 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -787,7 +787,7 @@ rte_mempool_create_empty(const char *name, unsigned n, 
unsigned elt_size,

/* init the mempool structure */
mp = mz->addr;
-   memset(mp, 0, sizeof(*mp));
+   memset(mp, 0, MEMPOOL_HEADER_SIZE(mp, cache_size));
ret = snprintf(mp->name, sizeof(mp->name), "%s", name);
if (ret < 0 || ret >= (int)sizeof(mp->name)) {
rte_errno = ENAMETOOLONG;
-- 
2.4.11



[dpdk-dev] [PATCH] mempool: fix local cache initialization

2016-06-09 Thread Sergio Gonzalez Monroy
On 09/06/2016 09:03, Olivier Matz wrote:
> Hi Sergio,
>
> On 06/09/2016 09:57 AM, Sergio Gonzalez Monroy wrote:
>> Hi Olivier,
>>
>> On 08/06/2016 20:14, Olivier Matz wrote:
>>> Hi Sergio,
>>>
>>> Good catch, thanks. The patch looks ok, just few comments
>>> on the commit log:
>>>
>>> On 06/08/2016 05:10 PM, Sergio Gonzalez Monroy wrote:
>>>> The mempool local cache is not being initialize properly leading to
>>> 'initialize' -> 'initialized' ?
>>> and maybe 'is not being' -> 'was not' ?
>>>
>>>> undefined behavior in cases where the allocated memory was used and left
>>>> with data.
>>>>
>>>> Fixes: af75078fece3 ("first public release")
>>> I think it fixes this one instead:
>>>
>>> 213af31e0960 ("mempool: reduce structure size if no cache needed")
>> Fair enough, I thought the issue was there as we never
>> initialized/zeroed the local cache
>> on mempool creation. Usually we would have allocated all mempools on
>> init (or close)
>> and that would be it (initially all memory would be zeroed), but I think
>> you could still
>> manage to reproduce the problem if somehow you where to do something like:
>> rte_malloc(), rte_free(), rte_mempool_create() and the memory was the
>> one we got
>> with malloc and never gets zeroed again.
> Before Keith's commit (213af31e0960), the local cache was initialized
> when doing the memset() because it was included in the mempool
> structure. So I think the problem did not exist before this patch.
> Or did I miss something in your explanation?
>
> Regards,
> Olivier

You are spot on!

I did look at a wrong commit when checking for the old mempool struct.

Cheers,
Sergio


[dpdk-dev] [PATCH] mempool: fix local cache initialization

2016-06-09 Thread Sergio Gonzalez Monroy
Hi Olivier,

On 08/06/2016 20:14, Olivier Matz wrote:
> Hi Sergio,
>
> Good catch, thanks. The patch looks ok, just few comments
> on the commit log:
>
> On 06/08/2016 05:10 PM, Sergio Gonzalez Monroy wrote:
>> The mempool local cache is not being initialize properly leading to
> 'initialize' -> 'initialized' ?
> and maybe 'is not being' -> 'was not' ?
>
>> undefined behavior in cases where the allocated memory was used and left
>> with data.
>>
>> Fixes: af75078fece3 ("first public release")
> I think it fixes this one instead:
>
> 213af31e0960 ("mempool: reduce structure size if no cache needed")

Fair enough, I thought the issue was there as we never 
initialized/zeroed the local cache
on mempool creation. Usually we would have allocated all mempools on 
init (or close)
and that would be it (initially all memory would be zeroed), but I think 
you could still
manage to reproduce the problem if somehow you where to do something like:
rte_malloc(), rte_free(), rte_mempool_create() and the memory was the 
one we got
with malloc and never gets zeroed again.

Sergio



[dpdk-dev] [PATCH] mempool: fix local cache initialization

2016-06-08 Thread Sergio Gonzalez Monroy
The mempool local cache is not being initialize properly leading to
undefined behavior in cases where the allocated memory was used and left
with data.

Fixes: af75078fece3 ("first public release")

Signed-off-by: Sergio Gonzalez Monroy 
---
 lib/librte_mempool/rte_mempool.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index b54de43..216514c 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -787,7 +787,7 @@ rte_mempool_create_empty(const char *name, unsigned n, 
unsigned elt_size,

/* init the mempool structure */
mp = mz->addr;
-   memset(mp, 0, sizeof(*mp));
+   memset(mp, 0, MEMPOOL_HEADER_SIZE(mp, cache_size));
ret = snprintf(mp->name, sizeof(mp->name), "%s", name);
if (ret < 0 || ret >= (int)sizeof(mp->name)) {
rte_errno = ENAMETOOLONG;
-- 
2.4.11



[dpdk-dev] [PATCH v2] eal: out-of-bounds write

2016-06-08 Thread Sergio Gonzalez Monroy

I missed this patch at the time!

On 27/04/2016 12:29, Slawomir Mrozowicz wrote:
> Fix issue reported by Coverity.
>
> Coverity ID 13282: Out-of-bounds write
> overrun-local: Overrunning array mcfg->memseg of 256 44-byte elements
> at element index 257 using index j.
>
> Fixes: af75078fece3 ("first public release")
>
> Signed-off-by: Slawomir Mrozowicz 
> ---
>   lib/librte_eal/linuxapp/eal/eal_memory.c | 5 -
>   1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c 
> b/lib/librte_eal/linuxapp/eal/eal_memory.c
> index 5b9132c..715bd52 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
> @@ -1333,8 +1333,11 @@ rte_eal_hugepage_init(void)
>   
>   if (new_memseg) {
>   j += 1;
> - if (j == RTE_MAX_MEMSEG)
> + if (j >= RTE_MAX_MEMSEG) {
> + RTE_LOG(ERR, EAL,
> + "Failed: memseg reached 
> RTE_MAX_MEMSEG\n");
>   break;
> + }
>   
>   mcfg->memseg[j].phys_addr = hugepage[i].physaddr;
>   mcfg->memseg[j].addr = hugepage[i].final_va;

As Bruce was suggesting in his comment to the v1, it's more helpful to 
do a check before the loop
and print a message distinguishing the error case, something along the 
lines of: "all memsegs
used by ivshmem. Please either increase", returning with -ENOMEM error.

Sergio


[dpdk-dev] [PATCH] eal/linux: fix undefined allocation of 0 bytes (CERT MEM04-C; CWE-131)

2016-06-08 Thread Sergio Gonzalez Monroy
On 27/04/2016 18:06, Daniel Mrzyglod wrote:
> Fix issue reported by clang scan-build
>
> there is a chance that nr_hugepages will be 0 if conditions for loop
> for (i = 0; i < (int) internal_config.num_hugepage_sizes; i++)
> will be unmeet.
>
> Fixes: b6a468ad41d5 ("memory: add --socket-mem option")
>
> Signed-off-by: Daniel Mrzyglod 
> ---
>   lib/librte_eal/linuxapp/eal/eal_memory.c | 2 ++
>   1 file changed, 2 insertions(+)
>
> diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c 
> b/lib/librte_eal/linuxapp/eal/eal_memory.c
> index 5b9132c..e94538e 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
> @@ -1114,6 +1114,8 @@ rte_eal_hugepage_init(void)
>* processing done on these pages, shared memory will be created
>* at a later stage.
>*/
> + if (nr_hugepages == 0)
> + goto fail;
>   tmp_hp = malloc(nr_hugepages * sizeof(struct hugepage_file));
>   if (tmp_hp == NULL)
>   goto fail;

The behavior of malloc(0) is implementation-defined, but on Linux man 
page it says that returns NULL.
So strictly speaking, without the patch the outcome is the same cause 
malloc(0) will return NULL.

Now, I'd consider the patch not needed but it doesn't really harm either.
Anyone else has comments/thoughts about it?

Regarding the patch itself, I think the title and commit message need to 
be modify to reflect that the patch
goal is to handle nr_hugepages = 0 case without relying in malloc to 
return NULL.

Sergio




[dpdk-dev] [PATCH v5] eal: fix allocating all free hugepages

2016-06-08 Thread Sergio Gonzalez Monroy
On 31/05/2016 04:37, Jianfeng Tan wrote:
> EAL memory init allocates all free hugepages of the whole system,
> which seen from sysfs, even when applications do not ask so many.
> When there is a limitation on how many hugepages an application can
> use (such as cgroup.hugetlb), or hugetlbfs is specified with an
> option of size (exceeding the quota of the fs), it just fails to
> start even there are enough hugepages allocated.
>
> To fix above issue, this patch:
>   - Changes the logic to continue memory init to see if hugetlb
> requirement of application can be addressed by already allocated
> hugepages.
>   - To make sure each hugepage is allocated successfully, we add a
> recover mechanism, which relies on a mem access to fault-in
> hugepages, and if it fails with SIGBUS, recover to previously
> saved stack environment with siglongjmp().
>
> For the case of CONFIG_RTE_EAL_SINGLE_FILE_SEGMENTS (enabled by
> default when compiling IVSHMEM target), it's indispensable to
> mapp all free hugepages in the system. Under this case, it fails
> to start when allocating fails.
>
> Test example:
>a. cgcreate -g hugetlb:/test-subgroup
>b. cgset -r hugetlb.1GB.limit_in_bytes=2147483648 test-subgroup
>c. cgexec -g hugetlb:test-subgroup \
>./examples/helloworld/build/helloworld -c 0x2 -n 4
>
> 
> Fixes: af75078fece ("first public release")
>
> Signed-off-by: Jianfeng Tan
> Acked-by: Neil Horman
> ---
> v5:
>   - Make this method as default instead of using an option.
>   - When SIGBUS is triggered in the case of RTE_EAL_SINGLE_FILE_SEGMENTS,
> just return error.
>   - Add prefix "huge_" to newly added function and static variables.
>   - Move the internal_config.memory assignment after the page allocations.
> v4:
>   - Change map_all_hugepages to return unsigned instead of int.
> v3:
>   - Reword commit message to include it fixes the hugetlbfs quota issue.
>   - setjmp -> sigsetjmp.
>   - Fix RTE_LOG complaint from ERR to DEBUG as it does not mean init error
> so far.
>   - Fix the second map_all_hugepages's return value check.
> v2:
>   - Address the compiling error by move setjmp into a wrap method.
>
>   lib/librte_eal/linuxapp/eal/eal.c    |  20 -
>   lib/librte_eal/linuxapp/eal/eal_memory.c | 138 
> ---
>   2 files changed, 125 insertions(+), 33 deletions(-)
>

Acked-by: Sergio Gonzalez Monroy 


[dpdk-dev] [PATCH] examples/ipsec-secgw: wrong spi read from packet

2016-06-07 Thread Sergio Gonzalez Monroy
On 07/06/2016 13:17, Slawomir Mrozowicz wrote:
> In ipsec-secgw wrong SPI number is read from incoming ESP packet.
> The problem exist inside function inbound_sa_lookup().
> The SPI is read from mbuf where the information is stored in big-endian.
> In low-endian environment the value is erroneous.
> Fixed by add conversion rte_be_to_cpu_32().
>
> Fixes: d299106e8e31 ("examples/ipsec-secgw: add IPsec sample application")
>
> Signed-off-by: Slawomir Mrozowicz 
> ---

This is a bug, but I don't think it is the right fix.

Anyway, the code has change with the last patch set [1] and the bug is 
not present anymore.

[1] http://dpdk.org/ml/archives/dev/2016-May/039270.html

Sergio


[dpdk-dev] [PATCH] examples/ipsec-secgw: Calling risky function

2016-06-07 Thread Sergio Gonzalez Monroy
On 07/06/2016 09:58, Slawomir Mrozowicz wrote:
> lrand48 should not be used for security related applications,
> as linear congruential algorithms are too easy to break.
> Used a compliant random number generator /dev/urandom.
>
> Fixes: d299106e8e31 ("examples/ipsec-secgw: add IPsec sample application")
> Coverity ID 124558
>
> Signed-off-by: Slawomir Mrozowicz 
> ---

I understand that lrand48 is not crypto secure, but this fix will kill 
performance.

I already have a solution for this issue to be included in the next 
IPSec patch set
that will also add support for GCM/CTR modes.

Sergio



[dpdk-dev] unable to bind to vfio_pci inside RHEL virtual machine.

2016-05-24 Thread Sergio Gonzalez Monroy
Hi Tao,

On 23/05/2016 14:39, DING, TAO wrote:
> Hello dpdk dev,
>
> Do you know if the vfio_pci can be bind to network interface from within 
> RedHat virtual machine ? I read the doc that igb_uio  should not be used ; it 
> is not stable. (http://people.redhat.com/~pmatilai/dpdk-guide/index.html) 
> however I cannot use vfio_pci driver from inside VM.
>
> Currently I am working on a project that migrating a network package capture 
> application into virtual machines so that it can be hosted on  cloud. My 
> intent is using SR-IOR to ensure the data sending from physical NIC to vNIC  
> in line speed; using DPDK inside VM to read data from vNIC to get good 
> performance because the libpcap does not perform well  inside VM.
>
> Following dpdk instruction, I was able to set up the SR-IOV and bind the 
> vfio_pci to virtual Function on the host. Once the VM starts, the Virtual 
> Functions bind to vfio-pci automatically on the host. . The following is the 
> output from host.
> Option: 22
>
>
> Network devices using DPDK-compatible driver
> 
> :04:10.4 'X540 Ethernet Controller Virtual Function' drv=vfio-pci unused=
> :04:10.6 'X540 Ethernet Controller Virtual Function' drv=vfio-pci unused=
> :04:11.4 'X540 Ethernet Controller Virtual Function' drv=vfio-pci unused=
> :04:11.6 'X540 Ethernet Controller Virtual Function' drv=vfio-pci unused=
>
> Network devices using kernel driver
> ===
> :01:00.0 'NetXtreme BCM5720 Gigabit Ethernet PCIe' if=em1 drv=tg3 
> unused=vfio-pci
> :01:00.1 'NetXtreme BCM5720 Gigabit Ethernet PCIe' if=em2 drv=tg3 
> unused=vfio-pci
> :02:00.0 'NetXtreme BCM5720 Gigabit Ethernet PCIe' if=em3 drv=tg3 
> unused=vfio-pci
> :02:00.1 'NetXtreme BCM5720 Gigabit Ethernet PCIe' if=em4 drv=tg3 
> unused=vfio-pci
> :04:00.0 'Ethernet Controller 10-Gigabit X540-AT2' if=p3p1 drv=ixgbe 
> unused=vfio-pci *Active*
> :04:00.1 'Ethernet Controller 10-Gigabit X540-AT2' if=p3p2 drv=ixgbe 
> unused=vfio-pci
> :04:10.0 'X540 Ethernet Controller Virtual Function' if=p3p1_0 
> drv=ixgbevf unused=vfio-pci
> :04:10.2 'X540 Ethernet Controller Virtual Function' if=p3p1_1 
> drv=ixgbevf unused=vfio-pci
> :04:11.0 'X540 Ethernet Controller Virtual Function' if=p3p1_4 
> drv=ixgbevf unused=vfio-pci
> :04:11.2 'X540 Ethernet Controller Virtual Function' if=p3p1_5 
> drv=ixgbevf unused=vfio-pci
>
> I repeated the same set up within the VM which has 4 Virtual Functions 
> assigned to it, I could not successfully bind any of network devices  to 
> vfio-pci. I followed different suggestion from the web , but no luck.   
> (however I was able to bind UIO driver to the network devices inside the VM)
> One difference I noticed between VM and host is the outcome of IOMMU setting. 
>  On the host, the /sys/kernel/iommu_groups/ is NOT empty , but on the VM , it 
> is empty. I rebooted VM several time. There's not luck.

AFAIK VFIO is not supported in a guest.
https://lists.gnu.org/archive/html/qemu-devel/2015-11/msg04284.html

So you are left with two options, VFIO no-IOMMU or igb_uio, none of them 
safe.
If you have Linux kernel +4.5 and DPDK +16.04, you could use VFIO 
no-IOMMU inside the VM. Otherwise, you are left with igb_uio.
IMHO the main difference is that igb_uio is an out-of-tree kernel module.

Sergio



[dpdk-dev] [PATCH] eal/ppc: fix secondary process to map hugepages in correct order

2016-05-20 Thread Sergio Gonzalez Monroy
On 20/05/2016 09:41, Chao Zhu wrote:
> Sergio,
>
> The step 4 will not fail because each huge page will get an virtual address 
> finally, though it's a different address. If you take a look at the function 
> rte_eal_hugepage_init(), in the last loop, it uses both physical address and 
> virtual address to determine a new memory segment. This step can make sure 
> that the initialization is correct. What I want to say is, this bug also 
> influence the secondary process in function rte_eal_hugepage_attach(). It can 
> make the secondary process fail to init. I'm trying to figure out how to make 
> it work.

You are right, I misread the code.

So basically because mmap ignores the hint to mmap on the requested address,
by default we get VA maps in decreasing address order.

Knowing that, PPC orders pages by decreasing physical address order so when
this happens we actually get hugepages in order in the "new" final_va.

Not sure if that makes sense but I think I understand where you are 
coming from.

I think we need to document this as know issue and/or add comments regarding
this behavior , basically calling out that all this "reverse-ordering" 
is required
because mmap fails to map on the requested VA.

Thanks,
Sergio

> -Original Message-
> From: Sergio Gonzalez Monroy [mailto:sergio.gonzalez.monroy at intel.com]
> Sent: 2016?5?20? 16:01
> To: Chao Zhu ; 'Bruce Richardson' 
> 
> Cc: 'Gowrishankar' ; dev at dpdk.org; 
> 'David Marchand' 
> Subject: Re: [dpdk-dev] [PATCH] eal/ppc: fix secondary process to map 
> hugepages in correct order
>
> On 20/05/2016 04:03, Chao Zhu wrote:
>> Bruce,
>>
>> Recently, we find some bugs with mmap in PowerLinux. The mmap doesn't
>> respect the address hints. In function get_virtual_area() in
>> eal_memory.c, mmap get the free virtual address range as the address
>> hint. However, when mapping the real memory in
>> rte_eal_hugepage_init(), mmap doesn't return the same address as the
>> requested address. When taking a look at the /proc//maps, the
>> requested address range is free for use. With this bug, pre-allocate some 
>> free space doesn't work.
> Hi Chao,
>
> If I understand you correctly, the issue you are describing would cause DPDK 
> to fail initialization even with the reverse reordering that you are doing 
> for PPC.
>
> Basically (just showing relevant initialization steps):
> 1. map_all_hugepages(..., orig=1)
>   - map all hugepages
> 2. find physical address for each hugepage 3. sort by physical address 4. 
> map_all_hugepages(..., orig=0)
>   - Now we try to get big chunk of virtual address for a block of contig 
> hugepages
>  so we know we have that virtual address chunk available.
>   - Then we try to remap each page of that block of contig pages into that
>  virtual address chunk.
>
> So the issue you are describing would make step 4 fail regardless of the 
> different ordering that PPC does.
> I'm probably missing something, would you care to elaborate?
>
> Sergio
>
>
>> We're trying to create some test case and report it as a bug to kernel
>> community.
>>
>> Here's some logs:
>> ===
>> EAL: Ask a virtual area of 0x1000 bytes
>> EAL: Virtual area found at 0x3fffa700 (size = 0x1000)
>> EAL: map_all_hugepages, /mnt/huge/rtemap_52,paddr 0x3ca600
>> requested
>> addr: 0x3fffa700  mmaped addr: 0x3efff000
>> EAL: map_all_hugepages, /mnt/huge/rtemap_53,paddr 0x3ca500
>> requested
>> addr: 0x3fffa800  mmaped addr: 0x3effef00
>> EAL: map_all_hugepages, /mnt/huge/rtemap_54,paddr 0x3ca400
>> requested
>> addr: 0x3fffa900  mmaped addr: 0x3effee00
>> EAL: map_all_hugepages, /mnt/huge/rtemap_55,paddr 0x3ca300
>> requested
>> addr: 0x3fffaa00  mmaped addr: 0x3effed00
>> EAL: map_all_hugepages, /mnt/huge/rtemap_56,paddr 0x3ca200
>> requested
>> addr: 0x3fffab00  mmaped addr: 0x3effec00
>> EAL: map_all_hugepages, /mnt/huge/rtemap_57,paddr 0x3ca100
>> requested
>> addr: 0x3fffac00  mmaped addr: 0x3effeb00
>> EAL: map_all_hugepages, /mnt/huge/rtemap_58,paddr 0x3ca000
>> requested
>> addr: 0x3fffad00  mmaped addr: 0x3effea00
>> EAL: map_all_hugepages, /mnt/huge/rtemap_59,paddr 0x3c9f00
>> requested
>> addr: 0x3fffae00  mmaped addr: 0x3effe900
>> EAL: map_all_hugepages, /mnt/huge/rtemap_60,paddr 0x3c9e00
>> requested
>> addr: 0x3fffaf00  mmaped addr: 0x3effe800
>> EAL: map_all_hugepages, /mnt/huge/rtemap_61,paddr 0x3c9d00
>> requested
>> addr: 0x3fffb0

[dpdk-dev] [PATCH] eal/ppc: fix secondary process to map hugepages in correct order

2016-05-20 Thread Sergio Gonzalez Monroy
fffb7e5-3fffb7e7 r-xp  08:32 7090563
> /opt/at7.1/lib64/power8/libpthread-2.19.so
> 3fffb7e7-3fffb7e8 rw-p 0001 08:32 7090563
> /opt/at7.1/lib64/power8/libpthread-2.19.so
> 3fffb7e8-3fffb7e9 r-xp  08:32 7090210
> /opt/at7.1/lib64/libdl-2.19.so
> 3fffb7e9-3fffb7ea rw-p  08:32 7090210
> /opt/at7.1/lib64/libdl-2.19.so
> 3fffb7ea-3fffb7ec r-xp  08:32 7090533
> /opt/at7.1/lib64/power8/libz.so.1.2.6
> 3fffb7ec-3fffb7ed rw-p 0001 08:32 7090533
> /opt/at7.1/lib64/power8/libz.so.1.2.6
> 3fffb7ed-3fffb7f9 r-xp  08:32 7090568
> /opt/at7.1/lib64/power8/libm-2.19.so
> 3fffb7f9-3fffb7fa rw-p 000b 08:32 7090568
> /opt/at7.1/lib64/power8/libm-2.19.so
> 3fffb7fa-3fffb7fc r-xp  00:00 0
> [vdso]
> 3fffb7fc-3fffb7ff r-xp  08:32 7090048
> /opt/at7.1/lib64/ld-2.19.so
> 3fffb7ff-3fffb800 rw-p 0002 08:32 7090048
> /opt/at7.1/lib64/ld-2.19.so
> 3ffd-4000 rw-p  00:00 0
> [stack]
>
>
> -Original Message-
> From: Bruce Richardson [mailto:bruce.richardson at intel.com]
> Sent: 2016?3?23? 1:11
> To: Sergio Gonzalez Monroy 
> Cc: Gowrishankar ; dev at dpdk.org;
> chaozhu at linux.vnet.ibm.com; David Marchand 
> Subject: Re: [dpdk-dev] [PATCH] eal/ppc: fix secondary process to map
> hugepages in correct order
>
> On Tue, Mar 22, 2016 at 04:35:32PM +, Sergio Gonzalez Monroy wrote:
>> First of all, forgive my ignorance regarding ppc64 and if the
>> questions are naive but after having a look to the already existing
>> code for ppc64 and this patch now, why are we doing this reverse
>> mapping at all?
>>
>> I guess the question revolves around the comment in eal_memory.c:
>> 1316 /* On PPC64 architecture, the mmap always start from
>> higher
>> 1317  * virtual address to lower address. Here, both the
>> physical
>> 1318  * address and virtual address are in descending
> order
>> */
>>
>>  From looking at the code, for ppc64 we do qsort in reverse order and
>> thereafter everything looks to be is done to account for that reverse
>> sorting.
>>
>> CC: Chao Zhu and David Marchand as original author and reviewer of the
> code.
>> Sergio
>>
> Just to add my 2c here. At one point, with I believe some i686 installs -
> don't remember the specific OS/kernel, we found that the mmap calls were
> returning the highest free address first and then working downwards - must
> like seems to be described here. To fix this we changed the mmap code from
> assuming that addresses are mapped upwards, to instead explicitly requesting
> a large free block of memory (mmap of /dev/zero) to find a free address
> space range of the correct size, and then explicitly mmapping each
> individual page to the appropriate place in that free range. With this
> scheme it didn't matter whether the OS tried to mmap the pages from the
> highest or lowest address because we always told the OS where to put the
> page (and we knew the slot was free from the earlier block mmap).
> Would this scheme not also work for PPC in a similar way? (Again, forgive
> unfamiliarity with PPC! :-) )
>
> /Bruce
>
>> On 07/03/2016 14:13, Gowrishankar wrote:
>>> From: Gowri Shankar 
>>>
>>> For a secondary process address space to map hugepages from every
>>> segment of primary process, hugepage_file entries has to be mapped
>>> reversely from the list that primary process updated for every
>>> segment. This is for a reason that, in ppc64, hugepages are sorted for
> decrementing addresses.
>>> Signed-off-by: Gowrishankar 
>>> ---



[dpdk-dev] [PATCH v2 9/9] doc: update ipsec sample guide

2016-05-18 Thread Sergio Gonzalez Monroy
Signed-off-by: Sergio Gonzalez Monroy 
---
 doc/guides/sample_app_ug/img/ipsec_endpoints.svg | 850 +
 doc/guides/sample_app_ug/ipsec_secgw.rst | 910 ++-
 2 files changed, 1400 insertions(+), 360 deletions(-)
 create mode 100644 doc/guides/sample_app_ug/img/ipsec_endpoints.svg

diff --git a/doc/guides/sample_app_ug/img/ipsec_endpoints.svg 
b/doc/guides/sample_app_ug/img/ipsec_endpoints.svg
new file mode 100644
index 000..e4aba4c
--- /dev/null
+++ b/doc/guides/sample_app_ug/img/ipsec_endpoints.svg
@@ -0,0 +1,850 @@
+
+
+
+http://www.openswatchbook.org/uri/2009/osb;
+   xmlns:dc="http://purl.org/dc/elements/1.1/;
+   xmlns:cc="http://creativecommons.org/ns#;
+   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#;
+   xmlns:svg="http://www.w3.org/2000/svg;
+   xmlns="http://www.w3.org/2000/svg;
+   xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd;
+   xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape;
+   width="155.68507mm"
+   height="76.061203mm"
+   viewBox="0 0 551.64003 269.50821"
+   id="svg2"
+   version="1.1"
+   inkscape:version="0.91 r13725"
+   sodipodi:docname="endpoints.svg">
+  
+
+  
+
+
+  
+
+
+  
+
+
+  
+
+
+  
+
+
+  
+
+
+  
+  
+
+
+  
+
+
+  
+
+
+  
+
+
+  
+
+
+  
+
+
+  
+
+
+  
+
+
+  
+
+
+  
+
+
+
+
+
+
+  
+
+
+  
+
+
+  
+
+
+  
+
+
+  
+
+
+  
+
+
+  
+
+
+  
+
+
+  
+
+
+  
+
+
+  
+
+
+  
+
+
+  
+
+  
+  
+  
+
+  
+image/svg+xml
+http://purl.org/dc/dcmitype/StillImage; />
+
+  
+
+  
+  
+
+
+
+
+ep0
+ep1
+traffic gen
+traffic gen
+
+
+
+
+
+2
+3
+2
+3
+0
+1
+
+0
+1
+
+UNPROTECTEDcipher-text
+PROTECTEDclear-text
+
+
+outbound
+inbound
+
+
+
+
+
+
+  
+
diff --git a/doc/guides/sample_app_ug/ipsec_secgw.rst 
b/doc/guides/sample_app_ug/ipsec_secgw.rst
index c11c7e7..66dd326 100644
--- a/doc/guides/sample_app_ug/ipsec_secgw.rst
+++ b/doc/guides/sample_app_ug/ipsec_secgw.rst
@@ -38,165 +38,171 @@ Overview
 

 The application demonstrates the implementation of a Security Gateway
-(not IPsec compliant, see Constraints bellow) using DPDK based on RFC4301,
+(not IPsec compliant, see the Constraints section below) using DPDK based on 
RFC4301,
 RFC4303, RFC3602 and RFC2404.

 Internet Key Exchange (IKE) is not implemented, so only manual setting of
 Security Policies and Security Associations is supported.

 The Security Policies (SP) are implemented as ACL rules, the Security
-Associations (SA) are stored in a table and the Routing is implemented
+Associations (SA) are stored in a table and the routing is implemented
 using LPM.

-The application classify the ports between Protected and Unprotected.
-Thus, traffic received in an Unprotected or Protected port is consider
+The application classifies the ports as *Protected* and *Unprotected*.
+Thus, traffic received on an Unprotected or Protected port is consider
 Inbound or Outbound respectively.

-Path for IPsec Inbound traffic:
+The Path for IPsec Inbound traffic is:

-*  Read packets from the port
+*  Read packets from the port.
 *  Classify packets between IPv4 and ESP.
-*  Inbound SA lookup for ESP packets based on their SPI
-*  Verification/Decryption
-*  Removal of ESP and outer IP header
-*  Inbound SP check using ACL of decrypted packets and any other IPv4 packet
-   we read.
-*  Routing
-*  Write packet to port
-
-Path for IPsec Outbound traffic:
-
-*  Read packets from the port
-*  Outbound SP check using ACL of all IPv4 traffic
-*  Outbound SA lookup for packets that need IPsec protection
-*  Add ESP and outer IP header
-*  Encryption/Digest
-*  Routing
-*  Write packet to port
+*  Perform Inbound SA lookup for ESP packets based on their SPI.
+*  Perform Verification/Decryption.
+*  Remove ESP and outer IP header
+*  Inbound SP check using ACL of decrypted packets and any other IPv4 packets.
+*  Routing.
+*  Write packet to port.
+
+The Path for the IPsec Outbound traffic is:
+
+*  Read packets from the port.
+*  Perform Outbound SP check using ACL of all IPv4 traffic.
+*  Perform Outbound SA lookup for packets that need IPsec protection.
+*  Add ESP and outer IP header.
+*  Perform Encryption/Digest.
+*  Routing.
+*  Write packet to port.
+

 Constraints
 ---
-*  IPv4 traffic
-*  ESP tunnel mode
-*  EAS-CBC, HMAC-SHA1 and NULL
-*  Each SA must be ha

[dpdk-dev] [PATCH v2 8/9] examples/ipsec-secgw: transport mode support

2016-05-18 Thread Sergio Gonzalez Monroy
IPSec transport mode support.

Signed-off-by: Sergio Gonzalez Monroy 
---
 examples/ipsec-secgw/esp.c   | 124 ++-
 examples/ipsec-secgw/ipsec.h |   1 +
 examples/ipsec-secgw/rt.c|  32 +++
 examples/ipsec-secgw/sa.c|  39 ++
 examples/ipsec-secgw/sp4.c   |  40 ++
 examples/ipsec-secgw/sp6.c   |  48 +
 6 files changed, 248 insertions(+), 36 deletions(-)

diff --git a/examples/ipsec-secgw/esp.c b/examples/ipsec-secgw/esp.c
index e1fde36..05caa77 100644
--- a/examples/ipsec-secgw/esp.c
+++ b/examples/ipsec-secgw/esp.c
@@ -42,7 +42,6 @@
 #include 

 #include 
-#include 
 #include 
 #include 
 #include 
@@ -70,21 +69,24 @@ int
 esp_inbound(struct rte_mbuf *m, struct ipsec_sa *sa,
struct rte_crypto_op *cop)
 {
-   int32_t payload_len, ip_hdr_len;
+   struct ip *ip4;
struct rte_crypto_sym_op *sym_cop;
+   int32_t payload_len, ip_hdr_len;

RTE_ASSERT(m != NULL);
RTE_ASSERT(sa != NULL);
RTE_ASSERT(cop != NULL);

-   ip_hdr_len = 0;
-   switch (sa->flags) {
-   case IP4_TUNNEL:
-   ip_hdr_len = sizeof(struct ip);
-   break;
-   case IP6_TUNNEL:
+   ip4 = rte_pktmbuf_mtod(m, struct ip *);
+   if (likely(ip4->ip_v == IPVERSION))
+   ip_hdr_len = ip4->ip_hl * 4;
+   else if (ip4->ip_v == IP6_VERSION)
+   /* XXX No option headers supported */
ip_hdr_len = sizeof(struct ip6_hdr);
-   break;
+   else {
+   RTE_LOG(ERR, IPSEC_ESP, "invalid IP packet type %d\n",
+   ip4->ip_v);
+   return -EINVAL;
}

payload_len = rte_pktmbuf_pkt_len(m) - ip_hdr_len -
@@ -126,6 +128,8 @@ int
 esp_inbound_post(struct rte_mbuf *m, struct ipsec_sa *sa,
struct rte_crypto_op *cop)
 {
+   struct ip *ip4, *ip;
+   struct ip6_hdr *ip6;
uint8_t *nexthdr, *pad_len;
uint8_t *padding;
uint16_t i;
@@ -135,7 +139,7 @@ esp_inbound_post(struct rte_mbuf *m, struct ipsec_sa *sa,
RTE_ASSERT(cop != NULL);

if (cop->status != RTE_CRYPTO_OP_STATUS_SUCCESS) {
-   RTE_LOG(ERR, IPSEC_ESP, "Failed crypto op\n");
+   RTE_LOG(ERR, IPSEC_ESP, "failed crypto op\n");
return -1;
}

@@ -146,7 +150,7 @@ esp_inbound_post(struct rte_mbuf *m, struct ipsec_sa *sa,
padding = pad_len - *pad_len;
for (i = 0; i < *pad_len; i++) {
if (padding[i] != i + 1) {
-   RTE_LOG(ERR, IPSEC_ESP, "invalid pad_len field\n");
+   RTE_LOG(ERR, IPSEC_ESP, "invalid padding\n");
return -EINVAL;
}
}
@@ -157,7 +161,23 @@ esp_inbound_post(struct rte_mbuf *m, struct ipsec_sa *sa,
return -EINVAL;
}

-   ipip_inbound(m, sizeof(struct esp_hdr) + sa->iv_len);
+   if (unlikely(sa->flags == TRANSPORT)) {
+   ip = rte_pktmbuf_mtod(m, struct ip *);
+   ip4 = (struct ip *)rte_pktmbuf_adj(m,
+   sizeof(struct esp_hdr) + sa->iv_len);
+   if (likely(ip->ip_v == IPVERSION)) {
+   memmove(ip4, ip, ip->ip_hl * 4);
+   ip4->ip_p = *nexthdr;
+   ip4->ip_len = htons(rte_pktmbuf_data_len(m));
+   } else {
+   ip6 = (struct ip6_hdr *)ip4;
+   /* XXX No option headers supported */
+   memmove(ip6, ip, sizeof(struct ip6_hdr));
+   ip6->ip6_nxt = *nexthdr;
+   ip6->ip6_plen = htons(rte_pktmbuf_data_len(m));
+   }
+   } else
+   ipip_inbound(m, sizeof(struct esp_hdr) + sa->iv_len);

return 0;
 }
@@ -166,31 +186,57 @@ int
 esp_outbound(struct rte_mbuf *m, struct ipsec_sa *sa,
struct rte_crypto_op *cop)
 {
-   uint16_t pad_payload_len, pad_len, ip_hdr_len;
struct ip *ip4;
struct ip6_hdr *ip6;
-   struct esp_hdr *esp;
-   int32_t i;
-   char *padding;
+   struct esp_hdr *esp = NULL;
+   uint8_t *padding, *new_ip, nlp;
struct rte_crypto_sym_op *sym_cop;
+   int32_t i;
+   uint16_t pad_payload_len, pad_len, ip_hdr_len;

RTE_ASSERT(m != NULL);
RTE_ASSERT(sa != NULL);
RTE_ASSERT(cop != NULL);

-   /* Payload length */
-   pad_payload_len = RTE_ALIGN_CEIL(rte_pktmbuf_pkt_len(m) + 2,
-   sa->block_size);
-   pad_len = pad_payload_len - rte_pktmbuf_pkt_len(m);
-
ip_hdr_len = 0;
-   switch (sa->flags) {
-   case IP4_TUNNEL:
+
+   ip4 = rte_pktmbuf_mtod(m, struct ip *);
+   if (likely(ip4->ip_v == IPVERSION))

[dpdk-dev] [PATCH v2 7/9] examples/ipsec-secgw: ipv6 support

2016-05-18 Thread Sergio Gonzalez Monroy
Support IPSec IPv6 allowing IPv4/IPv6 traffic in IPv4 or IPv6 tunnel.

We need separate Routing (LPM) and SP (ACL) tables for IPv4 and IPv6,
but a common SA table.

Signed-off-by: Sergio Gonzalez Monroy 
---
 examples/ipsec-secgw/Makefile  |   5 +-
 examples/ipsec-secgw/esp.c | 128 +++-
 examples/ipsec-secgw/ipip.h| 145 ++---
 examples/ipsec-secgw/ipsec-secgw.c | 332 --
 examples/ipsec-secgw/ipsec.c   |   9 +-
 examples/ipsec-secgw/ipsec.h   |  60 --
 examples/ipsec-secgw/rt.c  | 201 +-
 examples/ipsec-secgw/sa.c  | 239 ++
 examples/ipsec-secgw/sp.c  | 366 -
 examples/ipsec-secgw/sp4.c | 407 +
 examples/ipsec-secgw/sp6.c | 400 
 11 files changed, 1582 insertions(+), 710 deletions(-)
 delete mode 100644 examples/ipsec-secgw/sp.c
 create mode 100644 examples/ipsec-secgw/sp4.c
 create mode 100644 examples/ipsec-secgw/sp6.c

diff --git a/examples/ipsec-secgw/Makefile b/examples/ipsec-secgw/Makefile
index 6780ad5..06b6db1 100644
--- a/examples/ipsec-secgw/Makefile
+++ b/examples/ipsec-secgw/Makefile
@@ -47,7 +47,7 @@ CFLAGS_sa.o += -diag-disable=vec
 endif

 ifeq ($(DEBUG),1)
-CFLAGS += -DIPSEC_DEBUG -fstack-protector-all
+CFLAGS += -DIPSEC_DEBUG -fstack-protector-all -O0
 endif

 #
@@ -55,7 +55,8 @@ endif
 #
 SRCS-y += ipsec.c
 SRCS-y += esp.c
-SRCS-y += sp.c
+SRCS-y += sp4.c
+SRCS-y += sp6.c
 SRCS-y += sa.c
 SRCS-y += rt.c
 SRCS-y += ipsec-secgw.c
diff --git a/examples/ipsec-secgw/esp.c b/examples/ipsec-secgw/esp.c
index b423080..e1fde36 100644
--- a/examples/ipsec-secgw/esp.c
+++ b/examples/ipsec-secgw/esp.c
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 

@@ -50,13 +51,11 @@
 #include "esp.h"
 #include "ipip.h"

-#define IP_ESP_HDR_SZ (sizeof(struct ip) + sizeof(struct esp_hdr))
-
 static inline void
 random_iv_u64(uint64_t *buf, uint16_t n)
 {
-   unsigned left = n & 0x7;
-   unsigned i;
+   uint32_t left = n & 0x7;
+   uint32_t i;

RTE_ASSERT((n & 0x3) == 0);

@@ -67,20 +66,29 @@ random_iv_u64(uint64_t *buf, uint16_t n)
*((uint32_t *)[i]) = (uint32_t)lrand48();
 }

-/* IPv4 Tunnel */
 int
 esp_inbound(struct rte_mbuf *m, struct ipsec_sa *sa,
struct rte_crypto_op *cop)
 {
-   int32_t payload_len;
+   int32_t payload_len, ip_hdr_len;
struct rte_crypto_sym_op *sym_cop;

RTE_ASSERT(m != NULL);
RTE_ASSERT(sa != NULL);
RTE_ASSERT(cop != NULL);

-   payload_len = rte_pktmbuf_pkt_len(m) - IP_ESP_HDR_SZ - sa->iv_len -
-   sa->digest_len;
+   ip_hdr_len = 0;
+   switch (sa->flags) {
+   case IP4_TUNNEL:
+   ip_hdr_len = sizeof(struct ip);
+   break;
+   case IP6_TUNNEL:
+   ip_hdr_len = sizeof(struct ip6_hdr);
+   break;
+   }
+
+   payload_len = rte_pktmbuf_pkt_len(m) - ip_hdr_len -
+   sizeof(struct esp_hdr) - sa->iv_len - sa->digest_len;

if ((payload_len & (sa->block_size - 1)) || (payload_len <= 0)) {
RTE_LOG(DEBUG, IPSEC_ESP, "payload %d not multiple of %u\n",
@@ -91,21 +99,19 @@ esp_inbound(struct rte_mbuf *m, struct ipsec_sa *sa,
sym_cop = (struct rte_crypto_sym_op *)(cop + 1);

sym_cop->m_src = m;
-   sym_cop->cipher.data.offset = IP_ESP_HDR_SZ + sa->iv_len;
+   sym_cop->cipher.data.offset =  ip_hdr_len + sizeof(struct esp_hdr) +
+   sa->iv_len;
sym_cop->cipher.data.length = payload_len;

sym_cop->cipher.iv.data = rte_pktmbuf_mtod_offset(m, void*,
-   IP_ESP_HDR_SZ);
+ip_hdr_len + sizeof(struct esp_hdr));
sym_cop->cipher.iv.phys_addr = rte_pktmbuf_mtophys_offset(m,
-   IP_ESP_HDR_SZ);
+ip_hdr_len + sizeof(struct esp_hdr));
sym_cop->cipher.iv.length = sa->iv_len;

-   sym_cop->auth.data.offset = sizeof(struct ip);
-   if (sa->auth_algo == RTE_CRYPTO_AUTH_AES_GCM)
-   sym_cop->auth.data.length = sizeof(struct esp_hdr);
-   else
-   sym_cop->auth.data.length = sizeof(struct esp_hdr) +
-   sa->iv_len + payload_len;
+   sym_cop->auth.data.offset = ip_hdr_len;
+   sym_cop->auth.data.length = sizeof(struct esp_hdr) +
+   sa->iv_len + payload_len;

sym_cop->auth.digest.data = rte_pktmbuf_mtod_offset(m, void*,
rte_pktmbuf_pkt_len(m) - sa->digest_len);
@@ -151,17 +157,20 @@ esp_inbound_post(struct rte_mbuf *m, struct ipsec_sa *sa,
return -EINVAL;
}

-   return ip4ip_inbound(m, sizeof(struct esp_hdr)

[dpdk-dev] [PATCH v2 5/9] examples/ipsec-secgw: fix no sa found case

2016-05-18 Thread Sergio Gonzalez Monroy
The application only checks that an SA shoudln't be NULL through ASSERT
(only when debugging is enabled) without properly dealing with the case
of not having an SA for the processed packet.

If no SA is found, drop the packet.

Fixes: d299106e8e31 ("examples/ipsec-secgw: add IPsec sample application")

Signed-off-by: Sergio Gonzalez Monroy 
---
 examples/ipsec-secgw/ipsec.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/examples/ipsec-secgw/ipsec.c b/examples/ipsec-secgw/ipsec.c
index 90a9a86..ccc840f 100644
--- a/examples/ipsec-secgw/ipsec.c
+++ b/examples/ipsec-secgw/ipsec.c
@@ -110,6 +110,11 @@ ipsec_enqueue(ipsec_xform_fn xform_func, struct ipsec_ctx 
*ipsec_ctx,
struct ipsec_sa *sa;

for (i = 0; i < nb_pkts; i++) {
+   if (unlikely(sas[i] == NULL)) {
+   rte_pktmbuf_free(pkts[i]);
+   continue;
+   }
+
rte_prefetch0(sas[i]);
rte_prefetch0(pkts[i]);

@@ -117,8 +122,6 @@ ipsec_enqueue(ipsec_xform_fn xform_func, struct ipsec_ctx 
*ipsec_ctx,
sa = sas[i];
priv->sa = sa;

-   RTE_ASSERT(sa != NULL);
-
priv->cop.type = RTE_CRYPTO_OP_TYPE_SYMMETRIC;

rte_prefetch0(>sym_cop);
-- 
2.5.5



[dpdk-dev] [PATCH v2 4/9] examples/ipsec-secgw: rework ipsec execution loop

2016-05-18 Thread Sergio Gonzalez Monroy
Rework implementation moving from function pointers approach, where each
function implements very specific functionality, to a generic function
approach.

Signed-off-by: Sergio Gonzalez Monroy 
---
 examples/ipsec-secgw/esp.c   |   8 +-
 examples/ipsec-secgw/esp.h   |   9 +-
 examples/ipsec-secgw/ipsec.c |  36 --
 examples/ipsec-secgw/ipsec.h |   2 -
 examples/ipsec-secgw/sa.c| 272 ++-
 5 files changed, 145 insertions(+), 182 deletions(-)

diff --git a/examples/ipsec-secgw/esp.c b/examples/ipsec-secgw/esp.c
index 7dce78c..b423080 100644
--- a/examples/ipsec-secgw/esp.c
+++ b/examples/ipsec-secgw/esp.c
@@ -69,7 +69,7 @@ random_iv_u64(uint64_t *buf, uint16_t n)

 /* IPv4 Tunnel */
 int
-esp4_tunnel_inbound_pre_crypto(struct rte_mbuf *m, struct ipsec_sa *sa,
+esp_inbound(struct rte_mbuf *m, struct ipsec_sa *sa,
struct rte_crypto_op *cop)
 {
int32_t payload_len;
@@ -117,7 +117,7 @@ esp4_tunnel_inbound_pre_crypto(struct rte_mbuf *m, struct 
ipsec_sa *sa,
 }

 int
-esp4_tunnel_inbound_post_crypto(struct rte_mbuf *m, struct ipsec_sa *sa,
+esp_inbound_post(struct rte_mbuf *m, struct ipsec_sa *sa,
struct rte_crypto_op *cop)
 {
uint8_t *nexthdr, *pad_len;
@@ -155,7 +155,7 @@ esp4_tunnel_inbound_post_crypto(struct rte_mbuf *m, struct 
ipsec_sa *sa,
 }

 int
-esp4_tunnel_outbound_pre_crypto(struct rte_mbuf *m, struct ipsec_sa *sa,
+esp_outbound(struct rte_mbuf *m, struct ipsec_sa *sa,
struct rte_crypto_op *cop)
 {
uint16_t pad_payload_len, pad_len;
@@ -234,7 +234,7 @@ esp4_tunnel_outbound_pre_crypto(struct rte_mbuf *m, struct 
ipsec_sa *sa,
 }

 int
-esp4_tunnel_outbound_post_crypto(struct rte_mbuf *m __rte_unused,
+esp_outbound_post(struct rte_mbuf *m __rte_unused,
struct ipsec_sa *sa __rte_unused,
struct rte_crypto_op *cop)
 {
diff --git a/examples/ipsec-secgw/esp.h b/examples/ipsec-secgw/esp.h
index 3101882..fa5cc8a 100644
--- a/examples/ipsec-secgw/esp.h
+++ b/examples/ipsec-secgw/esp.h
@@ -46,21 +46,20 @@ struct esp_hdr {
/* Integrity Check Value - ICV */
 };

-/* IPv4 Tunnel */
 int
-esp4_tunnel_inbound_pre_crypto(struct rte_mbuf *m, struct ipsec_sa *sa,
+esp_inbound(struct rte_mbuf *m, struct ipsec_sa *sa,
struct rte_crypto_op *cop);

 int
-esp4_tunnel_inbound_post_crypto(struct rte_mbuf *m, struct ipsec_sa *sa,
+esp_inbound_post(struct rte_mbuf *m, struct ipsec_sa *sa,
struct rte_crypto_op *cop);

 int
-esp4_tunnel_outbound_pre_crypto(struct rte_mbuf *m, struct ipsec_sa *sa,
+esp_outbound(struct rte_mbuf *m, struct ipsec_sa *sa,
struct rte_crypto_op *cop);

 int
-esp4_tunnel_outbound_post_crypto(struct rte_mbuf *m, struct ipsec_sa *sa,
+esp_outbound_post(struct rte_mbuf *m, struct ipsec_sa *sa,
struct rte_crypto_op *cop);

 #endif /* __RTE_IPSEC_XFORM_ESP_H__ */
diff --git a/examples/ipsec-secgw/ipsec.c b/examples/ipsec-secgw/ipsec.c
index 3ffa77a..90a9a86 100644
--- a/examples/ipsec-secgw/ipsec.c
+++ b/examples/ipsec-secgw/ipsec.c
@@ -42,6 +42,7 @@
 #include 

 #include "ipsec.h"
+#include "esp.h"

 static inline int
 create_session(struct ipsec_ctx *ipsec_ctx __rte_unused, struct ipsec_sa *sa)
@@ -99,15 +100,14 @@ enqueue_cop(struct cdev_qp *cqp, struct rte_crypto_op *cop)
}
 }

-static inline uint16_t
-ipsec_processing(struct ipsec_ctx *ipsec_ctx, struct rte_mbuf *pkts[],
-   struct ipsec_sa *sas[], uint16_t nb_pkts, uint16_t max_pkts)
+static inline void
+ipsec_enqueue(ipsec_xform_fn xform_func, struct ipsec_ctx *ipsec_ctx,
+   struct rte_mbuf *pkts[], struct ipsec_sa *sas[],
+   uint16_t nb_pkts)
 {
-   int ret = 0, i, j, nb_cops;
+   int ret = 0, i;
struct ipsec_mbuf_metadata *priv;
-   struct rte_crypto_op *cops[max_pkts];
struct ipsec_sa *sa;
-   struct rte_mbuf *pkt;

for (i = 0; i < nb_pkts; i++) {
rte_prefetch0(sas[i]);
@@ -133,7 +133,7 @@ ipsec_processing(struct ipsec_ctx *ipsec_ctx, struct 
rte_mbuf *pkts[],
rte_crypto_op_attach_sym_session(>cop,
sa->crypto_session);

-   ret = sa->pre_crypto(pkts[i], sa, >cop);
+   ret = xform_func(pkts[i], sa, >cop);
if (unlikely(ret)) {
rte_pktmbuf_free(pkts[i]);
continue;
@@ -142,8 +142,18 @@ ipsec_processing(struct ipsec_ctx *ipsec_ctx, struct 
rte_mbuf *pkts[],
RTE_ASSERT(sa->cdev_id_qp < ipsec_ctx->nb_qps);
enqueue_cop(_ctx->tbl[sa->cdev_id_qp], >cop);
}
+}
+
+static inline int
+ipsec_dequeue(ipsec_xform_fn xform_func, struct ipsec_ctx *ipsec_ctx,
+   struct rte_mbuf *pkts[], uint16_t max_pkts)
+{
+   int nb_pkts = 0, ret = 0, i, j, nb_cops;
+   struct ipsec_mbuf_metadata *priv;
+   

[dpdk-dev] [PATCH v2 3/9] examples/ipsec-secgw: add build option and cleanup

2016-05-18 Thread Sergio Gonzalez Monroy
Add support for building the application with DEBUG=1.
This option adds the compiler stack protection flag and enables extra
output in the application.

Also remove unnecessary VPATH setup.

Signed-off-by: Sergio Gonzalez Monroy 
---
 examples/ipsec-secgw/Makefile | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/examples/ipsec-secgw/Makefile b/examples/ipsec-secgw/Makefile
index f9b59c2..6780ad5 100644
--- a/examples/ipsec-secgw/Makefile
+++ b/examples/ipsec-secgw/Makefile
@@ -46,8 +46,9 @@ ifeq ($(CONFIG_RTE_TOOLCHAIN_ICC),y)
 CFLAGS_sa.o += -diag-disable=vec
 endif

-
-VPATH += $(SRCDIR)/librte_ipsec
+ifeq ($(DEBUG),1)
+CFLAGS += -DIPSEC_DEBUG -fstack-protector-all
+endif

 #
 # all source are stored in SRCS-y
-- 
2.5.5



[dpdk-dev] [PATCH v2 2/9] examples/ipsec-secgw: fix stack smashing error

2016-05-18 Thread Sergio Gonzalez Monroy
Building the application with -O3 and -fstack-protection (default in
Ubuntu) results in the following error:

*** stack smashing detected ***: ./build/ipsec-secgw terminated

The error is caused by storing an 8B value in a 4B variable.

Fixes: d299106e8e31 ("examples/ipsec-secgw: add IPsec sample application")

Signed-off-by: Sergio Gonzalez Monroy 
---
 examples/ipsec-secgw/ipsec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/examples/ipsec-secgw/ipsec.c b/examples/ipsec-secgw/ipsec.c
index 4566d38..3ffa77a 100644
--- a/examples/ipsec-secgw/ipsec.c
+++ b/examples/ipsec-secgw/ipsec.c
@@ -46,7 +46,7 @@
 static inline int
 create_session(struct ipsec_ctx *ipsec_ctx __rte_unused, struct ipsec_sa *sa)
 {
-   uint32_t cdev_id_qp = 0;
+   unsigned long cdev_id_qp = 0;
int32_t ret;
struct cdev_key key = { 0 };

-- 
2.5.5



[dpdk-dev] [PATCH v2 1/9] examples/ipsec-secgw: fix esp padding check

2016-05-18 Thread Sergio Gonzalez Monroy
Current code fails to correctly check padding sequence for inbound
packets.
Padding sequence starts on 1 but it checks for 0.

Fixes: d299106e8e31 ("examples/ipsec-secgw: add IPsec sample application")

Signed-off-by: Sergio Gonzalez Monroy 
---
 examples/ipsec-secgw/esp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/examples/ipsec-secgw/esp.c b/examples/ipsec-secgw/esp.c
index 0f6b33e..7dce78c 100644
--- a/examples/ipsec-secgw/esp.c
+++ b/examples/ipsec-secgw/esp.c
@@ -139,7 +139,7 @@ esp4_tunnel_inbound_post_crypto(struct rte_mbuf *m, struct 
ipsec_sa *sa,

padding = pad_len - *pad_len;
for (i = 0; i < *pad_len; i++) {
-   if (padding[i] != i) {
+   if (padding[i] != i + 1) {
RTE_LOG(ERR, IPSEC_ESP, "invalid pad_len field\n");
return -EINVAL;
}
-- 
2.5.5



[dpdk-dev] [PATCH v2 0/9] IPSec enhancements

2016-05-18 Thread Sergio Gonzalez Monroy
Update IPSec sample app with IPv6 and Transport mode support.

The series contains some bug fixes to facilitate patch merge.

v2:
 - rebase code
 - doc improvements
 - add missing image file

Sergio Gonzalez Monroy (9):
  examples/ipsec-secgw: fix esp padding check
  examples/ipsec-secgw: fix stack smashing error
  examples/ipsec-secgw: add build option and cleanup
  examples/ipsec-secgw: rework ipsec execution loop
  examples/ipsec-secgw: fix no sa found case
  examples/ipsec-secgw: consistent config variable names
  examples/ipsec-secgw: ipv6 support
  examples/ipsec-secgw: transport mode support
  doc: update ipsec sample guide

 doc/guides/sample_app_ug/img/ipsec_endpoints.svg | 850 +
 doc/guides/sample_app_ug/ipsec_secgw.rst | 910 ++-
 examples/ipsec-secgw/Makefile|   8 +-
 examples/ipsec-secgw/esp.c   | 210 --
 examples/ipsec-secgw/esp.h   |   9 +-
 examples/ipsec-secgw/ipip.h  | 145 +++-
 examples/ipsec-secgw/ipsec-secgw.c   | 332 ++---
 examples/ipsec-secgw/ipsec.c |  50 +-
 examples/ipsec-secgw/ipsec.h |  63 +-
 examples/ipsec-secgw/rt.c| 229 --
 examples/ipsec-secgw/sa.c| 466 +++-
 examples/ipsec-secgw/sp.c| 366 -
 examples/ipsec-secgw/sp4.c   | 447 +++
 examples/ipsec-secgw/sp6.c   | 448 +++
 14 files changed, 3312 insertions(+), 1221 deletions(-)
 create mode 100644 doc/guides/sample_app_ug/img/ipsec_endpoints.svg
 delete mode 100644 examples/ipsec-secgw/sp.c
 create mode 100644 examples/ipsec-secgw/sp4.c
 create mode 100644 examples/ipsec-secgw/sp6.c

-- 
2.5.5



[dpdk-dev] [PATCH v4] eal: make hugetlb initialization more robust

2016-05-18 Thread Sergio Gonzalez Monroy
On 17/05/2016 17:40, Thomas Monjalon wrote:
> 2016-05-12 00:44, Jianfeng Tan:
>> This patch adds an option, --huge-trybest, to use a recover mechanism to
>> the case that there are not so many hugepages (declared in sysfs), which
>> can be used. It relys on a mem access to fault-in hugepages, and if fails
> relys -> relies
>
>> with SIGBUS, recover to previously saved stack environment with
>> siglongjmp().
>>
>> Besides, this solution fixes an issue when hugetlbfs is specified with an
>> option of size. Currently DPDK does not respect the quota of a hugetblfs
>> mount. It fails to init the EAL because it tries to map the number of free
>> hugepages in the system rather than using the number specified in the quota
>> for that mount.
> It looks to be a bug. Why adding an option?
> What is the benefit of the old behaviour, not using --try-best?

I do not see any benefit to the old behavior.
Given that we need the signal handling for the cgroup use case, I would 
be inclined to use
this method as the default instead of trying to figure out how many 
hugepages we have free, etc.

Thoughts?

Sergio

>> +static sigjmp_buf jmpenv;
>> +
>> +static void sigbus_handler(int signo __rte_unused)
>> +{
>> +siglongjmp(jmpenv, 1);
>> +}
>> +
>> +/* Put setjmp into a wrap method to avoid compiling error. Any non-volatile,
>> + * non-static local variable in the stack frame calling sigsetjmp might be
>> + * clobbered by a call to longjmp.
>> + */
>> +static int wrap_sigsetjmp(void)
>> +{
>> +return sigsetjmp(jmpenv, 1);
>> +}
> Please add the word "huge" to these variables and functions.
>
>> +static struct sigaction action_old;
>> +static int need_recover;
>> +
>> +static void
>> +register_sigbus(void)
>> +{
>> +sigset_t mask;
>> +struct sigaction action;
>> +
>> +sigemptyset();
>> +sigaddset(, SIGBUS);
>> +action.sa_flags = 0;
>> +action.sa_mask = mask;
>> +action.sa_handler = sigbus_handler;
>> +
>> +need_recover = !sigaction(SIGBUS, , _old);
>> +}
>> +
>> +static void
>> +recover_sigbus(void)
>> +{
>> +if (need_recover) {
>> +sigaction(SIGBUS, _old, NULL);
>> +need_recover = 0;
>> +}
>> +}
> Idem, Please add the word "huge".
>



[dpdk-dev] [PATCH v4] eal: make hugetlb initialization more robust

2016-05-18 Thread Sergio Gonzalez Monroy
On 17/05/2016 17:39, David Marchand wrote:
> Hello Jianfeng,
>
> On Thu, May 12, 2016 at 2:44 AM, Jianfeng Tan  
> wrote:
>> This patch adds an option, --huge-trybest, to use a recover mechanism to
>> the case that there are not so many hugepages (declared in sysfs), which
>> can be used. It relys on a mem access to fault-in hugepages, and if fails
>> with SIGBUS, recover to previously saved stack environment with
>> siglongjmp().
>>
>> Besides, this solution fixes an issue when hugetlbfs is specified with an
>> option of size. Currently DPDK does not respect the quota of a hugetblfs
>> mount. It fails to init the EAL because it tries to map the number of free
>> hugepages in the system rather than using the number specified in the quota
>> for that mount.
>>
>> It's still an open issue with CONFIG_RTE_EAL_SINGLE_FILE_SEGMENTS. Under
>> this case (such as IVSHMEM target), having hugetlbfs mounts with quota will
>> fail to remap hugepages as it relies on having mapped all free hugepages
>> in the system.
> For such a case case, maybe having some warning log message when it
> fails would help the user.
> + a known issue in the release notes ?
>
>
>> diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c 
>> b/lib/librte_eal/linuxapp/eal/eal_memory.c
>> index 5b9132c..8c77010 100644
>> --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
>> +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
>> @@ -417,12 +434,33 @@ map_all_hugepages(struct hugepage_file *hugepg_tbl,
>>  hugepg_tbl[i].final_va = virtaddr;
>>  }
>>
>> +   if (orig && internal_config.huge_trybest) {
>> +   /* In linux, hugetlb limitations, like cgroup, are
>> +* enforced at fault time instead of mmap(), even
>> +* with the option of MAP_POPULATE. Kernel will send
>> +* a SIGBUS signal. To avoid to be killed, save stack
>> +* environment here, if SIGBUS happens, we can jump
>> +* back here.
>> +*/
>> +   if (wrap_sigsetjmp()) {
>> +   RTE_LOG(DEBUG, EAL, "SIGBUS: Cannot mmap 
>> more "
>> +   "hugepages of size %u MB\n",
>> +   (unsigned)(hugepage_sz / 0x10));
>> +   munmap(virtaddr, hugepage_sz);
>> +   close(fd);
>> +   unlink(hugepg_tbl[i].filepath);
>> +   return i;
>> +   }
>> +   *(int *)virtaddr = 0;
>> +   }
>> +
>> +
>>  /* set shared flock on the file. */
>>  if (flock(fd, LOCK_SH | LOCK_NB) == -1) {
>> -   RTE_LOG(ERR, EAL, "%s(): Locking file failed:%s \n",
>> +   RTE_LOG(DEBUG, EAL, "%s(): Locking file failed:%s 
>> \n",
>>  __func__, strerror(errno));
>>  close(fd);
>> -   return -1;
>> +   return i;
>>  }
>>
>>  close(fd);
> Maybe I missed something, but we are writing into some hugepage before
> the flock has been called.
> Are we sure there is nobody else using this hugepage ?
>
> Especially, can't this cause trouble to a primary process running if
> we start the exact same primary process ?
>

We lock the hugepage directory during eal_hugepage_info_init(), and we 
do not unlock
until we have finished eal_memory_init.

I think that takes care of that case.

Sergio


[dpdk-dev] [PATCH] eal/linuxapp: fix resource leak

2016-05-12 Thread Sergio Gonzalez Monroy
Hi,

On 11/05/2016 17:01, Daniel Mrzyglod wrote:
> Fix issue reported by Coverity.
> Coverity ID 97920
>
> munmap structure of hugepage
>
> leaked_storage: Variable hugepage going out of scope leaks the storage
> it points to.
>
> The system resource will not be reclaimed and reused, reducing the future
> availability of the resource.

I'm not really fond of this commit messages, but if no one minds them, 
so be it.

> In rte_eal_hugepage_init: Leak of memory or pointers to system resources
>
> Fixes: b6a468ad41d5 ("memory: add --socket-mem option")
>
> Signed-off-by: Daniel Mrzyglod 
> ---
>   lib/librte_eal/linuxapp/eal/eal_memory.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c 
> b/lib/librte_eal/linuxapp/eal/eal_memory.c
> index 5b9132c..cd40cc9 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
> @@ -1051,7 +1051,7 @@ int
>   rte_eal_hugepage_init(void)
>   {
>   struct rte_mem_config *mcfg;
> - struct hugepage_file *hugepage, *tmp_hp = NULL;
> + struct hugepage_file *hugepage = NULL, *tmp_hp = NULL;
>   struct hugepage_info used_hp[MAX_HUGEPAGE_SIZES];
>   
>   uint64_t memory[RTE_MAX_NUMA_NODES];
> @@ -1374,6 +1374,8 @@ rte_eal_hugepage_init(void)
>   
>   fail:
>   free(tmp_hp);
> + if (hugepage != NULL)
> + munmap(hugepage, nr_hugefiles * sizeof(struct hugepage_file));
>   return -1;
>   }
>   

You missed the last conditional if(...0 of the function where it returns 
directly with error value.

Looking at the code a bit, we never check for anything other than < 0 
return value, so I'd just do a 'goto fail' instead.

Sergio


[dpdk-dev] rte_malloc

2016-05-11 Thread Sergio Gonzalez Monroy
Ok.

So if you build DPDK from the sources, you would have an 'app' directory 
in you build directory.
You can read the Getting Started Guide for more info:
http://dpdk.readthedocs.io/en/v16.04/linux_gsg/index.html

Once you ran the 'test' DPDK application (located inside the 'app' 
directory) you would get a shell where you can run different kind of 
autotests/tests. One of them is the 'malloc_autotest'.

Let me know the result of running that test.

Sergio

On 10/05/2016 17:13, Mahdi Moradmand Badie wrote:
> No, I don't have any idea? :)
>
> On 10 May 2016 at 12:12, Sergio Gonzalez Monroy 
>  <mailto:sergio.gonzalez.monroy at intel.com>> wrote:
>
> Have you tried to run the unit tests? (Run 'app/test' application,
> then 'malloc_autotest')
>
> Sergio
>
>
> On 10/05/2016 16:55, Mahdi Moradmand Badie wrote:
>> #!/bin/sh
>> ./build/app/Mahdi_test -c 0x55 --master-lcore 0
>>
>> On 10 May 2016 at 11:31, Sergio Gonzalez Monroy
>> > <mailto:sergio.gonzalez.monroy at intel.com>> wrote:
>>
>> Forgot to ask,
>>
>> What's the command line you are using to run the app?
>>
>> Sergio
>>
>>
>> On 10/05/2016 16:17, Mahdi Moradmand Badie wrote:
>>> Thanks Sergio,
>>>     Yes sure,
>>> I attached files, it seems so easy but doesn't work.
>>> Thanks,
>>>
>>> On 10 May 2016 at 04:12, Sergio Gonzalez Monroy
>>> >> <mailto:sergio.gonzalez.monroy at intel.com>> wrote:
>>>
>>> Hi,
>>>
>>> On 09/05/2016 18:32, Mahdi Moradmand Badie wrote:
>>>
>>> Hello All,
>>>
>>> I had a problem regarding use the rte_malloc.
>>> I want to know if I want to use rte_malloc instead
>>> of malloc just mak
>>> change like this
>>> struct lcore_params *p = malloc
>>> 
>>> <http://dpdk.org/doc/api/rte__malloc_8h.html#afb7316a4ec228ed9b8ffc1864b03d85b>
>>> (sizeof(*p)); ==>
>>> struct lcore_params *p = rte_malloc
>>> 
>>> <http://dpdk.org/doc/api/rte__malloc_8h.html#afb7316a4ec228ed9b8ffc1864b03d85b>(NULL,
>>> sizeof(*p), 0);
>>> is enough ?
>>>
>>>
>>> Yes, malloc(sizeof(*p)) has an equivalent behavior to
>>> rte_malloc(NULL, sizeof(*p), 0)
>>> in the context of a DPDK application.
>>>
>>> Because I have problem and Segmentation fault (core
>>> dumped) ??
>>>
>>>
>>> Could you provide more details of how to reproduce or
>>> could you try to
>>> reproduce your problem using a very simple example like
>>> examples/helloworld ?
>>>
>>> Sergio
>>>
>>> Thanks in advance,
>>>
>>>
>>>
>>>
>>>
>>>
>>> -- 
>>> M at hdi Mor at dm@nd B at die
>>
>>
>>
>>
>> -- 
>> M at hdi Mor at dm@nd B at die
>
>
>
>
> -- 
> M at hdi Mor at dm@nd B at die



[dpdk-dev] rte_malloc

2016-05-10 Thread Sergio Gonzalez Monroy
Have you tried to run the unit tests? (Run 'app/test' application, then 
'malloc_autotest')

Sergio

On 10/05/2016 16:55, Mahdi Moradmand Badie wrote:
> #!/bin/sh
> ./build/app/Mahdi_test -c 0x55 --master-lcore 0
>
> On 10 May 2016 at 11:31, Sergio Gonzalez Monroy 
>  <mailto:sergio.gonzalez.monroy at intel.com>> wrote:
>
> Forgot to ask,
>
> What's the command line you are using to run the app?
>
> Sergio
>
>
> On 10/05/2016 16:17, Mahdi Moradmand Badie wrote:
>> Thanks Sergio,
>> Yes sure,
>> I attached files, it seems so easy but doesn't work.
>> Thanks,
>>
>> On 10 May 2016 at 04:12, Sergio Gonzalez Monroy
>> > <mailto:sergio.gonzalez.monroy at intel.com>> wrote:
>>
>> Hi,
>>
>> On 09/05/2016 18:32, Mahdi Moradmand Badie wrote:
>>
>> Hello All,
>>
>> I had a problem regarding use the rte_malloc.
>> I want to know if I want to use rte_malloc instead of
>> malloc just mak
>> change like this
>> struct lcore_params *p = malloc
>> 
>> <http://dpdk.org/doc/api/rte__malloc_8h.html#afb7316a4ec228ed9b8ffc1864b03d85b>
>> (sizeof(*p)); ==>
>> struct lcore_params *p = rte_malloc
>> 
>> <http://dpdk.org/doc/api/rte__malloc_8h.html#afb7316a4ec228ed9b8ffc1864b03d85b>(NULL,
>> sizeof(*p), 0);
>> is enough ?
>>
>>
>> Yes, malloc(sizeof(*p)) has an equivalent behavior to
>> rte_malloc(NULL, sizeof(*p), 0)
>> in the context of a DPDK application.
>>
>> Because I have problem and Segmentation fault (core
>> dumped) ??
>>
>>
>> Could you provide more details of how to reproduce or could
>> you try to
>> reproduce your problem using a very simple example like
>> examples/helloworld ?
>>
>> Sergio
>>
>> Thanks in advance,
>>
>>
>>
>>
>>
>>
>> -- 
>> M at hdi Mor at dm@nd B at die
>
>
>
>
> -- 
> M at hdi Mor at dm@nd B at die



[dpdk-dev] rte_malloc

2016-05-10 Thread Sergio Gonzalez Monroy
Forgot to ask,

What's the command line you are using to run the app?

Sergio

On 10/05/2016 16:17, Mahdi Moradmand Badie wrote:
> Thanks Sergio,
> Yes sure,
> I attached files, it seems so easy but doesn't work.
> Thanks,
>
> On 10 May 2016 at 04:12, Sergio Gonzalez Monroy 
>  <mailto:sergio.gonzalez.monroy at intel.com>> wrote:
>
> Hi,
>
> On 09/05/2016 18:32, Mahdi Moradmand Badie wrote:
>
> Hello All,
>
> I had a problem regarding use the rte_malloc.
> I want to know if I want to use rte_malloc instead of malloc
> just mak
> change like this
> struct lcore_params *p = malloc
> 
> <http://dpdk.org/doc/api/rte__malloc_8h.html#afb7316a4ec228ed9b8ffc1864b03d85b>
> (sizeof(*p)); ==>
> struct lcore_params *p = rte_malloc
> 
> <http://dpdk.org/doc/api/rte__malloc_8h.html#afb7316a4ec228ed9b8ffc1864b03d85b>(NULL,
> sizeof(*p), 0);
> is enough ?
>
>
> Yes, malloc(sizeof(*p)) has an equivalent behavior to
> rte_malloc(NULL, sizeof(*p), 0)
> in the context of a DPDK application.
>
> Because I have problem and Segmentation fault (core dumped) ??
>
>
> Could you provide more details of how to reproduce or could you try to
> reproduce your problem using a very simple example like
> examples/helloworld ?
>
> Sergio
>
> Thanks in advance,
>
>
>
>
>
>
> -- 
> M at hdi Mor at dm@nd B at die



[dpdk-dev] [PATCH v3] eal: make hugetlb initialization more robust

2016-05-10 Thread Sergio Gonzalez Monroy

Hi Jianfeng,

On 09/05/2016 11:48, Jianfeng Tan wrote:

>   /* find physical addresses and sockets for each hugepage */
> @@ -1172,8 +1255,9 @@ rte_eal_hugepage_init(void)
>   hp_offset += new_pages_count[i];
>   #else
>   /* remap all hugepages */
> - if (map_all_hugepages(_hp[hp_offset], hpi, 0) < 0){
> - RTE_LOG(DEBUG, EAL, "Failed to remap %u MB pages\n",
> + if ((uint32_t)map_all_hugepages(_hp[hp_offset], hpi, 0) !=
> + hpi->num_pages[0]) {

It probably makes more sense to have map_all_hugepages return uint32_t 
instead.

Sergio



[dpdk-dev] rte_malloc

2016-05-10 Thread Sergio Gonzalez Monroy
Hi,

On 09/05/2016 18:32, Mahdi Moradmand Badie wrote:
> Hello All,
>
> I had a problem regarding use the rte_malloc.
> I want to know if I want to use rte_malloc instead of malloc just mak
> change like this
> struct lcore_params *p = malloc
> 
> (sizeof(*p)); ==>
> struct lcore_params *p = rte_malloc
> (NULL,
> sizeof(*p), 0);
> is enough ?

Yes, malloc(sizeof(*p)) has an equivalent behavior to rte_malloc(NULL, 
sizeof(*p), 0)
in the context of a DPDK application.

> Because I have problem and Segmentation fault (core dumped) ??

Could you provide more details of how to reproduce or could you try to
reproduce your problem using a very simple example like 
examples/helloworld ?

Sergio

> Thanks in advance,
>
>



[dpdk-dev] [PATCH 9/9] doc: update ipsec sample guide

2016-05-06 Thread Sergio Gonzalez Monroy
Signed-off-by: Sergio Gonzalez Monroy 
---
 doc/guides/sample_app_ug/ipsec_secgw.rst | 583 ---
 1 file changed, 381 insertions(+), 202 deletions(-)

diff --git a/doc/guides/sample_app_ug/ipsec_secgw.rst 
b/doc/guides/sample_app_ug/ipsec_secgw.rst
index c11c7e7..5dbee2e 100644
--- a/doc/guides/sample_app_ug/ipsec_secgw.rst
+++ b/doc/guides/sample_app_ug/ipsec_secgw.rst
@@ -76,10 +76,10 @@ Path for IPsec Outbound traffic:

 Constraints
 ---
-*  IPv4 traffic
-*  ESP tunnel mode
-*  EAS-CBC, HMAC-SHA1 and NULL
-*  Each SA must be handle by a unique lcore (1 RX queue per port)
+*  No IPv6 options headers
+*  No AH mode
+*  Currently only EAS-CBC, HMAC-SHA1 and NULL
+*  Each SA must be handle by a unique lcore (*1 RX queue per port*)
 *  No chained mbufs

 Compiling the Application
@@ -108,6 +108,14 @@ To compile the application:

make

+#. [Optional] Build the application for debugging:
+   This option adds some extra flags, disables compiler optimizations and
+   is verbose.
+
+   .. code-block:: console
+
+   make DEBUG=1
+
 Running the Application
 ---

@@ -141,8 +149,9 @@ where,
 *   --ep1: configure the app as Endpoint 1.

 Either one of --ep0 or --ep1 *must* be specified.
-The main purpose of these options is two easily configure two systems
-back-to-back that would forward traffic through an IPsec tunnel.
+The main purpose of these options is to easily configure two systems
+back-to-back that would forward traffic through an IPsec tunnel (see
+:ref:`figure_ipsec_endpoints`).

 The mapping of lcores to port/queues is similar to other l3fwd applications.

@@ -196,7 +205,8 @@ Refer to the *DPDK Getting Started Guide* for general 
information on running
 applications and the Environment Abstraction Layer (EAL) options.

 The application would do a best effort to "map" crypto devices to cores, with
-hardware devices having priority.
+hardware devices having priority. Basically, hardware devices if present would
+be assign to a core before software ones.
 This means that if the application is using a single core and both hardware
 and software crypto devices are detected, hardware devices will be used.

@@ -221,6 +231,20 @@ The following sections provide some details on the default 
values used to
 initialize the SP, SA and Routing tables.
 Currently all the configuration is hard coded into the application.

+The following image illustrate a few of the concepts regarding IPSec, such
+as protected/unprotected and inbound/outbound traffic, from the point of
+view of two back-to-back endpoints:
+
+.. _figure_ipsec_endpoints:
+
+.. figure:: img/ipsec_endpoints.svg
+
+   IPSec Inbound/Outbound traffic
+
+Note that the above image only displays uniderectional traffic per port
+for illustration purposes.
+The application supports bidirectional traffic on all ports,
+
 Security Policy Initialization
 ~~

@@ -228,107 +252,122 @@ As mention in the overview, the Security Policies are 
ACL rules.
 The application defines two ACLs, one each of Inbound and Outbound, and
 it replicates them per socket in use.

-Following are the default rules:
+Following are the default rules which show only the relevant information,
+assuming ANY value is valid for the fields not mentioned (src ip, proto,
+src/dst ports).

 Endpoint 0 Outbound Security Policies:

-+-+--+---++
-| **Src** | **Dst**  | **proto** | **SA idx** |
-| |  |   ||
-+-+--+---++
-| Any | 192.168.105.0/24 | Any   | 5  |
-| |  |   ||
-+-+--+---++
-| Any | 192.168.106.0/24 | Any   | 6  |
-| |  |   ||
-+-+--+---++
-| Any | 192.168.107.0/24 | Any   | 7  |
-| |  |   ||
-+-+--+---++
-| Any | 192.168.108.0/24 | Any   | 8  |
-| |  |   ||
-+-+--+---++
-| Any | 192.168.200.0/24 | Any   | 9  |
-| |  |   ||
-+-+--+---++
-| Any | 192.168.250.0/24 | Any   | BYPASS |
-| |  |   ||
-+-+--+---++
++---++
+| **Dst**   | **SA idx** |
+|   ||
++---++
+| 192.168.105.0/24   

[dpdk-dev] [PATCH 8/9] examples/ipsec-secgw: transport mode support

2016-05-06 Thread Sergio Gonzalez Monroy
IPSec transport mode support.

Signed-off-by: Sergio Gonzalez Monroy 
---
 examples/ipsec-secgw/esp.c   | 121 ++-
 examples/ipsec-secgw/ipsec.h |   1 +
 examples/ipsec-secgw/rt.c|  32 
 examples/ipsec-secgw/sa.c|  39 ++
 examples/ipsec-secgw/sp4.c   |  40 ++
 examples/ipsec-secgw/sp6.c   |  48 +
 6 files changed, 244 insertions(+), 37 deletions(-)

diff --git a/examples/ipsec-secgw/esp.c b/examples/ipsec-secgw/esp.c
index ee541eb..7efac39 100644
--- a/examples/ipsec-secgw/esp.c
+++ b/examples/ipsec-secgw/esp.c
@@ -42,7 +42,6 @@
 #include 

 #include 
-#include 
 #include 
 #include 
 #include 
@@ -70,21 +69,24 @@ int
 esp_inbound(struct rte_mbuf *m, struct ipsec_sa *sa,
struct rte_crypto_op *cop)
 {
-   int32_t payload_len, ip_hdr_len;
+   struct ip *ip4;
struct rte_crypto_sym_op *sym_cop;
+   int32_t payload_len, ip_hdr_len;

IPSEC_ASSERT(m != NULL);
IPSEC_ASSERT(sa != NULL);
IPSEC_ASSERT(cop != NULL);

-   ip_hdr_len = 0;
-   switch (sa->flags) {
-   case IP4_TUNNEL:
-   ip_hdr_len = sizeof(struct ip);
-   break;
-   case IP6_TUNNEL:
+   ip4 = rte_pktmbuf_mtod(m, struct ip *);
+   if (likely(ip4->ip_v == IPVERSION))
+   ip_hdr_len = ip4->ip_hl * 4;
+   else if (ip4->ip_v == IP6_VERSION)
+   /* XXX No option headers supported */
ip_hdr_len = sizeof(struct ip6_hdr);
-   break;
+   else {
+   IPSEC_LOG(ERR, IPSEC_ESP, "invalid IP packet type %d\n",
+   ip4->ip_v);
+   return -EINVAL;
}

payload_len = rte_pktmbuf_pkt_len(m) - ip_hdr_len -
@@ -126,6 +128,8 @@ int
 esp_inbound_post(struct rte_mbuf *m, struct ipsec_sa *sa,
struct rte_crypto_op *cop)
 {
+   struct ip *ip4, *ip;
+   struct ip6_hdr *ip6;
uint8_t *nexthdr, *pad_len;
uint8_t *padding;
uint16_t i;
@@ -135,7 +139,7 @@ esp_inbound_post(struct rte_mbuf *m, struct ipsec_sa *sa,
IPSEC_ASSERT(cop != NULL);

if (cop->status != RTE_CRYPTO_OP_STATUS_SUCCESS) {
-   IPSEC_LOG(ERR, IPSEC_ESP, "Failed crypto op\n");
+   IPSEC_LOG(ERR, IPSEC_ESP, "failed crypto op\n");
return -1;
}

@@ -146,7 +150,7 @@ esp_inbound_post(struct rte_mbuf *m, struct ipsec_sa *sa,
padding = pad_len - *pad_len;
for (i = 0; i < *pad_len; i++) {
if (padding[i] != i + 1) {
-   IPSEC_LOG(ERR, IPSEC_ESP, "invalid pad_len field\n");
+   IPSEC_LOG(ERR, IPSEC_ESP, "invalid padding\n");
return -EINVAL;
}
}
@@ -157,7 +161,23 @@ esp_inbound_post(struct rte_mbuf *m, struct ipsec_sa *sa,
return -EINVAL;
}

-   ipip_inbound(m, sizeof(struct esp_hdr) + sa->iv_len);
+   if (unlikely(sa->flags == TRANSPORT)) {
+   ip = rte_pktmbuf_mtod(m, struct ip *);
+   ip4 = (struct ip *)rte_pktmbuf_adj(m,
+   sizeof(struct esp_hdr) + sa->iv_len);
+   if (likely(ip->ip_v == IPVERSION)) {
+   memmove(ip4, ip, ip->ip_hl * 4);
+   ip4->ip_p = *nexthdr;
+   ip4->ip_len = htons(rte_pktmbuf_data_len(m));
+   } else {
+   ip6 = (struct ip6_hdr *)ip4;
+   /* XXX No option headers supported */
+   memmove(ip6, ip, sizeof(struct ip6_hdr));
+   ip6->ip6_nxt = *nexthdr;
+   ip6->ip6_plen = htons(rte_pktmbuf_data_len(m));
+   }
+   } else
+   ipip_inbound(m, sizeof(struct esp_hdr) + sa->iv_len);

return 0;
 }
@@ -166,32 +186,53 @@ int
 esp_outbound(struct rte_mbuf *m, struct ipsec_sa *sa,
struct rte_crypto_op *cop)
 {
-   uint16_t pad_payload_len, pad_len, ip_hdr_len;
struct ip *ip4;
struct ip6_hdr *ip6;
-   struct esp_hdr *esp;
-   int32_t i;
-   char *padding;
+   struct esp_hdr *esp = NULL;
+   uint8_t *padding, *new_ip, nlp;
struct rte_crypto_sym_op *sym_cop;
+   int32_t i;
+   uint16_t pad_payload_len, pad_len, ip_hdr_len;

IPSEC_ASSERT(m != NULL);
IPSEC_ASSERT(sa != NULL);
IPSEC_ASSERT(cop != NULL);

-   /* Payload length */
-   pad_payload_len = RTE_ALIGN_CEIL(rte_pktmbuf_pkt_len(m) + 2,
-   sa->block_size);
-   pad_len = pad_payload_len - rte_pktmbuf_pkt_len(m);
-
ip_hdr_len = 0;
-   switch (sa->flags) {
-   case IP4_TUNNEL:
+
+   ip4 = rte_pktmbuf_mtod(m, struct ip *);
+   if (likely(ip4->ip_v == IPVERSI

  1   2   3   4   >