date:20141212

[dpdk-dev] [PATCH] Fixed spam from kni_allocate_mbufs() when no mbufs are free.

2014-12-12 Thread Jay Rolette

Fixed spam from kni_allocate_mbufs() when no mbufs are free.
If mbufs are exhausted, the 'Out of memory' message is logged at EXTREMELY high rates.
It is now logged no more than once every 10 minutes.

Signed-off-by: Jay Rolette 
---
 lib/librte_kni/rte_kni.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/lib/librte_kni/rte_kni.c b/lib/librte_kni/rte_kni.c
index fdb7509..f89319c 100644
--- a/lib/librte_kni/rte_kni.c
+++ b/lib/librte_kni/rte_kni.c
@@ -40,6 +40,7 @@
 #include 
 #include 

+#include 
 #include 
 #include 
 #include 
@@ -61,6 +62,9 @@

 #define KNI_MEM_CHECK(cond) do { if (cond) goto kni_fail; } while (0)

+// Configure how often we log "out of memory" messages (in seconds)
+#define KNI_SPAM_SUPPRESSION_PERIOD 60*10
+
 /**
  * KNI context
  */
@@ -592,6 +596,10 @@ kni_free_mbufs(struct rte_kni *kni)
 static void
 kni_allocate_mbufs(struct rte_kni *kni)
 {
+ static uint64_t no_mbufs = 0;
+ static uint64_t spam_filter = 0;
+ static uint64_t delayPeriod = 0;
+
  int i, ret;
  struct rte_mbuf *pkts[MAX_MBUF_BURST_NUM];

@@ -620,7 +628,18 @@ kni_allocate_mbufs(struct rte_kni *kni)
  pkts[i] = rte_pktmbuf_alloc(kni->pktmbuf_pool);
  if (unlikely(pkts[i] == NULL)) {
  /* Out of memory */
- RTE_LOG(ERR, KNI, "Out of memory\n");
+ no_mbufs++;
+
+ // Memory leak or need to tune? Regardless, if we get here once,
+ // we will get here a *lot*. Don't spam the logs!
+ now = rte_get_tsc_cycles();
+ if (!delayPeriod)
+delayPeriod = rte_get_tsc_hz() * KNI_SPAM_SUPPRESSION_PERIOD;
+
+ if (!spam_filter || (now - spam_filter) > delayPeriod) {
+ RTE_LOG(ERR, KNI, "No mbufs available (%llu)\n", (unsigned long long)no_mbufs);
+ spam_filter = now;
+ }
  break;
  }
  }
--

[dpdk-dev] [PATCH] bond: static analysis issues fix

2014-12-12 Thread Wodkowski, PawelX



> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Declan Doherty
> Sent: Friday, December 12, 2014 6:40 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH] bond: static analysis issues fix
> 
> Fixes for link bonding library identified by static analysis tool
> 
> - Overflow check for active_slaves array in activate_slave function
> - Allocation check of pci_id_table in rte_eth_bond_create
> - Use of eth_dev pointer in mac_address_get/set before NULL check
> 
> Signed-off-by: Declan Doherty 
> ---
>  lib/librte_pmd_bond/rte_eth_bond_api.c | 12 ++++++++----
>  lib/librte_pmd_bond/rte_eth_bond_pmd.c |  8 ++++----
>  2 files changed, 12 insertions(+), 8 deletions(-)
> 
> diff --git a/lib/librte_pmd_bond/rte_eth_bond_api.c b/lib/librte_pmd_bond/rte_eth_bond_api.c
> index ef5ddf4..9cb1c1f 100644
> --- a/lib/librte_pmd_bond/rte_eth_bond_api.c
> +++ b/lib/librte_pmd_bond/rte_eth_bond_api.c
> @@ -115,8 +115,11 @@ activate_slave(struct rte_eth_dev *eth_dev, uint8_t port_id)
>   if (internals->mode == BONDING_MODE_8023AD)
>   bond_mode_8023ad_activate_slave(eth_dev, port_id);
> 
> - internals->active_slaves[internals->active_slave_count] = port_id;
> - internals->active_slave_count++;
> + if (internals->active_slave_count <
> + RTE_DIM(internals->active_slaves) - 1) {
> + internals->active_slaves[internals->active_slave_count] = port_id;
> + internals->active_slave_count++;
> + }
>  }
> 
>  void
> @@ -144,7 +147,8 @@ deactivate_slave(struct rte_eth_dev *eth_dev, uint8_t port_id)
>   sizeof(internals->active_slaves[0]));
>   }
> 
> - internals->active_slave_count = active_count;
> + internals->active_slave_count = active_count < RTE_MAX_ETHPORTS ?
> + active_count : RTE_MAX_ETHPORTS - 1;

Since a port cannot be added twice, and the active_slaves array is (or should be)
the proper size to hold every port you can add to a bonding device (in fact it is
one element bigger), active_slave_count should never overflow. These changes
might only mask real problems in the user application and/or the library itself.
If you want to make this static analysis tool happy, I think it should be changed
to RTE_VERIFY(), assert(), rte_panic() or something like that to indicate an
undefined state.

Pawel
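Pawel's suggestion, treating an overflow as a broken invariant instead of silently skipping the port, might look like the standalone sketch below. `MAX_PORTS`, `struct bond_sketch` and `activate_slave_sketch()` are hypothetical stand-ins for `RTE_MAX_ETHPORTS` and the bonding internals, and plain `assert()` stands in for whichever of the checks he lists would actually be used.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PORTS 32 /* hypothetical stand-in for RTE_MAX_ETHPORTS */

struct bond_sketch {
	uint8_t active_slaves[MAX_PORTS];
	uint8_t active_slave_count;
};

static void
activate_slave_sketch(struct bond_sketch *b, uint8_t port_id)
{
	/* A port can be added only once, so the count can never reach the
	 * array size. If it does, the state is already corrupt: stop here
	 * instead of silently ignoring the port and carrying on. */
	assert(b->active_slave_count < MAX_PORTS);
	b->active_slaves[b->active_slave_count++] = port_id;
}
```

One design point: `assert()` compiles away when `NDEBUG` is defined, so an always-on check such as `rte_panic()` may be preferable if the invariant should also be enforced in release builds.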

[dpdk-dev] [PATCH] bond: static analysis issues fix

2014-12-12 Thread Declan Doherty

Fixes for link bonding library identified by static analysis tool

- Overflow check for active_slaves array in activate_slave function
- Allocation check of pci_id_table in rte_eth_bond_create
- Use of eth_dev pointer in mac_address_get/set before NULL check

Signed-off-by: Declan Doherty 
---
 lib/librte_pmd_bond/rte_eth_bond_api.c | 12 ++++++++----
 lib/librte_pmd_bond/rte_eth_bond_pmd.c |  8 ++++----
 2 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/lib/librte_pmd_bond/rte_eth_bond_api.c b/lib/librte_pmd_bond/rte_eth_bond_api.c
index ef5ddf4..9cb1c1f 100644
--- a/lib/librte_pmd_bond/rte_eth_bond_api.c
+++ b/lib/librte_pmd_bond/rte_eth_bond_api.c
@@ -115,8 +115,11 @@ activate_slave(struct rte_eth_dev *eth_dev, uint8_t port_id)
if (internals->mode == BONDING_MODE_8023AD)
bond_mode_8023ad_activate_slave(eth_dev, port_id);

-   internals->active_slaves[internals->active_slave_count] = port_id;
-   internals->active_slave_count++;
+   if (internals->active_slave_count <
+   RTE_DIM(internals->active_slaves) - 1) {
+   internals->active_slaves[internals->active_slave_count] = port_id;
+   internals->active_slave_count++;
+   }
 }

 void
@@ -144,7 +147,8 @@ deactivate_slave(struct rte_eth_dev *eth_dev, uint8_t port_id)
sizeof(internals->active_slaves[0]));
}

-   internals->active_slave_count = active_count;
+   internals->active_slave_count = active_count < RTE_MAX_ETHPORTS ?
+   active_count : RTE_MAX_ETHPORTS - 1;

if (eth_dev->data->dev_started && internals->mode == BONDING_MODE_8023AD)
bond_mode_8023ad_start(eth_dev);
@@ -210,7 +214,7 @@ rte_eth_bond_create(const char *name, uint8_t mode, uint8_t socket_id)
goto err;
}
pci_id_table = rte_zmalloc_socket(name, sizeof(*pci_id_table), 0, socket_id);
-   if (pci_drv == NULL) {
+   if (pci_id_table == NULL) {
RTE_BOND_LOG(ERR, "Unable to malloc pci_id_table on socket");
goto err;
}
diff --git a/lib/librte_pmd_bond/rte_eth_bond_pmd.c b/lib/librte_pmd_bond/rte_eth_bond_pmd.c
index 3db473b..bb4a537 100644
--- a/lib/librte_pmd_bond/rte_eth_bond_pmd.c
+++ b/lib/librte_pmd_bond/rte_eth_bond_pmd.c
@@ -764,8 +764,6 @@ mac_address_get(struct rte_eth_dev *eth_dev, struct ether_addr *dst_mac_addr)
 {
struct ether_addr *mac_addr;

-   mac_addr = eth_dev->data->mac_addrs;
-
if (eth_dev == NULL) {
RTE_LOG(ERR, PMD, "%s: NULL pointer eth_dev specified\n", __func__);
return -1;
@@ -776,6 +774,8 @@ mac_address_get(struct rte_eth_dev *eth_dev, struct ether_addr *dst_mac_addr)
return -1;
}

+   mac_addr = eth_dev->data->mac_addrs;
+
ether_addr_copy(mac_addr, dst_mac_addr);
return 0;
 }
@@ -785,8 +785,6 @@ mac_address_set(struct rte_eth_dev *eth_dev, struct ether_addr *new_mac_addr)
 {
struct ether_addr *mac_addr;

-   mac_addr = eth_dev->data->mac_addrs;
-
if (eth_dev == NULL) {
RTE_BOND_LOG(ERR, "NULL pointer eth_dev specified");
return -1;
@@ -797,6 +795,8 @@ mac_address_set(struct rte_eth_dev *eth_dev, struct ether_addr *new_mac_addr)
return -1;
}

+   mac_addr = eth_dev->data->mac_addrs;
+
/* If new MAC is different to current MAC then update */
if (memcmp(mac_addr, new_mac_addr, sizeof(*mac_addr)) != 0)
memcpy(mac_addr, new_mac_addr, sizeof(*mac_addr));
-- 
1.7.12.2
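The `mac_address_get()`/`mac_address_set()` fix is an ordering rule: validate a pointer before its first dereference, not after. A minimal standalone sketch, with hypothetical `struct dev`/`struct dev_data`/`struct mac` types standing in for `struct rte_eth_dev`, its `data` member and `struct ether_addr`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct mac { uint8_t bytes[6]; };
struct dev_data { struct mac *mac_addrs; };
struct dev { struct dev_data *data; };

static int
mac_get_sketch(const struct dev *d, struct mac *dst)
{
	const struct mac *src;

	/* Check first: reading d->data before this test is exactly the
	 * pattern the static analysis tool flagged. */
	if (d == NULL || dst == NULL)
		return -1;

	/* A registered device is assumed to always have its data set. */
	assert(d->data != NULL);
	src = d->data->mac_addrs;   /* safe: d is known to be non-NULL */
	memcpy(dst, src, sizeof(*dst));
	return 0;
}
```

Dereferencing before the check is worse than it looks: the dereference lets the compiler assume the pointer is non-NULL, so an optimizing build may drop the later NULL check altogether.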

[dpdk-dev] [PATCH v2] doc: add known issue for iommu and igb_uio

2014-12-12 Thread Thomas Monjalon

2014-12-12 14:50, Nicolas Dichtel:
> On 12/12/2014 14:38, Gonzalez Monroy, Sergio wrote:
> > Any ideas why patchwork is not showing these patches?
> 
> No ...
> Thomas, do you have an idea?

The parsemail script is responsible for adding new patches.
So I guess the answer (bug?) is in this file:

http://git.ozlabs.org/?p=patchwork;a=blob;f=apps/patchwork/bin/parsemail.py

Sergio, maybe you could try the parsemail script locally with your patch.

-- 
Thomas

[dpdk-dev] [PATCH v3 0/3] enhance TX checksum command and csum forwarding engine

2014-12-12 Thread Olivier MATZ

Hello,

On 12/12/2014 04:48 AM, Liu, Jijiang wrote:
> The 'hw/sw' option is used to set/clear the flag that enables TX tunneling
> packet checksum hardware offload in the testpmd application.

This is not clear at all.
In your command, there is (hw|sw|none).
Are you talking about inner or outer?
Is this command useful for any kind of packet?
How does it combine with "tx_checksum set outer-ip (hw|sw)"?

>> You are mixing scenario descriptions with what you do in your patchset:
>> 1/ is a scenario
>> 2/ and 3/ are descriptions of added/removed commands
> 
> No.
> Please note the symbols for command descriptions and scenario descriptions.
> 
> Command descriptions use the ">" symbol.
>  1> add "tx_checksum set tunnel (hw|sw|none) (port-id)" command
>  2> add "tx_checksum set outer-ip (hw|sw) (port-id)" command
> 3> (ip|udp|tcp|sctp|vxlan) (hw|sw) (port-id)" command
> 
> Scenario descriptions use the ")" symbol.
> 1) User requests HW offload for the ipv4_hdr_out checksum, and doesn't care
> whether it is a tunneled packet or not. So he sets:

I read your cover letter again. You enumerate *one* scenario.
An enumeration starts with 2 items.
Then you mix 1> and 1) numbers, which does not make things readable.

>> Another thing: you don't describe what you want to be able to do:
>>
>> 1/ packet type 1: compute L3 and/or L4 checksum using lX_len
>> 2/ packet type 2: compute inner L3 and/or L4 checksum using lX_len
>> 3/ packet type 2: compute outer L3 and/or L4 checksum using lX_len
>> 4/ packet type 2: compute inner L3 and/or L4 checksum using lX_len
>>    and outer L3 and/or L4 checksum using outer_lX_len
> 
> These details have already been covered in
> http://dpdk.org/ml/archives/dev/2014-December/009213.html.
> If the patch set is applied, we also have to update some documents.

First, this link was not referenced in the cover letter or patchset.
I think it would help to first explain what problem is fixed, what the
need is, and why. I think in this case it should not require many
lines and it could be done in a simple way.

Indeed, yesterday I spent more than an hour reviewing your patches
and reading your descriptions. After that, I still don't understand how
the commands impact the behavior of the csumonly application. The
possible reasons are:

1) I am too dumb to understand. In this case, it would be better
   for you and the community to find another reviewer.

2) Your descriptions are not clear. In this case, you need to think
   about how to reword them so they can be understood, or even maybe
   think about reworking your commands if they cannot be explained with
   simple words.

Note that 1) and 2) are not exclusive ;)

>> why not having the 2 following commands:
>>
> 
> I have talked about why we need the current 3 commands in another mail thread;
> let me explain it here again.

The community does not have this private thread.
And, that's right, I remember this thread: in it, I asked three times for
clarification about the commands without getting any clear answer.

> First, we still think we need some command to enable/disable tunneling
> support in testpmd; that's why command 1 is needed.

What does enable/disable tunneling support mean?

> 1. tx_checksum set tunnel (hw|sw|none) (port-id) command
> 
> 2. tx_cksum set (outer-ip)  (hw|sw) (port_id)
> 
> 3. tx_cksum set (ip|l4) (hw|sw) (port_id)
> 
> Secondly, in most cases the user application uses non-tunneling packets, so
> he only needs to care about how to use 3, not 1 and 2. Don't you think it
> becomes simpler?
> If we mix the tunneling and non-tunneling packet commands together, the
> commands will become more complicated and not easy to understand.

Really no, it is not simpler. But if you are able to explain in a few
words what is done by csumonly, maybe I can change my mind.

>> tx_checksum set
>> (ip|udp|tcp|sctp|outer-ip|outer-udp|outer-tcp|outer-sctp) 
> 
> As far as I know, there is so far no type of tunneling packet with
> outer-tcp or outer-sctp.

For TCP, there is STT, which is used in storage.
For SCTP, it could probably be removed.

>>select if we use hw or sw calculation for each header type
>>
>> tx_checksum tunnel (inner|outer|both)
>>
>>when a tunnel packet is received in csum only, control whether
>>we want to process inner, outer or both headers
> 
> This command can't meet/match our previous discussions and the current
> implementation. In terms of the 'inner' option, it can't handle the two
> following cases.
> 
> B) User is aware that it is a tunneled packet and requests HW offload for 
> ipv4_hdr_in and tcp_hdr_in *only*.
> He doesn't care about outer IP checksum offload.
> In that case, for FVL he has 2 choices:
>1. Treat that packet as a 'proper' tunnelled packet, and fill all the fields:
>  mb->l2_len =  udp_hdr_len + vxlan_hdr_len + eth_hdr_in;
>  mb->l3_len = ipv4_hdr_in;
>  mb->outer_l2_len = eth_hdr_out;
>  mb->outer_l3_len = ipv4_hdr_out;
>  mb->ol_flags |= PKT_TX_UDP
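For choice 1 of scenario B above, the four length fields can be worked out for a concrete packet. The sketch assumes VXLAN over IPv4 with no VLAN tags and no IP options (Ethernet 14 bytes, IPv4 20, UDP 8, VXLAN 8); `struct tx_lens` is a hypothetical mirror of the mbuf length fields named in the quoted text, and the offload flags are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed header sizes, assuming no VLAN tags and no IPv4 options. */
enum {
	ETH_HDR_LEN   = 14,
	IPV4_HDR_LEN  = 20,
	UDP_HDR_LEN   = 8,
	VXLAN_HDR_LEN = 8,
};

/* Hypothetical mirror of the mbuf length fields used in scenario B1. */
struct tx_lens {
	uint16_t l2_len;
	uint16_t l3_len;
	uint16_t outer_l2_len;
	uint16_t outer_l3_len;
};

/* Scenario B1: treat the packet as a proper tunnel, so the "L2" seen by
 * the inner checksum logic spans outer UDP + VXLAN + inner Ethernet. */
static struct tx_lens
vxlan_ipv4_lens(void)
{
	struct tx_lens l;

	l.l2_len = UDP_HDR_LEN + VXLAN_HDR_LEN + ETH_HDR_LEN;
	l.l3_len = IPV4_HDR_LEN;
	l.outer_l2_len = ETH_HDR_LEN;
	l.outer_l3_len = IPV4_HDR_LEN;
	assert(l.l2_len == 30); /* 8 + 8 + 14 under the assumptions above */
	return l;
}
```

Summing all four fields gives 84, the offset of the inner TCP header from the start of the frame, which is presumably what lets the NIC locate the inner headers.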

[dpdk-dev] [PATCH v4 00/10] VM Power Management

2014-12-12 Thread Paolo Bonzini



On 12/12/2014 17:10, Thomas Monjalon wrote:
> > Ok, this looks specific enough that an out-of-band solution within DPDK
> > sounds like the best approach.  It seems unnecessary to involve the
> > hypervisor (neither KVM nor QEMU).
>
> Paolo, I don't understand why you don't imagine controlling frequency scaling
> of a pinned vCPU transparently?

Probably because I don't imagine controlling frequency scaling from the
application on bare metal, either. :)  It seems to me that this is just
working around limitations of the kernel.

Paolo

> In my understanding, we currently cannot control frequency scaling without
> knowing whether we are in a VM or not.

[dpdk-dev] [PATCH v4 00/10] VM Power Management

2014-12-12 Thread Thomas Monjalon

2014-12-12 15:50, Paolo Bonzini:
> On 12/12/2014 14:00, Carew, Alan wrote:
> > The problem is deterministic control of host CPU frequency and the DPDK usage
> > model.
> > A hands-off power governor will scale based on workload, whether this is a host
> > application or VM, so no problems or bug there.
> > 
> > Where this solution fits is where an application wants to control its own
> > power policy, for example l3fwd_power uses librte_power library to change
> > frequency via acpi_cpufreq based on application heuristics rather than
> > relying on an inbuilt policy for example ondemand or performance.
> > 
> > This ability has existed in DPDK for host usage for some time and VM power
> > management allows this use case to be extended to cater for virtual machines
> > by re-using the librte_power interface to encapsulate the VM->Host
> > comms and provide an example means of managing such communications.
> > 
> >  I hope this clears it up a bit.
> 
> Ok, this looks specific enough that an out-of-band solution within DPDK
> sounds like the best approach.  It seems unnecessary to involve the
> hypervisor (neither KVM nor QEMU).

Paolo, I don't understand why you don't imagine controlling frequency scaling
of a pinned vCPU transparently?
In my understanding, we currently cannot control frequency scaling without
knowing whether we are in a VM or not.

-- 
Thomas
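As a rough illustration of the application-driven policy Alan describes, the decision an l3fwd_power-style polling loop makes can be kept separate from how it is applied. The sketch below is hypothetical: the thresholds, names and idle-counting rule are illustrative and are not l3fwd_power's actual heuristics. In a DPDK application the returned hint would then be applied through librte_power calls such as `rte_power_freq_up()` or `rte_power_freq_down()`, the interface the VM power management series proposes to forward from guest to host.

```c
#include <assert.h>
#include <stdint.h>

enum freq_hint { FREQ_KEEP, FREQ_UP, FREQ_DOWN };

/* Hypothetical per-lcore state for a polling loop. */
struct poll_stats {
	uint32_t idle_polls; /* consecutive polls that returned 0 packets */
};

#define BURST_SIZE       32  /* packets requested per rx burst */
#define IDLE_POLLS_DOWN 300  /* hypothetical: idle polls before scaling down */

/* Decide a frequency change from the size of the last rx burst. */
static enum freq_hint
freq_decide(struct poll_stats *s, uint16_t nb_rx)
{
	assert(nb_rx <= BURST_SIZE);
	if (nb_rx == BURST_SIZE) {   /* full burst: likely falling behind */
		s->idle_polls = 0;
		return FREQ_UP;
	}
	if (nb_rx == 0) {
		if (++s->idle_polls >= IDLE_POLLS_DOWN) {
			s->idle_polls = 0;
			return FREQ_DOWN;
		}
		return FREQ_KEEP;
	}
	s->idle_polls = 0;           /* some traffic: hold current frequency */
	return FREQ_KEEP;
}
```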

[dpdk-dev] [PATCH] lib/librte_table: Fix table array lookup

2014-12-12 Thread Mark Wunderlich

The existing lookup function was returning an unmodified
pkts_mask bitmask into lookup_hit_mask.  This effectively
assumes that all packets would index correctly into one
of the array table entries.

Also, there was no check that the index value provided in the
metadata was within the range of the table's maximum entries.
Because the table index bitmask is applied to the metadata-provided
index, the resulting entry position may falsely indicate a hit
for provided index values that happen to be greater than
the number of table entries.

Like other table type lookup functions it would seem that
the possibility exists that some of the packets provided
to the function would not result in a hit.  It is assumed
that the metadata provided should be a direct index into
the array table.  So, code was added to build and return
a bitmask for only those packets that correctly index
directly into the table array.

If the original intent for this table type was to accept
any 32-bit value and apply the table index bitmask as a
modulo operation to distribute it across the table entries,
then this patch is invalid and should be rejected.

Signed-off-by: Mark Wunderlich 
---
 lib/librte_table/rte_table_array.c |   25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/lib/librte_table/rte_table_array.c b/lib/librte_table/rte_table_array.c
index c031070..0164d18 100644
--- a/lib/librte_table/rte_table_array.c
+++ b/lib/librte_table/rte_table_array.c
@@ -164,8 +164,7 @@ rte_table_array_lookup(
void **entries)
 {
struct rte_table_array *t = (struct rte_table_array *) table;
-
-   *lookup_hit_mask = pkts_mask;
+   uint64_t pkts_out_mask = 0;

if ((pkts_mask & (pkts_mask + 1)) == 0) {
uint64_t n_pkts = __builtin_popcountll(pkts_mask);
@@ -173,26 +172,32 @@ rte_table_array_lookup(

for (i = 0; i < n_pkts; i++) {
struct rte_mbuf *pkt = pkts[i];
-   uint32_t entry_pos = RTE_MBUF_METADATA_UINT32(pkt,
-   t->offset) & t->entry_pos_mask;
+   uint32_t entry_pos = RTE_MBUF_METADATA_UINT32(pkt,t->offset);

-   entries[i] = (void *) &t->array[entry_pos *
-   t->entry_size];
+   if (entry_pos < t->n_entries) {
+   entries[i] = (void *) &t->array[entry_pos *
+   t->entry_size];
+   pkts_out_mask |= (1LLU << i);
+   }
}
} else {
for ( ; pkts_mask; ) {
uint32_t pkt_index = __builtin_ctzll(pkts_mask);
uint64_t pkt_mask = 1LLU << pkt_index;
struct rte_mbuf *pkt = pkts[pkt_index];
-   uint32_t entry_pos = RTE_MBUF_METADATA_UINT32(pkt,
-   t->offset) & t->entry_pos_mask;
+   uint32_t entry_pos = RTE_MBUF_METADATA_UINT32(pkt,t->offset);

-   entries[pkt_index] = (void *) &t->array[entry_pos *
-   t->entry_size];
+   if (entry_pos < t->n_entries) {
+   entries[pkt_index] = (void *) &t->array[entry_pos *
+   t->entry_size];
+   pkts_out_mask |= pkt_mask;
+   }
pkts_mask &= ~pkt_mask;
}
}

+   *lookup_hit_mask = pkts_out_mask;
+
return 0;
 }

[dpdk-dev] A question about hugepage initialization time

2014-12-12 Thread Thomas Monjalon

2014-12-12 09:59, Bruce Richardson:
> On Fri, Dec 12, 2014 at 04:07:40AM +, László Vadkerti wrote:
> > On Thu, 11 Dec,  2014, Bruce Richardson wrote:
> > > On Wed, Dec 10, 2014 at 07:16:59PM +, László Vadkerti wrote:
> > > >
> > > > On Wed, 10 Dec 2014, Bruce Richardson wrote:
> > > >
> > > > > On Wed, Dec 10, 2014 at 09:29:26AM -0500, Neil Horman wrote:
> > > > >> On Wed, Dec 10, 2014 at 10:32:25AM +, Bruce Richardson wrote:
> > > > >>> On Tue, Dec 09, 2014 at 02:10:32PM -0800, Stephen Hemminger wrote:
> > > >  On Tue, 9 Dec 2014 11:45:07 -0800 &rew
> > > >   wrote:
> > > > 
> > > > >> Hey Folks,
> > > > >>
> > > > >> Our DPDK application deals with very large in memory data
> > > > >> structures, and can potentially use tens or even hundreds of
> > > > >> gigabytes of hugepage memory.
> > > > >> During the course of development, we've noticed that as the
> > > > >> number of huge pages increases, the memory initialization time
> > > > >> during EAL init gets to be quite long, lasting several minutes
> > > > >> at present.  The growth in init time doesn't appear to be linear,
> > > > >> which is concerning.
> > > > >>
> > > > >> This is a minor inconvenience for us and our customers, as
> > > > >> memory initialization makes our boot times a lot longer than it
> > > > >> would otherwise be.  Also, my experience has been that really
> > > > >> long operations often are hiding errors - what you think is
> > > > >> merely a slow operation is actually a timeout of some sort,
> > > > >> often due to misconfiguration. This leads to two
> > > > >> questions:
> > > > >>
> > > > >> 1. Does the long initialization time suggest that there's an
> > > > >> error happening under the covers?
> > > > >> 2. If not, is there any simple way that we can shorten memory
> > > > >> initialization time?
> > > > >>
> > > > >> Thanks in advance for your insights.
> > > > >>
> > > > >> --
> > > > >> Matt Laswell
> > > > >> laswell at infiniteio.com
> > > > >> infinite io, inc.
> > > > >>
> > > > >
> > > > > Hello,
> > > > >
> > > > > please find some quick comments on the questions:
> > > > > 1.) By our experience long initialization time is normal in case
> > > > > of large amount of memory. However this time depends on some
> > > > > things:
> > > > > - number of hugepages (pagefault handled by kernel is pretty
> > > > > expensive)
> > > > > - size of hugepages (memset at initialization)
> > > > >
> > > > > 2.) Using 1G pages instead of 2M will reduce the initialization
> > > > > time significantly. Using wmemset instead of memset adds an
> > > > > additional 20-30% boost by our measurements. Or, just by
> > > > > touching the pages but not cleaning them you can have still some
> > > > > more speedup. But in this case your layer or the applications
> > > > > above need to do the cleanup at allocation time (e.g. by using
> > > > > rte_zmalloc).
> > > > >
> > > > > Cheers,
> > > > > &rew
> > > > 
> > > >  I wonder if the whole rte_malloc code is even worth it with a
> > > >  modern kernel with transparent huge pages? rte_malloc adds very
> > > >  little value and is less safe and slower than glibc or other
> > > >  allocators. Plus you lose the ability to get all the benefit out of
> > > >  valgrind or electric fence.
> > > > >>>
> > > > >>> While I'd dearly love to not have our own custom malloc lib to
> > > > >>> maintain, for DPDK multiprocess, rte_malloc will be hard to
> > > > >>> replace as we would need a replacement solution that similarly
> > > > >>> guarantees that memory mapped in process A is also available at
> > > > >>> the same address in process B. :-(
> > > > >>>
> > > > >> Just out of curiosity, why even bother with multiprocess support?
> > > > >> What you're talking about above is a multithread model, and you're
> > > > >> shoehorning multiple processes into it.
> > > > >> Neil
> > > > >>
> > > > >
> > > > > Yep, that's pretty much what it is alright. However, this
> > > > > multiprocess support is very widely used by our customers in
> > > > > building their applications, and has been in place and supported
> > > > > since some of the earliest DPDK releases. If it is to be removed, it
> > > > > needs to be replaced by something that provides equivalent
> > > > > capabilities to application writers (perhaps something with more
> > > > > fine-grained sharing
> > > > > etc.)
> > > > >
> > > > > /Bruce
> > > > >
> > > >
> > > > It is probably time to start discussing how to pull in our multi
> > > > process and memory management improvements we were talking about in
> > > > our DPDK Summit presentation:
> > > > https://www.youtube.com/watch?v=907VShi799k#t=647
> > > >
> > > > Multi-process model could have several benefits mostly in the high
> > > > availability area (telco requirement) due to better separ

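The suggestion quoted in this thread of touching hugepages without clearing them can be sketched as follows: write one byte per page so the kernel faults each page in, and leave zeroing to allocation time (as `rte_zmalloc()` does). This is an illustration only; `touch_pages()` is a hypothetical helper, not EAL code, and the page size is a parameter (2 MB or 1 GB for hugepages).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fault in every page of [base, base + len) by writing one byte per page,
 * instead of memset() over the whole region. Returns the number of pages
 * touched. Bytes other than the touched ones are left as-is, so callers
 * must zero objects when they allocate them. */
static size_t
touch_pages(volatile uint8_t *base, size_t len, size_t page_sz)
{
	size_t off, n = 0;

	assert(page_sz > 0);
	for (off = 0; off < len; off += page_sz) {
		base[off] = 0;   /* volatile store: not optimized away */
		n++;
	}
	return n;
}
```

The cost then scales with the number of pages rather than the number of bytes, which is consistent with the observation that larger pages shorten initialization.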
[dpdk-dev] [PATCH] examples: fix unchecked malloc return value in ip_pipeline

2014-12-12 Thread Thomas Monjalon

Hi Cristian,

2014-12-12 15:19, Dumitrescu, Cristian:
> Acked by: 

Please, next time,
- add your Acked-by below the Signed-off-by,
- put your name and a real email address (like in a signed-off),
- and remove the patch to make email shorter.

I think the web site needs to be updated to explain such things.


> -Original Message-
> From: Richardson, Bruce 
> Sent: Friday, December 12, 2014 12:24 PM
> To: dev at dpdk.org; Dumitrescu, Cristian
> Cc: Richardson, Bruce
> Subject: [PATCH] examples: fix unchecked malloc return value in ip_pipeline
> 
> Static analysis shows that one instance of rte_zmalloc is missing
> a return value check in the code. This is fixed by adding a return
> value check. The malloc call itself is moved to earlier in the function
> so that no work is done unless all memory allocation requests have
> succeeded - thereby removing the need for rollback on error.
> 
> Signed-off-by: Bruce Richardson 
> ---
>  examples/ip_pipeline/cmdline.c | 20 +---
>  1 file changed, 13 insertions(+), 7 deletions(-)
> 
> diff --git a/examples/ip_pipeline/cmdline.c b/examples/ip_pipeline/cmdline.c
> index 13d565e..152acb5 100644
> --- a/examples/ip_pipeline/cmdline.c
> +++ b/examples/ip_pipeline/cmdline.c
[...]
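The approach in Bruce's commit message (do the allocation before any other work, so a failure needs no rollback) can be sketched generically. `struct rule_table`, `rule_add()` and the fixed capacity are hypothetical, and standard `calloc()` stands in for `rte_zmalloc()`; the actual ip_pipeline change is elided above.

```c
#include <assert.h>
#include <stdlib.h>

#define TABLE_CAP 4 /* hypothetical fixed capacity */

struct rule { int id; };

struct rule_table {
	struct rule *rules[TABLE_CAP];
	int n;
};

/* Add a rule: validate, then allocate, and only then modify shared state,
 * so every early return leaves the table exactly as it was. */
static int
rule_add(struct rule_table *t, int id)
{
	struct rule *r;

	assert(t != NULL);
	if (t->n >= TABLE_CAP)         /* validation: no side effects yet */
		return -1;

	r = calloc(1, sizeof(*r));     /* allocation: checked before any work */
	if (r == NULL)
		return -1;

	r->id = id;                    /* commit: nothing below can fail */
	t->rules[t->n++] = r;
	return 0;
}
```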

[dpdk-dev] [PATCH 12/15] eal/tile: add mPIPE buffer stack mempool provider

2014-12-12 Thread Tony Lu

>-Original Message-
>From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Neil Horman
>Sent: Tuesday, December 09, 2014 10:07 PM
>To: Zhigang Lu
>Cc: dev at dpdk.org
>Subject: Re: [dpdk-dev] [PATCH 12/15] eal/tile: add mPIPE buffer stack
>mempool provider
>
>On Mon, Dec 08, 2014 at 04:59:35PM +0800, Zhigang Lu wrote:
>> TileGX: Modified mempool to allow for variable metadata.
>> Signed-off-by: Zhigang Lu 
>> Signed-off-by: Cyril Chemparathy 
>> ---
>>  app/test-pmd/mempool_anon.c   |   2 +-
>>  app/test/Makefile |   6 +-
>>  app/test/test_mempool_tile.c  | 217 
>>  lib/Makefile  |   5 +
>>  lib/librte_eal/linuxapp/eal/Makefile  |   4 +
>>  lib/librte_mempool_tile/Makefile  |  48 +++
>>  lib/librte_mempool_tile/rte_mempool.c | 381
>>  lib/librte_mempool_tile/rte_mempool.h | 634
>>  8 files changed, 1295 insertions(+), 2 deletions(-)
>>  create mode 100644 app/test/test_mempool_tile.c
>>  create mode 100644 lib/librte_mempool_tile/Makefile
>>  create mode 100644 lib/librte_mempool_tile/rte_mempool.c
>>  create mode 100644 lib/librte_mempool_tile/rte_mempool.h
>>
>NAK, this creates an alternate, parallel implementation of the mempool api,
>that re-implements some aspects of the mempool api, but not others.  This will
>make for completely non-portable applications (both to and from the tile arch),
>and create maintenance problems, in that features for mempool will need to be
>implemented in multiple libraries.
>
>I understand wanting to use mpipe, and that's perfectly fine, but creating
>non-portable apis to do so isn't the right way to go.  Instead, why not just allow
>applications to use mpipe by initializing it via the gxio library and creating a
>mempool using the existing libraries' rte_mempool_xmem_create api call, which
>allows for existing allocated memory space to be managed as a mempool?

Yes, the mempool we are using is very much tile-specific, as we want to use
the mpipe hardware-managed buffer pool to implement the mempool, which greatly
improves the performance of the mempool.

As Cyril replied in a previous email:
The alternative is to not include support for the hardware-managed buffer pool,
but that decision incurs a significant performance hit.
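Neil's alternative is to manage memory that has already been set up (for example via the gxio library) through the generic mempool API, rather than adding a second mempool implementation. As a language-level illustration of what "managing an existing region as a pool" means, here is a hypothetical fixed-size object pool with a LIFO free stack over a caller-supplied buffer. It is not DPDK's mempool and not the mPIPE buffer stack; `struct buf_pool` and its functions are made-up names that only show the idea.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_MAX 64 /* hypothetical upper bound on element count */

/* Fixed-size object pool over memory the caller already owns. */
struct buf_pool {
	void *free_stack[POOL_MAX]; /* LIFO of free elements */
	size_t top;                 /* number of entries in free_stack */
};

/* Carve `n` elements of `elt_size` bytes out of `mem` and push them all. */
static void
buf_pool_init(struct buf_pool *p, void *mem, size_t elt_size, size_t n)
{
	size_t i;

	assert(n <= POOL_MAX && elt_size > 0);
	p->top = 0;
	for (i = 0; i < n; i++)
		p->free_stack[p->top++] = (uint8_t *)mem + i * elt_size;
}

/* Pop the most recently freed element, or NULL when the pool is empty. */
static void *
buf_pool_get(struct buf_pool *p)
{
	return p->top ? p->free_stack[--p->top] : NULL;
}

/* Return an element to the pool. */
static void
buf_pool_put(struct buf_pool *p, void *obj)
{
	assert(p->top < POOL_MAX);
	p->free_stack[p->top++] = obj;
}
```

The LIFO order tends to hand back recently used, cache-warm buffers first, which is one reason stack-based pools are common for packet buffers.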

[dpdk-dev] [PATCH 14/15] app/test: turn off cpu flag checks for tile architecture

2014-12-12 Thread Tony Lu

>-Original Message-
>From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Neil Horman
>Sent: Thursday, December 11, 2014 9:39 PM
>To: Tony Lu
>Cc: dev at dpdk.org
>Subject: Re: [dpdk-dev] [PATCH 14/15] app/test: turn off cpu flag checks
>for tile architecture
>
>On Thu, Dec 11, 2014 at 12:43:36PM +0800, Tony Lu wrote:
>> >-Original Message-
>> >From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Neil Horman
>> >Sent: Tuesday, December 09, 2014 11:03 PM
>> >To: Zhigang Lu
>> >Cc: dev at dpdk.org
>> >Subject: Re: [dpdk-dev] [PATCH 14/15] app/test: turn off cpu flag checks
>> >for tile architecture
>> >
>> >On Mon, Dec 08, 2014 at 04:59:37PM +0800, Zhigang Lu wrote:
>> >> Tile processor doesn't have CPU flag hardware registers, so this
>> >> patch turns off cpu flag checks for tile.
>> >>
>> >> Signed-off-by: Zhigang Lu 
>> >> Signed-off-by: Cyril Chemparathy 
>> >> ---
>> >>  app/test/test_cpuflags.c | 2 +-
>> >>  1 file changed, 1 insertion(+), 1 deletion(-)
>> >>
>> >> diff --git a/app/test/test_cpuflags.c b/app/test/test_cpuflags.c
>> >> index
>> >> 5aeba5d..da93af5 100644
>> >> --- a/app/test/test_cpuflags.c
>> >> +++ b/app/test/test_cpuflags.c
>> >> @@ -113,7 +113,7 @@ test_cpuflags(void)
>> >>
>> >>   printf("Check for ICACHE_SNOOP:\t\t");
>> >>   CHECK_FOR_FLAG(RTE_CPUFLAG_ICACHE_SNOOP);
>> >> -#else
>> >> +#elif !defined(RTE_ARCH_TILE)
>> >>   printf("Check for SSE:\t\t");
>> >>   CHECK_FOR_FLAG(RTE_CPUFLAG_SSE);
>> >>
>> >Please stop this.  It doesn't make sense for a library that supports multiple
>> >arches; we need some way to generically test for flags that doesn't
>> >involve forcing applications to do tons of ifdeffing.  Perhaps
>> >rte_cpu_get_flag_enabled needs to do a flag table lookup based on the
>> >detected arch at run time, and return the appropriate response.  In the
>> >case of tile, it can just be an empty table, so 0 is always returned.  But
>> >making an application responsible for doing arch checks is a guarantee to
>> >write non-portable applications
>> >Neil
>> >
>>
>> Thanks for taking a look at this.
>> This change just follows what PPC did in commit 9ae15538. The root
>> cause is
>Yes, and I objected to it there as well:
>http://dpdk.org/ml/archives/dev/2014-November/008628.html
>
>To which the response was effectively "Sure, we'll do that later".  You're
>effectively making the same argument.  If no one ever steps up to change the
>interface when adding a new arch, it will never get done, and we'll have a
>fragmented cpuflag test mechanism that creates completely non-portable code
>across arches.
>
>> that the test_cpuflags.c explicitly tests X86-specific CPU flags, so we
>> might need to revise this test case to make it architecture-independent.
>>
>Exactly what I said in my email to the powerpc people.  If you're going to add a
>new arch, and a given interface doesn't support doing so, please try to re-design
>the interface to make it more friendly, otherwise we'll be left with
>unmaintainable code.

Agreed, makes sense.

>Thinking about it, you probably don't even need to change the api call to do this.
>You just need to create a unified map for all flags of all supported arches, that is
>to say a two dimensional array with the indices [arch][flag] where the stored
>value is the arch specific data to help determine if the feature is supported, or a
>universal "not supported" flag.

Yes, in order not to break ACL or other libs/apps, we need to make the flags of
all supported arches accessible.  But I don't feel strongly about creating an
[arch][flag] array: whether a given flag is supported is checked at runtime, so
we cannot assign it in a predefined array according to its arch.  For example,
some older x86 processors do not support SSE3.

Instead I prefer a one-dimensional, arch-specific [flag] array which contains
all the flags of all supported arches, where we mark the flags that do not
belong to the current arch as "not available".

To implement this, we need to move the enum rte_cpu_flag_t from the
arch-specific rte_cpuflags.c to the generic one, and combine them into one
enumeration.
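Zhigang's proposal (a single enumeration covering the flags of all supported architectures, with flags that do not belong to the running architecture marked "not available") could look roughly like this. The enum values, `flag_state` table, `cpu_flags_probe_sketch()` and `cpu_flag_enabled_sketch()` are hypothetical; a real implementation would fill the table from CPUID on x86, or the equivalent per-arch probe, at run time rather than from constants.

```c
#include <assert.h>
#include <errno.h>

/* One enum spanning every supported architecture (hypothetical subset). */
enum cpu_flag {
	FLAG_X86_SSE3,
	FLAG_X86_AVX,
	FLAG_PPC_ALTIVEC,
	FLAG_NUMFLAGS   /* must stay last */
};

/* Per-flag runtime state; this value means "belongs to another arch". */
#define FLAG_NOT_AVAILABLE (-1)

static int flag_state[FLAG_NUMFLAGS];

/* Probe the running CPU. This sketch pretends it runs on an x86 CPU with
 * SSE3 but without AVX. */
static void
cpu_flags_probe_sketch(void)
{
	flag_state[FLAG_X86_SSE3] = 1;
	flag_state[FLAG_X86_AVX] = 0;
	flag_state[FLAG_PPC_ALTIVEC] = FLAG_NOT_AVAILABLE;
}

/* 1 = present, 0 = absent or foreign arch, -EINVAL = unknown flag. */
static int
cpu_flag_enabled_sketch(enum cpu_flag f)
{
	if ((int)f < 0 || f >= FLAG_NUMFLAGS)
		return -EINVAL;
	return flag_state[f] == 1;
}
```

Because every flag name exists in every build, a test like test_cpuflags.c could loop over all flags without per-arch `#ifdef`s, which is the portability Neil asks for.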

ACL's rte_acl_init() itself has a bug: it should check whether the return value
of rte_cpu_get_flag_enabled() is "1", not "!0", as it may return "-EFAULT".

Thanks
-Zhigang
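The rte_acl_init() point is about mixing error codes with booleans: if a query can return 1, 0 or a negative errno value, a test written as `!= 0` treats the error as "flag present". A standalone sketch, with a hypothetical `query_flag()` standing in for `rte_cpu_get_flag_enabled()`:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical query: 1 = present, 0 = absent, negative errno on error. */
static int
query_flag(int flag)
{
	if (flag < 0 || flag > 7)
		return -EFAULT;
	return flag == 3;   /* pretend only flag 3 is present */
}

/* Loose check: any non-zero value, including -EFAULT, counts as present. */
static int
has_flag_loose(int flag)
{
	return query_flag(flag) != 0;
}

/* Strict check: only an explicit 1 means the flag is present. */
static int
has_flag_strict(int flag)
{
	return query_flag(flag) == 1;
}
```

With an out-of-range flag, the loose check reports the flag as present while the strict check does not, which is the misbehaviour described for rte_acl_init().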

[dpdk-dev] [PATCH 0/2 v4] Fix two compile issues with i686 platform

2014-12-12 Thread Thomas Monjalon

2014-12-12 06:38, Neil Horman:
> On Thu, Dec 11, 2014 at 10:21:44PM +0100, Thomas Monjalon wrote:
> > 2014-12-11 15:28, Qiu, Michael:
> > > On 2014/12/11 21:26, Neil Horman wrote:
> > > > On Thu, Dec 11, 2014 at 01:56:06AM +0100, Thomas Monjalon wrote:
> > > >>> These two issues are both introduced by commit b77b5639:
> > > >>> mem: add huge page sizes for IBM Power
> > > >>>
> > > >>> Michael Qiu (2):
> > > >>>   Fix compile issue with hugepage_sz in 32-bit system
> > > >>>   Fix compile issue of eal with icc compile
> > > >> Acked-by: Thomas Monjalon 
> > > >>
> > > >> Applied
> > > >>
> > > >> Thanks
> > > >>
> > > > Wait, why did you apply this patch?  We had outstanding debate on it, and
> > > > Michael indicated he was testing a new version of the patch.
> > > 
> > > Yes, I test the solution you suggest :) and it mostly works, but with a
> > > little issue.
> > > What I re-posted is not the old version.
> > 
> > Neil, v4 is a new version implementing what you suggested.
> > There was no comment and it looks good so I applied it.
> >  
> > > Do you take a look at?
> > 
> I didn't.  Apologies, I see the v4 now.  That said, something is off.  If you
> look at the list archives, I see patch 0/2 v4 in the list, but not 1/2 or 2/2,
> there's no actual patch that got posted.  Was it sent to you privately?

No, they are public and you are Cc'ed.

> > I think Neil missed the v4. Sorry to not have pinged you, I wanted rc4 for
> > validation at this time.
> > Neil do you agree this version is OK or do you see some issue to fix?
> > 
> Again, I think Michael's send went sideways.  0/4 went to the list but the actual
> patches only went to you, Thomas.  Please post them to the list.

They were correctly posted:
http://thread.gmane.org/gmane.comp.networking.dpdk.devel/9282/focus=9754

-- 
Thomas

[dpdk-dev] [PATCH v4 00/10] VM Power Management

2014-12-12 Thread Paolo Bonzini



On 12/12/2014 14:00, Carew, Alan wrote:
> The problem is deterministic control of host CPU frequency and the DPDK usage
> model.
> A hands-off power governor will scale based on workload, whether this is a 
> host
> application or VM, so no problems or bug there.
> 
> Where this solution fits is where an application wants to control its own
> power policy, for example l3fwd_power uses librte_power library to change
> frequency via acpi_cpufreq based on application heuristics rather than
> relying on an inbuilt policy for example ondemand or performance.
> 
> This ability has existed in DPDK for host usage for some time and VM power
> management allows this use case to be extended to cater for virtual machines
> by re-using the librte_power interface to encapsulate the VM->Host
> comms and provide an example means of managing such communications.
> 
>  I hope this clears it up a bit.

Ok, this looks specific enough that an out-of-band solution within DPDK
sounds like the best approach.  It seems unnecessary to involve the
hypervisor (either KVM or QEMU).

Paolo

[dpdk-dev] [PATCH 0/3] bond mode 4: add unit tests

2014-12-12 Thread Doherty, Declan

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Michal Jastrzebski
> Sent: Friday, December 12, 2014 9:15 AM
> To: dev at dpdk.org
> Cc: Kulasek, TomaszX
> Subject: [dpdk-dev] [PATCH 0/3] bond mode 4: add unit tests
> 
> These patches add unit tests for mode 4. They also change the ring PMD
> to behave more like an ordinary PMD device.
> 
> Pawel Wodkowski (3):
>   bond-change-warning
>   PMD-ring-MAC-management-fix-initialization-link-up-d
>   unit-tests-add-mode-4-unit-test
> 
>  app/test/Makefile                      |    1 +
>  app/test/test.h                        |  111 ++-
>  app/test/test_link_bonding.c           |    2 +-
>  app/test/test_link_bonding_mode4.c     | 1412
>  lib/librte_pmd_bond/rte_eth_bond_pmd.c |    4 +-
>  lib/librte_pmd_ring/rte_eth_ring.c     |   62 +-
>  6 files changed, 1539 insertions(+), 53 deletions(-)
>  create mode 100644 app/test/test_link_bonding_mode4.c
> 
> --
> 1.7.9.5
Acked-by: Declan Doherty

[dpdk-dev] How to add veth interfaces in dpdk

2014-12-12 Thread Zhou, Danny

DPDK does not have a PMD that supports veth. I guess you might want to try to run
DPDK in a container, right?
If that is the case, you have to assign a NIC's PF or VF to the container, to be
driven by a DPDK PMD.

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Sachin Sharma
> Sent: Friday, December 12, 2014 9:35 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] How to add veth interfaces in dpdk
> 
> Hi all,
> 
> I have created veth interfaces using  command "sudo ip link add veth1 type
> veth peer name veth2". However, when I use command "sudo
> ./tools/igb_uio_bind.py --force --bind=igb_uio veth1" to add veth into
> dpdk. It gives me an error that "Unknown device: veth1. Please specify
> device in "bus:slot.func" format". I do not see even veth interfaces using
> command "sudo ./tools/igb_uio_bind.py --status". Is there any way that I
> can add these interfaces into dpdk?
> 
> 
> Regards,
> Sachin.

[dpdk-dev] [PATCH] examples: fix unchecked malloc return value in ip_pipeline

2014-12-12 Thread Dumitrescu, Cristian

Acked-by: 

-Original Message-
From: Richardson, Bruce 
Sent: Friday, December 12, 2014 12:24 PM
To: dev at dpdk.org; Dumitrescu, Cristian
Cc: Richardson, Bruce
Subject: [PATCH] examples: fix unchecked malloc return value in ip_pipeline

Static analysis shows that one instance of rte_zmalloc is missing
a return value check in the code. This is fixed by adding a return
value check. The malloc call itself is moved earlier in the function
so that no work is done unless all memory allocation requests have
succeeded, thereby removing the need for rollback on error.

Signed-off-by: Bruce Richardson 
---
 examples/ip_pipeline/cmdline.c | 20 +---
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/examples/ip_pipeline/cmdline.c b/examples/ip_pipeline/cmdline.c
index 13d565e..152acb5 100644
--- a/examples/ip_pipeline/cmdline.c
+++ b/examples/ip_pipeline/cmdline.c
@@ -1093,7 +1093,7 @@ cmd_firewall_add_parsed(
__attribute__((unused)) void *data)
 {
struct cmd_firewall_add_result *params = parsed_result;
-   struct app_rule rule, *old_rule;
+   struct app_rule rule, *old_rule, *new_rule = NULL;
struct rte_mbuf *msg;
struct app_msg_req *req;
struct app_msg_resp *resp;
@@ -1148,6 +1148,18 @@ cmd_firewall_add_parsed(
if (msg == NULL)
rte_panic("Unable to allocate new message\n");

+   /* if we need a new rule structure, allocate it before we go further */
+   if (old_rule == NULL) {
+   new_rule = rte_zmalloc_socket("CLI", sizeof(struct app_rule),
+   RTE_CACHE_LINE_SIZE, rte_socket_id());
+   if (new_rule == NULL) {
+   printf("Cannot allocate memory for new rule\n");
+   rte_ctrlmbuf_free(msg);
+   return;
+   }
+   }
+
+
/* Fill request message */
req = (struct app_msg_req *)rte_ctrlmbuf_data(msg);
req->type = APP_MSG_REQ_FW_ADD;
@@ -1190,12 +1202,6 @@ cmd_firewall_add_parsed(
printf("Request FIREWALL_ADD failed (%u)\n", resp->result);
else {
if (old_rule == NULL) {
-   struct app_rule *new_rule = (struct app_rule *)
-   rte_zmalloc_socket("CLI",
-   sizeof(struct app_rule),
-   RTE_CACHE_LINE_SIZE,
-   rte_socket_id());
-
memcpy(new_rule, &rule, sizeof(rule));
TAILQ_INSERT_TAIL(&firewall_table, new_rule, entries);
n_firewall_rules++;
-- 
1.9.3

--
Intel Shannon Limited
Registered in Ireland
Registered Office: Collinstown Industrial Park, Leixlip, County Kildare
Registered Number: 308263
Business address: Dromore House, East Park, Shannon, Co. Clare

This e-mail and any attachments may contain confidential material for the sole 
use of the intended recipient(s). Any review or distribution by others is 
strictly prohibited. If you are not the intended recipient, please contact the 
sender and delete all copies.
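The allocate-first pattern this patch applies can be sketched in isolation. This is a generalized C illustration, not the ip_pipeline code: struct rule, send_ok(), send_fail() and add_rule() are hypothetical, and the send callback stands in for the request/response exchange with the pipeline core.

```c
#include <stdlib.h>

/* Hypothetical rule list node; stands in for struct app_rule. */
struct rule {
	int key;
	struct rule *next;
};

/* Hypothetical stand-ins for the request/response exchange:
 * send_ok() always succeeds, send_fail() always fails. */
static int send_ok(const struct rule *r)   { (void)r; return 0; }
static int send_fail(const struct rule *r) { (void)r; return -1; }

/* Allocate first, then act: if the allocation fails we return before any
 * request is sent or any list is modified, so nothing needs rolling back. */
static int add_rule(struct rule **head, int key,
		    int (*send)(const struct rule *))
{
	struct rule *new_rule = calloc(1, sizeof(*new_rule));
	if (new_rule == NULL)
		return -1;

	new_rule->key = key;
	if (send(new_rule) != 0) {
		free(new_rule);  /* request failed: only our own allocation to undo */
		return -1;
	}

	/* Commit to the shared list only once every step has succeeded. */
	new_rule->next = *head;
	*head = new_rule;
	return 0;
}
```

The design choice mirrors the commit message: because the only fallible allocation happens before any externally visible step, every error path is a plain return (plus freeing what this function itself allocated).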

[dpdk-dev] How to add veth interfaces in dpdk

2014-12-12 Thread Marc Sune


On 12/12/14 14:57, Sachin Sharma wrote:
> Hi Marc,
> I have limited number of nodes (Linux nodes)  in my testbed. So, I wanted to 
> use veth interfaces for creating some of dpdk nodes in my nodes.  You said 
> that I can use KNI interfaces for this purpose. I am very new to KNI 
> terminology. Could you please redirect me to a tutorial to create links with 
> KNI interfaces and then adding those KNI interfaces into dpdk?

For the original problem, you could use qemu VMs with emulated e1000 
NICs which are supported by DPDK. This is how we have some development 
environments here in BISDN. Of course, this is only valid for functional 
tests, not performance.

The KNI documentation is in a section of the DPDK manual. The API is 
documented here:

http://dpdk.org/doc/intel/dpdk-prog-guide-1.7.0.pdf
http://dpdk.org/doc/api/rte__kni_8h.html

I don't quite get what you meant by "creating links with KNI". KNI
interfaces are a bridge between the kernel and the DPDK: a kernel
interface (e.g. kni0) pops up in the kernel as a regular interface,
and from the DPDK side you can RX and TX from and to it in a similar
way as with normal eth devices (using rte_kni_rx_burst() and
rte_kni_tx_burst()).

Marc
> Thanks & Regards,
> Sachin.
> On 12/12/14 14:34, Sachin Sharma wrote:
> >> Hi all,
> >>
> >> I have created veth interfaces using  command "sudo ip link add veth1 type
> >> veth peer name veth2". However, when I use command "sudo
> >> ./tools/igb_uio_bind.py --force --bind=igb_uio veth1" to add veth into
> >> dpdk. It gives me an error that "Unknown device: veth1. Please specify
> >> device in "bus:slot.func" format". I do not see even veth interfaces using
> >> command "sudo ./tools/igb_uio_bind.py --status". Is there any way that I
> >> can add these interfaces into dpdk?
> >
> > veth are software interfaces, meaning there is no real NIC behind.
> > That's why you cannot bind them to igb_uio.
> >
> > You can use KNI interfaces for communicating a DPDK application with the
> > kernel interfaces and viceversa. There is an overhead on doing so, but
> > whether this is an appropriate solution or not, depends on your use
> > case. What do you plan to do?
> >
> > Marc
>
>
> On Fri, Dec 12, 2014 at 2:34 PM, Sachin Sharma  > wrote:
>
> Hi all,
>
> I have created veth interfaces using  command "sudo ip link add
> veth1 type veth peer name veth2". However, when I use command
> "sudo ./tools/igb_uio_bind.py --force --bind=igb_uio veth1" to add
> veth into dpdk. It gives me an error that "Unknown device: veth1.
> Please specify device in "bus:slot.func" format". I do not see
> even veth interfaces using command "sudo ./tools/igb_uio_bind.py
> --status". Is there any way that I can add these interfaces into dpdk?
>
>
> Regards,
> Sachin.
>

[dpdk-dev] [PATCH] Fix linuxapp/kni Makefile

2014-12-12 Thread r k

Subject: [PATCH] Fix linuxapp/kni Makefile

When "make clean" is performed, the following message is seen:
tr: missing operand after '.-'
Two strings must be given when translating.
Try 'tr --help' for more information

This is because 'comma' is not defined. Include the appropriate .mk file.

Signed-off-by: Ravi Kerur 
---
 lib/librte_eal/linuxapp/kni/Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/librte_eal/linuxapp/kni/Makefile b/lib/librte_eal/linuxapp/kni/Makefile
index fb673d9..02ed5da 100644
--- a/lib/librte_eal/linuxapp/kni/Makefile
+++ b/lib/librte_eal/linuxapp/kni/Makefile
@@ -29,6 +29,7 @@
 #   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 #   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

+include $(RTE_SDK)/mk/internal/rte.build-pre.mk
 include $(RTE_SDK)/mk/rte.vars.mk

 #
--
1.9.1

[dpdk-dev] [PATCH] Minor fixes in rte_common.h file.

2014-12-12 Thread r k

Subject: [PATCH] Minor fixes in rte_common.h file.

Fix rte_is_power_of_2(), since 0 is not a power of 2.
Avoid branching instructions in RTE_MAX and RTE_MIN.

Signed-off-by: Ravi Kerur 
---
 lib/librte_eal/common/include/rte_common.h | 6 +++---
 lib/librte_pmd_e1000/igb_pf.c  | 4 ++--
 lib/librte_pmd_ixgbe/ixgbe_pf.c| 4 ++--
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_common.h b/lib/librte_eal/common/include/rte_common.h
index 921b91f..e163f35 100644
--- a/lib/librte_eal/common/include/rte_common.h
+++ b/lib/librte_eal/common/include/rte_common.h
@@ -203,7 +203,7 @@ extern int RTE_BUILD_BUG_ON_detected_error;
 static inline int
 rte_is_power_of_2(uint32_t n)
 {
-   return ((n-1) & n) == 0;
+   return n && !(n & (n - 1));
 }

 /**
@@ -259,7 +259,7 @@ rte_align64pow2(uint64_t v)
 #define RTE_MIN(a, b) ({ \
typeof (a) _a = (a); \
typeof (b) _b = (b); \
-   _a < _b ? _a : _b; \
+   _b ^ ((_a ^ _b) & -(_a < _b)); \
})

 /**
@@ -268,7 +268,7 @@ rte_align64pow2(uint64_t v)
 #define RTE_MAX(a, b) ({ \
typeof (a) _a = (a); \
typeof (b) _b = (b); \
-   _a > _b ? _a : _b; \
+   _a ^ ((_a ^ _b) & -(_a < _b)); \
})

 /*** Other general functions / macros /
diff --git a/lib/librte_pmd_e1000/igb_pf.c b/lib/librte_pmd_e1000/igb_pf.c
index bc3816a..546499c 100644
--- a/lib/librte_pmd_e1000/igb_pf.c
+++ b/lib/librte_pmd_e1000/igb_pf.c
@@ -321,11 +321,11 @@ igb_vf_set_mac_addr(struct rte_eth_dev *dev, uint32_t vf, uint32_t *msgbuf)
 static int
 igb_vf_set_multicast(struct rte_eth_dev *dev, __rte_unused uint32_t vf, uint32_t *msgbuf)
 {
-   int i;
+   int16_t i;
uint32_t vector_bit;
uint32_t vector_reg;
uint32_t mta_reg;
-   int entries = (msgbuf[0] & E1000_VT_MSGINFO_MASK) >>
+   int32_t entries = (msgbuf[0] & E1000_VT_MSGINFO_MASK) >>
E1000_VT_MSGINFO_SHIFT;
uint16_t *hash_list = (uint16_t *)&msgbuf[1];
struct e1000_hw *hw = E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
diff --git a/lib/librte_pmd_ixgbe/ixgbe_pf.c b/lib/librte_pmd_ixgbe/ixgbe_pf.c
index 51da1fd..426caf9 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_pf.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_pf.c
@@ -390,7 +390,7 @@ ixgbe_vf_set_multicast(struct rte_eth_dev *dev, __rte_unused uint32_t vf, uint32
struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
struct ixgbe_vf_info *vfinfo =
*(IXGBE_DEV_PRIVATE_TO_P_VFDATA(dev->data->dev_private));
-   int nb_entries = (msgbuf[0] & IXGBE_VT_MSGINFO_MASK) >>
+   int32_t nb_entries = (msgbuf[0] & IXGBE_VT_MSGINFO_MASK) >>
IXGBE_VT_MSGINFO_SHIFT;
uint16_t *hash_list = (uint16_t *)&msgbuf[1];
uint32_t mta_idx;
@@ -399,7 +399,7 @@ ixgbe_vf_set_multicast(struct rte_eth_dev *dev, __rte_unused uint32_t vf, uint32
const uint32_t IXGBE_MTA_BIT_SHIFT = 5;
const uint32_t IXGBE_MTA_BIT_MASK = (0x1 << IXGBE_MTA_BIT_SHIFT) - 1;
uint32_t reg_val;
-   int i;
+   int16_t i;

/* only so many hash values supported */
nb_entries = RTE_MIN(nb_entries, IXGBE_MAX_VF_MC_ENTRIES);
--
1.9.1
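A hedged C sketch of the two ideas in this patch, specialised to fixed-width integers so it is portable C99 (the patch itself applies the XOR trick inside GNU statement-expression macros using typeof). The function names are hypothetical. Note that the XOR form only works for integer operands, since XOR is not defined for floating-point or pointer values, whereas the original ternary RTE_MIN/RTE_MAX accept any comparable types. Compilers also often emit a branch-free conditional move for the ternary form already, so any speed-up would need measuring.

```c
#include <stdint.h>

/* Pre-patch test: (0 - 1) & 0 == 0, so it wrongly reports 0 as a power of 2. */
static inline int is_power_of_2_old(uint32_t n)
{
	return ((n - 1) & n) == 0;
}

/* Patched test: 0 is rejected explicitly before the bit trick. */
static inline int is_power_of_2(uint32_t n)
{
	return n && !(n & (n - 1));
}

/* Branchless selection with an XOR mask: mask is all ones when a < b,
 * zero otherwise, so (a ^ b) & mask is either a ^ b or 0. */
static inline uint32_t min_u32(uint32_t a, uint32_t b)
{
	uint32_t mask = -(uint32_t)(a < b);
	return b ^ ((a ^ b) & mask);
}

static inline uint32_t max_u32(uint32_t a, uint32_t b)
{
	uint32_t mask = -(uint32_t)(a < b);
	return a ^ ((a ^ b) & mask);
}

/* Same idea for signed values: -(1) is -1, i.e. all ones. */
static inline int32_t min_i32(int32_t a, int32_t b)
{
	int32_t mask = -(int32_t)(a < b);
	return b ^ ((a ^ b) & mask);
}
```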

[dpdk-dev] [PATCH] Minor fixes in rte_common.h file.

2014-12-12 Thread r k

Ravi Kerur (1):
  Minor fixes in rte_common.h file.

 lib/librte_eal/common/include/rte_common.h | 6 +++---
 lib/librte_pmd_e1000/igb_pf.c  | 4 ++--
 lib/librte_pmd_ixgbe/ixgbe_pf.c| 4 ++--
 3 files changed, 7 insertions(+), 7 deletions(-)

--
1.9.1

[dpdk-dev] [PATCH] Fix linuxapp/kni makefile

2014-12-12 Thread r k

Re-sending as per guidelines.

Subject: [PATCH] Fix linuxapp/kni makefile

Ravi Kerur (1):
  Fix linuxapp/kni Makefile

 lib/librte_eal/linuxapp/kni/Makefile | 1 +
 1 file changed, 1 insertion(+)

--
1.9.1

[dpdk-dev] How to add veth interfaces in dpdk

2014-12-12 Thread Sachin Sharma

Hi Marc,

I have a limited number of nodes (Linux nodes) in my testbed, so I
wanted to use veth interfaces to create some of the DPDK nodes on my
nodes. You said that I can use KNI interfaces for this purpose. I am
very new to KNI terminology. Could you please point me to a
tutorial on creating links with KNI interfaces and then adding those KNI
interfaces into DPDK?

Thanks & Regards,

Sachin.


On 12/12/14 14:34, Sachin Sharma wrote:
>> Hi all,
>>
>> I have created veth interfaces using  command "sudo ip link add veth1 type
>> veth peer name veth2". However, when I use command "sudo
>> ./tools/igb_uio_bind.py --force --bind=igb_uio veth1" to add veth into
>> dpdk. It gives me an error that "Unknown device: veth1. Please specify
>> device in "bus:slot.func" format". I do not see even veth interfaces using
>> command "sudo ./tools/igb_uio_bind.py --status". Is there any way that I
>> can add these interfaces into dpdk?
>
> veth are software interfaces, meaning there is no real NIC behind.
> That's why you cannot bind them to igb_uio.
>
> You can use KNI interfaces for communicating a DPDK application with the
> kernel interfaces and viceversa. There is an overhead on doing so, but
> whether this is an appropriate solution or not, depends on your use
> case. What do you plan to do?
>
> Marc



On Fri, Dec 12, 2014 at 2:34 PM, Sachin Sharma 
wrote:
>
> Hi all,
>
> I have created veth interfaces using  command "sudo ip link add veth1 type
> veth peer name veth2". However, when I use command "sudo
> ./tools/igb_uio_bind.py --force --bind=igb_uio veth1" to add veth into
> dpdk. It gives me an error that "Unknown device: veth1. Please specify
> device in "bus:slot.func" format". I do not see even veth interfaces using
> command "sudo ./tools/igb_uio_bind.py --status". Is there any way that I
> can add these interfaces into dpdk?
>
>
> Regards,
> Sachin.
>

[dpdk-dev] [PATCH v2] doc: add known issue for iommu and igb_uio

2014-12-12 Thread Nicolas Dichtel

On 12/12/2014 14:38, Gonzalez Monroy, Sergio wrote:
>> From: Nicolas Dichtel [mailto:nicolas.dichtel at 6wind.com]
>> Sent: Friday, December 12, 2014 1:28 PM
>>
>> On 12/12/2014 14:20, Gonzalez Monroy, Sergio wrote:
>>>> From: Nicolas Dichtel [mailto:nicolas.dichtel at 6wind.com]
>>>> Sent: Friday, December 12, 2014 1:00 PM
>>>>
>>>> On 12/12/2014 13:06, Sergio Gonzalez Monroy wrote:
>>>>> mapping for the device on the iommu resulting in memory access errors.
>>>> Do you have the linux commit id which introduces the problem? And the
>>>> one which solves it?

>>> I do have the commits, but I was not sure if I should add the info (at least
>> no previous release note did).
>>>
>>> Introduced in:
>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/iommu/intel-iommu.c?id=816997d03bca9fabcee65f3481eb0297103eceb7
>>>
>>> Solved in:
>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/iommu/intel-iommu.c?id=1196c2fb0407683c2df92d3d09f9144d42830894
>> I think it's a very useful info.
>>
> Alright, I will add it in the description of the release note.
>
> Any ideas why patchwork is not showing these patches?
No ...
Thomas, do you have an idea?


Regards,
Nicolas

[dpdk-dev] [PATCH v3] doc: add known issue for iommu and igb_uio

2014-12-12 Thread Sergio Gonzalez Monroy

Known issue regarding iommu/VT-d and igb_uio on Linux kernel version 3.15
to 3.17 where unbinding the device from the driver removes the 1:1 mapping
for the device on the iommu resulting in memory access errors.

Signed-off-by: Sergio Gonzalez Monroy 
---
v3:
 Remove reference number
 Add Linux commit links

v2:
 Fix title uppercase
 Add extra blank line to show proper indentation

v1:
 Known igb_uio issue when iommu/vt-d is on

 doc/guides/rel_notes/known_issues.rst | 38 +++
 1 file changed, 38 insertions(+)

diff --git a/doc/guides/rel_notes/known_issues.rst b/doc/guides/rel_notes/known_issues.rst
index 8ef654a..0cfecab 100644
--- a/doc/guides/rel_notes/known_issues.rst
+++ b/doc/guides/rel_notes/known_issues.rst
@@ -1026,3 +1026,41 @@ Stopping the port does not down the link on Intel® 40G ethernet controller
 | Driver/Module  | Poll Mode Driver (PMD)  
 |
 || 
 |
 
++--+
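+
+Devices bound to igb_uio with VT-d enabled do not work on Linux* kernel 3.15-3.17
+---------------------------------------------------------------------------------
+
++--------------------------------+-----------------------------------------------------------------------------------+
+| Title                          | Devices bound to igb_uio with VT-d enabled do not work on Linux* kernel 3.15-3.17 |
+|                                |                                                                                   |
++================================+===================================================================================+
+| Description                    | When VT-d is enabled (iommu=pt intel_iommu=on), devices are 1:1 mapped.           |
+|                                | In the Linux* kernel unbinding devices from drivers removes that mapping which    |
+|                                | result in IOMMU errors.                                                           |
+|                                |                                                                                   |
+|                                | Introduced in Linux `kernel 3.15 commit `_,                                       |
+|                                | solved in Linux `kernel 3.18 commit `_.                                           |
+|                                |                                                                                   |
++--------------------------------+-----------------------------------------------------------------------------------+
+| Implication                    | Devices will not be allowed to access memory.                                     |
+|                                |                                                                                   |
++--------------------------------+-----------------------------------------------------------------------------------+
+| Resolution/ Workaround         | Use earlier or later kernel versions, or avoid driver binding on boot by          |
+|                                | blacklisting the driver modules.                                                  |
+|                                | ie. in the case of ixgbe, we can pass the kernel command line option:             |
+|                                |                                                                                   |
+|                                |     modprobe.blacklist=ixgbe                                                      |
+|                                |                                                                                   |
+|                                | This way we do not need to unbind the device to bind it to igb_uio.               |
+|                                |                                                                                   |
++--------------------------------+-----------------------------------------------------------------------------------+
+| Affected Environment/ Platform | Linux* systems with kernel versions 3.15 to 3.17                                  |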
+|

[dpdk-dev] How to add veth interfaces in dpdk

2014-12-12 Thread Marc Sune

On 12/12/14 14:34, Sachin Sharma wrote:
> Hi all,
>
> I have created veth interfaces using  command "sudo ip link add veth1 type
> veth peer name veth2". However, when I use command "sudo
> ./tools/igb_uio_bind.py --force --bind=igb_uio veth1" to add veth into
> dpdk. It gives me an error that "Unknown device: veth1. Please specify
> device in "bus:slot.func" format". I do not see even veth interfaces using
> command "sudo ./tools/igb_uio_bind.py --status". Is there any way that I
> can add these interfaces into dpdk?

veth are software interfaces, meaning there is no real NIC behind. 
That's why you cannot bind them to igb_uio.
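The "bus:slot.func" error makes sense once you see that the binding script identifies devices by PCI address, which a veth interface does not have. A rough C illustration of that kind of check follows; struct pci_addr and parse_pci_addr() are hypothetical, and the script's actual (Python) logic is not reproduced here.

```c
#include <stdio.h>

/* Hypothetical container for a PCI address (domain:bus:slot.func). */
struct pci_addr {
	unsigned domain, bus, slot, func;
};

/* Hypothetical parser: accepts "DDDD:BB:SS.F" or "BB:SS.F" in hex and
 * rejects anything else, such as a kernel interface name like "veth1". */
static int parse_pci_addr(const char *s, struct pci_addr *out)
{
	unsigned d = 0, b, sl, f;
	char extra;

	/* Full form first; %c only matches if trailing garbage follows. */
	if (sscanf(s, "%4x:%2x:%2x.%1x%c", &d, &b, &sl, &f, &extra) != 4) {
		d = 0;  /* a failed partial match may have written d */
		if (sscanf(s, "%2x:%2x.%1x%c", &b, &sl, &f, &extra) != 3)
			return -1;
	}
	if (sl > 0x1f || f > 7)  /* PCI has 32 slots (devices) and 8 functions */
		return -1;
	out->domain = d;
	out->bus = b;
	out->slot = sl;
	out->func = f;
	return 0;
}
```

A name like "veth1" never parses as an address, which is why such software interfaces cannot be handed to igb_uio at all.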

You can use KNI interfaces to let a DPDK application communicate with the
kernel interfaces and vice versa. There is an overhead in doing so, but
whether this is an appropriate solution or not depends on your use
case. What do you plan to do?

Marc

>
>
> Regards,
> Sachin.

[dpdk-dev] How to add veth interfaces in dpdk

2014-12-12 Thread Sachin Sharma

Hi all,

I have created veth interfaces using the command "sudo ip link add veth1 type
veth peer name veth2". However, when I use the command "sudo
./tools/igb_uio_bind.py --force --bind=igb_uio veth1" to add veth into
DPDK, it gives me an error: "Unknown device: veth1. Please specify
device in "bus:slot.func" format". I do not even see the veth interfaces with
the command "sudo ./tools/igb_uio_bind.py --status". Is there any way that I
can add these interfaces into DPDK?


Regards,
Sachin.

[dpdk-dev] [PATCH v2] doc: add known issue for iommu and igb_uio

2014-12-12 Thread Nicolas Dichtel

On 12/12/2014 14:20, Gonzalez Monroy, Sergio wrote:
>> From: Nicolas Dichtel [mailto:nicolas.dichtel at 6wind.com]
>> Sent: Friday, December 12, 2014 1:00 PM
>>
>> On 12/12/2014 13:06, Sergio Gonzalez Monroy wrote:
>>> Known issue regarding iommu/VT-d and igb_uio on Linux kernel version
>>> 3.15 to 3.17 where unbinding the device from the driver removes the
>>> 1:1
>> Do you mean that the problem doesn't exist with a linux 3.18?
>>
> Yes,  as it is mentioned in the resolution/workarounds, earlier or later 
> kernels do solve the issue.
Ok, I was not sure that the last version was tested ;-)

>
>>> mapping for the device on the iommu resulting in memory access errors.
>> Do you have the linux commit id which introduces the problem? And the one
>> which solves it?
>>
> I do have the commits, but I was not sure if I should add the info (at least 
> no previous release note did).
>
> Introduced in:
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/iommu/intel-iommu.c?id=816997d03bca9fabcee65f3481eb0297103eceb7
>
> Solved in:
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/iommu/intel-iommu.c?id=1196c2fb0407683c2df92d3d09f9144d42830894
I think it's a very useful info.


Thank you,
Nicolas

[dpdk-dev] [PATCH v2] doc: add known issue for iommu and igb_uio

2014-12-12 Thread Nicolas Dichtel

On 12/12/2014 13:06, Sergio Gonzalez Monroy wrote:
> Known issue regarding iommu/VT-d and igb_uio on Linux kernel version
> 3.15 to 3.17 where unbinding the device from the driver removes the 1:1
Do you mean that the problem doesn't exist with a linux 3.18?

> mapping for the device on the iommu resulting in memory access errors.
Do you have the linux commit id which introduces the problem? And the one which
solves it?

>
> Signed-off-by: Sergio Gonzalez Monroy 
> ---
Please, also don't forget to explain what you have changed between the v1 and
v2.
The history should be put here, after the '---', something like:

v2: update that


Regards,
Nicolas

[dpdk-dev] [PATCH] enic: corrected the usage of VFIO_PRESENT

2014-12-12 Thread Sujith Sankar

This patch corrects the usage of the flag VFIO_PRESENT in the enic driver.
This has uncovered a few warnings, and this patch corrects those too.

Signed-off-by: Sujith Sankar 
---
 lib/librte_pmd_enic/Makefile|  1 +
 lib/librte_pmd_enic/enic.h  |  1 +
 lib/librte_pmd_enic/enic_main.c | 12 
 3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/lib/librte_pmd_enic/Makefile b/lib/librte_pmd_enic/Makefile
index a2a623f..3271960 100644
--- a/lib/librte_pmd_enic/Makefile
+++ b/lib/librte_pmd_enic/Makefile
@@ -39,6 +39,7 @@ LIB = librte_pmd_enic.a

 CFLAGS += -I$(RTE_SDK)/lib/librte_pmd_enic/vnic/
 CFLAGS += -I$(RTE_SDK)/lib/librte_pmd_enic/
+CFLAGS += -I$(RTE_SDK)/lib/librte_eal/linuxapp/eal/
 CFLAGS += -O3
 CFLAGS += $(WERROR_FLAGS) -Wno-strict-aliasing

diff --git a/lib/librte_pmd_enic/enic.h b/lib/librte_pmd_enic/enic.h
index c43417c..c692bab 100644
--- a/lib/librte_pmd_enic/enic.h
+++ b/lib/librte_pmd_enic/enic.h
@@ -182,6 +182,7 @@ extern void enic_dev_stats_get(struct enic *enic,
struct rte_eth_stats *r_stats);
 extern void enic_dev_stats_clear(struct enic *enic);
 extern void enic_add_packet_filter(struct enic *enic);
+extern void *enic_err_intr_handler(void *arg);
 extern void enic_set_mac_address(struct enic *enic, uint8_t *mac_addr);
 extern void enic_del_mac_address(struct enic *enic);
 extern unsigned int enic_cleanup_wq(struct enic *enic, struct vnic_wq *wq);
diff --git a/lib/librte_pmd_enic/enic_main.c b/lib/librte_pmd_enic/enic_main.c
index e4f43c5..469cb6c 100644
--- a/lib/librte_pmd_enic/enic_main.c
+++ b/lib/librte_pmd_enic/enic_main.c
@@ -39,6 +39,7 @@
 #include 
 #include 
 #include 
+#include 

 #include 
 #include 
@@ -46,6 +47,7 @@
 #include 
 #include 
 #include 
+#include 

 #include "enic_compat.h"
 #include "enic.h"
@@ -561,6 +563,7 @@ enic_free_consistent(__rte_unused struct rte_pci_device *hwdev,
/* Nothing to be done */
 }

+#ifndef VFIO_PRESENT
 static void
 enic_intr_handler(__rte_unused struct rte_intr_handle *handle,
void *arg)
@@ -572,6 +575,7 @@ enic_intr_handler(__rte_unused struct rte_intr_handle *handle,

enic_log_q_error(enic);
 }
+#endif

 int enic_enable(struct enic *enic)
 {
@@ -978,12 +982,13 @@ static void enic_eventfd_init(struct enic *enic)
 void *enic_err_intr_handler(void *arg)
 {
struct enic *enic = (struct enic *)arg;
-   unsigned int intr = enic_msix_err_intr(enic);
-   ssize_t size;
uint64_t data;

while (1) {
-   size = read(enic->eventfd, &data, sizeof(data));
+   if (-1 == read(enic->eventfd, &data, sizeof(data))) {
+   dev_err(enic, "eventfd read failed with error %d\n", errno);
+   continue;
+   }
dev_err(enic, "Err intr.\n");
vnic_intr_return_all_credits(&enic->intr);

@@ -1035,7 +1040,6 @@ static int enic_set_intr_mode(struct enic *enic)
int *fds;
int size;
int ret = -1;
-   int index;

if (enic->intr_count < 1) {
dev_err(enic, "Unsupported resource conf.\n");
-- 
1.9.1
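The patch replaces the unused size variable with an explicit check of read() on the eventfd. As a hedged sketch of a slightly stricter variant (Linux-only; the helper name read_eventfd_counter() is hypothetical), the code below retries only on EINTR and hands other errors back to the caller. The loop in the patch logs and continues on every failure, which could spin if the descriptor became permanently invalid.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical helper: read the 8-byte counter from an eventfd.
 * Returns 0 and stores the counter in *out on success, -errno on failure. */
static int read_eventfd_counter(int fd, uint64_t *out)
{
	for (;;) {
		ssize_t n = read(fd, out, sizeof(*out));

		if (n == (ssize_t)sizeof(*out))
			return 0;
		if (n < 0 && errno == EINTR)
			continue;                    /* interrupted by a signal: retry */
		return n < 0 ? -errno : -EIO;    /* short read should not happen */
	}
}
```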

[dpdk-dev] [PATCH v2] doc: add known issue for iommu and igb_uio

2014-12-12 Thread Gonzalez Monroy, Sergio

> From: Nicolas Dichtel [mailto:nicolas.dichtel at 6wind.com]
> Sent: Friday, December 12, 2014 1:28 PM
> 
> On 12/12/2014 14:20, Gonzalez Monroy, Sergio wrote:
> >> From: Nicolas Dichtel [mailto:nicolas.dichtel at 6wind.com]
> >> Sent: Friday, December 12, 2014 1:00 PM
> >>
> >> On 12/12/2014 13:06, Sergio Gonzalez Monroy wrote:
> >>> mapping for the device on the iommu resulting in memory access errors.
> >> Do you have the linux commit id which introduces the problem? And the
> >> one which solves it?
> >>
> > I do have the commits, but I was not sure if I should add the info (at least
> no previous release note did).
> >
> > Introduced in:
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/iommu/intel-iommu.c?id=816997d03bca9fabcee65f3481eb0297103eceb7
> >
> > Solved in:
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/iommu/intel-iommu.c?id=1196c2fb0407683c2df92d3d09f9144d42830894
> I think it's a very useful info.
> 
Alright, I will add it in the description of the release note.

Any ideas why patchwork is not showing these patches?
Mutt seems to be displaying the symbol fine.

Thanks,
Sergio

> 
> Thank you,
> Nicolas

[dpdk-dev] [PATCH v2] doc: add known issue for iommu and igb_uio

2014-12-12 Thread Gonzalez Monroy, Sergio

> From: Nicolas Dichtel [mailto:nicolas.dichtel at 6wind.com]
> Sent: Friday, December 12, 2014 1:00 PM
> 
> On 12/12/2014 13:06, Sergio Gonzalez Monroy wrote:
> > Known issue regarding iommu/VT-d and igb_uio on Linux kernel version
> > 3.15 to 3.17 where unbinding the device from the driver removes the
> > 1:1
> Do you mean that the problem doesn't exist with a linux 3.18?
> 
Yes,  as it is mentioned in the resolution/workarounds, earlier or later 
kernels do solve the issue.
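Since only kernels 3.15 to 3.17 are affected, a setup tool could warn based on the running kernel's release string (as returned in the release field of uname(2)). A minimal C sketch; kernel_has_iommu_unbind_issue() is hypothetical, and distribution kernels may backport fixes, so the version number is only a heuristic.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if a kernel release string such as
 * "3.16.7-200.fc20.x86_64" falls in the affected 3.15-3.17 range,
 * 0 if it does not, and -1 if the string cannot be parsed. */
static int kernel_has_iommu_unbind_issue(const char *release)
{
	unsigned major, minor;

	if (sscanf(release, "%u.%u", &major, &minor) != 2)
		return -1;
	return major == 3 && minor >= 15 && minor <= 17;
}
```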

> > mapping for the device on the iommu resulting in memory access errors.
> Do you have the linux commit id which introduces the problem? And the one
> which solves it?
> 
I do have the commits, but I was not sure if I should add the info (at least no 
previous release note did).

Introduced in:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/iommu/intel-iommu.c?id=816997d03bca9fabcee65f3481eb0297103eceb7

Solved in:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/iommu/intel-iommu.c?id=1196c2fb0407683c2df92d3d09f9144d42830894

> >
> > Signed-off-by: Sergio Gonzalez Monroy
> > 
> > ---
> Please, also don't forget to explain what you have changed between the v1
> and v2.
> The history should be put here, after the '---', something like:
> 
> v2: update that
> 
I did forget. I will add it for v3.
I need to fix the issue with the patch not showing in patchwork (I think it is
caused by the ® (registered) symbol in a context line).

Regards,
Sergio
> 
> Regards,
> Nicolas

[dpdk-dev] [PATCH v4 00/10] VM Power Management

2014-12-12 Thread Carew, Alan

Hi Paolo,

> 2014-12-09 18:35, Paolo Bonzini:
> >  Did you make any progress in Qemu/KVM community?
> >  We need to be sync'ed up with them to be sure we share the same goal.
> >  I want also to avoid using a solution which doesn't fit with
> >  their plan.
> >  Remember that we already had this problem with ivshmem which was
> >  planned to be dropped.
> > >>>
> > >>> Unfortunately, I have not yet received any feedback:
> > >>> http://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg01103.html
> > >>
> > >> Just to add to what Alan said above, this capability does not exist
> > >> in qemu at the moment, and based on there having been no feedback
> > >> on the qemu mailing list so far, I think it's reasonable to assume
> > >> that it will not be implemented in the immediate future. The VM
> > >> Power Management feature has also been designed to allow easy
> > >> migration to a qemu-based solution when this is supported in
> > >> future. Therefore, I'd be in favour of accepting this feature into DPDK now.
> > >>
> > >> It's true that the implementation is a work-around, but there have
> > >> been similar cases in DPDK in the past. One recent example that
> > >> comes to mind is userspace vhost. The original implementation could
> > >> also be considered a work-around, but it met the needs of many in
> > >> the community. Now, with support for vhost-user in qemu 2.1, that
> > >> implementation is being improved. I'd see VM Power Management
> > >> following a similar path when this capability is supported in qemu.
> >
> > I wonder if this might be papering over a bug in the host cpufreq
> > driver.  If the guest is not doing much and leaving a lot of idle CPU
> > time, the host should scale down the frequency of that CPU.  In the
> > case of pinned VCPUs this should really "just work".  What is the
> > problem that is being solved?
> >
> > Paolo
> 
> Alan, Pablo, please could you explain your logic with VM power
> management?
> 
> --
> Thomas

The problem is deterministic control of host CPU frequency and the DPDK usage
model.
A hands-off power governor will scale based on workload, whether this is a host
application or VM, so no problems or bug there.

Where this solution fits is where an application wants to control its own
power policy. For example, l3fwd_power uses the librte_power library to change
frequency via acpi_cpufreq based on application heuristics, rather than
relying on an inbuilt policy such as ondemand or performance.
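To make "application heuristics" concrete, here is a toy C sketch of such a decision function, not l3fwd_power's actual logic. The names, the burst size and the idle threshold are hypothetical. In a real application the returned action would then be applied through librte_power, for example with rte_power_freq_up() or rte_power_freq_down() for the polling lcore.

```c
#include <stdint.h>

/* Hypothetical frequency decision returned to the caller. */
enum freq_action { FREQ_KEEP, FREQ_UP, FREQ_DOWN };

struct freq_state {
	uint32_t idle_polls;  /* consecutive polls that returned no packets */
};

/* Assumed thresholds, chosen only for illustration. */
#define BURST_SIZE      32u
#define IDLE_POLLS_DOWN 100u

/* Toy heuristic: a full RX burst suggests the queue is backing up, so ask
 * for more speed; a long run of empty polls suggests the core can slow down. */
static enum freq_action freq_decide(struct freq_state *st, uint32_t nb_rx)
{
	if (nb_rx == 0) {
		if (++st->idle_polls >= IDLE_POLLS_DOWN) {
			st->idle_polls = 0;
			return FREQ_DOWN;
		}
		return FREQ_KEEP;
	}
	st->idle_polls = 0;
	return nb_rx >= BURST_SIZE ? FREQ_UP : FREQ_KEEP;
}
```

The point of VM power management is that a guest running logic like this can have its requests carried to the host, instead of relying on the host governor's generic policy.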

This ability has existed in DPDK for host usage for some time and VM power
management allows this use case to be extended to cater for virtual machines
by re-using the librte_power interface to encapsulate the VM->Host
comms and provide an example means of managing such communications.

 I hope this clears it up a bit.

Thanks,
Alan.
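A minimal sketch of the kind of application-driven policy Alan describes, under stated assumptions: it uses a simple heuristic (scale down after a sustained run of empty polls, scale up on a large burst) with made-up thresholds. `freq_up()` and `freq_down()` are hypothetical stand-ins that only move a simulated frequency index; a real application would instead call librte_power functions such as `rte_power_freq_up()` and `rte_power_freq_down()`, which, with this feature, could be serviced from inside a VM.

```c
#include <stdint.h>

/* Hypothetical thresholds, chosen only for illustration. */
#define EMPTY_POLLS_BEFORE_SCALE_DOWN 300u
#define BUSY_BURST_THRESHOLD          24u   /* packets in one RX burst */

/* Simulated per-core state; index 0 is the highest frequency. */
struct core_power_state {
    unsigned freq_idx;      /* current frequency index */
    unsigned max_idx;       /* index of the lowest available frequency */
    uint32_t empty_polls;   /* consecutive polls that returned nothing */
};

/* Stand-ins for librte_power calls: they only move the simulated index. */
static void freq_up(struct core_power_state *s)
{
    if (s->freq_idx > 0)
        s->freq_idx--;
}

static void freq_down(struct core_power_state *s)
{
    if (s->freq_idx < s->max_idx)
        s->freq_idx++;
}

/* Call after every RX burst with the number of packets received. */
static void power_policy_update(struct core_power_state *s, unsigned nb_rx)
{
    if (nb_rx == 0) {
        /* Idle: scale down only after a sustained run of empty polls. */
        if (++s->empty_polls >= EMPTY_POLLS_BEFORE_SCALE_DOWN) {
            freq_down(s);
            s->empty_polls = 0;
        }
        return;
    }
    s->empty_polls = 0;
    /* Heavy traffic: step back up immediately. */
    if (nb_rx >= BUSY_BURST_THRESHOLD)
        freq_up(s);
}
```

The asymmetry (many idle polls before stepping down, an immediate step up under load) is one common choice; it favours latency over power savings.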

[dpdk-dev] [PATCH] igb_uio: fix Xen compatibility with kernel 3.18

2014-12-12 Thread Jincheng Miao

ACK, kernel-3.18.0 drops _PAGE_IOMAP.

On 12/12/2014 03:33 AM, Shu Shen wrote:
> This patch fixes a build failure caused by the undefined symbol _PAGE_IOMAP
> with kernel 3.18.
>
> The Xen-specific _PAGE_IOMAP PTE flag was removed in kernel 3.18 and
> could be reused for other purposes in the future. This patch ensures that
> the _PAGE_IOMAP flag is only used for kernels before 3.18.
>
> Signed-off-by: Shu Shen 
> ---
>   lib/librte_eal/linuxapp/igb_uio/compat.h  | 4 
>   lib/librte_eal/linuxapp/igb_uio/igb_uio.c | 2 ++
>   2 files changed, 6 insertions(+)
>
> diff --git a/lib/librte_eal/linuxapp/igb_uio/compat.h b/lib/librte_eal/linuxapp/igb_uio/compat.h
> index 9454382..c1d45a6 100644
> --- a/lib/librte_eal/linuxapp/igb_uio/compat.h
> +++ b/lib/librte_eal/linuxapp/igb_uio/compat.h
> @@ -11,6 +11,10 @@
>   #define pci_cfg_access_unlock pci_unblock_user_cfg_access
>   #endif
>   
> +#if LINUX_VERSION_CODE < KERNEL_VERSION(3, 18, 0)
> +#define HAVE_PTE_MASK_PAGE_IOMAP
> +#endif
> +
>   #ifndef PCI_MSIX_ENTRY_SIZE
>   #define PCI_MSIX_ENTRY_SIZE 16
>   #define  PCI_MSIX_ENTRY_LOWER_ADDR  0
> diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
> index 47ff2f3..60a2db1 100644
> --- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
> +++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
> @@ -287,7 +287,9 @@ igbuio_dom0_mmap_phys(struct uio_info *info, struct vm_area_struct *vma)
>   
>   idx = (int)vma->vm_pgoff;
>   vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
> +#if defined(HAVE_PTE_MASK_PAGE_IOMAP)
>   vma->vm_page_prot.pgprot |= _PAGE_IOMAP;
> +#endif
>   
>   return remap_pfn_range(vma,
>   vma->vm_start,

[dpdk-dev] [PATCH] examples: fix unchecked malloc return value in ip_pipeline

2014-12-12 Thread Bruce Richardson

Static analysis shows that one instance of rte_zmalloc is missing
a return value check in the code. This is fixed by adding a return
value check. The malloc call itself is moved earlier in the function
so that no work is done unless all memory allocation requests have
succeeded, thereby removing the need for rollback on error.

Signed-off-by: Bruce Richardson 
---
 examples/ip_pipeline/cmdline.c | 20 +---
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/examples/ip_pipeline/cmdline.c b/examples/ip_pipeline/cmdline.c
index 13d565e..152acb5 100644
--- a/examples/ip_pipeline/cmdline.c
+++ b/examples/ip_pipeline/cmdline.c
@@ -1093,7 +1093,7 @@ cmd_firewall_add_parsed(
__attribute__((unused)) void *data)
 {
struct cmd_firewall_add_result *params = parsed_result;
-   struct app_rule rule, *old_rule;
+   struct app_rule rule, *old_rule, *new_rule = NULL;
struct rte_mbuf *msg;
struct app_msg_req *req;
struct app_msg_resp *resp;
@@ -1148,6 +1148,18 @@ cmd_firewall_add_parsed(
if (msg == NULL)
rte_panic("Unable to allocate new message\n");

+   /* if we need a new rule structure, allocate it before we go further */
+   if (old_rule == NULL) {
+   new_rule = rte_zmalloc_socket("CLI", sizeof(struct app_rule),
+   RTE_CACHE_LINE_SIZE, rte_socket_id());
+   if (new_rule == NULL) {
+   printf("Cannot allocate memory for new rule\n");
+   rte_ctrlmbuf_free(msg);
+   return;
+   }
+   }
+
+
/* Fill request message */
req = (struct app_msg_req *)rte_ctrlmbuf_data(msg);
req->type = APP_MSG_REQ_FW_ADD;
@@ -1190,12 +1202,6 @@ cmd_firewall_add_parsed(
printf("Request FIREWALL_ADD failed (%u)\n", resp->result);
else {
if (old_rule == NULL) {
-   struct app_rule *new_rule = (struct app_rule *)
-   rte_zmalloc_socket("CLI",
-   sizeof(struct app_rule),
-   RTE_CACHE_LINE_SIZE,
-   rte_socket_id());
-
memcpy(new_rule, &rule, sizeof(rule));
TAILQ_INSERT_TAIL(&firewall_table, new_rule, entries);
n_firewall_rules++;
-- 
1.9.3
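The allocate-first pattern from this patch, reduced to plain C as a sketch. `struct rule`, `struct req_msg`, `handle_add()` and the `request_result` parameter (standing in for the response from the pipeline core) are all hypothetical; `calloc()` stands in for `rte_zmalloc_socket()` and the request buffer for the control mbuf.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the CLI's rule and request message. */
struct rule { int id; int priority; struct rule *next; };
struct req_msg { unsigned char payload[64]; };

static struct rule *rule_list;   /* simplified firewall table */
static unsigned n_rules;

static int handle_add(const struct rule *r, int rule_exists, int request_result)
{
    struct rule *new_rule = NULL;
    struct req_msg *msg = calloc(1, sizeof(*msg));

    if (msg == NULL)
        return -1;

    /* Allocate everything up front, before any request is sent. */
    if (!rule_exists) {
        new_rule = calloc(1, sizeof(*new_rule));
        if (new_rule == NULL) {
            free(msg);          /* nothing else to undo yet */
            return -1;
        }
    }

    /* Fill the request (here: just copy the rule bytes), then release it. */
    memcpy(msg->payload, r, sizeof(*r));
    free(msg);

    if (request_result != 0) {
        free(new_rule);         /* free(NULL) is a no-op */
        return -1;
    }

    /* Success: publish the pre-allocated rule; no failure point is left. */
    if (new_rule != NULL) {
        *new_rule = *r;
        new_rule->next = rule_list;
        rule_list = new_rule;
        n_rules++;
    }
    return 0;
}
```

In the sketch the failed-request path also frees `new_rule`. Since the patched function now allocates the rule before sending the request, it may be worth checking whether its "FIREWALL_ADD failed" branch needs a matching `rte_free()`.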

[dpdk-dev] [PATCH] examples/vhost: Fix vlan offload issue

2014-12-12 Thread Ouyang Changchun

The following commit breaks the vm2vm hard mode test cases:
commit db4014f2b65cb31bf209cadd5bcec778ca137fe2
Author: Huawei Xie 
Date:   Thu Nov 13 06:34:07 2014 +0800
examples/vhost: use factorized default Rx/Tx configuration

Investigation shows that VLAN offload needs to be enabled, since it is turned
off by default and Tx needs it, especially when vm2vm is in hard mode.

Signed-off-by: Changchun Ouyang 
---
 examples/vhost/main.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index 9331910..04f0118 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -390,6 +390,9 @@ port_init(uint8_t port)
txconf = &dev_info.default_txconf;
rxconf->rx_drop_en = 1;

+   /* Enable vlan offload */
+   txconf->txq_flags &= ~ETH_TXQ_FLAGS_NOVLANOFFL;
+
/*
 * Zero copy defers queue RX/TX start to the time when guest
 * finishes its startup and packet buffers from that guest are
-- 
1.8.4.2
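The `txq_flags` field is a bitmask in which each set `NO*` bit turns an offload off, so clearing a single bit re-enables just that offload and leaves the driver's other defaults alone. The flag values below are local, illustrative definitions (prefixed `SKETCH_`), not the ones from `rte_ethdev.h`.

```c
#include <stdint.h>

/* Illustrative bit values only; the real ones live in rte_ethdev.h. */
#define SKETCH_TXQ_NOMULTSEGS 0x0001u
#define SKETCH_TXQ_NOVLANOFFL 0x0100u
#define SKETCH_TXQ_NOXSUMTCP  0x0800u

/* Clear only the "no VLAN offload" bit, keeping every other flag as the
 * default TX configuration set it. */
static uint32_t enable_vlan_offload(uint32_t txq_flags)
{
    return txq_flags & ~SKETCH_TXQ_NOVLANOFFL;
}
```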

[dpdk-dev] [PATCH v2] doc: add known issue for iommu and igb_uio

2014-12-12 Thread Sergio Gonzalez Monroy

Known issue regarding iommu/VT-d and igb_uio on Linux kernel version
3.15 to 3.17 where unbinding the device from the driver removes the 1:1
mapping for the device on the iommu resulting in memory access errors.

Signed-off-by: Sergio Gonzalez Monroy 
---
 doc/guides/rel_notes/known_issues.rst | 34 ++
 1 file changed, 34 insertions(+)

diff --git a/doc/guides/rel_notes/known_issues.rst b/doc/guides/rel_notes/known_issues.rst
index 8ef654a..a8aab52 100644
--- a/doc/guides/rel_notes/known_issues.rst
+++ b/doc/guides/rel_notes/known_issues.rst
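@@ -1026,3 +1026,37 @@ Stopping the port does not down the link on Intel® 40G ethernet controller
 | Driver/Module                  | Poll Mode Driver (PMD)                                                           |
 |                                |                                                                                  |
 +--------------------------------+----------------------------------------------------------------------------------+
+
+Devices bound to igb_uio with VT-d enabled do not work on Linux* kernel 3.15-3.17
+---------------------------------------------------------------------------------
+
++--------------------------------+----------------------------------------------------------------------------------+
+| Title                          | Devices bound to igb_uio with VT-d enabled do not work on Linux* kernel 3.15-3.17|
+|                                |                                                                                  |
++================================+==================================================================================+
+| Reference #                    | IXA00373938                                                                      |
+|                                |                                                                                  |
++--------------------------------+----------------------------------------------------------------------------------+
+| Description                    | When VT-d is enabled (iommu=pt intel_iommu=on), devices are 1:1 mapped.          |
+|                                | In the Linux* kernel, unbinding devices from drivers removes that mapping, which |
+|                                | results in IOMMU errors.                                                         |
+|                                |                                                                                  |
++--------------------------------+----------------------------------------------------------------------------------+
+| Implication                    | Devices will not be allowed to access memory.                                    |
+|                                |                                                                                  |
++--------------------------------+----------------------------------------------------------------------------------+
+| Resolution/ Workaround         | Use earlier or later kernel versions, or avoid driver binding on boot by         |
+|                                | blacklisting the driver modules.                                                 |
+|                                | e.g. in the case of ixgbe, we can pass the kernel command line option:           |
+|                                |                                                                                  |
+|                                | modprobe.blacklist=ixgbe                                                         |
+|                                |                                                                                  |
+|                                | This way we do not need to unbind the device to bind it to igb_uio.              |
+|                                |                                                                                  |
++--------------------------------+----------------------------------------------------------------------------------+
+| Affected Environment/ Platform | Linux* systems with kernel versions 3.15 to 3.17                                 |
+|                                |                                                                                  |
++--------------------------------+----------------------------------------------------------------------------------+
+| Driver/Module                  | igb_uio module                                                                   |
+|                                |                                                                                  |
++--------------------------------+----------------------------------------------------------------------------------+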
-- 
1.9.3

[dpdk-dev] [PATCH v2 2/2] doc: Updated image files for rte_mbuf changes in 1.8

2014-12-12 Thread Bruce Richardson

The two image files showing the structure of the rte_mbuf data
structure required some minor updates to take account of the changes
introduced to the structure in the 1.8 release.

Signed-off-by: Bruce Richardson 
---
Change in V2:
* Made minor changes to the position of some text to try and work around
the different rendering of the svg files in different browsers, which led
to text being truncated unexpectedly.
---
 doc/guides/prog_guide/img/mbuf1.svg | 44 --
 doc/guides/prog_guide/img/mbuf2.svg | 61 ++---
 2 files changed, 49 insertions(+), 56 deletions(-)

diff --git a/doc/guides/prog_guide/img/mbuf1.svg b/doc/guides/prog_guide/img/mbuf1.svg
index 0b8ff00..70507a3 100644
--- a/doc/guides/prog_guide/img/mbuf1.svg
+++ b/doc/guides/prog_guide/img/mbuf1.svg
@@ -48,7 +48,7 @@
height="288.34286"
id="svg3868"
version="1.1"
-   inkscape:version="0.48.4 r9939"
+   inkscape:version="0.48.5 r10040"
sodipodi:docname="mbuf1.svg"
sodipodi:version="0.32"
inkscape:output_extension="org.inkscape.output.svg.inkscape">
@@ -328,16 +328,16 @@
  inkscape:pageopacity="0.0"
  inkscape:pageshadow="2"
  inkscape:zoom="2.8"
- inkscape:cx="424.95386"
+ inkscape:cx="344.29455"
  inkscape:cy="143.63151"
  inkscape:document-units="px"
  inkscape:current-layer="layer1"
  showgrid="false"
- inkscape:window-width="1650"
- inkscape:window-height="1059"
- inkscape:window-x="177"
- inkscape:window-y="111"
- inkscape:window-maximized="0"
+ inkscape:window-width="1920"
+ inkscape:window-height="1017"
+ inkscape:window-x="1592"
+ inkscape:window-y="285"
+ inkscape:window-maximized="1"
  fit-margin-top="0.1"
  fit-margin-left="0.1"
  fit-margin-right="0.1"
@@ -416,17 +416,17 @@
 rte_mbuf (type is pkt)
+ style="font-weight:bold">struct rte_mbuf 
 rte_pktmbuf_mtod(m)or m->pkt.data
+ y="119.50503"
+ id="tspan5223">rte_pktmbuf_mtod(m)
 m->pkt.next = NULL
+ x="83.928574"
+ y="284.14789">m->pkt.next = NULL
 tailroom
 multi-segmented rte_mbuf (type is pkt)
+ style="font-weight:bold">multi-segmented rte_mbuf
 rte_pktmbuf_mtod(m)or m->pkt.data
+ x="78.793297"
+ y="498.27075"
+ id="tspan5223-9">rte_pktmbuf_mtod(m)
 rte_pktmbuf_pktlen(m) = rte_pktmbuf_datalen(m) 
+rte_pktmbuf_datalen(mseg2) + rte_pktmbuf_datalen(mseg3)
+ x="233.53358"
+ y="483.38562"
+ id="tspan6985">rte_pktmbuf_datalen(mseg2) + rte_pktmbuf_datalen(mseg3)
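The updated figure labels the total length of a multi-segment packet as the sum of each segment's data length. A minimal sketch with a hypothetical, stripped-down segment type (not the real `struct rte_mbuf`) that walks the `next` pointers and adds up `data_len`:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical segment: only the fields the figure talks about. */
struct seg {
    uint16_t data_len;      /* bytes of packet data in this segment */
    struct seg *next;       /* next segment, NULL for the last one */
};

/* Walk the chain and add up each segment's data length. In DPDK the
 * first segment caches this total (pkt_len); here it is recomputed. */
static uint32_t chain_pkt_len(const struct seg *m)
{
    uint32_t total = 0;

    for (; m != NULL; m = m->next)
        total += m->data_len;
    return total;
}
```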

[dpdk-dev] [PATCH] doc: Add known issue for iommu and igb_uio

2014-12-12 Thread Sergio Gonzalez Monroy

Known issue regarding iommu/VT-d and igb_uio on Linux kernel version
3.15 to 3.17 where unbinding the device from the driver removes the 1:1
mapping for the device on the iommu resulting in memory access errors.

Signed-off-by: Sergio Gonzalez Monroy 
---
 doc/guides/rel_notes/known_issues.rst | 32 
 1 file changed, 32 insertions(+)

diff --git a/doc/guides/rel_notes/known_issues.rst b/doc/guides/rel_notes/known_issues.rst
index 8ef654a..72bd0de 100644
--- a/doc/guides/rel_notes/known_issues.rst
+++ b/doc/guides/rel_notes/known_issues.rst
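@@ -1026,3 +1026,35 @@ Stopping the port does not down the link on Intel® 40G ethernet controller
 | Driver/Module                  | Poll Mode Driver (PMD)                                                           |
 |                                |                                                                                  |
 +--------------------------------+----------------------------------------------------------------------------------+
+
+Devices bound to igb_uio with VT-d enabled do not work on Linux* kernel 3.15-3.17
+---------------------------------------------------------------------------------
+
++--------------------------------+----------------------------------------------------------------------------------+
+| Title                          | Devices bound to igb_uio with VT-d enabled do not work on Linux* kernel 3.15-3.17|
+|                                |                                                                                  |
++================================+==================================================================================+
+| Reference #                    | IXA00373938                                                                      |
+|                                |                                                                                  |
++--------------------------------+----------------------------------------------------------------------------------+
+| Description                    | When VT-d is enabled (iommu=pt intel_iommu=on), devices are 1:1 mapped.          |
+|                                | In the Linux* kernel, unbinding devices from drivers removes that mapping, which |
+|                                | results in IOMMU errors.                                                         |
+|                                |                                                                                  |
++--------------------------------+----------------------------------------------------------------------------------+
+| Implication                    | Devices will not be allowed to access memory.                                    |
+|                                |                                                                                  |
++--------------------------------+----------------------------------------------------------------------------------+
+| Resolution/ Workaround         | Use earlier or later kernel versions, or avoid driver binding on boot by         |
+|                                | blacklisting the driver modules.                                                 |
+|                                | e.g. in the case of ixgbe, we can pass the kernel command line option:           |
+|                                | modprobe.blacklist=ixgbe                                                         |
+|                                | This way we do not need to unbind the device to bind it to igb_uio.              |
+|                                |                                                                                  |
++--------------------------------+----------------------------------------------------------------------------------+
+| Affected Environment/ Platform | Linux* systems with kernel versions 3.15 to 3.17                                 |
+|                                |                                                                                  |
++--------------------------------+----------------------------------------------------------------------------------+
+| Driver/Module                  | igb_uio module                                                                   |
+|                                |                                                                                  |
++--------------------------------+----------------------------------------------------------------------------------+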
-- 
1.9.3

[dpdk-dev] [PATCH 0/2 v4] Fix two compile issues with i686 platform

2014-12-12 Thread Neil Horman

On Fri, Dec 12, 2014 at 04:09:46PM +0100, Thomas Monjalon wrote:
> 2014-12-12 06:38, Neil Horman:
> > On Thu, Dec 11, 2014 at 10:21:44PM +0100, Thomas Monjalon wrote:
> > > 2014-12-11 15:28, Qiu, Michael:
> > > > On 2014/12/11 21:26, Neil Horman wrote:
> > > > > On Thu, Dec 11, 2014 at 01:56:06AM +0100, Thomas Monjalon wrote:
> > > > >>> These two issues are both introduced by commit b77b5639:
> > > > >>> mem: add huge page sizes for IBM Power
> > > > >>>
> > > > >>> Michael Qiu (2):
> > > > >>>   Fix compile issue with hugepage_sz in 32-bit system
> > > > >>>   Fix compile issue of eal with icc compile
> > > > >> Acked-by: Thomas Monjalon 
> > > > >>
> > > > >> Applied
> > > > >>
> > > > >> Thanks
> > > > >>
> > > > > Wait, why did you apply this patch?  We had outstanding debate on it, and
> > > > > Michael indicated he was testing a new version of the patch.
> > > > 
> > > > Yes, I tested the solution you suggested :) and it mostly works, but with a
> > > > little issue.
> > > > What I re-posted is not the old version.
> > > 
> > > Neil, v4 is a new version implementing what you suggested.
> > > There was no comment and it looks good so I applied it.
> > >  
> > > > Did you take a look at it?
> > > 
> > I didn't.  Apologies, I see the v4 now.  That said, something is off.  If you
> > look at the list archives, I see patch 0/2 v4 in the list, but not 1/2 or 2/2;
> > there's no actual patch that got posted.  Was it sent to you privately?
> 
> No, they are public and you are Cc'd.
> 
> > > I think Neil missed the v4. Sorry to not have pinged you, I wanted rc4 for
> > > validation at this time.
> > > Neil do you agree this version is OK or do you see some issue to fix?
> > > 
> > Again, I think Michael's send went sideways.  0/4 went to the list but the actual
> > patches only went to you Thomas.  Please post them to the list.
> 
> They were correctly posted:
> http://thread.gmane.org/gmane.comp.networking.dpdk.devel/9282/focus=9754
> 
Hmm, I apologize.  Somehow these haven't shown up in my reader.  I must have a
bogus filter somewhere.

Looking at the patch, it looks good.  Thank you, and sorry for the noise.

For the record.
Acked-by: Neil Horman 

> -- 
> Thomas
>

[dpdk-dev] [PATCH RFC v2 08/12] lib/librte_vhost: vhost-user support

2014-12-12 Thread Linhaifeng



On 2014/12/12 1:13, Xie, Huawei wrote:
>>
>> Only one vhost-user port is supported?
> 
> Do you mean vhost server by "port"?
> If that is the case, yes, currently only one vhost server is supported for multiple
> virtio devices.
> As stated in the cover letter, we have a requirement and a plan for multiple
> server support, though I am not sure if it is absolutely necessary.
> 
I think that if you also want to support client mode (qemu as server), it is
necessary to implement multiple servers.

>>
>> Can you mmap the region if its gpa is 0? When I run a VM with two NUMA nodes
>> (qemu creates two hugepage files), mmap always fails for the region
>> whose gpa is 0.
> 
> The current implementation doesn't assume there is only one huge page file
> backing the guest memory.
> It maps every region using the fd of that region.
> Could you please paste your guest VM command line here?
> 
>>
>> BTW, can we ensure that the memory regions cover all of the hugepage memory
>> of the VM?
> 
> I think so, because virtio devices could use any normal guest memory, but we
> needn't ensure that.
> We only need to map the regions passed to us from qemu vhost, which should be
> enough to translate the GPAs in the vring from virtio in the guest; otherwise it
> is a bug in qemu vhost.
> 
> 

-- 
Regards,
Haifeng
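Huawei's point is that only the regions qemu passes need to be mapped, because every GPA found in a vring should fall inside one of them. A minimal sketch of that per-region translation, assuming a hypothetical region record (the field names are illustrative, not librte_vhost's). Note that a region starting at GPA 0 is handled exactly like any other.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: one entry per memory region qemu passes over. */
struct mem_region {
    uint64_t guest_phys_addr;   /* first GPA covered by the region */
    uint64_t size;              /* region length in bytes */
    uint64_t host_user_addr;    /* where this process mmap'ed the region */
};

/* Translate a guest physical address to a host virtual address.
 * Returns 0 when the GPA lies outside every known region. */
static uint64_t gpa_to_vva(const struct mem_region *regs, size_t n, uint64_t gpa)
{
    for (size_t i = 0; i < n; i++) {
        const struct mem_region *r = &regs[i];

        /* Written as a subtraction so that base + size cannot overflow. */
        if (gpa >= r->guest_phys_addr && gpa - r->guest_phys_addr < r->size)
            return r->host_user_addr + (gpa - r->guest_phys_addr);
    }
    return 0;
}
```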

[dpdk-dev] A question about hugepage initialization time

2014-12-12 Thread Bruce Richardson

On Fri, Dec 12, 2014 at 04:07:40AM +, László Vadkerti wrote:
> On Thu, 11 Dec,  2014, Bruce Richardson wrote:
> > On Wed, Dec 10, 2014 at 07:16:59PM +, László Vadkerti wrote:
> > >
> > > On Wed, 10 Dec 2014, Bruce Richardson wrote:
> > >
> > > > On Wed, Dec 10, 2014 at 09:29:26AM -0500, Neil Horman wrote:
> > > >> On Wed, Dec 10, 2014 at 10:32:25AM +, Bruce Richardson wrote:
> > > >>> On Tue, Dec 09, 2014 at 02:10:32PM -0800, Stephen Hemminger wrote:
> > >  On Tue, 9 Dec 2014 11:45:07 -0800 &rew
> > >   wrote:
> > > 
> > > >> Hey Folks,
> > > >>
> > > >> Our DPDK application deals with very large in memory data
> > > >> structures, and can potentially use tens or even hundreds of
> > gigabytes of hugepage memory.
> > > >> During the course of development, we've noticed that as the
> > > >> number of huge pages increases, the memory initialization time
> > > >> during EAL init gets to be quite long, lasting several minutes
> > > >> at present.  The growth in init time doesn't appear to be linear,
> > which is concerning.
> > > >>
> > > >> This is a minor inconvenience for us and our customers, as
> > > >> memory initialization makes our boot times a lot longer than it
> > > >> would otherwise be.  Also, my experience has been that really
> > > >> long operations often are hiding errors - what you think is
> > > >> merely a slow operation is actually a timeout of some sort,
> > > >> often due to misconfiguration. This leads to two
> > > >> questions:
> > > >>
> > > >> 1. Does the long initialization time suggest that there's an
> > > >> error happening under the covers?
> > > >> 2. If not, is there any simple way that we can shorten memory
> > > >> initialization time?
> > > >>
> > > >> Thanks in advance for your insights.
> > > >>
> > > >> --
> > > >> Matt Laswell
> > > >> laswell at infiniteio.com
> > > >> infinite io, inc.
> > > >>
> > > >
> > > > Hello,
> > > >
> > > > please find some quick comments on the questions:
> > > > 1.) By our experience long initialization time is normal in case
> > > > of large amount of memory. However this time depends on some
> > things:
> > > > - number of hugepages (pagefault handled by kernel is pretty
> > > > expensive)
> > > > - size of hugepages (memset at initialization)
> > > >
> > > > 2.) Using 1G pages instead of 2M will reduce the initialization
> > > > time significantly. Using wmemset instead of memset adds an
> > > > additional 20-30% boost by our measurements. Or, just by
> > > > touching the pages but not cleaning them you can have still some
> > > > more speedup. But in this case your layer or the applications
> > > > above need to do the cleanup at allocation time (e.g. by using
> > rte_zmalloc).
> > > >
> > > > Cheers,
> > > > &rew
> > > 
> > >  I wonder if the whole rte_malloc code is even worth it with a
> > >  modern kernel with transparent huge pages? rte_malloc adds very
> > >  little value and is less safe and slower than glibc or other
> > >  allocators. Plus you lose the ability to get all the benefit out of
> > valgrind or electric fence.
> > > >>>
> > > >>> While I'd dearly love to not have our own custom malloc lib to
> > > >>> maintain, for DPDK multiprocess, rte_malloc will be hard to
> > > >>> replace as we would need a replacement solution that similarly
> > > >>> guarantees that memory mapped in process A is also available at
> > > >>> the same address in process B. :-(
> > > >>>
> > > >> Just out of curiosity, why even bother with multiprocess support?
> > > >> What you're talking about above is a multithread model, and you're
> > > >> shoehorning multiple processes into it.
> > > >> Neil
> > > >>
> > > >
> > > > Yep, that's pretty much what it is alright. However, this
> > > > multiprocess support is very widely used by our customers in
> > > > building their applications, and has been in place and supported
> > > > since some of the earliest DPDK releases. If it is to be removed, it
> > > > needs to be replaced by something that provides equivalent
> > > > capabilities to application writers (perhaps something with more
> > > > fine-grained sharing
> > > > etc.)
> > > >
> > > > /Bruce
> > > >
> > >
> > > It is probably time to start discussing how to pull in our multi
> > > process and memory management improvements we were talking about in
> > > our DPDK Summit presentation:
> > > https://www.youtube.com/watch?v=907VShi799k#t=647
> > >
> > > Multi-process model could have several benefits mostly in the high
> > > availability area (telco requirement) due to better separation,
> > > controlling permissions (per process RO or RW page mappings), single
> > > process restartability, improved startup and core dumping time etc.
> > >
> > > As a summary of our memory management additions, it allows an
> > > appl
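
Two points from the reply above can be made concrete. First, the number of pages (and so, roughly, the number of page faults during init) falls sharply with page size: 64 GiB needs 32768 pages of 2 MiB but only 64 pages of 1 GiB. Second, a mapping can be faulted in by writing one byte per page instead of `memset()`-ing every byte, leaving any zeroing to allocation time (for example via `rte_zmalloc`). The sketch uses an ordinary anonymous `mmap()` with the system page size rather than hugetlbfs so that it runs anywhere; `pages_needed()` and `prefault()` are hypothetical helpers, not EAL functions.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE         /* for MAP_ANONYMOUS under strict C modes */
#endif
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Number of pages needed to back `bytes` with pages of `page_sz`. */
static uint64_t pages_needed(uint64_t bytes, uint64_t page_sz)
{
    return (bytes + page_sz - 1) / page_sz;
}

/* Fault in every page of a mapping by writing a single byte per page,
 * instead of clearing the whole region. Returns the pages touched. */
static size_t prefault(void *base, size_t len, size_t page_sz)
{
    volatile uint8_t *p = base;  /* volatile: keep the stores */
    size_t touched = 0;

    for (size_t off = 0; off < len; off += page_sz) {
        p[off] = 0;
        touched++;
    }
    return touched;
}
```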

[dpdk-dev] [PATCH 3/3] unit tests add mode 4 unit test

2014-12-12 Thread Michal Jastrzebski

This patch adds unit tests for mode 4. They are split into a separate
file to avoid problems with other modes that do not need to
look into packet payloads.
This patch also changes the maximum number of ports used in the tests
for bonding modes 0-3 from 16 to 6.
Additionally, some typos are fixed.

Signed-off-by: Pawel Wodkowski 
---
 app/test/Makefile  |1 +
 app/test/test.h|  111 +--
 app/test/test_link_bonding.c   |2 +-
 app/test/test_link_bonding_mode4.c | 1412 
 4 files changed, 1480 insertions(+), 46 deletions(-)
 create mode 100644 app/test/test_link_bonding_mode4.c

diff --git a/app/test/Makefile b/app/test/Makefile
index 4311f96..ee0e95a 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -129,6 +129,7 @@ SRCS-y += virtual_pmd.c
 SRCS-y += packet_burst_generator.c
 SRCS-$(CONFIG_RTE_LIBRTE_ACL) += test_acl.c
 SRCS-$(CONFIG_RTE_LIBRTE_PMD_BOND) += test_link_bonding.c
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_BOND) += test_link_bonding_mode4.c
 SRCS-$(CONFIG_RTE_LIBRTE_PMD_RING) += test_pmd_ring.c
 SRCS-$(CONFIG_RTE_LIBRTE_KVARGS) += test_kvargs.c

diff --git a/app/test/test.h b/app/test/test.h
index 72e67b9..53ef11d 100644
--- a/app/test/test.h
+++ b/app/test/test.h
@@ -36,62 +36,80 @@

 #include 

-#define TEST_ASSERT(cond, msg, ...) do {   \
-   if (!(cond)) {  \
-   printf("TestCase %s() line %d failed: " \
-   msg "\n", __func__, __LINE__, ##__VA_ARGS__);   \
-   return -1;  \
-   }   \
+#define TEST_SUCCESS  (0)
+#define TEST_FAILED  (-1)
+
+/* Before including test.h file you can define
+ * TEST_TRACE_FAILURE(_file, _line, _func) macro to better trace/debug test
+ * failures. Mostly useful in test development phase. */
+#ifndef TEST_TRACE_FAILURE
+# define TEST_TRACE_FAILURE(_file, _line, _func)
+#endif
+
+#define TEST_ASSERT(cond, msg, ...) do { \
+   if (!(cond)) {   \
+   printf("TestCase %s() line %d failed: "  \
+   msg "\n", __func__, __LINE__, ##__VA_ARGS__);   \
+   TEST_TRACE_FAILURE(__FILE__, __LINE__, __func__);\
+   return TEST_FAILED;  \
+   }\
 } while (0)

-#define TEST_ASSERT_EQUAL(a, b, msg, ...)  {   \
-   if (!(a == b)) {\
-   printf("TestCase %s() line %d failed: " \
-   msg "\n", __func__, __LINE__, ##__VA_ARGS__);   \
-   return -1;  \
-   }   \
+#define TEST_ASSERT_EQUAL(a, b, msg, ...) do {   \
+   if (!(a == b)) { \
+   printf("TestCase %s() line %d failed: "  \
+   msg "\n", __func__, __LINE__, ##__VA_ARGS__);   \
+   TEST_TRACE_FAILURE(__FILE__, __LINE__, __func__);\
+   return TEST_FAILED;  \
+   }\
 } while (0)

-#define TEST_ASSERT_NOT_EQUAL(a, b, msg, ...) do { \
-   if (!(a != b)) {\
-   printf("TestCase %s() line %d failed: " \
-   msg "\n", __func__, __LINE__, ##__VA_ARGS__);   \
-   return -1;  \
-   }   \
+#define TEST_ASSERT_NOT_EQUAL(a, b, msg, ...) do {   \
+   if (!(a != b)) { \
+   printf("TestCase %s() line %d failed: "  \
+   msg "\n", __func__, __LINE__, ##__VA_ARGS__);   \
+   TEST_TRACE_FAILURE(__FILE__, __LINE__, __fun
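
How the new `TEST_TRACE_FAILURE` hook appears intended to be used, as a self-contained sketch: a test file defines the hook before the assert macros, and here the hook records where the last failure happened. The macros are trimmed copies of the patched `test.h` ones; `last_fail_func`, `last_fail_line` and `test_even()` are hypothetical. `##__VA_ARGS__` is a GCC extension, as in the original.

```c
#include <stdio.h>

/* Hypothetical hook: remember where the last failure happened. Defining
 * it first replaces the empty default provided by test.h. */
static const char *last_fail_func;
static int last_fail_line;
#define TEST_TRACE_FAILURE(_file, _line, _func) \
    do { (void)(_file); last_fail_line = (_line); last_fail_func = (_func); } while (0)

/* Trimmed copies of the patched macros, so the sketch stands alone. */
#define TEST_SUCCESS (0)
#define TEST_FAILED  (-1)
#ifndef TEST_TRACE_FAILURE
# define TEST_TRACE_FAILURE(_file, _line, _func)
#endif
#define TEST_ASSERT(cond, msg, ...) do {                          \
    if (!(cond)) {                                                \
        printf("TestCase %s() line %d failed: "                   \
               msg "\n", __func__, __LINE__, ##__VA_ARGS__);      \
        TEST_TRACE_FAILURE(__FILE__, __LINE__, __func__);         \
        return TEST_FAILED;                                       \
    }                                                             \
} while (0)

/* Example test case: fails (and triggers the hook) for odd values. */
static int test_even(int v)
{
    TEST_ASSERT(v % 2 == 0, "value %d is odd", v);
    return TEST_SUCCESS;
}
```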

[dpdk-dev] [PATCH 2/3] PMD ring MAC management, fix initialization, link up/down

2014-12-12 Thread Michal Jastrzebski

  * add MAC management per device
  * fix initialization procedure
  * add link up/down functions

Signed-off-by: Pawel Wodkowski 
Signed-off-by: Tomasz Kulasek 
---
 lib/librte_pmd_ring/rte_eth_ring.c |   62 +---
 1 file changed, 57 insertions(+), 5 deletions(-)

diff --git a/lib/librte_pmd_ring/rte_eth_ring.c b/lib/librte_pmd_ring/rte_eth_ring.c
index 4f1b6ed..b6047cc 100644
--- a/lib/librte_pmd_ring/rte_eth_ring.c
+++ b/lib/librte_pmd_ring/rte_eth_ring.c
@@ -44,6 +44,8 @@
 #define ETH_RING_ACTION_CREATE "CREATE"
 #define ETH_RING_ACTION_ATTACH "ATTACH"

+static const char *ring_ethdev_driver_name = "Ring PMD";
+
 static const char *valid_arguments[] = {
ETH_RING_NUMA_NODE_ACTION_ARG,
NULL
@@ -62,10 +64,11 @@ struct pmd_internals {

struct ring_queue rx_ring_queues[RTE_PMD_RING_MAX_RX_RINGS];
struct ring_queue tx_ring_queues[RTE_PMD_RING_MAX_TX_RINGS];
+
+   struct ether_addr address;
 };


-static struct ether_addr eth_addr = { .addr_bytes = {0} };
 static const char *drivername = "Rings PMD";
 static struct rte_eth_link pmd_link = {
.link_speed = 1,
@@ -121,6 +124,20 @@ eth_dev_stop(struct rte_eth_dev *dev)
 }

 static int
+eth_dev_set_link_down(struct rte_eth_dev *dev)
+{
+   dev->data->dev_link.link_status = 0;
+   return 0;
+}
+
+static int
+eth_dev_set_link_up(struct rte_eth_dev *dev)
+{
+   dev->data->dev_link.link_status = 1;
+   return 0;
+}
+
+static int
 eth_rx_queue_setup(struct rte_eth_dev *dev,uint16_t rx_queue_id,
uint16_t nb_rx_desc __rte_unused,
unsigned int socket_id __rte_unused,
@@ -199,6 +216,20 @@ eth_stats_reset(struct rte_eth_dev *dev)
 }

 static void
+eth_mac_addr_remove(struct rte_eth_dev *dev __rte_unused,
+   uint32_t index __rte_unused)
+{
+}
+
+static void
+eth_mac_addr_add(struct rte_eth_dev *dev __rte_unused,
+   struct ether_addr *mac_addr __rte_unused,
+   uint32_t index __rte_unused,
+   uint32_t vmdq __rte_unused)
+{
+}
+
+static void
 eth_queue_release(void *q __rte_unused) { ; }
 static int
 eth_link_update(struct rte_eth_dev *dev __rte_unused,
@@ -207,6 +238,8 @@ eth_link_update(struct rte_eth_dev *dev __rte_unused,
 static struct eth_dev_ops ops = {
.dev_start = eth_dev_start,
.dev_stop = eth_dev_stop,
+   .dev_set_link_up = eth_dev_set_link_up,
+   .dev_set_link_down = eth_dev_set_link_down,
.dev_configure = eth_dev_configure,
.dev_infos_get = eth_dev_info,
.rx_queue_setup = eth_rx_queue_setup,
@@ -216,6 +249,8 @@ static struct eth_dev_ops ops = {
.link_update = eth_link_update,
.stats_get = eth_stats_get,
.stats_reset = eth_stats_reset,
+   .mac_addr_remove = eth_mac_addr_remove,
+   .mac_addr_add = eth_mac_addr_add,
 };

 int
@@ -229,6 +264,9 @@ rte_eth_from_rings(const char *name, struct rte_ring *const rx_queues[],
struct rte_pci_device *pci_dev = NULL;
struct pmd_internals *internals = NULL;
struct rte_eth_dev *eth_dev = NULL;
+   struct eth_driver *eth_drv = NULL;
+   struct rte_pci_id *id_table = NULL;
+
unsigned i;

/* do some parameter checking */
@@ -251,6 +289,10 @@ rte_eth_from_rings(const char *name, struct rte_ring *const rx_queues[],
if (pci_dev == NULL)
goto error;

+   id_table = rte_zmalloc_socket(name, sizeof(*id_table), 0, numa_node);
+   if (id_table == NULL)
+   goto error;
+
internals = rte_zmalloc_socket(name, sizeof(*internals), 0, numa_node);
if (internals == NULL)
goto error;
@@ -260,6 +302,10 @@ rte_eth_from_rings(const char *name, struct rte_ring *const rx_queues[],
if (eth_dev == NULL)
goto error;

+   eth_drv = rte_zmalloc_socket(name, sizeof(*eth_drv), 0, numa_node);
+   if (eth_drv == NULL)
+   goto error;
+
/* now put it all together
 * - store queue data in internals,
 * - store numa_node info in pci_driver
@@ -278,18 +324,24 @@ rte_eth_from_rings(const char *name, struct rte_ring *const rx_queues[],
internals->tx_ring_queues[i].rng = tx_queues[i];
}

+   eth_drv->pci_drv.name = ring_ethdev_driver_name;
+   eth_drv->pci_drv.id_table = id_table;
+
pci_dev->numa_node = numa_node;
+   pci_dev->driver = &eth_drv->pci_drv;

data->dev_private = internals;
data->port_id = eth_dev->data->port_id;
data->nb_rx_queues = (uint16_t)nb_rx_queues;
data->nb_tx_queues = (uint16_t)nb_tx_queues;
data->dev_link = pmd_link;
-   data->mac_addrs = &eth_addr;
+   data->mac_addrs = &internals->address;

-   eth_dev ->data = data;
-   eth_dev ->dev_ops = &ops;
-   eth_dev ->pci_dev = pci_
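
The core of the `rte_eth_ring.c` change, sketched with hypothetical simplified types: the MAC address moves from one file-scope `eth_addr` shared by every ring device into each device's private data, and link up/down simply flip a status field. With the old shared storage, every ring device's `mac_addrs` pointed at the same object, so changing one address changed them all.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified types; not the real ethdev structures. */
struct sketch_ether_addr { uint8_t addr_bytes[6]; };

struct sketch_ring_dev {
    struct sketch_ether_addr address;    /* per device, like internals->address */
    struct sketch_ether_addr *mac_addrs; /* what the ethdev layer reads */
    int link_status;                     /* 1 = up, 0 = down */
};

static void sketch_dev_init(struct sketch_ring_dev *d)
{
    memset(d, 0, sizeof(*d));
    d->mac_addrs = &d->address;   /* point at this device's own storage */
}

static int sketch_set_link_up(struct sketch_ring_dev *d)
{
    d->link_status = 1;
    return 0;
}

static int sketch_set_link_down(struct sketch_ring_dev *d)
{
    d->link_status = 0;
    return 0;
}
```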

[dpdk-dev] [PATCH 1/3] bond change warning

2014-12-12 Thread Michal Jastrzebski

Remove function name from warning. 

Signed-off-by: Pawel Wodkowski 
---
 lib/librte_pmd_bond/rte_eth_bond_pmd.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/librte_pmd_bond/rte_eth_bond_pmd.c b/lib/librte_pmd_bond/rte_eth_bond_pmd.c
index 539baa4..9169040 100644
--- a/lib/librte_pmd_bond/rte_eth_bond_pmd.c
+++ b/lib/librte_pmd_bond/rte_eth_bond_pmd.c
@@ -891,9 +891,9 @@ bond_ethdev_mode_set(struct rte_eth_dev *eth_dev, int mode)

eth_dev->rx_pkt_burst = bond_ethdev_rx_burst_8023ad;
eth_dev->tx_pkt_burst = bond_ethdev_tx_burst_8023ad;
-   RTE_BOND_LOG(WARNING,
+   RTE_LOG(WARNING, PMD,
"Using mode 4, it is necessary to do TX burst 
and RX burst "
-   "at least every 100ms.");
+   "at least every 100ms.\n");
break;
case BONDING_MODE_ADAPTIVE_TRANSMIT_LOAD_BALANCING:
eth_dev->tx_pkt_burst = bond_ethdev_tx_burst_tlb;
-- 
1.7.9.5

[dpdk-dev] [PATCH 0/3] bond mode 4: add unit tests

2014-12-12 Thread Michal Jastrzebski

These patches add unit tests for mode 4. They also change the ring PMD
to behave more like an ordinary PMD device.

Pawel Wodkowski (3):
  bond-change-warning
  PMD-ring-MAC-management-fix-initialization-link-up-d
  unit-tests-add-mode-4-unit-test

 app/test/Makefile  |1 +
 app/test/test.h|  111 ++-
 app/test/test_link_bonding.c   |2 +-
 app/test/test_link_bonding_mode4.c | 1412 
 lib/librte_pmd_bond/rte_eth_bond_pmd.c |4 +-
 lib/librte_pmd_ring/rte_eth_ring.c |   62 +-
 6 files changed, 1539 insertions(+), 53 deletions(-)
 create mode 100644 app/test/test_link_bonding_mode4.c

-- 
1.7.9.5

[dpdk-dev] [PATCH 14/15] app/test: turn off cpu flag checks for tile architecture

2014-12-12 Thread Neil Horman

On Fri, Dec 12, 2014 at 04:10:21PM +0800, Tony Lu wrote:
> >-Original Message-
> >From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Neil Horman
> >Sent: Thursday, December 11, 2014 9:39 PM
> >To: Tony Lu
> >Cc: dev at dpdk.org
> >Subject: Re: [dpdk-dev] [PATCH 14/15] app/test: turn off cpu flag checks for tile architecture
> >
> >On Thu, Dec 11, 2014 at 12:43:36PM +0800, Tony Lu wrote:
> >> >-Original Message-
> >> >From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Neil Horman
> >> >Sent: Tuesday, December 09, 2014 11:03 PM
> >> >To: Zhigang Lu
> >> >Cc: dev at dpdk.org
> >> >Subject: Re: [dpdk-dev] [PATCH 14/15] app/test: turn off cpu flag checks for tile architecture
> >> >
> >> >On Mon, Dec 08, 2014 at 04:59:37PM +0800, Zhigang Lu wrote:
> >> >> Tile processor doesn't have CPU flag hardware registers, so this
> >> >> patch turns off cpu flag checks for tile.
> >> >>
> >> >> Signed-off-by: Zhigang Lu 
> >> >> Signed-off-by: Cyril Chemparathy 
> >> >> ---
> >> >>  app/test/test_cpuflags.c | 2 +-
> >> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >> >>
> >> >> diff --git a/app/test/test_cpuflags.c b/app/test/test_cpuflags.c
> >> >> index 5aeba5d..da93af5 100644
> >> >> --- a/app/test/test_cpuflags.c
> >> >> +++ b/app/test/test_cpuflags.c
> >> >> @@ -113,7 +113,7 @@ test_cpuflags(void)
> >> >>
> >> >> printf("Check for ICACHE_SNOOP:\t\t");
> >> >> CHECK_FOR_FLAG(RTE_CPUFLAG_ICACHE_SNOOP);
> >> >> -#else
> >> >> +#elif !defined(RTE_ARCH_TILE)
> >> >> printf("Check for SSE:\t\t");
> >> >> CHECK_FOR_FLAG(RTE_CPUFLAG_SSE);
> >> >>
> >> >Please stop this.  It doesn't make sense for a library that supports multiple
> >> >arches; we need some way to generically test for flags that doesn't
> >> >involve forcing applications to do tons of ifdeffing.  Perhaps rte_cpu_get_flag_enabled
> >> >needs to do a flag table lookup based on the detected arch at run
> >> >time, and return the appropriate response.  In the case of tile, it
> >> >can just be an empty table, so 0 is always returned.  But making an application
> >> >responsible for doing arch checks is a guarantee to write non-portable applications.
> >> >Neil
> >> >
> >>
> >> Thanks for taking a look at this.
> >> This change just follows what PPC did in commit 9ae15538. The root cause is
> >Yes, and I objected to it there as well:
> >http://dpdk.org/ml/archives/dev/2014-November/008628.html
> >
> >To which the response was effectively "Sure, we'll do that later".  You're
> >effectively making the same argument.  If no one ever steps up to change the
> >interface when adding a new arch, it will never get done, and we'll have a
> >fragmented cpuflag test mechanism that creates completely non-portable code
> >across arches.
> >
> >> that the test_cpuflags.c explicitly tests X86-specific CPU flags, so we
> >> might need to revise this test case to make it architecture-independent.
> >>
> >Exactly what I said in my email to the powerpc people.  If you're going to add
> >a new arch, and a given interface doesn't support doing so, please try to
> >re-design the interface to make it more friendly; otherwise we'll be left with
> >unmaintainable code.
> 
> Agreed, makes sense.
> 
> >Thinking about it, you probably don't even need to change the api call to do
> >this.  You just need to create a unified map for all flags of all supported
> >arches, that is to say a two-dimensional array with the indices [arch][flag]
> >where the stored value is the arch-specific data to help determine if the
> >feature is supported, or a universal "not supported" flag.
> 
> Yes, in order not to break ACL or other libs/apps, we need to make the flags
> of all supported arches accessible.  But I'm less keen on creating an
> [arch][flag] array: whether a specified flag is supported is checked at
> runtime, so we cannot assign it in a predefined array according to its arch.
> For example, some old x86 processors do not support SSE3.
>
> Instead I prefer a one-dimensional arch-specific [flag] array which contains
> all the flags of all supported arches, where we mark the flags that do not
> belong to the current arch as "not available".
>
> To implement this, we need to move the enum rte_cpu_flag_t from the
> arch-specific rte_cpuflags.c to the generic one, and combine them into one
> enumeration.
>
> ACL's rte_acl_init() itself has a bug: it should check that the return value
> of rte_cpu_get_flag_enabled() is "1", not "!0", as it may return "-EFAULT".
> 

That's all fine with me.  We can debate the relative merits of the
implementation when it's available.  It's getting the interface right that I
think is currently the priority.
Neil

> Thanks
> -Zhigang
> 
>
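A minimal sketch of the one-dimensional flag table Zhigang describes, written as standalone C so it compiles without DPDK. All demo_* names are hypothetical stand-ins, not the rte_cpuflags API, and the probe functions stand in for real per-arch detection (e.g. cpuid on x86). A flag that does not belong to the running arch has a NULL probe and reports 0, and the caller compares against 1 so that an error return such as -EFAULT is not mistaken for "supported".

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical unified enum covering the flags of all supported arches. */
enum demo_cpu_flag {
    DEMO_CPUFLAG_SSE,
    DEMO_CPUFLAG_SSE3,
    DEMO_CPUFLAG_ICACHE_SNOOP,  /* a non-x86 flag */
    DEMO_CPUFLAG_NUMFLAGS
};

struct demo_flag_entry {
    const char *name;
    int (*probe)(void);  /* NULL: flag does not exist on this arch */
};

/* Stand-ins for runtime detection on the current CPU. */
static int probe_sse(void) { return 1; }
static int probe_sse3(void) { return 0; }  /* e.g. an old x86 part */

/* Table as it would look when built for x86. */
static const struct demo_flag_entry demo_flag_table[DEMO_CPUFLAG_NUMFLAGS] = {
    [DEMO_CPUFLAG_SSE]          = { "SSE", probe_sse },
    [DEMO_CPUFLAG_SSE3]         = { "SSE3", probe_sse3 },
    [DEMO_CPUFLAG_ICACHE_SNOOP] = { "ICACHE_SNOOP", NULL },
};

/* 1 = present, 0 = absent or not applicable to this arch,
 * -EFAULT = flag value out of range. */
static int demo_cpu_get_flag_enabled(enum demo_cpu_flag flag)
{
    if ((unsigned)flag >= DEMO_CPUFLAG_NUMFLAGS)
        return -EFAULT;
    if (demo_flag_table[flag].probe == NULL)
        return 0;
    return demo_flag_table[flag].probe() ? 1 : 0;
}

/* Caller side: test for == 1, not != 0, so errors are not read as "yes". */
static int demo_use_sse(void)
{
    return demo_cpu_get_flag_enabled(DEMO_CPUFLAG_SSE) == 1;
}
```

With a table like this, a test such as test_cpuflags.c could iterate over every entry instead of wrapping each arch's checks in #ifdef.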

[dpdk-dev] [PATCH 12/15] eal/tile: add mPIPE buffer stack mempool provider

2014-12-12 Thread Neil Horman

On Fri, Dec 12, 2014 at 04:30:27PM +0800, Tony Lu wrote:
> >-Original Message-
> >From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Neil Horman
> >Sent: Tuesday, December 09, 2014 10:07 PM
> >To: Zhigang Lu
> >Cc: dev at dpdk.org
> >Subject: Re: [dpdk-dev] [PATCH 12/15] eal/tile: add mPIPE buffer stack mempool provider
> >
> >On Mon, Dec 08, 2014 at 04:59:35PM +0800, Zhigang Lu wrote:
> >> TileGX: Modified mempool to allow for variable metadata.
> >> Signed-off-by: Zhigang Lu 
> >> Signed-off-by: Cyril Chemparathy 
> >> ---
> >>  app/test-pmd/mempool_anon.c           |   2 +-
> >>  app/test/Makefile                     |   6 +-
> >>  app/test/test_mempool_tile.c          | 217
> >>  lib/Makefile                          |   5 +
> >>  lib/librte_eal/linuxapp/eal/Makefile  |   4 +
> >>  lib/librte_mempool_tile/Makefile      |  48 +++
> >>  lib/librte_mempool_tile/rte_mempool.c | 381
> >>  lib/librte_mempool_tile/rte_mempool.h | 634
> >>  8 files changed, 1295 insertions(+), 2 deletions(-)
> >>  create mode 100644 app/test/test_mempool_tile.c
> >>  create mode 100644 lib/librte_mempool_tile/Makefile
> >>  create mode 100644 lib/librte_mempool_tile/rte_mempool.c
> >>  create mode 100644 lib/librte_mempool_tile/rte_mempool.h
> >>
> >NAK, this creates an alternate, parallel implementation of the mempool api,
> >that re-implements some aspects of the mempool api, but not others.  This will
> >make for completely non-portable applications (both to and from the tile arch),
> >and create maintenance problems, in that features for mempool will need to be
> >implemented in multiple libraries.
> >
> >I understand wanting to use mpipe, and that's perfectly fine, but creating
> >non-portable apis to do so isn't the right way to go.  Instead, why not just
> >allow applications to use mpipe by initializing it via the gxio library and
> >creating a mempool using the existing library's rte_mempool_xmem_create api
> >call, which allows existing allocated memory space to be managed as a mempool?
> 
> Yes, the mempool we are using is very much tile-specific, as we want to use
> the mpipe hardware-managed buffer pool to implement the mempool, which greatly
> improves the performance of mempool.
>
> As Cyril replied in a previous email:
> The alternative is to not include support for the hardware-managed buffer
> pool, but that decision incurs a significant performance hit.
> 
You're not reading my previous notes clearly.  There is no need to completely
re-implement the mempool interface.  Doing so is completely broken.  There
already exists a mechanism in the existing interface to allow you to pass
pre-allocated memory to mempool for management.  As such, you can use the
existing gxio library to allocate and initialize your mpipe hardware, and use
the existing mempool infrastructure to manage it.

Until then, I reiterate my NAK.

Neil
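To illustrate the idea Neil points to (letting a generic pool manage memory the application has already allocated), here is a minimal free-list pool over a caller-supplied buffer. It is a hypothetical sketch: it is not the rte_mempool_xmem_create() signature or implementation, and in the tile case the buffer would come from the gxio/mPIPE setup, which is assumed and not shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pool: a LIFO free list threaded through the free objects
 * themselves, so no extra metadata memory is needed. */
struct demo_pool {
    void *free_head;  /* first free object, or NULL when empty */
    size_t elt_size;  /* object size after rounding up for alignment */
    unsigned count;   /* total objects carved from the region */
    unsigned avail;   /* objects currently free */
};

/* Carve fixed-size objects out of caller-owned memory [base, base + len).
 * The pool never allocates or frees the region itself.
 * Returns the number of objects created (0 if the region is too small). */
static unsigned demo_pool_init(struct demo_pool *p, void *base, size_t len,
                               size_t elt_size)
{
    const uintptr_t align = sizeof(void *);
    uintptr_t start = ((uintptr_t)base + align - 1) & ~(align - 1);
    uintptr_t end = (uintptr_t)base + len;

    /* A free object stores the list link, so it must hold a pointer
     * and keep pointer alignment. */
    if (elt_size < sizeof(void *))
        elt_size = sizeof(void *);
    elt_size = (elt_size + align - 1) & ~(size_t)(align - 1);

    p->free_head = NULL;
    p->elt_size = elt_size;
    p->count = 0;
    for (uintptr_t obj = start; obj <= end && end - obj >= elt_size;
         obj += elt_size) {
        *(void **)obj = p->free_head;
        p->free_head = (void *)obj;
        p->count++;
    }
    p->avail = p->count;
    return p->count;
}

/* Take one object, or NULL if the pool is exhausted. */
static void *demo_pool_get(struct demo_pool *p)
{
    void *obj = p->free_head;
    if (obj != NULL) {
        p->free_head = *(void **)obj;
        p->avail--;
    }
    return obj;
}

/* Return an object previously obtained from demo_pool_get(). */
static void demo_pool_put(struct demo_pool *p, void *obj)
{
    *(void **)obj = p->free_head;
    p->free_head = obj;
    p->avail++;
}
```

The point of the design is that the pool logic is independent of where the memory came from, which is why a single mempool implementation can serve both ordinary and hardware-provided buffers.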

[dpdk-dev] [PATCH v3] test-pmd: Fix pointer aliasing error

2014-12-12 Thread Qiu, Michael

On 2014/12/12 1:51, r k wrote:
Thomas, Michael,

Wouldn't it cause unaligned memory access (with the new changes as well as the
previous code)? I wonder whether get_unaligned/put_unaligned macros similar to
the ones used in the kernel could be ported to user space.


I don't think it will: everything buf points to is a struct udp_hdr, struct
tcp_hdr, struct ipv6_psd_header or struct ipv4_psd_header, all of which are
aligned to uint16_t.

Thanks
Michael
Thanks,
Ravi

On Wed, Dec 10, 2014 at 4:54 PM, Thomas Monjalon <thomas.monjalon at 6wind.com> wrote:
> > app/test-pmd/csumonly.c: In function 'get_psd_sum':
> > build/include/rte_ip.h:161: error: dereferencing pointer 'u16'
> > does break strict-aliasing rules
> > build/include/rte_ip.h:157: note: initialized from here
> > ...
> >
> > The root cause is that the compiler enables strict aliasing by default,
> > while rte_raw_cksum() converts a 'const char *' to a
> > 'const uint16_t *'.
> >
> > This patch is a workaround.
> >
> > Signed-off-by: Michael Qiu <michael.qiu at intel.com>
> > ---
> > v3 --> v2:
> > use uintptr_t instead of unsigned long to
> > store the pointer.
> >
> > v2 --> v1:
> > Use a workaround instead of turning off the
> > gcc options.
>
> This workaround solves the GCC strict-aliasing compile issue (two pointers of
> different types should not point to the same memory address).
>
> With GCC 4.4.7 it definitely occurs if the flags "-fstrict-aliasing"
> and "-Wall" are used.

Acked-by: Thomas Monjalon <thomas.monjalon at 6wind.com>

Applied with a comment in the code.

Thanks
--
Thomas
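For readers hitting the same warning: one common alternative to the uintptr_t cast (not the fix that was applied here) is to load each 16-bit word with memcpy(), which is valid under strict aliasing and also safe on unaligned buffers, which bears on Ravi's question. The demo_* functions are hypothetical and only mirror the idea behind rte_raw_cksum(); words are summed in host byte order, and an odd trailing byte is padded with a zero byte.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sum a buffer as 16-bit words without dereferencing a uint16_t pointer to
 * char data: memcpy() into a local avoids the aliasing violation and any
 * unaligned load (compilers typically turn it into a plain load).
 * For len <= 64 KiB the 32-bit accumulator cannot overflow. */
static uint32_t demo_raw_sum(const void *buf, size_t len, uint32_t sum)
{
    const unsigned char *p = buf;

    while (len >= 2) {
        uint16_t w;
        memcpy(&w, p, sizeof(w));
        sum += w;
        p += 2;
        len -= 2;
    }
    if (len == 1) {
        uint16_t w = 0;
        memcpy(&w, p, 1);  /* trailing byte, padded with a zero byte */
        sum += w;
    }
    return sum;
}

/* Fold the 32-bit sum into 16 bits with end-around carry. */
static uint16_t demo_reduce(uint32_t sum)
{
    sum = (sum & 0xffff) + (sum >> 16);
    sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}
```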

[dpdk-dev] [PATCH] vmxnet3: set txq_flags in default TX conf

2014-12-12 Thread Zhang, XiaonanX

Tested-by: Xiaonan Zhang 

- Tested Commit: Pablo
- OS: Fedora20 3.15.8-200.fc20.x86_64
- GCC: gcc version 4.8.3 20140624
- CPU: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
- NIC: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection [8086:10fb]
- Default x86_64-native-linuxapp-gcc configuration
- Total 6 cases, 6 passed, 0 failed
- Test environment setup:

- Topology #1: Create 2 VMs (Fedora 20, 64-bit). For each VM, pass through one
  physical port (Niantic 82599) to the VM, and also create one virtual vmxnet3
  device in the VM. Between the two VMs, use one vswitch to connect the 2
  vmxnet3 devices. In summary, PF1 and vmxnet3A are in VM1; PF2 and vmxnet3B
  are in VM2. The traffic flow for l2fwd/l3fwd is:
   Ixia -> PF1 -> vmxnet3A -> vswitch -> vmxnet3B -> PF2 -> Ixia
- Topology #2: Create 1 VM (Fedora 20, 64-bit). In this VM, create 2 vmxnet3
  devices, called vmxnet3A and vmxnet3B; create 2 vswitches, vswitchA
  connecting PF1 and vmxnet3A, and vswitchB connecting PF2 and vmxnet3B. The
  traffic flow is:
   Ixia -> PF1 -> vswitchA -> vmxnet3A -> vmxnet3B -> vswitchB -> PF2 -> Ixia

- Test Case1: L2fwd with Topology#1
  Description: Set up topology #1 (see the prerequisite section), and bind PF1,
  PF2, vmxnet3A and vmxnet3B to the DPDK poll-mode driver (igb_uio).
   Increase the flow to line rate (uni-directional traffic), send the flow at
   different packet sizes (64, 128, 256, 512, 1024, 1280 and 1518 bytes), and
   check the received packets/rate for any unexpected behavior, such as nothing
   being received after N packets.
  Command / instruction:
To run the l2fwd example in 2 VMs:
./build/l2fwd -c f -n 4 -- -p 0x3
- Test IXIA flow prerequisite: Ixia port1 sends 5 packets to PF1, and the flow
should have PF1's MAC as the destination MAC. Check that Ixia port2 has received
the 5 packets.
  Expected test result:
Passed

- Test Case2: L3fwd-VF with Topology#1
  Description: Set up topology #1 (see the prerequisite section), and bind PF1,
  PF2, vmxnet3A and vmxnet3B to the DPDK poll-mode driver (igb_uio).
   Increase the flow to line rate (uni-directional traffic), send the flow at
   different packet sizes (64, 128, 256, 512, 1024, 1280 and 1518 bytes), and
   check the received packets/rate for any unexpected behavior, such as nothing
   being received after N packets.
  Command / instruction:
To run the l3fwd-vf example in 2 VMs:
./build/l3fwd-vf -c 0x6 -n 4 -- -p 0x3 --config "(0,0,1),(1,0,2)"
- Test IXIA flow prerequisite: Ixia port1 sends 5 packets to PF1, and the flow
should have PF1's MAC as the destination MAC and 2.1.1.x as the destination IP.
Check that Ixia port2 has received the 5 packets.
  Expected test result:
Passed

- Test Case3: L2fwd with Topology#2
  Description: Set up topology #2 (see the prerequisite section), and bind
  vmxnet3A and vmxnet3B to the DPDK poll-mode driver (igb_uio).
   Increase the flow to line rate (uni-directional traffic), send the flow at
   different packet sizes (64, 128, 256, 512, 1024, 1280 and 1518 bytes), and
   check the received packets/rate for any unexpected behavior, such as nothing
   being received after N packets.
  Command / instruction:
To run the l2fwd example in VM1:
./build/l2fwd -c f -n 4 -- -p 0x3
- Test IXIA flow prerequisite: Ixia port1 sends 5 packets to port0 (vmxnet3A),
and the flow should have port0's MAC as the destination MAC. Check that Ixia
port2 has received the 5 packets. Do the same from Ixia port2.
  Expected test result:
Passed

- Test Case4: L3fwd-VF with Topology#2
  Description: Set up topology #2 (see the prerequisite section), and bind
  vmxnet3A and vmxnet3B to the DPDK poll-mode driver (igb_uio).
   Increase the flow to line rate (uni-directional traffic), send the flow at
   different packet sizes (64, 128, 256, 512, 1024, 1280 and 1518 bytes), and
   check the received packets/rate for any unexpected behavior, such as nothing
   being received after N packets.
  Command / instruction:
To run the l3fwd-vf example in VM1:
./build/l3fwd-vf -c 0x6 -n 4 -- -p 0x3 --config "(0,0,1),(1,0,2)"
- Test IXIA flow prerequisite: Ixia port1 sends 5 packets to port0 (vmxnet3A),
and the flow should have port0's MAC as the destination MAC and 2.1.1.x as the
destination IP. Check that Ixia port2 has received the 5 packets.
  Expected test result:
Passed

- Test Case5: Timer test with Topology#2
  Description: Set up topology #2 (see the prerequisite section), and bind
  vmxnet3A and vmxnet3B to the DPDK poll-mode driver (igb_uio).
  Command / instruction:
Build the timer sample and run it

[dpdk-dev] [PATCH 0/2 v4] Fix two compile issues with i686 platform

2014-12-12 Thread Neil Horman

On Thu, Dec 11, 2014 at 10:21:44PM +0100, Thomas Monjalon wrote:
> 2014-12-11 15:28, Qiu, Michael:
> > On 2014/12/11 21:26, Neil Horman wrote:
> > > On Thu, Dec 11, 2014 at 01:56:06AM +0100, Thomas Monjalon wrote:
> > >>> These two issues are both introduced by commit b77b5639:
> > >>> mem: add huge page sizes for IBM Power
> > >>>
> > >>> Michael Qiu (2):
> > >>>   Fix compile issue with hugepage_sz in 32-bit system
> > >>>   Fix compile issue of eal with icc compile
> > >> Acked-by: Thomas Monjalon 
> > >>
> > >> Applied
> > >>
> > >> Thanks
> > >>
> > > Wait, why did you apply this patch?  We had outstanding debate on it, and
> > > Michael indicated he was testing a new version of the patch.
> > 
> > Yes, I tested the solution you suggested :) and it mostly works, with one
> > small issue.
> > What I re-posted is not the old version.
> 
> Neil, v4 is a new version implementing what you suggested.
> There was no comment and it looks good so I applied it.
>  
> > Did you take a look at it?
> 
I didn't.  Apologies, I see the v4 now.  That said, something is off.  If you
look at the list archives, I see patch 0/2 v4 in the list, but not 1/2 or 2/2;
there's no actual patch that got posted.  Was it sent to you privately?

> I think Neil missed the v4. Sorry for not pinging you; I wanted rc4 for
> validation at this time.
> Neil, do you agree this version is OK, or do you see some issue to fix?
> 
Again, I think Michael's send went sideways.  0/4 went to the list but the actual
patches only went to you, Thomas.  Please post them to the list.

Neil

> -- 
> Thomas
>

[dpdk-dev] [PATCH] examples/vhost: Fix vlan offload issue

2014-12-12 Thread Fu, JingguoX

Comments:
This patch fixes all the VM-to-VM hard switch cases for virtio.

Basic Information
Patch name: examples/vhost: Fix vlan offload issue
Brief description:  Verify all VM-to-VM cases based on the hard switch
Test Flag:  Tested-by
Tester name:jingguox.fu at intel.com
Test environment:
OS: Fedora20 3.11.10-301.fc20.x86_64
GCC: gcc version 4.8.3 20140911
CPU: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
NIC: 82599ES 10-Gigabit SFI/SFP+ Network Connection [8086:10fb]

Commit ID:  4c8b41715168101dc76fd9b264658837ff54ab76

Detailed Testing Information:
DPDK SW Configuration:
enable CONFIG_RTE_LIBRTE_VHOST
x86_64-native-linuxapp-gcc configuration   

Test Result Summary:Total 6 cases, 6 passed, 0 failed

Test Case - name:
vm to vm by dpdk forward hard switch (virtio one copy suite)
Test Case - Description:
Check that the vhost switch can forward packets received from the first virtio on VM1 to the second virtio on VM2
Test Case - command / instruction:
Start vhost-switch on the host and start testpmd on the guests
On host:
taskset -c 8-10 vhost-switch -c 0xf00 -n 4 --huge-dir /mnt/huge --socket-mem 1024,1024 -- -p 1 --mergeable 0 --zero-copy 0 --vm2vm 2
On guests:
VM1:
./testpmd -c 0xf -n 4 -- -i --txqflags 0x0f00 --eth-peer=0,00:00:00:00:00:0A

Set fwd type tx_first
testpmd>set fwd mac
testpmd>start tx_first

VM2:
testpmd -c 0xf -n 4 -- -i --txqflags 0x0f00
testpmd>set fwd mac
testpmd>start tx_first

Send packets without vlan id: ether|ip|udp packets
Test Case - expected test result:
The packet generator receives the packets from the vf on VM2

Test Case - name:   
vm to vm by linux forward hard switch (virtio one copy suite)
Test Case - Description:
Check that the vhost switch can forward packets received from the first virtio on VM1 to the second virtio on VM2
Test Case - command / instruction:
Start vhost-switch on the host, using the virtio devices as Ethernet devices
On host:
taskset -c 8-10 vhost-switch -c 0xf00 -n 4 --huge-dir /mnt/huge --socket-mem 1024,1024 -- -p 1 --mergeable 0 --zero-copy 0 --vm2vm 2
On guests:
VM1:
ip addr add 192.168.2.2/24 dev eth1
ip neigh add 192.168.2.1 lladdr 00:00:02:00:00:a1 dev eth1
ip link set dev eth1 up
ip addr add 192.168.3.2/24 dev eth0
ip neigh add 192.168.3.1 lladdr 52:00:00:54:00:02 dev eth0
ip link set dev eth0 up
VM2:
ip addr add 192.168.3.2/24 dev eth1
ip neigh add 192.168.3.1 lladdr 00:00:02:00:00:a1 dev eth1
ip link set dev eth1 up
ip addr add 192.168.2.2/24 dev eth0
ip neigh add 192.168.2.1 lladdr 00:00:02:00:00:a1 dev eth0
ip link set dev eth0 up

arp -s 192.168.3.1 00:00:02:0a:0a

Send packets without vlan id: ether|ip|udp packets
Test Case - expected test result:
The packet generator receives the packets from the vf on VM2


Test Case - name:
vm to vm by dpdk forward hard switch (virtio one copy jumbo frame suite)
Test Case - Description:
Check that the vhost switch can forward packets received from the first virtio on VM1 to the second virtio on VM2
Test Case - command / instruction:
Start vhost-switch on the host and start testpmd on the guests
On host:
taskset -c 8-10 vhost-switc

[dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore

2014-12-12 Thread Liang, Cunming

Thanks Mirek. That's a good point which wasn't mentioned in the cover letter.
For 'rte_timer', I only expect it to be used within the 'legacy per-lcore'
pthread. I'd appreciate it if you could give me some cases where that doesn't fit.
If 'rte_timer' has to be used from multiple pthreads, there are some
prerequisites and limitations:
1. Make sure the thread-local variable 'lcore_id' is set correctly (e.g. do
pthread init by rte_pthread_prepare).
2. As 'rte_timer' is not preemptible, when calling
rte_timer_manage()/rte_timer_reset() from multiple pthreads, make sure those
pthreads are not on the same core.

-Cunming

> -Original Message-
> From: Walukiewicz, Miroslaw
> Sent: Thursday, December 11, 2014 5:57 PM
> To: Liang, Cunming; dev at dpdk.org
> Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> 
> Thank you Cunming for explanation.
> 
> What about DPDK timers? They also depend on rte_lcore_id() to avoid spinlocks.
> 
> Mirek
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Cunming Liang
> > Sent: Thursday, December 11, 2014 3:05 AM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> >
> >
> > Scope & Usage Scenario
> > 
> >
> > DPDK usually pins one pthread per core to avoid task switch overhead. This
> > gains a lot of performance, but it's not efficient in all cases. In some
> > cases, it may be too expensive to use a whole core for a lightweight
> > workload. It's a reasonable demand to have multiple threads per core, with
> > each thread sharing the CPU according to an assigned weight.
> >
> > In fact, nothing prevents a user from creating normal pthreads and using
> > cgroups to control the CPU share. One purpose of the patchset is to close
> > the gaps in using more DPDK libraries from normal pthreads. In addition, it
> > demonstrates the performance gain from proactively yielding in the idle
> > loop of packet IO. It also provides several 'rte_pthread_*' APIs to make
> > life easier.
> >
> >
> > Changes to DPDK libraries
> > ==
> >
> > Some DPDK libraries must run in the DPDK environment.
> >
> > # rte_mempool
> >
> > The rte_mempool documentation states that a thread not created by the EAL
> > must not use mempools. The root cause is that the mempool uses a per-lcore
> > cache, and 'rte_lcore_id()' does not return a correct value in such a
> > thread.
> >
> > The patchset changes this a little. The index of the mempool cache is no
> > longer an lcore_id; instead, it is a linear number generated by an
> > allocator. Each legacy EAL per-lcore thread applies for a unique linear id
> > during creation. Each normal pthread that wants to use rte_mempool must
> > apply for a linear id explicitly. The mempool cache thus becomes
> > per-thread, and the linear ID effectively serves as a linear thread id.
> >
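The linear-ID scheme described above can be sketched in standalone C11 as an atomic counter that hands out dense IDs, plus a thread-local slot used to index per-thread caches. The names, the 64-entry cap and the cache layout are hypothetical; the RFC's actual API is not reproduced here. In this simplified sketch, a thread that fails because all IDs are taken still advances the counter on each retry.

```c
#include <stdatomic.h>
#include <stddef.h>

#define DEMO_MAX_LINEAR_ID  64   /* hypothetical upper bound on threads */
#define DEMO_LINEAR_ID_NONE (-1)

/* Next ID to hand out; shared by all threads. */
static atomic_int demo_next_linear_id = 0;

/* Calling thread's ID, or DEMO_LINEAR_ID_NONE before it has applied. */
static _Thread_local int demo_linear_id = DEMO_LINEAR_ID_NONE;

/* Hypothetical per-thread cache, indexed by linear ID instead of lcore_id. */
struct demo_cache {
    unsigned len;
    void *objs[32];
};

struct demo_pool_caches {
    struct demo_cache cache[DEMO_MAX_LINEAR_ID];
};

/* Apply for a unique, dense ID (0, 1, 2, ...). Idempotent per thread.
 * Returns the ID, or DEMO_LINEAR_ID_NONE if all IDs are taken. */
static int demo_linear_id_apply(void)
{
    if (demo_linear_id != DEMO_LINEAR_ID_NONE)
        return demo_linear_id;
    int id = atomic_fetch_add(&demo_next_linear_id, 1);
    if (id >= DEMO_MAX_LINEAR_ID)
        return DEMO_LINEAR_ID_NONE;
    demo_linear_id = id;
    return id;
}

/* Cache for the calling thread, or NULL if it never applied for an ID
 * (such a thread would have to bypass the cache). */
static struct demo_cache *demo_local_cache(struct demo_pool_caches *pc)
{
    int id = demo_linear_id;
    return id == DEMO_LINEAR_ID_NONE ? NULL : &pc->cache[id];
}
```

Because IDs are dense, the cache array can be sized by the thread limit rather than by RTE_MAX_LCORE, and a non-EAL pthread becomes eligible for the cache by calling the apply function once.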
> > However, there's another problem: rte_mempool is not preemptible. The
> > problem comes from rte_ring, so both are discussed in the next section.
> >
> > # rte_ring
> >
> > rte_ring supports multi-producer enqueue and multi-consumer dequeue, but
> > it's not preemptible. This has been discussed before:
> > http://dpdk.org/ml/archives/dev/2013-November/000714.html
> >
> > Let's say there are two pthreads running on the same core doing enqueue on
> > the same rte_ring. If the 1st pthread is preempted by the 2nd pthread after
> > it has already modified prod.head, the 2nd pthread will spin until the 1st
> > one is scheduled again. This wastes time. In addition, if the 2nd pthread
> > has strictly higher priority, it's even worse.
> >
> > But that doesn't mean it can't be used; we just need to narrow down the
> > situations in which it is used by multiple pthreads on the same core.
> > - It CAN be used for any single-producer or single-consumer situation.
> > - It MAY be used by multi-producer/consumer pthreads whose scheduling
> >   policies are all SCHED_OTHER (CFS). Users SHOULD be aware of the
> >   performance penalty before using it.
> > - It MUST NOT be used by multi-producer/consumer pthreads if any of their
> >   scheduling policies is SCHED_FIFO or SCHED_RR.
> >
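The three rules above can be encoded as a small predicate. This is a hypothetical helper, not part of rte_ring; the caller is assumed to collect the scheduling policy of every pthread sharing the core (for example with pthread_getschedparam()).

```c
#include <sched.h>
#include <stdbool.h>

/* Hypothetical check of the RFC's rules for sharing one rte_ring among
 * pthreads on the same core.
 *   multi:    true if this side (enqueue or dequeue) is multi-producer/consumer
 *   policies: scheduling policy of each pthread using the ring on that core
 * Returns true if the usage is allowed (possibly with a performance penalty). */
static bool demo_ring_sharing_ok(bool multi, const int *policies, int n)
{
    if (!multi)
        return true;  /* CAN: single-producer or single-consumer */

    for (int i = 0; i < n; i++) {
        /* MUST NOT: a FIFO/RR thread can keep spinning while the preempted
         * thread that owns prod.head never gets the CPU back. */
        if (policies[i] == SCHED_FIFO || policies[i] == SCHED_RR)
            return false;
    }
    return true;  /* MAY: all SCHED_OTHER, expect a performance penalty */
}
```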
> >
> > Performance
> > ==
> >
> > Introducing task switching costs performance. From the packet IO
> > perspective, we can gain some back by improving the effective IO rate. When
> > a pthread runs an idle loop on an empty rx queue, it should proactively
> > yield. We can also slow down rx for a short while to take more advantage of
> > bulk receiving in the next loop. In practice, increasing the rx ring size
> > also helps to improve the overall throughput.
> >
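A minimal sketch of the proactive yield described above: poll one burst, and if the queue was empty call sched_yield() so other pthreads on the same core can run. The rx callback type, the burst size and the scripted fake receiver are hypothetical stand-ins for a real PMD receive call such as rte_eth_rx_burst().

```c
#include <sched.h>
#include <stdint.h>

#define DEMO_BURST 32  /* hypothetical burst size */

/* Hypothetical receive callback: fills pkts[] and returns how many arrived. */
typedef uint16_t (*demo_rx_fn)(void *ctx, void **pkts, uint16_t max);

/* One iteration of a packet IO loop. If the queue was empty, give the CPU
 * back with sched_yield() instead of spinning, so other pthreads sharing
 * the core get their (cgroup-weighted) share. Returns packets received. */
static uint16_t demo_poll_once(demo_rx_fn rx, void *ctx, unsigned *yields)
{
    void *pkts[DEMO_BURST];
    uint16_t n = rx(ctx, pkts, DEMO_BURST);

    if (n == 0) {
        sched_yield();
        (*yields)++;
        return 0;
    }
    /* ... process and free pkts[0..n-1] here ... */
    return n;
}

/* Scripted fake receiver for testing: returns counts[pos++], capped at max. */
struct demo_script {
    const uint16_t *counts;
    unsigned len;
    unsigned pos;
};

static uint16_t demo_scripted_rx(void *ctx, void **pkts, uint16_t max)
{
    struct demo_script *s = ctx;
    (void)pkts;  /* the fake does not produce real packets */
    if (s->pos >= s->len)
        return 0;
    uint16_t n = s->counts[s->pos++];
    return n > max ? max : n;
}
```

Slowing rx down further (for instance sleeping briefly after several empty polls) would be a variation on the same loop; the sketch only shows the yield.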
> >
> > Cgroup Control
> > 
> >
> > Here's a simple example: four pthreads doing packet IO on the same core,
> > where we expect a CPU share ratio of 1:1:2:4.
> > > mkdir /sys/fs/cgroup/cpu/dpdk
> > > mkdir /sys/fs/cgroup/cpu/dpdk/thread0
> > > mkdir /sys/fs/cgroup/cpu/dpdk/thread1
> > > mkdir /sys/fs/cgroup/cpu/dpdk/thread

[dpdk-dev] A question about hugepage initialization time

2014-12-12 Thread László Vadkerti

On Thu, 11 Dec,  2014, Bruce Richardson wrote:
> On Wed, Dec 10, 2014 at 07:16:59PM +, László Vadkerti wrote:
> >
> > On Wed, 10 Dec 2014, Bruce Richardson wrote:
> >
> > > On Wed, Dec 10, 2014 at 09:29:26AM -0500, Neil Horman wrote:
> > >> On Wed, Dec 10, 2014 at 10:32:25AM +, Bruce Richardson wrote:
> > >>> On Tue, Dec 09, 2014 at 02:10:32PM -0800, Stephen Hemminger wrote:
> >  On Tue, 9 Dec 2014 11:45:07 -0800 &rew
> >   wrote:
> > 
> > >> Hey Folks,
> > >>
> > >> Our DPDK application deals with very large in-memory data
> > >> structures, and can potentially use tens or even hundreds of
> > >> gigabytes of hugepage memory.
> > >> During the course of development, we've noticed that as the
> > >> number of huge pages increases, the memory initialization time
> > >> during EAL init gets to be quite long, lasting several minutes
> > >> at present.  The growth in init time doesn't appear to be linear,
> > >> which is concerning.
> > >>
> > >> This is a minor inconvenience for us and our customers, as
> > >> memory initialization makes our boot times a lot longer than it
> > >> would otherwise be.  Also, my experience has been that really
> > >> long operations often are hiding errors - what you think is
> > >> merely a slow operation is actually a timeout of some sort,
> > >> often due to misconfiguration. This leads to two
> > >> questions:
> > >>
> > >> 1. Does the long initialization time suggest that there's an
> > >> error happening under the covers?
> > >> 2. If not, is there any simple way that we can shorten memory
> > >> initialization time?
> > >>
> > >> Thanks in advance for your insights.
> > >>
> > >> --
> > >> Matt Laswell
> > >> laswell at infiniteio.com
> > >> infinite io, inc.
> > >>
> > >
> > > Hello,
> > >
> > > please find some quick comments on the questions:
> > > 1.) In our experience, a long initialization time is normal with a
> > > large amount of memory. However, this time depends on a few things:
> > > - number of hugepages (page faults handled by the kernel are pretty
> > > expensive)
> > > - size of hugepages (memset at initialization)
> > >
> > > 2.) Using 1G pages instead of 2M will reduce the initialization
> > > time significantly. Using wmemset instead of memset adds an
> > > additional 20-30% boost by our measurements. Or, by just
> > > touching the pages but not clearing them, you can still get some
> > > more speedup. But in this case your layer or the applications
> > > above need to do the cleanup at allocation time (e.g. by using
> > > rte_zmalloc).
> > >
> > > Cheers,
> > > &rew
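A sketch of the "touch but don't clear" approach described above: write a single byte per page so each page takes its fault up front, instead of memset()-ing the whole region. This is a hypothetical helper, not DPDK's EAL code; it writes back the byte it read so contents are preserved, and callers must still zero objects at allocation time (e.g. rte_zmalloc).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pre-fault every page of [addr, addr + len) by
 * rewriting one byte per page, rather than memset()-ing the whole region.
 * The volatile access keeps the compiler from dropping the write, so each
 * page takes its write fault here instead of on first use.
 * Returns the number of pages touched. */
static size_t demo_touch_pages(void *addr, size_t len, size_t page_sz)
{
    volatile uint8_t *p = addr;
    size_t pages = 0;

    for (size_t off = 0; off < len; off += page_sz) {
        p[off] = p[off];  /* read then write back: contents are preserved */
        pages++;
    }
    return pages;
}
```

For a 1 GB hugepage this is one write instead of a 1 GB memset, which is where the claimed speedup would come from; the actual gain depends on the kernel and page size.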
> > 
> >  I wonder if the whole rte_malloc code is even worth it with a
> >  modern kernel with transparent huge pages? rte_malloc adds very
> >  little value and is less safe and slower than glibc or other
> >  allocators. Plus you lose the ability to get all the benefit out of
> >  valgrind or electric fence.
> > >>>
> > >>> While I'd dearly love to not have our own custom malloc lib to
> > >>> maintain, for DPDK multiprocess, rte_malloc will be hard to
> > >>> replace as we would need a replacement solution that similarly
> > >>> guarantees that memory mapped in process A is also available at
> > >>> the same address in process B. :-(
> > >>>
> > >> Just out of curiosity, why even bother with multiprocess support?
> > >> What you're talking about above is a multithread model, and you're
> > >> shoehorning multiple processes into it.
> > >> Neil
> > >>
> > >
> > > Yep, that's pretty much what it is alright. However, this
> > > multiprocess support is very widely used by our customers in
> > > building their applications, and has been in place and supported
> > > since some of the earliest DPDK releases. If it is to be removed, it
> > > needs to be replaced by something that provides equivalent
> > > capabilities to application writers (perhaps something with more
> > > fine-grained sharing
> > > etc.)
> > >
> > > /Bruce
> > >
> >
> > It is probably time to start discussing how to pull in our multi
> > process and memory management improvements we were talking about in
> > our DPDK Summit presentation:
> > https://www.youtube.com/watch?v=907VShi799k#t=647
> >
> > Multi-process model could have several benefits mostly in the high
> > availability area (telco requirement) due to better separation,
> > controlling permissions (per process RO or RW page mappings), single
> > process restartability, improved startup and core dumping time etc.
> >
> > As a summary of our memory management additions, they allow an
> > application to describe its memory model in a configuration (or via
> > an API), e.g. a simplified config would say that every instance will
> > need 4GB private memory and 2GB shared memory. In a multi process
> > model this will result in mapping only 6GB memory in each process i

[dpdk-dev] [PATCH v3 3/3] app/testpmd:change tx_checksum command and csum forwarding engine

2014-12-12 Thread Liu, Jijiang



> -Original Message-
> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> Sent: Thursday, December 11, 2014 6:53 PM
> To: Liu, Jijiang; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v3 3/3] app/testpmd:change tx_checksum
> command and csum forwarding engine
> 
> Hi Jijiang,
> 
> Some more comments, in addition to the one I've made in the cover letter.
> Reference link for patchwork readers:
> http://dpdk.org/ml/archives/dev/2014-December/009886.html
> 
> On 12/10/2014 02:03 AM, Jijiang Liu wrote:
> > --- a/app/test-pmd/cmdline.c
> > +++ b/app/test-pmd/cmdline.c
> > @@ -316,16 +316,30 @@ static void cmd_help_long_parsed(void *parsed_result,
> > "Disable hardware insertion of a VLAN header in"
> > " packets sent on a port.\n\n"
> >
> > -   "tx_cksum set (ip|udp|tcp|sctp|vxlan) (hw|sw) (port_id)\n"
> > +   "tx_cksum set (ip|udp|tcp|sctp) (hw|sw) (port_id)\n"
> > "Select hardware or software calculation of the"
> > " checksum with when transmitting a packet using the"
> > " csum forward engine.\n"
> > -   "ip|udp|tcp|sctp always concern the inner layer.\n"
> > -   "vxlan concerns the outer IP and UDP layer (in"
> > -   " case the packet is recognized as a vxlan packet by"
> > -   " the forward engine)\n"
> > +   "In the case of tunneling packet, ip|udp|tcp|sctp"
> > +   " always concern the inner layer.\n\n"
> > +
> > +   "tx_cksum set tunnel (hw|sw|none) (port_id)\n"
> > +   " Select hardware or software calculation of the"
> > +   " checksum with when transmitting a tunneling packet"
> > +   " using the csum forward engine.\n"
> > +   " The none option means treat tunneling packet as ordinary"
> > +   " packet when using the csum forward engine\n."
> > +   "Tunneling packet concerns the outer IP, inner IP"
> > +   " and inner L4\n"
> > "Please check the NIC datasheet for HW limits.\n\n"
> >
> > +   "tx_cksum set (outer-ip) (hw|sw) (port_id)\n"
> > +   "Select hardware or software calculation of the"
> > +   " checksum with when transmitting a packet using the"
> > +   " csum forward engine.\n"
> > +   "outer-ip always concern the outer layer of"
> > +   " tunneling packet.\n\n"
> > +
> > "tx_checksum show (port_id)\n"
> > "Display tx checksum offload configuration\n\n"
> >
> 
> Not sure we need 2 different commands for tx_cksum set (outer-ip) and
> tx_cksum set (ip|udp|tcp|sctp). As the syntax is exactly the same, having
> only one command may result in less code.

I have explained in another mail why we have a separate command for the outer
layer. Do you agree with this?

> 
> > --- a/app/test-pmd/csumonly.c
> > +++ b/app/test-pmd/csumonly.c
> > @@ -256,17 +256,16 @@ process_outer_cksums(void *outer_l3_hdr, uint16_t outer_ethertype,
> > struct udp_hdr *udp_hdr;
> > uint64_t ol_flags = 0;
> >
> > -   if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM)
> > -   ol_flags |= PKT_TX_UDP_TUNNEL_PKT;
> > -
> > if (outer_ethertype == _htons(ETHER_TYPE_IPv4)) {
> > ipv4_hdr->hdr_checksum = 0;
> >
> > -   if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM)
> > +   if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_OUTER_IP_CKSUM)
> > ol_flags |= PKT_TX_OUTER_IP_CKSUM;
> > -   else
> > +   else {
> > ipv4_hdr->hdr_checksum = rte_ipv4_cksum(ipv4_hdr);
> > -   } else if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM)
> > +   ol_flags |= PKT_TX_OUTER_IPV4;
> > +   }
> > +   } else if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_TUNNEL_CKSUM)
> > ol_flags |= PKT_TX_OUTER_IPV6;
> >
> > udp_hdr = (struct udp_hdr *)((char *)outer_l3_hdr + outer_l3_len);
> > @@ -300,11 +299,14 @@ process_outer_cksums(void *outer_l3_hdr, uint16_t outer_ethertype,
> >*   Ether / (vlan) / outer IP|IP6 / outer UDP / VxLAN / Ether / IP|IP6 /
> >*   UDP|TCP|SCTP
> >*
> > - * The testpmd command line for this forward engine sets the flags
> > + * These testpmd command lines for this forward engine sets the flags
> >* TESTPMD_TX_OFFLOAD_* in ports[tx_port].tx_ol_flags. They control
> > - * wether a checksum must be calculated in software or in hardware. The
> > - * IP, UDP, TCP and SCTP flags always concern the inner layer.  The
> > - * VxLAN flag concerns the outer IP (if packet is recognized as a vxlan packet).
> > + * wether a checksum must be calculated in software or in hardware.
> > + * In the cas
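To make the branch structure of the new process_outer_cksums() hunk easier to follow, here is a standalone sketch of it. The flag macros are hypothetical stand-ins for TESTPMD_TX_OFFLOAD_OUTER_IP_CKSUM, TESTPMD_TX_OFFLOAD_TUNNEL_CKSUM and the PKT_TX_OUTER_* mbuf flags (their real values differ), and demo_ipv4_hdr_cksum() plays the role of rte_ipv4_cksum() on the software path.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical configuration bits (testpmd side). */
#define DEMO_CFG_OUTER_IP_HW   (1u << 0)  /* like TESTPMD_TX_OFFLOAD_OUTER_IP_CKSUM */
#define DEMO_CFG_TUNNEL        (1u << 1)  /* like TESTPMD_TX_OFFLOAD_TUNNEL_CKSUM */
/* Hypothetical per-packet offload bits (mbuf side). */
#define DEMO_TX_OUTER_IP_CKSUM (1u << 0)  /* like PKT_TX_OUTER_IP_CKSUM */
#define DEMO_TX_OUTER_IPV4     (1u << 1)  /* like PKT_TX_OUTER_IPV4 */
#define DEMO_TX_OUTER_IPV6     (1u << 2)  /* like PKT_TX_OUTER_IPV6 */

/* Software IPv4 header checksum over big-endian 16-bit words, so the
 * result does not depend on host byte order. The checksum field
 * (bytes 10..11) must be zero on entry. */
static uint16_t demo_ipv4_hdr_cksum(const uint8_t *hdr, size_t hlen)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < hlen; i += 2)
        sum += (uint32_t)((hdr[i] << 8) | hdr[i + 1]);
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Mirrors the v3 hunk: for IPv4, either request HW outer IP checksum or
 * compute it in SW and mark the packet as outer IPv4; for IPv6, only the
 * tunnel setting matters. Returns the offload flags for the packet. */
static uint32_t demo_outer_flags(uint8_t *outer_hdr, int is_ipv4, uint32_t cfg)
{
    uint32_t ol_flags = 0;

    if (is_ipv4) {
        outer_hdr[10] = outer_hdr[11] = 0;  /* hdr_checksum = 0 */
        if (cfg & DEMO_CFG_OUTER_IP_HW) {
            ol_flags |= DEMO_TX_OUTER_IP_CKSUM;
        } else {
            size_t hlen = (size_t)(outer_hdr[0] & 0x0f) * 4;  /* IHL words */
            uint16_t c = demo_ipv4_hdr_cksum(outer_hdr, hlen);
            outer_hdr[10] = (uint8_t)(c >> 8);
            outer_hdr[11] = (uint8_t)(c & 0xff);
            ol_flags |= DEMO_TX_OUTER_IPV4;
        }
    } else if (cfg & DEMO_CFG_TUNNEL) {
        ol_flags |= DEMO_TX_OUTER_IPV6;
    }
    return ol_flags;
}
```

Laid out this way, the asymmetry Olivier's review touches on is visible: the IPv4 branch is keyed on the outer-ip setting while the IPv6 branch is keyed on the tunnel setting.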

[dpdk-dev] [PATCH v3 0/3] enhance TX checksum command and csum forwarding engine

2014-12-12 Thread Liu, Jijiang

Hi Olivier,

Thanks for your comments.

> -Original Message-
> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> Sent: Thursday, December 11, 2014 6:18 PM
> To: Liu, Jijiang; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v3 0/3] enhance TX checksum command and
> csum forwarding engine
> 
> Hi Jijiang,
> 
> Sorry for the late review, I was very busy these last days. Please find my
> comments below.
> 
> On 12/10/2014 02:03 AM, Jijiang Liu wrote:
> > In the current codes, the "tx_checksum set (ip|udp|tcp|sctp|vxlan) (hw|sw)
> > (port-id)" command is not easy to understand and extend, so the patch set
> > enhances the tx_checksum command and reworks csum forwarding engine due
> > to the change of tx_checksum command.
> > The main changes of the tx_checksum command are listed below,
> >
> > 1> add "tx_checksum set tunnel (hw|sw|none) (port-id)" command
> >
> > The command is used to set/clear tunnel flag that is used to tell the NIC
> > that the packetg is a tunneing packet and application want hardware TX
> > checksum offload for outer layer, or inner layer, or both.
> 
> packetg -> packet
> tunneing -> tunneling
> 
> I don't understand the description: this flag cannot be set to tell the NIC
> that it's a tunnel packet because it's a configuration flag.
> Whatever the value of this configuration option, the packets can be either
> tunnel or non-tunnel packets. The real question is, what is the behavior for
> each packet type for each value for this option.

OK,
I will replace the above description with the following:

The 'hw/sw' option is used to set/clear the flag that enables TX tunneling
packet checksum hardware offload in the testpmd application.


> > The 'none' option means that user treat tunneling packet as ordinary packet
> > when using the csum forward engine.
> > for example, let say we have a tunnel packet:
> > eth_hdr_out/ipv4_hdr_out/udp_hdr_out/vxlan_hdr/ehtr_hdr_in/ipv4_hdr_in/tcp_hdr_in.
> > one of several scenarios:
> >
> > 1) User requests HW offload for ipv4_hdr_out  checksum, and doesn't care is it
> > a tunnelled packet or not. So he sets:
> 
> tunnelled -> tunneled
> 
> >
> > tx_checksum set tunnel none 0
> >
> > tx_checksum set ip hw 0
> >
> > So for such case we should set tx_tunnel to 'none'.
> >
> > 2> add "tx_checksum set outer-ip (hw|sw) (port-id)" command
> >
> > The command is used to set/clear outer IP flag that is used to tell the NIC
> > that application want hardware offload for outer layer.
> >
> > 3> remove the 'vxlan' option from the  "tx_checksum set
> > (ip|udp|tcp|sctp|vxlan) (hw|sw) (port-id)" command
> >
> > The command is used to set IP, UDP, TCP and SCTP TX checksum flag. In the
> > case of tunneling packet, the IP, UDP, TCP and SCTP flags always concern
> > inner layer.
> >
> > Moreover, replace the TESTPMD_TX_OFFLOAD_VXLAN_CKSUM flag with
> TESTPMD_TX_OFFLOAD_TUNNEL_CKSUM flag and add the
> TESTPMD_TX_OFFLOAD_OUTER_IP_CKSUM and
> TESTPMD_TX_OFFLOAD_NON_TUNNEL_CKSUM flag in test-pmd application.
> 
> You are mixing scenario descriptions with what you do in your patchset:
> 1/ is a scenario
> 2/ and 3/ are descriptions of added/removed commands

No.
Please note the different markers used for command descriptions and scenario descriptions.

The command descriptions are marked with the ">" symbol:
 1> add "tx_checksum set tunnel (hw|sw|none) (port-id)" command
 2> add "tx_checksum set outer-ip (hw|sw) (port-id)" command
 3> remove the 'vxlan' option from the "tx_checksum set (ip|udp|tcp|sctp|vxlan) (hw|sw) (port-id)" command

The scenario descriptions are marked with the ")" symbol:
1) User requests HW offload for ipv4_hdr_out checksum, and doesn't care whether
it is a tunneled packet or not. So he sets:

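A minimal C sketch of how the three proposed commands might combine to decide the IP checksum of one header, including scenario 1) above. The flag names and bit values are hypothetical stand-ins, not testpmd's real TESTPMD_TX_OFFLOAD_* definitions, and treating 'sw' as "no hardware offload for any header of a tunnel packet" is an assumption that the patch description does not spell out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits, for illustration only. */
#define PROPOSED_TX_IP_CKSUM       (1u << 0) /* "tx_checksum set ip hw"       */
#define PROPOSED_TX_OUTER_IP_CKSUM (1u << 1) /* "tx_checksum set outer-ip hw" */

/* Value given to "tx_checksum set tunnel (hw|sw|none)". */
enum proposed_tunnel_mode { TUNNEL_NONE, TUNNEL_SW, TUNNEL_HW };

/*
 * Should the IP checksum of one header be offloaded to hardware?
 * For a non-tunnel packet there is a single IP header and "outer" is ignored.
 */
static bool proposed_ip_cksum_in_hw(enum proposed_tunnel_mode mode,
                                    uint16_t flags, bool is_tunnel,
                                    bool outer)
{
    if (!is_tunnel)
        return (flags & PROPOSED_TX_IP_CKSUM) != 0;

    if (mode == TUNNEL_NONE)
        /* Tunnel handled as an ordinary packet: the "ip" flag covers the
         * first (outer) IP header; inner headers are just payload. */
        return outer && (flags & PROPOSED_TX_IP_CKSUM) != 0;

    if (mode == TUNNEL_SW)
        return false; /* assumption: all checksums computed in software */

    /* TUNNEL_HW: "outer-ip" covers the outer header, "ip" the inner one. */
    if (outer)
        return (flags & PROPOSED_TX_OUTER_IP_CKSUM) != 0;
    return (flags & PROPOSED_TX_IP_CKSUM) != 0;
}
```

Under this reading, "tunnel none" plus "ip hw" offloads ipv4_hdr_out only, which matches scenario 1); with "tunnel hw", the same "ip hw" setting moves to ipv4_hdr_in.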

> Let's first summarize what was the behavior before this patch. This is the
> description in csumonly code.
> 
> Receive a burst of packets, and for each packet:
>   - parse packet, and try to recognize a supported packet type (1)
>   - if it's not a supported packet type, don't touch the packet, else:
>   - modify the IPs in inner headers and in outer headers if any
>   - reprocess the checksum of all supported layers. This is done in SW
>     or HW, depending on testpmd command line configuration
>   - if TSO is enabled in testpmd command line, also flag the mbuf for TCP
>     segmentation offload (this implies HW TCP checksum)
> Then transmit packets on the output port.
>
> (1) Supported packets are:
>   Ether / (vlan) / IP|IP6 / UDP|TCP|SCTP .
>   Ether / (vlan) / outer IP|IP6 / outer UDP / VxLAN / Ether / IP|IP6 /
>   UDP|TCP|SCTP
> 
> The testpmd command line for this forward engine sets the flags
> TESTPMD_TX_OFFLOAD_* in ports[tx_port].tx_ol_flags. They control wether a
> checksum must be calculated in software or in hardware. The IP, UDP, TCP and
> SCTP flags always concern the inner layer.  The VxLAN flag concerns the outer
> IP (if packet is recognized as a vxlan packet).
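The pre-patch rule quoted above can be sketched in C as a single decision per IP header. This is an illustration under assumptions, not testpmd code: the flag names and bit values are hypothetical stand-ins for the real TESTPMD_TX_OFFLOAD_* flags, and only the IP checksum is modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits, for illustration only. */
#define LEGACY_TX_IP_CKSUM    (1u << 0) /* "tx_checksum set ip hw"    */
#define LEGACY_TX_VXLAN_CKSUM (1u << 1) /* "tx_checksum set vxlan hw" */

/*
 * Should the IP checksum of one header be offloaded to hardware under the
 * pre-patch behavior? For a non-VxLAN packet there is a single IP header,
 * which counts as the "inner" layer, so "outer" is ignored.
 */
static bool legacy_ip_cksum_in_hw(uint16_t flags, bool is_vxlan, bool outer)
{
    if (is_vxlan && outer)
        return (flags & LEGACY_TX_VXLAN_CKSUM) != 0; /* VxLAN flag: outer IP */
    return (flags & LEGACY_TX_IP_CKSUM) != 0;        /* IP flag: inner IP */
}
```

Under this reading, offloading the outer IPv4 checksum of a VxLAN packet requires the vxlan flag, while the ip flag only ever reaches the inner header or the single header of a plain packet.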
> 
> From this description, it is easy to deduce this table:
> 
> Packet type 1:
>   Ether / (vlan) / IP|IP6 / UDP|TCP|SCTP .
> 
> Packet

[dpdk-dev] In DPDK 1.7.1, the link status of the interface using virtio driver is always down.

2014-12-12 Thread Fu, Weiyi (NSN - CN/Hangzhou)

Hi,
I have ingnored the link status. Rx and tx can't work.
I will try the methods Vijay suggested to have a try. Thanks! 

Brs,
Fu Weiyi

-Original Message-
From: ext Ouyang, Changchun [mailto:changchun.ouy...@intel.com] 
Sent: Friday, December 12, 2014 9:00 AM
To: Fu, Weiyi (NSN - CN/Hangzhou); dev at dpdk.org
Cc: Ouyang, Changchun
Subject: RE: [dpdk-dev] In DPDK 1.7.1, the link status of the interface using 
virtio driver is always down.

Hi 

> -Original Message-
> From: Fu, Weiyi (NSN - CN/Hangzhou) [mailto:weiyi.fu at nsn.com]
> Sent: Thursday, December 11, 2014 7:42 PM
> To: Fu, Weiyi (NSN - CN/Hangzhou); Ouyang, Changchun; dev at dpdk.org
> Subject: RE: [dpdk-dev] In DPDK 1.7.1, the link status of the interface using
> virtio driver is always down.
> 
> Hi Changchun,
> I found you had done follow change to allow the virtio interface startup
> when the link is down.  Is there any scenario causing link down for virtio
> interface?
> 
Not really in my environment. That code is an RFC patch from Brocade and has
not been merged into mainline yet.

You can apply this patch and ignore the link state to see whether Rx and Tx
still work.

Thanks
Changchun

[dpdk-dev] [PATCH v4] mbuf: fix of enabling all newly added RX error flags

2014-12-12 Thread Zhang, Helin



> -Original Message-
> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> Sent: Thursday, December 11, 2014 7:16 PM
> To: Zhang, Helin; Thomas Monjalon
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v4] mbuf: fix of enabling all newly added RX
> error flags
> 
> Hi Helin,
> 
> On 12/10/2014 11:29 PM, Zhang, Helin wrote:
> > --- a/lib/librte_mbuf/rte_mbuf.h
> > +++ b/lib/librte_mbuf/rte_mbuf.h
> > @@ -83,12 +83,7 @@ extern "C" {
> >  #define PKT_RX_RSS_HASH      (1ULL << 1)  /**< RX packet with RSS hash result. */
> >  #define PKT_RX_FDIR          (1ULL << 2)  /**< RX packet with FDIR match indicate. */
> >  #define PKT_RX_L4_CKSUM_BAD  (1ULL << 3)  /**< L4 cksum of RX pkt. is not OK. */
> > -#define PKT_RX_IP_CKSUM_BAD  (1ULL << 4)  /**< IP cksum of RX pkt. is not OK. */
> > -#define PKT_RX_EIP_CKSUM_BAD (0ULL << 0)  /**< External IP header checksum error. */
> > -#define PKT_RX_OVERSIZE      (0ULL << 0)  /**< Num of desc of an RX pkt oversize. */
> > -#define PKT_RX_HBUF_OVERFLOW (0ULL << 0)  /**< Header buffer overflow. */
> > -#define PKT_RX_RECIP_ERR     (0ULL << 0)  /**< Hardware processing error. */
> > -#define PKT_RX_MAC_ERR       (0ULL << 0)  /**< MAC error. */
> > +#define PKT_RX_IP_CKSUM_BAD  (1ULL << 4)  /**< IP (or inner IP) header checksum error. */
> 
>  It can be also an outer IP header in case the device don't support
>  tunneling or is not configured to detect it.
> >>>
> >>> For non-tunneling case, no outer/inner at all, it just has the IP
> >>> header. The bit flag indicates the IP header checksum error.
> >>> For tunneling case, this bit flag indicates the inner IP header
> >>> checksum error, another one for outer IP header checksum error.
> >>> So I don't think this bit can be treated as outer.
> >>
> >> I think you didn't understand my comment.
> >> I talk about NICs which don't have tunneling support.
> >> In this case, the outer IP header is seen as a simple IP header.
> >> So, depending on which port is receiving a tunneled packet, this flag
> >> or the dedicated one can be used for outer IP checksum.
> > I think I did understand your point. For those port which does not
> > support tunneling, if a 'tunneling' packet received, it never knows
> > that's tunneling packet, it always treats it as a general IP packet.
> > The "inner" IP is just part of its data. For this case, no outer or inner
> > at all, just an IP header.
> >
> >>
> >> I just suggest to remove the part "(or inner IP)" of the comment.
> >> Do you agree?
> > I got it, actually I wanted to describe it as (or inner IP for
> > tunneling), as the macro name does not tell it could be inner IP header
> > checksum error for tunneling case.
> 
> I still don't understand how to use that flag. Let's imagine an application
> that processes an IP packet:
>
>    ip_input(m) /* receive a packet after ethernet header is stripped */
>    {
>        if (m->ol_flags & PKT_RX_IP_CKSUM_BAD) {
>            log("packet dropped");
>            rte_pktmbuf_free(m);
>            return;
>        }
>        /* continue IP header processing, maybe route the packet? */
>        ...
>
> This kind of code works since a long time with dpdk on ixgbe, even if you
> receive a tunnel packet.
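A self-contained C sketch of the router check in the example above, using a stand-in mbuf structure so it compiles without DPDK. SKETCH_RX_IP_CKSUM_BAD reuses the bit that PKT_RX_IP_CKSUM_BAD has in the patch (1ULL << 4); the outer-header flag, its bit, and the nic_parsed_tunnel parameter are hypothetical. It illustrates the concern being raised: if a tunnel-aware NIC reports only the inner header in that bit, a router that checks it drops packets whose outer header, the one it forwards on, is valid.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for illustration. The first bit mirrors PKT_RX_IP_CKSUM_BAD in
 * the patch; the outer-header flag and its bit are hypothetical. */
#define SKETCH_RX_IP_CKSUM_BAD       (1ULL << 4)
#define SKETCH_RX_OUTER_IP_CKSUM_BAD (1ULL << 5)

struct sketch_mbuf {
    uint64_t ol_flags;
};

/* The check from the example: drop if the IP checksum flag is set.
 * Returns true if the packet would be dropped. */
static bool router_drops(const struct sketch_mbuf *m)
{
    return (m->ol_flags & SKETCH_RX_IP_CKSUM_BAD) != 0;
}

/* Variant that looks only at the header a router forwards on: the outer
 * one when the NIC parsed a tunnel, otherwise the single IP header. */
static bool router_drops_outer_aware(const struct sketch_mbuf *m,
                                     bool nic_parsed_tunnel)
{
    uint64_t bad = nic_parsed_tunnel ? SKETCH_RX_OUTER_IP_CKSUM_BAD
                                     : SKETCH_RX_IP_CKSUM_BAD;
    return (m->ol_flags & bad) != 0;
}
```

With a tunnel packet whose inner checksum is bad but outer checksum is good, router_drops() drops it while router_drops_outer_aware(m, true) keeps it; on a NIC without tunnel parsing, both functions behave the same.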
My understanding is similar to yours, though I am not sure how real users use it.
I think real users always want more error information at runtime, so that they
know the root cause and how to deal with it.
Checksum errors are in packets that come from the peer. If this type of error is
detected, the end users can check what happens on the peer, rather than
debugging the local side.

> 
> In my understanding, with your patch, if you receive a tunnel packet on i40e,
> the flag PKT_RX_IP_CKSUM_BAD is about the inner header, which should not
> be checked by a router. This would make the code above not working anymore.
> Am I correct?
For a received tunnel packet, I think both the inner and outer checksum errors
should be checked. The inner one may even be more important than the outer one,
as the inner IP packet could be the real packet to be processed.

> 
> By the way (it's a bit out of topic), as we already noticed on the list some
> times, in the future we should add another flags PKT_RX_IP_CKSUM_VERIFIED in
> addition to PKT_RX_IP_CKSUM_BAD because many drivers do not support
> hardware checksum, or only supports it in specific conditions (ex: no IP
> options, or no vlan, ...). We should think about it for 2.0.
That is a good reason for new flags. But I think we may also need another bit
for the outer IP checksum?
Is there any other way to indicate, somewhere else, that the checksum was not
offloaded? Or could we add a bit flag like PKT_RX_IP_CKSUM_NOT_OFFLOADED?

Regards,
Helin

> 
> Regards,
> Olivier
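A minimal C sketch of the three-state result that the PKT_RX_IP_CKSUM_VERIFIED suggestion would allow. Both bit values are hypothetical (the BAD bit mirrors the patch; VERIFIED was only proposed in this thread), and Helin's PKT_RX_IP_CKSUM_NOT_OFFLOADED would be an alternative encoding of the same idea. With a single BAD bit, "not checked" and "checked and good" look identical; a second bit separates them.

```c
#include <stdint.h>

/* Hypothetical bits: BAD mirrors the patch (1ULL << 4), VERIFIED is made up. */
#define SKETCH_RX_IP_CKSUM_BAD      (1ULL << 4)
#define SKETCH_RX_IP_CKSUM_VERIFIED (1ULL << 6)

enum sketch_cksum_status {
    CKSUM_UNKNOWN, /* hardware did not check: verify in software if needed */
    CKSUM_GOOD,    /* hardware checked and the checksum is correct */
    CKSUM_BAD      /* hardware checked and the checksum is wrong */
};

/* Map the two RX flag bits to a single status; BAD takes precedence. */
static enum sketch_cksum_status ip_cksum_status(uint64_t ol_flags)
{
    if (ol_flags & SKETCH_RX_IP_CKSUM_BAD)
        return CKSUM_BAD;
    if (ol_flags & SKETCH_RX_IP_CKSUM_VERIFIED)
        return CKSUM_GOOD;
    return CKSUM_UNKNOWN;
}
```

Under the same scheme, reporting the outer IP header of a tunnel packet separately would need its own pair of bits.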

[dpdk-dev] In DPDK 1.7.1, the link status of the interface using virtio driver is always down.

2014-12-12 Thread Ouyang, Changchun

Hi 

> -Original Message-
> From: Fu, Weiyi (NSN - CN/Hangzhou) [mailto:weiyi.fu at nsn.com]
> Sent: Thursday, December 11, 2014 7:42 PM
> To: Fu, Weiyi (NSN - CN/Hangzhou); Ouyang, Changchun; dev at dpdk.org
> Subject: RE: [dpdk-dev] In DPDK 1.7.1, the link status of the interface using
> virtio driver is always down.
> 
> Hi Changchun,
> I found you had done follow change to allow the virtio interface startup
> when the link is down.  Is there any scenario causing link down for virtio
> interface?
> 
Not really in my environment. That code is an RFC patch from Brocade and has
not been merged into mainline yet.

You can apply this patch and ignore the link state to see whether Rx and Tx
still work.

Thanks
Changchun

[dpdk-dev] [PATCH v4 00/10] VM Power Management

2014-12-12 Thread Thomas Monjalon

2014-12-09 18:35, Paolo Bonzini:
>  Did you make any progress in Qemu/KVM community?
>  We need to be sync'ed up with them to be sure we share the same goal.
>  I want also to avoid using a solution which doesn't fit with their
>  plan.
>  Remember that we already had this problem with ivshmem which was
>  planned to be dropped.
> >>> 
> >>> Unfortunately, I have not yet received any feedback:
> >>> http://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg01103.html
> >>
> >> Just to add to what Alan said above, this capability does not exist in
> >> qemu at the moment, and based on there having been no feedback on the
> >> qemu mailing list so far, I think it's reasonable to assume that it
> >> will not be implemented in the immediate future. The VM Power
> >> Management feature has also been designed to allow easy migration to a
> >> qemu-based solution when this is supported in future. Therefore, I'd
> >> be in favour of accepting this feature into DPDK now.
> >>
> >> It's true that the implementation is a work-around, but there have
> >> been similar cases in DPDK in the past. One recent example that comes
> >> to mind is userspace vhost. The original implementation could also be
> >> considered a work-around, but it met the needs of many in the
> >> community. Now, with support for vhost-user in qemu 2.1, that
> >> implementation is being improved. I'd see VM Power Management
> >> following a similar path when this capability is supported in qemu.
> 
> I wonder if this might be papering over a bug in the host cpufreq
> driver.  If the guest is not doing much and leaving a lot of idle CPU
> time, the host should scale down the frequency of that CPU.  In the case
> of pinned VCPUs this should really "just work".  What is the problem
> that is being solved?
> 
> Paolo

Alan, Pablo, could you please explain the rationale behind VM power management?

-- 
Thomas
