date:20150528

[dpdk-dev] [PATCH v3] pipeline: add statistics for librte_pipeline

2015-05-28 Thread Rajagopalan Sivaramakrishnan

My first preference would be to enable stats always. However, if the
majority feels that it should be optional,
your preference of 3, 2, 1 seems fine to me. I hope the same decision will
apply to port/table/other stats.

Raja

On 5/28/15, 12:26 PM, "Dumitrescu, Cristian"
 wrote:

>Hi Raja,
>
>Thanks for your input.
>
>I think we have the following options identified so far for stats
>collection configuration:
>
>1. Stats configuration through the RTE_LOG_LEVEL
>2. Single configuration flag global for all DPDK libraries
>3. Single configuration flag per DPDK library
>
>It would be good if Thomas and Stephen, as well as others, would reply
>with their preference order.
>
>My personal preference order is: 3., 2., 1., but I can work with any of
>the above that is identified by the majority of the replies. My goal
>right now is reaching a conclusion on this item as soon as we can.
>
>Regards,
>Cristian
>
>
>
>> -Original Message-
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Rajagopalan
>> Sivaramakrishnan
>> Sent: Wednesday, May 27, 2015 11:45 PM
>> To: dev at dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH v3] pipeline: add statistics for
>>librte_pipeline
>> 
>> 
>> > > You also reiterate that you would like to have the stats always enabled.
>> > You can definitely do this, it is one of the available choices, but why not also
>> > accommodate the users that want to pick the opposite choice? Why force
>> > apps to spend cycles on stats if the app either does not want these counters
>> > (library counters not relevant for that app, maybe the app is only interested
>> > in maintaining some other stats that it implements itself) or do not want
>> > them anymore (maybe they only needed them during debug phase), etc?
>> > Jay asked this question, and I did my best in my reply to describe our
>> > motivation (http://www.dpdk.org/ml/archives/dev/2015-May/017992.html).
>> > Maybe you missed that post, it would be good to get your reply on this one
>> > too.
>> >
>> > I want to see DPDK get out of the config madness.
>> > This is real code, not an Intel benchmark special.
>> 
>> 
>> I agree that statistics will definitely be required in most real-world production
>> environments and the overhead from per-core stats gathering will be minimal
>> if the data structures are such that CPU cache thrashing is avoided.
>> However, if there are scenarios where it is desirable to turn stats off, I think
>> we can live with a config option.
>> I am not comfortable with using the log level to enable/disable statistics as
>> they are not really related. A separate config option for stats collection
>> seems like a reasonable compromise.
>> 
>> Raja
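Raja's point that per-core stats are cheap when cache thrashing is avoided can be illustrated with a minimal sketch. It assumes 64-byte cache lines and at most 4 cores, and uses the GCC/Clang `aligned` attribute (DPDK code would typically use `__rte_cache_aligned`); the `sketch_*` names are hypothetical, not DPDK API. Each core writes only to its own cache-line-sized slot, and a reader sums the slots.

```c
#include <stdint.h>

/* Assumptions for this sketch only: 64-byte cache lines, 4 cores. */
#define SKETCH_CACHE_LINE 64
#define SKETCH_MAX_CORES  4

/* One counter block per core, padded to a full cache line so that two
 * cores never write to the same line (no false sharing). */
struct sketch_core_stats {
	uint64_t n_pkts_in;
	uint64_t n_pkts_dropped;
} __attribute__((aligned(SKETCH_CACHE_LINE)));

static struct sketch_core_stats sketch_stats[SKETCH_MAX_CORES];

/* Fast path: called only by the core that owns slot core_id. */
static inline void
sketch_stats_update(unsigned int core_id, uint64_t in, uint64_t dropped)
{
	sketch_stats[core_id].n_pkts_in += in;
	sketch_stats[core_id].n_pkts_dropped += dropped;
}

/* Slow path: aggregate all per-core slots when stats are read. */
static void
sketch_stats_read(uint64_t *in, uint64_t *dropped)
{
	unsigned int i;

	*in = 0;
	*dropped = 0;
	for (i = 0; i < SKETCH_MAX_CORES; i++) {
		*in += sketch_stats[i].n_pkts_in;
		*dropped += sketch_stats[i].n_pkts_dropped;
	}
}
```

In a real multi-core setting the reader may observe slightly stale values; production code would read the counters with relaxed atomic loads rather than plain loads.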

[dpdk-dev] [PATCH v3 01/10] table: added structure for storing table stats

2015-05-28 Thread Dumitrescu, Cristian



> -Original Message-
> From: Stephen Hemminger [mailto:stephen at networkplumber.org]
> Sent: Tuesday, May 26, 2015 10:58 PM
> To: Dumitrescu, Cristian
> Cc: Gajdzica, MaciejX T; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v3 01/10] table: added structure for storing
> table stats
> 
> On Tue, 26 May 2015 21:40:42 +
> "Dumitrescu, Cristian"  wrote:
> 
> >
> >
> > > -Original Message-
> > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Stephen
> > > Hemminger
> > > Sent: Tuesday, May 26, 2015 3:58 PM
> > > To: Gajdzica, MaciejX T
> > > Cc: dev at dpdk.org
> > > Subject: Re: [dpdk-dev] [PATCH v3 01/10] table: added structure for
> storing
> > > table stats
> > >
> > > On Tue, 26 May 2015 14:39:38 +0200
> > > Maciej Gajdzica  wrote:
> > >
> > > > +
> > > >  /** Lookup table interface defining the lookup table operation */
> > > >  struct rte_table_ops {
> > > > rte_table_op_create f_create;   /**< Create */
> > > > @@ -194,6 +218,7 @@ struct rte_table_ops {
> > > > rte_table_op_entry_add f_add;   /**< Entry add */
> > > > rte_table_op_entry_delete f_delete; /**< Entry delete */
> > > > rte_table_op_lookup f_lookup;   /**< Lookup */
> > > > +   rte_table_op_stats_read f_stats;/**< Stats */
> > > >  };
> > >
> > > Another good idea, which is an ABI change.
> >
> > This is simply adding a new API function, this is not changing any function
> > prototype. There is no change required in the map file of this library. Is there
> > anything we should have done and we did not do?
> >
> 
> But if I built an external set of code which had rte_table_ops (don't worry I
> haven't) and that binary ran with the new definition, the core code in table
> would reference outside the (old version) of rte_table_ops structure and find
> garbage.

This is just adding a new field at the end of an API data structure. Based on
input from multiple people and after reviewing the rules listed on
http://dpdk.org/doc/guides/rel_notes/abi.html, I think this is an acceptable
change. There are other patches in flight on this mailing list that are in the
same situation. Any typical, well-behaved application will not break due to this
change.
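Stephen's concern can be checked with a minimal sketch using two hypothetical versions of a function-pointer table (simplified stand-ins, not the real `rte_table_ops`). Appending a member keeps every existing offset, but the new member lies entirely past the end of an object laid out with the old header.

```c
#include <stddef.h>

typedef int (*sketch_op_t)(void *);

/* Hypothetical "v1" ops table, as compiled into an application
 * built against the old header. */
struct sketch_ops_v1 {
	sketch_op_t f_create;
	sketch_op_t f_free;
	sketch_op_t f_lookup;
};

/* Hypothetical "v2": one member appended at the end. */
struct sketch_ops_v2 {
	sketch_op_t f_create;
	sketch_op_t f_free;
	sketch_op_t f_lookup;
	sketch_op_t f_stats;   /* new in v2 */
};

/* Existing members keep their offsets, so code touching only them is fine. */
_Static_assert(offsetof(struct sketch_ops_v2, f_lookup) ==
	       offsetof(struct sketch_ops_v1, f_lookup), "prefix preserved");

/* The new member starts at or beyond the end of a v1 object: a v2 library
 * reading f_stats from an ops table defined by a v1 application reads
 * memory the application never initialized. */
_Static_assert(offsetof(struct sketch_ops_v2, f_stats) >=
	       sizeof(struct sketch_ops_v1), "new member outside v1 object");
```

Whether this matters in practice depends on who defines the ops table: applications that only pass pointers to tables exported by the library itself pick up the new layout automatically, while applications that define their own table types are the case Stephen describes.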

[dpdk-dev] [PATCH v3] pipeline: add statistics for librte_pipeline

2015-05-28 Thread Dumitrescu, Cristian

Hi Raja,

Thanks for your input.

I think we have the following options identified so far for stats collection 
configuration:

1. Stats configuration through the RTE_LOG_LEVEL
2. Single configuration flag global for all DPDK libraries
3. Single configuration flag per DPDK library
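Options 2 and 3 would typically be build-time switches. A minimal sketch of option 3, assuming a hypothetical per-library macro `SKETCH_PIPELINE_STATS_COLLECT` (the name and struct are illustrative only): with the flag off, the counter update compiles to nothing and its arguments are not evaluated.

```c
#include <stdint.h>

/* Hypothetical per-library switch. Build with
 * -DSKETCH_PIPELINE_STATS_COLLECT=0 to compile the counters out. */
#ifndef SKETCH_PIPELINE_STATS_COLLECT
#define SKETCH_PIPELINE_STATS_COLLECT 1
#endif

#if SKETCH_PIPELINE_STATS_COLLECT
#define SKETCH_STATS_ADD(counter, val) ((counter) += (val))
#else
/* Stats disabled: the update disappears and its arguments are discarded. */
#define SKETCH_STATS_ADD(counter, val) do { } while (0)
#endif

struct sketch_pipeline {
	uint64_t n_pkts_dropped_by_lookup_miss;
};

/* Fast-path helper: counts dropped packets only when stats are compiled in. */
static void
sketch_drop_packets(struct sketch_pipeline *p, uint64_t n_dropped)
{
	SKETCH_STATS_ADD(p->n_pkts_dropped_by_lookup_miss, n_dropped);
	/* ... the packets themselves would be freed here ... */
}
```

Option 2 would replace the per-library macro with a single global one, while option 1 would key the same `#if` on the configured log level.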

It would be good if Thomas and Stephen, as well as others, would reply with 
their preference order.

My personal preference order is: 3., 2., 1., but I can work with any of the 
above that is identified by the majority of the replies. My goal right now is 
reaching a conclusion on this item as soon as we can.

Regards,
Cristian



> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Rajagopalan
> Sivaramakrishnan
> Sent: Wednesday, May 27, 2015 11:45 PM
> To: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v3] pipeline: add statistics for 
> librte_pipeline
> 
> 
> > > You also reiterate that you would like to have the stats always enabled.
> > You can definitely do this, it is one of the available choices, but why not also
> > accommodate the users that want to pick the opposite choice? Why force
> > apps to spend cycles on stats if the app either does not want these counters
> > (library counters not relevant for that app, maybe the app is only interested
> > in maintaining some other stats that it implements itself) or do not want
> > them anymore (maybe they only needed them during debug phase), etc?
> > Jay asked this question, and I did my best in my reply to describe our
> > motivation (http://www.dpdk.org/ml/archives/dev/2015-May/017992.html).
> > Maybe you missed that post, it would be good to get your reply on this one
> > too.
> >
> > I want to see DPDK get out of the config madness.
> > This is real code, not an Intel benchmark special.
> 
> 
> I agree that statistics will definitely be required in most real-world production
> environments and the overhead from per-core stats gathering will be minimal
> if the data structures are such that CPU cache thrashing is avoided.
> However, if there are scenarios where it is desirable to turn stats off, I think
> we can live with a config option.
> I am not comfortable with using the log level to enable/disable statistics as
> they are not really related. A separate config option for stats collection
> seems like a reasonable compromise.
> 
> Raja

[dpdk-dev] Packet Cloning

2015-05-28 Thread Marc Sune



On 28/05/15 18:06, Matt Laswell wrote:
> Hey Kyle,
>
> That's one way you can handle it, though I suspect you'll end up with some
> complexity elsewhere in your code to deal with remembering whether you
> should look at the original data or the copied and modified data.  Another
> way is just to make a copy of the original mbuf, but have your copy API
> stop after it reaches some particular point.  Perhaps just the L2-L4
> headers, perhaps a few hundred bytes into payload, or perhaps something
> else entirely. This all gets very application dependent, of course.  How
> much is "enough" is going to depend heavily on what you're trying to
> accomplish.

mbufs can be chained in multiple segments. So you could first split the packet
into two segments, leaving the big chunk (chunk2) in the original mbuf, and
copy chunk1 into the new mbuf (see rte_pktmbuf_prepend(), rte_pktmbuf_adj()
and rte_pktmbuf_trim() in [1]).

Marc

[1] http://dpdk.org/doc/api/rte__mbuf_8h.html
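The split-and-copy idea can be modelled in plain C. This is a toy sketch, not DPDK code: `struct seg` is a hypothetical stand-in for a chained mbuf, and `clone_with_private_header()` / `free_clone()` are illustrative names. In DPDK the shared part would be an indirect mbuf (for example from `rte_pktmbuf_clone()`) whose reference count keeps the original buffer alive; this model leaves that lifetime management to the caller.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy segment, NOT struct rte_mbuf: a data pointer, a length
 * and a link to the next segment of the same packet. */
struct seg {
	uint8_t *data;
	size_t len;
	struct seg *next;
};

/* Build a mirror of `orig` whose first hdr_len bytes live in a private,
 * writable copy, followed by a segment that points into orig's buffer
 * (no copy). Any further segments of orig are shared as well.
 * Returns NULL if hdr_len exceeds the first segment or allocation fails. */
static struct seg *
clone_with_private_header(const struct seg *orig, size_t hdr_len)
{
	struct seg *hdr = malloc(sizeof(*hdr));
	struct seg *tail = malloc(sizeof(*tail));

	if (hdr == NULL || tail == NULL || hdr_len > orig->len)
		goto fail;
	hdr->data = malloc(hdr_len > 0 ? hdr_len : 1);
	if (hdr->data == NULL)
		goto fail;
	memcpy(hdr->data, orig->data, hdr_len);
	hdr->len = hdr_len;
	hdr->next = tail;

	tail->data = orig->data + hdr_len;   /* shared: do not modify */
	tail->len = orig->len - hdr_len;
	tail->next = orig->next;
	return hdr;

fail:
	free(hdr);
	free(tail);
	return NULL;
}

/* Free only what clone_with_private_header() allocated; shared data stays. */
static void
free_clone(struct seg *hdr)
{
	free(hdr->next);
	free(hdr->data);
	free(hdr);
}
```

Only the copied header may be altered; writing into the shared tail would also change the original packet, which is why choosing how much is "enough" to copy depends on what the application needs to modify.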


[dpdk-dev] freeing memzone

2015-05-28 Thread Harish Patil

>
>On 21/05/2015 17:59, Harish Patil wrote:
>> Hello dpdk-dev,
>>
>> I understand that the reserved memzones cannot be freed, as mentioned in
>> the DPDK specs. But I would like to know why. Are there any limitations?
>There should be a few threads in the mailing list related to this topic.
>Last that comes to mind:
>http://dpdk.org/ml/archives/dev/2015-April/016501.html
>
>Short answer is, it has not been implemented yet.
>> If the memory is not freed/returned, then can it be reused for subsequent
>> allocations without re-init (i.e. with the same memzone name)?
>> We use it for allocating DMA-able memory.
>It is up to the application to manage the use/re-use of memzones.
>By the way, would maybe rte_malloc memory be more suitable than memzones
>for your application?
>You can retrieve the physical address of memory allocated by rte_malloc
>with rte_malloc_virt2phy.
>> Secondly, there was a related discussion on this in the following email
>> thread:
>> http://dpdk.org/ml/archives/dev/2014-July/004456.html
>>
>> Do we plan to incorporate that changes?
>There is some ongoing work related to freeing memzones:
>http://dpdk.org/ml/archives/dev/2015-May/017470.html
>
>Feel free to comment on it.
>
>Sergio
>> Thanks,
>> Harish
>>
>>
>> 
>>
>

Thanks very much Sergio. I will take a look at your suggested links and
get back to you shortly.

Thanks,
Harish
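Since reuse of memzones is left to the application, a common pattern is lookup-before-reserve keyed by name. In DPDK the two steps would map to `rte_memzone_lookup()` and `rte_memzone_reserve()`; the code below is a self-contained toy model with hypothetical names (`zone_get_or_reserve`, a fixed 8-entry table) in which zones, like memzones, are never freed.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy registry of named zones, reserved once and never released.
 * Sizes and names are assumptions for this sketch. */
#define ZONE_MAX      8
#define ZONE_NAMESIZE 32

struct zone {
	char name[ZONE_NAMESIZE];
	void *addr;
	size_t len;
};

static struct zone zones[ZONE_MAX];
static unsigned int nb_zones;

/* Return the existing zone with this name (lookup step), or reserve a new
 * one. Returns NULL if an existing zone is too small for len, the name is
 * too long, the table is full, or allocation fails. */
static const struct zone *
zone_get_or_reserve(const char *name, size_t len)
{
	unsigned int i;

	for (i = 0; i < nb_zones; i++) {
		if (strncmp(zones[i].name, name, ZONE_NAMESIZE) == 0)
			return zones[i].len >= len ? &zones[i] : NULL;
	}
	if (nb_zones == ZONE_MAX || strlen(name) >= ZONE_NAMESIZE)
		return NULL;
	zones[nb_zones].addr = calloc(1, len);
	if (zones[nb_zones].addr == NULL)
		return NULL;
	strcpy(zones[nb_zones].name, name);
	zones[nb_zones].len = len;
	return &zones[nb_zones++];
}
```

A second request with the same name gets the original zone back instead of failing, which is the re-use without re-init asked about above.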





[dpdk-dev] Intel fortville not working with multi-segment

2015-05-28 Thread Nissim Nisimov

Thx!

We will check it in our code

Nissim

-Original Message-
From: Zhang, Helin [mailto:helin.zh...@intel.com] 
Sent: Wednesday, May 27, 2015 6:54 AM
To: Nissim Nisimov
Cc: 'dev at dpdk.org'
Subject: RE: Intel fortville not working with multi-segment

Hi Nissim

Sorry for late reply!
Today I got an environment ready and tried the latest DPDK code (master
branch); it works well.
Could you try the latest code (R2.0+) in your environment again, to see
whether the issue is still there?

Regards,
Helin

> -Original Message-
> From: Zhang, Helin
> Sent: Tuesday, May 12, 2015 4:51 PM
> To: Nissim Nisimov
> Cc: 'dev at dpdk.org'
> Subject: RE: Intel fortville not working with multi-segment
> 
> Hi Nissim
> 
> It seems that our validation guys here can reproduce it in our lab. I 
> will look into it soon and update you.
> Thank you very much for the good finding!
> 
> Regards,
> Helin
> 
> > -Original Message-
> > From: Nissim Nisimov [mailto:NissimN at Radware.com]
> > Sent: Monday, May 11, 2015 11:44 AM
> > To: Zhang, Helin
> > Cc: 'dev at dpdk.org'
> > Subject: RE: Intel fortville not working with multi-segment
> >
> > Hi,
> >
> > I am using PF pass-through and it doesn't work even with 2000 bytes 
> > of server response page size.
> > Looks like the first segment of each session is not received.
> >
> > When I change the server response size to 1000 bytes, all works 
> > as expected.
> >
> > I am working with dpdk 1.8 version.
> >
> > Any idea why? Is it related to i40e multi-segment support?
> >
> > Thx
> > Nissim
> >
> > On May 11, 2015 5:03 AM, "Zhang, Helin" 
> > wrote:
> > Hi Nissim
> >
> > Are you using PF pass-through or VF pass-through?
> > For PF pass-through, you might have already gotten the fix.
> > For VF pass-through, there is a bug fix which is needed for 
> > supporting jumbo frame and multiple mbuf.
> > http://www.dpdk.org/dev/patchwork/patch/4641/
> >
> >
> > Regards,
> > Helin
> >
> > > -Original Message-
> > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Nissim
> Nisimov
> > > Sent: Monday, May 11, 2015 3:48 AM
> > > To: Nissim Nisimov; 'dev at dpdk.org'
> > > Subject: Re: [dpdk-dev] Intel fortville not working with 
> > > multi-segment
> > >
> > > Hi,
> > >
> > > can someone assist regarding this issue?
> > >
> > > Is it a known limitation in i40e/dpdk (no support for multi-segment)?
> > >
> > > Thx
> > > Nissim
> > >
> > > -Original Message-
> > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Nissim
> Nisimov
> > > Sent: Thursday, May 07, 2015 5:44 PM
> > > To: 'dev at dpdk.org'
> > > Subject: [dpdk-dev] Intel fortville not working with multi-segment
> > >
> > > Hi,
> > >
> > >
> > >
> > > I am trying to work with Intel Fortville (XL710) NICs in 
> > > Passthrough mode from a VM running dpdk app.
> > >
> > >
> > > At first I didn't have any TX traffic from the VM; I got a DPDK patch 
> > > for this issue and it fixed it.
> > > (http://www.dpdk.org/dev/patchwork/patch/4588/)
> > >
> > > But now I see that when running multi-segment traffic, not 
> > > all the packets reach the VM (I tested it on bare metal as well 
> > > and saw the same issue).
> > >
> > > Is it a known issue? Any workaround for it?
> > >
> > > Thanks,
> > > Nissim

[dpdk-dev] [PATCH] e1000: enable allmulticast support for VF

2015-05-28 Thread Yury Kylulin

Add support to enable and disable reception of all multicast packets by the VF 
using standard API
rte_eth_allmulticast_enable()/rte_eth_allmulticast_disable().

Signed-off-by: Yury Kylulin 
---
 drivers/net/e1000/igb_ethdev.c |   20 
 1 file changed, 20 insertions(+)

diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index e4b370d..5196bd5 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -135,6 +135,8 @@ static int igbvf_dev_configure(struct rte_eth_dev *dev);
 static int igbvf_dev_start(struct rte_eth_dev *dev);
 static void igbvf_dev_stop(struct rte_eth_dev *dev);
 static void igbvf_dev_close(struct rte_eth_dev *dev);
+static void igbvf_allmulticast_enable(struct rte_eth_dev *dev);
+static void igbvf_allmulticast_disable(struct rte_eth_dev *dev);
 static int eth_igbvf_link_update(struct e1000_hw *hw);
 static void eth_igbvf_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *rte_stats);
 static void eth_igbvf_stats_reset(struct rte_eth_dev *dev);
@@ -280,6 +282,8 @@ static const struct eth_dev_ops igbvf_eth_dev_ops = {
.dev_start= igbvf_dev_start,
.dev_stop = igbvf_dev_stop,
.dev_close= igbvf_dev_close,
+   .allmulticast_enable  = igbvf_allmulticast_enable,
+   .allmulticast_disable = igbvf_allmulticast_disable,
.link_update  = eth_igb_link_update,
.stats_get= eth_igbvf_stats_get,
.stats_reset  = eth_igbvf_stats_reset,
@@ -2272,6 +2276,22 @@ igbvf_dev_close(struct rte_eth_dev *dev)
igbvf_dev_stop(dev);
 }

+static void
+igbvf_allmulticast_enable(struct rte_eth_dev *dev)
+{
+   struct e1000_hw *hw = E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+
+   e1000_promisc_set_vf(hw, e1000_promisc_multicast);
+}
+
+static void
+igbvf_allmulticast_disable(struct rte_eth_dev *dev)
+{
+   struct e1000_hw *hw = E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+
+   e1000_promisc_set_vf(hw, e1000_promisc_disabled);
+}
+
 static int igbvf_set_vfta(struct e1000_hw *hw, uint16_t vid, bool on)
 {
struct e1000_mbx_info *mbx = &hw->mbx;
-- 
1.7.9.5

[dpdk-dev] Packet Cloning

2015-05-28 Thread Padam Jeet Singh

Hello,

Is there a function in DPDK to completely clone a pkt_mbuf including the 
segments? 

I am trying to build a packet mirroring application which sends packet out 
through two separate interfaces, but the packet payload needs to be altered 
before send.

Thanks,
Padam

[dpdk-dev] [PATCH 5/5] app/testpmd: fix reply to a multicast ICMP request

2015-05-28 Thread Ivan Boule

Set the IP source and destination addresses in the IP header of the
ICMP reply as follows:
  - Use the request IP source address as the reply IP destination address
  - If the request IP destination address is a multicast IP address
  - choose a reply IP source address different from the request IP
source address,
  - re-compute the IP header checksum.
Otherwise
  - switch the request IP source and destination addresses in the
reply,
  - keep the IP header checksum unchanged.

Signed-off-by: Ivan Boule 
---
 app/test-pmd/icmpecho.c |   65 ++-
 1 file changed, 59 insertions(+), 6 deletions(-)

diff --git a/app/test-pmd/icmpecho.c b/app/test-pmd/icmpecho.c
index 010c5a9..9e6f5e9 100644
--- a/app/test-pmd/icmpecho.c
+++ b/app/test-pmd/icmpecho.c
@@ -272,6 +272,30 @@ ipv4_addr_dump(const char *what, uint32_t be_ipv4_addr)
printf("%s", buf);
 }

+static uint16_t
+ipv4_hdr_cksum(struct ipv4_hdr *ip_h)
+{
+   uint16_t *v16_h;
+   uint32_t ip_cksum;
+
+   /*
+* Compute the sum of successive 16-bit words of the IPv4 header,
+* skipping the checksum field of the header.
+*/
+   v16_h = (uint16_t *) ip_h;
+   ip_cksum = v16_h[0] + v16_h[1] + v16_h[2] + v16_h[3] +
+   v16_h[4] + v16_h[6] + v16_h[7] + v16_h[8] + v16_h[9];
+
+   /* reduce 32 bit checksum to 16 bits and complement it */
+   ip_cksum = (ip_cksum & 0xFFFF) + (ip_cksum >> 16);
+   ip_cksum = (ip_cksum & 0xFFFF) + (ip_cksum >> 16);
+   ip_cksum = (~ip_cksum) & 0x0000FFFF;
+   return (ip_cksum == 0) ? 0xFFFF : (uint16_t) ip_cksum;
+}
+
+#define is_multicast_ipv4_addr(ipv4_addr) \
+   (((rte_be_to_cpu_32((ipv4_addr)) >> 24) & 0x00FF) == 0xE0)
+
 /*
  * Receive a burst of packets, lookup for ICMP echo requets, and, if any,
  * send back ICMP echo replies.
@@ -295,6 +319,7 @@ reply_to_icmp_echo_rqsts(struct fwd_stream *fs)
uint16_t vlan_id;
uint16_t arp_op;
uint16_t arp_pro;
+   uint32_t cksum;
uint8_t  i;
int l2_len;
 #ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
@@ -442,19 +467,47 @@ reply_to_icmp_echo_rqsts(struct fwd_stream *fs)
/*
 * Prepare ICMP echo reply to be sent back.
 * - switch ethernet source and destinations addresses,
-* - switch IPv4 source and destinations addresses,
+* - use the request IP source address as the reply IP
+*destination address,
+* - if the request IP destination address is a multicast
+*   address:
+* - choose a reply IP source address different from the
+*   request IP source address,
+* - re-compute the IP header checksum.
+*   Otherwise:
+* - switch the request IP source and destination
+*   addresses in the reply IP header,
+* - keep the IP header checksum unchanged.
 * - set IP_ICMP_ECHO_REPLY in ICMP header.
-* No need to re-compute the IP header checksum.
-* Reset ICMP checksum.
+* ICMP checksum is computed by assuming it is valid in the
+* echo request and not verified.
 */
ether_addr_copy(&eth_h->s_addr, &eth_addr);
ether_addr_copy(&eth_h->d_addr, &eth_h->s_addr);
ether_addr_copy(&eth_addr, &eth_h->d_addr);
ip_addr = ip_h->src_addr;
-   ip_h->src_addr = ip_h->dst_addr;
-   ip_h->dst_addr = ip_addr;
+   if (is_multicast_ipv4_addr(ip_h->dst_addr)) {
+   uint32_t ip_src;
+
+   ip_src = rte_be_to_cpu_32(ip_addr);
+   if ((ip_src & 0x00000003) == 1)
+   ip_src = (ip_src & 0xFFFFFFFC) | 0x00000002;
+   else
+   ip_src = (ip_src & 0xFFFFFFFC) | 0x00000001;
+   ip_h->src_addr = rte_cpu_to_be_32(ip_src);
+   ip_h->dst_addr = ip_addr;
+   ip_h->hdr_checksum = ipv4_hdr_cksum(ip_h);
+   } else {
+   ip_h->src_addr = ip_h->dst_addr;
+   ip_h->dst_addr = ip_addr;
+   }
icmp_h->icmp_type = IP_ICMP_ECHO_REPLY;
-   icmp_h->icmp_cksum = 0;
+   cksum = ~icmp_h->icmp_cksum & 0xFFFF;
+   cksum += ~htons(IP_ICMP_ECHO_REQUEST << 8) & 0xFFFF;
+   cksum += htons(IP_ICMP_ECHO_REPLY << 8);
+   cksum = (cksum & 0xFFFF) + (cksum >> 16);
+   cksum = (cksum & 0xFFFF) + (cksum >> 16);
+   icmp_h->icmp_cksum = ~cksum;
pkts_burst[nb_replies++] = pkt;
}

-- 
1.7.10.4
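The patch uses two checksum techniques: a full one's-complement recomputation of the IPv4 header when the source address changes, and an incremental update of the ICMP checksum when only the type field changes from echo request (8) to echo reply (0). A minimal standalone sketch of both follows; the function names are hypothetical, words are processed in host order for simplicity (one's-complement sums give the same result as long as the order is consistent), and the incremental form is the one from RFC 1624, HC' = ~(~HC + ~m + m').

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit one's-complement sum into 16 bits. */
static uint16_t
cksum_fold(uint32_t sum)
{
	sum = (sum & 0xFFFF) + (sum >> 16);
	sum = (sum & 0xFFFF) + (sum >> 16);
	return (uint16_t)sum;
}

/* Full Internet checksum over n 16-bit words. */
static uint16_t
cksum_full(const uint16_t *w, size_t n)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += w[i];
	return (uint16_t)~cksum_fold(sum);
}

/* Incremental update when one 16-bit word changes from m to m_new:
 * HC' = ~(~HC + ~m + m_new). */
static uint16_t
cksum_update(uint16_t hc, uint16_t m, uint16_t m_new)
{
	uint32_t sum = (uint16_t)~hc;

	sum += (uint16_t)~m;
	sum += m_new;
	return (uint16_t)~cksum_fold(sum);
}
```

The same property explains the unicast branch of the patch: swapping the source and destination addresses does not change the one's-complement sum, so the IPv4 header checksum can be kept as is.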

[dpdk-dev] [PATCH 4/5] ixgbe: add multicast MAC address filtering

2015-05-28 Thread Ivan Boule

Support the function "set_mc_addr_list" in the "ixgbe" and in the
"ixgbe-vf" Poll Mode Drivers.

Signed-off-by: Ivan Boule 
---
 drivers/net/ixgbe/ixgbe_ethdev.c |   32 
 1 file changed, 32 insertions(+)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 0d9f9b2..885ed8f 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -257,6 +257,10 @@ static int ixgbe_dev_filter_ctrl(struct rte_eth_dev *dev,
 void *arg);
 static int ixgbevf_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu);

+static int ixgbe_dev_set_mc_addr_list(struct rte_eth_dev *dev,
+ struct ether_addr *mc_addr_set,
+ uint32_t nb_mc_addr);
+
 /*
  * Define VF Stats MACRO for Non "cleared on read" register
  */
@@ -381,6 +385,7 @@ static const struct eth_dev_ops ixgbe_eth_dev_ops = {
.rss_hash_update  = ixgbe_dev_rss_hash_update,
.rss_hash_conf_get= ixgbe_dev_rss_hash_conf_get,
.filter_ctrl  = ixgbe_dev_filter_ctrl,
+   .set_mc_addr_list = ixgbe_dev_set_mc_addr_list,
 };

 /*
@@ -406,6 +411,7 @@ static const struct eth_dev_ops ixgbevf_eth_dev_ops = {
.tx_queue_release = ixgbe_dev_tx_queue_release,
.mac_addr_add = ixgbevf_add_mac_addr,
.mac_addr_remove  = ixgbevf_remove_mac_addr,
+   .set_mc_addr_list = ixgbe_dev_set_mc_addr_list,
 };

 /**
@@ -4439,6 +4445,32 @@ ixgbe_dev_filter_ctrl(struct rte_eth_dev *dev,
return ret;
 }

+static u8 *
+ixgbe_dev_addr_list_itr(__attribute__((unused)) struct ixgbe_hw *hw,
+   u8 **mc_addr_ptr, u32 *vmdq)
+{
+   u8 *mc_addr;
+
+   *vmdq = 0;
+   mc_addr = *mc_addr_ptr;
+   *mc_addr_ptr = (mc_addr + sizeof(struct ether_addr));
+   return mc_addr;
+}
+
+static int
+ixgbe_dev_set_mc_addr_list(struct rte_eth_dev *dev,
+  struct ether_addr *mc_addr_set,
+  uint32_t nb_mc_addr)
+{
+   struct ixgbe_hw *hw;
+   u8 *mc_addr_list;
+
+   hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   mc_addr_list = (u8 *)mc_addr_set;
+   return ixgbe_update_mc_addr_list(hw, mc_addr_list, nb_mc_addr,
+ixgbe_dev_addr_list_itr, TRUE);
+}
+
 static struct rte_driver rte_ixgbe_driver = {
.type = PMD_PDEV,
.init = rte_ixgbe_pmd_init,
-- 
1.7.10.4
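The ixgbe patch passes `ixgbe_dev_addr_list_itr()` as a callback that `ixgbe_update_mc_addr_list()` presumably calls once per address, advancing a byte cursor over the array the application supplied. A simplified sketch of that iterator style follows; `struct mac_addr`, `mac_list_itr()` and `count_multicast()` are hypothetical (the `hw` argument is dropped). It also makes explicit the assumption behind casting an array of address structs to `u8 *`: the struct must be exactly 6 bytes with no padding.

```c
#include <stdint.h>

/* Hypothetical 6-byte MAC address type, standing in for struct ether_addr. */
struct mac_addr {
	uint8_t bytes[6];
};

/* Casting an array of these to a flat byte pointer is only valid
 * if the struct has no padding. */
_Static_assert(sizeof(struct mac_addr) == 6, "MAC address must be 6 bytes");

/* Iterator in the style of ixgbe_dev_addr_list_itr(): return the current
 * address, advance the cursor by one address, and report VMDq pool 0. */
static uint8_t *
mac_list_itr(uint8_t **cursor, uint32_t *vmdq)
{
	uint8_t *cur = *cursor;

	*vmdq = 0;
	*cursor = cur + sizeof(struct mac_addr);
	return cur;
}

/* Walk n addresses through the iterator and count the multicast ones
 * (least significant bit of the first octet set). */
static uint32_t
count_multicast(struct mac_addr *set, uint32_t n)
{
	uint8_t *cursor = (uint8_t *)set;
	uint32_t i, vmdq, nb_mc = 0;

	for (i = 0; i < n; i++) {
		uint8_t *addr = mac_list_itr(&cursor, &vmdq);

		if (addr[0] & 0x01)
			nb_mc++;
	}
	return nb_mc;
}
```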

[dpdk-dev] [PATCH 3/5] e1000: add multicast MAC address filtering

2015-05-28 Thread Ivan Boule

Support the PMD function "set_mc_addr_list" in the "igb", "igb-vf",
and "em" Poll Mode Drivers.

Signed-off-by: Ivan Boule 
---
 drivers/net/e1000/em_ethdev.c  |   17 +
 drivers/net/e1000/igb_ethdev.c |   18 ++
 2 files changed, 35 insertions(+)

diff --git a/drivers/net/e1000/em_ethdev.c b/drivers/net/e1000/em_ethdev.c
index d28030e..2392942 100644
--- a/drivers/net/e1000/em_ethdev.c
+++ b/drivers/net/e1000/em_ethdev.c
@@ -116,6 +116,10 @@ static void eth_em_rar_set(struct rte_eth_dev *dev, struct ether_addr *mac_addr,
uint32_t index, uint32_t pool);
 static void eth_em_rar_clear(struct rte_eth_dev *dev, uint32_t index);

+static int eth_em_set_mc_addr_list(struct rte_eth_dev *dev,
+  struct ether_addr *mc_addr_set,
+  uint32_t nb_mc_addr);
+
 #define EM_FC_PAUSE_TIME 0x0680
 #define EM_LINK_UPDATE_CHECK_TIMEOUT  90  /* 9s */
 #define EM_LINK_UPDATE_CHECK_INTERVAL 100 /* ms */
@@ -161,6 +165,7 @@ static const struct eth_dev_ops eth_em_ops = {
.flow_ctrl_set= eth_em_flow_ctrl_set,
.mac_addr_add = eth_em_rar_set,
.mac_addr_remove  = eth_em_rar_clear,
+   .set_mc_addr_list = eth_em_set_mc_addr_list,
 };

 /**
@@ -1522,6 +1527,18 @@ eth_em_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
return 0;
 }

+static int
+eth_em_set_mc_addr_list(struct rte_eth_dev *dev,
+   struct ether_addr *mc_addr_set,
+   uint32_t nb_mc_addr)
+{
+   struct e1000_hw *hw;
+
+   hw = E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   e1000_update_mc_addr_list(hw, (u8 *)mc_addr_set, nb_mc_addr);
+   return 0;
+}
+
 struct rte_driver em_pmd_drv = {
.type = PMD_PDEV,
.init = rte_em_pmd_init,
diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index e4b370d..1c24edc 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -194,6 +194,10 @@ static int eth_igb_filter_ctrl(struct rte_eth_dev *dev,
 enum rte_filter_op filter_op,
 void *arg);

+static int eth_igb_set_mc_addr_list(struct rte_eth_dev *dev,
+   struct ether_addr *mc_addr_set,
+   uint32_t nb_mc_addr);
+
 /*
  * Define VF Stats MACRO for Non "cleared on read" register
  */
@@ -269,6 +273,7 @@ static const struct eth_dev_ops eth_igb_ops = {
.rss_hash_update  = eth_igb_rss_hash_update,
.rss_hash_conf_get= eth_igb_rss_hash_conf_get,
.filter_ctrl  = eth_igb_filter_ctrl,
+   .set_mc_addr_list = eth_igb_set_mc_addr_list,
 };

 /*
@@ -289,6 +294,7 @@ static const struct eth_dev_ops igbvf_eth_dev_ops = {
.rx_queue_release = eth_igb_rx_queue_release,
.tx_queue_setup   = eth_igb_tx_queue_setup,
.tx_queue_release = eth_igb_tx_queue_release,
+   .set_mc_addr_list = eth_igb_set_mc_addr_list,
 };

 /**
@@ -3642,6 +3648,18 @@ eth_igb_filter_ctrl(struct rte_eth_dev *dev,
return ret;
 }

+static int
+eth_igb_set_mc_addr_list(struct rte_eth_dev *dev,
+struct ether_addr *mc_addr_set,
+uint32_t nb_mc_addr)
+{
+   struct e1000_hw *hw;
+
+   hw = E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   e1000_update_mc_addr_list(hw, (u8 *)mc_addr_set, nb_mc_addr);
+   return 0;
+}
+
 static struct rte_driver pmd_igb_drv = {
.type = PMD_PDEV,
.init = rte_igb_pmd_init,
-- 
1.7.10.4

[dpdk-dev] [PATCH 2/5] app/testpmd: new command to add/remove multicast MAC addresses

2015-05-28 Thread Ivan Boule

Add the new interactive command:
mcast_addr add|remove X <mcast_addr>
to add/remove the multicast MAC address <mcast_addr> to/from the set of
multicast addresses filtered by port X.
Command used to test the function "rte_eth_dev_set_mc_addr_list"
that has been added to the API of PMDs.

Signed-off-by: Ivan Boule 
---
 app/test-pmd/cmdline.c |   52 ++
 app/test-pmd/config.c  |  142 
 app/test-pmd/testpmd.h |6 ++
 3 files changed, 200 insertions(+)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index f01db2a..952a9df 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -8733,6 +8733,57 @@ cmdline_parse_inst_t cmd_set_hash_global_config = {
},
 };

+/* *** ADD/REMOVE A MULTICAST MAC ADDRESS TO/FROM A PORT *** */
+struct cmd_mcast_addr_result {
+   cmdline_fixed_string_t mcast_addr_cmd;
+   cmdline_fixed_string_t what;
+   uint8_t port_num;
+   struct ether_addr mc_addr;
+};
+
+static void cmd_mcast_addr_parsed(void *parsed_result,
+   __attribute__((unused)) struct cmdline *cl,
+   __attribute__((unused)) void *data)
+{
+   struct cmd_mcast_addr_result *res = parsed_result;
+
+   if (!is_multicast_ether_addr(&res->mc_addr)) {
+   printf("Invalid multicast addr %02X:%02X:%02X:%02X:%02X:%02X\n",
+  res->mc_addr.addr_bytes[0], res->mc_addr.addr_bytes[1],
+  res->mc_addr.addr_bytes[2], res->mc_addr.addr_bytes[3],
+  res->mc_addr.addr_bytes[4], res->mc_addr.addr_bytes[5]);
+   return;
+   }
+   if (strcmp(res->what, "add") == 0)
+   mcast_addr_add(res->port_num, &res->mc_addr);
+   else
+   mcast_addr_remove(res->port_num, &res->mc_addr);
+}
+
+cmdline_parse_token_string_t cmd_mcast_addr_cmd =
+   TOKEN_STRING_INITIALIZER(struct cmd_mcast_addr_result,
+mcast_addr_cmd, "mcast_addr");
+cmdline_parse_token_string_t cmd_mcast_addr_what =
+   TOKEN_STRING_INITIALIZER(struct cmd_mcast_addr_result, what,
+"add#remove");
+cmdline_parse_token_num_t cmd_mcast_addr_portnum =
+   TOKEN_NUM_INITIALIZER(struct cmd_mcast_addr_result, port_num, UINT8);
+cmdline_parse_token_etheraddr_t cmd_mcast_addr_addr =
+   TOKEN_ETHERADDR_INITIALIZER(struct cmd_mcast_addr_result, mc_addr);
+
+cmdline_parse_inst_t cmd_mcast_addr = {
+   .f = cmd_mcast_addr_parsed,
+   .data = (void *)0,
+   .help_str = "mcast_addr add|remove X <mcast_addr>: add/remove multicast MAC address on port X",
+   .tokens = {
+   (void *)&cmd_mcast_addr_cmd,
+   (void *)&cmd_mcast_addr_what,
+   (void *)&cmd_mcast_addr_portnum,
+   (void *)&cmd_mcast_addr_addr,
+   NULL,
+   },
+};
+
 /* 

 */

 /* list of instructions */
@@ -8862,6 +8913,7 @@ cmdline_parse_ctx_t main_ctx[] = {
(cmdline_parse_inst_t *)&cmd_set_sym_hash_ena_per_port,
(cmdline_parse_inst_t *)&cmd_get_hash_global_config,
(cmdline_parse_inst_t *)&cmd_set_hash_global_config,
+   (cmdline_parse_inst_t *)&cmd_mcast_addr,
NULL,
 };

diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index f788ed5..52917c7 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -2130,3 +2130,145 @@ set_vf_rate_limit(portid_t port_id, uint16_t vf, uint16_t rate, uint64_t q_msk)
port_id, diag);
return diag;
 }
+
+/*
+ * Functions to manage the set of filtered Multicast MAC addresses.
+ *
+ * A pool of filtered multicast MAC addresses is associated with each port.
+ * The pool is allocated in chunks of MCAST_POOL_INC multicast addresses.
+ * The address of the pool and the number of valid multicast MAC addresses
+ * recorded in the pool are stored in the fields "mc_addr_pool" and
+ * "mc_addr_nb" of the "rte_port" data structure.
+ *
+ * The function "rte_eth_dev_set_mc_addr_list" of the PMDs API imposes
+ * to be supplied a contiguous array of multicast MAC addresses.
+ * To comply with this constraint, the set of multicast addresses recorded
+ * into the pool are systematically compacted at the beginning of the pool.
+ * Hence, when a multicast address is removed from the pool, all following
+ * addresses, if any, are copied back to keep the set contiguous.
+ */
+#define MCAST_POOL_INC 32
+
+static int
+mcast_addr_pool_extend(struct rte_port *port)
+{
+   struct ether_addr *mc_pool;
+   size_t mc_pool_size;
+
+   /*
+* If a free entry is available at the end of the pool, just
+* increment the number of recorded multicast addresses.
+*/
+   if ((port->mc_addr_nb % MCAST_POOL_INC) != 0) {
+   port->mc_addr_nb++;
+   return 0;
+   }
+
+   /*
+* [re]allocate a pool with MCAST_POOL_INC more entries.
+* The previous test guarantees that

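The pool management described in the comment above (contiguous array, growth in chunks of `MCAST_POOL_INC`, compaction on removal) can be sketched standalone. The names `mc_pool_add()` / `mc_pool_remove()` and the types are hypothetical; in testpmd the whole contiguous array would then be handed to `rte_eth_dev_set_mc_addr_list()` after each change. The sketch also shows the multicast test the command performs first: a MAC address is multicast when the least significant bit of its first octet is set.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define POOL_INC 32   /* grow in chunks, like MCAST_POOL_INC */

struct mac { uint8_t b[6]; };

struct mc_pool {
	struct mac *addrs;   /* contiguous array handed to the driver */
	uint32_t nb;         /* number of valid entries */
};

static int
mac_is_multicast(const struct mac *a)
{
	return a->b[0] & 0x01;
}

/* Append addr, reallocating in chunks of POOL_INC entries. 0 on success. */
static int
mc_pool_add(struct mc_pool *p, const struct mac *addr)
{
	if (!mac_is_multicast(addr))
		return -1;
	if (p->nb % POOL_INC == 0) {
		struct mac *n = realloc(p->addrs, (p->nb + POOL_INC) * sizeof(*n));

		if (n == NULL)
			return -1;
		p->addrs = n;
	}
	p->addrs[p->nb++] = *addr;
	return 0;
}

/* Remove addr and shift the following entries down so the array stays
 * contiguous. Returns -1 if addr is not in the pool. */
static int
mc_pool_remove(struct mc_pool *p, const struct mac *addr)
{
	uint32_t i;

	for (i = 0; i < p->nb; i++) {
		if (memcmp(&p->addrs[i], addr, sizeof(*addr)) == 0) {
			memmove(&p->addrs[i], &p->addrs[i + 1],
				(p->nb - i - 1) * sizeof(*addr));
			p->nb--;
			return 0;
		}
	}
	return -1;
}
```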
[dpdk-dev] [PATCH 1/5] ethdev: add multicast address filtering

2015-05-28 Thread Ivan Boule

With the current PMD API, the receipt of multicast packets on a given
port can only be enabled by invoking the "rte_eth_allmulticast_enable"
function.
This method may not work on Virtual Functions in SR-IOV architectures
when the host PF driver does not allow such operation on VFs.
In such cases, joined multicast addresses must be individually added
in the set of multicast addresses that are filtered by the [VF] port.

For this purpose, a new function "set_mc_addr_list" is introduced
into the set of functions that are exported by a Poll Mode Driver.

Signed-off-by: Ivan Boule 
---
 lib/librte_ether/rte_ethdev.c |   17 +
 lib/librte_ether/rte_ethdev.h |   26 ++
 2 files changed, 43 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 024fe8b..f5784de 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -3628,3 +3628,20 @@ rte_eth_remove_tx_callback(uint8_t port_id, uint16_t queue_id,
/* Callback wasn't found. */
return -EINVAL;
 }
+
+int
+rte_eth_dev_set_mc_addr_list(uint8_t port_id,
+struct ether_addr *mc_addr_set,
+uint32_t nb_mc_addr)
+{
+   struct rte_eth_dev *dev;
+
+   if (port_id >= nb_ports) {
+   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+   return -ENODEV;
+   }
+
+   dev = &rte_eth_devices[port_id];
+   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->set_mc_addr_list, -ENOTSUP);
+   return dev->dev_ops->set_mc_addr_list(dev, mc_addr_set, nb_mc_addr);
+}
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 16dbe00..04c192d 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1228,6 +1228,10 @@ typedef int (*eth_udp_tunnel_del_t)(struct rte_eth_dev *dev,
struct rte_eth_udp_tunnel *tunnel_udp);
 /**< @internal Delete tunneling UDP info */

+typedef int (*eth_set_mc_addr_list_t)(struct rte_eth_dev *dev,
+ struct ether_addr *mc_addr_set,
+ uint32_t nb_mc_addr);
+/**< @internal set the list of multicast addresses on an Ethernet device */

 #ifdef RTE_NIC_BYPASS

@@ -1386,6 +1390,7 @@ struct eth_dev_ops {
/** Get current RSS hash configuration. */
rss_hash_conf_get_t rss_hash_conf_get;
eth_filter_ctrl_t  filter_ctrl;  /**< common filter control*/
+   eth_set_mc_addr_list_t set_mc_addr_list; /**< set list of mcast addrs */
 };

 /**
@@ -3615,4 +3620,25 @@ int rte_eth_remove_tx_callback(uint8_t port_id, uint16_t queue_id,
 }
 #endif

+/**
+ * Set the list of multicast addresses to filter on an Ethernet device.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param mc_addr_set
+ *   The array of multicast addresses to set. Equal to NULL when the function
+ *   is invoked to flush the set of filtered addresses.
+ * @param nb_mc_addr
+ *   The number of multicast addresses in the *mc_addr_set* array. Equal to 0
+ *   when the function is invoked to flush the set of filtered addresses.
+ * @return
+ *   - (0) if successful.
+ *   - (-ENODEV) if *port_id* invalid.
+ *   - (-ENOTSUP) if PMD of *port_id* doesn't support multicast filtering.
+ *   - (-ENOSPC) if *port_id* does not have enough multicast filtering resources.
+ */
+int rte_eth_dev_set_mc_addr_list(uint8_t port_id,
+                                 struct ether_addr *mc_addr_set,
+                                 uint32_t nb_mc_addr);
+
 #endif /* _RTE_ETHDEV_H_ */
-- 
1.7.10.4
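The new rte_eth_dev_set_mc_addr_list() follows the usual ethdev dispatch pattern: validate the port, return -ENOTSUP when the PMD leaves the callback NULL, and otherwise forward the call. The sketch below models that pattern without DPDK so it compiles on its own. Every type and function in it (dev_sketch, sketch_set_mc_addr_list(), toy_pmd_set_mc() and its two-entry filter limit) is a hypothetical stand-in, not part of rte_ethdev.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_PORTS 4 /* hypothetical */

struct ether_addr_sketch {
    uint8_t addr_bytes[6];
};

struct dev_sketch;

/* Same shape as eth_set_mc_addr_list_t in the patch. */
typedef int (*set_mc_addr_list_fn)(struct dev_sketch *dev,
                                   struct ether_addr_sketch *mc_addr_set,
                                   uint32_t nb_mc_addr);

struct dev_sketch {
    set_mc_addr_list_fn set_mc_addr_list; /* NULL if the PMD lacks support */
    uint32_t nb_filtered;
};

static struct dev_sketch devices[SKETCH_MAX_PORTS];
static uint8_t nb_ports_sketch;

/* Generic layer: validate the port, then dispatch to the PMD callback. */
static int sketch_set_mc_addr_list(uint8_t port_id,
                                   struct ether_addr_sketch *mc_addr_set,
                                   uint32_t nb_mc_addr)
{
    struct dev_sketch *dev;

    if (port_id >= nb_ports_sketch)
        return -ENODEV;
    dev = &devices[port_id];
    if (dev->set_mc_addr_list == NULL)
        return -ENOTSUP;
    return dev->set_mc_addr_list(dev, mc_addr_set, nb_mc_addr);
}

/* Toy PMD with room for two filters; a NULL set or zero count flushes the list. */
static int toy_pmd_set_mc(struct dev_sketch *dev,
                          struct ether_addr_sketch *mc_addr_set,
                          uint32_t nb_mc_addr)
{
    if (mc_addr_set == NULL || nb_mc_addr == 0) {
        dev->nb_filtered = 0;
        return 0;
    }
    if (nb_mc_addr > 2)
        return -ENOSPC;
    dev->nb_filtered = nb_mc_addr;
    return 0;
}
```

Against the real API added by this patch, the equivalent calls would be rte_eth_dev_set_mc_addr_list(port_id, mc_addrs, nb) to install a list and rte_eth_dev_set_mc_addr_list(port_id, NULL, 0) to flush it, as the doc comment describes.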

[dpdk-dev] [PATCH 0/5] multicast address filtering

2015-05-28 Thread Ivan Boule

Introduce PMD API to set the list of multicast MAC addresses filtered
by a port.
Implemented in the following PMDs: igb, igbvf, em, ixgbe, and ixgbevf.
Implementation for physical PMDs i40e, i40evf, enic, and fm10k is left
to their respective maintainers.

Ivan Boule (5):
  ethdev: add multicast address filtering
  app/testpmd: new command to add/remove multicast MAC addresses
  e1000: add multicast MAC address filtering
  ixgbe: add multicast MAC address filtering
  app/testpmd: fix reply to a multicast ICMP request

 app/test-pmd/cmdline.c           |   52 ++
 app/test-pmd/config.c            |  142 ++
 app/test-pmd/icmpecho.c          |   65 +++--
 app/test-pmd/testpmd.h           |    6 ++
 drivers/net/e1000/em_ethdev.c    |   17 +
 drivers/net/e1000/igb_ethdev.c   |   18 +
 drivers/net/ixgbe/ixgbe_ethdev.c |   32 +
 lib/librte_ether/rte_ethdev.c    |   17 +
 lib/librte_ether/rte_ethdev.h    |   26 +++
 9 files changed, 369 insertions(+), 6 deletions(-)

-- 
1.7.10.4

[dpdk-dev] DPDK: Proposal for a patch patch-test integration tree

2015-05-28 Thread Thomas Monjalon

2015-05-28 10:39, Thomas F Herbert:
> On 5/28/15 5:48 AM, Thomas Monjalon wrote:
> > 2015-05-27 16:48, Thomas F Herbert:
> >> This is a proposal to create a patch-test dpdk tree. I welcome feedback
> >> from the dpdk community on this proposal.
> >>
> >> This tree will consist of pre-integrated patches submitted to dpdk-dev
> >> mailing list. The purpose of this tree is to help reduce the effort in
> >> reviewing patches and providing feedback to submitters of patches. Also
> >> integrators of dpdk can benefit by having a "linux-next" like tree to
> >> clone and or try new features for performance and breadth of hardware
> >> support. It is the hope that the patch-test tree will provide early
> >> feedback to patch reviewers as well as an assurance of quality to the
> >> project maintainer.
> >
> > Thanks for the proposal. This tree may be hosted here:
> > http://dpdk.org/browse
> > You just have to give the name of the repository and a SSH key.
> 
> Shall we call the repo patch-test-next?

Why not dpdk-next, with the description
"Apply and test every patch before acceptance"?

> I am attaching two public keys to this email

Received, thanks

[dpdk-dev] [PATCH] vhost: tcp pkt with virtio header in one desc

2015-05-28 Thread Wei li

Signed-off-by: Wei li 
---
 lib/librte_vhost/vhost_rxtx.c | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
index 4809d32..2d3ea92 100644
--- a/lib/librte_vhost/vhost_rxtx.c
+++ b/lib/librte_vhost/vhost_rxtx.c
@@ -588,8 +588,19 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id,

desc = &vq->desc[head[entry_success]];

-   /* Discard first buffer as it is the virtio header */
-   desc = &vq->desc[desc->next];
+   if (desc->flags & VRING_DESC_F_NEXT)
+   {
+   /* Discard first buffer as it is the virtio header */
+   desc = &vq->desc[desc->next];
+   vb_offset = 0;
+   vb_avail = desc->len;
+   }
+   else /* virtio header in one desc with real pkt */
+   {
+   /* strip the virtio header */
+   vb_offset = vq->vhost_hlen;
+   vb_avail = desc->len - vq->vhost_hlen;
+   }

/* Buffer address translation. */
vb_addr = gpa_to_vva(dev, desc->addr);
@@ -608,8 +619,6 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id,
vq->used->ring[used_idx].id = head[entry_success];
vq->used->ring[used_idx].len = 0;

-   vb_offset = 0;
-   vb_avail = desc->len;
/* Allocate an mbuf and populate the structure. */
m = rte_pktmbuf_alloc(mbuf_pool);
if (unlikely(m == NULL)) {
-- 
1.9.5.msysgit.1

[dpdk-dev] [PATCH] vhost: tcp pkt with virtio header in one desc

2015-05-28 Thread Ouyang, Changchun

Please see the patch "Fix vhost enqueue/dequeue issue" for a more complete fix
of vhost enqueue/dequeue.
We don't need this duplicate fix, and it only fixes part of the issue.
Thanks
Changchun


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Stephen
> Hemminger
> Sent: Thursday, May 28, 2015 10:51 PM
> To: Wei li
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] vhost: tcp pkt with virtio header in one desc
> 
> On Thu, 28 May 2015 16:19:44 +0800
> Wei li  wrote:
> 
> > +   if (desc->flags & VRING_DESC_F_NEXT)
> > +   {
> > +   /* Discard first buffer as it is the virtio header */
> > +   desc = &vq->desc[desc->next];
> > +   vb_offset = 0;
> > +   vb_avail = desc->len;
> > +   }
> > +   else /* virtio header in one desc with real pkt */
> > +   {
> > +   /* strip the virtio header */
> > +   vb_offset = vq->vhost_hlen;
> > +   vb_avail = desc->len - vq->vhost_hlen;
> > +
> This code looks correct, but please follow the same style as other code in the
> driver. The virtio driver uses Linux/BSD
> style:
>   if () {
>   } else {
>   }
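The dequeue logic in Wei li's patch can be exercised in isolation. The following sketch, written in the `if () { } else { }` style Stephen asks for, picks the descriptor and offset where the packet data starts. The descriptor struct mirrors the fields of a virtio vring descriptor and VRING_DESC_F_NEXT uses the value from the virtio specification; locate_payload() is a hypothetical helper, and the length check in the single-descriptor case is an addition the patch does not make.

```c
#include <stdint.h>

#define VRING_DESC_F_NEXT 1 /* buffer continues via the next field (virtio spec) */

/* Same fields as a virtio ring descriptor. */
struct vring_desc_sketch {
    uint64_t addr;
    uint32_t len;
    uint16_t flags;
    uint16_t next;
};

/* Where the packet payload starts for one descriptor chain. */
struct payload_view {
    uint16_t idx;    /* descriptor holding the first payload byte */
    uint32_t offset; /* vb_offset in the patch */
    uint32_t avail;  /* vb_avail in the patch */
};

static struct payload_view
locate_payload(const struct vring_desc_sketch *ring, uint16_t head, uint32_t hlen)
{
    const struct vring_desc_sketch *desc = &ring[head];
    struct payload_view v;

    if (desc->flags & VRING_DESC_F_NEXT) {
        /* Header alone in the first buffer: skip to the next descriptor. */
        v.idx = desc->next;
        v.offset = 0;
        v.avail = ring[desc->next].len;
    } else {
        /* Header and packet share one descriptor: strip the header. */
        v.idx = head;
        v.offset = hlen;
        /* Guard against a descriptor shorter than the header (not in the patch). */
        v.avail = desc->len > hlen ? desc->len - hlen : 0;
    }
    return v;
}
```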

[dpdk-dev] [PATCH v3 01/10] table: added structure for storing table stats

2015-05-28 Thread Stephen Hemminger

On Thu, 28 May 2015 19:32:32 +
"Dumitrescu, Cristian"  wrote:

> This is just adding  a new field at the end of an API data structure. Based 
> on input from multiple people and after reviewing the rules listed on 
> http://dpdk.org/doc/guides/rel_notes/abi.html , I think this is an acceptable 
> change. There are other patches in flight on this mailing list that are in 
> the same situation. Any typical/well behaved application will not break due 
> to this change.

Expanding a structure can be okay but:
  1. The allocation will always have to take place within the library.
     If you let the application put the structure on the stack or allocate it
     on its own, the ABI would break.

  2. The structure must not be used as a return by reference.
 For example, this would break if sizeof(struct my_stats) changed.

     void foo() {
         struct my_stats stats;
         int i_will_get_clobbered;
         ...
         rte_dpdk_get_stats(obj, &stats);
     }
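To make point 2 concrete, the standalone sketch below compares an old and a new layout of the same structure and shows one common workaround that is not among the rules Stephen lists: the caller passes the size of the structure it was compiled with, and the getter copies no more than that. The my_stats_v1/my_stats_v2 layouts and get_stats_sized() are hypothetical; they are not a DPDK API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layout an application was compiled against (old ABI). */
struct my_stats_v1 {
    uint64_t n_pkts_in;
    uint64_t n_pkts_drop;
};

/* Same structure after the library appended a field (new ABI). */
struct my_stats_v2 {
    uint64_t n_pkts_in;
    uint64_t n_pkts_drop;
    uint64_t n_pkts_lookup_miss; /* new field at the end */
};

/*
 * A getter that writes sizeof(struct my_stats_v2) bytes into a caller's
 * my_stats_v1 would overrun it by 8 bytes, clobbering whatever the compiler
 * placed next to it. Passing the caller's size bounds the copy instead.
 */
static int get_stats_sized(const struct my_stats_v2 *lib_stats,
                           void *out, size_t out_size)
{
    if (lib_stats == NULL || out == NULL)
        return -1;
    if (out_size > sizeof(*lib_stats))
        out_size = sizeof(*lib_stats);
    memcpy(out, lib_stats, out_size);
    return 0;
}
```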

[dpdk-dev] [PATCH v7 09/10] igb: enable rx queue interrupts for PF

2015-05-28 Thread Stephen Hemminger

On Tue,  5 May 2015 13:39:45 +0800
Cunming Liang  wrote:

> + pci_dev->intr_handle.intr_vec =
> + rte_zmalloc("intr_vec",
> + dev_info.max_rx_queues * sizeof(int), 0);
> + 

This and other drivers should be using rte_zmalloc_socket to ensure
that the intr_vec table is allocated on the same NUMA node as the hardware.

[dpdk-dev] [PATCH] kni: Use utsrelease.h to determine Ubuntu kernel version

2015-05-28 Thread Simon Kågström

On 2015-05-28 12:48, Wodkowski, PawelX wrote:
>>> Please only check if UTS_RELEASE is available on all Ubuntu versions DPDK
>>> supports.
>>
>> From some digging, it appears it entered the kernel tree in 2006 and
>> moved to include/generated/ in 2009 so I guess that should be fine for
>> DPDK builds?
> 
> I also think that it is OK, but I also think you should check by building it
> (or asking someone to do it for you) on those systems, not by theory :)

Well, I think this is one of the main motivations for something like
what Thomas F. Herbert proposed in another thread recently:

  DPDK: Proposal for a patch patch-test integration tree

basically, a continuous-integration-type of system should test-build
(and probably test) any prospective patch to see that it builds for
various targets. In my view, this would be a perfect match for
github+travis-ci.

Anyway, I'll see if I can dig up an older Ubuntu to build on, unless
someone else steps up and tests the patch. (My issue to start with was
that the build fails on a 14.04 chroot on a 12.04 host, but I only have
access to the chroot there).

// Simon

[dpdk-dev] [PATCH] vhost: tcp pkt with virtio header in one desc

2015-05-28 Thread Ouyang, Changchun

I have sent out another patch which already includes this fix.
That patch also fixes another issue in the rx path; it is being reworked,
and I will send out the v2 version soon.
Accordingly, this patch is a duplicate.
Thanks
Changchun


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Wei li
> Sent: Thursday, May 28, 2015 4:20 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH] vhost: tcp pkt with virtio header in one desc
> 
> Signed-off-by: Wei li 
> ---
>  lib/librte_vhost/vhost_rxtx.c | 17 +
>  1 file changed, 13 insertions(+), 4 deletions(-)
> 
> diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
> index 4809d32..2d3ea92 100644
> --- a/lib/librte_vhost/vhost_rxtx.c
> +++ b/lib/librte_vhost/vhost_rxtx.c
> @@ -588,8 +588,19 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id,
> 
>   desc = &vq->desc[head[entry_success]];
> 
> - /* Discard first buffer as it is the virtio header */
> - desc = &vq->desc[desc->next];
> + if (desc->flags & VRING_DESC_F_NEXT)
> + {
> + /* Discard first buffer as it is the virtio header */
> + desc = &vq->desc[desc->next];
> + vb_offset = 0;
> + vb_avail = desc->len;
> + }
> + else /* virtio header in one desc with real pkt */
> + {
> + /* strip the virtio header */
> + vb_offset = vq->vhost_hlen;
> + vb_avail = desc->len - vq->vhost_hlen;
> + }
> 
>   /* Buffer address translation. */
>   vb_addr = gpa_to_vva(dev, desc->addr);
> @@ -608,8 +619,6 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id,
>   vq->used->ring[used_idx].id = head[entry_success];
>   vq->used->ring[used_idx].len = 0;
> 
> - vb_offset = 0;
> - vb_avail = desc->len;
>   /* Allocate an mbuf and populate the structure. */
>   m = rte_pktmbuf_alloc(mbuf_pool);
>   if (unlikely(m == NULL)) {
> --
> 1.9.5.msysgit.1
>

[dpdk-dev] [RFC PATCH V2] drivers/net/bonding: add support for PCI Port Hotplug

2015-05-28 Thread Bernard Iremonger

This patch depends on the Port Hotplug Framework.
It implements the rte_dev_uninit_t() function for the link bonding pmd.

Changes in V2:
Rebased to use the drivers/net/bonding directory
Free rx and tx queues following comments from Declan

Signed-off-by: Bernard Iremonger 
---
 drivers/net/bonding/rte_eth_bond.h |   13 -
 drivers/net/bonding/rte_eth_bond_api.c |   82 +++-
 drivers/net/bonding/rte_eth_bond_pmd.c |   23 +++-
 drivers/net/bonding/rte_eth_bond_private.h |7 ++-
 4 files changed, 95 insertions(+), 30 deletions(-)

diff --git a/drivers/net/bonding/rte_eth_bond.h b/drivers/net/bonding/rte_eth_bond.h
index d688fc3..8efbf07 100644
--- a/drivers/net/bonding/rte_eth_bond.h
+++ b/drivers/net/bonding/rte_eth_bond.h
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -131,6 +131,17 @@ int
 rte_eth_bond_create(const char *name, uint8_t mode, uint8_t socket_id);

 /**
+ * Free a bonded rte_eth_dev device
+ *
+ * @param name Name of the link bonding device.
+ *
+ * @return
+ * 0 on success, negative value otherwise
+ */
+int
+rte_eth_bond_free(const char *name);
+
+/**
  * Add a rte_eth_dev device as a slave to the bonded device
  *
  * @param bonded_port_id   Port ID of bonded device.
diff --git a/drivers/net/bonding/rte_eth_bond_api.c b/drivers/net/bonding/rte_eth_bond_api.c
index e91a623..04271aa 100644
--- a/drivers/net/bonding/rte_eth_bond_api.c
+++ b/drivers/net/bonding/rte_eth_bond_api.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -192,7 +192,15 @@ number_of_sockets(void)
return ++sockets;
 }

-const char *driver_name = "Link Bonding PMD";
+const char driver_name[] = "rte_bond_pmd";
+static struct rte_pci_id pci_id_table;
+
+static struct eth_driver rte_bond_pmd = {
+   .pci_drv = {
+   .name = driver_name,
+   .drv_flags = RTE_PCI_DRV_INTR_LSC | RTE_PCI_DRV_DETACHABLE,
+   },
+};

 int
 rte_eth_bond_create(const char *name, uint8_t mode, uint8_t socket_id)
@@ -200,9 +208,8 @@ rte_eth_bond_create(const char *name, uint8_t mode, uint8_t socket_id)
struct rte_pci_device *pci_dev = NULL;
struct bond_dev_private *internals = NULL;
struct rte_eth_dev *eth_dev = NULL;
-   struct eth_driver *eth_drv = NULL;
struct rte_pci_driver *pci_drv = NULL;
-   struct rte_pci_id *pci_id_table = NULL;
+
/* now do all data allocation - for eth_dev structure, dummy pci driver
 * and internal (private) data
 */
@@ -224,26 +231,15 @@ rte_eth_bond_create(const char *name, uint8_t mode, uint8_t socket_id)
goto err;
}

-   eth_drv = rte_zmalloc_socket(name, sizeof(*eth_drv), 0, socket_id);
-   if (eth_drv == NULL) {
-   RTE_BOND_LOG(ERR, "Unable to malloc eth_drv on socket");
-   goto err;
-   }
-
-   pci_drv = &eth_drv->pci_drv;
+   pci_drv = &rte_bond_pmd.pci_drv;

-   pci_id_table = rte_zmalloc_socket(name, sizeof(*pci_id_table), 0, socket_id);
-   if (pci_id_table == NULL) {
-   RTE_BOND_LOG(ERR, "Unable to malloc pci_id_table on socket");
-   goto err;
-   }
-   pci_id_table->device_id = PCI_ANY_ID;
-   pci_id_table->subsystem_device_id = PCI_ANY_ID;
-   pci_id_table->vendor_id = PCI_ANY_ID;
-   pci_id_table->subsystem_vendor_id = PCI_ANY_ID;
+   memset(&pci_id_table, 0, sizeof(pci_id_table));
+   pci_id_table.device_id = PCI_ANY_ID;
+   pci_id_table.subsystem_device_id = PCI_ANY_ID;
+   pci_id_table.vendor_id = PCI_ANY_ID;
+   pci_id_table.subsystem_vendor_id = PCI_ANY_ID;

-   pci_drv->id_table = pci_id_table;
-   pci_drv->drv_flags = RTE_PCI_DRV_INTR_LSC;
+   pci_drv->id_table = &pci_id_table;

internals = rte_zmalloc_socket(name, sizeof(*internals), 0, socket_id);
if (internals == NULL) {
@@ -261,7 +257,7 @@ rte_eth_bond_create(const char *name, uint8_t mode, uint8_t socket_id)
pci_dev->numa_node = socket_id;
pci_drv->name = driver_name;

-   eth_dev->driver = eth_drv;
+   eth_dev->driver = &rte_bond_pmd;
eth_dev->data->dev_private = internals;
eth_dev->data->nb_rx_queues = (uint16_t)1;
eth_dev->data->nb_tx_queues = (uint16_t)1;
@@ -317,13 +313,49 @@ rte_eth_bond_create(const char *name, uint8_t mode, uint8_t socket_id)

 err:
rte_free(pci_dev);
-   rte_free(pci_id_table);
-   rte_free(eth_drv);
rte_free(internals);
+

[dpdk-dev] DPDK: Proposal for a patch patch-test integration tree

2015-05-28 Thread Wiles, Keith



On 5/28/15, 1:54 AM, "Simon Kågström" 
wrote:

>I spot a can of worms to be opened here :-)
>
>On 2015-05-27 22:48, Thomas F Herbert wrote:
>> Work Flow and Process:
>> 
>> All patches will be taken from public submissions to dpdk-dev.org
>> scraped from dpdk patchwork. Patches will be applied to the patch-test
>> tree and tested against HEAD as they are received. The feedback from the
>> testing will be provided to the community. The patch-test tree will
>> periodically be git pull'ed from dpdk.
>> 
>> Longer term goal:
>> 
>> Initially, the patches will be applied along with some simple smoke
>> tests. The longer term goal is to automate this process, apply more
>> extensive tests and post the results in dpdk patchwork,
>> http://dpdk.org/dev/patchwork/project/dpdk/list/ which would have an
>> accompanying mailing list for distribution of a results summary of the
>> tests.
>
>Actually, github and services such as travis-ci and coveralls already
>provide this functionality (with very little setup). So when someone
>sends a pull request, the continuous integration service travis-ci will
>notice it, and start a build and (possibly) run a test suite on the code
>- with the patches applied.
>
>If code coverage is collected in the process [1], it's uploaded to the
>coveralls site. Both travis-ci and coveralls will add a note to the pull
>request saying something like "Build failed with this patch, be careful"
>or "Build OK, everything is fine" and "Coverage decreased by 5% with
>this patch" etc etc.
>
>Of course, github provides an API so it's entirely possible to add your
>own continuous integration support with the same functionality as
>travis-ci (customized for DPDK).
>
>
>
>So before venturing into implementing something like this, I think the
>DPDK project should at least consider the existing alternatives.

Using the above process seems very reasonable, and we already have a GitHub
account where we can mirror the code and run these processes. This gives us
a more standard method instead of a home-grown one, but, as Simon states,
we could still enhance the process via the API.

I may have bruised a few worms here, so very sorry :-)
>
>And with that, I close the can of worms again. I hope no worms were hurt
>in the process! :-)
>
>// Simon
>
>[1] https://github.com/SimonKagstrom/kcov - yes my personal project

[dpdk-dev] [PATCH] kni: Use utsrelease.h to determine Ubuntu kernel version

2015-05-28 Thread Simon Kågström

On 2015-05-28 12:05, Wodkowski, PawelX wrote:
>>>
>>> -UBUNTU_KERNEL_CODE := $(shell cut -d' ' -f2 /proc/version_signature | \
>>> -cut -d'~' -f1 | cut -d- -f1,2 | tr .- $(comma))
>>> +UBUNTU_KERNEL_CODE := $(shell echo `grep UTS_RELEASE $(RTE_KERNELDIR)/include/generated/utsrelease.h \
>>> +| cut -d '"' -f2 | cut -d- -f1,2 | tr .- $(comma)`,1)
> 
> It is fine with me if it does the job and does not break the build on other
> OSes (or other Ubuntu versions, especially 12.04 if we still support it).
> Please only check if UTS_RELEASE is available on all Ubuntu versions DPDK
> supports.

From some digging, it appears it entered the kernel tree in 2006 and
moved to include/generated/ in 2009 so I guess that should be fine for
DPDK builds?

// Simon

[dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh

2015-05-28 Thread Zoltan Kiss

These are the requirements for rte_eth_tx_burst(), which calls a
driver-specific function (in the case of ixgbe, these two):

"It is the responsibility of the rte_eth_tx_burst() function to 
transparently free the memory buffers of packets previously sent. This 
feature is driven by the *tx_free_thresh* value supplied to the 
rte_eth_dev_configure() function at device configuration time. When the 
number of previously sent packets reached the "minimum transmit packets 
to free" threshold, the rte_eth_tx_burst() function must [attempt to] 
free the *rte_mbuf*  buffers of those packets whose transmission was 
effectively completed."

Also rte_eth_tx_queue_setup() uses the same description for tx_free_thresh:

"The *tx_free_thresh* value indicates the [minimum] number of network 
buffers that must be pending in the transmit ring to trigger their 
[implicit] freeing by the driver transmit function."

And all the other poll mode drivers are using this formula. Plus I've 
described a possible hang situation in the commit message.

On 28/05/15 11:50, Venkatesan, Venky wrote:
> NAK. This causes more (unsuccessful) cleanup attempts on the descriptor ring. 
> What is motivating this change?
>
> Regards,
> Venky
>
>
>> On May 28, 2015, at 1:42 AM, Zoltan Kiss  wrote:
>>
>> This check doesn't do what's required by rte_eth_tx_burst:
>> "When the number of previously sent packets reached the "minimum transmit
>> packets to free" threshold"
>>
>> This can cause problems when txq->tx_free_thresh + [number of elements in the
>> pool] < txq->nb_tx_desc.
>>
>> Signed-off-by: Zoltan Kiss 
>> ---
>> drivers/net/ixgbe/ixgbe_rxtx.c | 4 ++--
>> drivers/net/ixgbe/ixgbe_rxtx_vec.c | 2 +-
>> 2 files changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
>> index 4f9ab22..b70ed8c 100644
>> --- a/drivers/net/ixgbe/ixgbe_rxtx.c
>> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
>> @@ -250,10 +250,10 @@ tx_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>>
>> /*
>>  * Begin scanning the H/W ring for done descriptors when the
>> - * number of available descriptors drops below tx_free_thresh.  For
>> + * number of in flight descriptors reaches tx_free_thresh. For
>>  * each done descriptor, free the associated buffer.
>>  */
>> -if (txq->nb_tx_free < txq->tx_free_thresh)
>> +if ((txq->nb_tx_desc - txq->nb_tx_free) > txq->tx_free_thresh)
>> ixgbe_tx_free_bufs(txq);
>>
>> /* Only use descriptors that are available */
>> diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
>> index abd10f6..f91c698 100644
>> --- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
>> +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
>> @@ -598,7 +598,7 @@ ixgbe_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
>> if (unlikely(nb_pkts > RTE_IXGBE_VPMD_TX_BURST))
>> nb_pkts = RTE_IXGBE_VPMD_TX_BURST;
>>
>> -if (txq->nb_tx_free < txq->tx_free_thresh)
>> +if ((txq->nb_tx_desc - txq->nb_tx_free) > txq->tx_free_thresh)
>> ixgbe_tx_free_bufs(txq);
>>
>> nb_commit = nb_pkts = (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts);
>> --
>> 1.9.1
>>
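The difference between the two conditions is easiest to see with numbers. The sketch below reduces each check to a predicate over the three queue fields the patch touches (nb_tx_desc, nb_tx_free, tx_free_thresh). struct txq_state and the function names are hypothetical, and the sample values in the test (a 512-descriptor ring, a threshold of 32, and a pool that can only keep 100 mbufs in flight) are chosen to satisfy the condition from the commit message, txq->tx_free_thresh + [number of elements in the pool] < txq->nb_tx_desc.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the ixgbe TX queue state. */
struct txq_state {
    uint16_t nb_tx_desc;     /* ring size */
    uint16_t nb_tx_free;     /* descriptors not currently in flight */
    uint16_t tx_free_thresh; /* configured threshold */
};

/* Existing check: clean up once free descriptors drop below the threshold. */
static bool old_should_free(const struct txq_state *q)
{
    return q->nb_tx_free < q->tx_free_thresh;
}

/* Check proposed in the patch: clean up once in-flight descriptors exceed it. */
static bool new_should_free(const struct txq_state *q)
{
    return (q->nb_tx_desc - q->nb_tx_free) > q->tx_free_thresh;
}
```

With the pool exhausted, nb_tx_free cannot drop below 412, so the old predicate never fires and completed mbufs are never returned to the pool; the new one fires as soon as more than 32 descriptors are outstanding. The flip side, which is the basis of Venky's NAK, is that the new condition also fires far more often on a healthy queue, triggering cleanup attempts that may find no completed descriptors.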

[dpdk-dev] DPDK: Proposal for a patch patch-test integration tree

2015-05-28 Thread Thomas Monjalon

2015-05-27 16:48, Thomas F Herbert:
> This is a proposal to create a patch-test dpdk tree. I welcome feedback 
> from the dpdk community on this proposal.
> 
> This tree will consist of pre-integrated patches submitted to dpdk-dev 
> mailing list. The purpose of this tree is to help reduce the effort in 
> reviewing patches and providing feedback to submitters of patches. Also 
> integrators of dpdk can benefit by having a "linux-next" like tree to 
> clone and or try new features for performance and breadth of hardware 
> support. It is the hope that the patch-test tree will provide early 
> feedback to patch reviewers as well as an assurance of quality to the 
> project maintainer.

Thanks for the proposal. This tree may be hosted here:
http://dpdk.org/browse
You just have to give the name of the repository and a SSH key.

When it is done, we can discuss which tests should be passed.

[dpdk-dev] Packet Cloning

2015-05-28 Thread Kyle Larose

I'm fairly new to dpdk, so I may be completely out to lunch on this, but
here's an idea to possibly improve performance compared to a straight copy
of the entire packet. If this idea makes sense, perhaps it could be added
to the mbuf library as an extension of the clone functionality?

If you are only modifying the headers (say the Ethernet header), is it
possible to make a copy of only the first N bytes (say 32 bytes)?

For example, you make two new "main" mbufs, which contain duplicate
metadata, and a copy of the first 32 bytes of the packet. Call them A and
B. Have both A and B chain to the original mbuf (call it O), which is
reference counted as with the normal clone functionality. Then, you adjust
the O such that its start data is 32 bytes into the packet.

When you transmit A, it will send its own copy of the 32 bytes, plus the
unaltered remaining data contained in O. A will be freed, and the refcount
of O decremented. When you transmit B, it will work the same as with the
previous one, except that when the refcount on O is decremented, it reaches
zero and it is freed as well.

I'm not sure if this makes sense in all cases (for example, maybe it's just
faster to allocate separate mbufs for 64-byte packets). Perhaps that could
also be handled transparently under the hood.

Thoughts?

Thanks,

Kyle

On Thu, May 28, 2015 at 11:10 AM, Matt Laswell 
wrote:

> Since Padam is going to be altering payload, he likely cannot use that API.
> The rte_pktmbuf_clone() API doesn't make a copy of the payload.  Instead,
> it gives you a second mbuf whose payload pointer points back to the
> contents of the first (and also increments the reference counter on the
> first so that it isn't actually freed until all clones are accounted for).
> This is very fast, which is good.  However, since there's only really one
> buffer full of payload, changes in the original also affect the clone and
> vice versa.  This can have surprising and unpleasant side effects that may
> not show up until you are under load, which is awesome*.
>
> For what it's worth, if you need to be able to modify the copy while
> leaving the original alone, I don't believe that there's a good solution
> within DPDK.   However, writing your own API to copy rather than clone a
> packet mbuf isn't difficult.
>
> --
> Matt Laswell
> infinite io, inc.
> laswell at infiniteio.com
>
> * Don't ask me how I know how much awesome fun this can be, though I
> suspect you can guess.
>
> On Thu, May 28, 2015 at 9:52 AM, Stephen Hemminger <
> stephen at networkplumber.org> wrote:
>
> > On Thu, 28 May 2015 17:15:42 +0530
> > Padam Jeet Singh  wrote:
> >
> > > Hello,
> > >
> > > Is there a function in DPDK to completely clone a pkt_mbuf including
> > > the segments?
> > >
> > > I am trying to build a packet mirroring application which sends packet
> > > out through two separate interfaces, but the packet payload needs to be
> > > altered before send.
> > >
> > > Thanks,
> > > Padam
> > >
> > >
> >
> > Isn't this what you want?
> >
> > /**
> >  * Creates a "clone" of the given packet mbuf.
> >  *
> >  * Walks through all segments of the given packet mbuf, and for each of them:
> >  *  - Creates a new packet mbuf from the given pool.
> >  *  - Attaches newly created mbuf to the segment.
> >  * Then updates pkt_len and nb_segs of the "clone" packet mbuf to match values
> >  * from the original packet mbuf.
> >  *
> >  * @param md
> >  *   The packet mbuf to be cloned.
> >  * @param mp
> >  *   The mempool from which the "clone" mbufs are allocated.
> >  * @return
> >  *   - The pointer to the new "clone" mbuf on success.
> >  *   - NULL if allocation fails.
> >  */
> > static inline struct rte_mbuf *rte_pktmbuf_clone(struct rte_mbuf *md,
> > struct rte_mempool *mp)
> >
>
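Matt's point that a copy, unlike rte_pktmbuf_clone(), gives each output its own payload can be sketched without DPDK. The segment type and chain_copy()/chain_free() below are hypothetical stand-ins for an mbuf chain, not rte_mbuf code; a real version would allocate from a mempool and also copy the packet metadata.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a chained packet buffer (not struct rte_mbuf). */
struct seg {
    uint8_t *data;
    uint16_t len;
    struct seg *next;
};

/* Free a chain whose segments and data buffers were all malloc()ed. */
static void chain_free(struct seg *s)
{
    while (s != NULL) {
        struct seg *next = s->next;
        free(s->data);
        free(s);
        s = next;
    }
}

/*
 * Deep copy: every segment gets its own data buffer, so writes to the copy
 * never show up in the original, unlike a reference-counted clone.
 */
static struct seg *chain_copy(const struct seg *src)
{
    struct seg *head = NULL;
    struct seg **tail = &head;

    for (; src != NULL; src = src->next) {
        struct seg *s = calloc(1, sizeof(*s));
        if (s == NULL)
            goto fail;
        *tail = s; /* link first so chain_free() can clean up on failure */
        tail = &s->next;
        s->data = malloc(src->len > 0 ? src->len : 1);
        if (s->data == NULL)
            goto fail;
        if (src->len > 0)
            memcpy(s->data, src->data, src->len);
        s->len = src->len;
    }
    return head;

fail:
    chain_free(head);
    return NULL;
}
```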

[dpdk-dev] [PATCH] kni: Use utsrelease.h to determine Ubuntu kernel version

2015-05-28 Thread Wodkowski, PawelX



> -Original Message-
> From: Buriez, Patrice
> Sent: Thursday, May 28, 2015 1:07 PM
> To: Wodkowski, PawelX; Simon Kågström; Zhang, Helin; Alexander Guy; Julien
> Cretin
> Cc: dev at dpdk.org
> Subject: RE: [PATCH] kni: Use utsrelease.h to determine Ubuntu kernel version
> 
> Hi all,
> Please forgive top reply and bottom disclaimer.
> Not sure anyway that this email will reach the mailing list, since I did not
> subscribe to it.
> 
> I am worried about the removal of: cut -d'~' -f1
> It was introduced by Pawel in commit

You are absolutely right. That is why I asked to check the documentation
and to verify with a real build on all supported Ubuntu versions. :)


-- 
Pawel

[dpdk-dev] [PATCH] kni: ignore double calls to rte_kni_init()

2015-05-28 Thread Marc Sune

Prevent double initialization of the KNI subsystem.

Signed-off-by: Marc Sune 
---
 lib/librte_kni/rte_kni.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/lib/librte_kni/rte_kni.c b/lib/librte_kni/rte_kni.c
index c5a0089..df0449f 100644
--- a/lib/librte_kni/rte_kni.c
+++ b/lib/librte_kni/rte_kni.c
@@ -201,6 +201,10 @@ rte_kni_init(unsigned int max_kni_ifaces)
char obj_name[OBJNAMSIZ];
char mz_name[RTE_MEMZONE_NAMESIZE];

+   /* Immediately return if KNI is already initialized */
+   if (kni_memzone_pool.initialized)
+   return;
+
if (max_kni_ifaces == 0) {
RTE_LOG(ERR, KNI, "Invalid number of max_kni_ifaces %d\n",
max_kni_ifaces);
-- 
2.1.4
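The guard added to rte_kni_init() is the usual idempotent-initialization pattern. The standalone sketch below uses a hypothetical subsys state struct standing in for kni_memzone_pool. In this sketch the flag is set only after setup has succeeded, so a rejected call (max_ifaces == 0) leaves it clear and a later valid call can still initialize. Like the patch, it assumes the init function is not called concurrently from several threads.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical subsystem state standing in for kni_memzone_pool. */
static struct {
    bool initialized;
    unsigned int max_ifaces;
    unsigned int setup_runs; /* how many times the expensive setup ran */
} subsys;

static void subsys_init(unsigned int max_ifaces)
{
    /* Immediately return if already initialized, as in the patch. */
    if (subsys.initialized)
        return;

    if (max_ifaces == 0) {
        fprintf(stderr, "Invalid number of max_ifaces %u\n", max_ifaces);
        return; /* flag stays clear, so a later valid call can still succeed */
    }

    /* ... memzone reservation would happen here ... */
    subsys.max_ifaces = max_ifaces;
    subsys.setup_runs++;
    subsys.initialized = true; /* set only after setup has completed */
}
```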

[dpdk-dev] [PATCH / RFC] kni: Add set_rx_mode callback to handle multicast groups

2015-05-28 Thread Simon Kågström

Stephen, Helin, perhaps you have some comment about this patch?

// Simon

On 2015-05-07 15:17, Simon Kagstrom wrote:
> This is needed to add / remove interfaces in multicast groups via the
> ip tool.
> 
> The callback does nothing - the same as the kernel tun.c.
> 
> Signed-off-by: Simon Kagstrom 
> ---
> Marked RFC since I'm by no means an expert on this. We noticed this
> when playing with KNI and IGMP handling.
> 
>  lib/librte_eal/linuxapp/kni/kni_net.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/lib/librte_eal/linuxapp/kni/kni_net.c b/lib/librte_eal/linuxapp/kni/kni_net.c
> index dd95db5..cf93c4b 100644
> --- a/lib/librte_eal/linuxapp/kni/kni_net.c
> +++ b/lib/librte_eal/linuxapp/kni/kni_net.c
> @@ -495,6 +495,11 @@ kni_net_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
>   return 0;
>  }
>  
> +static void
> +kni_net_set_rx_mode(struct net_device *dev)
> +{
> +}
> +
>  static int
>  kni_net_change_mtu(struct net_device *dev, int new_mtu)
>  {
> @@ -645,6 +650,7 @@ static const struct net_device_ops kni_net_netdev_ops = {
>   .ndo_start_xmit = kni_net_tx,
>   .ndo_change_mtu = kni_net_change_mtu,
>   .ndo_do_ioctl = kni_net_ioctl,
> + .ndo_set_rx_mode = kni_net_set_rx_mode,
>   .ndo_get_stats = kni_net_stats,
>   .ndo_tx_timeout = kni_net_tx_timeout,
>   .ndo_set_mac_address = kni_net_set_mac,
>

[dpdk-dev] Packet Cloning

2015-05-28 Thread Matt Laswell

Hey Kyle,

That's one way you can handle it, though I suspect you'll end up with some
complexity elsewhere in your code to deal with remembering whether you
should look at the original data or the copied and modified data.  Another
way is just to make a copy of the original mbuf, but have your copy API
stop after it reaches some particular point.  Perhaps just the L2-L4
headers, perhaps a few hundred bytes into payload, or perhaps something
else entirely. This all gets very application dependent, of course.  How
much is "enough" is going to depend heavily on what you're trying to
accomplish.

-- 
Matt Laswell
infinite io, inc.
laswell at infiniteio.com
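Kyle's scheme, and Matt's variant of copying only up to some point, amounts to a private copy of the first bytes plus a reference-counted share of the rest. A minimal sketch follows with hypothetical types (shared_buf, split_pkt) and a hypothetical 64-byte limit on the copied prefix. It is not the mbuf library's clone API, ignores mbuf metadata and multi-segment payloads, and records the split offset in each copy instead of adjusting O's data start in place.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SPLIT_HDR_MAX 64 /* hypothetical limit on the privately copied prefix */

/* Hypothetical shared payload buffer with a reference count. */
struct shared_buf {
    uint8_t *data;
    size_t len;
    unsigned int refcnt;
};

/* One output copy: private header bytes plus a reference to the shared rest. */
struct split_pkt {
    uint8_t hdr[SPLIT_HDR_MAX];
    size_t hdr_len;
    struct shared_buf *tail; /* "O" in Kyle's description */
    size_t tail_off;         /* first byte of O that this copy sends */
};

/* Drop one reference; free the shared buffer when the last one goes. */
static void shared_put(struct shared_buf *b)
{
    if (--b->refcnt == 0) {
        free(b->data);
        free(b);
    }
}

/* Build one copy ("A" or "B"): duplicate hdr_len bytes, share the remainder. */
static int split_copy(struct shared_buf *o, size_t hdr_len, struct split_pkt *out)
{
    if (hdr_len > SPLIT_HDR_MAX || hdr_len > o->len)
        return -1;
    memcpy(out->hdr, o->data, hdr_len);
    out->hdr_len = hdr_len;
    out->tail = o;
    out->tail_off = hdr_len;
    o->refcnt++;
    return 0;
}

/* Called once a copy has been "transmitted". */
static void split_release(struct split_pkt *p)
{
    shared_put(p->tail);
    p->tail = NULL;
}
```

The owner of O drops its own reference after making the copies, so O is freed exactly when the last copy is released, as in Kyle's description.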


On Thu, May 28, 2015 at 10:38 AM, Kyle Larose  wrote:

> I'm fairly new to dpdk, so I may be completely out to lunch on this, but
> here's an idea to possibly improve performance compared to a straight copy
> of the entire packet. If this idea makes sense, perhaps it could be added
> to the mbuf library as an extension of the clone functionality?
>
> If you are only modifying the headers (say the Ethernet header), is it
> possible to make a copy of only the first N bytes (say 32 bytes)?
>
> For example, you make two new "main" mbufs, which contain duplicate
> metadata, and a copy of the first 32 bytes of the packet. Call them A and
> B. Have both A and B chain to the original mbuf (call it O), which is
> reference counted as with the normal clone functionality. Then, you adjust
> the O such that its start data is 32 bytes into the packet.
>
> When you transmit A, it will send its own copy of the 32 bytes, plus the
> unaltered remaining data contained in O. A will be freed, and the refcount
> of O decremented. When you transmit B, it will work the same as with the
> previous one, except that when the refcount on O is decremented, it reaches
> zero and it is freed as well.
>
> I'm not sure if this makes sense in all cases (for example, maybe it's
> just faster to allocate separate mbufs for 64-byte packets). Perhaps that
> could also be handled transparently underneath the hood.
>
> Thoughts?
>
> Thanks,
>
> Kyle
>
> On Thu, May 28, 2015 at 11:10 AM, Matt Laswell 
> wrote:
>
>> Since Padam is going to be altering payload, he likely cannot use that
>> API.
>> The rte_pktmbuf_clone() API doesn't make a copy of the payload.  Instead,
>> it gives you a second mbuf whose payload pointer points back to the
>> contents of the first (and also increments the reference counter on the
>> first so that it isn't actually freed until all clones are accounted for).
>> This is very fast, which is good.  However, since there's only really one
>> buffer full of payload, changes in the original also affect the clone and
>> vice versa.  This can have surprising and unpleasant side effects that may
>> not show up until you are under load, which is awesome*.
>>
>> For what it's worth, if you need to be able to modify the copy while
>> leaving the original alone, I don't believe that there's a good solution
>> within DPDK.   However, writing your own API to copy rather than clone a
>> packet mbuf isn't difficult.
>>
>> --
>> Matt Laswell
>> infinite io, inc.
>> laswell at infiniteio.com
>>
>> * Don't ask me how I know how much awesome fun this can be, though I
>> suspect you can guess.
>>
>> On Thu, May 28, 2015 at 9:52 AM, Stephen Hemminger <
>> stephen at networkplumber.org> wrote:
>>
>> > On Thu, 28 May 2015 17:15:42 +0530
>> > Padam Jeet Singh  wrote:
>> >
>> > > Hello,
>> > >
>> > > Is there a function in DPDK to completely clone a pkt_mbuf including the
>> > > segments?
>> > >
>> > > I am trying to build a packet mirroring application which sends packets
>> > > out through two separate interfaces, but the packet payload needs to be
>> > > altered before sending.
>> > >
>> > > Thanks,
>> > > Padam
>> > >
>> > >
>> >
>> > Isn't this what you want?
>> >
>> > /**
>> >  * Creates a "clone" of the given packet mbuf.
>> >  *
>> >  * Walks through all segments of the given packet mbuf, and for each of them:
>> >  *  - Creates a new packet mbuf from the given pool.
>> >  *  - Attaches newly created mbuf to the segment.
>> >  * Then updates pkt_len and nb_segs of the "clone" packet mbuf to match values
>> >  * from the original packet mbuf.
>> >  *
>> >  * @param md
>> >  *   The packet mbuf to be cloned.
>> >  * @param mp
>> >  *   The mempool from which the "clone" mbufs are allocated.
>> >  * @return
>> >  *   - The pointer to the new "clone" mbuf on success.
>> >  *   - NULL if allocation fails.
>> >  */
>> > static inline struct rte_mbuf *rte_pktmbuf_clone(struct rte_mbuf *md,
>> > struct rte_mempool *mp)
>> >
>>
>
>
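Kyle's scheme can be modelled in a few lines of C. This sketch uses hypothetical types and does not use the mbuf API:

- `struct shared_buf` plays the role of the original mbuf O.
- `struct split_pkt` plays the role of A or B.

Each copy owns its first `HDR_COPY` bytes and holds a reference on the shared remainder. The shared buffer is freed when the last copy releases it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define HDR_COPY 32  /* bytes copied privately; Kyle's example value */

/* Hypothetical reference-counted payload buffer (stands in for mbuf O). */
struct shared_buf {
    unsigned refcnt;
    size_t len;
    uint8_t *data;
};

/* Hypothetical per-copy packet (stands in for A or B): a private copy of
 * the first HDR_COPY bytes plus a reference to the shared buffer, whose
 * bytes from hdr_len onward form the rest of the packet. */
struct split_pkt {
    uint8_t hdr[HDR_COPY];
    size_t hdr_len;
    struct shared_buf *tail;
};

static void split_init(struct split_pkt *p, struct shared_buf *o)
{
    p->hdr_len = o->len < HDR_COPY ? o->len : HDR_COPY;
    memcpy(p->hdr, o->data, p->hdr_len);
    p->tail = o;
    o->refcnt++;  /* one reference per copy */
}

/* Release one copy. Returns 1 if this call freed the shared buffer. */
static int split_release(struct split_pkt *p)
{
    struct shared_buf *o = p->tail;

    p->tail = NULL;
    assert(o != NULL && o->refcnt > 0);
    if (--o->refcnt == 0) {
        free(o->data);
        free(o);
        return 1;
    }
    return 0;
}

/* Read byte i of the logical packet: header bytes come from the private
 * copy, the rest from the shared buffer. */
static uint8_t split_byte(const struct split_pkt *p, size_t i)
{
    return i < p->hdr_len ? p->hdr[i] : p->tail->data[i];
}
```

Each copy can safely modify only its private header bytes. A write past `HDR_COPY` would land in the shared buffer and be visible through both A and B, which is the same hazard described for full clones.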

[dpdk-dev] [PATCH] kni: Use utsrelease.h to determine Ubuntu kernel version

2015-05-28 Thread Buriez, Patrice

Hi all,
Please forgive the top-posting and the disclaimer at the bottom.
I am not sure this email will reach the mailing list anyway, since I am not
subscribed to it.

I am worried about the removal of: cut -d'~' -f1
It was introduced by Pawel in commit 35170c52d0ae33dc30e69bcf681e5a17168bf11e
http://dpdk.org/browse/dpdk/commit/lib/librte_eal/linuxapp/kni/Makefile?id=35170c52d0ae33dc30e69bcf681e5a17168bf11e
in order to fix the parsing of: 3.11.0-15.25~precise1-generic
I am not sure what utsrelease.h would contain in this specific case, but the
stripping of ~precise1-generic no longer happens with this recent patch.
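To make the concern concrete, here is a C emulation of the pipeline stages. It is an illustration only: `kernel_code()` is a hypothetical helper, and the Makefile itself uses `cut` and `tr`. It shows what `3.11.0-15.25~precise1-generic` turns into with and without the `cut -d'~' -f1` step. Without that step, `~precise1` survives into the comma-separated code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Emulate, on one version string:
 *   [cut -d'~' -f1 |] cut -d- -f1,2 | tr .- ,
 * 'strip_tilde' selects whether the cut -d'~' -f1 stage is applied.
 * The NUL-terminated result is written to 'out' (size 'n'). */
static void kernel_code(const char *in, int strip_tilde, char *out, size_t n)
{
    size_t len = strlen(in);
    const char *tilde = strchr(in, '~');
    int dashes = 0;
    size_t i, j = 0;

    assert(n > 0);
    if (strip_tilde && tilde != NULL)
        len = (size_t)(tilde - in);                   /* cut -d'~' -f1 */

    for (i = 0; i < len && j + 1 < n; i++) {
        char c = in[i];
        if (c == '-' && ++dashes == 2)
            break;                                    /* cut -d- -f1,2 */
        out[j++] = (c == '.' || c == '-') ? ',' : c;  /* tr .- , */
    }
    out[j] = '\0';
}
```

With the tilde stage the result is `3,11,0,15,25`. Without it the result is `3,11,0,15,25~precise1`. Whether such a suffix can actually appear in `UTS_RELEASE` is exactly the open question above; the sketch only shows what happens if it does.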

Regards,
Patrice

-Original Message-
From: Wodkowski, PawelX 
Sent: Thursday, May 28, 2015 12:48 PM
To: Simon Kågström; Zhang, Helin; Alexander Guy; Julien Cretin; Buriez, Patrice
Cc: dev at dpdk.org
Subject: RE: [PATCH] kni: Use utsrelease.h to determine Ubuntu kernel version

> -Original Message-
> From: Simon Kågström [mailto:simon.kagstrom at netinsight.net]
> Sent: Thursday, May 28, 2015 12:37 PM
> To: Wodkowski, PawelX; Zhang, Helin; Alexander Guy; Julien Cretin; 
> Buriez, Patrice
> Cc: dev at dpdk.org
> Subject: Re: [PATCH] kni: Use utsrelease.h to determine Ubuntu kernel 
> version
> 
> On 2015-05-28 12:05, Wodkowski, PawelX wrote:
> >>>
> >>> -UBUNTU_KERNEL_CODE := $(shell cut -d' ' -f2 /proc/version_signature | \
> >>> -cut -d'~' -f1 | cut -d- -f1,2 | tr .- $(comma))
> >>> +UBUNTU_KERNEL_CODE := $(shell echo `grep UTS_RELEASE $(RTE_KERNELDIR)/include/generated/utsrelease.h \
> >>> +  | cut -d '"' -f2 | cut -d- -f1,2 | tr .- $(comma)`,1)
> >
> > It is fine with me if it does the job and does not break the build on other
> > OSes (including other Ubuntu versions, especially 12.04 if we still support it).
> > Please just check that UTS_RELEASE is available on all Ubuntu versions DPDK
> > supports.
> 
> From some digging, it appears it entered the kernel tree in 2006 and 
> moved to include/generated/ in 2009 so I guess that should be fine for 
> DPDK builds?
> 
> // Simon

I also think that it is OK, but I think you should verify it by building on
those systems yourself (or asking someone to do it for you) rather than in theory :)

--
Pawel

Intel Corporation NV/SA
Kings Square, Veldkant 31
2550 Kontich
RPM (Bruxelles) 0415.497.718. 
Citibank, Brussels, account 570/1031255/09

This e-mail and any attachments may contain confidential material for the sole 
use of the intended recipient(s). Any review or distribution by others is 
strictly prohibited. If you are not the intended recipient, please contact the 
sender and delete all copies.

[dpdk-dev] DPDK: Proposal for a patch patch-test integration tree

2015-05-28 Thread Thomas F Herbert



On 5/28/15 10:44 AM, Thomas Monjalon wrote:
> 2015-05-28 10:39, Thomas F Herbert:
>> On 5/28/15 5:48 AM, Thomas Monjalon wrote:
>>> 2015-05-27 16:48, Thomas F Herbert:
>>>> This is a proposal to create a patch-test dpdk tree. I welcome feedback
>>>> from the dpdk community on this proposal.
>>>>
>>>> This tree will consist of pre-integrated patches submitted to the dpdk-dev
>>>> mailing list. The purpose of this tree is to help reduce the effort in
>>>> reviewing patches and providing feedback to submitters of patches. Also,
>>>> integrators of dpdk can benefit by having a "linux-next"-like tree to
>>>> clone and/or try new features for performance and breadth of hardware
>>>> support. It is the hope that the patch-test tree will provide early
>>>> feedback to patch reviewers as well as an assurance of quality to the
>>>> project maintainer.
>>>
>>> Thanks for the proposal. This tree may be hosted here:
>>> http://dpdk.org/browse
>>> You just have to give the name of the repository and a SSH key.
>>
>> Shall we call the repo patch-test-next?
>
> Why not dpdk-next?
> With this description? Apply and test every patch before acceptance
Sounds good to me.
>
>> I am attaching two public keys to this email
>
> Received, thanks
>

[dpdk-dev] [PATCH 2/2] kni: add missing include dependencies

2015-05-28 Thread Marc Sune



On 25/05/15 14:23, Bruce Richardson wrote:
> The file rte_kni.h depends upon a number of other headers, some of which
> are missing from the #include lines. The following #includes are added:
>   * rte_memory.h - for the definition of phys_addr_t
>   * rte_mempool.h - for the definition of mempool struct and the mempool
> create function.
>
> Signed-off-by: Bruce Richardson 
> ---
>   lib/librte_kni/rte_kni.h | 2 ++
>   1 file changed, 2 insertions(+)
>
> diff --git a/lib/librte_kni/rte_kni.h b/lib/librte_kni/rte_kni.h
> index 98edd72..44240fe 100644
> --- a/lib/librte_kni/rte_kni.h
> +++ b/lib/librte_kni/rte_kni.h
> @@ -47,6 +47,8 @@
>*/
>   
>   #include 
> +#include <rte_memory.h>
> +#include <rte_mempool.h>
>   
>   #include 
>   

A forward declaration of struct rte_mempool would be sufficient, but

Acked-by: Marc Sune
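Marc's point applies in general. A header that only stores or passes `struct rte_mempool *` can get by with a forward declaration. Code that dereferences the pointer or needs the struct's size must see the full definition. The sketch uses a hypothetical `struct demo_pool` rather than the real type. Whether the forward declaration alone suffices for rte_kni.h depends on everything that header actually uses.

```c
#include <assert.h>
#include <stddef.h>

/* Header part: a forward declaration makes 'struct demo_pool' an
 * incomplete type (a hypothetical stand-in for struct rte_mempool).
 * That is enough to declare pointers to it and to prototype functions
 * taking such pointers. */
struct demo_pool;

struct demo_conf {
    struct demo_pool *pktmbuf_pool;  /* pointer to incomplete type: OK */
    unsigned mbuf_size;
};

size_t demo_pool_count(const struct demo_pool *mp);

/* Implementation part: only code that dereferences the pool (or needs
 * sizeof(struct demo_pool)) must see the full definition. */
struct demo_pool {
    size_t count;
};

size_t demo_pool_count(const struct demo_pool *mp)
{
    assert(mp != NULL);
    return mp->count;
}
```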

[dpdk-dev] [PATCH 1/2] eal: add missing include to rte_pci.h

2015-05-28 Thread Marc Sune



On 25/05/15 14:23, Bruce Richardson wrote:
> rte_pci.h depends upon stdio.h for the definition of the FILE type. Add
> in #include <stdio.h> to the file to satisfy this dependency in cases
> where the including C file does not already include stdio.
>
> Signed-off-by: Bruce Richardson 
> ---
>   lib/librte_eal/common/include/rte_pci.h | 1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
> index 223d3cd..a346532 100644
> --- a/lib/librte_eal/common/include/rte_pci.h
> +++ b/lib/librte_eal/common/include/rte_pci.h
> @@ -74,6 +74,7 @@
>   extern "C" {
>   #endif
>   
> +#include <stdio.h>
>   #include 
>   #include 
>   #include 

Acked-by: Marc Sune

[dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh

2015-05-28 Thread Venkatesan, Venky

NAK. This causes more (unsuccessful) cleanup attempts on the descriptor ring. 
What is motivating this change? 

Regards,
Venky


> On May 28, 2015, at 1:42 AM, Zoltan Kiss  wrote:
> 
> This check doesn't do what's required by rte_eth_tx_burst:
> "When the number of previously sent packets reached the "minimum transmit
> packets to free" threshold"
> 
> This can cause problems when txq->tx_free_thresh + [number of elements in the
> pool] < txq->nb_tx_desc.
> 
> Signed-off-by: Zoltan Kiss 
> ---
> drivers/net/ixgbe/ixgbe_rxtx.c | 4 ++--
> drivers/net/ixgbe/ixgbe_rxtx_vec.c | 2 +-
> 2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
> index 4f9ab22..b70ed8c 100644
> --- a/drivers/net/ixgbe/ixgbe_rxtx.c
> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
> @@ -250,10 +250,10 @@ tx_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
> 
>/*
> * Begin scanning the H/W ring for done descriptors when the
> - * number of available descriptors drops below tx_free_thresh.  For
> + * number of in flight descriptors reaches tx_free_thresh. For
> * each done descriptor, free the associated buffer.
> */
> -if (txq->nb_tx_free < txq->tx_free_thresh)
> +if ((txq->nb_tx_desc - txq->nb_tx_free) > txq->tx_free_thresh)
>ixgbe_tx_free_bufs(txq);
> 
>/* Only use descriptors that are available */
> diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> index abd10f6..f91c698 100644
> --- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> @@ -598,7 +598,7 @@ ixgbe_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
>if (unlikely(nb_pkts > RTE_IXGBE_VPMD_TX_BURST))
>nb_pkts = RTE_IXGBE_VPMD_TX_BURST;
> 
> -if (txq->nb_tx_free < txq->tx_free_thresh)
> +if ((txq->nb_tx_desc - txq->nb_tx_free) > txq->tx_free_thresh)
>ixgbe_tx_free_bufs(txq);
> 
>nb_commit = nb_pkts = (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts);
> -- 
> 1.9.1
>
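Here is a small C sketch of both checks, to show how they differ. `old_should_free()` and `new_should_free()` are hypothetical helpers that mirror the expressions in the diff. The example numbers are assumed:

- a 512-descriptor ring,
- a `tx_free_thresh` of 32,
- a mempool of only 400 mbufs, so `32 + 400 < 512` as in the patch description.

Once all 400 mbufs are in flight, 112 descriptors are still free. The current `nb_tx_free < tx_free_thresh` check therefore never fires, and the buffers are never returned to the pool. The proposed check fires as soon as more than 32 descriptors are in flight.

```c
#include <assert.h>
#include <stdint.h>

/* Current check: clean up only when free descriptors drop below the
 * threshold. */
static int old_should_free(uint16_t nb_tx_desc, uint16_t nb_tx_free,
                           uint16_t tx_free_thresh)
{
    (void)nb_tx_desc;  /* not used by this condition */
    return nb_tx_free < tx_free_thresh;
}

/* Proposed check: clean up once more than tx_free_thresh descriptors are
 * in flight (nb_tx_desc - nb_tx_free). */
static int new_should_free(uint16_t nb_tx_desc, uint16_t nb_tx_free,
                           uint16_t tx_free_thresh)
{
    assert(nb_tx_free <= nb_tx_desc);
    return (nb_tx_desc - nb_tx_free) > tx_free_thresh;
}
```

Now consider 470 free descriptors, so 42 in flight. This shows the cost raised in the NAK: the proposed condition also triggers on a nearly idle ring. Cleanup is then attempted far more often, and many of those attempts may find nothing to free yet.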

[dpdk-dev] [PATCH] kni: Use utsrelease.h to determine Ubuntu kernel version

2015-05-28 Thread Wodkowski, PawelX


> -Original Message-
> From: Simon Kågström [mailto:simon.kagstrom at netinsight.net]
> Sent: Thursday, May 28, 2015 12:37 PM
> To: Wodkowski, PawelX; Zhang, Helin; Alexander Guy; Julien Cretin; Buriez,
> Patrice
> Cc: dev at dpdk.org
> Subject: Re: [PATCH] kni: Use utsrelease.h to determine Ubuntu kernel version
> 
> On 2015-05-28 12:05, Wodkowski, PawelX wrote:
> >>>
> >>> -UBUNTU_KERNEL_CODE := $(shell cut -d' ' -f2 /proc/version_signature | \
> >>> -cut -d'~' -f1 | cut -d- -f1,2 | tr .- $(comma))
> >>> +UBUNTU_KERNEL_CODE := $(shell echo `grep UTS_RELEASE $(RTE_KERNELDIR)/include/generated/utsrelease.h \
> >>> +  | cut -d '"' -f2 | cut -d- -f1,2 | tr .- $(comma)`,1)
> >
> > It is fine with me if it does the job and does not break the build on other
> > OSes (including other Ubuntu versions, especially 12.04 if we still support it).
> > Please just check that UTS_RELEASE is available on all Ubuntu versions DPDK
> > supports.
> 
> From some digging, it appears it entered the kernel tree in 2006 and
> moved to include/generated/ in 2009 so I guess that should be fine for
> DPDK builds?
> 
> // Simon

I also think that it is OK, but I think you should verify it by building on
those systems yourself (or asking someone to do it for you) rather than in theory :)

-- 
Pawel

[dpdk-dev] [PATCH] rte_reorder: Allow sequence numbers > 0 as starting point

2015-05-28 Thread Gonzalez Monroy, Sergio

On 28/05/2015 09:15, Simon Kågström wrote:
> Thanks for the review, Sergio!
>
> On 2015-05-28 09:49, Gonzalez Monroy, Sergio wrote:
>>> @@ -325,6 +327,12 @@ rte_reorder_insert(struct rte_reorder_buffer *b, struct rte_mbuf *mbuf)
>>>uint32_t offset, position;
>>>struct cir_buffer *order_buf = &b->order_buf;
>>>+if (!b->is_initialized) {
>>> +b->min_seqn = mbuf->seqn;
>>> +
>>> +b->is_initialized = 1;
>>> +}
>>> +
>>>/*
>>> * calculate the offset from the head pointer we need to go.
>>> * The subtraction takes care of the sequence number wrapping.
>> So my first impression was, why do this in insert instead of init?
>> I guess the goal was trying to avoid changing the API, but would it not
>> be worth it? After all, it is a one-time thing.
> We don't know the first sequence number until the first insert, so I
> think it has to be there. Alternatively, there could be an API to set
> the minimum sequence number, but I think that would instead make the
> application uglier, and isn't that also just exposing library
> implementation details in the API?
Yes, I agree.
>> About the implementation, packets being inserted could be out of order,
>> so the first packet inserted may not be the first in your sequence. Now
>> what happens with that packet would be app specific so probably is not a
>> big deal but what about initializing min_seqn to something like
>> (mbuf->seqn - b->size/2) ? That would give enough room for packets out
>> of order.
> I thought about that, but you will always miss some packets if you have
> an active stream at start anyway, so in the end I removed that part.
As you said, it would not make much difference from the stream point of 
view.
> But perhaps you are right about this issue, I'm not sure.
>
>> You should also update the documentation regarding rte_reorder_insert.
> Actually, the rte_reorder.h file says nothing about the (current)
> limitation of the first seq number having to be 0, so I think this patch
> actually improves the documentation without touching it :-)
Fair enough :)

Sergio
> // Simon
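Here is a simplified C model of what the patch does. It uses a hypothetical `struct reorder_model` and `model_offset()`, not the real `struct rte_reorder_buffer` or its insert logic. The first inserted sequence number seeds `min_seqn`. The offset is computed with unsigned 32-bit subtraction, so wrapping past `UINT32_MAX` still yields a small offset. For simplicity the model reports out-of-window packets as -1; the library's actual handling of such packets is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified, hypothetical model of the reorder window. */
struct reorder_model {
    int is_initialized;
    uint32_t min_seqn;   /* sequence number at the head of the window */
    uint32_t size;       /* window size in entries */
};

/* Returns the slot offset for 'seqn', or -1 if it falls outside the
 * window. The first call seeds min_seqn, as in the patch. */
static int64_t model_offset(struct reorder_model *b, uint32_t seqn)
{
    uint32_t offset;

    assert(b->size > 0);
    if (!b->is_initialized) {
        b->min_seqn = seqn;
        b->is_initialized = 1;
    }
    /* Unsigned subtraction wraps modulo 2^32, so sequence numbers that
     * wrap past UINT32_MAX still produce small offsets. */
    offset = seqn - b->min_seqn;
    return offset < b->size ? (int64_t)offset : -1;
}
```

Take a window of 8 seeded at 1000. Sequence number 1005 maps to slot 5. Sequence number 999 wraps to 0xFFFFFFFF and falls outside the window, which is the out-of-order first packet case raised above.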

[dpdk-dev] DPDK: Proposal for a patch patch-test integration tree

2015-05-28 Thread Thomas F Herbert



On 5/27/15 5:22 PM, Vincent JARDIN wrote:
> On 27/05/2015 22:48, Thomas F Herbert wrote:
>> Work Flow and Process:
>>
>> All patches will be taken from public submissions to dpdk-dev.org
>> scraped from dpdk patchwork. Patches will be applied to the patch-test
>> tree and tested against HEAD as they are received. The feedback from the
>> testing will be provided to the community. The patch-test tree will
>> periodically be git pull'ed from dpdk.
>>
>> Longer term goal:
>>
>> Initially, the patches will be applied along with some simple smoke
>> tests. The longer term goal is to automate this process, apply more
>> extensive tests and post the results in dpdk patchwork,
>> http://dpdk.org/dev/patchwork/project/dpdk/list/ which would have an
>> accompanying mailing list for distribution of a results summary of the
>> tests.
>
> thanks for helping.
>
> It could be broken into two parts:
>- patch-test-net-next,
> http://dpdk.org/browse/dpdk/tree/MAINTAINERS#n190
> http://dpdk.org/browse/dpdk/tree/drivers/net
>
>- patch-test-other-next,
> everything except drivers
Vincent, I don't understand what would go in patch-test-next vs.
patch-test-other-next. Is patch-test-other-next meant for PMD drivers
that are not included in the standard DPDK release?
>
> Best regards,
>Vincent

[dpdk-dev] [PATCH] rte_reorder: Allow sequence numbers > 0 as starting point

2015-05-28 Thread Simon Kågström

Thanks for the review, Sergio!

On 2015-05-28 09:49, Gonzalez Monroy, Sergio wrote:
>> @@ -325,6 +327,12 @@ rte_reorder_insert(struct rte_reorder_buffer *b, struct rte_mbuf *mbuf)
>>   uint32_t offset, position;
>>   struct cir_buffer *order_buf = &b->order_buf;
>>   +if (!b->is_initialized) {
>> +b->min_seqn = mbuf->seqn;
>> +
>> +b->is_initialized = 1;
>> +}
>> +
>>   /*
>>* calculate the offset from the head pointer we need to go.
>>* The subtraction takes care of the sequence number wrapping.
> So my first impression was, why do this in insert instead of init?
> I guess the goal was trying to avoid changing the API, but would it not
> be worth it? After all, it is a one-time thing.

We don't know the first sequence number until the first insert, so I
think it has to be there. Alternatively, there could be an API to set
the minimum sequence number, but I think that would instead make the
application uglier, and isn't that also just exposing library
implementation details in the API?

> About the implementation, packets being inserted could be out of order,
> so the first packet inserted may not be the first in your sequence. Now
> what happens with that packet would be app specific so probably is not a
> big deal but what about initializing min_seqn to something like
> (mbuf->seqn - b->size/2) ? That would give enough room for packets out
> of order.

I thought about that, but you will always miss some packets if you have
an active stream at start anyway, so in the end I removed that part.

But perhaps you are right about this issue, I'm not sure.

> You should also update the documentation regarding rte_reorder_insert.

Actually, the rte_reorder.h file says nothing about the (current)
limitation of the first seq number having to be 0, so I think this patch
actually improves the documentation without touching it :-)

// Simon
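Sergio's alternative seeds the window half a window before the first sequence number seen. It could look like the following sketch, which uses hypothetical helpers `seed_min_seqn()` and `in_window()`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: start the window size/2 before the first sequence
 * number seen, so packets sent earlier but arriving later still fit.
 * Wraps modulo 2^32 if first_seqn is small. */
static uint32_t seed_min_seqn(uint32_t first_seqn, uint32_t size)
{
    return first_seqn - size / 2;
}

/* 1 if 'seqn' falls inside [min_seqn, min_seqn + size), modulo 2^32. */
static int in_window(uint32_t min_seqn, uint32_t size, uint32_t seqn)
{
    assert(size > 0);
    return (uint32_t)(seqn - min_seqn) < size;
}
```

Take a window of 8 and a first packet of 1000. The head becomes 996, so a straggler such as 998 still fits. The price is that only four sequence numbers from 1000 onward (1000 to 1003) fit in the window instead of eight.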

[dpdk-dev] Packet Cloning

2015-05-28 Thread Matt Laswell

Since Padam is going to be altering payload, he likely cannot use that API.
The rte_pktmbuf_clone() API doesn't make a copy of the payload.  Instead,
it gives you a second mbuf whose payload pointer points back to the
contents of the first (and also increments the reference counter on the
first so that it isn't actually freed until all clones are accounted for).
This is very fast, which is good.  However, since there's only really one
buffer full of payload, changes in the original also affect the clone and
vice versa.  This can have surprising and unpleasant side effects that may
not show up until you are under load, which is awesome*.

For what it's worth, if you need to be able to modify the copy while
leaving the original alone, I don't believe that there's a good solution
within DPDK.   However, writing your own API to copy rather than clone a
packet mbuf isn't difficult.

-- 
Matt Laswell
infinite io, inc.
laswell at infiniteio.com

* Don't ask me how I know how much awesome fun this can be, though I
suspect you can guess.
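The side effect described above can be reproduced with a tiny model. Two handles share one reference-counted payload, so a write through the "clone" shows up in the original. The types and functions here (`struct payload`, `handle_clone()`, `handle_free()`) are hypothetical stand-ins. They are meant to mimic the described behaviour of `rte_pktmbuf_clone()`, not the DPDK implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical shared payload with a reference count. */
struct payload {
    unsigned refcnt;
    uint8_t bytes[64];
};

/* Hypothetical packet handle: cloning copies the handle, not the bytes. */
struct handle {
    struct payload *p;
};

static struct handle handle_clone(struct handle h)
{
    h.p->refcnt++;  /* keep the payload alive until every clone is freed */
    return h;
}

static void handle_free(struct handle h)
{
    assert(h.p->refcnt > 0);
    if (--h.p->refcnt == 0)
        free(h.p);
}
```

After `handle_clone()`, writing `clone.p->bytes[0]` also changes `orig.p->bytes[0]`. The payload is released only once both handles have been freed. A mirroring application that must alter one copy therefore needs a real copy of at least the bytes it modifies.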

On Thu, May 28, 2015 at 9:52 AM, Stephen Hemminger <
stephen at networkplumber.org> wrote:

> On Thu, 28 May 2015 17:15:42 +0530
> Padam Jeet Singh  wrote:
>
> > Hello,
> >
> > Is there a function in DPDK to completely clone a pkt_mbuf including the
> > segments?
> >
> > I am trying to build a packet mirroring application which sends packets
> > out through two separate interfaces, but the packet payload needs to be
> > altered before sending.
> >
> > Thanks,
> > Padam
> >
> >
>
> Isn't this what you want?
>
> /**
>  * Creates a "clone" of the given packet mbuf.
>  *
>  * Walks through all segments of the given packet mbuf, and for each of them:
>  *  - Creates a new packet mbuf from the given pool.
>  *  - Attaches newly created mbuf to the segment.
>  * Then updates pkt_len and nb_segs of the "clone" packet mbuf to match values
>  * from the original packet mbuf.
>  *
>  * @param md
>  *   The packet mbuf to be cloned.
>  * @param mp
>  *   The mempool from which the "clone" mbufs are allocated.
>  * @return
>  *   - The pointer to the new "clone" mbuf on success.
>  *   - NULL if allocation fails.
>  */
> static inline struct rte_mbuf *rte_pktmbuf_clone(struct rte_mbuf *md,
> struct rte_mempool *mp)
>

[dpdk-dev] [PATCH] kni: Use utsrelease.h to determine Ubuntu kernel version

2015-05-28 Thread Wodkowski, PawelX

> >
> > -ifeq ($(shell test -f /proc/version_signature && lsb_release -si 2>/dev/null),Ubuntu)
> > +ifeq ($(shell lsb_release -si 2>/dev/null),Ubuntu)
> >  MODULE_CFLAGS += -DUBUNTU_RELEASE_CODE=$(shell lsb_release -sr | tr -d .)
> > -UBUNTU_KERNEL_CODE := $(shell cut -d' ' -f2 /proc/version_signature | \
> > -cut -d'~' -f1 | cut -d- -f1,2 | tr .- $(comma))
> > +UBUNTU_KERNEL_CODE := $(shell echo `grep UTS_RELEASE $(RTE_KERNELDIR)/include/generated/utsrelease.h \
> > +| cut -d '"' -f2 | cut -d- -f1,2 | tr .- $(comma)`,1)
> >  MODULE_CFLAGS += -D"UBUNTU_KERNEL_CODE=UBUNTU_KERNEL_VERSION($(UBUNTU_KERNEL_CODE))"
> >  endif
> >
> > --
> > 1.9.1
Hi,

It is fine with me if it does the job and does not break the build on other OSes
(including other Ubuntu versions, especially 12.04 if we still support it).
Please just check that UTS_RELEASE is available on all Ubuntu versions DPDK
supports.

Pawel
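For reference, here is a C emulation of what the proposed pipeline produces from one `utsrelease.h` line. It is an illustration only. The value `3.13.0-53-generic` is an assumed example, and `uts_kernel_code()` is a hypothetical helper, not part of the Makefile. The `cut -d- -f1,2` stage drops the `-generic` flavour. The trailing `,1` appended by the patch then gives five comma-separated fields for this input.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Emulate, on one line of utsrelease.h:
 *   cut -d '"' -f2 | cut -d- -f1,2 | tr .- ,   followed by ",1"
 * Returns 0 on success, -1 if the line has no quoted string (the real
 * cut would pass such a line through unchanged) or 'out' is too small. */
static int uts_kernel_code(const char *line, char *out, size_t n)
{
    const char *start = strchr(line, '"');
    const char *end;
    size_t j = 0;
    int dashes = 0;

    assert(out != NULL);
    if (start == NULL)
        return -1;
    start++;                                          /* cut -d '"' -f2 */
    end = strchr(start, '"');
    if (end == NULL)
        end = start + strlen(start);

    for (; start < end; start++) {
        char c = *start;
        if (c == '-' && ++dashes == 2)
            break;                                    /* cut -d- -f1,2 */
        if (j + 1 >= n)
            return -1;
        out[j++] = (c == '.' || c == '-') ? ',' : c;  /* tr .- , */
    }
    if (j + 3 > n)                                    /* ",1" plus NUL */
        return -1;
    memcpy(out + j, ",1", 3);                         /* the appended ,1 */
    return 0;
}
```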

[dpdk-dev] [PATCH RFC 2/2] vhost: realloc virtio_net and virtqueue to the same node of vring desc table

2015-05-28 Thread Huawei Xie

When we get the address of the vring descriptor table in the
VHOST_SET_VRING_ADDR message, try to reallocate the virtio_net device and
virtqueue on the same NUMA node.

Signed-off-by: Huawei Xie 
---
 config/common_linuxapp|  1 +
 lib/librte_vhost/Makefile |  4 ++
 lib/librte_vhost/virtio-net.c | 93 +++
 mk/rte.app.mk |  3 ++
 4 files changed, 101 insertions(+)

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 0078dc9..4ace24e 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -421,6 +421,7 @@ CONFIG_RTE_KNI_VHOST_DEBUG_TX=n
 #
 CONFIG_RTE_LIBRTE_VHOST=n
 CONFIG_RTE_LIBRTE_VHOST_USER=y
+CONFIG_RTE_LIBRTE_VHOST_NUMA=n
 CONFIG_RTE_LIBRTE_VHOST_DEBUG=n

 #
diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
index a8645a6..6681f22 100644
--- a/lib/librte_vhost/Makefile
+++ b/lib/librte_vhost/Makefile
@@ -46,6 +46,10 @@ CFLAGS += -I vhost_cuse -lfuse
 LDFLAGS += -lfuse
 endif

+ifeq ($(CONFIG_RTE_LIBRTE_VHOST_NUMA),y)
+LDFLAGS += -lnuma
+endif
+
 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := virtio-net.c vhost_rxtx.c
 ifeq ($(CONFIG_RTE_LIBRTE_VHOST_USER),y)
diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index 19b74d6..8a80f5e 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -38,6 +38,9 @@
 #include 
 #include 
 #include 
+#ifdef RTE_LIBRTE_VHOST_NUMA
+#include <numaif.h>
+#endif

 #include 

@@ -481,6 +484,93 @@ set_vring_num(struct vhost_device_ctx ctx, struct vhost_vring_state *state)
 }

 /*
+ * Reallocate virtio_net and vhost_virtqueue data structure to make them on the
+ * same numa node as the memory of vring descriptor.
+ */
+#ifdef RTE_LIBRTE_VHOST_NUMA
+static struct virtio_net*
+numa_realloc(struct virtio_net *dev, int index)
+{
+   int oldnode, newnode;
+   struct virtio_net_config_ll *old_ll_dev, *new_ll_dev;
+   struct vhost_virtqueue *old_vq, *new_vq;
+   int ret;
+   int realloc_dev = 0, realloc_vq = 0;
+
+   old_ll_dev = (struct virtio_net_config_ll *)dev;
+   old_vq = dev->virtqueue[index];
+
+   ret  = get_mempolicy(&newnode, NULL, 0, old_vq->desc,
+   MPOL_F_NODE | MPOL_F_ADDR);
+   ret = ret | get_mempolicy(&oldnode, NULL, 0, old_ll_dev,
+   MPOL_F_NODE | MPOL_F_ADDR);
+   if (ret) {
+   RTE_LOG(ERR, VHOST_CONFIG,
+   "Unable to get vring desc or dev numa information.\n");
+   return dev;
+   }
+   if (oldnode != newnode)
+   realloc_dev = 1;
+
+   ret = get_mempolicy(&oldnode, NULL, 0, old_vq,
+   MPOL_F_NODE | MPOL_F_ADDR);
+   if (ret) {
+   RTE_LOG(ERR, VHOST_CONFIG,
+   "Unable to get vq numa information.\n");
+   return dev;
+   }
+   if (oldnode != newnode)
+   realloc_vq = 1;
+
+   if (realloc_dev == 0 && realloc_vq == 0)
+   return dev;
+
+   if (realloc_dev)
+   new_ll_dev = rte_malloc_socket(NULL,
+   sizeof(struct virtio_net_config_ll), 0, newnode);
+   if (realloc_vq)
+   new_vq = rte_malloc_socket(NULL,
+   sizeof(struct vhost_virtqueue), 0, newnode);
+   if (!new_ll_dev || !new_vq) {
+   if (new_ll_dev)
+   rte_free(new_ll_dev);
+   if (new_vq)
+   rte_free(new_vq);
+   return dev;
+   }
+
+   if (realloc_vq)
+   memcpy(new_vq, old_vq, sizeof(*new_vq));
+   if (realloc_dev)
+   memcpy(new_ll_dev, old_ll_dev, sizeof(*new_ll_dev));
+   (new_ll_dev ? new_ll_dev : old_ll_dev)->dev.virtqueue[index] =
+   new_vq ? new_vq : old_vq;
+   if (realloc_vq)
+   rte_free(old_vq);
+   if (realloc_dev) {
+   if (ll_root == old_ll_dev)
+   ll_root = new_ll_dev;
+   else {
+   struct virtio_net_config_ll *prev = ll_root;
+   while (prev->next != old_ll_dev)
+   prev = prev->next;
+   prev->next = new_ll_dev;
+   new_ll_dev->next = old_ll_dev->next;
+   }
+   rte_free(old_ll_dev);
+   }
+
+   return _ll_dev->dev;
+}
+#else
+static struct virtio_net*
+numa_realloc(struct virtio_net *dev, int index __rte_unused)
+{
+   return dev;
+}
+#endif
+
+/*
  * Called from CUSE IOCTL: VHOST_SET_VRING_ADDR
  * The virtio device sends us the desc, used and avail ring addresses.
  * This function then converts these to our address space.
@@ -508,6 +598,9 @@ set_vring_addr(struct vhost_device_ctx ctx, struct vhost_vring_addr *addr)
return -1;
}

+   dev = numa_realloc(dev, addr->index);
+   vq = dev->virtqueue[addr->index];
+
vq->avail = (struct vring_avail

[dpdk-dev] [PATCH RFC 1/2] vhost: malloc -> rte_malloc for virtio_net and virt queue allocation

2015-05-28 Thread Huawei Xie

Use rte_malloc/rte_free for virtio_net and virtqueue allocation and freeing.

Signed-off-by: Huawei Xie 
---
 lib/librte_vhost/virtio-net.c | 19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index 4672e67..19b74d6 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -45,6 +45,7 @@
 #include 
 #include 
 #include 
+#include <rte_malloc.h>
 #include 

 #include "vhost-net.h"
@@ -202,9 +203,9 @@ static void
 free_device(struct virtio_net_config_ll *ll_dev)
 {
/* Free any malloc'd memory */
-   free(ll_dev->dev.virtqueue[VIRTIO_RXQ]);
-   free(ll_dev->dev.virtqueue[VIRTIO_TXQ]);
-   free(ll_dev);
+   rte_free(ll_dev->dev.virtqueue[VIRTIO_RXQ]);
+   rte_free(ll_dev->dev.virtqueue[VIRTIO_TXQ]);
+   rte_free(ll_dev);
 }

 /*
@@ -278,7 +279,7 @@ new_device(struct vhost_device_ctx ctx)
struct vhost_virtqueue *virtqueue_rx, *virtqueue_tx;

/* Setup device and virtqueues. */
-   new_ll_dev = malloc(sizeof(struct virtio_net_config_ll));
+   new_ll_dev = rte_malloc(NULL, sizeof(struct virtio_net_config_ll), 0);
if (new_ll_dev == NULL) {
RTE_LOG(ERR, VHOST_CONFIG,
"(%"PRIu64") Failed to allocate memory for dev.\n",
@@ -286,19 +287,19 @@ new_device(struct vhost_device_ctx ctx)
return -1;
}

-   virtqueue_rx = malloc(sizeof(struct vhost_virtqueue));
+   virtqueue_rx = rte_malloc(NULL, sizeof(struct vhost_virtqueue), 0);
if (virtqueue_rx == NULL) {
-   free(new_ll_dev);
+   rte_free(new_ll_dev);
RTE_LOG(ERR, VHOST_CONFIG,
"(%"PRIu64") Failed to allocate memory for rxq.\n",
ctx.fh);
return -1;
}

-   virtqueue_tx = malloc(sizeof(struct vhost_virtqueue));
+   virtqueue_tx = rte_malloc(NULL, sizeof(struct vhost_virtqueue), 0);
if (virtqueue_tx == NULL) {
-   free(virtqueue_rx);
-   free(new_ll_dev);
+   rte_free(virtqueue_rx);
+   rte_free(new_ll_dev);
RTE_LOG(ERR, VHOST_CONFIG,
"(%"PRIu64") Failed to allocate memory for txq.\n",
ctx.fh);
-- 
1.8.1.4

[dpdk-dev] [PATCH RFC 0/2] vhost: numa aware allocation of virtio_net device and vhost virt queue

2015-05-28 Thread Huawei Xie

The virtio_net device and vhost virtqueue should be allocated on the same NUMA
node as the vring descriptors.
When we first allocate the virtio_net device and vhost virtqueue, we don't
know the NUMA node of the vring descriptors.
When we receive the VHOST_SET_VRING_ADDR message, we learn the NUMA node of the
vring descriptors, so we try to reallocate virtio_net and the vhost virtqueue
on that node.
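Here is a minimal C sketch of the "reallocate and relink" step. It assumes a hypothetical singly linked `struct dev_node` in place of `struct virtio_net_config_ll`. It also uses plain `malloc()` in place of `rte_malloc_socket()`, so no NUMA placement happens here. The function copies the node, splices the copy into the list where the old node was, and frees the old one. If allocation fails, the old node is kept, as in the patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical device-list node linked from a root pointer. */
struct dev_node {
    int id;
    struct dev_node *next;
};

/* Move 'old' into a freshly allocated node and splice the copy into the
 * list in its place. Returns the node that is now in the list. */
static struct dev_node *move_node(struct dev_node **root, struct dev_node *old)
{
    struct dev_node *copy = malloc(sizeof(*copy));
    struct dev_node **link = root;

    if (copy == NULL)
        return old;                    /* keep the old node on failure */
    memcpy(copy, old, sizeof(*copy));  /* copies old->next as well */

    while (*link != old) {             /* find the pointer that refers to 'old' */
        assert(*link != NULL);         /* 'old' must be in the list */
        link = &(*link)->next;
    }
    *link = copy;
    free(old);
    return copy;
}
```

Walking the list through a pointer-to-pointer handles the head (`ll_root` in the patch) and interior nodes with the same loop. Because `memcpy()` also copies the `next` field, no separate `next` fix-up is needed. Any pointer to the old node held outside the list would dangle after the free; the sketch only updates the list.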

Huawei Xie (2):
  use rte_malloc/free for virtio_net and virt_queue memory data allocation/free
  When we get the address of vring descriptor table, will try to reallocate 
virtio_net device and virtqueue to the same numa node.

 config/common_linuxapp|   1 +
 lib/librte_vhost/Makefile |   4 ++
 lib/librte_vhost/virtio-net.c | 112 ++
 mk/rte.app.mk |   3 ++
 4 files changed, 111 insertions(+), 9 deletions(-)

-- 
1.8.1.4

[dpdk-dev] [PATCH] ixgbe: fall back to non-vector rx

2015-05-28 Thread Ananyev, Konstantin

Hi Eric,

> -Original Message-
> From: Eric Kinzie [mailto:ekinzie at brocade.com]
> Sent: Wednesday, May 27, 2015 7:20 PM
> To: Ananyev, Konstantin
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] ixgbe: fall back to non-vector rx
> 
> On Wed May 27 10:20:39 + 2015, Ananyev, Konstantin wrote:
> > Hi Eric,
> >
> > > -Original Message-
> > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Eric Kinzie
> > > Sent: Tuesday, May 26, 2015 4:52 PM
> > > To: dev at dpdk.org
> > > Subject: [dpdk-dev] [PATCH] ixgbe: fall back to non-vector rx
> > >
> > > The ixgbe driver refuses to receive any packets when vector receive
> > > is enabled and fewer than the minimum number of required mbufs (32)
> > > are supplied.  This makes it incompatible with the bonding driver
> > > which, during receive, may start out with enough buffers but as it
> > > collects packets from each of the enslaved interfaces can drop below
> > > that threshold.  Instead of just giving up when insufficient buffers are
> > > supplied fall back to the original, non-vector, ixgbe receive function.
> >
> > Right now,  vector and bulk_alloc RX methods are not interchangeable.
> > Once you setup your RX queue, you can't mix them.
> > It would be good to make vector RX method to work with arbitrary number of 
> > packets,
> > but I don't think your method would work properly.
> > In meanwhile, wonder can this problem be handled on the bonding device 
> > level?
> > Something like prevent vector RX be enabled at setup stage, or something?
> > Konstantin
> 
> 
> Konstantin, thanks for reviewing this -- I'll look for some other way to
> address the problem.
> 
> Regardless of how this is dealt with, is it acceptable to make
> _recv_raw_pkts_vec() return an error when nb_pkts is too small?  Or will
> this cause problems elsewhere?

I am afraid it would.
Right now the rte_eth_rx_burst() function does not provide any error information;
it just returns the number of packets.
Changing that would have a massive impact on both DPDK libraries and external 
applications.
Konstantin
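The interaction Eric describes can be modelled in C. Both functions below are hypothetical:

- `vec_rx()` stands in for a vector receive path that returns 0 when offered fewer than 32 slots.
- `bond_rx()` stands in for a bonding-style loop that polls each slave with whatever space remains.

Because only a count comes back, the caller cannot tell "offered too few slots" apart from "nothing to receive".

```c
#include <assert.h>
#include <stdint.h>

#define VEC_MIN_BURST 32  /* minimum burst, per the patch description */

/* Hypothetical per-slave receive with '*avail' packets waiting. Returns 0
 * (not an error) whenever fewer than VEC_MIN_BURST slots are offered. */
static uint16_t vec_rx(uint16_t *avail, uint16_t nb_pkts)
{
    uint16_t n;

    if (nb_pkts < VEC_MIN_BURST)
        return 0;
    n = *avail < nb_pkts ? *avail : nb_pkts;
    *avail = (uint16_t)(*avail - n);
    return n;
}

/* Hypothetical bonding-style receive: fill up to nb_pkts slots by polling
 * each slave in turn with the space that is left. */
static uint16_t bond_rx(uint16_t *slave_avail, int nb_slaves, uint16_t nb_pkts)
{
    uint16_t total = 0;
    int i;

    for (i = 0; i < nb_slaves && total < nb_pkts; i++)
        total = (uint16_t)(total + vec_rx(&slave_avail[i],
                                          (uint16_t)(nb_pkts - total)));
    return total;
}
```

Suppose 20 packets are waiting on the first slave and 50 on the second. A 32-slot burst returns 20. The second slave is offered only 12 slots, so it returns 0 and its packets stay queued.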

> 
> Eric
> 
> 
> > >
> > > Signed-off-by: Eric Kinzie 
> > > ---
> > >  drivers/net/ixgbe/ixgbe_rxtx.c |   10 +-
> > >  drivers/net/ixgbe/ixgbe_rxtx.h |4 
> > >  drivers/net/ixgbe/ixgbe_rxtx_vec.c |4 ++--
> > >  3 files changed, 11 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
> > > index 4f9ab22..fbba0ab 100644
> > > --- a/drivers/net/ixgbe/ixgbe_rxtx.c
> > > +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
> > > @@ -1088,9 +1088,9 @@ ixgbe_rx_fill_from_stage(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
> > >   return nb_pkts;
> > >  }
> > >
> > > -static inline uint16_t
> > > -rx_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
> > > -  uint16_t nb_pkts)
> > > +uint16_t
> > > +ixgbe_rx_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
> > > +uint16_t nb_pkts)
> > >  {
> > >   struct ixgbe_rx_queue *rxq = (struct ixgbe_rx_queue *)rx_queue;
> > >   uint16_t nb_rx = 0;
> > > @@ -1158,14 +1158,14 @@ ixgbe_recv_pkts_bulk_alloc(void *rx_queue, struct rte_mbuf **rx_pkts,
> > >   return 0;
> > >
> > >   if (likely(nb_pkts <= RTE_PMD_IXGBE_RX_MAX_BURST))
> > > - return rx_recv_pkts(rx_queue, rx_pkts, nb_pkts);
> > > + return ixgbe_rx_recv_pkts(rx_queue, rx_pkts, nb_pkts);
> > >
> > >   /* request is relatively large, chunk it up */
> > >   nb_rx = 0;
> > >   while (nb_pkts) {
> > >   uint16_t ret, n;
> > >   n = (uint16_t)RTE_MIN(nb_pkts, RTE_PMD_IXGBE_RX_MAX_BURST);
> > > - ret = rx_recv_pkts(rx_queue, &rx_pkts[nb_rx], n);
> > > + ret = ixgbe_rx_recv_pkts(rx_queue, &rx_pkts[nb_rx], n);
> > >   nb_rx = (uint16_t)(nb_rx + ret);
> > >   nb_pkts = (uint16_t)(nb_pkts - ret);
> > >   if (ret < n)
> > > diff --git a/drivers/net/ixgbe/ixgbe_rxtx.h 
> > > b/drivers/net/ixgbe/ixgbe_rxtx.h
> > > index af36438..811e514 100644
> > > --- a/drivers/net/ixgbe/ixgbe_rxtx.h
> > > +++ b/drivers/net/ixgbe/ixgbe_rxtx.h
> > > @@ -283,6 +283,10 @@ uint16_t ixgbe_recv_scattered_pkts_vec(void 
> > > *rx_queue,
> > >  int ixgbe_rx_vec_dev_conf_condition_check(struct rte_eth_dev *dev);
> > >  int ixgbe_rxq_vec_setup(struct ixgbe_rx_queue *rxq);
> > >
> > > +uint16_t ixgbe_rx_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
> > > + uint16_t nb_pkts);
> > > +
> > > +
> > >  #ifdef RTE_IXGBE_INC_VECTOR
> > >
> > >  uint16_t ixgbe_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
> > > diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c 
> > > b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> > > index abd10f6..d27424c 100644
> > > --- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> > > +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> > > @@ -181,7 +181,7 @@ desc_to_olflags_v(__m128i descs[4], struct rte_mbuf 
> > > **rx_pkts)
> > >   * in one loop
> > >   *
> > >   * Notice:
> > > - * -

[dpdk-dev] [PATCH] config:enlarge the default value of RTE_MAX_QUEUES_PER_PORT to 1024

2015-05-28 Thread Jijiang Liu

The default value of RTE_MAX_QUEUES_PER_PORT is 256, which is too small for 
some i40e configurations: rte_eth_dev_configure() returns an error when the 
configured number of queues is larger than 256.

For example, in the vHost sample the PF has 64 queues and 63 VMDq pools are 
configured with 4 queues each, so 64 + 63 * 4 = 316 queues are required on one port.
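The arithmetic in this example can be checked with a small helper. `queues_needed()` is a hypothetical name. Following the message's own count, it simply adds the PF queues to the queues of all VMDq pools, then compares the result against the old and proposed limits.

```c
#include <assert.h>

/* Old and proposed defaults of CONFIG_RTE_MAX_QUEUES_PER_PORT. */
#define OLD_MAX_QUEUES 256u
#define NEW_MAX_QUEUES 1024u

/* Hypothetical helper: PF queues plus a fixed number of queues per VMDq pool. */
static unsigned queues_needed(unsigned pf_queues, unsigned vmdq_pools,
			      unsigned queues_per_pool)
{
	assert(queues_per_pool > 0);
	return pf_queues + vmdq_pools * queues_per_pool;
}
```

`queues_needed(64, 63, 4)` gives 316, which exceeds 256 but fits under 1024.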


Signed-off-by: Jijiang Liu 
---
 config/common_bsdapp   |    2 +-
 config/common_linuxapp |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/config/common_bsdapp b/config/common_bsdapp
index c2374c0..0b169c8 100644
--- a/config/common_bsdapp
+++ b/config/common_bsdapp
@@ -137,7 +137,7 @@ CONFIG_RTE_LIBRTE_KVARGS=y
 CONFIG_RTE_LIBRTE_ETHER=y
 CONFIG_RTE_LIBRTE_ETHDEV_DEBUG=n
 CONFIG_RTE_MAX_ETHPORTS=32
-CONFIG_RTE_MAX_QUEUES_PER_PORT=256
+CONFIG_RTE_MAX_QUEUES_PER_PORT=1024
 CONFIG_RTE_LIBRTE_IEEE1588=n
 CONFIG_RTE_ETHDEV_QUEUE_STAT_CNTRS=16
 CONFIG_RTE_ETHDEV_RXTX_CALLBACKS=y
diff --git a/config/common_linuxapp b/config/common_linuxapp
index 0078dc9..5deb55a 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -134,7 +134,7 @@ CONFIG_RTE_LIBRTE_KVARGS=y
 CONFIG_RTE_LIBRTE_ETHER=y
 CONFIG_RTE_LIBRTE_ETHDEV_DEBUG=n
 CONFIG_RTE_MAX_ETHPORTS=32
-CONFIG_RTE_MAX_QUEUES_PER_PORT=256
+CONFIG_RTE_MAX_QUEUES_PER_PORT=1024
 CONFIG_RTE_LIBRTE_IEEE1588=n
 CONFIG_RTE_ETHDEV_QUEUE_STAT_CNTRS=16
 CONFIG_RTE_ETHDEV_RXTX_CALLBACKS=y
-- 
1.7.7.6

[dpdk-dev] [PATCH 1/5] ethdev: add multicast address filtering

2015-05-28 Thread Stephen Hemminger

On Thu, 28 May 2015 17:05:19 +0200
Ivan Boule  wrote:

> + if (port_id >= nb_ports) {
> + PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
> + return -ENODEV;
> + }
> +

Use rte_eth_dev_is_valid_port() function instead.
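One way to see why a per-port check is preferable to `port_id >= nb_ports`: suppose port ids are not contiguous, for example after a port has been detached. A comparison against a count then accepts a free slot and rejects a live port. The sketch below models this with hypothetical globals, not the real ethdev port table. `check_by_slot()` only mirrors the idea behind rte_eth_dev_is_valid_port().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the ethdev port table and port count. */
#define MAX_PORTS 8
static bool port_attached[MAX_PORTS];
static uint8_t nb_ports;

static void attach_port(uint8_t id)
{
	assert(id < MAX_PORTS && !port_attached[id]);
	port_attached[id] = true;
	nb_ports++;
}

static void detach_port(uint8_t id)
{
	assert(id < MAX_PORTS && port_attached[id]);
	port_attached[id] = false;
	nb_ports--;
}

/* Check used in the quoted patch: assumes ids 0..nb_ports-1 are all in use. */
static bool check_by_count(uint8_t port_id)
{
	return port_id < nb_ports;
}

/* Per-slot check, in the spirit of rte_eth_dev_is_valid_port(). */
static bool check_by_slot(uint8_t port_id)
{
	return port_id < MAX_PORTS && port_attached[port_id];
}
```

Consider attaching ports 0, 1 and 2 and then detaching port 1. The count check then wrongly accepts id 1 and rejects the live port 2, while the per-slot check gets both right.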

[dpdk-dev] [PATCH 0/5] multicast address filtering

2015-05-28 Thread Stephen Hemminger

On Thu, 28 May 2015 17:05:18 +0200
Ivan Boule  wrote:

> Introduce PMD API to set the list of multicast MAC addresses filtered
> by a port.
> Implemented in the following PMDs: igb, igbvf, em, ixgbe, and ixgbevf.
> Implementation for physical PMDs i40e, i40evf, enic, and fm10k left
> to their respective maintainers.
> 
> Ivan Boule (5):
>   ethdev: add multicast address filtering
>   app/testpmd: new command to add/remove multicast MAC addresses
>   e1000: add multicast MAC address filtering
>   ixgbe: add multicast MAC address filtering
>   app/testpmd: fix reply to a multicast ICMP request
> 
>  app/test-pmd/cmdline.c           |   52 ++
>  app/test-pmd/config.c            |  142 ++
>  app/test-pmd/icmpecho.c          |   65 +++--
>  app/test-pmd/testpmd.h           |    6 ++
>  drivers/net/e1000/em_ethdev.c    |   17 +
>  drivers/net/e1000/igb_ethdev.c   |   18 +
>  drivers/net/ixgbe/ixgbe_ethdev.c |   32 +
>  lib/librte_ether/rte_ethdev.c    |   17 +
>  lib/librte_ether/rte_ethdev.h    |   26 +++
>  9 files changed, 369 insertions(+), 6 deletions(-)
> 

Looks good, could you also add support for virtio and vmxnet3?

[dpdk-dev] DPDK: Proposal for a patch patch-test integration tree

2015-05-28 Thread Simon Kågström

I spot a can of worms to be opened here :-)

On 2015-05-27 22:48, Thomas F Herbert wrote:
> Work Flow and Process:
> 
> All patches will be taken from public submissions to dpdk-dev.org
> scraped from dpdk patchwork. Patches will be applied to the patch-test
> tree and tested against HEAD as they are received. The feedback from the
> testing will be provided to the community. The patch-test tree will
> periodically be git pull'ed from dpdk.
> 
> Longer term goal:
> 
> Initially, the patches will be applied along with some simple smoke
> tests. The longer term goal is to automate this process, apply more
> extensive tests and post the results in dpdk patchwork,
> http://dpdk.org/dev/patchwork/project/dpdk/list/ which would have an
> accompanying mailing list for distribution of a results summary of the
> tests.

Actually, github and services such as travis-ci and coveralls already
provide this functionality (with very little setup). So when someone
sends a pull request, the continuous integration service travis-ci will
notice it, start a build and (possibly) run a test suite on the code,
with the patches applied.

If code coverage is collected in the process [1], it's uploaded to the
coveralls site. Both travis-ci and coveralls will add a note to the pull
request saying something like "Build failed with this patch, be careful"
or "Build OK, everything is fine" and "Coverage decreased by 5% with
this patch", etc.

Of course, github provides an API so it's entirely possible to add your
own continuous integration support with the same functionality as
travis-ci (customized for DPDK).

So before venturing into implementing something like this, I think the
DPDK project should at least consider the existing alternatives.

And with that, I close the can of worms again. I hope no worms were hurt
in the process! :-)

// Simon

[1] https://github.com/SimonKagstrom/kcov - yes my personal project

[dpdk-dev] [PATCH] rte_reorder: Allow sequence numbers > 0 as starting point

2015-05-28 Thread Gonzalez Monroy, Sergio

Sorry for the delay :)

On 20/05/2015 12:02, Simon Kagstrom wrote:
> We use sequence numbers from a generator which has potentially started
> long before the receiver. Therefore, the first number will typically
> be > 0. The rte_reorder code will not work in this case, since the
> packet is seen as outside of the buffer.
Yep, that is a flaw in the current implementation.
> The patch instead records the first sequence number inserted as the
> starting point.
>
> Signed-off-by: Simon Kagstrom 
> Signed-off-by: Johan Faltstrom 
> ---
>   lib/librte_reorder/rte_reorder.c | 8 
>   1 file changed, 8 insertions(+)
>
> diff --git a/lib/librte_reorder/rte_reorder.c 
> b/lib/librte_reorder/rte_reorder.c
> index dc0e806..4d6449e 100644
> --- a/lib/librte_reorder/rte_reorder.c
> +++ b/lib/librte_reorder/rte_reorder.c
> @@ -73,6 +73,8 @@ struct rte_reorder_buffer {
>   unsigned int memsize; /**< memory area size of reorder buffer */
>   struct cir_buffer ready_buf; /**< temp buffer for dequeued entries */
>   struct cir_buffer order_buf; /**< buffer used to reorder entries */
> +
> + int is_initialized;
>   } __rte_cache_aligned;
>   
>   static void
> @@ -325,6 +327,12 @@ rte_reorder_insert(struct rte_reorder_buffer *b, struct 
> rte_mbuf *mbuf)
>   uint32_t offset, position;
>   struct cir_buffer *order_buf = &b->order_buf;
>   
> + if (!b->is_initialized) {
> + b->min_seqn = mbuf->seqn;
> +
> + b->is_initialized = 1;
> + }
> +
>   /*
>* calculate the offset from the head pointer we need to go.
>* The subtraction takes care of the sequence number wrapping.
So my first impression was: why do this in insert instead of init?
I guess the goal was to avoid changing the API, but would it not 
be worth it? After all, it is a one-time thing.

About the implementation: packets being inserted could be out of order, 
so the first packet inserted may not be the first in your sequence. What 
happens to that packet would be app-specific, so it is probably not a 
big deal, but what about initializing min_seqn to something like 
(mbuf->seqn - b->size/2)? That would leave enough room for packets 
arriving out of order.
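Both the in-window test and the half-window initialization suggested here can be sketched in a simplified model. The `struct window` type and both helper names are hypothetical. Only the basic offset test (`seqn - min_seqn < size`, relying on unsigned wrap-around as the quoted comment describes) is modelled; the real rte_reorder_insert() handles further cases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified, hypothetical model of a reorder window (not struct rte_reorder_buffer). */
struct window {
	uint32_t min_seqn;  /* sequence number at the head of the window */
	uint32_t size;      /* number of slots in the window */
	bool initialized;   /* set once the first packet has been seen */
};

/* Offset from the head; unsigned subtraction wraps modulo 2^32, so a sequence
 * number just past the wrap point still gives a small offset. */
static bool in_window(const struct window *w, uint32_t seqn)
{
	uint32_t offset = seqn - w->min_seqn;

	return offset < w->size;
}

/* Lazily anchor the window on the first inserted packet, backing off by half
 * the window so that packets slightly older than the first one still fit. */
static void anchor_on_first(struct window *w, uint32_t seqn)
{
	assert(w->size > 0);
	if (!w->initialized) {
		w->min_seqn = seqn - w->size / 2;
		w->initialized = true;
	}
}
```

Take a window of 8 that starts at 0. Sequence number 1000 falls outside it, which is the failure described in the patch. After anchoring on 1000, the head becomes 996, so 998 (slightly early) through 1003 fit.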

You should also update the documentation regarding rte_reorder_insert.

Thanks,
Sergio

[dpdk-dev] [PATCH / RFC] kni: Add set_rx_mode callback to handle multicast groups

2015-05-28 Thread Stephen Hemminger

On Thu, 7 May 2015 15:17:54 +0200
Simon Kagstrom  wrote:

> This is needed to add / remove interfaces in multicast groups via the
> ip tool.
> 
> The callback does nothing - the same as the kernel tun.c.
> 
> Signed-off-by: Simon Kagstrom 

Yes, the dummy callback is needed, otherwise SIOCADDMULTI ioctl will
fail.

[dpdk-dev] Packet Cloning

2015-05-28 Thread Stephen Hemminger

On Thu, 28 May 2015 17:15:42 +0530
Padam Jeet Singh  wrote:

> Hello,
> 
> Is there a function in DPDK to completely clone a pkt_mbuf including the 
> segments? 
> 
> I am trying to build a packet mirroring application which sends packet out 
> through two separate interfaces, but the packet payload needs to be altered 
> before send.
> 
> Thanks,
> Padam
> 
> 

Isn't this what you want?

/**
 * Creates a "clone" of the given packet mbuf.
 *
 * Walks through all segments of the given packet mbuf, and for each of them:
 *  - Creates a new packet mbuf from the given pool.
 *  - Attaches newly created mbuf to the segment.
 * Then updates pkt_len and nb_segs of the "clone" packet mbuf to match values
 * from the original packet mbuf.
 *
 * @param md
 *   The packet mbuf to be cloned.
 * @param mp
 *   The mempool from which the "clone" mbufs are allocated.
 * @return
 *   - The pointer to the new "clone" mbuf on success.
 *   - NULL if allocation fails.
 */
static inline struct rte_mbuf *rte_pktmbuf_clone(struct rte_mbuf *md,
struct rte_mempool *mp)
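Per the comment above, the clone's mbufs are attached to the original segments. The clone and the original therefore share the same payload bytes, so altering the payload before sending on one interface would also alter it for the other. Changing the payload per output port needs a real copy. Below is a minimal sketch of such a deep copy. It uses a hypothetical, simplified segment chain rather than struct rte_mbuf, and it does not model mempool allocation or packet metadata.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a segmented packet; not struct rte_mbuf. */
struct seg {
	unsigned char *data;  /* payload bytes of this segment */
	size_t len;           /* number of payload bytes */
	struct seg *next;     /* next segment, or NULL */
};

/* Free a chain built by seg_deep_copy(). */
static void seg_free(struct seg *s)
{
	while (s != NULL) {
		struct seg *next = s->next;

		free(s->data);
		free(s);
		s = next;
	}
}

/* Deep copy: each segment gets its own payload buffer, so writes to the copy
 * never reach the original (unlike a clone attached to shared segments). */
static struct seg *seg_deep_copy(const struct seg *src)
{
	struct seg *head = NULL;
	struct seg **tail = &head;

	for (; src != NULL; src = src->next) {
		assert(src->len == 0 || src->data != NULL);
		struct seg *s = malloc(sizeof(*s));

		if (s == NULL)
			goto fail;
		s->data = malloc(src->len > 0 ? src->len : 1);
		if (s->data == NULL) {
			free(s);
			goto fail;
		}
		if (src->len > 0)
			memcpy(s->data, src->data, src->len);
		s->len = src->len;
		s->next = NULL;
		*tail = s;
		tail = &s->next;
	}
	return head;

fail:
	seg_free(head);
	return NULL;
}
```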

[dpdk-dev] [PATCH] vhost: tcp pkt with virtio header in one desc

2015-05-28 Thread Stephen Hemminger

On Thu, 28 May 2015 16:19:44 +0800
Wei li  wrote:

> + if (desc->flags & VRING_DESC_F_NEXT)
> + {
> + /* Discard first buffer as it is the virtio header */
> + desc = &vq->desc[desc->next];
> + vb_offset = 0;
> + vb_avail = desc->len;
> + }
> + else /* virtio header in one desc with real pkt */
> + {
> + /* strip the virtio header */
> + vb_offset = vq->vhost_hlen;
> + vb_avail = desc->len - vq->vhost_hlen;
> +
This code looks correct, but please follow the same style as
other code in the driver. The virtio driver uses Linux/BSD
style:
if () {
} else {
}
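The quoted hunk has two cases: the header sits in its own descriptor, or it shares a descriptor with the packet. Both can be written in the Linux/BSD brace style requested above. This is a self-contained sketch, not the vhost library's code. `struct desc`, `DESC_F_NEXT` and `locate_payload()` are hypothetical simplifications, with `DESC_F_NEXT` playing the role of VRING_DESC_F_NEXT.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for a vring descriptor and its "chained" flag. */
#define DESC_F_NEXT 1

struct desc {
	uint32_t len;    /* buffer length in bytes */
	uint16_t flags;  /* DESC_F_NEXT if another descriptor follows */
	uint16_t next;   /* index of the next descriptor when chained */
};

/* Find where packet data starts, given the virtio header length hlen. */
static void locate_payload(const struct desc *table, uint16_t idx, uint32_t hlen,
			   uint16_t *data_idx, uint32_t *offset, uint32_t *avail)
{
	const struct desc *d = &table[idx];

	if (d->flags & DESC_F_NEXT) {
		/* first buffer holds only the virtio header: skip it */
		*data_idx = d->next;
		*offset = 0;
		*avail = table[d->next].len;
	} else {
		/* header and packet share one buffer: strip the header */
		assert(d->len >= hlen);
		*data_idx = idx;
		*offset = hlen;
		*avail = d->len - hlen;
	}
}
```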

[dpdk-dev] [PATCH] kni: Use utsrelease.h to determine Ubuntu kernel version

2015-05-28 Thread Zhang, Helin

Hi guys

Could you help review these changes to code you have modified before?

Regards,
Helin

> -Original Message-
> From: Simon Kagstrom [mailto:simon.kagstrom at netinsight.net]
> Sent: Wednesday, May 27, 2015 7:45 PM
> To: dev at dpdk.org; Zhang, Helin
> Subject: [PATCH] kni: Use utsrelease.h to determine Ubuntu kernel version
> 
> /proc/version_signature is the version of the host machine, but in e.g. chroots
> it need not match the kernel that DPDK is built for. Use utsrelease.h from the
> kernel sources instead and fake the upload version.
> 
> Signed-off-by: Simon Kagstrom 
> Signed-off-by: Johan Faltstrom 
> ---
>  lib/librte_eal/linuxapp/kni/Makefile | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/librte_eal/linuxapp/kni/Makefile
> b/lib/librte_eal/linuxapp/kni/Makefile
> index fb673d9..ac99d3f 100644
> --- a/lib/librte_eal/linuxapp/kni/Makefile
> +++ b/lib/librte_eal/linuxapp/kni/Makefile
> @@ -44,10 +44,10 @@ MODULE_CFLAGS += -I$(RTE_OUTPUT)/include -I$(SRCDIR)/ethtool/ixgbe -I$(SRCDIR)/e
>  MODULE_CFLAGS += -include $(RTE_OUTPUT)/include/rte_config.h
>  MODULE_CFLAGS += -Wall -Werror
> 
> -ifeq ($(shell test -f /proc/version_signature && lsb_release -si 2>/dev/null),Ubuntu)
> +ifeq ($(shell lsb_release -si 2>/dev/null),Ubuntu)
>  MODULE_CFLAGS += -DUBUNTU_RELEASE_CODE=$(shell lsb_release -sr | tr -d .)
> -UBUNTU_KERNEL_CODE := $(shell cut -d' ' -f2 /proc/version_signature | \
> -cut -d'~' -f1 | cut -d- -f1,2 | tr .- $(comma))
> +UBUNTU_KERNEL_CODE := $(shell echo `grep UTS_RELEASE $(RTE_KERNELDIR)/include/generated/utsrelease.h \
> +  | cut -d '"' -f2 | cut -d- -f1,2 | tr .- $(comma)`,1)
>  MODULE_CFLAGS += -D"UBUNTU_KERNEL_CODE=UBUNTU_KERNEL_VERSION($(UBUNTU_KERNEL_CODE))"
>  endif
> 
> --
> 1.9.1
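The new UBUNTU_KERNEL_CODE pipeline can be traced with a C sketch that applies the same steps to the quoted UTS_RELEASE string. First it keeps the first two '-'-separated fields (`cut -d- -f1,2`). Then it maps '.' and '-' to ',' (`tr .- ,`). Finally it appends the faked upload number `,1`. The function name is hypothetical, and the release string used below is only an illustrative Ubuntu-style value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mirror of: cut -d- -f1,2 | tr .- , ; then append ",1".
 * Writes a NUL-terminated result to out; returns 0, or -1 if out is too small. */
static int uts_to_kernel_code(const char *uts, char *out, size_t outsz)
{
	size_t j = 0;
	int dashes = 0;

	assert(uts != NULL && out != NULL);
	for (const char *p = uts; *p != '\0'; p++) {
		if (*p == '-' && ++dashes == 2)
			break;  /* drop the third field onwards, e.g. "-generic" */
		if (j + 1 >= outsz)
			return -1;
		out[j++] = (*p == '.' || *p == '-') ? ',' : *p;
	}
	if (j + 3 > outsz)
		return -1;  /* no room for ",1" plus the terminator */
	out[j++] = ',';
	out[j++] = '1';
	out[j] = '\0';
	return 0;
}
```

For example, "3.13.0-49-generic" becomes "3,13,0,49,1", the comma-separated argument list handed to UBUNTU_KERNEL_VERSION().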

[dpdk-dev] DPDK: Proposal for a patch patch-test integration tree

2015-05-28 Thread Vincent JARDIN

On 27/05/2015 22:48, Thomas F Herbert wrote:
> Work Flow and Process:
>
> All patches will be taken from public submissions to dpdk-dev.org
> scraped from dpdk patchwork. Patches will be applied to the patch-test
> tree and tested against HEAD as they are received. The feedback from the
> testing will be provided to the community. The patch-test tree will
> periodically be git pull'ed from dpdk.
>
> Longer term goal:
>
> Initially, the patches will be applied along with some simple smoke
> tests. The longer term goal is to automate this process, apply more
> extensive tests and post the results in dpdk patchwork,
> http://dpdk.org/dev/patchwork/project/dpdk/list/ which would have an
> accompanying mailing list for distribution of a results summary of the
> tests.

thanks for helping.

It could be broken into two parts:
   - patch-test-net-next,
http://dpdk.org/browse/dpdk/tree/MAINTAINERS#n190
http://dpdk.org/browse/dpdk/tree/drivers/net

   - patch-test-other-next,
all, except drivers

Best regards,
   Vincent

[dpdk-dev] Question about worker assignment in load balancer implementation in DPDK library example.

2015-05-28 Thread 최익성

Dear DPDK experts.

I have a question about the load balancer implementation in the DPDK library example.
(dpdk-2.0.0/examples/load_balancer)

I read in the load balancer application user guide that the worker lcore to 
handle the current packet is determined by reading a predefined 1-byte field 
from the input packet:

worker_id = packet[load_balancing_field(pos_lb)] % n_workers

However, I found that the source code implements it as:

worker_0 = data_0_0[pos_lb] & (n_workers - 1);
worker_1 = data_0_1[pos_lb] & (n_workers - 1);

Does this implementation work?

If n_workers = 3, worker_0 and worker_1 can only take the values 0 or 2.
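The guide's formula and the source differ because `x & (n - 1)` equals `x % n` only when n is a power of two. A small sketch with hypothetical function names compares the two on a 1-byte field:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Worker choice as described in the user guide; valid for any n_workers > 0. */
static uint32_t pick_mod(uint8_t field, uint32_t n_workers)
{
	assert(n_workers > 0);
	return field % n_workers;
}

/* Worker choice as in the example source: a bit mask, which matches the
 * modulo only when n_workers is a power of two. */
static uint32_t pick_mask(uint8_t field, uint32_t n_workers)
{
	assert(n_workers > 0);
	return field & (n_workers - 1);
}

static bool is_pow2(uint32_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}
```

With n_workers = 3 the mask is binary 10, so only workers 0 and 2 are ever selected, matching the observation above. With 1, 2, 4 or 8 workers, both forms agree for every field value.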

I would appreciate any advice or information.

Thank you very much.

Sincerely Yours,

Ick-Sung Choi.
