[dpdk-dev] [PATCH RFC 00/11] ixgbe/mbuf: add TSO support

2014-05-09 Thread Olivier MATZ
Hi Stephen,

On 05/09/2014 07:04 PM, Stephen Hemminger wrote:
> I would also like to propose changing the checksum offload flags.
> Many devices can indicate good checksum in some cases but can't test
> for many other types of packets. By changing the flags to be:
>  PKT_RX_L4_CKSUM_GOOD and PKT_RX_IP_CKSUM_GOOD
> 
> It is then possible to support devices where some cases (IPv4 + TCP)
> are supported but others are not.

I agree. That's also what I'm talking about in the commit log of
the patch 08/11.

If there is not much rework for all the patches, I think it's feasible
to include this kind of modification in the v2 of this series.

Regards,
Olivier



[dpdk-dev] [PATCH RFC 05/11] mbuf: merge physaddr and buf_len in a bitfield

2014-05-09 Thread Olivier MATZ
Hi Jeff,

Thank you for your comment.

On 05/09/2014 05:39 PM, Shaw, Jeffrey B wrote:
> have you tested this patch to see if there is a negative impact to
> performance?

Yes, but not with testpmd. I ran our internal non-regression
performance tests and they show no difference (or one below the error
margin), even with low-overhead processing like forwarding, whatever
the number of cores I use.

> Wouldn't the processor have to mask the high bytes of the physical
> address when it is used, for example, to populate descriptors with
> buffer addresses?  When compute bound, this could steal CPU cycles
> away from packet processing.  I think we should understand the
> performance trade-off in order to save these 2 bytes.

I would naively say that the cost is negligible: accessing the
length is the same as before (it's a 16-bit field) and accessing
the physical address is just a mask or a shift, which should not
take long on an Intel processor (1 cycle?). This is to be
compared with the number of cycles per packet in io-fwd mode,
which is probably around 150 or 200.
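
As a rough illustration of the "mask or shift" being discussed (not taken
from the patch), the sketch below packs both fields into one 64-bit word by
hand. The layout (address in the low 48 bits, length in the high 16) and the
helper names are assumptions; with a real bitfield the compiler picks the
layout and emits equivalent instructions.

```c
#include <assert.h>
#include <stdint.h>

#define PHYSADDR_BITS 48
#define PHYSADDR_MASK ((UINT64_C(1) << PHYSADDR_BITS) - 1)

/* Pack a 48-bit physical address and a 16-bit length into one word. */
static inline uint64_t pack_addr_len(uint64_t physaddr, uint16_t buf_len)
{
	assert((physaddr & ~PHYSADDR_MASK) == 0); /* must fit in 48 bits */
	return physaddr | ((uint64_t)buf_len << PHYSADDR_BITS);
}

/* Reading the address costs one AND. */
static inline uint64_t get_physaddr(uint64_t word)
{
	return word & PHYSADDR_MASK;
}

/* Reading the length costs one shift. */
static inline uint16_t get_buf_len(uint64_t word)
{
	return (uint16_t)(word >> PHYSADDR_BITS);
}
```

Whether that single extra instruction is measurable is exactly what the
io-fwd test proposed in this thread would show.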

> It would be interesting to see how throughput is impacted when the
> workload is core-bound.  This could be accomplished by running testpmd
> in io-fwd mode across 4x 10G ports.

I agree, this is something we could check. If you agree, let's first
wait for some other comments and see if we find a consensus on the
patches.

Regards,
Olivier


[dpdk-dev] [PATCH RFC 00/11] ixgbe/mbuf: add TSO support

2014-05-09 Thread Stephen Hemminger
On Fri, 09 May 2014 23:49:45 +0200
Olivier MATZ  wrote:

> Hi Stephen,
> 
> On 05/09/2014 07:04 PM, Stephen Hemminger wrote:
> > I would also like to propose changing the checksum offload flags.
> > Many devices can indicate good checksum in some cases but can't test
> > for many other types of packets. By changing the flags to be:
> >  PKT_RX_L4_CKSUM_GOOD and PKT_RX_IP_CKSUM_GOOD
> > 
> > It is then possible to support devices where some cases (IPv4 + TCP)
> > are supported but others are not.
> 
> I agree. That's also what I'm talking about in the commit log of
> the patch 08/11.
> 
> If there is not much rework for all the patches, I think it's feasible
> to include this kind of modification in the v2 of this series.
> 
> Regards,
> Olivier
> 

There are three checksum states:
1. Known good
2. Known bad
3. Can't tell

The current choice of flags makes handling #3 impossible. If you change it to
CKSUM_GOOD, then 1 => GOOD, 2 => not GOOD, 3 => not GOOD. And for case #3 the
software can validate it.  For most cases IP checksum offload is meaningless
anyway because the IP header fits in a single cache line, and the cost to
checksum is minimal.
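
A minimal sketch of how a receive path might consume such flags. The flag
names are the ones proposed above; the bit values, the enum and the helper
are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit values; only the flag names come from the proposal. */
#define PKT_RX_IP_CKSUM_GOOD 0x0001u
#define PKT_RX_L4_CKSUM_GOOD 0x0002u

_Static_assert((PKT_RX_IP_CKSUM_GOOD & PKT_RX_L4_CKSUM_GOOD) == 0,
	       "flags must be distinct bits");

enum cksum_action {
	CKSUM_ACCEPT,        /* case 1: hardware vouches for the checksum */
	CKSUM_VERIFY_IN_SW   /* cases 2 and 3: let software decide */
};

/* Decide what to do with the L4 checksum of a received packet. */
static enum cksum_action l4_cksum_action(uint32_t ol_flags)
{
	return (ol_flags & PKT_RX_L4_CKSUM_GOOD) ? CKSUM_ACCEPT
						 : CKSUM_VERIFY_IN_SW;
}
```

With this encoding a driver that cannot check a given packet type simply
leaves the bit clear, and the packet falls through to software validation.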


[dpdk-dev] [PATCH RFC 11/11] ixgbe/mbuf: add TSO support

2014-05-09 Thread Olivier Matz
Implement TSO (TCP segmentation offload) in ixgbe driver. To delegate
the TCP segmentation to the hardware, the user has to:

- set the PKT_TX_TCP_SEG flag in mbuf->ol_flags (this flag implies
  PKT_TX_IP_CKSUM and PKT_TX_TCP_CKSUM)
- fill the mbuf->hw_offload information: l2_len, l3_len, l4_len, mss
- calculate the pseudo header checksum and set it in the TCP header,
  as required when doing hardware TCP checksum offload
- set the IP checksum to 0

This approach seems generic enough to be used for other hw/drivers
in the future.
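
The pseudo header checksum step listed above can be sketched as follows.
This is an illustration under stated assumptions, not code from the patch:
the function names are hypothetical, addresses are passed as host-order
integers for readability, and whether the L4 length is part of the sum for
a TSO packet may depend on the hardware (pass 0 to leave it out).

```c
#include <assert.h>
#include <stdint.h>

/* Fold a 32-bit one's-complement accumulator down to 16 bits. */
static uint16_t cksum_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/*
 * One's-complement sum of the IPv4 pseudo header, not inverted.
 * The result is what would be written (in network order) into the
 * TCP checksum field before handing the packet to the hardware.
 */
static uint16_t ipv4_phdr_cksum(uint32_t src, uint32_t dst,
				uint8_t proto, uint16_t l4_len)
{
	uint32_t sum = 0;

	sum += src >> 16;
	sum += src & 0xffff;
	sum += dst >> 16;
	sum += dst & 0xffff;
	sum += proto;    /* zero byte followed by the protocol number */
	sum += l4_len;   /* TCP header + payload length, or 0 */
	return cksum_fold(sum);
}
```

For 192.168.0.1 -> 192.168.0.2, protocol 6 (TCP) and an L4 length of 20,
the function returns 0x816E.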

In the patch, the tx_desc_cksum_flags_to_olinfo() and
tx_desc_ol_flags_to_cmdtype() functions have been reworked to make them
clearer. This does not impact performance as gcc (version 4.8 in my
case) is smart enough to convert the tests into code that does not
contain any branch instructions.

validation
==========

platform:

  Tester (linux)   <---->   DUT (DPDK)

Run testpmd on DUT:

  cd dpdk.org/
  make install T=x86_64-default-linuxapp-gcc
  cd x86_64-default-linuxapp-gcc/
  modprobe uio
  insmod kmod/igb_uio.ko
  python ../tools/igb_uio_bind.py -b igb_uio :02:00.0
  echo 0 > /proc/sys/kernel/randomize_va_space
  echo 1000 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
  echo 1000 > /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages
  mount -t hugetlbfs none /mnt/huge
  ./app/testpmd -c 0x55 -n 4 -m 800 -- -i --port-topology=chained

Disable all offload features on Tester, and start capture:

  ethtool -K ixgbe0 rx off tx off tso off gso off gro off lro off
  ip l set ixgbe0 up
  tcpdump -n -e -i ixgbe0 -s 0 -w /tmp/cap

We use the following scapy script for testing:

  def test():
      ### IPv4
      # checksum TCP
      p=Ether()/IP(src=RandIP(), dst=RandIP())/TCP(flags=0x10)/Raw(RandString(50))
      sendp(p, iface="ixgbe0", count=5)
      # checksum UDP
      p=Ether()/IP(src=RandIP(), dst=RandIP())/UDP()/Raw(RandString(50))
      sendp(p, iface="ixgbe0", count=5)
      # bad IP checksum
      p=Ether()/IP(src=RandIP(), dst=RandIP(), chksum=0x1234)/TCP(flags=0x10)/Raw(RandString(50))
      sendp(p, iface="ixgbe0", count=5)
      # bad TCP checksum
      p=Ether()/IP(src=RandIP(), dst=RandIP())/TCP(flags=0x10, chksum=0x1234)/Raw(RandString(50))
      sendp(p, iface="ixgbe0", count=5)
      # large packet
      p=Ether()/IP(src=RandIP(), dst=RandIP())/TCP(flags=0x10)/Raw(RandString(1400))
      sendp(p, iface="ixgbe0", count=5)
      ### IPv6
      # checksum TCP
      p=Ether()/IPv6(src=RandIP6(), dst=RandIP6())/TCP(flags=0x10)/Raw(RandString(50))
      sendp(p, iface="ixgbe0", count=5)
      # checksum UDP
      p=Ether()/IPv6(src=RandIP6(), dst=RandIP6())/UDP()/Raw(RandString(50))
      sendp(p, iface="ixgbe0", count=5)
      # bad TCP checksum
      p=Ether()/IPv6(src=RandIP6(), dst=RandIP6())/TCP(flags=0x10, chksum=0x1234)/Raw(RandString(50))
      sendp(p, iface="ixgbe0", count=5)
      # large packet
      p=Ether()/IPv6(src=RandIP6(), dst=RandIP6())/TCP(flags=0x10)/Raw(RandString(1400))
      sendp(p, iface="ixgbe0", count=5)

Without hw cksum
----------------

On DUT:

  # disable hw cksum (use sw) in csumonly test, disable tso
  stop
  set fwd csum
  tx_checksum set 0x0 0
  tso set 0 0
  start

On tester:

  >>> test()

Then check the capture file.

With hw cksum
-------------

On DUT:

  # enable hw cksum in csumonly test, disable tso
  stop
  set fwd csum
  tx_checksum set 0xf 0
  tso set 0 0
  start

On tester:

  >>> test()

Then check the capture file.

With TSO
--------

On DUT:

  set fwd csum
  tx_checksum set 0xf 0
  tso set 800 0
  start

On tester:

  >>> test()

Then check the capture file.

Signed-off-by: Olivier Matz 
---
 app/test-pmd/cmdline.c|  45 +++
 app/test-pmd/config.c |   8 ++
 app/test-pmd/csumonly.c   |  16 
 app/test-pmd/testpmd.h|   2 +
 lib/librte_mbuf/rte_mbuf.h|   7 +-
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 165 --
 6 files changed, 200 insertions(+), 43 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index a95b279..c628773 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -2305,6 +2305,50 @@ cmdline_parse_inst_t cmd_tx_cksum_set = {
},
 };

+/* *** ENABLE HARDWARE SEGMENTATION IN TX PACKETS *** */
+struct cmd_tso_set_result {
+   cmdline_fixed_string_t tso;
+   cmdline_fixed_string_t set;
+   uint16_t mss;
+   uint8_t port_id;
+};
+
+static void
+cmd_tso_set_parsed(void *parsed_result,
+  __attribute__((unused)) struct cmdline *cl,
+  __attribute__((unused)) void *data)
+{
+   struct cmd_tso_set_result *res = parsed_result;
+   tso_set(res->port_id, res->mss);
+}
+
+cmdline_parse_token_string_t cmd_tso_set_tso =
+   TOKEN_STRING_INITIALIZER(struct cmd_tso_set_result,
+   tso, "tso");
+cmdline_parse_token_string_t cmd_tso_set_set =
+   TOKEN_STRING_INITIALIZER(

[dpdk-dev] [PATCH RFC 10/11] testpmd: modify source address to validate checksum calculation

2014-05-09 Thread Olivier Matz
Always modify the source address of the packet in order to validate
the calculation of the checksums (L3 or L4). This was already done
for IPv4 software checksum; add it for IPv4 hw checksum and IPv6.
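
Why touching the source address helps can be shown with a small
stand-alone sketch (hypothetical helper names, header given as host-order
16-bit words): once one word of the header changes, the checksum that was
valid on reception no longer verifies, so a forwarded packet with a
correct checksum proves that it was recomputed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Internet checksum over an array of 16-bit words (host order). */
static uint16_t ip_cksum(const uint16_t *words, size_t n)
{
	uint32_t sum = 0;
	size_t i;

	assert(words != NULL);
	for (i = 0; i < n; i++)
		sum += words[i];
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Returns 1 if the header, checksum field included, verifies. */
static int ip_hdr_valid(const uint16_t *words, size_t n)
{
	return ip_cksum(words, n) == 0;
}
```

Filling in the checksum word of a 20-byte header makes ip_hdr_valid()
return 1; decrementing one source address word afterwards makes it return
0 until the checksum is computed again.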

Signed-off-by: Olivier Matz 
---
 app/test-pmd/csumonly.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 9caad8f..e93d75f 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -310,6 +310,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)

if (tx_ol_flags & PKT_TX_IP_CKSUM) {
/* HW checksum */
+   ipv4_hdr->src_addr--;
ol_flags |= PKT_TX_IP_CKSUM;
}
else {
@@ -373,6 +374,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
unsigned char *) + l2_len);
l3_len = sizeof(struct ipv6_hdr) ;
l4_proto = ipv6_hdr->proto;
+   ipv6_hdr->src_addr[3]--;

if (l4_proto == IPPROTO_UDP) {
udp_hdr = (struct udp_hdr*) (rte_pktmbuf_mtod(mb,
-- 
1.9.2



[dpdk-dev] [PATCH RFC 09/11] mbuf: rename vlan_macip_len in hw_offload and increase its size

2014-05-09 Thread Olivier Matz
To implement the TCP segmentation offload, we will need to add
some more meta information in the mbuf, like the length of the
L4 header, the MSS, ...

To prepare this modification, this patch renames vlan_macip_len to
hw_offload and changes its size from 32 bits to 64 bits.
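
A sketch of what a 64-bit offload word of this kind could look like. The
field names (l2_len, l3_len, l4_len, mss, vlan_tci, u64) appear in this
series; the field widths, their order and the type name are guesses made
for illustration, not the layout used by the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: names follow the series, widths are assumptions. */
union hw_offload_sketch {
	uint64_t u64;            /* whole word, for a single-store reset */
	struct {
		uint16_t vlan_tci;
		uint16_t mss;    /* TSO segment size */
		uint16_t l3_len;
		uint8_t  l2_len;
		uint8_t  l4_len;
	};
};

_Static_assert(sizeof(union hw_offload_sketch) == 8,
	       "offload metadata must fit in 64 bits");
```

Keeping a u64 view lets a constructor clear every offload field with one
store, as the testpmd hunk in this patch does with hw_offload.u64 = 0.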

Signed-off-by: Olivier Matz 
---
 app/test-pmd/csumonly.c   |  4 +--
 app/test-pmd/macfwd.c |  6 ++--
 app/test-pmd/rxonly.c |  2 +-
 app/test-pmd/testpmd.c|  2 +-
 app/test-pmd/txonly.c |  6 ++--
 examples/ip_reassembly/ipv4_rsmbl.h   | 10 +++
 examples/ip_reassembly/main.c |  4 +--
 lib/librte_mbuf/rte_mbuf.h| 34 ++---
 lib/librte_pmd_e1000/em_rxtx.c| 50 +--
 lib/librte_pmd_e1000/igb_rxtx.c   | 56 ---
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 54 +++--
 lib/librte_pmd_ixgbe/ixgbe_rxtx.h |  3 +-
 lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c |  4 +--
 13 files changed, 126 insertions(+), 109 deletions(-)

diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 69b90a7..9caad8f 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -430,8 +430,8 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
}

/* Combine the packet header write. VLAN is not consider here */
-   mb->vlan_macip.f.l2_len = l2_len;
-   mb->vlan_macip.f.l3_len = l3_len;
+   mb->hw_offload.l2_len = l2_len;
+   mb->hw_offload.l3_len = l3_len;
mb->ol_flags = ol_flags;
}
nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_rx);
diff --git a/app/test-pmd/macfwd.c b/app/test-pmd/macfwd.c
index ab74d0c..d137f92 100644
--- a/app/test-pmd/macfwd.c
+++ b/app/test-pmd/macfwd.c
@@ -116,9 +116,9 @@ pkt_burst_mac_forward(struct fwd_stream *fs)
ether_addr_copy(&ports[fs->tx_port].eth_addr,
&eth_hdr->s_addr);
mb->ol_flags = txp->tx_ol_flags;
-   mb->vlan_macip.f.l2_len = sizeof(struct ether_hdr);
-   mb->vlan_macip.f.l3_len = sizeof(struct ipv4_hdr);
-   mb->vlan_macip.f.vlan_tci = txp->tx_vlan_id;
+   mb->hw_offload.l2_len = sizeof(struct ether_hdr);
+   mb->hw_offload.l3_len = sizeof(struct ipv4_hdr);
+   mb->hw_offload.vlan_tci = txp->tx_vlan_id;
}
nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_rx);
fs->tx_packets += nb_tx;
diff --git a/app/test-pmd/rxonly.c b/app/test-pmd/rxonly.c
index 0bf4440..6283482 100644
--- a/app/test-pmd/rxonly.c
+++ b/app/test-pmd/rxonly.c
@@ -149,7 +149,7 @@ pkt_burst_receive(struct fwd_stream *fs)
   mb->hash.fdir.hash, mb->hash.fdir.id);
if (ol_flags & PKT_RX_VLAN_PKT)
printf(" - VLAN tci=0x%x",
-   mb->vlan_macip.f.vlan_tci);
+   mb->hw_offload.vlan_tci);
printf("\n");
if (ol_flags != 0) {
uint32_t rxf;
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 572c3aa..3085be5 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -397,7 +397,7 @@ testpmd_mbuf_ctor(struct rte_mempool *mp,
mb->ol_flags = 0;
mb->data_off = RTE_PKTMBUF_HEADROOM;
mb->nb_segs  = 1;
-   mb->vlan_macip.data = 0;
+   mb->hw_offload.u64 = 0;
mb->hash.rss = 0;
 }

diff --git a/app/test-pmd/txonly.c b/app/test-pmd/txonly.c
index 5d93209..97e381a 100644
--- a/app/test-pmd/txonly.c
+++ b/app/test-pmd/txonly.c
@@ -264,9 +264,9 @@ pkt_burst_transmit(struct fwd_stream *fs)
pkt->nb_segs = tx_pkt_nb_segs;
pkt->pkt_len = tx_pkt_length;
pkt->ol_flags = ol_flags;
-   pkt->vlan_macip.f.vlan_tci  = vlan_tci;
-   pkt->vlan_macip.f.l2_len = sizeof(struct ether_hdr);
-   pkt->vlan_macip.f.l3_len = sizeof(struct ipv4_hdr);
+   pkt->hw_offload.vlan_tci  = vlan_tci;
+   pkt->hw_offload.l2_len = sizeof(struct ether_hdr);
+   pkt->hw_offload.l3_len = sizeof(struct ipv4_hdr);
pkts_burst[nb_pkt] = pkt;
}
nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_pkt);
diff --git a/examples/ip_reassembly/ipv4_rsmbl.h b/examples/ip_reassembly/ipv4_rsmbl.h
index 9b647fb..c653993 100644
--- a/examples/ip_reassembly/ipv4_rsmbl.h
+++ b/examples/ip_reassembly/ipv4_rsmbl.h
@@ -168,8 +168,8 @@ ipv4_frag_chain(struct rte_mbuf *mn, struct rte_mbuf *mp)
struct rte_mbuf *ms;

/* adjust start of the last fragment data. */
-   rte_pktmbuf_adj(mp, (uint16_t)(mp->vlan_macip.f.l2_len +
-   mp->vlan_macip.f.l3_len));
+   rte_pktmbuf_adj(mp

[dpdk-dev] [PATCH RFC 08/11] mbuf: change ol_flags to 32 bits

2014-05-09 Thread Olivier Matz
There is no room to add other offload flags in the current 16-bit
field.  Since we have more room in the mbuf structure, we can change
ol_flags to 32 bits.

A later commit will add support for TSO (TCP Segmentation Offload),
which requires a new ol_flag, justifying this commit.

Thanks to this modification, another possible improvement (which is not
part of this series) could be to change the checksum flags from:
  PKT_RX_L4_CKSUM_BAD, PKT_RX_IP_CKSUM_BAD
to:
  PKT_RX_L4_CKSUM, PKT_RX_IP_CKSUM, PKT_RX_L4_CKSUM_BAD, PKT_RX_IP_CKSUM_BAD
in order to detect if the checksum has been processed by hw or not.
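
To illustrate the second flag set, here is a minimal sketch of how an
application could decode a "processed" bit paired with a "bad" bit. Only
the flag names come from this commit log; the bit values, the enum and the
helper are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit values; only the names come from the commit log. */
#define PKT_RX_L4_CKSUM     0x0100u /* hardware examined the L4 checksum */
#define PKT_RX_L4_CKSUM_BAD 0x0200u /* ...and found it wrong */

_Static_assert((PKT_RX_L4_CKSUM & PKT_RX_L4_CKSUM_BAD) == 0,
	       "flags must not overlap");

enum cksum_state { CKSUM_UNKNOWN, CKSUM_GOOD, CKSUM_BAD };

/* Map the two bits to one of three states. */
static enum cksum_state l4_cksum_state(uint32_t ol_flags)
{
	if (!(ol_flags & PKT_RX_L4_CKSUM))
		return CKSUM_UNKNOWN;   /* not processed by hardware */
	return (ol_flags & PKT_RX_L4_CKSUM_BAD) ? CKSUM_BAD : CKSUM_GOOD;
}
```

Two bits per checksum are needed for this, which is one reason the wider
ol_flags field helps.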

Signed-off-by: Olivier Matz 
---
 app/test-pmd/cmdline.c | 13 +++-
 app/test-pmd/config.c  | 10 +--
 app/test-pmd/csumonly.c| 26 
 app/test-pmd/rxonly.c  |  4 +-
 app/test-pmd/testpmd.h | 11 +---
 app/test-pmd/txonly.c  |  2 +-
 .../bsdapp/eal/include/exec-env/rte_kni_common.h   |  2 +-
 .../linuxapp/eal/include/exec-env/rte_kni_common.h |  2 +-
 lib/librte_mbuf/rte_mbuf.c |  2 +-
 lib/librte_mbuf/rte_mbuf.h | 52 +++
 lib/librte_pmd_e1000/em_rxtx.c | 35 +-
 lib/librte_pmd_e1000/igb_rxtx.c| 71 ++--
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c  | 77 +++---
 lib/librte_pmd_ixgbe/ixgbe_rxtx.h  |  2 +-
 14 files changed, 157 insertions(+), 152 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index c507c46..a95b279 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -2264,8 +2264,17 @@ cmd_tx_cksum_set_parsed(void *parsed_result,
   __attribute__((unused)) void *data)
 {
struct cmd_tx_cksum_set_result *res = parsed_result;
-
-   tx_cksum_set(res->port_id, res->cksum_mask);
+   uint32_t ol_flags = 0;
+
+   if (res->cksum_mask & 0x1)
+   ol_flags |= PKT_TX_IP_CKSUM;
+   if (res->cksum_mask & 0x2)
+   ol_flags |= PKT_TX_TCP_CKSUM;
+   if (res->cksum_mask & 0x4)
+   ol_flags |= PKT_TX_UDP_CKSUM;
+   if (res->cksum_mask & 0x8)
+   ol_flags |= PKT_TX_SCTP_CKSUM;
+   tx_cksum_set(res->port_id, ol_flags);
 }

 cmdline_parse_token_string_t cmd_tx_cksum_set_tx_cksum =
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 1feb133..cd82f60 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -1442,14 +1442,16 @@ set_qmap(portid_t port_id, uint8_t is_rx, uint16_t queue_id, uint8_t map_value)
 }

 void
-tx_cksum_set(portid_t port_id, uint8_t cksum_mask)
+tx_cksum_set(portid_t port_id, uint32_t ol_flags)
 {
-   uint16_t tx_ol_flags;
+   uint32_t cksum_mask = PKT_TX_IP_CKSUM | PKT_TX_L4_MASK;
+
if (port_id_is_invalid(port_id))
return;
+
/* Clear last 4 bits and then set L3/4 checksum mask again */
-   tx_ol_flags = (uint16_t) (ports[port_id].tx_ol_flags & 0xFFF0);
-   ports[port_id].tx_ol_flags = (uint16_t) ((cksum_mask & 0xf) | tx_ol_flags);
+   ports[port_id].tx_ol_flags &= ~cksum_mask;
+   ports[port_id].tx_ol_flags |= (ol_flags & cksum_mask);
 }

 void
diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 3313b87..69b90a7 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -217,9 +217,9 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
uint16_t nb_rx;
uint16_t nb_tx;
uint16_t i;
-   uint16_t ol_flags;
-   uint16_t pkt_ol_flags;
-   uint16_t tx_ol_flags;
+   uint32_t ol_flags;
+   uint32_t pkt_ol_flags;
+   uint32_t tx_ol_flags;
uint16_t l4_proto;
uint16_t eth_type;
uint8_t  l2_len;
@@ -261,7 +261,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
mb = pkts_burst[i];
l2_len  = sizeof(struct ether_hdr);
pkt_ol_flags = mb->ol_flags;
-   ol_flags = (uint16_t) (pkt_ol_flags & (~PKT_TX_L4_MASK));
+   ol_flags = (pkt_ol_flags & (~PKT_TX_L4_MASK));

eth_hdr = rte_pktmbuf_mtod(mb, struct ether_hdr *);
eth_type = rte_be_to_cpu_16(eth_hdr->ether_type);
@@ -274,8 +274,8 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
}

/* Update the L3/L4 checksum error packet count  */
-   rx_bad_ip_csum += (uint16_t) ((pkt_ol_flags & PKT_RX_IP_CKSUM_BAD) != 0);
-   rx_bad_l4_csum += (uint16_t) ((pkt_ol_flags & PKT_RX_L4_CKSUM_BAD) != 0);
+   rx_bad_ip_csum += ((pkt_ol_flags & PKT_RX_IP_CKSUM_BAD) != 0);
+   rx_bad_l4_csum += ((pkt_ol_flags & PKT_RX_L4_CKSUM_BAD) != 0);

/*
 * Try to figure out L3 packet type by SW.
@@ -308,7 +308,7 @@ pkt_burst_checksum_forward

[dpdk-dev] [PATCH RFC 07/11] mbuf: add functions to get the name of an ol_flag

2014-05-09 Thread Olivier Matz
In test-pmd (rxonly.c), the code is able to dump the list of ol_flags.
The issue is that the list of flags in the application has to be
synchronized with the flags defined in rte_mbuf.h.

This patch introduces two new functions, rte_get_rx_ol_flag_name()
and rte_get_tx_ol_flag_name(), that return the name of a flag from
its mask. It also fixes rxonly.c to use these new functions and to
display the proper flags.

Signed-off-by: Olivier Matz 
---
 app/test-pmd/rxonly.c  | 33 ++---
 lib/librte_mbuf/rte_mbuf.h | 46 --
 2 files changed, 54 insertions(+), 25 deletions(-)

diff --git a/app/test-pmd/rxonly.c b/app/test-pmd/rxonly.c
index 5751b0b..94f71c7 100644
--- a/app/test-pmd/rxonly.c
+++ b/app/test-pmd/rxonly.c
@@ -69,23 +69,6 @@

 #include "testpmd.h"

-#define MAX_PKT_RX_FLAGS 11
-static const char *pkt_rx_flag_names[MAX_PKT_RX_FLAGS] = {
-   "VLAN_PKT",
-   "RSS_HASH",
-   "PKT_RX_FDIR",
-   "IP_CKSUM",
-   "IP_CKSUM_BAD",
-
-   "IPV4_HDR",
-   "IPV4_HDR_EXT",
-   "IPV6_HDR",
-   "IPV6_HDR_EXT",
-
-   "IEEE1588_PTP",
-   "IEEE1588_TMST",
-};
-
 static inline void
 print_ether_addr(const char *what, struct ether_addr *eth_addr)
 {
@@ -169,12 +152,16 @@ pkt_burst_receive(struct fwd_stream *fs)
mb->vlan_macip.f.vlan_tci);
printf("\n");
if (ol_flags != 0) {
-   int rxf;
-
-   for (rxf = 0; rxf < MAX_PKT_RX_FLAGS; rxf++) {
-   if (ol_flags & (1 << rxf))
-   printf("  PKT_RX_%s\n",
-  pkt_rx_flag_names[rxf]);
+   uint16_t rxf;
+   const char *name;
+
+   for (rxf = 0; rxf < sizeof(mb->ol_flags) * 8; rxf++) {
+   if ((ol_flags & (1 << rxf)) == 0)
+   continue;
+   name = rte_get_rx_ol_flag_name(1 << rxf);
+   if (name == NULL)
+   continue;
+   printf("  %s\n", name);
}
}
rte_pktmbuf_free(mb);
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 8fa781b..55a993a 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -99,9 +99,51 @@ extern "C" {
 #define PKT_TX_IEEE1588_TMST 0x8000 /**< TX IEEE1588 packet to timestamp. */

 /**
- * Bit Mask to indicate what bits required for building TX context
+ * Get the name of a RX offload flag
+ *
+ * @param mask
+ *   The mask describing the flag (only one bit must be set)
+ * @return
+ *   The name of this flag, or NULL if it's not a valid RX flag.
  */
-#define PKT_TX_OFFLOAD_MASK (PKT_TX_VLAN_PKT | PKT_TX_IP_CKSUM | PKT_TX_L4_MASK)
+static inline const char *rte_get_rx_ol_flag_name(uint16_t mask)
+{
+   switch (mask) {
+   case PKT_RX_VLAN_PKT: return "PKT_RX_VLAN_PKT";
+   case PKT_RX_RSS_HASH: return "PKT_RX_RSS_HASH";
+   case PKT_RX_FDIR: return "PKT_RX_FDIR";
+   case PKT_RX_L4_CKSUM_BAD: return "PKT_RX_L4_CKSUM_BAD";
+   case PKT_RX_IP_CKSUM_BAD: return "PKT_RX_IP_CKSUM_BAD";
+   case PKT_RX_IPV4_HDR: return "PKT_RX_IPV4_HDR";
+   case PKT_RX_IPV4_HDR_EXT: return "PKT_RX_IPV4_HDR_EXT";
+   case PKT_RX_IPV6_HDR: return "PKT_RX_IPV6_HDR";
+   case PKT_RX_IPV6_HDR_EXT: return "PKT_RX_IPV6_HDR_EXT";
+   case PKT_RX_IEEE1588_PTP: return "PKT_RX_IEEE1588_PTP";
+   case PKT_RX_IEEE1588_TMST: return "PKT_RX_IEEE1588_TMST";
+   default: return NULL;
+   }
+}
+
+/**
+ * Get the name of a TX offload flag
+ *
+ * @param mask
+ *   The mask describing the flag (only one bit must be set)
+ * @return
+ *   The name of this flag, or NULL if it's not a valid TX flag.
+ */
+static inline const char *rte_get_tx_ol_flag_name(uint16_t mask)
+{
+   switch (mask) {
+   case PKT_TX_VLAN_PKT: return "PKT_TX_VLAN_PKT";
+   case PKT_TX_IP_CKSUM: return "PKT_TX_IP_CKSUM";
+   case PKT_TX_TCP_CKSUM: return "PKT_TX_TCP_CKSUM";
+   case PKT_TX_SCTP_CKSUM: return "PKT_TX_SCTP_CKSUM";
+   case PKT_TX_UDP_CKSUM: return "PKT_TX_UDP_CKSUM";
+   case PKT_TX_IEEE1588_TMST: return "PKT_TX_IEEE1588_TMST";
+   default: return NULL;
+   }
+}

 /** Offload features */
 union rte_vlan_macip {
-- 
1.9.2



[dpdk-dev] [PATCH RFC 06/11] mbuf: replace data pointer by an offset

2014-05-09 Thread Olivier Matz
The mbuf structure already contains a pointer to the beginning of the
buffer (m->buf_addr). There is no need to use 8 more bytes to store
another pointer to the beginning of the data.

A 16-bit unsigned integer is enough, as we know that an mbuf is
never longer than 64KB. We gain 6 bytes in the structure thanks to
this modification.

Signed-off-by: Olivier Matz 
---
 app/test-pmd/csumonly.c   |  2 +-
 app/test-pmd/macfwd-retry.c   |  2 +-
 app/test-pmd/macfwd.c |  2 +-
 app/test-pmd/rxonly.c |  2 +-
 app/test-pmd/testpmd.c|  2 +-
 app/test-pmd/txonly.c |  7 ++--
 app/test/test_mbuf.c  |  6 ++--
 examples/exception_path/main.c|  3 +-
 examples/vhost/main.c | 21 +++-
 examples/vhost_xen/main.c |  2 +-
 lib/librte_mbuf/rte_mbuf.c|  7 ++--
 lib/librte_mbuf/rte_mbuf.h| 62 ---
 lib/librte_pmd_e1000/em_rxtx.c| 12 +++
 lib/librte_pmd_e1000/igb_rxtx.c   | 13 
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 13 
 lib/librte_pmd_ixgbe/ixgbe_rxtx.h |  3 +-
 lib/librte_pmd_virtio/virtio_rxtx.c   |  2 +-
 lib/librte_pmd_virtio/virtqueue.h |  5 ++-
 lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c |  5 ++-
 19 files changed, 85 insertions(+), 86 deletions(-)

diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index ee82eb6..3313b87 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -263,7 +263,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
pkt_ol_flags = mb->ol_flags;
ol_flags = (uint16_t) (pkt_ol_flags & (~PKT_TX_L4_MASK));

-   eth_hdr = (struct ether_hdr *) mb->data;
+   eth_hdr = rte_pktmbuf_mtod(mb, struct ether_hdr *);
eth_type = rte_be_to_cpu_16(eth_hdr->ether_type);
if (eth_type == ETHER_TYPE_VLAN) {
/* Only allow single VLAN label here */
diff --git a/app/test-pmd/macfwd-retry.c b/app/test-pmd/macfwd-retry.c
index 687ff8d..7749c9e 100644
--- a/app/test-pmd/macfwd-retry.c
+++ b/app/test-pmd/macfwd-retry.c
@@ -119,7 +119,7 @@ pkt_burst_mac_retry_forward(struct fwd_stream *fs)
fs->rx_packets += nb_rx;
for (i = 0; i < nb_rx; i++) {
mb = pkts_burst[i];
-   eth_hdr = (struct ether_hdr *) mb->data;
+   eth_hdr = rte_pktmbuf_mtod(mb, struct ether_hdr *);
ether_addr_copy(&peer_eth_addrs[fs->peer_addr],
&eth_hdr->d_addr);
ether_addr_copy(&ports[fs->tx_port].eth_addr,
diff --git a/app/test-pmd/macfwd.c b/app/test-pmd/macfwd.c
index 8d7612c..ab74d0c 100644
--- a/app/test-pmd/macfwd.c
+++ b/app/test-pmd/macfwd.c
@@ -110,7 +110,7 @@ pkt_burst_mac_forward(struct fwd_stream *fs)
txp = &ports[fs->tx_port];
for (i = 0; i < nb_rx; i++) {
mb = pkts_burst[i];
-   eth_hdr = (struct ether_hdr *) mb->data;
+   eth_hdr = rte_pktmbuf_mtod(mb, struct ether_hdr *);
ether_addr_copy(&peer_eth_addrs[fs->peer_addr],
&eth_hdr->d_addr);
ether_addr_copy(&ports[fs->tx_port].eth_addr,
diff --git a/app/test-pmd/rxonly.c b/app/test-pmd/rxonly.c
index b77c8ce..5751b0b 100644
--- a/app/test-pmd/rxonly.c
+++ b/app/test-pmd/rxonly.c
@@ -149,7 +149,7 @@ pkt_burst_receive(struct fwd_stream *fs)
rte_pktmbuf_free(mb);
continue;
}
-   eth_hdr = (struct ether_hdr *) mb->data;
+   eth_hdr = rte_pktmbuf_mtod(mb, struct ether_hdr *);
eth_type = RTE_BE_TO_CPU_16(eth_hdr->ether_type);
ol_flags = mb->ol_flags;
print_ether_addr("  src=", &eth_hdr->s_addr);
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 1964020..572c3aa 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -395,7 +395,7 @@ testpmd_mbuf_ctor(struct rte_mempool *mp,
mb_ctor_arg->seg_buf_offset);
mb->buf_len  = mb_ctor_arg->seg_buf_size;
mb->ol_flags = 0;
-   mb->data = (char *) mb->buf_addr + RTE_PKTMBUF_HEADROOM;
+   mb->data_off = RTE_PKTMBUF_HEADROOM;
mb->nb_segs  = 1;
mb->vlan_macip.data = 0;
mb->hash.rss = 0;
diff --git a/app/test-pmd/txonly.c b/app/test-pmd/txonly.c
index 3baa0c8..c28f3dd 100644
--- a/app/test-pmd/txonly.c
+++ b/app/test-pmd/txonly.c
@@ -111,13 +111,13 @@ copy_buf_to_pkt_segs(void* buf, unsigned len, struct rte_mbuf *pkt,
seg = seg->next;
}
copy_len = seg->data_len - offset;
-   seg_buf = ((char *) seg->data + offset);
+   seg_buf = (rte_pktmbuf_mtod(seg, char *) + offset);
while (len > copy_len) {
rte_memcpy(seg_buf, buf, (size_t) copy_len);

[dpdk-dev] [PATCH RFC 05/11] mbuf: merge physaddr and buf_len in a bitfield

2014-05-09 Thread Olivier Matz
The physical address is never greater than (1 << 48) = 256 TB.
We can save 2 bytes in the mbuf structure by merging the physical
address and the buffer length in the same bitfield.
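
A stand-alone sketch of the two declarations, for comparison. The struct
names are hypothetical; the field names and widths are those of the diff
below. Bit-fields of type uint64_t are an implementation-defined extension
that gcc and clang accept, so the exact size is compiler-dependent.

```c
#include <assert.h>
#include <stdint.h>

/* Separate fields: 8 + 2 bytes, typically padded to 16 on 64-bit ABIs. */
struct addr_len_split {
	uint64_t buf_physaddr;
	uint16_t buf_len;
};

/* Merged as in the patch: both fields share one 64-bit word. */
struct addr_len_merged {
	uint64_t buf_physaddr:48;
	uint64_t buf_len:16;
};

_Static_assert(sizeof(struct addr_len_merged) <= sizeof(struct addr_len_split),
	       "merging must not grow the structure");
```

Both fields keep their full range: any address below (1 << 48) and any
length up to 65535 round-trip through the merged form.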

Signed-off-by: Olivier Matz 
---
 lib/librte_mbuf/rte_mbuf.c | 3 ++-
 lib/librte_mbuf/rte_mbuf.h | 7 ---
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index c229525..9879095 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -104,7 +104,8 @@ rte_pktmbuf_init(struct rte_mempool *mp,
m->buf_len = (uint16_t)buf_len;

/* keep some headroom between start of buffer and data */
-   m->data = (char*) m->buf_addr + RTE_MIN(RTE_PKTMBUF_HEADROOM, m->buf_len);
+   m->data = (char*) m->buf_addr + RTE_MIN(RTE_PKTMBUF_HEADROOM,
+   (uint16_t)m->buf_len);

/* init some constant fields */
m->pool = mp;
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 803b223..275f6b2 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -130,8 +130,8 @@ union rte_vlan_macip {
 struct rte_mbuf {
struct rte_mempool *pool; /**< Pool from which mbuf was allocated. */
void *buf_addr;   /**< Virtual address of segment buffer. */
-   phys_addr_t buf_physaddr; /**< Physical address of segment buffer. */
-   uint16_t buf_len; /**< Length of segment buffer. */
+   uint64_t buf_physaddr:48; /**< Physical address of segment buffer. */
+   uint64_t buf_len:16;  /**< Length of segment buffer. */
 #ifdef RTE_MBUF_REFCNT
/**
 * 16-bit Reference counter.
@@ -148,8 +148,9 @@ struct rte_mbuf {
 #else
uint16_t refcnt_reserved; /**< Do not use this field */
 #endif
-   uint16_t reserved; /**< Unused field. Required for padding. */
+
uint16_t ol_flags;/**< Offload features. */
+   uint32_t reserved; /**< Unused field. Required for padding. */

/* valid for any segment */
struct rte_mbuf *next;  /**< Next segment of scattered packet. */
-- 
1.9.2



[dpdk-dev] [PATCH RFC 04/11] mbuf: remove the rte_pktmbuf structure

2014-05-09 Thread Olivier Matz
The rte_pktmbuf structure was initially included in the rte_mbuf
structure. This was needed when there were two types of mbuf (ctrl and
packet). As the control mbuf has been removed, we can merge the
rte_pktmbuf into the rte_mbuf structure.

Advantages of doing this:
  - the access to mbuf fields is easier (e.g. m->data instead of m->pkt.data)
  - make the structure more consistent: for instance, there was no reason
    to have the ol_flags field in rte_mbuf
  - it will allow a deeper reorganization of the rte_mbuf structure in the
    next commits, which will save several bytes in it

Signed-off-by: Olivier Matz 
---
 app/test-pmd/cmdline.c |   1 -
 app/test-pmd/csumonly.c|   6 +-
 app/test-pmd/ieee1588fwd.c |   6 +-
 app/test-pmd/macfwd-retry.c|   2 +-
 app/test-pmd/macfwd.c  |   8 +-
 app/test-pmd/rxonly.c  |  12 +-
 app/test-pmd/testpmd.c |   8 +-
 app/test-pmd/testpmd.h |   2 +-
 app/test-pmd/txonly.c  |  42 +++
 app/test/commands.c|   1 -
 app/test/test_mbuf.c   |  12 +-
 app/test/test_sched.c  |   4 +-
 examples/dpdk_qat/crypto.c |  22 ++--
 examples/dpdk_qat/main.c   |   2 +-
 examples/exception_path/main.c |  10 +-
 examples/ip_reassembly/ipv4_rsmbl.h|  20 +--
 examples/ip_reassembly/main.c  |   6 +-
 examples/ipv4_frag/main.c  |   4 +-
 examples/ipv4_frag/rte_ipv4_frag.h |  42 +++
 examples/ipv4_multicast/main.c |  14 +--
 examples/l3fwd-power/main.c|   2 +-
 examples/l3fwd-vf/main.c   |   2 +-
 examples/l3fwd/main.c  |  10 +-
 examples/load_balancer/runtime.c   |   2 +-
 .../client_server_mp/mp_client/client.c|   2 +-
 examples/quota_watermark/qw/main.c |   4 +-
 examples/vhost/main.c  |  22 ++--
 examples/vhost_xen/main.c  |  22 ++--
 lib/librte_mbuf/rte_mbuf.c |  26 ++--
 lib/librte_mbuf/rte_mbuf.h | 140 ++---
 lib/librte_pmd_e1000/em_rxtx.c |  64 +-
 lib/librte_pmd_e1000/igb_rxtx.c|  68 +-
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c  | 100 +++
 lib/librte_pmd_ixgbe/ixgbe_rxtx.h  |   2 +-
 lib/librte_pmd_pcap/rte_eth_pcap.c |  14 +--
 lib/librte_pmd_virtio/virtio_rxtx.c|  16 +--
 lib/librte_pmd_virtio/virtqueue.h  |   6 +-
 lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c  |  26 ++--
 lib/librte_pmd_xenvirt/rte_eth_xenvirt.c   |  12 +-
 lib/librte_pmd_xenvirt/virtqueue.h |   4 +-
 lib/librte_sched/rte_sched.c   |  14 +--
 lib/librte_sched/rte_sched.h   |  10 +-
 42 files changed, 394 insertions(+), 398 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index e3d1849..c507c46 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -5009,7 +5009,6 @@ dump_struct_sizes(void)
 {
 #define DUMP_SIZE(t) printf("sizeof(" #t ") = %u\n", (unsigned)sizeof(t));
DUMP_SIZE(struct rte_mbuf);
-   DUMP_SIZE(struct rte_pktmbuf);
DUMP_SIZE(struct rte_mempool);
DUMP_SIZE(struct rte_ring);
 #undef DUMP_SIZE
diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 3568ba0..ee82eb6 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -263,7 +263,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
pkt_ol_flags = mb->ol_flags;
ol_flags = (uint16_t) (pkt_ol_flags & (~PKT_TX_L4_MASK));

-   eth_hdr = (struct ether_hdr *) mb->pkt.data;
+   eth_hdr = (struct ether_hdr *) mb->data;
eth_type = rte_be_to_cpu_16(eth_hdr->ether_type);
if (eth_type == ETHER_TYPE_VLAN) {
/* Only allow single VLAN label here */
@@ -430,8 +430,8 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
}

/* Combine the packet header write. VLAN is not consider here */
-   mb->pkt.vlan_macip.f.l2_len = l2_len;
-   mb->pkt.vlan_macip.f.l3_len = l3_len;
+   mb->vlan_macip.f.l2_len = l2_len;
+   mb->vlan_macip.f.l3_len = l3_len;
mb->ol_flags = ol_flags;
}
nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_rx);
diff --git a/app/test-pmd/ieee1588fwd.c b/app/test-pmd/ieee1588fwd.c
index 44f0a89..4f18183 10

[dpdk-dev] [PATCH RFC 03/11] mbuf: remove rte_ctrlmbuf

2014-05-09 Thread Olivier Matz
The initial role of rte_ctrlmbuf is to carry generic messages (data
pointer + data length) but it's not used by the DPDK or its applications.
Keeping it implies:
  - losing 1 byte in the rte_mbuf structure
  - having some dead code in rte_mbuf.[ch]

This patch removes this feature. Thanks to it, it is now possible to
simplify the rte_mbuf structure by merging the rte_pktmbuf structure
in it. This is done in the next commit.

Signed-off-by: Olivier Matz 
---
 app/test-pmd/cmdline.c   |   1 -
 app/test-pmd/testpmd.c   |   2 -
 app/test-pmd/txonly.c|   2 +-
 app/test/commands.c  |   1 -
 app/test/test_mbuf.c |  72 +
 examples/ipv4_multicast/main.c   |   2 +-
 lib/librte_mbuf/rte_mbuf.c   |  65 +++-
 lib/librte_mbuf/rte_mbuf.h   | 175 ++-
 lib/librte_pmd_e1000/em_rxtx.c   |   2 +-
 lib/librte_pmd_e1000/igb_rxtx.c  |   2 +-
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c|   4 +-
 lib/librte_pmd_virtio/virtio_rxtx.c  |   2 +-
 lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c|   2 +-
 lib/librte_pmd_xenvirt/rte_eth_xenvirt.c |   2 +-
 14 files changed, 54 insertions(+), 280 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 7becedc..e3d1849 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -5010,7 +5010,6 @@ dump_struct_sizes(void)
 #define DUMP_SIZE(t) printf("sizeof(" #t ") = %u\n", (unsigned)sizeof(t));
DUMP_SIZE(struct rte_mbuf);
DUMP_SIZE(struct rte_pktmbuf);
-   DUMP_SIZE(struct rte_ctrlmbuf);
DUMP_SIZE(struct rte_mempool);
DUMP_SIZE(struct rte_ring);
 #undef DUMP_SIZE
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 9c56914..76b3823 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -389,13 +389,11 @@ testpmd_mbuf_ctor(struct rte_mempool *mp,
mb_ctor_arg = (struct mbuf_ctor_arg *) opaque_arg;
mb = (struct rte_mbuf *) raw_mbuf;

-   mb->type = RTE_MBUF_PKT;
mb->pool = mp;
mb->buf_addr = (void *) ((char *)mb + mb_ctor_arg->seg_buf_offset);
mb->buf_physaddr = (uint64_t) (rte_mempool_virt2phy(mp, mb) +
mb_ctor_arg->seg_buf_offset);
mb->buf_len  = mb_ctor_arg->seg_buf_size;
-   mb->type = RTE_MBUF_PKT;
mb->ol_flags = 0;
mb->pkt.data = (char *) mb->buf_addr + RTE_PKTMBUF_HEADROOM;
mb->pkt.nb_segs  = 1;
diff --git a/app/test-pmd/txonly.c b/app/test-pmd/txonly.c
index 1cf2574..1f066d0 100644
--- a/app/test-pmd/txonly.c
+++ b/app/test-pmd/txonly.c
@@ -93,7 +93,7 @@ tx_mbuf_alloc(struct rte_mempool *mp)
struct rte_mbuf *m;

m = __rte_mbuf_raw_alloc(mp);
-   __rte_mbuf_sanity_check_raw(m, RTE_MBUF_PKT, 0);
+   __rte_mbuf_sanity_check_raw(m, 0);
return (m);
 }

diff --git a/app/test/commands.c b/app/test/commands.c
index b145036..c69544b 100644
--- a/app/test/commands.c
+++ b/app/test/commands.c
@@ -262,7 +262,6 @@ dump_struct_sizes(void)
 #define DUMP_SIZE(t) printf("sizeof(" #t ") = %u\n", (unsigned)sizeof(t));
DUMP_SIZE(struct rte_mbuf);
DUMP_SIZE(struct rte_pktmbuf);
-   DUMP_SIZE(struct rte_ctrlmbuf);
DUMP_SIZE(struct rte_mempool);
DUMP_SIZE(struct rte_ring);
 #undef DUMP_SIZE
diff --git a/app/test/test_mbuf.c b/app/test/test_mbuf.c
index fe0f4f6..07b5551 100644
--- a/app/test/test_mbuf.c
+++ b/app/test/test_mbuf.c
@@ -80,7 +80,6 @@
 #define MAKE_STRING(x)  # x

 static struct rte_mempool *pktmbuf_pool = NULL;
-static struct rte_mempool *ctrlmbuf_pool = NULL;

 #if defined RTE_MBUF_REFCNT  && defined RTE_MBUF_REFCNT_ATOMIC

@@ -272,8 +271,8 @@ test_one_pktmbuf(void)
GOTO_FAIL("Buffer should be continuous");
memset(hdr, 0x55, MBUF_TEST_HDR2_LEN);

-   rte_mbuf_sanity_check(m, RTE_MBUF_PKT, 1);
-   rte_mbuf_sanity_check(m, RTE_MBUF_PKT, 0);
+   rte_mbuf_sanity_check(m, 1);
+   rte_mbuf_sanity_check(m, 0);
rte_pktmbuf_dump(m, 0);

/* this prepend should fail */
@@ -320,48 +319,6 @@ fail:
return -1;
 }

-/*
- * test control mbuf
- */
-static int
-test_one_ctrlmbuf(void)
-{
-   struct rte_mbuf *m = NULL;
-   char message[] = "This is a message carried by a ctrlmbuf";
-
-   printf("Test ctrlmbuf API\n");
-
-   /* alloc a mbuf */
-
-   m = rte_ctrlmbuf_alloc(ctrlmbuf_pool);
-   if (m == NULL)
-   GOTO_FAIL("Cannot allocate mbuf");
-   if (rte_ctrlmbuf_len(m) != 0)
-   GOTO_FAIL("Bad length");
-
-   /* set data */
-   rte_ctrlmbuf_data(m) = &message;
-   rte_ctrlmbuf_len(m) = sizeof(message);
-
-   /* read data */
-   if (rte_ctrlmbuf_data(m) != message)
-   GOTO_FAIL("Invalid data pointer");
-   if (rte_ctrlmbuf_len(m) != sizeof(message))
-   GOTO_FAIL("Inv

[dpdk-dev] [PATCH RFC 02/11] mbuf: rename RTE_MBUF_SCATTER_GATHER into RTE_MBUF_REFCNT

2014-05-09 Thread Olivier Matz
It seems that RTE_MBUF_SCATTER_GATHER is not the proper name for the
feature it provides. "Scatter gather" means that data is stored using
several buffers. RTE_MBUF_REFCNT seems to be a better name for that
feature as it provides a reference counter for mbufs.

The macro RTE_MBUF_SCATTER_GATHER is poisoned to ensure this
modification is seen by drivers or applications using it.
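
With GCC-compatible compilers, one way to poison a macro name is
`#pragma GCC poison`; that the patch uses exactly this mechanism is an
assumption here, and sk_refcnt_enabled() is a hypothetical helper used
only for illustration. A minimal sketch:

```c
/* The new name is normally set by the build configuration; defined here
 * only to keep the sketch self-contained. */
#define RTE_MBUF_REFCNT 1

/* Any later use of the old name, even inside #ifdef, is a hard compile
 * error, so out-of-tree code still testing it is noticed at build time. */
#pragma GCC poison RTE_MBUF_SCATTER_GATHER

/* Hypothetical helper: reports whether reference counting is compiled in. */
static int sk_refcnt_enabled(void)
{
#ifdef RTE_MBUF_REFCNT
	return 1;
#else
	return 0;
#endif
}
```

Replacing RTE_MBUF_REFCNT with the old name in the #ifdef above would
stop the build instead of silently compiling the feature out.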

Signed-off-by: Olivier Matz 
---
 app/test/test_mbuf.c | 16 +++---
 config/defconfig_i686-default-linuxapp-gcc   |  2 +-
 config/defconfig_i686-default-linuxapp-icc   |  2 +-
 config/defconfig_x86_64-default-bsdapp-gcc   |  2 +-
 config/defconfig_x86_64-default-linuxapp-gcc |  2 +-
 config/defconfig_x86_64-default-linuxapp-icc |  2 +-
 doc/doxy-api.conf|  2 +-
 examples/ipv4_frag/Makefile  |  4 ++--
 examples/ipv4_multicast/Makefile |  4 ++--
 lib/librte_mbuf/rte_mbuf.c   |  2 +-
 lib/librte_mbuf/rte_mbuf.h   | 31 +++-
 11 files changed, 36 insertions(+), 33 deletions(-)

diff --git a/app/test/test_mbuf.c b/app/test/test_mbuf.c
index f443734..fe0f4f6 100644
--- a/app/test/test_mbuf.c
+++ b/app/test/test_mbuf.c
@@ -82,7 +82,7 @@
 static struct rte_mempool *pktmbuf_pool = NULL;
 static struct rte_mempool *ctrlmbuf_pool = NULL;

-#if defined RTE_MBUF_SCATTER_GATHER  && defined RTE_MBUF_REFCNT_ATOMIC
+#if defined RTE_MBUF_REFCNT  && defined RTE_MBUF_REFCNT_ATOMIC

 static struct rte_mempool *refcnt_pool = NULL;
 static struct rte_ring *refcnt_mbuf_ring = NULL;
@@ -365,7 +365,7 @@ fail:
 static int
 testclone_testupdate_testdetach(void)
 {
-#ifndef RTE_MBUF_SCATTER_GATHER
+#ifndef RTE_MBUF_REFCNT
return 0;
 #else
struct rte_mbuf *mc = NULL;
@@ -406,7 +406,7 @@ fail:
if (mc)
rte_pktmbuf_free(mc);
return -1;
-#endif /* RTE_MBUF_SCATTER_GATHER */
+#endif /* RTE_MBUF_REFCNT */
 }
 #undef GOTO_FAIL

@@ -439,7 +439,7 @@ test_pktmbuf_pool(void)
printf("Error pool not empty");
ret = -1;
}
-#ifdef RTE_MBUF_SCATTER_GATHER
+#ifdef RTE_MBUF_REFCNT
extra = rte_pktmbuf_clone(m[0], pktmbuf_pool);
if(extra != NULL) {
printf("Error pool not empty");
@@ -548,11 +548,11 @@ test_pktmbuf_free_segment(void)
 /*
  * Stress test for rte_mbuf atomic refcnt.
  * Implies that:
- * RTE_MBUF_SCATTER_GATHER and RTE_MBUF_REFCNT_ATOMIC are both defined.
+ * RTE_MBUF_REFCNT and RTE_MBUF_REFCNT_ATOMIC are both defined.
  * For more efficency, recomended to run with RTE_LIBRTE_MBUF_DEBUG defined.
  */

-#if defined RTE_MBUF_SCATTER_GATHER  && defined RTE_MBUF_REFCNT_ATOMIC
+#if defined RTE_MBUF_REFCNT  && defined RTE_MBUF_REFCNT_ATOMIC

 static int
 test_refcnt_slave(__attribute__((unused)) void *arg)
@@ -657,7 +657,7 @@ test_refcnt_master(void)
 static int
 test_refcnt_mbuf(void)
 {
-#if defined RTE_MBUF_SCATTER_GATHER  && defined RTE_MBUF_REFCNT_ATOMIC
+#if defined RTE_MBUF_REFCNT  && defined RTE_MBUF_REFCNT_ATOMIC

unsigned lnum, master, slave, tref;

@@ -808,7 +808,7 @@ test_failing_mbuf_sanity_check(void)
return -1;
}

-#ifdef RTE_MBUF_SCATTER_GATHER
+#ifdef RTE_MBUF_REFCNT
badbuf = *buf;
badbuf.refcnt = 0;
if (verify_mbuf_check_panics(&badbuf)) {
diff --git a/config/defconfig_i686-default-linuxapp-gcc 
b/config/defconfig_i686-default-linuxapp-gcc
index 14bd3d1..dd0f0d0 100644
--- a/config/defconfig_i686-default-linuxapp-gcc
+++ b/config/defconfig_i686-default-linuxapp-gcc
@@ -235,7 +235,7 @@ CONFIG_RTE_LIBRTE_MEMPOOL_DEBUG=n
 #
 CONFIG_RTE_LIBRTE_MBUF=y
 CONFIG_RTE_LIBRTE_MBUF_DEBUG=n
-CONFIG_RTE_MBUF_SCATTER_GATHER=y
+CONFIG_RTE_MBUF_REFCNT=y
 CONFIG_RTE_MBUF_REFCNT_ATOMIC=y
 CONFIG_RTE_PKTMBUF_HEADROOM=128

diff --git a/config/defconfig_i686-default-linuxapp-icc 
b/config/defconfig_i686-default-linuxapp-icc
index ec3386e..ef11051 100644
--- a/config/defconfig_i686-default-linuxapp-icc
+++ b/config/defconfig_i686-default-linuxapp-icc
@@ -234,7 +234,7 @@ CONFIG_RTE_LIBRTE_MEMPOOL_DEBUG=n
 #
 CONFIG_RTE_LIBRTE_MBUF=y
 CONFIG_RTE_LIBRTE_MBUF_DEBUG=n
-CONFIG_RTE_MBUF_SCATTER_GATHER=y
+CONFIG_RTE_MBUF_REFCNT=y
 CONFIG_RTE_MBUF_REFCNT_ATOMIC=y
 CONFIG_RTE_PKTMBUF_HEADROOM=128

diff --git a/config/defconfig_x86_64-default-bsdapp-gcc 
b/config/defconfig_x86_64-default-bsdapp-gcc
index d960e1d..f5f2140 100644
--- a/config/defconfig_x86_64-default-bsdapp-gcc
+++ b/config/defconfig_x86_64-default-bsdapp-gcc
@@ -210,7 +210,7 @@ CONFIG_RTE_LIBRTE_MEMPOOL_DEBUG=n
 #
 CONFIG_RTE_LIBRTE_MBUF=y
 CONFIG_RTE_LIBRTE_MBUF_DEBUG=n
-CONFIG_RTE_MBUF_SCATTER_GATHER=y
+CONFIG_RTE_MBUF_REFCNT=y
 CONFIG_RTE_MBUF_REFCNT_ATOMIC=y
 CONFIG_RTE_PKTMBUF_HEADROOM=128

diff --git a/config/defconfig_x86_64-default-linuxapp-gcc 
b/config/defconfig_x86_64-default-linuxapp-gcc
index f11ffbf..25a7e1a 100644
--- a/config/defconfig_x86_64-default-linuxapp-gcc
+++ b/config/defconfig_x86_64-def

[dpdk-dev] [PATCH RFC 01/11] igb/ixgbe: fix IP checksum calculation

2014-05-09 Thread Olivier Matz
According to Intel® 82599 10 GbE Controller Datasheet (Table 7-38), both
L2 and L3 lengths are needed to offload the IP checksum.

Note that the e1000 driver does not need to be patched as it already
contains the fix.
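
To illustrate why the wider mask matters, here is a minimal sketch of the
context-reuse comparison. The bit layout (l3_len in bits 0-8, l2_len in
bits 9-15, VLAN tag in bits 16-31), the mask values and the sk_* names
are assumptions made for this example, not copied from the drivers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of the packed offload word (illustrative only):
 * bits 0-8: l3_len, bits 9-15: l2_len, bits 16-31: VLAN TCI. */
#define SK_IP_LEN_MASK    0x000001FFu
#define SK_MAC_LEN_MASK   0x0000FE00u
#define SK_MACIP_LEN_MASK (SK_MAC_LEN_MASK | SK_IP_LEN_MASK)

/* Pack the three fields into one 32-bit word. */
static inline uint32_t
sk_pack(uint16_t vlan, uint8_t l2_len, uint16_t l3_len)
{
	assert(l2_len <= 0x7F && l3_len <= 0x1FF);
	return ((uint32_t)vlan << 16) | ((uint32_t)l2_len << 9) | l3_len;
}

/* Return 1 if the cached hardware context can be reused for this packet,
 * i.e. all fields selected by cmp_mask are identical. */
static inline int
sk_ctx_matches(uint32_t cached, uint32_t pkt, uint32_t cmp_mask)
{
	return (cached & cmp_mask) == (pkt & cmp_mask);
}
```

With only the MAC-length mask, two packets that differ in IP header
length (20 vs 24 bytes) would wrongly share a context; including the IP
length in the mask forces a new context to be programmed.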

Signed-off-by: Olivier Matz 
---
 lib/librte_pmd_e1000/igb_rxtx.c   | 2 +-
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/librte_pmd_e1000/igb_rxtx.c b/lib/librte_pmd_e1000/igb_rxtx.c
index 4608595..b3c8149 100644
--- a/lib/librte_pmd_e1000/igb_rxtx.c
+++ b/lib/librte_pmd_e1000/igb_rxtx.c
@@ -233,7 +233,7 @@ igbe_set_xmit_ctx(struct igb_tx_queue* txq,

if (ol_flags & PKT_TX_IP_CKSUM) {
type_tucmd_mlhl = E1000_ADVTXD_TUCMD_IPV4;
-   cmp_mask |= TX_MAC_LEN_CMP_MASK;
+   cmp_mask |= TX_MACIP_LEN_CMP_MASK;
}

/* Specify which HW CTX to upload. */
diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c 
b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
index 55414b9..4e307c2 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
@@ -367,7 +367,7 @@ ixgbe_set_xmit_ctx(struct igb_tx_queue* txq,

if (ol_flags & PKT_TX_IP_CKSUM) {
type_tucmd_mlhl = IXGBE_ADVTXD_TUCMD_IPV4;
-   cmp_mask |= TX_MAC_LEN_CMP_MASK;
+   cmp_mask |= TX_MACIP_LEN_CMP_MASK;
}

/* Specify which HW CTX to upload. */
-- 
1.9.2



[dpdk-dev] [PATCH RFC 00/11] ixgbe/mbuf: add TSO support

2014-05-09 Thread Olivier Matz
This series adds TSO support to the ixgbe DPDK driver. As discussed
previously on the list [1], one problem is that there is not enough room
in rte_mbuf today to store the information required to implement this
feature:
  - a new ol_flag
  - the MSS
  - the L4 header len

A solution would be to increase the size of the mbuf to 2 cache lines
but it could have a bad impact on performance. This series proposes some
rework to drastically reduce the size of the rte_mbuf structure before
implementing TSO, which avoids growing the mbuf to 128 bytes.

After the rework of the mbuf structures, the size of the rte_mbuf
structure is reduced by 9 bytes. The implementation of TSO requires
doubling the size of ol_flags (16 to 32 bits) and doubling the size of
the offload information in order to add the MSS and the L4 header length
(32 to 64 bits). At the end of the whole series, sizeof(rte_mbuf) is
still 64 bytes and 4 bytes are available for future use.

This rework causes a lot of modifications in the mbuf structure,
implying some changes in the applications that directly use the mbuf
structure fields instead of using the API functions (sometimes there is
no function). That's why this series is an RFC. In my opinion, it's the
right moment for this evolution as the 1.7.0 window is open.

About TSO, the new fields in mbuf try to be generic enough to apply to
other hardware in the future. To delegate the TCP segmentation to the
hardware, the user has to:

  - set the PKT_TX_TCP_SEG flag in mbuf->ol_flags (this flag implies
PKT_TX_IP_CKSUM and PKT_TX_TCP_CKSUM)
  - fill the mbuf->hw_offload information: l2_len, l3_len, l4_len, mss
  - calculate the pseudo header checksum and set it in the TCP header,
as required when doing hardware TCP checksum offload
  - set the IP checksum to 0
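
The steps above can be sketched as follows. This is a self-contained
mock-up: the struct layout, the flag values and the sk_* names are
hypothetical and only mirror the field names given in this cover letter.
The checksum helper works on host-order values for readability; real
code writes the checksum fields in network byte order, and whether the
L4 length is part of the pseudo-header sum for TSO depends on the
hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag values, for illustration only. */
#define SK_TX_IP_CKSUM  (1u << 0)
#define SK_TX_TCP_CKSUM (1u << 1)
#define SK_TX_TCP_SEG   (1u << 2)   /* implies the two flags above */

/* Mock-up of the offload information described in the cover letter. */
struct sk_hw_offload {
	uint16_t l2_len;
	uint16_t l3_len;
	uint16_t l4_len;
	uint16_t mss;
};

struct sk_mbuf {
	uint32_t ol_flags;
	struct sk_hw_offload hw_offload;
};

/* Folded (non-complemented) one's complement sum of an IPv4 pseudo
 * header. All arguments are in host byte order. */
static uint16_t
sk_ipv4_phdr_sum(uint32_t src, uint32_t dst, uint8_t proto, uint16_t l4_len)
{
	uint32_t sum = 0;

	sum += (src >> 16) + (src & 0xFFFF);
	sum += (dst >> 16) + (dst & 0xFFFF);
	sum += proto;
	sum += l4_len;
	while (sum >> 16)
		sum = (sum & 0xFFFF) + (sum >> 16);
	return (uint16_t)sum;
}

/* Apply the four steps: flag, header lengths + MSS, pseudo-header
 * checksum in the TCP header, IP checksum cleared. */
static void
sk_prepare_tso(struct sk_mbuf *m, uint16_t *ip_cksum, uint16_t *tcp_cksum,
	       uint16_t phdr_sum, uint16_t l2_len, uint16_t l3_len,
	       uint16_t l4_len, uint16_t mss)
{
	assert(mss != 0);
	m->ol_flags |= SK_TX_TCP_SEG;
	m->hw_offload.l2_len = l2_len;
	m->hw_offload.l3_len = l3_len;
	m->hw_offload.l4_len = l4_len;
	m->hw_offload.mss = mss;
	*tcp_cksum = phdr_sum;
	*ip_cksum = 0;
}
```

For an untagged Ethernet/IPv4/TCP packet without options, a caller
would typically pass l2_len = 14, l3_len = 20, l4_len = 20 and an MSS
such as 1460.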

Compilation of DPDK and examples is tested for the following
targets: x86_64-*-linuxapp-gcc, i686-*-linuxapp-gcc, x86_64-*-bsdapp-gcc

The mbuf rework series is validated with autotests:

  cd dpdk.org/
  make install T=x86_64-default-linuxapp-gcc
  cd x86_64-default-linuxapp-gcc/
  modprobe uio
  insmod kmod/igb_uio.ko
  python ../tools/igb_uio_bind.py -b igb_uio :02:00.0
  echo 0 > /proc/sys/kernel/randomize_va_space
  echo 1000 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
  echo 1000 > /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages
  mount -t hugetlbfs none /mnt/huge
  make test

TSO is validated with IPv4 and IPv6 with testpmd (see the commit log of
last patch for details).

The performance non-regression has been tested with 6WINDGate fast path.

Note: these patches may conflict with patch [2], which is not pushed yet
but will probably be integrated before this series.

[1] http://dpdk.org/ml/archives/dev/2013-October/thread.html#572
[2] http://dpdk.org/ml/archives/dev/2014-April/002166.html


Olivier Matz (11):
  igb/ixgbe: fix IP checksum calculation
  mbuf: rename RTE_MBUF_SCATTER_GATHER into RTE_MBUF_REFCNT
  mbuf: remove rte_ctrlmbuf
  mbuf: remove the rte_pktmbuf structure
  mbuf: merge physaddr and buf_len in a bitfield
  mbuf: replace data pointer by an offset
  mbuf: add functions to get the name of an ol_flag
  mbuf: change ol_flags to 32 bits
  mbuf: rename vlan_macip_len in hw_offload and increase its size
  testpmd: modify source address to validate checksum calculation
  ixgbe/mbuf: add TSO support

 app/test-pmd/cmdline.c |  60 ++-
 app/test-pmd/config.c  |  18 +-
 app/test-pmd/csumonly.c|  50 ++-
 app/test-pmd/ieee1588fwd.c |   6 +-
 app/test-pmd/macfwd-retry.c|   2 +-
 app/test-pmd/macfwd.c  |   8 +-
 app/test-pmd/rxonly.c  |  47 +-
 app/test-pmd/testpmd.c |  10 +-
 app/test-pmd/testpmd.h |  15 +-
 app/test-pmd/txonly.c  |  47 +-
 app/test/commands.c|   2 -
 app/test/test_mbuf.c   | 100 +
 app/test/test_sched.c  |   4 +-
 config/defconfig_i686-default-linuxapp-gcc |   2 +-
 config/defconfig_i686-default-linuxapp-icc |   2 +-
 config/defconfig_x86_64-default-bsdapp-gcc |   2 +-
 config/defconfig_x86_64-default-linuxapp-gcc   |   2 +-
 config/defconfig_x86_64-default-linuxapp-icc   |   2 +-
 doc/doxy-api.conf  |   2 +-
 examples/dpdk_qat/crypto.c |  22 +-
 examples/dpdk_qat/main.c   |   2 +-
 examples/exception_path/main.c |  11 +-
 examples/ip_reassembly/ipv4_rsmbl.h|  20 +-
 examples/ip_reassembly/main.c  |   6 +-
 examples/ipv4_frag/Makefile|   4 +-
 examples/ipv4_frag/main.c  |   4 +-
 examples/ipv4_

[dpdk-dev] [PATCH] malloc: fix rte_free run time in O(n) free blocks

2014-05-09 Thread Thomas Monjalon
2014-05-09 09:24, Sanford, Robert:
> Hi Thomas,
> 
> >Some patches like this one are not yet reviewed because efforts were
> >focused
> >on release 1.6.0r2. This enhancement must be integrated in 1.7.0.
> >I know that patchwork service is desired and I hope it will be available
> >soon.
> 
> I realized that you guys had been very busy with 1.6.0r2. I just wanted to
> make
> sure that lower-priority patches didn't fall through the cracks.
> 
> >By the way, looking at librte_malloc, it seems implementation of lists
> >could
> >be simpler. Don't you think we could improve (in another patch) this
> >whole
> >code by using BSD macros for lists?
> 
> Yes, I was surprised to find the malloc code not using any kind of list
> functions/macros. I am willing to rework the patch. By BSD list macros, I
> believe you are referring to QUEUE(3) and sys/queue.h. Is that right?

Yes I'm referring to QUEUE(3).
So I wait for your rework.
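
For reference, a free list built on the QUEUE(3) macros could look like
the sketch below. The sk_* names and the element layout are hypothetical
and unrelated to the actual librte_malloc structures; the point is only
that a TAILQ is doubly linked, so removing a known element does not
require walking the list.

```c
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical free-block descriptor. */
struct sk_free_elem {
	size_t size;                     /* usable bytes in this block */
	TAILQ_ENTRY(sk_free_elem) link;  /* prev/next pointers */
};

TAILQ_HEAD(sk_free_list, sk_free_elem);

/* Append a block to the free list. */
static void
sk_free_list_add(struct sk_free_list *l, struct sk_free_elem *e)
{
	TAILQ_INSERT_TAIL(l, e, link);
}

/* Unlink a known block in constant time. */
static void
sk_free_list_del(struct sk_free_list *l, struct sk_free_elem *e)
{
	TAILQ_REMOVE(l, e, link);
}

/* First-fit search: return the first block of at least min_size bytes,
 * or NULL if none is large enough. */
static struct sk_free_elem *
sk_free_list_find(struct sk_free_list *l, size_t min_size)
{
	struct sk_free_elem *e;

	TAILQ_FOREACH(e, l, link) {
		if (e->size >= min_size)
			return e;
	}
	return NULL;
}
```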

Thanks
-- 
Thomas


[dpdk-dev] [PATCH RFC 05/11] mbuf: merge physaddr and buf_len in a bitfield

2014-05-09 Thread Shaw, Jeffrey B
I agree, we should wait for comments then test the performance when the patches 
have settled.


-Original Message-
From: Olivier MATZ [mailto:olivier.m...@6wind.com] 
Sent: Friday, May 09, 2014 9:06 AM
To: Shaw, Jeffrey B; dev at dpdk.org
Subject: Re: [dpdk-dev] [PATCH RFC 05/11] mbuf: merge physaddr and buf_len in a 
bitfield

Hi Jeff,

Thank you for your comment.

On 05/09/2014 05:39 PM, Shaw, Jeffrey B wrote:
> have you tested this patch to see if there is a negative impact to 
> performance?

Yes, but not with testpmd. I passed our internal non-regression performance 
tests and it shows no difference (or below the error margin), even with low 
overhead processing like forwarding whatever the number of cores I use.

> Wouldn't the processor have to mask the high bytes of the physical 
> address when it is used, for example, to populate descriptors with 
> buffer addresses?  When compute bound, this could steal CPU cycles 
> away from packet processing.  I think we should understand the 
> performance trade-off in order to save these 2 bytes.

I would naively say that the cost is negligible: accessing the length is the 
same as before (it's a 16-bit field) and accessing the physical address is 
just a mask or a shift, which should not take very long on an Intel processor (1 
cycle?). This is to be compared with the number of cycles per packet in io-fwd 
mode, which is probably around 150 or 200.

> It would be interesting to see how throughput is impacted when the 
> workload is core-bound.  This could be accomplished by running testpmd 
> in io-fwd mode across 4x 10G ports.

I agree, this is something we could check. If you agree, let's first wait for 
some other comments and see if we find a consensus on the patches.

Regards,
Olivier


[dpdk-dev] [PATCH RFC 05/11] mbuf: merge physaddr and buf_len in a bitfield

2014-05-09 Thread Shaw, Jeffrey B
Hello Olivier, have you tested this patch to see if there is a negative impact 
to performance?
Wouldn't the processor have to mask the high bytes of the physical address when 
it is used, for example, to populate descriptors with buffer addresses?  When 
compute bound, this could steal CPU cycles away from packet processing.  I 
think we should understand the performance trade-off in order to save these 2 
bytes.

It would be interesting to see how throughput is impacted when the workload is 
core-bound.  This could be accomplished by running testpmd in io-fwd mode 
across 4x 10G ports.

Thanks,
Jeff

-Original Message-
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Olivier Matz
Sent: Friday, May 09, 2014 7:51 AM
To: dev at dpdk.org
Subject: [dpdk-dev] [PATCH RFC 05/11] mbuf: merge physaddr and buf_len in a 
bitfield

The physical address is never greater than (1 << 48) = 256 TB.
We can win 2 bytes in the mbuf structure by merging the physical address and 
the buffer length in the same bitfield.
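
A minimal sketch of the idea, assuming a GCC-compatible compiler on a
64-bit target (bit-fields of type uint64_t are implementation-defined in
standard C). The sk_* names are hypothetical; the two bit-field members
reuse the names from the patch. The second half shows the equivalent
manual mask and shift, which is roughly what the compiler has to emit
when the address is read.

```c
#include <assert.h>
#include <stdint.h>

/* 48-bit physical address and 16-bit length sharing one 64-bit word. */
struct sk_buf_desc {
	uint64_t buf_physaddr:48;
	uint64_t buf_len:16;
};

static inline void
sk_desc_set(struct sk_buf_desc *d, uint64_t physaddr, uint16_t len)
{
	assert((physaddr >> 48) == 0);  /* must fit in 48 bits (256 TB) */
	d->buf_physaddr = physaddr;
	d->buf_len = len;
}

static inline uint64_t
sk_desc_physaddr(const struct sk_buf_desc *d)
{
	return d->buf_physaddr;         /* compiler inserts the mask */
}

/* Manual equivalent: address in the low 48 bits, length in the top 16. */
#define SK_PHYSADDR_MASK ((1ULL << 48) - 1)

static inline uint64_t
sk_pack_raw(uint64_t physaddr, uint16_t len)
{
	return (physaddr & SK_PHYSADDR_MASK) | ((uint64_t)len << 48);
}

static inline uint64_t
sk_raw_physaddr(uint64_t raw)
{
	return raw & SK_PHYSADDR_MASK;  /* one AND */
}

static inline uint16_t
sk_raw_len(uint64_t raw)
{
	return (uint16_t)(raw >> 48);   /* one shift */
}
```

With such a compiler the structure occupies 8 bytes instead of the 10
needed by a separate 64-bit address and 16-bit length (before padding).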

Signed-off-by: Olivier Matz 
---
 lib/librte_mbuf/rte_mbuf.c | 3 ++-
 lib/librte_mbuf/rte_mbuf.h | 7 ---
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c index 
c229525..9879095 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -104,7 +104,8 @@ rte_pktmbuf_init(struct rte_mempool *mp,
m->buf_len = (uint16_t)buf_len;

/* keep some headroom between start of buffer and data */
-   m->data = (char*) m->buf_addr + RTE_MIN(RTE_PKTMBUF_HEADROOM, 
m->buf_len);
+   m->data = (char*) m->buf_addr + RTE_MIN(RTE_PKTMBUF_HEADROOM,
+   (uint16_t)m->buf_len);

/* init some constant fields */
m->pool = mp;
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h index 
803b223..275f6b2 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -130,8 +130,8 @@ union rte_vlan_macip {  struct rte_mbuf {
struct rte_mempool *pool; /**< Pool from which mbuf was allocated. */
void *buf_addr;   /**< Virtual address of segment buffer. */
-   phys_addr_t buf_physaddr; /**< Physical address of segment buffer. */
-   uint16_t buf_len; /**< Length of segment buffer. */
+   uint64_t buf_physaddr:48; /**< Physical address of segment buffer. */
+   uint64_t buf_len:16;  /**< Length of segment buffer. */
 #ifdef RTE_MBUF_REFCNT
/**
 * 16-bit Reference counter.
@@ -148,8 +148,9 @@ struct rte_mbuf {
 #else
uint16_t refcnt_reserved; /**< Do not use this field */
 #endif
-   uint16_t reserved; /**< Unused field. Required for padding. 
*/
+
uint16_t ol_flags;/**< Offload features. */
+   uint32_t reserved; /**< Unused field. Required for padding. 
*/

/* valid for any segment */
struct rte_mbuf *next;  /**< Next segment of scattered packet. */
--
1.9.2



[dpdk-dev] [PATCH] mk: add missing scripts directory in install directory

2014-05-09 Thread David Marchand
Trying to install headers for an external library using the DPDK exported
makefile rte.extshared.mk results in the following error:

$ cd dpdk
$ make install DESTDIR=/home/marchand/myapp/staging/plop 
T=x86_64-default-linuxapp-gcc
$ cd ~/myapp
$ make RTE_SDK=/home/marchand/myapp/staging/plop 
RTE_TARGET=x86_64-default-linuxapp-gcc
  CC plop.o
  LD plop.so
  SYMLINK-FILE include/plop.h
/bin/sh:
/home/marchand/myapp/staging/plop/scripts/relpath.sh: No such file or directory
ln: `/home/marchand/myapp/build/include' and `./include' are the same file
make[1]: *** [/home/marchand/myapp/build/include/plop.h] Error 1
make: *** [all] Error 2

This comes from the fact that DPDK only installs its mk/ directory while some
makefiles require the scripts/ directory content as well.

So install missing files from scripts/.

Signed-off-by: David Marchand 
---
 mk/rte.sdkbuild.mk |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mk/rte.sdkbuild.mk b/mk/rte.sdkbuild.mk
index 2975ee4..d4d6c05 100644
--- a/mk/rte.sdkbuild.mk
+++ b/mk/rte.sdkbuild.mk
@@ -63,7 +63,7 @@ build: $(ROOTDIRS-y)
@echo Build complete
 ifneq ($(DESTDIR),)
$(Q)mkdir -p $(DESTDIR)
-   $(Q)tar -C $(RTE_SDK) -cf - mk | tar -C $(DESTDIR) -x \
+   $(Q)tar -C $(RTE_SDK) -cf - mk scripts/*.sh | tar -C $(DESTDIR) -x \
  --keep-newer-files --warning=no-ignore-newer -f -
$(Q)mkdir -p $(DESTDIR)/`basename $(RTE_OUTPUT)`
$(Q)tar -C $(RTE_OUTPUT) -chf - \
-- 
1.7.10.4



[dpdk-dev] [PATCH v2] eal: change default per socket memory allocation

2014-05-09 Thread David Marchand
From: Didier Pallard 

Currently, if there is more memory in hugepages than the amount
requested by the dpdk application, the memory is allocated by taking as
much memory as possible from each socket, starting from the first one.
For example, if a system is configured with 8 GB in 2 sockets (4 GB per
socket), and dpdk is requesting only 4 GB of memory, all memory will be
taken from socket 0 (which has exactly 4 GB of free hugepages) even if
some cores are configured on socket 1 and there are free hugepages on
socket 1.

Change this behaviour to allocate memory on all sockets where some cores
are configured, spreading the memory amongst sockets using the following
ratio per socket:
(number of cores configured on the socket / total number of configured
cores) * requested memory

This algorithm is used when memory amount is specified globally using
-m option. Per socket memory allocation can always be done using
--socket-mem option.
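
The two-pass distribution described above can be sketched in isolation
as follows. This is not the code of the patch: sk_spread_memory() and
SK_MAX_SOCKETS are hypothetical names, the inputs are passed as plain
arrays instead of being read from the EAL configuration, and the sketch
additionally clamps each share to the amount still outstanding.

```c
#include <stdint.h>

#define SK_MAX_SOCKETS 8   /* illustrative limit */

static uint64_t
sk_min(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/*
 * Spread `requested` bytes over the sockets.
 * cores[s]: number of configured cores on socket s
 * avail[s]: free hugepage memory on socket s, in bytes
 * out[s]:   resulting amount to take from socket s
 * Returns the amount that could not be satisfied (0 on success).
 * Note: requested * cores[s] may overflow for very large requests.
 */
static uint64_t
sk_spread_memory(uint64_t requested, const unsigned cores[],
		 const uint64_t avail[], unsigned total_cores,
		 uint64_t out[])
{
	uint64_t remaining = requested;
	unsigned s;

	for (s = 0; s < SK_MAX_SOCKETS; s++)
		out[s] = 0;

	/* Pass 1: share proportional to the number of cores per socket,
	 * limited by what the socket actually has. */
	for (s = 0; total_cores != 0 && s < SK_MAX_SOCKETS &&
			remaining != 0; s++) {
		uint64_t share = requested * cores[s] / total_cores;

		share = sk_min(share, avail[s]);
		share = sk_min(share, remaining);
		out[s] = share;
		remaining -= share;
	}

	/* Pass 2: if something is left, take whatever is still available,
	 * one socket after the other. */
	for (s = 0; s < SK_MAX_SOCKETS && remaining != 0; s++) {
		uint64_t extra = sk_min(avail[s] - out[s], remaining);

		out[s] += extra;
		remaining -= extra;
	}

	return remaining;
}
```

With the example from the commit log (2 sockets with 4 GB each, 4 GB
requested, cores evenly split), each socket contributes 2 GB instead of
socket 0 providing everything.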

Changes included in v2:
- only update linux implementation as bsd looks not to be ready for numa
- if new algorithm fails, then defaults to previous behaviour

Signed-off-by: Didier Pallard 
Signed-off-by: David Marchand 
---
 lib/librte_eal/linuxapp/eal/eal_memory.c |   50 +++---
 1 file changed, 45 insertions(+), 5 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c 
b/lib/librte_eal/linuxapp/eal/eal_memory.c
index 73a6394..471dcfd 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -881,13 +881,53 @@ calc_num_pages_per_socket(uint64_t * memory,
if (num_hp_info == 0)
return -1;

-   for (socket = 0; socket < RTE_MAX_NUMA_NODES && total_mem != 0; 
socket++) {
-   /* if specific memory amounts per socket weren't requested */
-   if (internal_config.force_sockets == 0) {
+   /* if specific memory amounts per socket weren't requested */
+   if (internal_config.force_sockets == 0) {
+   int cpu_per_socket[RTE_MAX_NUMA_NODES];
+   size_t default_size, total_size;
+   unsigned lcore_id;
+
+   /* Compute number of cores per socket */
+   memset(cpu_per_socket, 0, sizeof(cpu_per_socket));
+   RTE_LCORE_FOREACH(lcore_id) {
+   cpu_per_socket[rte_lcore_to_socket_id(lcore_id)]++;
+   }
+
+   /*
+* Automatically spread requested memory amongst detected 
sockets according
+* to number of cores from cpu mask present on each socket
+*/
+   total_size = internal_config.memory;
+   for (socket = 0; socket < RTE_MAX_NUMA_NODES && total_size != 
0; socket++) {
+
+   /* Set memory amount per socket */
+   default_size = (internal_config.memory * 
cpu_per_socket[socket])
+   / rte_lcore_count();
+
+   /* Limit to maximum available memory on socket */
+   default_size = RTE_MIN(default_size, 
get_socket_mem_size(socket));
+
+   /* Update sizes */
+   memory[socket] = default_size;
+   total_size -= default_size;
+   }
+
+   /*
+* If some memory is remaining, try to allocate it by getting 
all 
+* available memory from sockets, one after the other
+*/
+   for (socket = 0; socket < RTE_MAX_NUMA_NODES && total_size != 
0; socket++) {
/* take whatever is available */
-   memory[socket] = RTE_MIN(get_socket_mem_size(socket),
-   total_mem);
+   default_size = RTE_MIN(get_socket_mem_size(socket) - 
memory[socket],
+  total_size);
+
+   /* Update sizes */
+   memory[socket] += default_size;
+   total_size -= default_size;
}
+   }
+
+   for (socket = 0; socket < RTE_MAX_NUMA_NODES && total_mem != 0; 
socket++) {
/* skips if the memory on specific socket wasn't requested */
for (i = 0; i < num_hp_info && memory[socket] != 0; i++){
hp_used[i].hugedir = hp_info[i].hugedir;
-- 
1.7.10.4



[dpdk-dev] [PATCH v2 7/7] pci: remove deprecated RTE_EAL_UNBIND_PORTS option

2014-05-09 Thread David Marchand
RTE_EAL_UNBIND_PORTS was deprecated in DPDK 1.4.0 and removed in 1.6.0, but the
code was not removed.

The bind/unbind operations should not be handled by the eal.
These operations should be either done outside of dpdk or inside the PMDs
themselves as these are their problems.

Signed-off-by: Anatoly Burakov 
Signed-off-by: David Marchand 
---
 lib/librte_eal/linuxapp/eal/eal_pci.c |  171 -
 1 file changed, 171 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c 
b/lib/librte_eal/linuxapp/eal/eal_pci.c
index d529ced..ac2c1fe 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -146,155 +146,6 @@ error:
return -1;
 }

-#ifdef RTE_EAL_UNBIND_PORTS
-#define PROC_MODULES "/proc/modules"
-
-#define IGB_UIO_NAME "igb_uio"
-
-#define UIO_DRV_PATH  "/sys/bus/pci/drivers/%s"
-
-/* maximum time to wait that /dev/uioX appears */
-#define UIO_DEV_WAIT_TIMEOUT 3 /* seconds */
-
-/*
- * Check that a kernel module is loaded. Returns 0 on success, or if the
- * parameter is NULL, or -1 if the module is not loaded.
- */
-static int
-pci_uio_check_module(const char *module_name)
-{
-   FILE *f;
-   unsigned i;
-   char buf[BUFSIZ];
-
-   if (module_name == NULL)
-   return 0;
-
-   f = fopen(PROC_MODULES, "r");
-   if (f == NULL) {
-   RTE_LOG(ERR, EAL, "Cannot open "PROC_MODULES": %s\n", 
-   strerror(errno));
-   return -1;
-   }
-
-   while(fgets(buf, sizeof(buf), f) != NULL) {
-
-   for (i = 0; i < sizeof(buf) && buf[i] != '\0'; i++) {
-   if (isspace(buf[i]))
-   buf[i] = '\0';
-   }
-
-   if (strncmp(buf, module_name, sizeof(buf)) == 0) {
-   fclose(f);
-   return 0;
-   }
-   }
-   fclose(f);
-   return -1;
-}
-
-/* bind a PCI to the kernel module driver */
-static int
-pci_bind_device(struct rte_pci_device *dev, char dr_path[])
-{
-   FILE *f;
-   int n;
-   char buf[BUFSIZ];
-   char dev_bind[PATH_MAX];
-   struct rte_pci_addr *loc = &dev->addr;
-
-   n = rte_snprintf(dev_bind, sizeof(dev_bind), "%s/bind", dr_path);
-   if ((n < 0) || (n >= (int)sizeof(buf))) {
-   RTE_LOG(ERR, EAL, "Cannot rte_snprintf device bind path\n");
-   return -1;
-   }
-
-   f = fopen(dev_bind, "w");
-   if (f == NULL) {
-   RTE_LOG(ERR, EAL, "Cannot open %s\n", dev_bind);
-   return -1;
-   }
-   n = rte_snprintf(buf, sizeof(buf), PCI_PRI_FMT "\n",
-loc->domain, loc->bus, loc->devid, loc->function);
-   if ((n < 0) || (n >= (int)sizeof(buf))) {
-   RTE_LOG(ERR, EAL, "Cannot rte_snprintf PCI infos\n");
-   fclose(f);
-   return -1;
-   }
-   if (fwrite(buf, n, 1, f) == 0) {
-   fclose(f);
-   return -1;
-   }
-
-   fclose(f);
-   return 0;
-}
-
-static int
-pci_uio_bind_device(struct rte_pci_device *dev, const char *module_name)
-{
-   FILE *f;
-   int n;
-   char buf[BUFSIZ];
-   char uio_newid[PATH_MAX];
-   char uio_bind[PATH_MAX];
-
-   n = rte_snprintf(uio_newid, sizeof(uio_newid), UIO_DRV_PATH "/new_id", 
module_name);
-   if ((n < 0) || (n >= (int)sizeof(uio_newid))) {
-   RTE_LOG(ERR, EAL, "Cannot rte_snprintf uio_newid name\n");
-   return -1;
-   }
-
-   n = rte_snprintf(uio_bind, sizeof(uio_bind), UIO_DRV_PATH, module_name);
-   if ((n < 0) || (n >= (int)sizeof(uio_bind))) {
-   RTE_LOG(ERR, EAL, "Cannot rte_snprintf uio_bind name\n");
-   return -1;
-   }
-
-   n = rte_snprintf(buf, sizeof(buf), "%x %x\n",
-   dev->id.vendor_id, dev->id.device_id);
-   if ((n < 0) || (n >= (int)sizeof(buf))) {
-   RTE_LOG(ERR, EAL, "Cannot rte_snprintf vendor_id/device_id\n");
-   return -1;
-   }
-
-   f = fopen(uio_newid, "w");
-   if (f == NULL) {
-   RTE_LOG(ERR, EAL, "Cannot open %s\n", uio_newid);
-   return -1;
-   }
-   if (fwrite(buf, n, 1, f) == 0) {
-   fclose(f);
-   return -1;
-   }
-   fclose(f);
-
-   pci_bind_device(dev, uio_bind);
-   return 0;
-}
-
-static int
-pci_switch_module(struct rte_pci_driver *dr, struct rte_pci_device *dev,
- const char *module_name)
-{
-   if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
-   /* check that our driver is loaded */
-   if (pci_uio_check_module(module_name) != 0)
-   rte_exit(EXIT_FAILURE, "The %s module is required by 
the "
-   "%s driver\n", module_name, dr->name);
-
-   /* unbind current driver, bind ours */
-   

[dpdk-dev] [PATCH v2 6/7] pci: move RTE_PCI_DRV_FORCE_UNBIND handling out of #ifdef

2014-05-09 Thread David Marchand
Move RTE_PCI_DRV_FORCE_UNBIND flag handling out of RTE_EAL_UNBIND_PORTS section.
This had nothing to do with RTE_EAL_UNBIND_PORTS anyway.

Signed-off-by: David Marchand 
---
 lib/librte_eal/linuxapp/eal/eal_pci.c |   89 -
 1 file changed, 44 insertions(+), 45 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c 
b/lib/librte_eal/linuxapp/eal/eal_pci.c
index dadb198..d529ced 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -107,6 +107,45 @@ TAILQ_HEAD(uio_res_list, uio_resource);
 static struct uio_res_list *uio_res_list = NULL;
 static int pci_parse_sysfs_value(const char *filename, uint64_t *val);

+/* unbind kernel driver for this device */
+static int
+pci_unbind_kernel_driver(struct rte_pci_device *dev)
+{
+   int n;
+   FILE *f;
+   char filename[PATH_MAX];
+   char buf[BUFSIZ];
+   struct rte_pci_addr *loc = &dev->addr;
+
+   /* open /sys/bus/pci/devices/:BB:CC.D/driver */
+   rte_snprintf(filename, sizeof(filename),
+SYSFS_PCI_DEVICES "/" PCI_PRI_FMT "/driver/unbind",
+loc->domain, loc->bus, loc->devid, loc->function);
+
+   f = fopen(filename, "w");
+   if (f == NULL) /* device was not bound */
+   return 0;
+
+   n = rte_snprintf(buf, sizeof(buf), PCI_PRI_FMT "\n",
+loc->domain, loc->bus, loc->devid, loc->function);
+   if ((n < 0) || (n >= (int)sizeof(buf))) {
+   RTE_LOG(ERR, EAL, "%s(): rte_snprintf failed\n", __func__);
+   goto error;
+   }
+   if (fwrite(buf, n, 1, f) == 0) {
+   RTE_LOG(ERR, EAL, "%s(): could not write to %s\n", __func__,
+   filename);
+   goto error;
+   }
+
+   fclose(f);
+   return 0;
+
+error:
+   fclose(f);
+   return -1;
+}
+
 #ifdef RTE_EAL_UNBIND_PORTS
 #define PROC_MODULES "/proc/modules"

@@ -234,46 +273,6 @@ pci_uio_bind_device(struct rte_pci_device *dev, const char *module_name)
return 0;
 }

-/* unbind kernel driver for this device */
-static int
-pci_unbind_kernel_driver(struct rte_pci_device *dev)
-{
-   int n;
-   FILE *f;
-   char filename[PATH_MAX];
-   char buf[BUFSIZ];
-   struct rte_pci_addr *loc = &dev->addr;
-
-   /* open /sys/bus/pci/devices/:BB:CC.D/driver */
-   rte_snprintf(filename, sizeof(filename),
-SYSFS_PCI_DEVICES "/" PCI_PRI_FMT "/driver/unbind",
-loc->domain, loc->bus, loc->devid, loc->function);
-
-   f = fopen(filename, "w");
-   if (f == NULL) /* device was not bound */
-   return 0;
-
-   n = rte_snprintf(buf, sizeof(buf), PCI_PRI_FMT "\n",
-loc->domain, loc->bus, loc->devid, loc->function);
-   if ((n < 0) || (n >= (int)sizeof(buf))) {
-   RTE_LOG(ERR, EAL, "%s(): rte_snprintf failed\n", __func__);
-   goto error;
-   }
-   if (fwrite(buf, n, 1, f) == 0) {
-   RTE_LOG(ERR, EAL, "%s(): could not write to %s\n", __func__,
-   filename);
-   goto error;
-   }
-
-   fclose(f);
-   return 0;
-
-error:
-   fclose(f);
-   return -1;
-}
-
-
 static int
 pci_switch_module(struct rte_pci_driver *dr, struct rte_pci_device *dev,
  const char *module_name)
@@ -1008,11 +1007,6 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *d
/* unbind current driver and bind on igb_uio */
if (pci_switch_module(dr, dev, IGB_UIO_NAME) < 0)
return -1;
-   } else if (dr->drv_flags & RTE_PCI_DRV_FORCE_UNBIND &&
-  rte_eal_process_type() == RTE_PROC_PRIMARY) {
-   /* unbind current driver */
-   if (pci_unbind_kernel_driver(dev) < 0)
-   return -1;
}
 #endif

@@ -1020,6 +1014,11 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *d
/* map resources for devices that use igb_uio */
if (pci_uio_map_resource(dev) < 0)
return -1;
+   } else if (dr->drv_flags & RTE_PCI_DRV_FORCE_UNBIND &&
+  rte_eal_process_type() == RTE_PROC_PRIMARY) {
+   /* unbind current driver */
+   if (pci_unbind_kernel_driver(dev) < 0)
+   return -1;
}

/* reference driver structure */
-- 
1.7.10.4



[dpdk-dev] [PATCH v2 5/7] pci: pci_switch_module cleanup

2014-05-09 Thread David Marchand
The pci_switch_module() function should only do what its name says: unbind PCI
devices and rebind them to the specified kernel driver.
Hence, it should not call pci_uio_map_resource().

Move the call to pci_uio_map_resource() into rte_eal_pci_probe_one_driver()
so that the code can be factorized.

Signed-off-by: David Marchand 
---
 lib/librte_eal/linuxapp/eal/eal_pci.c |   24 +---
 1 file changed, 9 insertions(+), 15 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index 451fbd2..dadb198 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -107,9 +107,6 @@ TAILQ_HEAD(uio_res_list, uio_resource);
 static struct uio_res_list *uio_res_list = NULL;
 static int pci_parse_sysfs_value(const char *filename, uint64_t *val);

-/* forward prototype of function called in pci_switch_module below */
-static int pci_uio_map_resource(struct rte_pci_device *dev);
-
 #ifdef RTE_EAL_UNBIND_PORTS
 #define PROC_MODULES "/proc/modules"

@@ -279,12 +276,11 @@ error:

 static int
 pci_switch_module(struct rte_pci_driver *dr, struct rte_pci_device *dev,
-   int uio_status, const char *module_name)
+ const char *module_name)
 {
if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
/* check that our driver is loaded */
-   if (uio_status != 0 &&
-   (uio_status = pci_uio_check_module(module_name)) != 0)
+   if (pci_uio_check_module(module_name) != 0)
rte_exit(EXIT_FAILURE, "The %s module is required by the "
"%s driver\n", module_name, dr->name);

@@ -294,9 +290,6 @@ pci_switch_module(struct rte_pci_driver *dr, struct rte_pci_device *dev,
if (pci_uio_bind_device(dev, module_name) < 0)
return -1;
}
-   /* map the NIC resources */
-   if (pci_uio_map_resource(dev) < 0)
-   return -1;

return 0;
 }
@@ -1012,8 +1005,8 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *d

 #ifdef RTE_EAL_UNBIND_PORTS
if (dr->drv_flags & RTE_PCI_DRV_NEED_IGB_UIO) {
-   /* unbind driver and load uio resources for Intel NICs */
-   if (pci_switch_module(dr, dev, 1, IGB_UIO_NAME) < 0)
+   /* unbind current driver and bind on igb_uio */
+   if (pci_switch_module(dr, dev, IGB_UIO_NAME) < 0)
return -1;
} else if (dr->drv_flags & RTE_PCI_DRV_FORCE_UNBIND &&
   rte_eal_process_type() == RTE_PROC_PRIMARY) {
@@ -1021,12 +1014,13 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *d
if (pci_unbind_kernel_driver(dev) < 0)
return -1;
}
-#else
-   if (dr->drv_flags & RTE_PCI_DRV_NEED_IGB_UIO)
-   /* just map resources for Intel NICs */
+#endif
+
+   if (dr->drv_flags & RTE_PCI_DRV_NEED_IGB_UIO) {
+   /* map resources for devices that use igb_uio */
if (pci_uio_map_resource(dev) < 0)
return -1;
-#endif
+   }

/* reference driver structure */
dev->driver = dr;
-- 
1.7.10.4



[dpdk-dev] [PATCH v2 4/7] pci: rework interrupt fd init and fix fd leak

2014-05-09 Thread David Marchand
A fd leak happens in pci_map_resource() when multiple BARs are mapped.
Fix this by closing the fd unconditionally in this function and opening the
intr_handle fd in pci_uio_map_resource() instead.
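
For readers unfamiliar with why the descriptor can be closed right after
mmap(): a mapping keeps its own reference to the underlying file, so closing
the fd does not unmap anything. Below is a minimal standalone sketch of that
idea, not the EAL code itself. The helper names (map_and_close,
map_and_close_selftest) and the temporary file path are made up for
illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for mkstemp(), ftruncate() */
#endif

#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map `size` bytes of `path` and close the descriptor right away.
 * Hypothetical helper for illustration; not the EAL function.
 * Returns the mapping, or NULL on failure.
 */
static void *
map_and_close(const char *path, size_t size)
{
	void *addr;
	int fd = open(path, O_RDWR);

	if (fd < 0)
		return NULL;

	addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	/* the mapping holds its own reference to the file, so the
	 * descriptor is closed whether or not mmap() succeeded */
	close(fd);

	return (addr == MAP_FAILED) ? NULL : addr;
}

/*
 * Self-check: create a one-page temporary file, map it twice (as one
 * would for two BARs) and verify both mappings stay usable although no
 * descriptor is left open. Returns 0 on success, -1 on failure.
 */
static int
map_and_close_selftest(void)
{
	char path[] = "/tmp/map_sketch_XXXXXX";
	size_t size = (size_t)sysconf(_SC_PAGESIZE);
	char *a, *b;
	int fd = mkstemp(path);
	int ret = -1;

	if (fd < 0)
		return -1;
	if (ftruncate(fd, (off_t)size) < 0) {
		close(fd);
		unlink(path);
		return -1;
	}
	close(fd);

	a = map_and_close(path, size);
	b = map_and_close(path, size);
	if (a != NULL && b != NULL) {
		strcpy(a, "bar");
		/* MAP_SHARED: the write through `a` is visible through `b` */
		ret = (strcmp(b, "bar") == 0) ? 0 : -1;
	}
	if (a != NULL)
		munmap(a, size);
	if (b != NULL)
		munmap(b, size);
	unlink(path);
	return ret;
}
```

With this shape, calling the helper once per BAR cannot accumulate open
descriptors, which is the property the patch is after; the one fd that must
stay open (for interrupts) is then opened separately by the caller.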

Signed-off-by: David Marchand 
---
 lib/librte_eal/bsdapp/eal/eal_pci.c   |   60 +---
 lib/librte_eal/linuxapp/eal/eal_pci.c |   71 -
 2 files changed, 62 insertions(+), 69 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c b/lib/librte_eal/bsdapp/eal/eal_pci.c
index a8945e4..94ae461 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -119,8 +119,8 @@ pci_unbind_kernel_driver(struct rte_pci_device *dev)

 /* map a particular resource from a file */
 static void *
-pci_map_resource(struct rte_pci_device *dev, void *requested_addr, 
-   const char *devname, off_t offset, size_t size)
+pci_map_resource(void *requested_addr, const char *devname, off_t offset,
+size_t size)
 {
int fd;
void *mapaddr;
@@ -130,7 +130,7 @@ pci_map_resource(struct rte_pci_device *dev, void *requested_addr,
 */
fd = open(devname, O_RDWR);
if (fd < 0) {
-   RTE_LOG(ERR, EAL, "Cannot open %s: %s\n", 
+   RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
devname, strerror(errno));
goto fail;
}
@@ -138,35 +138,21 @@ pci_map_resource(struct rte_pci_device *dev, void *requested_addr,
/* Map the PCI memory resource of device */
mapaddr = mmap(requested_addr, size, PROT_READ | PROT_WRITE,
MAP_SHARED, fd, offset);
+   close(fd);
if (mapaddr == MAP_FAILED ||
(requested_addr != NULL && mapaddr != requested_addr)) {
RTE_LOG(ERR, EAL, "%s(): cannot mmap(%s(%d), %p, 0x%lx, 0x%lx):"
-   " %s (%p)\n", __func__, devname, fd, requested_addr, 
+   " %s (%p)\n", __func__, devname, fd, requested_addr,
(unsigned long)size, (unsigned long)offset,
strerror(errno), mapaddr);
-   close(fd);
goto fail;
}

-   if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
-   /* save fd if in primary process */
-   dev->intr_handle.fd = fd;
-   dev->intr_handle.type = RTE_INTR_HANDLE_UIO;
-   } else {
-   /* fd is not needed in slave process, close it */
-   dev->intr_handle.fd = -1;
-   dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
-   close(fd);
-   }
-
RTE_LOG(DEBUG, EAL, "  PCI memory mapped at %p\n", mapaddr);

return mapaddr;

 fail:
-   dev->intr_handle.fd = -1;
-   dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
-
return NULL;
 }

@@ -179,19 +165,19 @@ pci_uio_map_secondary(struct rte_pci_device *dev)
 {
 size_t i;
 struct uio_resource *uio_res;
- 
+
TAILQ_FOREACH(uio_res, uio_res_list, next) {
- 
+
/* skip this element if it doesn't match our PCI address */
if (memcmp(&uio_res->pci_addr, &dev->addr, sizeof(dev->addr)))
continue;
-
+
for (i = 0; i != uio_res->nb_maps; i++) {
-   if (pci_map_resource(dev, uio_res->maps[i].addr,
-   uio_res->path,
-   (off_t)uio_res->maps[i].offset,
-   (size_t)uio_res->maps[i].size) != 
-   uio_res->maps[i].addr) {
+   if (pci_map_resource(uio_res->maps[i].addr,
+uio_res->path,
+(off_t)uio_res->maps[i].offset,
+(size_t)uio_res->maps[i].size)
+   != uio_res->maps[i].addr) {
RTE_LOG(ERR, EAL,
"Cannot mmap device resource\n");
return (-1);
@@ -219,6 +205,7 @@ pci_uio_map_resource(struct rte_pci_device *dev)
struct uio_map *maps;

dev->intr_handle.fd = -1;
+   dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;

/* secondary processes - use already recorded details */
if (rte_eal_process_type() != RTE_PROC_PRIMARY)
@@ -233,6 +220,15 @@ pci_uio_map_resource(struct rte_pci_device *dev)
return -1;
}

+   /* save fd if in primary process */
+   dev->intr_handle.fd = open(devname, O_RDWR);
+   if (dev->intr_handle.fd < 0) {
+   RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
+   devname, strerror(errno));
+   return -1;
+   }
+   dev->intr_handle.type = RTE_INTR_HANDLE_UIO;
+
/* allocate the mapping details for

[dpdk-dev] [PATCH v2 3/7] pci: remove virtio-uio workaround

2014-05-09 Thread David Marchand
virtio-uio does not need the EAL to map BARs from the uio device, so remove the
flag RTE_PCI_DRV_NEED_IGB_UIO.
Then, move the virtio-uio workaround out of the generic eal_pci.c for the Linux
implementation.
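
The workaround being moved reads the port I/O start and size from sysfs
files that each contain a single number. As a rough illustration of that
kind of parsing (a sketch only, not the function added by this patch), the
helper below reads one unsigned long from a file with strtoul(). The names
read_ulong_from_file and read_ulong_selftest, and the temporary file used by
the self-check, are made up for the example.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for mkstemp(), fdopen() */
#endif

#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Read a single unsigned long (decimal, octal or 0x-prefixed hex) from
 * the first line of `filename`. Hypothetical helper for illustration.
 * Returns 0 on success, -1 on failure.
 */
static int
read_ulong_from_file(const char *filename, unsigned long *val)
{
	FILE *f;
	char buf[64];
	char *end = NULL;

	f = fopen(filename, "r");
	if (f == NULL)
		return -1;
	if (fgets(buf, sizeof(buf), f) == NULL) {
		fclose(f);
		return -1;
	}
	fclose(f);

	errno = 0;
	*val = strtoul(buf, &end, 0);
	/* reject overflow, empty input and trailing garbage */
	if (errno != 0 || end == buf || (*end != '\n' && *end != '\0'))
		return -1;
	return 0;
}

/*
 * Self-check: write "0xc040\n" to a temporary file and parse it back.
 * Returns 0 on success, -1 on failure.
 */
static int
read_ulong_selftest(void)
{
	char path[] = "/tmp/sysfs_sketch_XXXXXX";
	unsigned long val = 0;
	FILE *f;
	int fd = mkstemp(path);
	int ret;

	if (fd < 0)
		return -1;
	f = fdopen(fd, "w");
	if (f == NULL) {
		close(fd);
		unlink(path);
		return -1;
	}
	fputs("0xc040\n", f);
	fclose(f);

	ret = read_ulong_from_file(path, &val);
	unlink(path);
	return (ret == 0 && val == 0xc040UL) ? 0 : -1;
}
```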

Signed-off-by: David Marchand 
---
 lib/librte_eal/bsdapp/eal/eal_pci.c   |9 +--
 lib/librte_eal/linuxapp/eal/eal_pci.c |   30 +---
 lib/librte_pmd_virtio/virtio_ethdev.c |  133 -
 3 files changed, 134 insertions(+), 38 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c b/lib/librte_eal/bsdapp/eal/eal_pci.c
index 5d8bcbd..a8945e4 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -221,8 +221,7 @@ pci_uio_map_resource(struct rte_pci_device *dev)
dev->intr_handle.fd = -1;

/* secondary processes - use already recorded details */
-   if ((rte_eal_process_type() != RTE_PROC_PRIMARY) &&
-   (dev->id.vendor_id != PCI_VENDOR_ID_QUMRANET))
+   if (rte_eal_process_type() != RTE_PROC_PRIMARY)
return (pci_uio_map_secondary(dev));

rte_snprintf(devname, sizeof(devname), "/dev/uio@pci:%u:%u:%u",
@@ -234,12 +233,6 @@ pci_uio_map_resource(struct rte_pci_device *dev)
return -1;
}

-   if(dev->id.vendor_id == PCI_VENDOR_ID_QUMRANET) {
-   /* I/O port address already assigned */
-   /* rte_virtio_pmd does not need any other bar even if available */
-   return (0);
-   }
-   
/* allocate the mapping details for secondary processes*/
if ((uio_res = rte_zmalloc("UIO_RES", sizeof (*uio_res), 0)) == NULL) {
RTE_LOG(ERR, EAL,
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index 99e07d2..c006cf5 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -584,11 +584,9 @@ pci_uio_map_resource(struct rte_pci_device *dev)
 {
int i, j;
char dirname[PATH_MAX];
-   char filename[PATH_MAX];
char devname[PATH_MAX]; /* contains the /dev/uioX */
void *mapaddr;
int uio_num;
-   unsigned long start,size;
uint64_t phaddr;
uint64_t offset;
uint64_t pagesz;
@@ -600,8 +598,7 @@ pci_uio_map_resource(struct rte_pci_device *dev)
dev->intr_handle.fd = -1;

/* secondary processes - use already recorded details */
-   if ((rte_eal_process_type() != RTE_PROC_PRIMARY) &&
-   (dev->id.vendor_id != PCI_VENDOR_ID_QUMRANET))
+   if (rte_eal_process_type() != RTE_PROC_PRIMARY)
return (pci_uio_map_secondary(dev));

/* find uio resource */
@@ -612,31 +609,6 @@ pci_uio_map_resource(struct rte_pci_device *dev)
return -1;
}

-   if(dev->id.vendor_id == PCI_VENDOR_ID_QUMRANET) {
-   /* get portio size */
-   rte_snprintf(filename, sizeof(filename),
-"%s/portio/port0/size", dirname);
-   if (eal_parse_sysfs_value(filename, &size) < 0) {
-   RTE_LOG(ERR, EAL, "%s(): cannot parse size\n",
-   __func__);
-   return -1;
-   }
-
-   /* get portio start */
-   rte_snprintf(filename, sizeof(filename),
-"%s/portio/port0/start", dirname);
-   if (eal_parse_sysfs_value(filename, &start) < 0) {
-   RTE_LOG(ERR, EAL, "%s(): cannot parse portio start\n",
-   __func__);
-   return -1;
-   }
-   dev->mem_resource[0].addr = (void *)(uintptr_t)start;
-   dev->mem_resource[0].len =  (uint64_t)size;
-   RTE_LOG(DEBUG, EAL, "PCI Port IO found start=0x%lx with size=0x%lx\n", start, size);
-   /* rte_virtio_pmd does not need any other bar even if available */
-   return (0);
-   }
-   
/* allocate the mapping details for secondary processes*/
if ((uio_res = rte_zmalloc("UIO_RES", sizeof (*uio_res), 0)) == NULL) {
RTE_LOG(ERR, EAL,
diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c b/lib/librte_pmd_virtio/virtio_ethdev.c
index f107161..c6a1df5 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -36,6 +36,9 @@
 #include 
 #include 
 #include 
+#ifdef RTE_EXEC_ENV_LINUXAPP
+#include 
+#endif

 #include 
 #include 
@@ -392,6 +395,103 @@ virtio_negotiate_features(struct virtio_hw *hw)
hw->guest_features = vtpci_negotiate_features(hw, guest_features);
 }

+#ifdef RTE_EXEC_ENV_LINUXAPP
+static int
+parse_sysfs_value(const char *filename, unsigned long *val)
+{
+   FILE *f;
+   char buf[BUFSIZ];
+   char *end = NULL;
+
+   if ((f = fopen(filename, "r")) == NULL) {
+   PMD_INIT_LOG(ERR, "%s(): cannot open sysfs value %s\n",
+__func__, fil

[dpdk-dev] [PATCH v2 2/7] pci: align bsd implementation on linux

2014-05-09 Thread David Marchand
The bsd implementation lacks a check on driver flags; fix this.
Besides, the check on BAR0 is not needed and could cause trouble for devices
that have no BAR0.
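
The flag check being added on the bsd side can be summarized as a small
decision function. The sketch below only mirrors the shape of the
if / else if in the diff; the flag values, the enum and the function name
are placeholders invented for the example, not the real RTE_PCI_DRV_*
definitions.

```c
#include <assert.h>

/* placeholder flag values, for illustration only */
#define SKETCH_DRV_NEED_UIO      0x1u
#define SKETCH_DRV_FORCE_UNBIND  0x2u

enum sketch_action {
	SKETCH_NONE,    /* nothing to do before referencing the driver */
	SKETCH_MAP,     /* map the device resources through uio */
	SKETCH_UNBIND   /* unbind the current kernel driver */
};

/*
 * Decide what the probe step does for a driver, given its flags and
 * whether we run in the primary process.
 */
static enum sketch_action
probe_action(unsigned int drv_flags, int is_primary)
{
	if (drv_flags & SKETCH_DRV_NEED_UIO)
		return SKETCH_MAP;
	if ((drv_flags & SKETCH_DRV_FORCE_UNBIND) && is_primary)
		return SKETCH_UNBIND;
	return SKETCH_NONE;
}
```

Note that a driver without either flag is left alone, which is why the
unconditional mapping and the BAR0 check can go away.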

Signed-off-by: David Marchand 
---
 lib/librte_eal/bsdapp/eal/eal_pci.c |   42 +++
 1 file changed, 23 insertions(+), 19 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c b/lib/librte_eal/bsdapp/eal/eal_pci.c
index 987b446..5d8bcbd 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -108,8 +108,14 @@ TAILQ_HEAD(uio_res_list, uio_resource);

 static struct uio_res_list *uio_res_list = NULL;

-/* forward prototype of function called in pci_switch_module below */
-static int pci_uio_map_resource(struct rte_pci_device *dev);
+/* unbind kernel driver for this device */
+static int
+pci_unbind_kernel_driver(struct rte_pci_device *dev)
+{
+   RTE_LOG(ERR, EAL, "RTE_PCI_DRV_FORCE_UNBIND flag is not implemented "
+   "for BSD\n");
+   return -ENOTSUP;
+}

 /* map a particular resource from a file */
 static void *
@@ -214,6 +220,11 @@ pci_uio_map_resource(struct rte_pci_device *dev)

dev->intr_handle.fd = -1;

+   /* secondary processes - use already recorded details */
+   if ((rte_eal_process_type() != RTE_PROC_PRIMARY) &&
+   (dev->id.vendor_id != PCI_VENDOR_ID_QUMRANET))
+   return (pci_uio_map_secondary(dev));
+
rte_snprintf(devname, sizeof(devname), "/dev/uio@pci:%u:%u:%u",
dev->addr.bus, dev->addr.devid, dev->addr.function);

@@ -223,11 +234,6 @@ pci_uio_map_resource(struct rte_pci_device *dev)
return -1;
}

-   /* secondary processes - use already recorded details */
-   if ((rte_eal_process_type() != RTE_PROC_PRIMARY) &&
-   (dev->id.vendor_id != PCI_VENDOR_ID_QUMRANET))
-   return (pci_uio_map_secondary(dev));
-
if(dev->id.vendor_id == PCI_VENDOR_ID_QUMRANET) {
/* I/O port address already assigned */
/* rte_virtio_pmd does not need any other bar even if available */
@@ -479,19 +485,17 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *d
return 0;
}

-   /* just map the NIC resources */
-   if (pci_uio_map_resource(dev) < 0)
-   return -1;
-
-   /* We always should have BAR0 mapped */
-   if (rte_eal_process_type() == RTE_PROC_PRIMARY && 
-   dev->mem_resource[0].addr == NULL) {
-   RTE_LOG(ERR, EAL,
-   "%s(): BAR0 is not mapped\n",
-   __func__);
-   return (-1);
+   if (dr->drv_flags & RTE_PCI_DRV_NEED_IGB_UIO) {
+   /* map resources for devices that use igb_uio */
+   if (pci_uio_map_resource(dev) < 0)
+   return -1;
+   } else if (dr->drv_flags & RTE_PCI_DRV_FORCE_UNBIND &&
+  rte_eal_process_type() == RTE_PROC_PRIMARY) {
+   /* unbind current driver */
+   if (pci_unbind_kernel_driver(dev) < 0)
+   return -1;
}
- 
+
/* reference driver structure */
dev->driver = dr;

-- 
1.7.10.4



[dpdk-dev] [PATCH v2 1/7] pci: fix potential mem leaks

2014-05-09 Thread David Marchand
Looking at bsd implementation, we can see that there are some potential mem
leaks in linux implementation. Fix them.
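
The pattern behind these fixes is the usual one: once the bookkeeping
structure has been allocated, every early return has to release it. A small
standalone sketch of that pattern follows; it uses plain calloc()/free()
rather than rte_zmalloc()/rte_free(), and the struct, the failure steps and
the allocation counter are invented for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* stand-in for the per-device mapping details */
struct res_sketch {
	int nb_maps;
};

/* number of outstanding allocations, used only by the self-check */
static int live_allocs;

/*
 * Allocate a resource and run two setup steps. `fail_step` selects the
 * step that fails (0 means none). On any failure the allocation is
 * released before returning, so nothing leaks.
 */
static int
setup_resource(int fail_step, struct res_sketch **out)
{
	struct res_sketch *res = calloc(1, sizeof(*res));

	if (res == NULL)
		return -1;
	live_allocs++;

	if (fail_step == 1) /* e.g. collecting the mappings failed */
		goto error;
	res->nb_maps = 1;

	if (fail_step == 2) /* e.g. mapping one BAR failed */
		goto error;

	*out = res;
	return 0;

error:
	free(res);
	live_allocs--;
	return -1;
}

/* Self-check: no allocation may survive a failed setup. */
static int
setup_selftest(void)
{
	struct res_sketch *res = NULL;

	if (setup_resource(1, &res) != -1 || live_allocs != 0)
		return -1;
	if (setup_resource(2, &res) != -1 || live_allocs != 0)
		return -1;
	if (setup_resource(0, &res) != 0 || live_allocs != 1)
		return -1;

	free(res);
	live_allocs--;
	return (live_allocs == 0) ? 0 : -1;
}
```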

Signed-off-by: David Marchand 
---
 lib/librte_eal/linuxapp/eal/eal_pci.c |   11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index 9538efe..99e07d2 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -649,11 +649,13 @@ pci_uio_map_resource(struct rte_pci_device *dev)
memcpy(&uio_res->pci_addr, &dev->addr, sizeof(uio_res->pci_addr));

/* collect info about device mappings */
-   if ((nb_maps = pci_uio_get_mappings(dirname, uio_res->maps,
-   sizeof (uio_res->maps) / sizeof (uio_res->maps[0])))
-   < 0)
+   nb_maps = pci_uio_get_mappings(dirname, uio_res->maps,
+  RTE_DIM(uio_res->maps));
+   if (nb_maps < 0) {
+   rte_free(uio_res);
return (nb_maps);
- 
+   }
+
uio_res->nb_maps = nb_maps;

/* Map all BARs */
@@ -678,6 +680,7 @@ pci_uio_map_resource(struct rte_pci_device *dev)
(mapaddr = pci_map_resource(dev,
NULL, devname, (off_t)offset,
(size_t)maps[j].size)) == NULL) {
+   rte_free(uio_res);
return (-1);
}

-- 
1.7.10.4



[dpdk-dev] [PATCH v2 0/7] pci cleanup

2014-05-09 Thread David Marchand
Hello all, 

Here is an attempt at having an equal implementation in bsd and linux eal_pci.c.
It results in the following changes:
- checks on driver flag in bsd which were missing
- remove virtio-uio workaround in linux eal_pci.c
- remove deprecated RTE_EAL_UNBIND_PORTS option

Along the way, I discovered two small bugs: a mem leak in linux eal_pci.c and a
fd leak in both bsd and linux eal_pci.c.

Changes included in v2:
- fix another mem leak noticed by Anatoly Burakov

-- 
David Marchand

David Marchand (7):
  pci: fix potential mem leaks
  pci: align bsd implementation on linux
  pci: remove virtio-uio workaround
  pci: rework interrupt fd init and fix fd leak
  pci: pci_switch_module cleanup
  pci: move RTE_PCI_DRV_FORCE_UNBIND handling out of #ifdef
  pci: remove deprecated RTE_EAL_UNBIND_PORTS option

 lib/librte_eal/bsdapp/eal/eal_pci.c   |  105 ++--
 lib/librte_eal/linuxapp/eal/eal_pci.c |  282 +
 lib/librte_pmd_virtio/virtio_ethdev.c |  133 +++-
 3 files changed, 218 insertions(+), 302 deletions(-)

-- 
1.7.10.4



[dpdk-dev] [PATCH v2 6/6] mk: add "make examples" target in root makefile

2014-05-09 Thread Olivier Matz
It is now possible to build all projects from the examples/ directory
using one command from root directory.

Some illustrations of what is possible:

- build examples in the DPDK tree for one target

  # install the x86_64-default-linuxapp-gcc in
  # ${RTE_SDK}/x86_64-default-linuxapp-gcc directory
  user@droids:~/dpdk.org$ make install T=x86_64-default-linuxapp-gcc
  # build examples for this new installation in
  # ${RTE_SDK}/examples directory
  user@droids:~/dpdk.org$ make examples T=x86_64-default-linuxapp-gcc

- build examples outside DPDK tree for several targets

  # install all targets matching x86_64-*-linuxapp-gcc in
  # ${RTE_SDK}/x86_64-*-linuxapp-gcc directories
  user@droids:~/dpdk.org$ make install T=x86_64-*-linuxapp-gcc
  # build examples for these installations in /tmp/foobar
  user@droids:~/dpdk.org$ make examples T=x86_64-*-linuxapp-gcc O=/tmp/foobar

Signed-off-by: Olivier Matz 
---
 doc/build-sdk-quick.txt | 14 +
 mk/rte.sdkexamples.mk   | 79 +
 mk/rte.sdkroot.mk   |  4 +++
 3 files changed, 91 insertions(+), 6 deletions(-)
 create mode 100644 mk/rte.sdkexamples.mk

diff --git a/doc/build-sdk-quick.txt b/doc/build-sdk-quick.txt
index 8989a32..d768c44 100644
--- a/doc/build-sdk-quick.txt
+++ b/doc/build-sdk-quick.txt
@@ -1,12 +1,14 @@
 Basic build
make config T=x86_64-default-linuxapp-gcc && make
 Build commands
-   config  get configuration from target template (T=)
-   all same as build (default rule)
-   build   build in a configured directory
-   clean   remove files but keep configuration
-   install build many targets (wildcard allowed) and install in DESTDIR
-   uninstall   remove all installed targets
+   config           get configuration from target template (T=)
+   all              same as build (default rule)
+   build            build in a configured directory
+   clean            remove files but keep configuration
+   install          build many targets (wildcard allowed) and install in DESTDIR
+   uninstall        remove all installed targets
+   examples         build examples for given targets (T=)
+   examples_clean   clean examples for given targets (T=)
 Build variables
EXTRA_CPPFLAGS   preprocessor options
EXTRA_CFLAGS compiler options
diff --git a/mk/rte.sdkexamples.mk b/mk/rte.sdkexamples.mk
new file mode 100644
index 000..a76570e
--- /dev/null
+++ b/mk/rte.sdkexamples.mk
@@ -0,0 +1,79 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2014 6WIND S.A.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of 6WIND S.A. nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# example applications are seen as external applications which are
+# not part of the SDK.
+BUILDING_RTE_SDK :=
+export BUILDING_RTE_SDK
+
+# Build directory is given with O=
+ifndef O
+O = $(RTE_SDK)/examples
+endif
+
+# Target for which examples should be built.
+ifndef T
+T = *
+endif
+
+# list all available configurations
+EXAMPLES_CONFIGS := $(patsubst $(RTE_SRCDIR)/config/defconfig_%,%,\
+   $(wildcard $(RTE_SRCDIR)/config/defconfig_$(T)))
+EXAMPLES_TARGETS := $(addsuffix _examples,\
+   $(filter-out %~,$(EXAMPLES_CONFIGS)))
+
+.PHONY: examples
+examples: $(EXAMPLES_TARGETS)
+
+%_examples:
+   @echo == Build examples for $*
+   $(Q)if [ ! -d "${RTE_SDK}/${*}" ]; then \
+   echo "Target ${*} does not exist in ${RTE_SDK}/${*}." ; \
+   echo -n "Pleas

[dpdk-dev] [PATCH v2 5/6] examples: fix netmap_compat example

2014-05-09 Thread Olivier Matz
It is not allowed to reference an absolute file name in SRCS-y.
A VPATH has to be used, otherwise the dependencies won't be checked
properly.

Signed-off-by: Olivier Matz 
---
 examples/netmap_compat/bridge/Makefile | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/examples/netmap_compat/bridge/Makefile b/examples/netmap_compat/bridge/Makefile
index 74feb1e..ebc6b1c 100644
--- a/examples/netmap_compat/bridge/Makefile
+++ b/examples/netmap_compat/bridge/Makefile
@@ -41,9 +41,12 @@ include $(RTE_SDK)/mk/rte.vars.mk
 # binary name
 APP = bridge

+# for compat_netmap.c
+VPATH := $(SRCDIR)/../lib
+
 # all source are stored in SRCS-y
 SRCS-y := bridge.c
-SRCS-y += $(SRCDIR)/../lib/compat_netmap.c
+SRCS-y += compat_netmap.c

 CFLAGS += -O3 -I$(SRCDIR)/../lib -I$(SRCDIR)/../netmap
 CFLAGS += $(WERROR_FLAGS)
-- 
1.9.2



[dpdk-dev] [PATCH v2 4/6] examples: fix qos_sched makefile

2014-05-09 Thread Olivier Matz
The example does not compile as the linker complains about duplicated
symbols.

Remove -lrte_sched from LDLIBS; it is already present in rte.app.mk and
added by the DPDK framework automatically.

Signed-off-by: Olivier Matz 
---
 examples/qos_sched/Makefile | 2 --
 1 file changed, 2 deletions(-)

diff --git a/examples/qos_sched/Makefile b/examples/qos_sched/Makefile
index b91fe37..9366efe 100755
--- a/examples/qos_sched/Makefile
+++ b/examples/qos_sched/Makefile
@@ -54,6 +54,4 @@ CFLAGS += $(WERROR_FLAGS)
 CFLAGS_args.o := -D_GNU_SOURCE
 CFLAGS_cfg_file.o := -D_GNU_SOURCE

-LDLIBS += -lrte_sched
-
 include $(RTE_SDK)/mk/rte.extapp.mk
-- 
1.9.2



[dpdk-dev] [PATCH v2 3/6] examples: add a makefile to build all examples

2014-05-09 Thread Olivier Matz
It is now possible to build all examples by doing the following:

  user@droids:~/dpdk.org$ make install T=x86_64-default-linuxapp-gcc
  user@droids:~/dpdk.org$ cd examples
  user@droids:~/dpdk.org/examples$ make RTE_SDK=${PWD}/.. \
  RTE_TARGET=x86_64-default-linuxapp-gcc

Signed-off-by: Olivier Matz 
---
 examples/Makefile | 68 +++
 1 file changed, 68 insertions(+)
 create mode 100644 examples/Makefile

diff --git a/examples/Makefile b/examples/Makefile
new file mode 100644
index 000..5e36c92
--- /dev/null
+++ b/examples/Makefile
@@ -0,0 +1,68 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2014 6WIND S.A.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of 6WIND S.A. nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+ifeq ($(RTE_SDK),)
+$(error "Please define RTE_SDK environment variable")
+endif
+
+# Default target, can be overridden by command line or environment
+RTE_TARGET ?= x86_64-default-linuxapp-gcc
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+DIRS-y += cmdline
+ifneq ($(ICP_ROOT),)
+DIRS-y += dpdk_qat
+endif
+DIRS-y += exception_path
+DIRS-y += helloworld
+DIRS-y += ip_reassembly
+DIRS-$(CONFIG_RTE_MBUF_SCATTER_GATHER) += ipv4_frag
+DIRS-$(CONFIG_RTE_MBUF_SCATTER_GATHER) += ipv4_multicast
+DIRS-$(CONFIG_RTE_LIBRTE_KNI) += kni
+DIRS-y += l2fwd
+DIRS-$(CONFIG_RTE_LIBRTE_IVSHMEM) += l2fwd-ivshmem
+DIRS-y += l3fwd
+DIRS-y += l3fwd-power
+DIRS-y += l3fwd-vf
+DIRS-y += link_status_interrupt
+DIRS-y += load_balancer
+DIRS-y += multi_process
+DIRS-y += netmap_compat/bridge
+DIRS-$(CONFIG_RTE_LIBRTE_METER) += qos_meter
+DIRS-$(CONFIG_RTE_LIBRTE_SCHED) += qos_sched
+DIRS-y += quota_watermark
+DIRS-y += timer
+DIRS-y += vhost
+DIRS-$(CONFIG_RTE_LIBRTE_XEN_DOM0) += vhost_xen
+DIRS-y += vmdq
+DIRS-y += vmdq_dcb
+
+include $(RTE_SDK)/mk/rte.extsubdir.mk
-- 
1.9.2



[dpdk-dev] [PATCH v2 2/6] examples: use rte.extsubdir.mk to process subdirectories

2014-05-09 Thread Olivier Matz
Signed-off-by: Olivier Matz 
---
 examples/l2fwd-ivshmem/Makefile  |  9 +
 examples/multi_process/Makefile  | 16 +++-
 examples/multi_process/client_server_mp/Makefile | 15 ++-
 examples/quota_watermark/Makefile| 12 +++-
 4 files changed, 17 insertions(+), 35 deletions(-)

diff --git a/examples/l2fwd-ivshmem/Makefile b/examples/l2fwd-ivshmem/Makefile
index 7286b37..df59ed8 100644
--- a/examples/l2fwd-ivshmem/Makefile
+++ b/examples/l2fwd-ivshmem/Makefile
@@ -37,14 +37,7 @@ endif
 RTE_TARGET ?= x86_64-ivshmem-linuxapp-gcc

 include $(RTE_SDK)/mk/rte.vars.mk
-unexport RTE_SRCDIR RTE_OUTPUT RTE_EXTMK

 DIRS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += host guest

-.PHONY: all clean $(DIRS-y)
-
-all: $(DIRS-y)
-clean: $(DIRS-y)
-
-$(DIRS-y):
-   $(MAKE) -C $@ $(MAKECMDGOALS)
+include $(RTE_SDK)/mk/rte.extsubdir.mk
diff --git a/examples/multi_process/Makefile b/examples/multi_process/Makefile
index ba96a7e..f2c8e68 100644
--- a/examples/multi_process/Makefile
+++ b/examples/multi_process/Makefile
@@ -33,15 +33,13 @@ ifeq ($(RTE_SDK),)
 $(error "Please define RTE_SDK environment variable")
 endif

-include $(RTE_SDK)/mk/rte.vars.mk
-unexport RTE_SRCDIR RTE_OUTPUT RTE_EXTMK
-
-DIRS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += $(wildcard *_mp)
+# Default target, can be overridden by command line or environment
+RTE_TARGET ?= x86_64-ivshmem-linuxapp-gcc

-.PHONY: all clean $(DIRS-y)
+include $(RTE_SDK)/mk/rte.vars.mk

-all: $(DIRS-y)
-clean: $(DIRS-y)
+DIRS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += client_server_mp
+DIRS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += simple_mp
+DIRS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += symmetric_mp

-$(DIRS-y):
-   $(MAKE) -C $@ $(MAKECMDGOALS)
+include $(RTE_SDK)/mk/rte.extsubdir.mk
diff --git a/examples/multi_process/client_server_mp/Makefile b/examples/multi_process/client_server_mp/Makefile
index 24d31b0..b8d6b3f 100644
--- a/examples/multi_process/client_server_mp/Makefile
+++ b/examples/multi_process/client_server_mp/Makefile
@@ -33,15 +33,12 @@ ifeq ($(RTE_SDK),)
 $(error "Please define RTE_SDK environment variable")
 endif

-include $(RTE_SDK)/mk/rte.vars.mk
-unexport RTE_SRCDIR RTE_OUTPUT RTE_EXTMK
-
-DIRS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += $(wildcard mp_*)
+# Default target, can be overridden by command line or environment
+RTE_TARGET ?= x86_64-ivshmem-linuxapp-gcc

-.PHONY: all clean $(DIRS-y)
+include $(RTE_SDK)/mk/rte.vars.mk

-all: $(DIRS-y)
-clean: $(DIRS-y)
+DIRS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += mp_client
+DIRS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += mp_server

-$(DIRS-y):
-   $(MAKE) -C $@ $(MAKECMDGOALS)
+include $(RTE_SDK)/mk/rte.extsubdir.mk
diff --git a/examples/quota_watermark/Makefile b/examples/quota_watermark/Makefile
index 5596dcc..e4d54c2 100644
--- a/examples/quota_watermark/Makefile
+++ b/examples/quota_watermark/Makefile
@@ -37,14 +37,8 @@ endif
 RTE_TARGET ?= x86_64-default-linuxapp-gcc

 include $(RTE_SDK)/mk/rte.vars.mk
-unexport RTE_SRCDIR RTE_OUTPUT RTE_EXTMK

-DIRS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += $(wildcard qw*)
+DIRS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += qw
+DIRS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += qwctl

-.PHONY: all clean $(DIRS-y)
-
-all: $(DIRS-y)
-clean: $(DIRS-y)
-
-$(DIRS-y):
-   $(MAKE) -C $@ $(MAKECMDGOALS)
+include $(RTE_SDK)/mk/rte.extsubdir.mk
-- 
1.9.2



[dpdk-dev] [PATCH v2 1/6] mk: introduce rte.extsubdir.mk

2014-05-09 Thread Olivier Matz
This makefile can be included by a project that needs to build several
applications or libraries that are located in different directories.

Signed-off-by: Olivier Matz 
---
 mk/rte.extsubdir.mk | 53 +
 1 file changed, 53 insertions(+)
 create mode 100644 mk/rte.extsubdir.mk

diff --git a/mk/rte.extsubdir.mk b/mk/rte.extsubdir.mk
new file mode 100644
index 0000000..f50f006
--- /dev/null
+++ b/mk/rte.extsubdir.mk
@@ -0,0 +1,53 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2014 6WIND S.A.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of 6WIND S.A. nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+MAKEFLAGS += --no-print-directory
+
+# output directory
+O ?= .
+BASE_OUTPUT ?= $(O)
+CUR_SUBDIR ?= .
+
+.PHONY: all
+all: $(DIRS-y)
+
+.PHONY: clean
+clean: $(DIRS-y)
+
+.PHONY: $(DIRS-y)
+$(DIRS-y):
+   @echo "== $@"
+   $(Q)$(MAKE) -C $(@) \
+   M=$(CURDIR)/$(@)/Makefile \
+   O=$(BASE_OUTPUT)/$(CUR_SUBDIR)/$(@)/$(RTE_TARGET) \
+   BASE_OUTPUT=$(BASE_OUTPUT) \
+   CUR_SUBDIR=$(CUR_SUBDIR)/$(@) \
+   S=$(CURDIR)/$(@) \
+   $(filter-out $(DIRS-y),$(MAKECMDGOALS))
-- 
1.9.2



[dpdk-dev] [PATCH v2 0/6] examples: add a new makefile to build all examples

2014-05-09 Thread Olivier Matz
This patch series adds a makefile to build all examples supported
by the configuration. It helps to check that all examples compile
after a dpdk modification.

After applying the patches, it is possible to build all examples for
given targets, given the installation directory:

  # first, install the x86_64-default-linuxapp-gcc in
  # ${RTE_SDK}/x86_64-default-linuxapp-gcc directory
  user@droids:~/dpdk.org$ make install T=x86_64-default-linuxapp-gcc
  # build examples for this new installation in
  # ${RTE_SDK}/examples directory
  user@droids:~/dpdk.org$ make examples T=x86_64-default-linuxapp-gcc

Or directly from examples directory:

  user@droids:~/dpdk.org$ cd examples
  user@droids:~/dpdk.org/examples$ make RTE_SDK=${PWD}/.. \
  RTE_TARGET=x86_64-default-linuxapp-gcc


Changes included in v2:
  - do not build kni example if CONFIG_RTE_LIBRTE_KNI is not set
  - fix rte.extsubdir.mk when there are several levels of subdirectories
  - allow building examples directly from the dpdk root directory
  - explain in commit logs that it requires an install directory

Olivier Matz (6):
  mk: introduce rte.extsubdir.mk
  examples: use rte.extsubdir.mk to process subdirectories
  examples: add a makefile to build all examples
  examples: fix qos_sched makefile
  examples: fix netmap_compat example
  mk: add "make examples" target in root makefile

 doc/build-sdk-quick.txt  | 14 +++--
 examples/Makefile| 68 
 examples/l2fwd-ivshmem/Makefile  |  9 +--
 examples/multi_process/Makefile  | 16 +++--
 examples/multi_process/client_server_mp/Makefile | 15 ++---
 examples/netmap_compat/bridge/Makefile   |  5 +-
 examples/qos_sched/Makefile  |  2 -
 examples/quota_watermark/Makefile| 12 +---
 mk/rte.extsubdir.mk  | 53 
 mk/rte.sdkexamples.mk| 79 
 mk/rte.sdkroot.mk|  4 ++
 11 files changed, 233 insertions(+), 44 deletions(-)
 create mode 100644 examples/Makefile
 create mode 100644 mk/rte.extsubdir.mk
 create mode 100644 mk/rte.sdkexamples.mk

-- 
1.9.2



[dpdk-dev] [PATCH v2 0/2] ring: allow to init a rte_ring outside of an rte_memzone

2014-05-09 Thread Ananyev, Konstantin
-----Original Message-----
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Olivier Matz
Sent: Friday, May 09, 2014 11:15 AM
To: dev at dpdk.org
Subject: [dpdk-dev] [PATCH v2 0/2] ring: allow to init a rte_ring outside of an rte_memzone

These 2 patches add 2 new functions that make it possible to initialize and
use a rte_ring anywhere in memory.

Before these patches, only rte_ring_create() was available. This function
allocates a rte_memzone (that cannot be freed) and initializes a ring
inside.

This series allows the following:
  size = rte_ring_get_memsize(1024);
  r = malloc(size);
  rte_ring_init(r, "my_ring", 1024, 0);


Changes included in v2:
  - fix syntax for functions definitions in rte_ring_get_memsize()
  - use RTE_ALIGN() to get nearest higher multiple of cache line size
  - fix description of rte_ring_init() in doxygen comments

Olivier Matz (2):
  ring: introduce rte_ring_get_memsize()
  ring: introduce rte_ring_init()

 lib/librte_ring/rte_ring.c | 89 +-
 lib/librte_ring/rte_ring.h | 67 +++---
 2 files changed, 119 insertions(+), 37 deletions(-)

-- 
Acked-by: Konstantin Ananyev 



[dpdk-dev] [PATCH v2 2/2] ring: introduce rte_ring_init()

2014-05-09 Thread Olivier Matz
Allow initializing a ring in already allocated memory. The rte_ring_create()
function that allocates a ring in a rte_memzone is still available and now uses
the new rte_ring_init() function in order to share the code.

Signed-off-by: Olivier Matz 
---
 lib/librte_ring/rte_ring.c | 63 ++
 lib/librte_ring/rte_ring.h | 51 +
 2 files changed, 82 insertions(+), 32 deletions(-)

diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index 156fe49..2eaa6c8 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -113,18 +113,10 @@ rte_ring_get_memsize(unsigned count)
return sz;
 }

-/* create the ring */
-struct rte_ring *
-rte_ring_create(const char *name, unsigned count, int socket_id,
-   unsigned flags)
+int
+rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
+   unsigned flags)
 {
-   char mz_name[RTE_MEMZONE_NAMESIZE];
-   struct rte_ring *r;
-   const struct rte_memzone *mz;
-   ssize_t ring_size;
-   int mz_flags = 0;
-   struct rte_ring_list* ring_list = NULL;
-
/* compilation-time checks */
RTE_BUILD_BUG_ON((sizeof(struct rte_ring) &
  CACHE_LINE_MASK) != 0);
@@ -141,11 +133,38 @@ rte_ring_create(const char *name, unsigned count, int socket_id,
  CACHE_LINE_MASK) != 0);
 #endif

+   /* init the ring structure */
+   memset(r, 0, sizeof(*r));
+   rte_snprintf(r->name, sizeof(r->name), "%s", name);
+   r->flags = flags;
+   r->prod.watermark = count;
+   r->prod.sp_enqueue = !!(flags & RING_F_SP_ENQ);
+   r->cons.sc_dequeue = !!(flags & RING_F_SC_DEQ);
+   r->prod.size = r->cons.size = count;
+   r->prod.mask = r->cons.mask = count-1;
+   r->prod.head = r->cons.head = 0;
+   r->prod.tail = r->cons.tail = 0;
+
+   return 0;
+}
+
+/* create the ring */
+struct rte_ring *
+rte_ring_create(const char *name, unsigned count, int socket_id,
+   unsigned flags)
+{
+   char mz_name[RTE_MEMZONE_NAMESIZE];
+   struct rte_ring *r;
+   const struct rte_memzone *mz;
+   ssize_t ring_size;
+   int mz_flags = 0;
+   struct rte_ring_list* ring_list = NULL;
+
/* check that we have an initialised tail queue */
-   if ((ring_list = 
+   if ((ring_list =
 RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_RING, rte_ring_list)) == NULL) {
rte_errno = E_RTE_NO_TAILQ;
-   return NULL;
+   return NULL;
}

ring_size = rte_ring_get_memsize(count);
@@ -164,26 +183,16 @@ rte_ring_create(const char *name, unsigned count, int socket_id,
mz = rte_memzone_reserve(mz_name, ring_size, socket_id, mz_flags);
if (mz != NULL) {
r = mz->addr;
-
-   /* init the ring structure */
-   memset(r, 0, sizeof(*r));
-   rte_snprintf(r->name, sizeof(r->name), "%s", name);
-   r->flags = flags;
-   r->prod.watermark = count;
-   r->prod.sp_enqueue = !!(flags & RING_F_SP_ENQ);
-   r->cons.sc_dequeue = !!(flags & RING_F_SC_DEQ);
-   r->prod.size = r->cons.size = count;
-   r->prod.mask = r->cons.mask = count-1;
-   r->prod.head = r->cons.head = 0;
-   r->prod.tail = r->cons.tail = 0;
-
+   /* no need to check return value here, we already checked the
+* arguments above */
+   rte_ring_init(r, name, count, flags);
TAILQ_INSERT_TAIL(ring_list, r, next);
} else {
r = NULL;
RTE_LOG(ERR, RING, "Cannot reserve memory\n");
}
rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
-   
+
return r;
 }

diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index e8493f2..96232d3 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -215,13 +215,54 @@ struct rte_ring {
 ssize_t rte_ring_get_memsize(unsigned count);

 /**
+ * Initialize a ring structure.
+ *
+ * Initialize a ring structure in memory pointed by "r". The size of the
+ * memory area must be large enough to store the ring structure and the
+ * object table. It is advised to use rte_ring_get_memsize() to get the
+ * appropriate size.
+ *
+ * The ring size is set to *count*, which must be a power of two. Water
+ * marking is disabled by default. The real usable ring size is
+ * *count-1* instead of *count* to differentiate a free ring from an
+ * empty ring.
+ *
+ * The ring is not added in RTE_TAILQ_RING global list. Indeed, the
+ * memory given by the caller may not be shareable among dpdk
+ * processes.
+ *
+ * @param r
+ *   The pointer to the ring structure followed by the objects table.
+ * @param name
+ *   The name of the ring.
+ * @param count
+ *   The number of elements in the ring (must 

[dpdk-dev] [PATCH v2 1/2] ring: introduce rte_ring_get_memsize()

2014-05-09 Thread Olivier Matz
Add a function that returns the amount of memory occupied by a rte_ring
structure and its object table. This commit prepares the next one, which
will allow allocating a ring dynamically.
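
For readers who want to see the arithmetic in isolation, here is a minimal, standalone sketch of the size computation described above. It does not use DPDK headers: the `sketch_` names, the 192-byte stand-in header, the 64-byte cache line and the size limit are all assumptions made for illustration, not values taken from the patch.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>   /* ssize_t */

/* Assumed values for illustration; the real ones come from DPDK headers. */
#define SKETCH_CACHE_LINE_SIZE 64u
#define SKETCH_RING_SZ_MASK    0x0fffffffu

/* Stand-in for struct rte_ring: only its size matters in this sketch. */
struct sketch_ring_hdr {
	char opaque[192];
};

/* True if x is a power of two (also true for 0, like the macro in the patch). */
#define SKETCH_POWEROF2(x) ((((x) - 1) & (x)) == 0)

/* Round sz up to the next multiple of align (align must be a power of two). */
static size_t sketch_align_up(size_t sz, size_t align)
{
	return (sz + align - 1) & ~(align - 1);
}

/* Bytes needed for the header plus count pointers, or -EINVAL on a bad count. */
static ssize_t sketch_ring_memsize(unsigned count)
{
	size_t sz;

	if (!SKETCH_POWEROF2(count) || count > SKETCH_RING_SZ_MASK)
		return -EINVAL;

	sz = sizeof(struct sketch_ring_hdr) + (size_t)count * sizeof(void *);
	return (ssize_t)sketch_align_up(sz, SKETCH_CACHE_LINE_SIZE);
}
```

Returning a signed size lets the caller tell an error (negative) from a valid byte count with a single check.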

Signed-off-by: Olivier Matz 
---
 lib/librte_ring/rte_ring.c | 30 +++---
 lib/librte_ring/rte_ring.h | 16 
 2 files changed, 39 insertions(+), 7 deletions(-)

diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index 0d43a55..156fe49 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -94,6 +94,25 @@ TAILQ_HEAD(rte_ring_list, rte_ring);
 /* true if x is a power of 2 */
 #define POWEROF2(x) ((((x)-1) & (x)) == 0)

+/* return the size of memory occupied by a ring */
+ssize_t
+rte_ring_get_memsize(unsigned count)
+{
+   ssize_t sz;
+
+   /* count must be a power of 2 */
+   if ((!POWEROF2(count)) || (count > RTE_RING_SZ_MASK )) {
+   RTE_LOG(ERR, RING,
+   "Requested size is invalid, must be power of 2, and "
+   "do not exceed the size limit %u\n", RTE_RING_SZ_MASK);
+   return -EINVAL;
+   }
+
+   sz = sizeof(struct rte_ring) + count * sizeof(void *);
+   sz = RTE_ALIGN(sz, CACHE_LINE_SIZE);
+   return sz;
+}
+
 /* create the ring */
 struct rte_ring *
 rte_ring_create(const char *name, unsigned count, int socket_id,
@@ -102,7 +121,7 @@ rte_ring_create(const char *name, unsigned count, int socket_id,
char mz_name[RTE_MEMZONE_NAMESIZE];
struct rte_ring *r;
const struct rte_memzone *mz;
-   size_t ring_size;
+   ssize_t ring_size;
int mz_flags = 0;
struct rte_ring_list* ring_list = NULL;

@@ -129,16 +148,13 @@ rte_ring_create(const char *name, unsigned count, int socket_id,
return NULL;
}

-   /* count must be a power of 2 */
-   if ((!POWEROF2(count)) || (count > RTE_RING_SZ_MASK )) {
-   rte_errno = EINVAL;
-   RTE_LOG(ERR, RING, "Requested size is invalid, must be power of 2, and "
-   "do not exceed the size limit %u\n", RTE_RING_SZ_MASK);
+   ring_size = rte_ring_get_memsize(count);
+   if (ring_size < 0) {
+   rte_errno = -ring_size;
return NULL;
}

rte_snprintf(mz_name, sizeof(mz_name), "%s%s", RTE_RING_MZ_PREFIX, name);
-   ring_size = count * sizeof(void *) + sizeof(struct rte_ring);

rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);

diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 775ea79..e8493f2 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -199,6 +199,22 @@ struct rte_ring {
 #endif

 /**
+ * Calculate the memory size needed for a ring
+ *
+ * This function returns the number of bytes needed for a ring, given
+ * the number of elements in it. This value is the sum of the size of
+ * the structure rte_ring and the size of the memory needed by the
+ * objects pointers. The value is aligned to a cache line size.
+ *
+ * @param count
+ *   The number of elements in the ring (must be a power of 2).
+ * @return
+ *   - The memory size needed for the ring on success.
+ *   - -EINVAL if count is not a power of 2.
+ */
+ssize_t rte_ring_get_memsize(unsigned count);
+
+/**
  * Create a new ring named *name* in memory.
  *
  * This function uses ``memzone_reserve()`` to allocate memory. Its size is
-- 
1.9.2



[dpdk-dev] [PATCH v2 0/2] ring: allow to init a rte_ring outside of an rte_memzone

2014-05-09 Thread Olivier Matz
These 2 patches add 2 new functions that make it possible to initialize and
use a rte_ring anywhere in memory.

Before these patches, only rte_ring_create() was available. This function
allocates a rte_memzone (that cannot be freed) and initializes a ring
inside.

This series allows the following:
  size = rte_ring_get_memsize(1024);
  r = malloc(size);
  rte_ring_init(r, "my_ring", 1024, 0);
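
To illustrate the pattern in the three lines above without depending on DPDK, here is a simplified, single-threaded sketch of a ring initialized in caller-owned memory. All `sketch_` names are hypothetical, and the index handling is deliberately simpler than the real rte_ring (no multi-producer logic, no watermark). It only shows the "compute size, allocate, init in place" sequence, with the error checks the short snippet omits.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical simplified ring: header followed by the object table. */
struct sketch_ring {
	char name[32];
	unsigned size;   /* number of slots, a power of two */
	unsigned mask;   /* size - 1, used to wrap indexes */
	unsigned head;   /* next slot to write */
	unsigned tail;   /* next slot to read */
	void *objs[];    /* object table */
};

static size_t sketch_ring_memsize(unsigned count)
{
	return sizeof(struct sketch_ring) + (size_t)count * sizeof(void *);
}

/* Initialize a ring in memory owned by the caller. */
static int sketch_ring_init(struct sketch_ring *r, const char *name,
	unsigned count)
{
	if (count == 0 || (count & (count - 1)) != 0)
		return -1; /* count must be a power of two */
	memset(r, 0, sizeof(*r));
	snprintf(r->name, sizeof(r->name), "%s", name);
	r->size = count;
	r->mask = count - 1;
	return 0;
}

/* One slot stays unused so that a full ring differs from an empty one. */
static int sketch_ring_enqueue(struct sketch_ring *r, void *obj)
{
	if (((r->head + 1) & r->mask) == r->tail)
		return -1; /* full */
	r->objs[r->head] = obj;
	r->head = (r->head + 1) & r->mask;
	return 0;
}

static int sketch_ring_dequeue(struct sketch_ring *r, void **obj)
{
	if (r->head == r->tail)
		return -1; /* empty */
	*obj = r->objs[r->tail];
	r->tail = (r->tail + 1) & r->mask;
	return 0;
}

/* Allocate with malloc() and init in place; release later with free(). */
static struct sketch_ring *sketch_ring_new(const char *name, unsigned count)
{
	struct sketch_ring *r = malloc(sketch_ring_memsize(count));

	if (r != NULL && sketch_ring_init(r, name, count) != 0) {
		free(r);
		r = NULL;
	}
	return r;
}
```

Because the memory comes from malloc(), the ring can be released with free(), which is not possible for a ring living in a memzone.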


Changes included in v2:
  - fix syntax for functions definitions in rte_ring_get_memsize()
  - use RTE_ALIGN() to get nearest higher multiple of cache line size
  - fix description of rte_ring_init() in doxygen comments

Olivier Matz (2):
  ring: introduce rte_ring_get_memsize()
  ring: introduce rte_ring_init()

 lib/librte_ring/rte_ring.c | 89 +-
 lib/librte_ring/rte_ring.h | 67 +++---
 2 files changed, 119 insertions(+), 37 deletions(-)

-- 
1.9.2



[dpdk-dev] Compile failed using g++ 4.8.2

2014-05-09 Thread Bo Chen
When I compile my program on Ubuntu 14.04, g++ 4.8.2 prints the
following error messages. It looks like a space is needed on each side of
identifiers such as PRIx64. Can anyone help submit a patch?

/home/bodc/workspace/dpdk/build/dpdk-prefix/src/dpdk/x86_64-default-linuxapp-gcc/include/rte_mempool.h:347:6:
error: invalid suffix on literal; C++11 requires a space between
literal and identifier [-Werror=literal-suffix]
  "obj=%p, mempool=%p, cookie=%"PRIx64"\n",
  ^
/home/bodc/workspace/dpdk/build/dpdk-prefix/src/dpdk/x86_64-default-linuxapp-gcc/include/rte_mempool.h:357:6:
error: invalid suffix on literal; C++11 requires a space between
literal and identifier [-Werror=literal-suffix]
  "obj=%p, mempool=%p, cookie=%"PRIx64"\n",
  ^
/home/bodc/workspace/dpdk/build/dpdk-prefix/src/dpdk/x86_64-default-linuxapp-gcc/include/rte_mempool.h:368:6:
error: invalid suffix on literal; C++11 requires a space between
literal and identifier [-Werror=literal-suffix]
  "obj=%p, mempool=%p, cookie=%"PRIx64"\n",
  ^
/home/bodc/workspace/dpdk/build/dpdk-prefix/src/dpdk/x86_64-default-linuxapp-gcc/include/rte_mempool.h:377:5:
error: invalid suffix on literal; C++11 requires a space between
literal and identifier [-Werror=literal-suffix]
 "obj=%p, mempool=%p, cookie=%"PRIx64"\n",
 ^
In file included from
/home/bodc/workspace/dpdk/build/dpdk-prefix/src/dpdk/x86_64-default-linuxapp-gcc/include/rte_ethdev.h:177:0,
 from /home/bodc/workspace/tcproxy/src/comm/packet.cc:9:
/home/bodc/workspace/dpdk/build/dpdk-prefix/src/dpdk/x86_64-default-linuxapp-gcc/include/rte_pci.h:95:21:
error: invalid suffix on literal; C++11 requires a space between
literal and identifier [-Werror=literal-suffix]
 #define PCI_PRI_FMT "%.4"PRIx16":%.2"PRIx8":%.2"PRIx8".%"PRIx8
 ^
/home/bodc/workspace/dpdk/build/dpdk-prefix/src/dpdk/x86_64-default-linuxapp-gcc/include/rte_pci.h:95:32:
error: invalid suffix on literal; C++11 requires a space between
literal and identifier [-Werror=literal-suffix]
 #define PCI_PRI_FMT "%.4"PRIx16":%.2"PRIx8":%.2"PRIx8".%"PRIx8
^
/home/bodc/workspace/dpdk/build/dpdk-prefix/src/dpdk/x86_64-default-linuxapp-gcc/include/rte_pci.h:95:43:
error: invalid suffix on literal; C++11 requires a space between
literal and identifier [-Werror=literal-suffix]
 #define PCI_PRI_FMT "%.4"PRIx16":%.2"PRIx8":%.2"PRIx8".%"PRIx8
   ^
/home/bodc/workspace/dpdk/build/dpdk-prefix/src/dpdk/x86_64-default-linuxapp-gcc/include/rte_pci.h:95:54:
error: invalid suffix on literal; C++11 requires a space between
literal and identifier [-Werror=literal-suffix]
 #define PCI_PRI_FMT "%.4"PRIx16":%.2"PRIx8":%.2"PRIx8".%"PRIx8
  ^
/home/bodc/workspace/dpdk/build/dpdk-prefix/src/dpdk/x86_64-default-linuxapp-gcc/include/rte_pci.h:98:27:
error: invalid suffix on literal; C++11 requires a space between
literal and identifier [-Werror=literal-suffix]
 #define PCI_SHORT_PRI_FMT "%.2"PRIx8":%.2"PRIx8".%"PRIx8
   ^
/home/bodc/workspace/dpdk/build/dpdk-prefix/src/dpdk/x86_64-default-linuxapp-gcc/include/rte_pci.h:98:37:
error: invalid suffix on literal; C++11 requires a space between
literal and identifier [-Werror=literal-suffix]
 #define PCI_SHORT_PRI_FMT "%.2"PRIx8":%.2"PRIx8".%"PRIx8
 ^
/home/bodc/workspace/dpdk/build/dpdk-prefix/src/dpdk/x86_64-default-linuxapp-gcc/include/rte_pci.h:98:48:
error: invalid suffix on literal; C++11 requires a space between
literal and identifier [-Werror=literal-suffix]
 #define PCI_SHORT_PRI_FMT "%.2"PRIx8":%.2"PRIx8".%"PRIx8
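
A minimal sketch of the kind of change that should silence this diagnostic, assuming the only problem is the missing whitespace: put a space between each string literal and the PRIx* macro. The function name `format_cookie` is made up for the example; the same spacing would apply to the format strings in rte_mempool.h and rte_pci.h quoted in the errors.

```c
#include <inttypes.h>
#include <stdio.h>

/* Write "cookie=<hex>\n" into buf; returns the snprintf() result. */
static int format_cookie(char *buf, size_t len, uint64_t cookie)
{
	/*
	 * The spaces separate the string literals from the PRIx64 macro,
	 * so a C++11 compiler does not read "..."PRIx64 as a literal with
	 * a user-defined suffix. In C the resulting string is unchanged.
	 */
	return snprintf(buf, len, "cookie=%" PRIx64 "\n", cookie);
}
```

The spaced form is valid in both C and C++11, so the headers would not need any language-specific conditionals.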


[dpdk-dev] [PATCH RFC 00/11] ixgbe/mbuf: add TSO support

2014-05-09 Thread Stephen Hemminger
On Fri,  9 May 2014 16:50:27 +0200
Olivier Matz  wrote:

> This series adds TSO support to the ixgbe DPDK driver. As discussed
> previously on the list [1], one problem is that there is not enough room
> in rte_mbuf today to store the required information to implement this
> feature:
>   - a new ol_flag
>   - the MSS
>   - the L4 header len
> 
> A solution would be to increase the size of the mbuf to 2 cache lines
> but it could have a bad impact on performance. This series proposes some
> rework to drastically reduce the size of the rte_mbuf structures before
> implementing the TSO, avoiding to change the mbuf size to 128 bytes.
> 
> After the rework of mbuf structures, the size of rte_mbuf structure is
> reduced by 9 bytes. The implementation of TSO requires to double the
> size of ol_flags (16 to 32 bits) and to double the size of offload
> information in order to add the mss and the l4 header length (32 to 64
> bits). At the end of the whole series, sizeof(rte_mbuf) is still 64
> bytes and 4 bytes are available for future use.
> 
> This rework causes a lot of modifications in the mbuf structure,
> implying some changes in the applications that directly use the mbuf
> structure fields instead of using the API functions (sometimes there is
> no function). That's why this series is a RFC. In my opinion, it's the
> proper moment for this evolution as the 1.7.0 window is open.
> 
> About TSO, the new fields in mbuf try to be generic enough to apply to
> other hardware in the future. To delegate the TCP segmentation to the
> hardware, the user has to:
> 
>   - set the PKT_TX_TCP_SEG flag in mbuf->ol_flags (this flag implies
> PKT_TX_IP_CKSUM and PKT_TX_TCP_CKSUM)
>   - fill the mbuf->hw_offload information: l2_len, l3_len, l4_len, mss
>   - calculate the pseudo header checksum and set it in the TCP header,
> as required when doing hardware TCP checksum offload
>   - set the IP checksum to 0
> 
> Compilation of DPDK and examples is tested for the following
> targets: x86_64-*-linuxapp-gcc, i686-*-linuxapp-gcc, x86_64-*-bsdapp-gcc
> 
> The mbuf rework series is validated with autotests:
> 
>   cd dpdk.org/
>   make install T=x86_64-default-linuxapp-gcc
>   cd x86_64-default-linuxapp-gcc/
>   modprobe uio
>   insmod kmod/igb_uio.ko
>   python ../tools/igb_uio_bind.py -b igb_uio :02:00.0
>   echo 0 > /proc/sys/kernel/randomize_va_space
>   echo 1000 > 
> /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
>   echo 1000 > 
> /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages
>   mount -t hugetlbfs none /mnt/huge
>   make test
> 
> TSO is validated with IPv4 and IPv6 with testpmd (see the commit log of
> last patch for details).
> 
> The performance non-regression has been tested with 6WINDGate fast path.
> 
> Note: these patches may conflict with patch [2], which is not pushed yet but
> will probably be integrated before this series.
> 
> [1] http://dpdk.org/ml/archives/dev/2013-October/thread.html#572
> [2] http://dpdk.org/ml/archives/dev/2014-April/002166.html
> 

I would also like to propose changing the checksum offload flags.
Many devices can indicate good checksum in some cases but can't test
for many other types of packets. By changing the flags to be:
 PKT_RX_L4_CKSUM_GOOD and PKT_RX_IP_CKSUM_GOOD

It is then possible to support devices where some cases (IPv4 + TCP)
are supported but others are not.

This also better aligns with Linux checksum code for cases where mbuf
and meta data are being passed into kernel.
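
A rough sketch of how an application might consume "GOOD" style flags, under the assumption that an absent flag means "hardware did not verify" rather than "checksum is bad". The bit values and all `SKETCH_`/`sketch_` names are hypothetical and are not the flag definitions from any DPDK release. This is similar in spirit to the distinction Linux makes between CHECKSUM_UNNECESSARY and CHECKSUM_NONE.

```c
#include <stdint.h>

/* Hypothetical bit values, chosen only for this sketch. */
#define SKETCH_RX_IP_CKSUM_GOOD (1u << 0)
#define SKETCH_RX_L4_CKSUM_GOOD (1u << 1)

enum sketch_cksum_action {
	SKETCH_CKSUM_TRUSTED,      /* hardware vouched for the checksum */
	SKETCH_CKSUM_VERIFY_IN_SW  /* hardware said nothing: check in software */
};

/* Decide what to do for one checksum, given the flag bit that marks it good. */
static enum sketch_cksum_action
sketch_cksum_action(uint32_t ol_flags, uint32_t good_bit)
{
	if (ol_flags & good_bit)
		return SKETCH_CKSUM_TRUSTED;
	return SKETCH_CKSUM_VERIFY_IN_SW;
}
```

With this convention, a device that can only validate IPv4 + TCP simply never sets the flag for other packet types, and the software fallback handles them.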



[dpdk-dev] [PATCH] malloc: fix rte_free run time in O(n) free blocks

2014-05-09 Thread Sanford, Robert
Hi Thomas,

>Some patches like this one are not yet reviewed because efforts were
>focused on release 1.6.0r2. This enhancement must be integrated in 1.7.0.
>I know that patchwork service is desired and I hope it will be available
>soon.

I realized that you guys had been very busy with 1.6.0r2. I just wanted to
make sure that lower-priority patches didn't fall through the cracks.

>By the way, looking at librte_malloc, it seems implementation of lists
>could be simpler. Don't you think we could improve (in another patch) this
>whole code by using BSD macros for lists?

Yes, I was surprised to find the malloc code not using any kind of list
functions/macros. I am willing to rework the patch. By BSD list macros, I
believe you are referring to QUEUE(3) and sys/queue.h. Is that right?
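
To make the question concrete, here is a small sketch of what a free list built on the sys/queue.h TAILQ macros could look like. The `sketch_` structures are invented for the example and are not the actual librte_malloc element layout. The point is only that TAILQ_REMOVE on a doubly linked list unlinks an element without walking the list.

```c
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical free-block descriptor, linked through a TAILQ entry. */
struct sketch_free_elem {
	size_t size;
	TAILQ_ENTRY(sketch_free_elem) link;
};

TAILQ_HEAD(sketch_free_list, sketch_free_elem);

static void sketch_free_list_init(struct sketch_free_list *fl)
{
	TAILQ_INIT(fl);
}

/* Put a block at the front of the free list. */
static void sketch_free_list_add(struct sketch_free_list *fl,
	struct sketch_free_elem *e)
{
	TAILQ_INSERT_HEAD(fl, e, link);
}

/* Unlink a block in constant time, whatever the list length. */
static void sketch_free_list_del(struct sketch_free_list *fl,
	struct sketch_free_elem *e)
{
	TAILQ_REMOVE(fl, e, link);
}

/* Count the blocks; only used here to observe the list. */
static unsigned sketch_free_list_count(struct sketch_free_list *fl)
{
	struct sketch_free_elem *e;
	unsigned n = 0;

	TAILQ_FOREACH(e, fl, link)
		n++;
	return n;
}
```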

Thanks,
Robert