[dpdk-dev] [PATCH RFC 00/11] ixgbe/mbuf: add TSO support
Hi Stephen,

On 05/09/2014 07:04 PM, Stephen Hemminger wrote:
> I would also like to propose changing the checksum offload flags.
> Many devices can indicate good checksum in some cases but can't test
> for many other types of packets. By changing the flags to be:
>   PKT_RX_L4_CKSUM_GOOD and PKT_RX_IP_CKSUM_GOOD
>
> It is then possible to support devices where some cases (IPv4 + TCP)
> are supported but others are not.

I agree. That's also what I'm talking about in the commit log of
the patch 08/11.

If there is not much rework for all the patches, I think it's feasible
to include this kind of modification in the v2 of this series.

Regards,
Olivier
[dpdk-dev] [PATCH RFC 05/11] mbuf: merge physaddr and buf_len in a bitfield
Hi Jeff,

Thank you for your comment.

On 05/09/2014 05:39 PM, Shaw, Jeffrey B wrote:
> have you tested this patch to see if there is a negative impact to
> performance?

Yes, but not with testpmd. I ran our internal non-regression
performance tests and they show no difference (or a difference below
the error margin), even with low-overhead processing like forwarding,
whatever the number of cores I use.

> Wouldn't the processor have to mask the high bytes of the physical
> address when it is used, for example, to populate descriptors with
> buffer addresses? When compute bound, this could steal CPU cycles
> away from packet processing. I think we should understand the
> performance trade-off in order to save these 2 bytes.

I would naively say that the cost is negligible: accessing the length
is the same as before (it's a 16-bit field) and accessing the physical
address is just a mask or a shift, which should not take long on an
Intel processor (1 cycle?). This is to be compared with the number of
cycles per packet in io-fwd mode, which is probably around 150 or 200.

> It would be interesting to see how throughput is impacted when the
> workload is core-bound. This could be accomplished by running testpmd
> in io-fwd mode across 4x 10G ports.

I agree, this is something we could check. If you agree, let's first
wait for some other comments and see if we find a consensus on the
patches.

Regards,
Olivier
[dpdk-dev] [PATCH RFC 00/11] ixgbe/mbuf: add TSO support
On Fri, 09 May 2014 23:49:45 +0200
Olivier MATZ wrote:

> Hi Stephen,
>
> On 05/09/2014 07:04 PM, Stephen Hemminger wrote:
> > I would also like to propose changing the checksum offload flags.
> > Many devices can indicate good checksum in some cases but can't test
> > for many other types of packets. By changing the flags to be:
> >   PKT_RX_L4_CKSUM_GOOD and PKT_RX_IP_CKSUM_GOOD
> >
> > It is then possible to support devices where some cases (IPv4 + TCP)
> > are supported but others are not.
>
> I agree. That's also what I'm talking about in the commit log of
> the patch 08/11.
>
> If there is not much rework for all the patches, I think it's feasible
> to include this kind of modification in the v2 of this series.
>
> Regards,
> Olivier
>

There are three checksum states:
 1. Known good
 2. Known bad
 3. Can't tell

Current choice of flags makes handling #3 impossible.
If you change it to CKSUM_GOOD then
 1 => GOOD,
 2 => not GOOD,
 3 => not GOOD.
And for case #3 the software can validate it.

For most cases IP checksum offload is meaningless anyway
because the IP header fits in a single cache line, and the cost
to checksum is minimal.
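
As a rough illustration of the three-state argument above, here is a minimal sketch of how a receiver might treat a GOOD-style flag: trust the hardware when the bit is set, otherwise fall back to a software check that covers both "known bad" and "can't tell". The flag value RX_IP_CKSUM_GOOD and the helper names are hypothetical and only serve the example; they are not the definitions from rte_mbuf.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag value, for illustration only (not the rte_mbuf.h bit). */
#define RX_IP_CKSUM_GOOD 0x1u

/*
 * Ones'-complement checksum (RFC 1071) over an IPv4 header.
 * When the header's own checksum field is included and correct,
 * the result is 0.
 */
static uint16_t ip_cksum_fold(const uint8_t *hdr, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	assert(len % 2 == 0);	/* IPv4 headers are a multiple of 4 bytes */
	for (i = 0; i < len; i += 2)
		sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];
	while (sum >> 16)
		sum = (sum & 0xffffu) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Return 1 if the IPv4 header checksum is valid, 0 otherwise. */
static int ip_cksum_ok(uint32_t ol_flags, const uint8_t *hdr, size_t hdr_len)
{
	if (ol_flags & RX_IP_CKSUM_GOOD)
		return 1;	/* state 1: the hardware vouched for it */
	/* states 2 and 3 look the same here: verify in software */
	return ip_cksum_fold(hdr, hdr_len) == 0;
}
```

With this scheme a device that cannot check a given packet type simply never sets the bit, and the software path gives the final answer.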
[dpdk-dev] [PATCH RFC 11/11] ixgbe/mbuf: add TSO support
Implement TSO (TCP segmentation offload) in ixgbe driver. To delegate
the TCP segmentation to the hardware, the user has to:

- set the PKT_TX_TCP_SEG flag in mbuf->ol_flags (this flag implies
  PKT_TX_IP_CKSUM and PKT_TX_TCP_CKSUM)
- fill the mbuf->hw_offload information: l2_len, l3_len, l4_len, mss
- calculate the pseudo header checksum and set it in the TCP header,
  as required when doing hardware TCP checksum offload
- set the IP checksum to 0

This approach seems generic enough to be used for other hw/drivers
in the future.

In the patch, the tx_desc_cksum_flags_to_olinfo() and
tx_desc_ol_flags_to_cmdtype() functions have been reworked to make them
clearer. This does not impact performance as gcc (version 4.8 in my
case) is smart enough to convert the tests into code that does not
contain any branch instruction.

validation
==========

platform:

  Tester (linux) <> DUT (DPDK)

Run testpmd on DUT:

  cd dpdk.org/
  make install T=x86_64-default-linuxapp-gcc
  cd x86_64-default-linuxapp-gcc/
  modprobe uio
  insmod kmod/igb_uio.ko
  python ../tools/igb_uio_bind.py -b igb_uio :02:00.0
  echo 0 > /proc/sys/kernel/randomize_va_space
  echo 1000 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
  echo 1000 > /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages
  mount -t hugetlbfs none /mnt/huge
  ./app/testpmd -c 0x55 -n 4 -m 800 -- -i --port-topology=chained

Disable all offload feature on Tester, and start capture:

  ethtool -K ixgbe0 rx off tx off tso off gso off gro off lro off
  ip l set ixgbe0 up
  tcpdump -n -e -i ixgbe0 -s 0 -w /tmp/cap

We use the following scapy script for testing:

  def test():
      ### IPv4
      # checksum TCP
      p=Ether()/IP(src=RandIP(), dst=RandIP())/TCP(flags=0x10)/Raw(RandString(50))
      sendp(p, iface="ixgbe0", count=5)
      # checksum UDP
      p=Ether()/IP(src=RandIP(), dst=RandIP())/UDP()/Raw(RandString(50))
      sendp(p, iface="ixgbe0", count=5)
      # bad IP checksum
      p=Ether()/IP(src=RandIP(), dst=RandIP(), chksum=0x1234)/TCP(flags=0x10)/Raw(RandString(50))
      sendp(p, iface="ixgbe0", count=5)
      # bad TCP checksum
      p=Ether()/IP(src=RandIP(), dst=RandIP())/TCP(flags=0x10, chksum=0x1234)/Raw(RandString(50))
      sendp(p, iface="ixgbe0", count=5)
      # large packet
      p=Ether()/IP(src=RandIP(), dst=RandIP())/TCP(flags=0x10)/Raw(RandString(1400))
      sendp(p, iface="ixgbe0", count=5)
      ### IPv6
      # checksum TCP
      p=Ether()/IPv6(src=RandIP6(), dst=RandIP6())/TCP(flags=0x10)/Raw(RandString(50))
      sendp(p, iface="ixgbe0", count=5)
      # checksum UDP
      p=Ether()/IPv6(src=RandIP6(), dst=RandIP6())/UDP()/Raw(RandString(50))
      sendp(p, iface="ixgbe0", count=5)
      # bad TCP checksum
      p=Ether()/IPv6(src=RandIP6(), dst=RandIP6())/TCP(flags=0x10, chksum=0x1234)/Raw(RandString(50))
      sendp(p, iface="ixgbe0", count=5)
      # large packet
      p=Ether()/IPv6(src=RandIP6(), dst=RandIP6())/TCP(flags=0x10)/Raw(RandString(1400))
      sendp(p, iface="ixgbe0", count=5)

Without hw cksum
----------------

On DUT:

  # disable hw cksum (use sw) in csumonly test, disable tso
  stop
  set fwd csum
  tx_checksum set 0x0 0
  tso set 0 0
  start

On tester:

  >>> test()

Then check the capture file.

With hw cksum
-------------

On DUT:

  # enable hw cksum in csumonly test, disable tso
  stop
  set fwd csum
  tx_checksum set 0xf 0
  tso set 0 0
  start

On tester:

  >>> test()

Then check the capture file.

With TSO
--------

On DUT:

  set fwd csum
  tx_checksum set 0xf 0
  tso set 800 0
  start

On tester:

  >>> test()

Then check the capture file.

Signed-off-by: Olivier Matz
---
 app/test-pmd/cmdline.c            |  45 +++
 app/test-pmd/config.c             |   8 ++
 app/test-pmd/csumonly.c           |  16
 app/test-pmd/testpmd.h            |   2 +
 lib/librte_mbuf/rte_mbuf.h        |   7 +-
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 165 --
 6 files changed, 200 insertions(+), 43 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index a95b279..c628773 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -2305,6 +2305,50 @@ cmdline_parse_inst_t cmd_tx_cksum_set = {
 	},
 };
+/* *** ENABLE HARDWARE SEGMENTATION IN TX PACKETS *** */
+struct cmd_tso_set_result {
+	cmdline_fixed_string_t tso;
+	cmdline_fixed_string_t set;
+	uint16_t mss;
+	uint8_t port_id;
+};
+
+static void
+cmd_tso_set_parsed(void *parsed_result,
+		__attribute__((unused)) struct cmdline *cl,
+		__attribute__((unused)) void *data)
+{
+	struct cmd_tso_set_result *res = parsed_result;
+	tso_set(res->port_id, res->mss);
+}
+
+cmdline_parse_token_string_t cmd_tso_set_tso =
+	TOKEN_STRING_INITIALIZER(struct cmd_tso_set_result,
+				tso, "tso");
+cmdline_parse_token_string_t cmd_tso_set_set =
+	TOKEN_STRING_INITIALIZER(
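
The commit log for this patch asks the caller to compute the pseudo header checksum and store it in the TCP header before transmission. A minimal sketch of that step for IPv4 could look like the following. The function names are hypothetical, the arguments are taken in host byte order to keep the example short, and whether the TCP length must be included (or left at 0 because the hardware adds it per segment) depends on the device, so it is exposed as a parameter rather than decided here.

```c
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit ones'-complement sum. */
static uint16_t fold16(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffffu) + (sum >> 16);
	return (uint16_t)sum;
}

/*
 * Ones'-complement sum of the IPv4 TCP pseudo header (RFC 793):
 * source address, destination address, zero byte + protocol, TCP length.
 * Inputs and result are in host byte order. The sum is returned
 * uncomplemented, on the assumption that the hardware finishes the
 * checksum over the TCP header and payload.
 * Pass l4_len = 0 if the device fills in the length itself (assumption).
 */
static uint16_t ipv4_tcp_phdr_sum(uint32_t src, uint32_t dst, uint16_t l4_len)
{
	uint32_t sum = 0;

	sum += src >> 16;
	sum += src & 0xffffu;
	sum += dst >> 16;
	sum += dst & 0xffffu;
	sum += 6;		/* IPPROTO_TCP, preceded by a zero byte */
	sum += l4_len;
	return fold16(sum);
}
```

The caller would convert the result to network byte order and write it into the TCP checksum field, then set the IP checksum field to 0 as listed in the commit log.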
[dpdk-dev] [PATCH RFC 10/11] testpmd: modify source address to validate checksum calculation
Always modify the source address of the packet in order to validate the
calculation of the checksums (L3 or L4). This was already done for IPv4
software checksum, add it for IPv4 hw checksum and IPv6.

Signed-off-by: Olivier Matz
---
 app/test-pmd/csumonly.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 9caad8f..e93d75f 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -310,6 +310,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 			if (tx_ol_flags & PKT_TX_IP_CKSUM) {
 				/* HW checksum */
+				ipv4_hdr->src_addr--;
 				ol_flags |= PKT_TX_IP_CKSUM;
 			}
 			else {
@@ -373,6 +374,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 					unsigned char *) + l2_len);
 			l3_len = sizeof(struct ipv6_hdr) ;
 			l4_proto = ipv6_hdr->proto;
+			ipv6_hdr->src_addr[3]--;
 			if (l4_proto == IPPROTO_UDP) {
 				udp_hdr = (struct udp_hdr*) (rte_pktmbuf_mtod(mb,
-- 
1.9.2
[dpdk-dev] [PATCH RFC 09/11] mbuf: rename vlan_macip_len in hw_offload and increase its size
To implement the TCP segmentation offload, we will need to add some more
meta information in the mbuf, like the length of the L4 header, the MSS, ...

To prepare this modification, this patch renames vlan_macip_len to
hw_offload and changes its length from 32 bits to 64 bits.

Signed-off-by: Olivier Matz
---
 app/test-pmd/csumonly.c               |  4 +--
 app/test-pmd/macfwd.c                 |  6 ++--
 app/test-pmd/rxonly.c                 |  2 +-
 app/test-pmd/testpmd.c                |  2 +-
 app/test-pmd/txonly.c                 |  6 ++--
 examples/ip_reassembly/ipv4_rsmbl.h   | 10 +++
 examples/ip_reassembly/main.c         |  4 +--
 lib/librte_mbuf/rte_mbuf.h            | 34 ++---
 lib/librte_pmd_e1000/em_rxtx.c        | 50 +--
 lib/librte_pmd_e1000/igb_rxtx.c       | 56 ---
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c     | 54 +++--
 lib/librte_pmd_ixgbe/ixgbe_rxtx.h     |  3 +-
 lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c |  4 +--
 13 files changed, 126 insertions(+), 109 deletions(-)

diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 69b90a7..9caad8f 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -430,8 +430,8 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 		}
 		/* Combine the packet header write. VLAN is not consider here */
-		mb->vlan_macip.f.l2_len = l2_len;
-		mb->vlan_macip.f.l3_len = l3_len;
+		mb->hw_offload.l2_len = l2_len;
+		mb->hw_offload.l3_len = l3_len;
 		mb->ol_flags = ol_flags;
 	}
 	nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_rx);
diff --git a/app/test-pmd/macfwd.c b/app/test-pmd/macfwd.c
index ab74d0c..d137f92 100644
--- a/app/test-pmd/macfwd.c
+++ b/app/test-pmd/macfwd.c
@@ -116,9 +116,9 @@ pkt_burst_mac_forward(struct fwd_stream *fs)
 		ether_addr_copy(&ports[fs->tx_port].eth_addr,
 				&eth_hdr->s_addr);
 		mb->ol_flags = txp->tx_ol_flags;
-		mb->vlan_macip.f.l2_len = sizeof(struct ether_hdr);
-		mb->vlan_macip.f.l3_len = sizeof(struct ipv4_hdr);
-		mb->vlan_macip.f.vlan_tci = txp->tx_vlan_id;
+		mb->hw_offload.l2_len = sizeof(struct ether_hdr);
+		mb->hw_offload.l3_len = sizeof(struct ipv4_hdr);
+		mb->hw_offload.vlan_tci = txp->tx_vlan_id;
 	}
 	nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_rx);
 	fs->tx_packets += nb_tx;
diff --git a/app/test-pmd/rxonly.c b/app/test-pmd/rxonly.c
index 0bf4440..6283482 100644
--- a/app/test-pmd/rxonly.c
+++ b/app/test-pmd/rxonly.c
@@ -149,7 +149,7 @@ pkt_burst_receive(struct fwd_stream *fs)
 				mb->hash.fdir.hash, mb->hash.fdir.id);
 		if (ol_flags & PKT_RX_VLAN_PKT)
 			printf(" - VLAN tci=0x%x",
-				mb->vlan_macip.f.vlan_tci);
+				mb->hw_offload.vlan_tci);
 		printf("\n");
 		if (ol_flags != 0) {
 			uint32_t rxf;
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 572c3aa..3085be5 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -397,7 +397,7 @@ testpmd_mbuf_ctor(struct rte_mempool *mp,
 	mb->ol_flags = 0;
 	mb->data_off = RTE_PKTMBUF_HEADROOM;
 	mb->nb_segs = 1;
-	mb->vlan_macip.data = 0;
+	mb->hw_offload.u64 = 0;
 	mb->hash.rss = 0;
 }
diff --git a/app/test-pmd/txonly.c b/app/test-pmd/txonly.c
index 5d93209..97e381a 100644
--- a/app/test-pmd/txonly.c
+++ b/app/test-pmd/txonly.c
@@ -264,9 +264,9 @@ pkt_burst_transmit(struct fwd_stream *fs)
 		pkt->nb_segs = tx_pkt_nb_segs;
 		pkt->pkt_len = tx_pkt_length;
 		pkt->ol_flags = ol_flags;
-		pkt->vlan_macip.f.vlan_tci = vlan_tci;
-		pkt->vlan_macip.f.l2_len = sizeof(struct ether_hdr);
-		pkt->vlan_macip.f.l3_len = sizeof(struct ipv4_hdr);
+		pkt->hw_offload.vlan_tci = vlan_tci;
+		pkt->hw_offload.l2_len = sizeof(struct ether_hdr);
+		pkt->hw_offload.l3_len = sizeof(struct ipv4_hdr);
 		pkts_burst[nb_pkt] = pkt;
 	}
 	nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_pkt);
diff --git a/examples/ip_reassembly/ipv4_rsmbl.h b/examples/ip_reassembly/ipv4_rsmbl.h
index 9b647fb..c653993 100644
--- a/examples/ip_reassembly/ipv4_rsmbl.h
+++ b/examples/ip_reassembly/ipv4_rsmbl.h
@@ -168,8 +168,8 @@ ipv4_frag_chain(struct rte_mbuf *mn, struct rte_mbuf *mp)
 	struct rte_mbuf *ms;
 	/* adjust start of the last fragment data. */
-	rte_pktmbuf_adj(mp, (uint16_t)(mp->vlan_macip.f.l2_len +
-		mp->vlan_macip.f.l3_len));
+	rte_pktmbuf_adj(mp
[dpdk-dev] [PATCH RFC 08/11] mbuf: change ol_flags to 32 bits
There is no room to add other offload flags in the current 16-bit
field. Since we have more room in the mbuf structure, we can change
the ol_flags to 32 bits.

A next commit will add the support of TSO (TCP Segmentation Offload),
which requires a new flag in ol_flags, justifying this commit.

Thanks to this modification, another possible improvement (which is not
part of this series) could be to change the checksum flags from:
  PKT_RX_L4_CKSUM_BAD, PKT_RX_IP_CKSUM_BAD
to:
  PKT_RX_L4_CKSUM, PKT_RX_IP_CKSUM, PKT_RX_L4_CKSUM_BAD, PKT_RX_IP_CKSUM_BAD
in order to detect if the checksum has been processed by hw or not.

Signed-off-by: Olivier Matz
---
 app/test-pmd/cmdline.c                             | 13 +++-
 app/test-pmd/config.c                              | 10 +--
 app/test-pmd/csumonly.c                            | 26
 app/test-pmd/rxonly.c                              |  4 +-
 app/test-pmd/testpmd.h                             | 11 +---
 app/test-pmd/txonly.c                              |  2 +-
 .../bsdapp/eal/include/exec-env/rte_kni_common.h   |  2 +-
 .../linuxapp/eal/include/exec-env/rte_kni_common.h |  2 +-
 lib/librte_mbuf/rte_mbuf.c                         |  2 +-
 lib/librte_mbuf/rte_mbuf.h                         | 52 +++
 lib/librte_pmd_e1000/em_rxtx.c                     | 35 +-
 lib/librte_pmd_e1000/igb_rxtx.c                    | 71 ++--
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c                  | 77 +++---
 lib/librte_pmd_ixgbe/ixgbe_rxtx.h                  |  2 +-
 14 files changed, 157 insertions(+), 152 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index c507c46..a95b279 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -2264,8 +2264,17 @@ cmd_tx_cksum_set_parsed(void *parsed_result,
 		__attribute__((unused)) void *data)
 {
 	struct cmd_tx_cksum_set_result *res = parsed_result;
-
-	tx_cksum_set(res->port_id, res->cksum_mask);
+	uint32_t ol_flags = 0;
+
+	if (res->cksum_mask & 0x1)
+		ol_flags |= PKT_TX_IP_CKSUM;
+	if (res->cksum_mask & 0x2)
+		ol_flags |= PKT_TX_TCP_CKSUM;
+	if (res->cksum_mask & 0x4)
+		ol_flags |= PKT_TX_UDP_CKSUM;
+	if (res->cksum_mask & 0x8)
+		ol_flags |= PKT_TX_SCTP_CKSUM;
+	tx_cksum_set(res->port_id, ol_flags);
 }
 cmdline_parse_token_string_t cmd_tx_cksum_set_tx_cksum =
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 1feb133..cd82f60 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -1442,14 +1442,16 @@ set_qmap(portid_t port_id, uint8_t is_rx, uint16_t queue_id, uint8_t map_value)
 }
 void
-tx_cksum_set(portid_t port_id, uint8_t cksum_mask)
+tx_cksum_set(portid_t port_id, uint32_t ol_flags)
 {
-	uint16_t tx_ol_flags;
+	uint32_t cksum_mask = PKT_TX_IP_CKSUM | PKT_TX_L4_MASK;
+
 	if (port_id_is_invalid(port_id))
 		return;
+
 	/* Clear last 4 bits and then set L3/4 checksum mask again */
-	tx_ol_flags = (uint16_t) (ports[port_id].tx_ol_flags & 0xFFF0);
-	ports[port_id].tx_ol_flags = (uint16_t) ((cksum_mask & 0xf) | tx_ol_flags);
+	ports[port_id].tx_ol_flags &= ~cksum_mask;
+	ports[port_id].tx_ol_flags |= (ol_flags & cksum_mask);
 }
 void
diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 3313b87..69b90a7 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -217,9 +217,9 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 	uint16_t nb_rx;
 	uint16_t nb_tx;
 	uint16_t i;
-	uint16_t ol_flags;
-	uint16_t pkt_ol_flags;
-	uint16_t tx_ol_flags;
+	uint32_t ol_flags;
+	uint32_t pkt_ol_flags;
+	uint32_t tx_ol_flags;
 	uint16_t l4_proto;
 	uint16_t eth_type;
 	uint8_t l2_len;
@@ -261,7 +261,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 		mb = pkts_burst[i];
 		l2_len = sizeof(struct ether_hdr);
 		pkt_ol_flags = mb->ol_flags;
-		ol_flags = (uint16_t) (pkt_ol_flags & (~PKT_TX_L4_MASK));
+		ol_flags = (pkt_ol_flags & (~PKT_TX_L4_MASK));
 		eth_hdr = rte_pktmbuf_mtod(mb, struct ether_hdr *);
 		eth_type = rte_be_to_cpu_16(eth_hdr->ether_type);
@@ -274,8 +274,8 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 		}
 		/* Update the L3/L4 checksum error packet count */
-		rx_bad_ip_csum += (uint16_t) ((pkt_ol_flags & PKT_RX_IP_CKSUM_BAD) != 0);
-		rx_bad_l4_csum += (uint16_t) ((pkt_ol_flags & PKT_RX_L4_CKSUM_BAD) != 0);
+		rx_bad_ip_csum += ((pkt_ol_flags & PKT_RX_IP_CKSUM_BAD) != 0);
+		rx_bad_l4_csum += ((pkt_ol_flags & PKT_RX_L4_CKSUM_BAD) != 0);
 		/*
 		 * Try to figure out L3 packet type by SW.
@@ -308,7 +308,7 @@ pkt_burst_checksum_forward
[dpdk-dev] [PATCH RFC 07/11] mbuf: add functions to get the name of an ol_flag
In test-pmd (rxonly.c), the code is able to dump the list of ol_flags.
The issue is that the list of flags in the application has to be
synchronized with the flags defined in rte_mbuf.h.

This patch introduces 2 new functions rte_get_rx_ol_flag_name()
and rte_get_tx_ol_flag_name() that return the name of a flag from
its mask. It also fixes rxonly.c to use these new functions and to
display the proper flags.

Signed-off-by: Olivier Matz
---
 app/test-pmd/rxonly.c      | 33 ++---
 lib/librte_mbuf/rte_mbuf.h | 46 --
 2 files changed, 54 insertions(+), 25 deletions(-)

diff --git a/app/test-pmd/rxonly.c b/app/test-pmd/rxonly.c
index 5751b0b..94f71c7 100644
--- a/app/test-pmd/rxonly.c
+++ b/app/test-pmd/rxonly.c
@@ -69,23 +69,6 @@
 #include "testpmd.h"
-#define MAX_PKT_RX_FLAGS 11
-static const char *pkt_rx_flag_names[MAX_PKT_RX_FLAGS] = {
-	"VLAN_PKT",
-	"RSS_HASH",
-	"PKT_RX_FDIR",
-	"IP_CKSUM",
-	"IP_CKSUM_BAD",
-
-	"IPV4_HDR",
-	"IPV4_HDR_EXT",
-	"IPV6_HDR",
-	"IPV6_HDR_EXT",
-
-	"IEEE1588_PTP",
-	"IEEE1588_TMST",
-};
-
 static inline void
 print_ether_addr(const char *what, struct ether_addr *eth_addr)
 {
@@ -169,12 +152,16 @@ pkt_burst_receive(struct fwd_stream *fs)
 				mb->vlan_macip.f.vlan_tci);
 		printf("\n");
 		if (ol_flags != 0) {
-			int rxf;
-
-			for (rxf = 0; rxf < MAX_PKT_RX_FLAGS; rxf++) {
-				if (ol_flags & (1 << rxf))
-					printf(" PKT_RX_%s\n",
-						pkt_rx_flag_names[rxf]);
+			uint16_t rxf;
+			const char *name;
+
+			for (rxf = 0; rxf < sizeof(mb->ol_flags) * 8; rxf++) {
+				if ((ol_flags & (1 << rxf)) == 0)
+					continue;
+				name = rte_get_rx_ol_flag_name(1 << rxf);
+				if (name == NULL)
+					continue;
+				printf(" %s\n", name);
 			}
 		}
 		rte_pktmbuf_free(mb);
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 8fa781b..55a993a 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -99,9 +99,51 @@ extern "C" {
 #define PKT_TX_IEEE1588_TMST 0x8000 /**< TX IEEE1588 packet to timestamp. */
 /**
- * Bit Mask to indicate what bits required for building TX context
+ * Get the name of a RX offload flag
+ *
+ * @param mask
+ *   The mask describing the flag (only one bit must be set)
+ * @return
+ *   The name of this flag, or NULL if it's not a valid RX flag.
  */
-#define PKT_TX_OFFLOAD_MASK (PKT_TX_VLAN_PKT | PKT_TX_IP_CKSUM | PKT_TX_L4_MASK)
+static inline const char *rte_get_rx_ol_flag_name(uint16_t mask)
+{
+	switch (mask) {
+	case PKT_RX_VLAN_PKT: return "PKT_RX_VLAN_PKT";
+	case PKT_RX_RSS_HASH: return "PKT_RX_RSS_HASH";
+	case PKT_RX_FDIR: return "PKT_RX_FDIR";
+	case PKT_RX_L4_CKSUM_BAD: return "PKT_RX_L4_CKSUM_BAD";
+	case PKT_RX_IP_CKSUM_BAD: return "PKT_RX_IP_CKSUM_BAD";
+	case PKT_RX_IPV4_HDR: return "PKT_RX_IPV4_HDR";
+	case PKT_RX_IPV4_HDR_EXT: return "PKT_RX_IPV4_HDR_EXT";
+	case PKT_RX_IPV6_HDR: return "PKT_RX_IPV6_HDR";
+	case PKT_RX_IPV6_HDR_EXT: return "PKT_RX_IPV6_HDR_EXT";
+	case PKT_RX_IEEE1588_PTP: return "PKT_RX_IEEE1588_PTP";
+	case PKT_RX_IEEE1588_TMST: return "PKT_RX_IEEE1588_TMST";
+	default: return NULL;
+	}
+}
+
+/**
+ * Get the name of a TX offload flag
+ *
+ * @param mask
+ *   The mask describing the flag (only one bit must be set)
+ * @return
+ *   The name of this flag, or NULL if it's not a valid TX flag.
+ */
+static inline const char *rte_get_tx_ol_flag_name(uint16_t mask)
+{
+	switch (mask) {
+	case PKT_TX_VLAN_PKT: return "PKT_TX_VLAN_PKT";
+	case PKT_TX_IP_CKSUM: return "PKT_TX_IP_CKSUM";
+	case PKT_TX_TCP_CKSUM: return "PKT_TX_TCP_CKSUM";
+	case PKT_TX_SCTP_CKSUM: return "PKT_TX_SCTP_CKSUM";
+	case PKT_TX_UDP_CKSUM: return "PKT_TX_UDP_CKSUM";
+	case PKT_TX_IEEE1588_TMST: return "PKT_TX_IEEE1588_TMST";
+	default: return NULL;
+	}
+}
 /** Offload features */
 union rte_vlan_macip {
-- 
1.9.2
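
To show the lookup pattern from this patch outside of testpmd, here is a small standalone sketch: a switch-based name function plus a loop that walks every bit of a 16-bit flag word and skips bits that have no name. The DEMO_* flag values and function names are made up for the example and do not match the real PKT_RX_* definitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical flag values, for illustration only. */
#define DEMO_RX_VLAN_PKT 0x0001
#define DEMO_RX_RSS_HASH 0x0002

/* Return the name of a single-bit flag, or NULL if it is unknown. */
static const char *demo_rx_flag_name(uint16_t mask)
{
	switch (mask) {
	case DEMO_RX_VLAN_PKT: return "DEMO_RX_VLAN_PKT";
	case DEMO_RX_RSS_HASH: return "DEMO_RX_RSS_HASH";
	default: return NULL;
	}
}

/* Print every known flag set in ol_flags; return how many were printed. */
static unsigned demo_dump_rx_flags(uint16_t ol_flags)
{
	unsigned bit, n = 0;

	for (bit = 0; bit < sizeof(ol_flags) * 8; bit++) {
		uint16_t mask = (uint16_t)(1u << bit);
		const char *name;

		if ((ol_flags & mask) == 0)
			continue;
		name = demo_rx_flag_name(mask);
		if (name == NULL)
			continue;	/* bit is set but is not a known flag */
		printf("  %s\n", name);
		n++;
	}
	return n;
}
```

Because the names live next to the flag definitions, an application written this way does not need its own table that can drift out of sync.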
[dpdk-dev] [PATCH RFC 06/11] mbuf: replace data pointer by an offset
The mbuf structure already contains a pointer to the beginning of the
buffer (m->buf_addr). It is not needed to use 8 bytes again to store
another pointer to the beginning of the data.

Using a 16-bit unsigned integer is enough as we know that a mbuf is
never longer than 64KB. We gain 6 bytes in the structure thanks to
this modification.

Signed-off-by: Olivier Matz
---
 app/test-pmd/csumonly.c               |  2 +-
 app/test-pmd/macfwd-retry.c           |  2 +-
 app/test-pmd/macfwd.c                 |  2 +-
 app/test-pmd/rxonly.c                 |  2 +-
 app/test-pmd/testpmd.c                |  2 +-
 app/test-pmd/txonly.c                 |  7 ++--
 app/test/test_mbuf.c                  |  6 ++--
 examples/exception_path/main.c        |  3 +-
 examples/vhost/main.c                 | 21 +++-
 examples/vhost_xen/main.c             |  2 +-
 lib/librte_mbuf/rte_mbuf.c            |  7 ++--
 lib/librte_mbuf/rte_mbuf.h            | 62 ---
 lib/librte_pmd_e1000/em_rxtx.c        | 12 +++
 lib/librte_pmd_e1000/igb_rxtx.c       | 13
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c     | 13
 lib/librte_pmd_ixgbe/ixgbe_rxtx.h     |  3 +-
 lib/librte_pmd_virtio/virtio_rxtx.c   |  2 +-
 lib/librte_pmd_virtio/virtqueue.h     |  5 ++-
 lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c |  5 ++-
 19 files changed, 85 insertions(+), 86 deletions(-)

diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index ee82eb6..3313b87 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -263,7 +263,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 		pkt_ol_flags = mb->ol_flags;
 		ol_flags = (uint16_t) (pkt_ol_flags & (~PKT_TX_L4_MASK));
-		eth_hdr = (struct ether_hdr *) mb->data;
+		eth_hdr = rte_pktmbuf_mtod(mb, struct ether_hdr *);
 		eth_type = rte_be_to_cpu_16(eth_hdr->ether_type);
 		if (eth_type == ETHER_TYPE_VLAN) {
 			/* Only allow single VLAN label here */
diff --git a/app/test-pmd/macfwd-retry.c b/app/test-pmd/macfwd-retry.c
index 687ff8d..7749c9e 100644
--- a/app/test-pmd/macfwd-retry.c
+++ b/app/test-pmd/macfwd-retry.c
@@ -119,7 +119,7 @@ pkt_burst_mac_retry_forward(struct fwd_stream *fs)
 	fs->rx_packets += nb_rx;
 	for (i = 0; i < nb_rx; i++) {
 		mb = pkts_burst[i];
-		eth_hdr = (struct ether_hdr *) mb->data;
+		eth_hdr = rte_pktmbuf_mtod(mb, struct ether_hdr *);
 		ether_addr_copy(&peer_eth_addrs[fs->peer_addr],
 				&eth_hdr->d_addr);
 		ether_addr_copy(&ports[fs->tx_port].eth_addr,
diff --git a/app/test-pmd/macfwd.c b/app/test-pmd/macfwd.c
index 8d7612c..ab74d0c 100644
--- a/app/test-pmd/macfwd.c
+++ b/app/test-pmd/macfwd.c
@@ -110,7 +110,7 @@ pkt_burst_mac_forward(struct fwd_stream *fs)
 	txp = &ports[fs->tx_port];
 	for (i = 0; i < nb_rx; i++) {
 		mb = pkts_burst[i];
-		eth_hdr = (struct ether_hdr *) mb->data;
+		eth_hdr = rte_pktmbuf_mtod(mb, struct ether_hdr *);
 		ether_addr_copy(&peer_eth_addrs[fs->peer_addr],
 				&eth_hdr->d_addr);
 		ether_addr_copy(&ports[fs->tx_port].eth_addr,
diff --git a/app/test-pmd/rxonly.c b/app/test-pmd/rxonly.c
index b77c8ce..5751b0b 100644
--- a/app/test-pmd/rxonly.c
+++ b/app/test-pmd/rxonly.c
@@ -149,7 +149,7 @@ pkt_burst_receive(struct fwd_stream *fs)
 			rte_pktmbuf_free(mb);
 			continue;
 		}
-		eth_hdr = (struct ether_hdr *) mb->data;
+		eth_hdr = rte_pktmbuf_mtod(mb, struct ether_hdr *);
 		eth_type = RTE_BE_TO_CPU_16(eth_hdr->ether_type);
 		ol_flags = mb->ol_flags;
 		print_ether_addr(" src=", &eth_hdr->s_addr);
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 1964020..572c3aa 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -395,7 +395,7 @@ testpmd_mbuf_ctor(struct rte_mempool *mp,
 			mb_ctor_arg->seg_buf_offset);
 	mb->buf_len = mb_ctor_arg->seg_buf_size;
 	mb->ol_flags = 0;
-	mb->data = (char *) mb->buf_addr + RTE_PKTMBUF_HEADROOM;
+	mb->data_off = RTE_PKTMBUF_HEADROOM;
 	mb->nb_segs = 1;
 	mb->vlan_macip.data = 0;
 	mb->hash.rss = 0;
diff --git a/app/test-pmd/txonly.c b/app/test-pmd/txonly.c
index 3baa0c8..c28f3dd 100644
--- a/app/test-pmd/txonly.c
+++ b/app/test-pmd/txonly.c
@@ -111,13 +111,13 @@ copy_buf_to_pkt_segs(void* buf, unsigned len, struct rte_mbuf *pkt,
 		seg = seg->next;
 	}
 	copy_len = seg->data_len - offset;
-	seg_buf = ((char *) seg->data + offset);
+	seg_buf = (rte_pktmbuf_mtod(seg, char *) + offset);
 	while (len > copy_len) {
 		rte_memcpy(seg_buf, buf, (size_t) copy_len);
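
The idea behind this patch, replacing a stored data pointer with a 16-bit offset from buf_addr, can be sketched in isolation as follows. The struct demo_mbuf, the DEMO_MTOD macro and demo_prepend() are simplified stand-ins written for this example; they only mirror the shape of rte_mbuf and rte_pktmbuf_mtod() and are not the library definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reduced, hypothetical mbuf: only the fields needed for the example. */
struct demo_mbuf {
	void *buf_addr;		/* start of the segment buffer */
	uint16_t data_off;	/* offset of the packet data in the buffer */
};

/* Recompute the data pointer on demand instead of storing it. */
#define DEMO_MTOD(m, type) ((type)((char *)(m)->buf_addr + (m)->data_off))

/*
 * Move the start of data back by len bytes (e.g. to add a header).
 * Returns the new data pointer, or NULL if the headroom is too small.
 */
static char *demo_prepend(struct demo_mbuf *m, uint16_t len)
{
	assert(m != NULL);
	if (len > m->data_off)
		return NULL;
	m->data_off = (uint16_t)(m->data_off - len);
	return DEMO_MTOD(m, char *);
}
```

The pointer is now one addition away from buf_addr, which is the trade made for the 6 bytes saved per mbuf.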
[dpdk-dev] [PATCH RFC 05/11] mbuf: merge physaddr and buf_len in a bitfield
The physical address is never greater than (1 << 48) = 256 TB.
We can save 2 bytes in the mbuf structure by merging the physical
address and the buffer length in the same bitfield.

Signed-off-by: Olivier Matz
---
 lib/librte_mbuf/rte_mbuf.c | 3 ++-
 lib/librte_mbuf/rte_mbuf.h | 7 ---
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index c229525..9879095 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -104,7 +104,8 @@ rte_pktmbuf_init(struct rte_mempool *mp,
 	m->buf_len = (uint16_t)buf_len;
 	/* keep some headroom between start of buffer and data */
-	m->data = (char*) m->buf_addr + RTE_MIN(RTE_PKTMBUF_HEADROOM, m->buf_len);
+	m->data = (char*) m->buf_addr + RTE_MIN(RTE_PKTMBUF_HEADROOM,
+		(uint16_t)m->buf_len);
 	/* init some constant fields */
 	m->pool = mp;
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 803b223..275f6b2 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -130,8 +130,8 @@ union rte_vlan_macip {
 struct rte_mbuf {
 	struct rte_mempool *pool; /**< Pool from which mbuf was allocated. */
 	void *buf_addr;           /**< Virtual address of segment buffer. */
-	phys_addr_t buf_physaddr; /**< Physical address of segment buffer. */
-	uint16_t buf_len;         /**< Length of segment buffer. */
+	uint64_t buf_physaddr:48; /**< Physical address of segment buffer. */
+	uint64_t buf_len:16;      /**< Length of segment buffer. */
 #ifdef RTE_MBUF_REFCNT
 	/**
 	 * 16-bit Reference counter.
@@ -148,8 +148,9 @@ struct rte_mbuf {
 #else
 	uint16_t refcnt_reserved; /**< Do not use this field */
 #endif
-	uint16_t reserved; /**< Unused field. Required for padding. */
+	uint16_t ol_flags;/**< Offload features. */
+	uint32_t reserved; /**< Unused field. Required for padding. */
 	/* valid for any segment */
 	struct rte_mbuf *next; /**< Next segment of scattered packet. */
-- 
1.9.2
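
For readers who want to see the packing in isolation, the sketch below puts a 48-bit address and a 16-bit length into one 64-bit word, the same layout this patch gives buf_physaddr and buf_len. The struct and helper names are hypothetical. Note that uint64_t bit-fields are implementation-defined in standard C; gcc and clang accept them, which is what the example assumes.

```c
#include <stdint.h>

/* Hypothetical packed layout: 48-bit physical address + 16-bit length. */
struct demo_buf_info {
	uint64_t buf_physaddr:48;
	uint64_t buf_len:16;
};

/* Store both values; address bits above bit 47 are silently dropped. */
static void demo_buf_set(struct demo_buf_info *b, uint64_t pa, uint16_t len)
{
	b->buf_physaddr = pa;
	b->buf_len = len;
}

/* Reading the address is where the compiler emits the mask or shift. */
static uint64_t demo_buf_physaddr(const struct demo_buf_info *b)
{
	return b->buf_physaddr;
}
```

A bit-field cannot be passed to sizeof or typeof, which may be why the patch adds an explicit (uint16_t) cast inside RTE_MIN; that reading is a guess, not something the commit log states.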
[dpdk-dev] [PATCH RFC 04/11] mbuf: remove the rte_pktmbuf structure
The rte_pktmbuf structure was initially included in the rte_mbuf
structure. This was needed when there were 2 types of mbuf (ctrl and
packet). As the control mbuf has been removed, we can merge the
rte_pktmbuf into the rte_mbuf structure.

Advantages of doing this:
- the access to mbuf fields is easier (ex: m->data instead of m->pkt.data)
- make the structure more consistent: for instance, there was no reason
  to have the ol_flags field in rte_mbuf
- it will allow a deeper reorganization of the rte_mbuf structure in the
  next commits, allowing to gain several bytes in it

Signed-off-by: Olivier Matz
---
 app/test-pmd/cmdline.c                     |   1 -
 app/test-pmd/csumonly.c                    |   6 +-
 app/test-pmd/ieee1588fwd.c                 |   6 +-
 app/test-pmd/macfwd-retry.c                |   2 +-
 app/test-pmd/macfwd.c                      |   8 +-
 app/test-pmd/rxonly.c                      |  12 +-
 app/test-pmd/testpmd.c                     |   8 +-
 app/test-pmd/testpmd.h                     |   2 +-
 app/test-pmd/txonly.c                      |  42 +++
 app/test/commands.c                        |   1 -
 app/test/test_mbuf.c                       |  12 +-
 app/test/test_sched.c                      |   4 +-
 examples/dpdk_qat/crypto.c                 |  22 ++--
 examples/dpdk_qat/main.c                   |   2 +-
 examples/exception_path/main.c             |  10 +-
 examples/ip_reassembly/ipv4_rsmbl.h        |  20 +--
 examples/ip_reassembly/main.c              |   6 +-
 examples/ipv4_frag/main.c                  |   4 +-
 examples/ipv4_frag/rte_ipv4_frag.h         |  42 +++
 examples/ipv4_multicast/main.c             |  14 +--
 examples/l3fwd-power/main.c                |   2 +-
 examples/l3fwd-vf/main.c                   |   2 +-
 examples/l3fwd/main.c                      |  10 +-
 examples/load_balancer/runtime.c           |   2 +-
 .../client_server_mp/mp_client/client.c    |   2 +-
 examples/quota_watermark/qw/main.c         |   4 +-
 examples/vhost/main.c                      |  22 ++--
 examples/vhost_xen/main.c                  |  22 ++--
 lib/librte_mbuf/rte_mbuf.c                 |  26 ++--
 lib/librte_mbuf/rte_mbuf.h                 | 140 ++---
 lib/librte_pmd_e1000/em_rxtx.c             |  64 +-
 lib/librte_pmd_e1000/igb_rxtx.c            |  68 +-
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c          | 100 +++
 lib/librte_pmd_ixgbe/ixgbe_rxtx.h          |   2 +-
 lib/librte_pmd_pcap/rte_eth_pcap.c         |  14 +--
 lib/librte_pmd_virtio/virtio_rxtx.c        |  16 +--
 lib/librte_pmd_virtio/virtqueue.h          |   6 +-
 lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c      |  26 ++--
 lib/librte_pmd_xenvirt/rte_eth_xenvirt.c   |  12 +-
 lib/librte_pmd_xenvirt/virtqueue.h         |   4 +-
 lib/librte_sched/rte_sched.c               |  14 +--
 lib/librte_sched/rte_sched.h               |  10 +-
 42 files changed, 394 insertions(+), 398 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index e3d1849..c507c46 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -5009,7 +5009,6 @@ dump_struct_sizes(void)
 {
 #define DUMP_SIZE(t) printf("sizeof(" #t ") = %u\n", (unsigned)sizeof(t));
 	DUMP_SIZE(struct rte_mbuf);
-	DUMP_SIZE(struct rte_pktmbuf);
 	DUMP_SIZE(struct rte_mempool);
 	DUMP_SIZE(struct rte_ring);
 #undef DUMP_SIZE
diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 3568ba0..ee82eb6 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -263,7 +263,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 		pkt_ol_flags = mb->ol_flags;
 		ol_flags = (uint16_t) (pkt_ol_flags & (~PKT_TX_L4_MASK));
-		eth_hdr = (struct ether_hdr *) mb->pkt.data;
+		eth_hdr = (struct ether_hdr *) mb->data;
 		eth_type = rte_be_to_cpu_16(eth_hdr->ether_type);
 		if (eth_type == ETHER_TYPE_VLAN) {
 			/* Only allow single VLAN label here */
@@ -430,8 +430,8 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 		}
 		/* Combine the packet header write. VLAN is not consider here */
-		mb->pkt.vlan_macip.f.l2_len = l2_len;
-		mb->pkt.vlan_macip.f.l3_len = l3_len;
+		mb->vlan_macip.f.l2_len = l2_len;
+		mb->vlan_macip.f.l3_len = l3_len;
 		mb->ol_flags = ol_flags;
 	}
 	nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_rx);
diff --git a/app/test-pmd/ieee1588fwd.c b/app/test-pmd/ieee1588fwd.c
index 44f0a89..4f18183 10
[dpdk-dev] [PATCH RFC 03/11] mbuf: remove rte_ctrlmbuf
The initial role of rte_ctrlmbuf is to carry generic messages (data
pointer + data length) but it's not used by the DPDK or its
applications. Keeping it implies:
- losing 1 byte in the rte_mbuf structure
- having some dead code in rte_mbuf.[ch]

This patch removes this feature. Thanks to it, it is now possible to
simplify the rte_mbuf structure by merging the rte_pktmbuf structure
in it. This is done in next commit.

Signed-off-by: Olivier Matz
---
 app/test-pmd/cmdline.c                   |   1 -
 app/test-pmd/testpmd.c                   |   2 -
 app/test-pmd/txonly.c                    |   2 +-
 app/test/commands.c                      |   1 -
 app/test/test_mbuf.c                     |  72 +
 examples/ipv4_multicast/main.c           |   2 +-
 lib/librte_mbuf/rte_mbuf.c               |  65 +++-
 lib/librte_mbuf/rte_mbuf.h               | 175 ++-
 lib/librte_pmd_e1000/em_rxtx.c           |   2 +-
 lib/librte_pmd_e1000/igb_rxtx.c          |   2 +-
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c        |   4 +-
 lib/librte_pmd_virtio/virtio_rxtx.c      |   2 +-
 lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c    |   2 +-
 lib/librte_pmd_xenvirt/rte_eth_xenvirt.c |   2 +-
 14 files changed, 54 insertions(+), 280 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 7becedc..e3d1849 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -5010,7 +5010,6 @@ dump_struct_sizes(void)
 #define DUMP_SIZE(t) printf("sizeof(" #t ") = %u\n", (unsigned)sizeof(t));
 	DUMP_SIZE(struct rte_mbuf);
 	DUMP_SIZE(struct rte_pktmbuf);
-	DUMP_SIZE(struct rte_ctrlmbuf);
 	DUMP_SIZE(struct rte_mempool);
 	DUMP_SIZE(struct rte_ring);
 #undef DUMP_SIZE
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 9c56914..76b3823 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -389,13 +389,11 @@ testpmd_mbuf_ctor(struct rte_mempool *mp,
 	mb_ctor_arg = (struct mbuf_ctor_arg *) opaque_arg;
 	mb = (struct rte_mbuf *) raw_mbuf;
-	mb->type = RTE_MBUF_PKT;
 	mb->pool = mp;
 	mb->buf_addr = (void *) ((char *)mb + mb_ctor_arg->seg_buf_offset);
 	mb->buf_physaddr = (uint64_t) (rte_mempool_virt2phy(mp, mb) +
 			mb_ctor_arg->seg_buf_offset);
 	mb->buf_len = mb_ctor_arg->seg_buf_size;
-	mb->type = RTE_MBUF_PKT;
 	mb->ol_flags = 0;
 	mb->pkt.data = (char *) mb->buf_addr + RTE_PKTMBUF_HEADROOM;
 	mb->pkt.nb_segs = 1;
diff --git a/app/test-pmd/txonly.c b/app/test-pmd/txonly.c
index 1cf2574..1f066d0 100644
--- a/app/test-pmd/txonly.c
+++ b/app/test-pmd/txonly.c
@@ -93,7 +93,7 @@ tx_mbuf_alloc(struct rte_mempool *mp)
 	struct rte_mbuf *m;
 	m = __rte_mbuf_raw_alloc(mp);
-	__rte_mbuf_sanity_check_raw(m, RTE_MBUF_PKT, 0);
+	__rte_mbuf_sanity_check_raw(m, 0);
 	return (m);
 }
diff --git a/app/test/commands.c b/app/test/commands.c
index b145036..c69544b 100644
--- a/app/test/commands.c
+++ b/app/test/commands.c
@@ -262,7 +262,6 @@ dump_struct_sizes(void)
 #define DUMP_SIZE(t) printf("sizeof(" #t ") = %u\n", (unsigned)sizeof(t));
 	DUMP_SIZE(struct rte_mbuf);
 	DUMP_SIZE(struct rte_pktmbuf);
-	DUMP_SIZE(struct rte_ctrlmbuf);
 	DUMP_SIZE(struct rte_mempool);
 	DUMP_SIZE(struct rte_ring);
 #undef DUMP_SIZE
diff --git a/app/test/test_mbuf.c b/app/test/test_mbuf.c
index fe0f4f6..07b5551 100644
--- a/app/test/test_mbuf.c
+++ b/app/test/test_mbuf.c
@@ -80,7 +80,6 @@
 #define MAKE_STRING(x) # x
 static struct rte_mempool *pktmbuf_pool = NULL;
-static struct rte_mempool *ctrlmbuf_pool = NULL;
 #if defined RTE_MBUF_REFCNT && defined RTE_MBUF_REFCNT_ATOMIC
@@ -272,8 +271,8 @@ test_one_pktmbuf(void)
 		GOTO_FAIL("Buffer should be continuous");
 	memset(hdr, 0x55, MBUF_TEST_HDR2_LEN);
-	rte_mbuf_sanity_check(m, RTE_MBUF_PKT, 1);
-	rte_mbuf_sanity_check(m, RTE_MBUF_PKT, 0);
+	rte_mbuf_sanity_check(m, 1);
+	rte_mbuf_sanity_check(m, 0);
 	rte_pktmbuf_dump(m, 0);
 	/* this prepend should fail */
@@ -320,48 +319,6 @@ fail:
 	return -1;
 }
-/*
- * test control mbuf
- */
-static int
-test_one_ctrlmbuf(void)
-{
-	struct rte_mbuf *m = NULL;
-	char message[] = "This is a message carried by a ctrlmbuf";
-
-	printf("Test ctrlmbuf API\n");
-
-	/* alloc a mbuf */
-
-	m = rte_ctrlmbuf_alloc(ctrlmbuf_pool);
-	if (m == NULL)
-		GOTO_FAIL("Cannot allocate mbuf");
-	if (rte_ctrlmbuf_len(m) != 0)
-		GOTO_FAIL("Bad length");
-
-	/* set data */
-	rte_ctrlmbuf_data(m) = &message;
-	rte_ctrlmbuf_len(m) = sizeof(message);
-
-	/* read data */
-	if (rte_ctrlmbuf_data(m) != message)
-		GOTO_FAIL("Invalid data pointer");
-	if (rte_ctrlmbuf_len(m) != sizeof(message))
-		GOTO_FAIL("Inv
[dpdk-dev] [PATCH RFC 02/11] mbuf: rename RTE_MBUF_SCATTER_GATHER into RTE_MBUF_REFCNT
It seems that RTE_MBUF_SCATTER_GATHER is not the proper name for the feature it provides. "Scatter gather" means that data is stored using several buffers. RTE_MBUF_REFCNT seems to be a better name for that feature as it provides a reference counter for mbufs. The macro RTE_MBUF_SCATTER_GATHER is poisoned to ensure this modification is seen by drivers or applications using it. Signed-off-by: Olivier Matz --- app/test/test_mbuf.c | 16 +++--- config/defconfig_i686-default-linuxapp-gcc | 2 +- config/defconfig_i686-default-linuxapp-icc | 2 +- config/defconfig_x86_64-default-bsdapp-gcc | 2 +- config/defconfig_x86_64-default-linuxapp-gcc | 2 +- config/defconfig_x86_64-default-linuxapp-icc | 2 +- doc/doxy-api.conf| 2 +- examples/ipv4_frag/Makefile | 4 ++-- examples/ipv4_multicast/Makefile | 4 ++-- lib/librte_mbuf/rte_mbuf.c | 2 +- lib/librte_mbuf/rte_mbuf.h | 31 +++- 11 files changed, 36 insertions(+), 33 deletions(-) diff --git a/app/test/test_mbuf.c b/app/test/test_mbuf.c index f443734..fe0f4f6 100644 --- a/app/test/test_mbuf.c +++ b/app/test/test_mbuf.c @@ -82,7 +82,7 @@ static struct rte_mempool *pktmbuf_pool = NULL; static struct rte_mempool *ctrlmbuf_pool = NULL; -#if defined RTE_MBUF_SCATTER_GATHER && defined RTE_MBUF_REFCNT_ATOMIC +#if defined RTE_MBUF_REFCNT && defined RTE_MBUF_REFCNT_ATOMIC static struct rte_mempool *refcnt_pool = NULL; static struct rte_ring *refcnt_mbuf_ring = NULL; @@ -365,7 +365,7 @@ fail: static int testclone_testupdate_testdetach(void) { -#ifndef RTE_MBUF_SCATTER_GATHER +#ifndef RTE_MBUF_REFCNT return 0; #else struct rte_mbuf *mc = NULL; @@ -406,7 +406,7 @@ fail: if (mc) rte_pktmbuf_free(mc); return -1; -#endif /* RTE_MBUF_SCATTER_GATHER */ +#endif /* RTE_MBUF_REFCNT */ } #undef GOTO_FAIL @@ -439,7 +439,7 @@ test_pktmbuf_pool(void) printf("Error pool not empty"); ret = -1; } -#ifdef RTE_MBUF_SCATTER_GATHER +#ifdef RTE_MBUF_REFCNT extra = rte_pktmbuf_clone(m[0], pktmbuf_pool); if(extra != NULL) { printf("Error pool not empty"); @@ -548,11 
+548,11 @@ test_pktmbuf_free_segment(void) /* * Stress test for rte_mbuf atomic refcnt. * Implies that: - * RTE_MBUF_SCATTER_GATHER and RTE_MBUF_REFCNT_ATOMIC are both defined. + * RTE_MBUF_REFCNT and RTE_MBUF_REFCNT_ATOMIC are both defined. * For more efficency, recomended to run with RTE_LIBRTE_MBUF_DEBUG defined. */ -#if defined RTE_MBUF_SCATTER_GATHER && defined RTE_MBUF_REFCNT_ATOMIC +#if defined RTE_MBUF_REFCNT && defined RTE_MBUF_REFCNT_ATOMIC static int test_refcnt_slave(__attribute__((unused)) void *arg) @@ -657,7 +657,7 @@ test_refcnt_master(void) static int test_refcnt_mbuf(void) { -#if defined RTE_MBUF_SCATTER_GATHER && defined RTE_MBUF_REFCNT_ATOMIC +#if defined RTE_MBUF_REFCNT && defined RTE_MBUF_REFCNT_ATOMIC unsigned lnum, master, slave, tref; @@ -808,7 +808,7 @@ test_failing_mbuf_sanity_check(void) return -1; } -#ifdef RTE_MBUF_SCATTER_GATHER +#ifdef RTE_MBUF_REFCNT badbuf = *buf; badbuf.refcnt = 0; if (verify_mbuf_check_panics(&badbuf)) { diff --git a/config/defconfig_i686-default-linuxapp-gcc b/config/defconfig_i686-default-linuxapp-gcc index 14bd3d1..dd0f0d0 100644 --- a/config/defconfig_i686-default-linuxapp-gcc +++ b/config/defconfig_i686-default-linuxapp-gcc @@ -235,7 +235,7 @@ CONFIG_RTE_LIBRTE_MEMPOOL_DEBUG=n # CONFIG_RTE_LIBRTE_MBUF=y CONFIG_RTE_LIBRTE_MBUF_DEBUG=n -CONFIG_RTE_MBUF_SCATTER_GATHER=y +CONFIG_RTE_MBUF_REFCNT=y CONFIG_RTE_MBUF_REFCNT_ATOMIC=y CONFIG_RTE_PKTMBUF_HEADROOM=128 diff --git a/config/defconfig_i686-default-linuxapp-icc b/config/defconfig_i686-default-linuxapp-icc index ec3386e..ef11051 100644 --- a/config/defconfig_i686-default-linuxapp-icc +++ b/config/defconfig_i686-default-linuxapp-icc @@ -234,7 +234,7 @@ CONFIG_RTE_LIBRTE_MEMPOOL_DEBUG=n # CONFIG_RTE_LIBRTE_MBUF=y CONFIG_RTE_LIBRTE_MBUF_DEBUG=n -CONFIG_RTE_MBUF_SCATTER_GATHER=y +CONFIG_RTE_MBUF_REFCNT=y CONFIG_RTE_MBUF_REFCNT_ATOMIC=y CONFIG_RTE_PKTMBUF_HEADROOM=128 diff --git a/config/defconfig_x86_64-default-bsdapp-gcc 
b/config/defconfig_x86_64-default-bsdapp-gcc index d960e1d..f5f2140 100644 --- a/config/defconfig_x86_64-default-bsdapp-gcc +++ b/config/defconfig_x86_64-default-bsdapp-gcc @@ -210,7 +210,7 @@ CONFIG_RTE_LIBRTE_MEMPOOL_DEBUG=n # CONFIG_RTE_LIBRTE_MBUF=y CONFIG_RTE_LIBRTE_MBUF_DEBUG=n -CONFIG_RTE_MBUF_SCATTER_GATHER=y +CONFIG_RTE_MBUF_REFCNT=y CONFIG_RTE_MBUF_REFCNT_ATOMIC=y CONFIG_RTE_PKTMBUF_HEADROOM=128 diff --git a/config/defconfig_x86_64-default-linuxapp-gcc b/config/defconfig_x86_64-default-linuxapp-gcc index f11ffbf..25a7e1a 100644 --- a/config/defconfig_x86_64-default-linuxapp-gcc +++ b/config/defconfig_x86_64-def
[dpdk-dev] [PATCH RFC 01/11] igb/ixgbe: fix IP checksum calculation
According to Intel® 82599 10 GbE Controller Datasheet (Table 7-38), both L2 and L3 lengths are needed to offload the IP checksum.

Note that the e1000 driver does not need to be patched as it already contains the fix.

Signed-off-by: Olivier Matz
---
 lib/librte_pmd_e1000/igb_rxtx.c   | 2 +-
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/librte_pmd_e1000/igb_rxtx.c b/lib/librte_pmd_e1000/igb_rxtx.c
index 4608595..b3c8149 100644
--- a/lib/librte_pmd_e1000/igb_rxtx.c
+++ b/lib/librte_pmd_e1000/igb_rxtx.c
@@ -233,7 +233,7 @@ igbe_set_xmit_ctx(struct igb_tx_queue* txq,
 	if (ol_flags & PKT_TX_IP_CKSUM) {
 		type_tucmd_mlhl = E1000_ADVTXD_TUCMD_IPV4;
-		cmp_mask |= TX_MAC_LEN_CMP_MASK;
+		cmp_mask |= TX_MACIP_LEN_CMP_MASK;
 	}
 	/* Specify which HW CTX to upload. */
diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
index 55414b9..4e307c2 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
@@ -367,7 +367,7 @@ ixgbe_set_xmit_ctx(struct igb_tx_queue* txq,
 	if (ol_flags & PKT_TX_IP_CKSUM) {
 		type_tucmd_mlhl = IXGBE_ADVTXD_TUCMD_IPV4;
-		cmp_mask |= TX_MAC_LEN_CMP_MASK;
+		cmp_mask |= TX_MACIP_LEN_CMP_MASK;
 	}
 	/* Specify which HW CTX to upload. */
-- 
1.9.2
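Why the comparison mask matters can be shown with a small standalone sketch. It assumes that cmp_mask selects which offload fields are compared when the driver decides whether a previously programmed hardware context can be reused. The bit layout, the SK_* mask values and both helper names are illustrative assumptions, not copied from the igb or ixgbe sources.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative packing of the header lengths into one word:
 * bits 15..9 = L2 (MAC) length, bits 8..0 = L3 (IP) length.
 * These values are assumptions made for the sketch. */
#define SK_MAC_LEN_MASK   0x0000FE00u
#define SK_IP_LEN_MASK    0x000001FFu
#define SK_MACIP_LEN_MASK (SK_MAC_LEN_MASK | SK_IP_LEN_MASK)

/* Pack L2 and L3 header lengths into the illustrative layout. */
static uint32_t pack_lengths(unsigned l2_len, unsigned l3_len)
{
	assert(l2_len < 128 && l3_len < 512);
	return ((uint32_t)l2_len << 9) | (uint32_t)l3_len;
}

/* Return 1 when the cached context is considered equal to the wanted
 * one, looking only at the bits selected by cmp_mask. */
static int ctx_matches(uint32_t cached, uint32_t wanted, uint32_t cmp_mask)
{
	return (cached & cmp_mask) == (wanted & cmp_mask);
}
```

Under these assumptions, a context cached for a 14-byte L2 header and a 20-byte IP header still "matches" a packet with a 24-byte IP header when only the MAC length is compared, so a stale L3 length would be used. Comparing both lengths rejects it.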
[dpdk-dev] [PATCH RFC 00/11] ixgbe/mbuf: add TSO support
This series adds TSO support to the ixgbe DPDK driver. As discussed previously on the list [1], one problem is that there is not enough room in rte_mbuf today to store the information required to implement this feature:
- a new ol_flag
- the MSS
- the L4 header len

A solution would be to increase the size of the mbuf to 2 cache lines, but it could have a bad impact on performance. This series proposes some rework to drastically reduce the size of the rte_mbuf structures before implementing TSO, which avoids changing the mbuf size to 128 bytes.

After the rework of the mbuf structures, the size of the rte_mbuf structure is reduced by 9 bytes. The implementation of TSO requires doubling the size of ol_flags (16 to 32 bits) and doubling the size of the offload information in order to add the MSS and the L4 header length (32 to 64 bits). At the end of the whole series, sizeof(rte_mbuf) is still 64 bytes and 4 bytes are available for future use.

This rework causes a lot of modifications in the mbuf structure, implying some changes in the applications that directly use the mbuf structure fields instead of using the API functions (sometimes there is no function). That's why this series is an RFC. In my opinion, it's the proper moment for this evolution as the 1.7.0 window is open.

About TSO, the new fields in mbuf try to be generic enough to apply to other hardware in the future.
To delegate the TCP segmentation to the hardware, the user has to:
- set the PKT_TX_TCP_SEG flag in mbuf->ol_flags (this flag implies PKT_TX_IP_CKSUM and PKT_TX_TCP_CKSUM)
- fill the mbuf->hw_offload information: l2_len, l3_len, l4_len, mss
- calculate the pseudo header checksum and set it in the TCP header, as required when doing hardware TCP checksum offload
- set the IP checksum to 0

Compilation of DPDK and examples is tested for the following targets: x86_64-*-linuxapp-gcc, i686-*-linuxapp-gcc, x86_64-*-bsdapp-gcc

The mbuf rework series is validated with autotests:

  cd dpdk.org/
  make install T=x86_64-default-linuxapp-gcc
  cd x86_64-default-linuxapp-gcc/
  modprobe uio
  insmod kmod/igb_uio.ko
  python ../tools/igb_uio_bind.py -b igb_uio :02:00.0
  echo 0 > /proc/sys/kernel/randomize_va_space
  echo 1000 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
  echo 1000 > /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages
  mount -t hugetlbfs none /mnt/huge
  make test

TSO is validated with IPv4 and IPv6 with testpmd (see the commit log of the last patch for details). The performance non-regression has been tested with the 6WINDGate fast path.

Note: these patches may conflict with patch [2], which is not pushed yet but will probably be integrated before this series.
[1] http://dpdk.org/ml/archives/dev/2013-October/thread.html#572
[2] http://dpdk.org/ml/archives/dev/2014-April/002166.html

Olivier Matz (11):
  igb/ixgbe: fix IP checksum calculation
  mbuf: rename RTE_MBUF_SCATTER_GATHER into RTE_MBUF_REFCNT
  mbuf: remove rte_ctrlmbuf
  mbuf: remove the rte_pktmbuf structure
  mbuf: merge physaddr and buf_len in a bitfield
  mbuf: replace data pointer by an offset
  mbuf: add functions to get the name of an ol_flag
  mbuf: change ol_flags to 32 bits
  mbuf: rename vlan_macip_len in hw_offload and increase its size
  testpmd: modify source address to validate checksum calculation
  ixgbe/mbuf: add TSO support

 app/test-pmd/cmdline.c | 60 ++-
 app/test-pmd/config.c | 18 +-
 app/test-pmd/csumonly.c | 50 ++-
 app/test-pmd/ieee1588fwd.c | 6 +-
 app/test-pmd/macfwd-retry.c | 2 +-
 app/test-pmd/macfwd.c | 8 +-
 app/test-pmd/rxonly.c | 47 +-
 app/test-pmd/testpmd.c | 10 +-
 app/test-pmd/testpmd.h | 15 +-
 app/test-pmd/txonly.c | 47 +-
 app/test/commands.c | 2 -
 app/test/test_mbuf.c | 100 +
 app/test/test_sched.c | 4 +-
 config/defconfig_i686-default-linuxapp-gcc | 2 +-
 config/defconfig_i686-default-linuxapp-icc | 2 +-
 config/defconfig_x86_64-default-bsdapp-gcc | 2 +-
 config/defconfig_x86_64-default-linuxapp-gcc | 2 +-
 config/defconfig_x86_64-default-linuxapp-icc | 2 +-
 doc/doxy-api.conf | 2 +-
 examples/dpdk_qat/crypto.c | 22 +-
 examples/dpdk_qat/main.c | 2 +-
 examples/exception_path/main.c | 11 +-
 examples/ip_reassembly/ipv4_rsmbl.h | 20 +-
 examples/ip_reassembly/main.c | 6 +-
 examples/ipv4_frag/Makefile | 4 +-
 examples/ipv4_frag/main.c | 4 +-
 examples/ipv4_
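The preparation steps listed in this cover letter (set the flags, fill the header lengths and MSS, seed the TCP checksum with the pseudo header sum) can be sketched in standalone C. This is only a sketch under stated assumptions: struct tso_request, the SKETCH_* flag values and the helper names are hypothetical and do not reproduce the rte_mbuf layout or the PKT_TX_* bit values. Whether the L4 length is part of the pseudo header sum for TSO depends on the hardware, so it is left as a parameter here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the offload flags (bit values are made up). */
#define SKETCH_TX_IP_CKSUM  (1u << 0)
#define SKETCH_TX_TCP_CKSUM (1u << 1)
#define SKETCH_TX_TCP_SEG   (1u << 2)

/* Hypothetical container for the per-packet offload information. */
struct tso_request {
	uint32_t ol_flags;
	uint16_t l2_len;   /* Ethernet header length */
	uint16_t l3_len;   /* IP header length */
	uint16_t l4_len;   /* TCP header length */
	uint16_t mss;      /* maximum segment size */
};

/* Fill the offload request; the checksum flags are set explicitly here
 * although the cover letter says the TSO flag implies them. */
static void tso_request_fill(struct tso_request *r, uint16_t l2_len,
			     uint16_t l3_len, uint16_t l4_len, uint16_t mss)
{
	assert(r != NULL && mss != 0);
	r->ol_flags = SKETCH_TX_TCP_SEG | SKETCH_TX_IP_CKSUM |
		      SKETCH_TX_TCP_CKSUM;
	r->l2_len = l2_len;
	r->l3_len = l3_len;
	r->l4_len = l4_len;
	r->mss = mss;
}

/* Fold a 32-bit accumulator into 16 bits (one's complement addition). */
static uint16_t fold16(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Non-inverted one's complement sum of the IPv4 pseudo header.
 * src and dst are host-order 32-bit addresses; the result is a host-order
 * value that must be written to the TCP checksum field in network order. */
static uint16_t ipv4_pseudo_sum(uint32_t src, uint32_t dst, uint8_t proto,
				uint16_t l4_total_len)
{
	uint32_t sum = 0;

	sum += src >> 16;
	sum += src & 0xffff;
	sum += dst >> 16;
	sum += dst & 0xffff;
	sum += proto;
	sum += l4_total_len;
	return fold16(sum);
}
```

A caller would fill the request, write the value returned by ipv4_pseudo_sum() into the TCP checksum field, and set the IPv4 header checksum field to 0 before handing the packet to the driver.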
[dpdk-dev] [PATCH] malloc: fix rte_free run time in O(n) free blocks
2014-05-09 09:24, Sanford, Robert:
> Hi Thomas,
>
> >Some patches like this one are not yet reviewed because efforts were focused
> >on release 1.6.0r2. This enhancement must be integrated in 1.7.0.
> >I know that patchwork service is desired and I hope it will be available soon.
>
> I realized that you guys had been very busy with 1.6.0r2. I just wanted to make
> sure that lower-priority patches didn't fall through the cracks.
>
> >By the way, looking at librte_malloc, it seems implementation of lists could
> >be simpler. Don't you think we could improve (in another patch) this whole
> >code by using BSD macros for lists?
>
> Yes, I was surprised to find the malloc code not using any kind of list
> functions/macros. I am willing to rework the patch. By BSD list macros, I
> believe you are referring to QUEUE(3) and sys/queue.h. Is that right?

Yes, I'm referring to QUEUE(3). So I'll wait for your rework.

Thanks
-- 
Thomas
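As an illustration of the QUEUE(3) suggestion, here is a minimal sketch of a free list built on the sys/queue.h LIST macros. It is not the librte_malloc code: struct free_elem, struct free_list and the three helpers are hypothetical names. The point is that a doubly linked LIST entry can be unlinked without walking the list.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical free-block descriptor; 'link' holds the list pointers. */
struct free_elem {
	size_t size;
	LIST_ENTRY(free_elem) link;
};

/* Declares struct free_list, the head of a list of free_elem. */
LIST_HEAD(free_list, free_elem);

/* Insert a block at the head of the free list: constant time. */
static void free_list_add(struct free_list *fl, struct free_elem *e)
{
	assert(fl != NULL && e != NULL);
	LIST_INSERT_HEAD(fl, e, link);
}

/* Unlink a block; no traversal is needed, so this is constant time. */
static void free_list_del(struct free_elem *e)
{
	LIST_REMOVE(e, link);
}

/* Count the blocks on the list (linear, used here only for checking). */
static size_t free_list_count(const struct free_list *fl)
{
	const struct free_elem *e;
	size_t n = 0;

	LIST_FOREACH(e, fl, link)
		n++;
	return n;
}
```

Usage would be LIST_INIT() on the head once, then free_list_add() when a block is freed and free_list_del() when it is reused or merged with a neighbour.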
[dpdk-dev] [PATCH RFC 05/11] mbuf: merge physaddr and buf_len in a bitfield
I agree, we should wait for comments then test the performance when the patches have settled. -Original Message- From: Olivier MATZ [mailto:olivier.m...@6wind.com] Sent: Friday, May 09, 2014 9:06 AM To: Shaw, Jeffrey B; dev at dpdk.org Subject: Re: [dpdk-dev] [PATCH RFC 05/11] mbuf: merge physaddr and buf_len in a bitfield Hi Jeff, Thank you for your comment. On 05/09/2014 05:39 PM, Shaw, Jeffrey B wrote: > have you tested this patch to see if there is a negative impact to > performance? Yes, but not with testpmd. I passed our internal non-regression performance tests and it shows no difference (or below the error margin), even with low overhead processing like forwarding whatever the number of cores I use. > Wouldn't the processor have to mask the high bytes of the physical > address when it is used, for example, to populate descriptors with > buffer addresses? When compute bound, this could steal CPU cycles > away from packet processing. I think we should understand the > performance trade-off in order to save these 2 bytes. I would naively say that the cost is negligible: accessing to the length is the same as before (it's a 16 bits field) and accessing the physical address is just a mask or a shift, which should not be very long on an Intel processor (1 cycle?). This is to be compared with the number of cycles per packet in io-fwd mode, which is probably around 150 or 200. > It would be interesting to see how throughput is impacted when the > workload is core-bound. This could be accomplished by running testpmd > in io-fwd mode across 4x 10G ports. I agree, this is something we could check. If you agree, let's first wait for some other comments and see if we find a consensus on the patches. Regards, Olivier
[dpdk-dev] [PATCH RFC 05/11] mbuf: merge physaddr and buf_len in a bitfield
Hello Olivier, have you tested this patch to see if there is a negative impact to performance? Wouldn't the processor have to mask the high bytes of the physical address when it is used, for example, to populate descriptors with buffer addresses? When compute bound, this could steal CPU cycles away from packet processing. I think we should understand the performance trade-off in order to save these 2 bytes. It would be interesting to see how throughput is impacted when the workload is core-bound. This could be accomplished by running testpmd in io-fwd mode across 4x 10G ports. Thanks, Jeff -Original Message- From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Olivier Matz Sent: Friday, May 09, 2014 7:51 AM To: dev at dpdk.org Subject: [dpdk-dev] [PATCH RFC 05/11] mbuf: merge physaddr and buf_len in a bitfield The physical address is never greater than (1 << 48) = 256 TB. We can win 2 bytes in the mbuf structure by merging the physical address and the buffer length in the same bitfield. Signed-off-by: Olivier Matz --- lib/librte_mbuf/rte_mbuf.c | 3 ++- lib/librte_mbuf/rte_mbuf.h | 7 --- 2 files changed, 6 insertions(+), 4 deletions(-) diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c index c229525..9879095 100644 --- a/lib/librte_mbuf/rte_mbuf.c +++ b/lib/librte_mbuf/rte_mbuf.c @@ -104,7 +104,8 @@ rte_pktmbuf_init(struct rte_mempool *mp, m->buf_len = (uint16_t)buf_len; /* keep some headroom between start of buffer and data */ - m->data = (char*) m->buf_addr + RTE_MIN(RTE_PKTMBUF_HEADROOM, m->buf_len); + m->data = (char*) m->buf_addr + RTE_MIN(RTE_PKTMBUF_HEADROOM, + (uint16_t)m->buf_len); /* init some constant fields */ m->pool = mp; diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h index 803b223..275f6b2 100644 --- a/lib/librte_mbuf/rte_mbuf.h +++ b/lib/librte_mbuf/rte_mbuf.h @@ -130,8 +130,8 @@ union rte_vlan_macip { struct rte_mbuf { struct rte_mempool *pool; /**< Pool from which mbuf was allocated. 
*/ void *buf_addr; /**< Virtual address of segment buffer. */ - phys_addr_t buf_physaddr; /**< Physical address of segment buffer. */ - uint16_t buf_len; /**< Length of segment buffer. */ + uint64_t buf_physaddr:48; /**< Physical address of segment buffer. */ + uint64_t buf_len:16; /**< Length of segment buffer. */ #ifdef RTE_MBUF_REFCNT /** * 16-bit Reference counter. @@ -148,8 +148,9 @@ struct rte_mbuf { #else uint16_t refcnt_reserved; /**< Do not use this field */ #endif - uint16_t reserved; /**< Unused field. Required for padding. */ + uint16_t ol_flags;/**< Offload features. */ + uint32_t reserved; /**< Unused field. Required for padding. */ /* valid for any segment */ struct rte_mbuf *next; /**< Next segment of scattered packet. */ -- 1.9.2
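The cost discussed in this thread (a mask or a shift to read the address back) can be seen in a small standalone sketch. The two structs below are simplified stand-ins, not the real rte_mbuf, and the merged_* helper names are hypothetical. 64-bit wide bit-fields are implementation-defined in C; the sketch assumes GCC or Clang, which accept them.

```c
#include <assert.h>
#include <stdint.h>

#define PHYS_BITS 48
#define PHYS_MASK ((UINT64_C(1) << PHYS_BITS) - 1)

/* Simplified "before" layout: separate address and length fields. */
struct buf_split {
	uint64_t buf_physaddr;
	uint16_t buf_len;
};

/* Simplified "after" layout: both values share one 64-bit word. */
struct buf_merged {
	uint64_t buf_physaddr:48;
	uint64_t buf_len:16;
};

/* Explicit equivalent of what the compiler has to do for the bit-field:
 * store the length in the top 16 bits and the address in the low 48. */
static uint64_t merged_pack(uint64_t physaddr, uint16_t len)
{
	assert((physaddr & ~PHYS_MASK) == 0); /* address must fit in 48 bits */
	return physaddr | ((uint64_t)len << PHYS_BITS);
}

/* Reading the address costs one AND with a constant mask. */
static uint64_t merged_physaddr(uint64_t word)
{
	return word & PHYS_MASK;
}

/* Reading the length costs one shift. */
static uint16_t merged_len(uint64_t word)
{
	return (uint16_t)(word >> PHYS_BITS);
}
```

In this simplified pair the merged struct is 8 bytes while the split one is larger because of padding; in the real rte_mbuf the neighbouring fields determine the actual saving, which the patch states as 2 bytes.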
[dpdk-dev] [PATCH] mk: add missing scripts directory in install directory
Trying to install headers for an external library using the DPDK exported makefile rte.extshared.mk results in the following error:

$ cd dpdk
$ make install DESTDIR=/home/marchand/myapp/staging/plop T=x86_64-default-linuxapp-gcc
$ cd ~/myapp
$ make RTE_SDK=/home/marchand/myapp/staging/plop RTE_TARGET=x86_64-default-linuxapp-gcc
  CC plop.o
  LD plop.so
  SYMLINK-FILE include/plop.h
/bin/sh: /home/marchand/myapp/staging/plop/scripts/relpath.sh: No such file or directory
ln: `/home/marchand/myapp/build/include' and `./include' are the same file
make[1]: *** [/home/marchand/myapp/build/include/plop.h] Error 1
make: *** [all] Error 2

This comes from the fact that DPDK only installs its mk/ directory while some makefiles require the scripts/ directory content as well. So install the missing files from scripts/.

Signed-off-by: David Marchand
---
 mk/rte.sdkbuild.mk | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mk/rte.sdkbuild.mk b/mk/rte.sdkbuild.mk
index 2975ee4..d4d6c05 100644
--- a/mk/rte.sdkbuild.mk
+++ b/mk/rte.sdkbuild.mk
@@ -63,7 +63,7 @@ build: $(ROOTDIRS-y)
 	@echo Build complete
 ifneq ($(DESTDIR),)
 	$(Q)mkdir -p $(DESTDIR)
-	$(Q)tar -C $(RTE_SDK) -cf - mk | tar -C $(DESTDIR) -x \
+	$(Q)tar -C $(RTE_SDK) -cf - mk scripts/*.sh | tar -C $(DESTDIR) -x \
 	  --keep-newer-files --warning=no-ignore-newer -f -
 	$(Q)mkdir -p $(DESTDIR)/`basename $(RTE_OUTPUT)`
 	$(Q)tar -C $(RTE_OUTPUT) -chf - \
-- 
1.7.10.4
[dpdk-dev] [PATCH v2] eal: change default per socket memory allocation
From: Didier Pallard

Currently, if there is more memory in hugepages than the amount requested by the dpdk application, the memory is allocated by taking as much memory as possible from each socket, starting from the first one. For example, if a system is configured with 8 GB in 2 sockets (4 GB per socket), and dpdk is requesting only 4 GB of memory, all memory will be taken from socket 0 (which has exactly 4 GB of free hugepages) even if some cores are configured on socket 1 and there are free hugepages on socket 1.

Change this behaviour to allocate memory on all sockets where some cores are configured, spreading the memory amongst sockets using the following ratio per socket:

  N° of cores configured on the socket / Total number of configured cores * requested memory

This algorithm is used when the memory amount is specified globally using the -m option. Per socket memory allocation can always be done using the --socket-mem option.

Changes included in v2:
- only update linux implementation as bsd looks not to be ready for numa
- if new algorithm fails, then defaults to previous behaviour

Signed-off-by: Didier Pallard
Signed-off-by: David Marchand
---
lib/librte_eal/linuxapp/eal/eal_memory.c | 50 +++--- 1 file changed, 45 insertions(+), 5 deletions(-) diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c index 73a6394..471dcfd 100644 --- a/lib/librte_eal/linuxapp/eal/eal_memory.c +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c @@ -881,13 +881,53 @@ calc_num_pages_per_socket(uint64_t * memory, if (num_hp_info == 0) return -1; - for (socket = 0; socket < RTE_MAX_NUMA_NODES && total_mem != 0; socket++) { - /* if specific memory amounts per socket weren't requested */ - if (internal_config.force_sockets == 0) { + /* if specific memory amounts per socket weren't requested */ + if (internal_config.force_sockets == 0) { + int cpu_per_socket[RTE_MAX_NUMA_NODES]; + size_t default_size, total_size; + unsigned lcore_id; + + /* Compute number of cores per socket 
*/ + memset(cpu_per_socket, 0, sizeof(cpu_per_socket)); + RTE_LCORE_FOREACH(lcore_id) { + cpu_per_socket[rte_lcore_to_socket_id(lcore_id)]++; + } + + /* +* Automatically spread requested memory amongst detected sockets according +* to number of cores from cpu mask present on each socket +*/ + total_size = internal_config.memory; + for (socket = 0; socket < RTE_MAX_NUMA_NODES && total_size != 0; socket++) { + + /* Set memory amount per socket */ + default_size = (internal_config.memory * cpu_per_socket[socket]) + / rte_lcore_count(); + + /* Limit to maximum available memory on socket */ + default_size = RTE_MIN(default_size, get_socket_mem_size(socket)); + + /* Update sizes */ + memory[socket] = default_size; + total_size -= default_size; + } + + /* +* If some memory is remaining, try to allocate it by getting all +* available memory from sockets, one after the other +*/ + for (socket = 0; socket < RTE_MAX_NUMA_NODES && total_size != 0; socket++) { /* take whatever is available */ - memory[socket] = RTE_MIN(get_socket_mem_size(socket), - total_mem); + default_size = RTE_MIN(get_socket_mem_size(socket) - memory[socket], + total_size); + + /* Update sizes */ + memory[socket] += default_size; + total_size -= default_size; } + } + + for (socket = 0; socket < RTE_MAX_NUMA_NODES && total_mem != 0; socket++) { /* skips if the memory on specific socket wasn't requested */ for (i = 0; i < num_hp_info && memory[socket] != 0; i++){ hp_used[i].hugedir = hp_info[i].hugedir; -- 1.7.10.4
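The two-pass allocation described in this patch (a proportional share per socket, then a fallback that takes whatever is left) can be tried outside of the EAL with the following sketch. spread_memory() and min_u64() are hypothetical names, and the per-socket core counts and available memory are passed in as plain arrays instead of being read from the EAL configuration. Unlike the patch, the first pass visits every socket so that the output array is always initialized.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/* Spread 'requested' units of memory over 'nsockets' sockets.
 * cores[s] is the number of configured cores on socket s and avail[s]
 * the memory available there. The per-socket result is written to out[];
 * the return value is the amount that could not be placed anywhere. */
static uint64_t spread_memory(uint64_t requested, unsigned nsockets,
			      const unsigned cores[], const uint64_t avail[],
			      uint64_t out[])
{
	unsigned s, total_cores = 0;
	uint64_t remaining = requested;

	for (s = 0; s < nsockets; s++)
		total_cores += cores[s];
	assert(total_cores > 0);

	/* Pass 1: share proportional to the cores on each socket,
	 * limited to what the socket actually has. */
	for (s = 0; s < nsockets; s++) {
		uint64_t share = requested * cores[s] / total_cores;

		out[s] = min_u64(share, avail[s]);
		remaining -= out[s];
	}

	/* Pass 2: if something is left, take it from the sockets in order. */
	for (s = 0; s < nsockets && remaining != 0; s++) {
		uint64_t extra = min_u64(avail[s] - out[s], remaining);

		out[s] += extra;
		remaining -= extra;
	}
	return remaining;
}
```

With the example from the commit log (4 GB available on each of 2 sockets, 4 GB requested) and the same number of cores on both sockets, each socket contributes 2 GB instead of socket 0 providing everything.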
[dpdk-dev] [PATCH v2 7/7] pci: remove deprecated RTE_EAL_UNBIND_PORTS option
RTE_EAL_UNBIND_PORTS was deprecated in DPDK 1.4.0 and removed in 1.6.0, but the code was not removed. The bind/unbind operations should not be handled by the eal. These operations should be either done outside of dpdk or inside the PMDs themselves as these are their problems. Signed-off-by: Anatoly Burakov Signed-off-by: David Marchand --- lib/librte_eal/linuxapp/eal/eal_pci.c | 171 - 1 file changed, 171 deletions(-) diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c index d529ced..ac2c1fe 100644 --- a/lib/librte_eal/linuxapp/eal/eal_pci.c +++ b/lib/librte_eal/linuxapp/eal/eal_pci.c @@ -146,155 +146,6 @@ error: return -1; } -#ifdef RTE_EAL_UNBIND_PORTS -#define PROC_MODULES "/proc/modules" - -#define IGB_UIO_NAME "igb_uio" - -#define UIO_DRV_PATH "/sys/bus/pci/drivers/%s" - -/* maximum time to wait that /dev/uioX appears */ -#define UIO_DEV_WAIT_TIMEOUT 3 /* seconds */ - -/* - * Check that a kernel module is loaded. Returns 0 on success, or if the - * parameter is NULL, or -1 if the module is not loaded. 
- */ -static int -pci_uio_check_module(const char *module_name) -{ - FILE *f; - unsigned i; - char buf[BUFSIZ]; - - if (module_name == NULL) - return 0; - - f = fopen(PROC_MODULES, "r"); - if (f == NULL) { - RTE_LOG(ERR, EAL, "Cannot open "PROC_MODULES": %s\n", - strerror(errno)); - return -1; - } - - while(fgets(buf, sizeof(buf), f) != NULL) { - - for (i = 0; i < sizeof(buf) && buf[i] != '\0'; i++) { - if (isspace(buf[i])) - buf[i] = '\0'; - } - - if (strncmp(buf, module_name, sizeof(buf)) == 0) { - fclose(f); - return 0; - } - } - fclose(f); - return -1; -} - -/* bind a PCI to the kernel module driver */ -static int -pci_bind_device(struct rte_pci_device *dev, char dr_path[]) -{ - FILE *f; - int n; - char buf[BUFSIZ]; - char dev_bind[PATH_MAX]; - struct rte_pci_addr *loc = &dev->addr; - - n = rte_snprintf(dev_bind, sizeof(dev_bind), "%s/bind", dr_path); - if ((n < 0) || (n >= (int)sizeof(buf))) { - RTE_LOG(ERR, EAL, "Cannot rte_snprintf device bind path\n"); - return -1; - } - - f = fopen(dev_bind, "w"); - if (f == NULL) { - RTE_LOG(ERR, EAL, "Cannot open %s\n", dev_bind); - return -1; - } - n = rte_snprintf(buf, sizeof(buf), PCI_PRI_FMT "\n", -loc->domain, loc->bus, loc->devid, loc->function); - if ((n < 0) || (n >= (int)sizeof(buf))) { - RTE_LOG(ERR, EAL, "Cannot rte_snprintf PCI infos\n"); - fclose(f); - return -1; - } - if (fwrite(buf, n, 1, f) == 0) { - fclose(f); - return -1; - } - - fclose(f); - return 0; -} - -static int -pci_uio_bind_device(struct rte_pci_device *dev, const char *module_name) -{ - FILE *f; - int n; - char buf[BUFSIZ]; - char uio_newid[PATH_MAX]; - char uio_bind[PATH_MAX]; - - n = rte_snprintf(uio_newid, sizeof(uio_newid), UIO_DRV_PATH "/new_id", module_name); - if ((n < 0) || (n >= (int)sizeof(uio_newid))) { - RTE_LOG(ERR, EAL, "Cannot rte_snprintf uio_newid name\n"); - return -1; - } - - n = rte_snprintf(uio_bind, sizeof(uio_bind), UIO_DRV_PATH, module_name); - if ((n < 0) || (n >= (int)sizeof(uio_bind))) { - RTE_LOG(ERR, EAL, "Cannot 
rte_snprintf uio_bind name\n"); - return -1; - } - - n = rte_snprintf(buf, sizeof(buf), "%x %x\n", - dev->id.vendor_id, dev->id.device_id); - if ((n < 0) || (n >= (int)sizeof(buf))) { - RTE_LOG(ERR, EAL, "Cannot rte_snprintf vendor_id/device_id\n"); - return -1; - } - - f = fopen(uio_newid, "w"); - if (f == NULL) { - RTE_LOG(ERR, EAL, "Cannot open %s\n", uio_newid); - return -1; - } - if (fwrite(buf, n, 1, f) == 0) { - fclose(f); - return -1; - } - fclose(f); - - pci_bind_device(dev, uio_bind); - return 0; -} - -static int -pci_switch_module(struct rte_pci_driver *dr, struct rte_pci_device *dev, - const char *module_name) -{ - if (rte_eal_process_type() == RTE_PROC_PRIMARY) { - /* check that our driver is loaded */ - if (pci_uio_check_module(module_name) != 0) - rte_exit(EXIT_FAILURE, "The %s module is required by the " - "%s driver\n", module_name, dr->name); - - /* unbind current driver, bind ours */ -
[dpdk-dev] [PATCH v2 6/7] pci: move RTE_PCI_DRV_FORCE_UNBIND handling out of #ifdef
Move RTE_PCI_DRV_FORCE_UNBIND flag handling out of RTE_EAL_UNBIND_PORTS section. This had nothing to do with RTE_EAL_UNBIND_PORTS anyway. Signed-off-by: David Marchand --- lib/librte_eal/linuxapp/eal/eal_pci.c | 89 - 1 file changed, 44 insertions(+), 45 deletions(-) diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c index dadb198..d529ced 100644 --- a/lib/librte_eal/linuxapp/eal/eal_pci.c +++ b/lib/librte_eal/linuxapp/eal/eal_pci.c @@ -107,6 +107,45 @@ TAILQ_HEAD(uio_res_list, uio_resource); static struct uio_res_list *uio_res_list = NULL; static int pci_parse_sysfs_value(const char *filename, uint64_t *val); +/* unbind kernel driver for this device */ +static int +pci_unbind_kernel_driver(struct rte_pci_device *dev) +{ + int n; + FILE *f; + char filename[PATH_MAX]; + char buf[BUFSIZ]; + struct rte_pci_addr *loc = &dev->addr; + + /* open /sys/bus/pci/devices/:BB:CC.D/driver */ + rte_snprintf(filename, sizeof(filename), +SYSFS_PCI_DEVICES "/" PCI_PRI_FMT "/driver/unbind", +loc->domain, loc->bus, loc->devid, loc->function); + + f = fopen(filename, "w"); + if (f == NULL) /* device was not bound */ + return 0; + + n = rte_snprintf(buf, sizeof(buf), PCI_PRI_FMT "\n", +loc->domain, loc->bus, loc->devid, loc->function); + if ((n < 0) || (n >= (int)sizeof(buf))) { + RTE_LOG(ERR, EAL, "%s(): rte_snprintf failed\n", __func__); + goto error; + } + if (fwrite(buf, n, 1, f) == 0) { + RTE_LOG(ERR, EAL, "%s(): could not write to %s\n", __func__, + filename); + goto error; + } + + fclose(f); + return 0; + +error: + fclose(f); + return -1; +} + #ifdef RTE_EAL_UNBIND_PORTS #define PROC_MODULES "/proc/modules" @@ -234,46 +273,6 @@ pci_uio_bind_device(struct rte_pci_device *dev, const char *module_name) return 0; } -/* unbind kernel driver for this device */ -static int -pci_unbind_kernel_driver(struct rte_pci_device *dev) -{ - int n; - FILE *f; - char filename[PATH_MAX]; - char buf[BUFSIZ]; - struct rte_pci_addr *loc = &dev->addr; - - /* open 
/sys/bus/pci/devices/:BB:CC.D/driver */ - rte_snprintf(filename, sizeof(filename), -SYSFS_PCI_DEVICES "/" PCI_PRI_FMT "/driver/unbind", -loc->domain, loc->bus, loc->devid, loc->function); - - f = fopen(filename, "w"); - if (f == NULL) /* device was not bound */ - return 0; - - n = rte_snprintf(buf, sizeof(buf), PCI_PRI_FMT "\n", -loc->domain, loc->bus, loc->devid, loc->function); - if ((n < 0) || (n >= (int)sizeof(buf))) { - RTE_LOG(ERR, EAL, "%s(): rte_snprintf failed\n", __func__); - goto error; - } - if (fwrite(buf, n, 1, f) == 0) { - RTE_LOG(ERR, EAL, "%s(): could not write to %s\n", __func__, - filename); - goto error; - } - - fclose(f); - return 0; - -error: - fclose(f); - return -1; -} - - static int pci_switch_module(struct rte_pci_driver *dr, struct rte_pci_device *dev, const char *module_name) @@ -1008,11 +1007,6 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *d /* unbind current driver and bind on igb_uio */ if (pci_switch_module(dr, dev, IGB_UIO_NAME) < 0) return -1; - } else if (dr->drv_flags & RTE_PCI_DRV_FORCE_UNBIND && - rte_eal_process_type() == RTE_PROC_PRIMARY) { - /* unbind current driver */ - if (pci_unbind_kernel_driver(dev) < 0) - return -1; } #endif @@ -1020,6 +1014,11 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *d /* map resources for devices that use igb_uio */ if (pci_uio_map_resource(dev) < 0) return -1; + } else if (dr->drv_flags & RTE_PCI_DRV_FORCE_UNBIND && + rte_eal_process_type() == RTE_PROC_PRIMARY) { + /* unbind current driver */ + if (pci_unbind_kernel_driver(dev) < 0) + return -1; } /* reference driver structure */ -- 1.7.10.4
[dpdk-dev] [PATCH v2 5/7] pci: pci_switch_module cleanup
The pci_switch_module() function should only do what its name tells: unbind pci devices and rebind them on the specified kernel driver. Hence, it can not call pci_uio_map_resource(). Call to pci_uio_map_resource() should be moved to rte_eal_pci_probe_one_driver() so that we can factorize code. Signed-off-by: David Marchand --- lib/librte_eal/linuxapp/eal/eal_pci.c | 24 +--- 1 file changed, 9 insertions(+), 15 deletions(-) diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c index 451fbd2..dadb198 100644 --- a/lib/librte_eal/linuxapp/eal/eal_pci.c +++ b/lib/librte_eal/linuxapp/eal/eal_pci.c @@ -107,9 +107,6 @@ TAILQ_HEAD(uio_res_list, uio_resource); static struct uio_res_list *uio_res_list = NULL; static int pci_parse_sysfs_value(const char *filename, uint64_t *val); -/* forward prototype of function called in pci_switch_module below */ -static int pci_uio_map_resource(struct rte_pci_device *dev); - #ifdef RTE_EAL_UNBIND_PORTS #define PROC_MODULES "/proc/modules" @@ -279,12 +276,11 @@ error: static int pci_switch_module(struct rte_pci_driver *dr, struct rte_pci_device *dev, - int uio_status, const char *module_name) + const char *module_name) { if (rte_eal_process_type() == RTE_PROC_PRIMARY) { /* check that our driver is loaded */ - if (uio_status != 0 && - (uio_status = pci_uio_check_module(module_name)) != 0) + if (pci_uio_check_module(module_name) != 0) rte_exit(EXIT_FAILURE, "The %s module is required by the " "%s driver\n", module_name, dr->name); @@ -294,9 +290,6 @@ pci_switch_module(struct rte_pci_driver *dr, struct rte_pci_device *dev, if (pci_uio_bind_device(dev, module_name) < 0) return -1; } - /* map the NIC resources */ - if (pci_uio_map_resource(dev) < 0) - return -1; return 0; } @@ -1012,8 +1005,8 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *d #ifdef RTE_EAL_UNBIND_PORTS if (dr->drv_flags & RTE_PCI_DRV_NEED_IGB_UIO) { - /* unbind driver and load uio resources for Intel NICs */ - 
if (pci_switch_module(dr, dev, 1, IGB_UIO_NAME) < 0) + /* unbind current driver and bind on igb_uio */ + if (pci_switch_module(dr, dev, IGB_UIO_NAME) < 0) return -1; } else if (dr->drv_flags & RTE_PCI_DRV_FORCE_UNBIND && rte_eal_process_type() == RTE_PROC_PRIMARY) { @@ -1021,12 +1014,13 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *d if (pci_unbind_kernel_driver(dev) < 0) return -1; } -#else - if (dr->drv_flags & RTE_PCI_DRV_NEED_IGB_UIO) - /* just map resources for Intel NICs */ +#endif + + if (dr->drv_flags & RTE_PCI_DRV_NEED_IGB_UIO) { + /* map resources for devices that use igb_uio */ if (pci_uio_map_resource(dev) < 0) return -1; -#endif + } /* reference driver structure */ dev->driver = dr; -- 1.7.10.4
[dpdk-dev] [PATCH v2 4/7] pci: rework interrupt fd init and fix fd leak
A fd leak happens in pci_map_resource when multiple bars are mapped. Fix this by closing the fd unconditionally in this function and opening the intr_handle fd in pci_uio_map_resource instead. Signed-off-by: David Marchand --- lib/librte_eal/bsdapp/eal/eal_pci.c | 60 +--- lib/librte_eal/linuxapp/eal/eal_pci.c | 71 - 2 files changed, 62 insertions(+), 69 deletions(-) diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c b/lib/librte_eal/bsdapp/eal/eal_pci.c index a8945e4..94ae461 100644 --- a/lib/librte_eal/bsdapp/eal/eal_pci.c +++ b/lib/librte_eal/bsdapp/eal/eal_pci.c @@ -119,8 +119,8 @@ pci_unbind_kernel_driver(struct rte_pci_device *dev) /* map a particular resource from a file */ static void * -pci_map_resource(struct rte_pci_device *dev, void *requested_addr, - const char *devname, off_t offset, size_t size) +pci_map_resource(void *requested_addr, const char *devname, off_t offset, +size_t size) { int fd; void *mapaddr; @@ -130,7 +130,7 @@ pci_map_resource(struct rte_pci_device *dev, void *requested_addr, */ fd = open(devname, O_RDWR); if (fd < 0) { - RTE_LOG(ERR, EAL, "Cannot open %s: %s\n", + RTE_LOG(ERR, EAL, "Cannot open %s: %s\n", devname, strerror(errno)); goto fail; } @@ -138,35 +138,21 @@ pci_map_resource(struct rte_pci_device *dev, void *requested_addr, /* Map the PCI memory resource of device */ mapaddr = mmap(requested_addr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, offset); + close(fd); if (mapaddr == MAP_FAILED || (requested_addr != NULL && mapaddr != requested_addr)) { RTE_LOG(ERR, EAL, "%s(): cannot mmap(%s(%d), %p, 0x%lx, 0x%lx):" - " %s (%p)\n", __func__, devname, fd, requested_addr, + " %s (%p)\n", __func__, devname, fd, requested_addr, (unsigned long)size, (unsigned long)offset, strerror(errno), mapaddr); - close(fd); goto fail; } - if (rte_eal_process_type() == RTE_PROC_PRIMARY) { - /* save fd if in primary process */ - dev->intr_handle.fd = fd; - dev->intr_handle.type = RTE_INTR_HANDLE_UIO; - } else { - /* fd is not needed in slave process, close
it */ - dev->intr_handle.fd = -1; - dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN; - close(fd); - } - RTE_LOG(DEBUG, EAL, " PCI memory mapped at %p\n", mapaddr); return mapaddr; fail: - dev->intr_handle.fd = -1; - dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN; - return NULL; } @@ -179,19 +165,19 @@ pci_uio_map_secondary(struct rte_pci_device *dev) { size_t i; struct uio_resource *uio_res; - + TAILQ_FOREACH(uio_res, uio_res_list, next) { - + /* skip this element if it doesn't match our PCI address */ if (memcmp(&uio_res->pci_addr, &dev->addr, sizeof(dev->addr))) continue; - + for (i = 0; i != uio_res->nb_maps; i++) { - if (pci_map_resource(dev, uio_res->maps[i].addr, - uio_res->path, - (off_t)uio_res->maps[i].offset, - (size_t)uio_res->maps[i].size) != - uio_res->maps[i].addr) { + if (pci_map_resource(uio_res->maps[i].addr, +uio_res->path, +(off_t)uio_res->maps[i].offset, +(size_t)uio_res->maps[i].size) + != uio_res->maps[i].addr) { RTE_LOG(ERR, EAL, "Cannot mmap device resource\n"); return (-1); @@ -219,6 +205,7 @@ pci_uio_map_resource(struct rte_pci_device *dev) struct uio_map *maps; dev->intr_handle.fd = -1; + dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN; /* secondary processes - use already recorded details */ if (rte_eal_process_type() != RTE_PROC_PRIMARY) @@ -233,6 +220,15 @@ pci_uio_map_resource(struct rte_pci_device *dev) return -1; } + /* save fd if in primary process */ + dev->intr_handle.fd = open(devname, O_RDWR); + if (dev->intr_handle.fd < 0) { + RTE_LOG(ERR, EAL, "Cannot open %s: %s\n", + devname, strerror(errno)); + return -1; + } + dev->intr_handle.type = RTE_INTR_HANDLE_UIO; + /* allocate the mapping details for
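For readers following the fd handling in this patch, here is a minimal sketch of the "map, then close right away" pattern, assuming only POSIX behaviour: a mapping created by mmap() stays valid after the descriptor is closed. The helper name map_region() is hypothetical and is not part of the EAL; it only illustrates why no fd needs to be kept once each BAR is mapped.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not an EAL function): map `size` bytes of `path`
 * starting at `offset`. Returns the mapped address, or NULL on error.
 */
static void *
map_region(const char *path, off_t offset, size_t size)
{
	void *addr;
	int fd;

	assert(path != NULL && size > 0);

	fd = open(path, O_RDWR);
	if (fd < 0)
		return NULL;

	addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, offset);

	/* The mapping does not depend on fd any more, so close it on both
	 * the success and the failure path: nothing is left open. */
	close(fd);

	return addr == MAP_FAILED ? NULL : addr;
}
```

Calling such a helper once per BAR leaves the number of open descriptors unchanged, which is the property the patch restores. A descriptor that is needed later, such as the one used for interrupts, has to be opened separately by the caller.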
[dpdk-dev] [PATCH v2 3/7] pci: remove virtio-uio workaround
virtio-uio does not need eal to map bars from uio device, so remove flag RTE_PCI_DRV_NEED_IGB_UIO. Then, move virtio-uio workaround out of generic eal_pci.c for linux implementation. Signed-off-by: David Marchand --- lib/librte_eal/bsdapp/eal/eal_pci.c |9 +-- lib/librte_eal/linuxapp/eal/eal_pci.c | 30 +--- lib/librte_pmd_virtio/virtio_ethdev.c | 133 - 3 files changed, 134 insertions(+), 38 deletions(-) diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c b/lib/librte_eal/bsdapp/eal/eal_pci.c index 5d8bcbd..a8945e4 100644 --- a/lib/librte_eal/bsdapp/eal/eal_pci.c +++ b/lib/librte_eal/bsdapp/eal/eal_pci.c @@ -221,8 +221,7 @@ pci_uio_map_resource(struct rte_pci_device *dev) dev->intr_handle.fd = -1; /* secondary processes - use already recorded details */ - if ((rte_eal_process_type() != RTE_PROC_PRIMARY) && - (dev->id.vendor_id != PCI_VENDOR_ID_QUMRANET)) + if (rte_eal_process_type() != RTE_PROC_PRIMARY) return (pci_uio_map_secondary(dev)); rte_snprintf(devname, sizeof(devname), "/dev/uio at pci:%u:%u:%u", @@ -234,12 +233,6 @@ pci_uio_map_resource(struct rte_pci_device *dev) return -1; } - if(dev->id.vendor_id == PCI_VENDOR_ID_QUMRANET) { - /* I/O port address already assigned */ - /* rte_virtio_pmd does not need any other bar even if available */ - return (0); - } - /* allocate the mapping details for secondary processes*/ if ((uio_res = rte_zmalloc("UIO_RES", sizeof (*uio_res), 0)) == NULL) { RTE_LOG(ERR, EAL, diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c index 99e07d2..c006cf5 100644 --- a/lib/librte_eal/linuxapp/eal/eal_pci.c +++ b/lib/librte_eal/linuxapp/eal/eal_pci.c @@ -584,11 +584,9 @@ pci_uio_map_resource(struct rte_pci_device *dev) { int i, j; char dirname[PATH_MAX]; - char filename[PATH_MAX]; char devname[PATH_MAX]; /* contains the /dev/uioX */ void *mapaddr; int uio_num; - unsigned long start,size; uint64_t phaddr; uint64_t offset; uint64_t pagesz; @@ -600,8 +598,7 @@ pci_uio_map_resource(struct rte_pci_device 
*dev) dev->intr_handle.fd = -1; /* secondary processes - use already recorded details */ - if ((rte_eal_process_type() != RTE_PROC_PRIMARY) && - (dev->id.vendor_id != PCI_VENDOR_ID_QUMRANET)) + if (rte_eal_process_type() != RTE_PROC_PRIMARY) return (pci_uio_map_secondary(dev)); /* find uio resource */ @@ -612,31 +609,6 @@ pci_uio_map_resource(struct rte_pci_device *dev) return -1; } - if(dev->id.vendor_id == PCI_VENDOR_ID_QUMRANET) { - /* get portio size */ - rte_snprintf(filename, sizeof(filename), -"%s/portio/port0/size", dirname); - if (eal_parse_sysfs_value(filename, &size) < 0) { - RTE_LOG(ERR, EAL, "%s(): cannot parse size\n", - __func__); - return -1; - } - - /* get portio start */ - rte_snprintf(filename, sizeof(filename), -"%s/portio/port0/start", dirname); - if (eal_parse_sysfs_value(filename, &start) < 0) { - RTE_LOG(ERR, EAL, "%s(): cannot parse portio start\n", - __func__); - return -1; - } - dev->mem_resource[0].addr = (void *)(uintptr_t)start; - dev->mem_resource[0].len = (uint64_t)size; - RTE_LOG(DEBUG, EAL, "PCI Port IO found start=0x%lx with size=0x%lx\n", start, size); - /* rte_virtio_pmd does not need any other bar even if available */ - return (0); - } - /* allocate the mapping details for secondary processes*/ if ((uio_res = rte_zmalloc("UIO_RES", sizeof (*uio_res), 0)) == NULL) { RTE_LOG(ERR, EAL, diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c b/lib/librte_pmd_virtio/virtio_ethdev.c index f107161..c6a1df5 100644 --- a/lib/librte_pmd_virtio/virtio_ethdev.c +++ b/lib/librte_pmd_virtio/virtio_ethdev.c @@ -36,6 +36,9 @@ #include #include #include +#ifdef RTE_EXEC_ENV_LINUXAPP +#include +#endif #include #include @@ -392,6 +395,103 @@ virtio_negotiate_features(struct virtio_hw *hw) hw->guest_features = vtpci_negotiate_features(hw, guest_features); } +#ifdef RTE_EXEC_ENV_LINUXAPP +static int +parse_sysfs_value(const char *filename, unsigned long *val) +{ + FILE *f; + char buf[BUFSIZ]; + char *end = NULL; + + if ((f = fopen(filename, "r")) == 
NULL) { + PMD_INIT_LOG(ERR, "%s(): cannot open sysfs value %s\n", +__func__, fil
[dpdk-dev] [PATCH v2 2/7] pci: align bsd implementation on linux
bsd implementation lacks check on driver flags, fix this. Besides, check on BAR0 is not needed and could cause trouble for devices that have no BAR0. Signed-off-by: David Marchand --- lib/librte_eal/bsdapp/eal/eal_pci.c | 42 +++ 1 file changed, 23 insertions(+), 19 deletions(-) diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c b/lib/librte_eal/bsdapp/eal/eal_pci.c index 987b446..5d8bcbd 100644 --- a/lib/librte_eal/bsdapp/eal/eal_pci.c +++ b/lib/librte_eal/bsdapp/eal/eal_pci.c @@ -108,8 +108,14 @@ TAILQ_HEAD(uio_res_list, uio_resource); static struct uio_res_list *uio_res_list = NULL; -/* forward prototype of function called in pci_switch_module below */ -static int pci_uio_map_resource(struct rte_pci_device *dev); +/* unbind kernel driver for this device */ +static int +pci_unbind_kernel_driver(struct rte_pci_device *dev) +{ + RTE_LOG(ERR, EAL, "RTE_PCI_DRV_FORCE_UNBIND flag is not implemented " + "for BSD\n"); + return -ENOTSUP; +} /* map a particular resource from a file */ static void * @@ -214,6 +220,11 @@ pci_uio_map_resource(struct rte_pci_device *dev) dev->intr_handle.fd = -1; + /* secondary processes - use already recorded details */ + if ((rte_eal_process_type() != RTE_PROC_PRIMARY) && + (dev->id.vendor_id != PCI_VENDOR_ID_QUMRANET)) + return (pci_uio_map_secondary(dev)); + rte_snprintf(devname, sizeof(devname), "/dev/uio at pci:%u:%u:%u", dev->addr.bus, dev->addr.devid, dev->addr.function); @@ -223,11 +234,6 @@ pci_uio_map_resource(struct rte_pci_device *dev) return -1; } - /* secondary processes - use already recorded details */ - if ((rte_eal_process_type() != RTE_PROC_PRIMARY) && - (dev->id.vendor_id != PCI_VENDOR_ID_QUMRANET)) - return (pci_uio_map_secondary(dev)); - if(dev->id.vendor_id == PCI_VENDOR_ID_QUMRANET) { /* I/O port address already assigned */ /* rte_virtio_pmd does not need any other bar even if available */ @@ -479,19 +485,17 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *d return 0; } - /* just map the 
NIC resources */ - if (pci_uio_map_resource(dev) < 0) - return -1; - - /* We always should have BAR0 mapped */ - if (rte_eal_process_type() == RTE_PROC_PRIMARY && - dev->mem_resource[0].addr == NULL) { - RTE_LOG(ERR, EAL, - "%s(): BAR0 is not mapped\n", - __func__); - return (-1); + if (dr->drv_flags & RTE_PCI_DRV_NEED_IGB_UIO) { + /* map resources for devices that use igb_uio */ + if (pci_uio_map_resource(dev) < 0) + return -1; + } else if (dr->drv_flags & RTE_PCI_DRV_FORCE_UNBIND && + rte_eal_process_type() == RTE_PROC_PRIMARY) { + /* unbind current driver */ + if (pci_unbind_kernel_driver(dev) < 0) + return -1; } - + /* reference driver structure */ dev->driver = dr; -- 1.7.10.4
[dpdk-dev] [PATCH v2 1/7] pci: fix potential mem leaks
Looking at bsd implementation, we can see that there are some potential mem leaks in linux implementation. Fix them. Signed-off-by: David Marchand --- lib/librte_eal/linuxapp/eal/eal_pci.c | 11 +++ 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c index 9538efe..99e07d2 100644 --- a/lib/librte_eal/linuxapp/eal/eal_pci.c +++ b/lib/librte_eal/linuxapp/eal/eal_pci.c @@ -649,11 +649,13 @@ pci_uio_map_resource(struct rte_pci_device *dev) memcpy(&uio_res->pci_addr, &dev->addr, sizeof(uio_res->pci_addr)); /* collect info about device mappings */ - if ((nb_maps = pci_uio_get_mappings(dirname, uio_res->maps, - sizeof (uio_res->maps) / sizeof (uio_res->maps[0]))) - < 0) + nb_maps = pci_uio_get_mappings(dirname, uio_res->maps, + RTE_DIM(uio_res->maps)); + if (nb_maps < 0) { + rte_free(uio_res); return (nb_maps); - + } + uio_res->nb_maps = nb_maps; /* Map all BARs */ @@ -678,6 +680,7 @@ pci_uio_map_resource(struct rte_pci_device *dev) (mapaddr = pci_map_resource(dev, NULL, devname, (off_t)offset, (size_t)maps[j].size)) == NULL) { + rte_free(uio_res); return (-1); } -- 1.7.10.4
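The kind of leak fixed here can be sketched outside of the EAL as follows. This is only an illustration under stated assumptions: tracked_alloc(), tracked_free(), fake_get_mappings() and struct res_record are hypothetical stand-ins for rte_zmalloc(), rte_free(), pci_uio_get_mappings() and struct uio_resource, and the live_allocs counter exists only to make a leak observable.

```c
#include <assert.h>
#include <stdlib.h>

static int live_allocs; /* blocks allocated and not yet freed */

/* Hypothetical stand-in for rte_zmalloc(): zeroed, counted allocation. */
static void *
tracked_alloc(size_t n)
{
	void *p = calloc(1, n);

	if (p != NULL)
		live_allocs++;
	return p;
}

/* Hypothetical stand-in for rte_free(). */
static void
tracked_free(void *p)
{
	if (p == NULL)
		return;
	assert(live_allocs > 0);
	live_allocs--;
	free(p);
}

/* Hypothetical stand-in for struct uio_resource. */
struct res_record {
	int nb_maps;
};

/* Hypothetical stand-in for pci_uio_get_mappings(): negative on error. */
static int
fake_get_mappings(int want)
{
	return want;
}

static struct res_record *
record_create(int want_maps)
{
	struct res_record *rec = tracked_alloc(sizeof(*rec));
	int nb_maps;

	if (rec == NULL)
		return NULL;

	nb_maps = fake_get_mappings(want_maps);
	if (nb_maps < 0) {
		/* without this release, every failed call leaks rec */
		tracked_free(rec);
		return NULL;
	}

	rec->nb_maps = nb_maps;
	return rec;
}
```

Every early return that follows the allocation needs the same release, which is why the patch touches two error paths.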
[dpdk-dev] [PATCH v2 0/7] pci cleanup
Hello all,

Here is an attempt at aligning the bsd and linux implementations of eal_pci.c.
It results in the following changes:
- add the checks on driver flags that were missing in bsd
- remove the virtio-uio workaround from linux eal_pci.c
- remove the deprecated RTE_EAL_UNBIND_PORTS option

Along the way, I discovered two small bugs: a mem leak in linux eal_pci.c and
a fd leak in both bsd and linux eal_pci.c.

Changes included in v2:
- fix another mem leak noticed by Anatoly Burakov

--
David Marchand

David Marchand (7):
  pci: fix potential mem leaks
  pci: align bsd implementation on linux
  pci: remove virtio-uio workaround
  pci: rework interrupt fd init and fix fd leak
  pci: pci_switch_module cleanup
  pci: move RTE_PCI_DRV_FORCE_UNBIND handling out of #ifdef
  pci: remove deprecated RTE_EAL_UNBIND_PORTS option

 lib/librte_eal/bsdapp/eal/eal_pci.c   | 105 ++--
 lib/librte_eal/linuxapp/eal/eal_pci.c | 282 +
 lib/librte_pmd_virtio/virtio_ethdev.c | 133 +++-
 3 files changed, 218 insertions(+), 302 deletions(-)

--
1.7.10.4
[dpdk-dev] [PATCH v2 6/6] mk: add "make examples" target in root makefile
It is now possible to build all projects from the examples/ directory
using one command from root directory.

Some illustration of what is possible:

- build examples in the DPDK tree for one target

  # install the x86_64-default-linuxapp-gcc in
  # ${RTE_SDK}/x86_64-default-linuxapp-gcc directory
  user@droids:~/dpdk.org$ make install T=x86_64-default-linuxapp-gcc
  # build examples for this new installation in
  # ${RTE_SDK}/examples directory
  user@droids:~/dpdk.org$ make examples T=x86_64-default-linuxapp-gcc

- build examples outside DPDK tree for several targets

  # install all targets matching x86_64-*-linuxapp-gcc in
  # ${RTE_SDK}/x86_64-*-linuxapp-gcc directories
  user@droids:~/dpdk.org$ make install T=x86_64-*-linuxapp-gcc
  # build examples for these installations in /tmp/foobar
  user@droids:~/dpdk.org$ make examples T=x86_64-*-linuxapp-gcc O=/tmp/foobar

Signed-off-by: Olivier Matz
---
 doc/build-sdk-quick.txt | 14 +
 mk/rte.sdkexamples.mk   | 79 +
 mk/rte.sdkroot.mk       |  4 +++
 3 files changed, 91 insertions(+), 6 deletions(-)
 create mode 100644 mk/rte.sdkexamples.mk

diff --git a/doc/build-sdk-quick.txt b/doc/build-sdk-quick.txt
index 8989a32..d768c44 100644
--- a/doc/build-sdk-quick.txt
+++ b/doc/build-sdk-quick.txt
@@ -1,12 +1,14 @@
 Basic build
 	make config T=x86_64-default-linuxapp-gcc && make
 Build commands
-	config      get configuration from target template (T=)
-	all         same as build (default rule)
-	build       build in a configured directory
-	clean       remove files but keep configuration
-	install     build many targets (wildcard allowed) and install in DESTDIR
-	uninstall   remove all installed targets
+	config           get configuration from target template (T=)
+	all              same as build (default rule)
+	build            build in a configured directory
+	clean            remove files but keep configuration
+	install          build many targets (wildcard allowed) and install in DESTDIR
+	uninstall        remove all installed targets
+	examples         build examples for given targets (T=)
+	examples_clean   clean examples for given targets (T=)
Build variables EXTRA_CPPFLAGS preprocessor options EXTRA_CFLAGS compiler options diff --git a/mk/rte.sdkexamples.mk b/mk/rte.sdkexamples.mk new file mode 100644 index 000..a76570e --- /dev/null +++ b/mk/rte.sdkexamples.mk @@ -0,0 +1,79 @@ +# BSD LICENSE +# +# Copyright(c) 2014 6WIND S.A. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions +# are met: +# +# * Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# * Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in +# the documentation and/or other materials provided with the +# distribution. +# * Neither the name of 6WIND S.A. nor the names of its +# contributors may be used to endorse or promote products derived +# from this software without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +# examples application are seen as external applications which are +# not part of SDK. 
+BUILDING_RTE_SDK := +export BUILDING_RTE_SDK + +# Build directory is given with O= +ifndef O +O = $(RTE_SDK)/examples +endif + +# Target for which examples should be built. +ifndef T +T = * +endif + +# list all available configurations +EXAMPLES_CONFIGS := $(patsubst $(RTE_SRCDIR)/config/defconfig_%,%,\ + $(wildcard $(RTE_SRCDIR)/config/defconfig_$(T))) +EXAMPLES_TARGETS := $(addsuffix _examples,\ + $(filter-out %~,$(EXAMPLES_CONFIGS))) + +.PHONY: examples +examples: $(EXAMPLES_TARGETS) + +%_examples: + @echo == Build examples for $* + $(Q)if [ ! -d "${RTE_SDK}/${*}" ]; then \ + echo "Target ${*} does not exist in ${RTE_SDK}/${*}." ; \ + echo -n "Pleas
[dpdk-dev] [PATCH v2 5/6] examples: fix netmap_compat example
It is not allowed to reference an absolute file name in SRCS-y. A VPATH
has to be used, else the dependencies won't be checked properly.

Signed-off-by: Olivier Matz
---
 examples/netmap_compat/bridge/Makefile | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/examples/netmap_compat/bridge/Makefile b/examples/netmap_compat/bridge/Makefile
index 74feb1e..ebc6b1c 100644
--- a/examples/netmap_compat/bridge/Makefile
+++ b/examples/netmap_compat/bridge/Makefile
@@ -41,9 +41,12 @@ include $(RTE_SDK)/mk/rte.vars.mk
 # binary name
 APP = bridge
 
+# for compat_netmap.c
+VPATH := $(SRCDIR)/../lib
+
 # all source are stored in SRCS-y
 SRCS-y := bridge.c
-SRCS-y += $(SRCDIR)/../lib/compat_netmap.c
+SRCS-y += compat_netmap.c
 
 CFLAGS += -O3 -I$(SRCDIR)/../lib -I$(SRCDIR)/../netmap
 CFLAGS += $(WERROR_FLAGS)
--
1.9.2
[dpdk-dev] [PATCH v2 4/6] examples: fix qos_sched makefile
The example does not compile as the linker complains about duplicated
symbols. Remove -lrte_sched from LDLIBS: it is already present in
rte.app.mk and is added automatically by the DPDK framework.

Signed-off-by: Olivier Matz
---
 examples/qos_sched/Makefile | 2 --
 1 file changed, 2 deletions(-)

diff --git a/examples/qos_sched/Makefile b/examples/qos_sched/Makefile
index b91fe37..9366efe 100755
--- a/examples/qos_sched/Makefile
+++ b/examples/qos_sched/Makefile
@@ -54,6 +54,4 @@ CFLAGS += $(WERROR_FLAGS)
 CFLAGS_args.o := -D_GNU_SOURCE
 CFLAGS_cfg_file.o := -D_GNU_SOURCE
 
-LDLIBS += -lrte_sched
-
 include $(RTE_SDK)/mk/rte.extapp.mk
--
1.9.2
[dpdk-dev] [PATCH v2 3/6] examples: add a makefile to build all examples
It is now possible to build all examples by doing the following: user at droids:~/dpdk.org$ make install T=x86_64-default-linuxapp-gcc user at droids:~/dpdk.org$ cd examples user at droids:~/dpdk.org/examples$ make RTE_SDK=${PWD}/.. \ RTE_TARGET=x86_64-default-linuxapp-gcc Signed-off-by: Olivier Matz --- examples/Makefile | 68 +++ 1 file changed, 68 insertions(+) create mode 100644 examples/Makefile diff --git a/examples/Makefile b/examples/Makefile new file mode 100644 index 000..5e36c92 --- /dev/null +++ b/examples/Makefile @@ -0,0 +1,68 @@ +# BSD LICENSE +# +# Copyright(c) 2014 6WIND S.A. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions +# are met: +# +# * Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# * Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in +# the documentation and/or other materials provided with the +# distribution. +# * Neither the name of 6WIND S.A. nor the names of its +# contributors may be used to endorse or promote products derived +# from this software without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +ifeq ($(RTE_SDK),) +$(error "Please define RTE_SDK environment variable") +endif + +# Default target, can be overriden by command line or environment +RTE_TARGET ?= x86_64-default-linuxapp-gcc + +include $(RTE_SDK)/mk/rte.vars.mk + +DIRS-y += cmdline +ifneq ($(ICP_ROOT),) +DIRS-y += dpdk_qat +endif +DIRS-y += exception_path +DIRS-y += helloworld +DIRS-y += ip_reassembly +DIRS-$(CONFIG_RTE_MBUF_SCATTER_GATHER) += ipv4_frag +DIRS-$(CONFIG_RTE_MBUF_SCATTER_GATHER) += ipv4_multicast +DIRS-$(CONFIG_RTE_LIBRTE_KNI) += kni +DIRS-y += l2fwd +DIRS-$(CONFIG_RTE_LIBRTE_IVSHMEM) += l2fwd-ivshmem +DIRS-y += l3fwd +DIRS-y += l3fwd-power +DIRS-y += l3fwd-vf +DIRS-y += link_status_interrupt +DIRS-y += load_balancer +DIRS-y += multi_process +DIRS-y += netmap_compat/bridge +DIRS-$(CONFIG_RTE_LIBRTE_METER) += qos_meter +DIRS-$(CONFIG_RTE_LIBRTE_SCHED) += qos_sched +DIRS-y += quota_watermark +DIRS-y += timer +DIRS-y += vhost +DIRS-$(CONFIG_RTE_LIBRTE_XEN_DOM0) += vhost_xen +DIRS-y += vmdq +DIRS-y += vmdq_dcb + +include $(RTE_SDK)/mk/rte.extsubdir.mk -- 1.9.2
[dpdk-dev] [PATCH v2 2/6] examples: use rte.extsubdir.mk to process subdirectories
Signed-off-by: Olivier Matz --- examples/l2fwd-ivshmem/Makefile | 9 + examples/multi_process/Makefile | 16 +++- examples/multi_process/client_server_mp/Makefile | 15 ++- examples/quota_watermark/Makefile| 12 +++- 4 files changed, 17 insertions(+), 35 deletions(-) diff --git a/examples/l2fwd-ivshmem/Makefile b/examples/l2fwd-ivshmem/Makefile index 7286b37..df59ed8 100644 --- a/examples/l2fwd-ivshmem/Makefile +++ b/examples/l2fwd-ivshmem/Makefile @@ -37,14 +37,7 @@ endif RTE_TARGET ?= x86_64-ivshmem-linuxapp-gcc include $(RTE_SDK)/mk/rte.vars.mk -unexport RTE_SRCDIR RTE_OUTPUT RTE_EXTMK DIRS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += host guest -.PHONY: all clean $(DIRS-y) - -all: $(DIRS-y) -clean: $(DIRS-y) - -$(DIRS-y): - $(MAKE) -C $@ $(MAKECMDGOALS) +include $(RTE_SDK)/mk/rte.extsubdir.mk diff --git a/examples/multi_process/Makefile b/examples/multi_process/Makefile index ba96a7e..f2c8e68 100644 --- a/examples/multi_process/Makefile +++ b/examples/multi_process/Makefile @@ -33,15 +33,13 @@ ifeq ($(RTE_SDK),) $(error "Please define RTE_SDK environment variable") endif -include $(RTE_SDK)/mk/rte.vars.mk -unexport RTE_SRCDIR RTE_OUTPUT RTE_EXTMK - -DIRS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += $(wildcard *_mp) +# Default target, can be overriden by command line or environment +RTE_TARGET ?= x86_64-ivshmem-linuxapp-gcc -.PHONY: all clean $(DIRS-y) +include $(RTE_SDK)/mk/rte.vars.mk -all: $(DIRS-y) -clean: $(DIRS-y) +DIRS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += client_server_mp +DIRS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += simple_mp +DIRS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += symmetric_mp -$(DIRS-y): - $(MAKE) -C $@ $(MAKECMDGOALS) +include $(RTE_SDK)/mk/rte.extsubdir.mk diff --git a/examples/multi_process/client_server_mp/Makefile b/examples/multi_process/client_server_mp/Makefile index 24d31b0..b8d6b3f 100644 --- a/examples/multi_process/client_server_mp/Makefile +++ b/examples/multi_process/client_server_mp/Makefile @@ -33,15 +33,12 @@ ifeq ($(RTE_SDK),) $(error "Please define RTE_SDK 
environment variable") endif -include $(RTE_SDK)/mk/rte.vars.mk -unexport RTE_SRCDIR RTE_OUTPUT RTE_EXTMK - -DIRS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += $(wildcard mp_*) +# Default target, can be overriden by command line or environment +RTE_TARGET ?= x86_64-ivshmem-linuxapp-gcc -.PHONY: all clean $(DIRS-y) +include $(RTE_SDK)/mk/rte.vars.mk -all: $(DIRS-y) -clean: $(DIRS-y) +DIRS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += mp_client +DIRS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += mp_server -$(DIRS-y): - $(MAKE) -C $@ $(MAKECMDGOALS) +include $(RTE_SDK)/mk/rte.extsubdir.mk diff --git a/examples/quota_watermark/Makefile b/examples/quota_watermark/Makefile index 5596dcc..e4d54c2 100644 --- a/examples/quota_watermark/Makefile +++ b/examples/quota_watermark/Makefile @@ -37,14 +37,8 @@ endif RTE_TARGET ?= x86_64-default-linuxapp-gcc include $(RTE_SDK)/mk/rte.vars.mk -unexport RTE_SRCDIR RTE_OUTPUT RTE_EXTMK -DIRS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += $(wildcard qw*) +DIRS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += qw +DIRS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += qwctl -.PHONY: all clean $(DIRS-y) - -all: $(DIRS-y) -clean: $(DIRS-y) - -$(DIRS-y): - $(MAKE) -C $@ $(MAKECMDGOALS) +include $(RTE_SDK)/mk/rte.extsubdir.mk -- 1.9.2
[dpdk-dev] [PATCH v2 1/6] mk: introduce rte.extsubdir.mk
This makefile can be included by a project that needs to build several applications or libraries that are located in different directories. Signed-off-by: Olivier Matz --- mk/rte.extsubdir.mk | 53 + 1 file changed, 53 insertions(+) create mode 100644 mk/rte.extsubdir.mk diff --git a/mk/rte.extsubdir.mk b/mk/rte.extsubdir.mk new file mode 100644 index 000..f50f006 --- /dev/null +++ b/mk/rte.extsubdir.mk @@ -0,0 +1,53 @@ +# BSD LICENSE +# +# Copyright(c) 2014 6WIND S.A. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions +# are met: +# +# * Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# * Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in +# the documentation and/or other materials provided with the +# distribution. +# * Neither the name of 6WIND S.A. nor the names of its +# contributors may be used to endorse or promote products derived +# from this software without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ +MAKEFLAGS += --no-print-directory + +# output directory +O ?= . +BASE_OUTPUT ?= $(O) +CUR_SUBDIR ?= . + +.PHONY: all +all: $(DIRS-y) + +.PHONY: clean +clean: $(DIRS-y) + +.PHONY: $(DIRS-y) +$(DIRS-y): + @echo "== $@" + $(Q)$(MAKE) -C $(@) \ + M=$(CURDIR)/$(@)/Makefile \ + O=$(BASE_OUTPUT)/$(CUR_SUBDIR)/$(@)/$(RTE_TARGET) \ + BASE_OUTPUT=$(BASE_OUTPUT) \ + CUR_SUBDIR=$(CUR_SUBDIR)/$(@) \ + S=$(CURDIR)/$(@) \ + $(filter-out $(DIRS-y),$(MAKECMDGOALS)) -- 1.9.2
[dpdk-dev] [PATCH v2 0/6] examples: add a new makefile to build all examples
This patch series adds a makefile to build all examples supported by the
configuration. It helps to check that all examples compile after a dpdk
modification.

After applying the patches, it is possible to build all examples for given
targets, given the installation directory:

  # first, install the x86_64-default-linuxapp-gcc in
  # ${RTE_SDK}/x86_64-default-linuxapp-gcc directory
  user@droids:~/dpdk.org$ make install T=x86_64-default-linuxapp-gcc
  # build examples for this new installation in
  # ${RTE_SDK}/examples directory
  user@droids:~/dpdk.org$ make examples T=x86_64-default-linuxapp-gcc

Or directly from examples directory:

  user@droids:~/dpdk.org$ cd examples
  user@droids:~/dpdk.org/examples$ make RTE_SDK=${PWD}/.. \
      RTE_TARGET=x86_64-default-linuxapp-gcc

Changes included in v2:
- do not build kni example if CONFIG_RTE_LIBRTE_KNI is not set
- fix rte.extsubdir.mk when there are several levels of subdirectories
- allow to build examples directly from dpdk root directory
- explain in commit logs that it requires an install directory

Olivier Matz (6):
  mk: introduce rte.extsubdir.mk
  examples: use rte.extsubdir.mk to process subdirectories
  examples: add a makefile to build all examples
  examples: fix qos_sched makefile
  examples: fix netmap_compat example
  mk: add "make examples" target in root makefile

 doc/build-sdk-quick.txt                          | 14 +++--
 examples/Makefile                                | 68
 examples/l2fwd-ivshmem/Makefile                  |  9 +--
 examples/multi_process/Makefile                  | 16 +++--
 examples/multi_process/client_server_mp/Makefile | 15 ++---
 examples/netmap_compat/bridge/Makefile           |  5 +-
 examples/qos_sched/Makefile                      |  2 -
 examples/quota_watermark/Makefile                | 12 +---
 mk/rte.extsubdir.mk                              | 53
 mk/rte.sdkexamples.mk                            | 79
 mk/rte.sdkroot.mk                                |  4 ++
 11 files changed, 233 insertions(+), 44 deletions(-)
 create mode 100644 examples/Makefile
 create mode 100644 mk/rte.extsubdir.mk
 create mode 100644 mk/rte.sdkexamples.mk

--
1.9.2
[dpdk-dev] [PATCH v2 0/2] ring: allow to init a rte_ring outside of an rte_memzone
-----Original Message-----
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Olivier Matz
Sent: Friday, May 09, 2014 11:15 AM
To: dev at dpdk.org
Subject: [dpdk-dev] [PATCH v2 0/2] ring: allow to init a rte_ring outside of an rte_memzone

These 2 patches add 2 new functions that make it possible to initialize
and use a rte_ring anywhere in memory.

Before these patches, only rte_ring_create() was available. This function
allocates a rte_memzone (that cannot be freed) and initializes a ring
inside.

This series makes it possible to do the following:

  size = rte_ring_get_memsize(1024);
  r = malloc(size);
  rte_ring_init(r, "my_ring", 1024, 0);

Changes included in v2:
- fix syntax for function definitions in rte_ring_get_memsize()
- use RTE_ALIGN() to get nearest higher multiple of cache line size
- fix description of rte_ring_init() in doxygen comments

Olivier Matz (2):
  ring: introduce rte_ring_get_memsize()
  ring: introduce rte_ring_init()

 lib/librte_ring/rte_ring.c | 89 +-
 lib/librte_ring/rte_ring.h | 67 +++---
 2 files changed, 119 insertions(+), 37 deletions(-)

--

Acked-by: Konstantin Ananyev
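To make the "compute the size, allocate, then init" sequence from the cover letter concrete without pulling in DPDK headers, here is a rough sketch. Everything in it is an assumption made for illustration: struct mini_ring, mini_ring_get_memsize() and mini_ring_init() are hypothetical names that only mimic the shape of the proposed API, the 64-byte cache line is assumed, and rejecting a count that is not a power of two is a choice of this sketch, not something taken from the patches.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

#define MINI_CACHE_LINE 64u /* assumed cache line size */
#define MINI_NAMESIZE   32u

/* Hypothetical stand-in for struct rte_ring; not the DPDK layout. */
struct mini_ring {
	char name[MINI_NAMESIZE];
	unsigned size; /* number of slots, a power of two */
	unsigned mask; /* size - 1, used to wrap indexes */
	unsigned head;
	unsigned tail;
	void *slots[]; /* object table follows the header */
};

/* Bytes needed for a ring of `count` slots, or -EINVAL if `count`
 * is not a power of two. */
static ssize_t
mini_ring_get_memsize(unsigned count)
{
	size_t sz;

	if (count == 0 || (count & (count - 1)) != 0)
		return -EINVAL;

	sz = sizeof(struct mini_ring) + (size_t)count * sizeof(void *);
	/* round up to the next multiple of the cache line size */
	sz = (sz + MINI_CACHE_LINE - 1) & ~(size_t)(MINI_CACHE_LINE - 1);
	return (ssize_t)sz;
}

/* Initialize a ring in memory provided by the caller. The area must be
 * at least mini_ring_get_memsize(count) bytes. */
static int
mini_ring_init(struct mini_ring *r, const char *name, unsigned count)
{
	assert(r != NULL && name != NULL);

	if (mini_ring_get_memsize(count) < 0)
		return -EINVAL;

	memset(r, 0, sizeof(*r));
	snprintf(r->name, sizeof(r->name), "%s", name);
	r->size = count;
	r->mask = count - 1;
	return 0;
}
```

With such helpers the caller owns the memory: it can come from malloc(), a static buffer or a shared segment, and it can be released when the ring is no longer needed, which is not possible for a ring living in a rte_memzone.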
[dpdk-dev] [PATCH v2 2/2] ring: introduce rte_ring_init()
Allow to initialize a ring in an already allocated memory. The
rte_ring_create() function that allocates a ring in a rte_memzone is
still available and now uses the new rte_ring_init() function in order
to factorize the code.

Signed-off-by: Olivier Matz
---
 lib/librte_ring/rte_ring.c | 63 ++
 lib/librte_ring/rte_ring.h | 51 +
 2 files changed, 82 insertions(+), 32 deletions(-)

diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index 156fe49..2eaa6c8 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -113,18 +113,10 @@ rte_ring_get_memsize(unsigned count)
 	return sz;
 }
 
-/* create the ring */
-struct rte_ring *
-rte_ring_create(const char *name, unsigned count, int socket_id,
-		unsigned flags)
+int
+rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
+	unsigned flags)
 {
-	char mz_name[RTE_MEMZONE_NAMESIZE];
-	struct rte_ring *r;
-	const struct rte_memzone *mz;
-	ssize_t ring_size;
-	int mz_flags = 0;
-	struct rte_ring_list* ring_list = NULL;
-
 	/* compilation-time checks */
 	RTE_BUILD_BUG_ON((sizeof(struct rte_ring) &
 			  CACHE_LINE_MASK) != 0);
@@ -141,11 +133,38 @@ rte_ring_create(const char *name, unsigned count, int socket_id,
 			  CACHE_LINE_MASK) != 0);
 #endif
 
+	/* init the ring structure */
+	memset(r, 0, sizeof(*r));
+	rte_snprintf(r->name, sizeof(r->name), "%s", name);
+	r->flags = flags;
+	r->prod.watermark = count;
+	r->prod.sp_enqueue = !!(flags & RING_F_SP_ENQ);
+	r->cons.sc_dequeue = !!(flags & RING_F_SC_DEQ);
+	r->prod.size = r->cons.size = count;
+	r->prod.mask = r->cons.mask = count-1;
+	r->prod.head = r->cons.head = 0;
+	r->prod.tail = r->cons.tail = 0;
+
+	return 0;
+}
+
+/* create the ring */
+struct rte_ring *
+rte_ring_create(const char *name, unsigned count, int socket_id,
+		unsigned flags)
+{
+	char mz_name[RTE_MEMZONE_NAMESIZE];
+	struct rte_ring *r;
+	const struct rte_memzone *mz;
+	ssize_t ring_size;
+	int mz_flags = 0;
+	struct rte_ring_list* ring_list = NULL;
+
 	/* check that we have an initialised tail queue */
-	if ((ring_list = 
+	if ((ring_list =
 	     RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_RING, rte_ring_list)) == NULL) {
 		rte_errno = E_RTE_NO_TAILQ;
-		return NULL; 
+		return NULL;
 	}
 
 	ring_size = rte_ring_get_memsize(count);
@@ -164,26 +183,16 @@ rte_ring_create(const char *name, unsigned count, int socket_id,
 	mz = rte_memzone_reserve(mz_name, ring_size, socket_id, mz_flags);
 	if (mz != NULL) {
 		r = mz->addr;
-
-		/* init the ring structure */
-		memset(r, 0, sizeof(*r));
-		rte_snprintf(r->name, sizeof(r->name), "%s", name);
-		r->flags = flags;
-		r->prod.watermark = count;
-		r->prod.sp_enqueue = !!(flags & RING_F_SP_ENQ);
-		r->cons.sc_dequeue = !!(flags & RING_F_SC_DEQ);
-		r->prod.size = r->cons.size = count;
-		r->prod.mask = r->cons.mask = count-1;
-		r->prod.head = r->cons.head = 0;
-		r->prod.tail = r->cons.tail = 0;
-
+		/* no need to check return value here, we already checked the
+		 * arguments above */
+		rte_ring_init(r, name, count, flags);
 		TAILQ_INSERT_TAIL(ring_list, r, next);
 	} else {
 		r = NULL;
 		RTE_LOG(ERR, RING, "Cannot reserve memory\n");
 	}
 	rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
-	
+
 	return r;
 }
 
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index e8493f2..96232d3 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -215,13 +215,54 @@ struct rte_ring {
 ssize_t rte_ring_get_memsize(unsigned count);
 
 /**
+ * Initialize a ring structure.
+ *
+ * Initialize a ring structure in memory pointed by "r". The size of the
+ * memory area must be large enough to store the ring structure and the
+ * object table. It is advised to use rte_ring_get_memsize() to get the
+ * appropriate size.
+ *
+ * The ring size is set to *count*, which must be a power of two. Water
+ * marking is disabled by default. The real usable ring size is
+ * *count-1* instead of *count* to differentiate a free ring from an
+ * empty ring.
+ *
+ * The ring is not added in RTE_TAILQ_RING global list. Indeed, the
+ * memory given by the caller may not be shareable among dpdk
+ * processes.
+ *
+ * @param r
+ *   The pointer to the ring structure followed by the objects table.
+ * @param name
+ *   The name of the ring.
+ * @param count
+ *   The number of elements in the ring (must
[dpdk-dev] [PATCH v2 1/2] ring: introduce rte_ring_get_memsize()
Add a function that returns the amount of memory occupied by a rte_ring
structure and its object table. This commit prepares the next one that
will allow to allocate a ring dynamically.

Signed-off-by: Olivier Matz
---
 lib/librte_ring/rte_ring.c | 30 +++---
 lib/librte_ring/rte_ring.h | 16
 2 files changed, 39 insertions(+), 7 deletions(-)

diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index 0d43a55..156fe49 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -94,6 +94,25 @@ TAILQ_HEAD(rte_ring_list, rte_ring);
 /* true if x is a power of 2 */
 #define POWEROF2(x) ((((x)-1) & (x)) == 0)
 
+/* return the size of memory occupied by a ring */
+ssize_t
+rte_ring_get_memsize(unsigned count)
+{
+	ssize_t sz;
+
+	/* count must be a power of 2 */
+	if ((!POWEROF2(count)) || (count > RTE_RING_SZ_MASK )) {
+		RTE_LOG(ERR, RING,
+			"Requested size is invalid, must be power of 2, and "
+			"do not exceed the size limit %u\n", RTE_RING_SZ_MASK);
+		return -EINVAL;
+	}
+
+	sz = sizeof(struct rte_ring) + count * sizeof(void *);
+	sz = RTE_ALIGN(sz, CACHE_LINE_SIZE);
+	return sz;
+}
+
 /* create the ring */
 struct rte_ring *
 rte_ring_create(const char *name, unsigned count, int socket_id,
@@ -102,7 +121,7 @@ rte_ring_create(const char *name, unsigned count, int socket_id,
 	char mz_name[RTE_MEMZONE_NAMESIZE];
 	struct rte_ring *r;
 	const struct rte_memzone *mz;
-	size_t ring_size;
+	ssize_t ring_size;
 	int mz_flags = 0;
 	struct rte_ring_list* ring_list = NULL;
 
@@ -129,16 +148,13 @@ rte_ring_create(const char *name, unsigned count, int socket_id,
 		return NULL;
 	}
 
-	/* count must be a power of 2 */
-	if ((!POWEROF2(count)) || (count > RTE_RING_SZ_MASK )) {
-		rte_errno = EINVAL;
-		RTE_LOG(ERR, RING, "Requested size is invalid, must be power of 2, and "
-				"do not exceed the size limit %u\n", RTE_RING_SZ_MASK);
+	ring_size = rte_ring_get_memsize(count);
+	if (ring_size < 0) {
+		rte_errno = ring_size;
 		return NULL;
 	}
 
 	rte_snprintf(mz_name, sizeof(mz_name), "%s%s", RTE_RING_MZ_PREFIX, name);
-	ring_size = count * sizeof(void *) + sizeof(struct rte_ring);
 
 	rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
 
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 775ea79..e8493f2 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -199,6 +199,22 @@ struct rte_ring {
 #endif
 
 /**
+ * Calculate the memory size needed for a ring
+ *
+ * This function returns the number of bytes needed for a ring, given
+ * the number of elements in it. This value is the sum of the size of
+ * the structure rte_ring and the size of the memory needed by the
+ * objects pointers. The value is aligned to a cache line size.
+ *
+ * @param count
+ *   The number of elements in the ring (must be a power of 2).
+ * @return
+ *   - The memory size needed for the ring on success.
+ *   - -EINVAL if count is not a power of 2.
+ */
+ssize_t rte_ring_get_memsize(unsigned count);
+
+/**
  * Create a new ring named *name* in memory.
  *
  * This function uses ``memzone_reserve()`` to allocate memory. Its size is
-- 
1.9.2
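
The size computation in this patch (power-of-two check, header plus pointer table, round up to a cache line) can be tried outside DPDK with a small standalone sketch. Everything prefixed with sketch_ or SKETCH_ is hypothetical: the 128-byte header, the 64-byte cache line and the size limit are assumed values for illustration, not the real DPDK definitions.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>   /* ssize_t */

/* Assumed values, for illustration only; DPDK defines its own. */
#define SKETCH_CACHE_LINE_SIZE 64
#define SKETCH_RING_SZ_MASK    0x0fffffffU

/* Hypothetical stand-in for struct rte_ring: only its size matters here. */
struct sketch_ring {
	char hdr[128];
};

/* True if x is a power of two (this test is also true for 0). */
static int sketch_is_pow2(unsigned x)
{
	return ((x - 1) & x) == 0;
}

/* Round sz up to the next multiple of align (align is a power of two). */
static size_t sketch_align_up(size_t sz, size_t align)
{
	return (sz + align - 1) & ~(align - 1);
}

/* Bytes needed for the header plus a table of count pointers,
 * or -EINVAL if count is not acceptable. */
static ssize_t sketch_ring_memsize(unsigned count)
{
	size_t sz;

	if (!sketch_is_pow2(count) || count > SKETCH_RING_SZ_MASK)
		return -EINVAL;

	sz = sizeof(struct sketch_ring) + (size_t)count * sizeof(void *);
	return (ssize_t)sketch_align_up(sz, SKETCH_CACHE_LINE_SIZE);
}
```

Returning a signed size is what lets the caller tell an error from a valid size with a single `< 0` test, which is why the patch changes ring_size from size_t to ssize_t.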
[dpdk-dev] [PATCH v2 0/2] ring: allow to init a rte_ring outside of an rte_memzone
These 2 patches add 2 new functions that make it possible to initialize
and use a rte_ring anywhere in memory.

Before these patches, only rte_ring_create() was available. This function
allocates a rte_memzone (that cannot be freed) and initializes a ring
inside.

This series allows the following:

  size = rte_ring_get_memsize(1024);
  r = malloc(size);
  rte_ring_init(r, "my_ring", 1024, 0);

Changes included in v2:
- fix syntax for functions definitions in rte_ring_get_memsize()
- use RTE_ALIGN() to get nearest higher multiple of cache line size
- fix description of rte_ring_init() in doxygen comments

Olivier Matz (2):
  ring: introduce rte_ring_get_memsize()
  ring: introduce rte_ring_init()

 lib/librte_ring/rte_ring.c | 89 +-
 lib/librte_ring/rte_ring.h | 67 +++---
 2 files changed, 119 insertions(+), 37 deletions(-)

-- 
1.9.2
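
The get-size, allocate, init-in-place pattern described in this cover letter can be illustrated without DPDK. The sketch below is a hypothetical, single-threaded toy model: struct toy_ring and the toy_ring_* functions are invented for illustration and do not reflect the rte_ring layout or its multi-producer logic. It only shows why a ring of *count* slots holds *count-1* objects when indexes are wrapped with a *count-1* mask.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, single-threaded model of a ring; not the DPDK layout. */
struct toy_ring {
	unsigned size;   /* number of slots, a power of two */
	unsigned mask;   /* size - 1, used to wrap indexes */
	unsigned head;   /* next slot to write */
	unsigned tail;   /* next slot to read */
	void *slots[];   /* object table, stored right after the header */
};

/* Bytes the caller must provide for a ring of count slots. */
static size_t toy_ring_memsize(unsigned count)
{
	return sizeof(struct toy_ring) + (size_t)count * sizeof(void *);
}

/* Initialize a ring in memory owned by the caller. */
static int toy_ring_init(struct toy_ring *r, unsigned count)
{
	if (count == 0 || (count & (count - 1)) != 0)
		return -1;   /* count must be a power of two */
	memset(r, 0, sizeof(*r));
	r->size = count;
	r->mask = count - 1;
	return 0;
}

/* One slot always stays unused so that "full" and "empty" differ. */
static int toy_ring_enqueue(struct toy_ring *r, void *obj)
{
	if (((r->head + 1) & r->mask) == r->tail)
		return -1;   /* full */
	r->slots[r->head] = obj;
	r->head = (r->head + 1) & r->mask;
	return 0;
}

static int toy_ring_dequeue(struct toy_ring *r, void **obj)
{
	if (r->head == r->tail)
		return -1;   /* empty */
	*obj = r->slots[r->tail];
	r->tail = (r->tail + 1) & r->mask;
	return 0;
}

/* Usage: malloc() the memory, init in place, fill then drain the ring.
 * Returns the number of objects that fit, or -1 on error. */
static int toy_ring_roundtrip(unsigned count)
{
	static int obj;
	struct toy_ring *r = malloc(toy_ring_memsize(count));
	void *out;
	int stored = 0, loaded = 0;

	if (r == NULL)
		return -1;
	if (toy_ring_init(r, count) < 0) {
		free(r);
		return -1;
	}
	while (toy_ring_enqueue(r, &obj) == 0)
		stored++;
	while (toy_ring_dequeue(r, &out) == 0)
		loaded++;
	free(r);
	return stored == loaded ? stored : -1;
}
```

Because the memory comes from malloc(), the caller can free() it once the ring is no longer used, which is the point of not going through a memzone.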
[dpdk-dev] Compile failed using g++ 4.8.2
When I compile my program on Ubuntu 14.04, g++ 4.8.2 prints the following
error messages. A space needs to be added around the identifier PRIx64
(and the other PRIx macros). Can anyone help to submit a patch?

/home/bodc/workspace/dpdk/build/dpdk-prefix/src/dpdk/x86_64-default-linuxapp-gcc/include/rte_mempool.h:347:6: error: invalid suffix on literal; C++11 requires a space between literal and identifier [-Werror=literal-suffix]
      "obj=%p, mempool=%p, cookie=%"PRIx64"\n",
      ^
/home/bodc/workspace/dpdk/build/dpdk-prefix/src/dpdk/x86_64-default-linuxapp-gcc/include/rte_mempool.h:357:6: error: invalid suffix on literal; C++11 requires a space between literal and identifier [-Werror=literal-suffix]
      "obj=%p, mempool=%p, cookie=%"PRIx64"\n",
      ^
/home/bodc/workspace/dpdk/build/dpdk-prefix/src/dpdk/x86_64-default-linuxapp-gcc/include/rte_mempool.h:368:6: error: invalid suffix on literal; C++11 requires a space between literal and identifier [-Werror=literal-suffix]
      "obj=%p, mempool=%p, cookie=%"PRIx64"\n",
      ^
/home/bodc/workspace/dpdk/build/dpdk-prefix/src/dpdk/x86_64-default-linuxapp-gcc/include/rte_mempool.h:377:5: error: invalid suffix on literal; C++11 requires a space between literal and identifier [-Werror=literal-suffix]
     "obj=%p, mempool=%p, cookie=%"PRIx64"\n",
     ^
In file included from /home/bodc/workspace/dpdk/build/dpdk-prefix/src/dpdk/x86_64-default-linuxapp-gcc/include/rte_ethdev.h:177:0,
                 from /home/bodc/workspace/tcproxy/src/comm/packet.cc:9:
/home/bodc/workspace/dpdk/build/dpdk-prefix/src/dpdk/x86_64-default-linuxapp-gcc/include/rte_pci.h:95:21: error: invalid suffix on literal; C++11 requires a space between literal and identifier [-Werror=literal-suffix]
 #define PCI_PRI_FMT "%.4"PRIx16":%.2"PRIx8":%.2"PRIx8".%"PRIx8
                     ^
/home/bodc/workspace/dpdk/build/dpdk-prefix/src/dpdk/x86_64-default-linuxapp-gcc/include/rte_pci.h:95:32: error: invalid suffix on literal; C++11 requires a space between literal and identifier [-Werror=literal-suffix]
 #define PCI_PRI_FMT "%.4"PRIx16":%.2"PRIx8":%.2"PRIx8".%"PRIx8
                                ^
/home/bodc/workspace/dpdk/build/dpdk-prefix/src/dpdk/x86_64-default-linuxapp-gcc/include/rte_pci.h:95:43: error: invalid suffix on literal; C++11 requires a space between literal and identifier [-Werror=literal-suffix]
 #define PCI_PRI_FMT "%.4"PRIx16":%.2"PRIx8":%.2"PRIx8".%"PRIx8
                                           ^
/home/bodc/workspace/dpdk/build/dpdk-prefix/src/dpdk/x86_64-default-linuxapp-gcc/include/rte_pci.h:95:54: error: invalid suffix on literal; C++11 requires a space between literal and identifier [-Werror=literal-suffix]
 #define PCI_PRI_FMT "%.4"PRIx16":%.2"PRIx8":%.2"PRIx8".%"PRIx8
                                                      ^
/home/bodc/workspace/dpdk/build/dpdk-prefix/src/dpdk/x86_64-default-linuxapp-gcc/include/rte_pci.h:98:27: error: invalid suffix on literal; C++11 requires a space between literal and identifier [-Werror=literal-suffix]
 #define PCI_SHORT_PRI_FMT "%.2"PRIx8":%.2"PRIx8".%"PRIx8
                           ^
/home/bodc/workspace/dpdk/build/dpdk-prefix/src/dpdk/x86_64-default-linuxapp-gcc/include/rte_pci.h:98:37: error: invalid suffix on literal; C++11 requires a space between literal and identifier [-Werror=literal-suffix]
 #define PCI_SHORT_PRI_FMT "%.2"PRIx8":%.2"PRIx8".%"PRIx8
                                     ^
/home/bodc/workspace/dpdk/build/dpdk-prefix/src/dpdk/x86_64-default-linuxapp-gcc/include/rte_pci.h:98:48: error: invalid suffix on literal; C++11 requires a space between literal and identifier [-Werror=literal-suffix]
 #define PCI_SHORT_PRI_FMT "%.2"PRIx8":%.2"PRIx8".%"PRIx8
[dpdk-dev] [PATCH RFC 00/11] ixgbe/mbuf: add TSO support
On Fri, 9 May 2014 16:50:27 +0200
Olivier Matz wrote:

> This series add TSO support in ixgbe DPDK driver. As discussed
> previously on the list [1], one problem is that there is not enough room
> in rte_mbuf today to store the required information to implement this
> feature:
>   - a new ol_flag
>   - the MSS
>   - the L4 header len
>
> A solution would be to increase the size of the mbuf to 2 cache lines
> but it could have a bad impact on performance. This series proposes some
> rework to drastically reduce the size of the rte_mbuf structures before
> implementing the TSO, avoiding to change the mbuf size to 128 bytes.
>
> After the rework of mbuf structures, the size of rte_mbuf structure is
> reduced by 9 bytes. The implementation of TSO requires to double the
> size of ol_flags (16 to 32 bits) and to double the size of offload
> information in order to add the mss and the l4 header length (32 to 64
> bits). At the end of the whole series, sizeof(rte_mbuf) is still 64
> bytes and 4 bytes are available for future use.
>
> This rework causes a lot of modifications in the mbuf structure,
> implying some changes in the applications that directly use the mbuf
> structure fields instead of using the API functions (sometimes there is
> no function). That's why this series is a RFC. In my opinion, it's the
> proper moment for this evolution as the 1.7.0 window is open.
>
> About TSO, the new fields in mbuf try to be generic enough to apply to
> other hardware in the future. To delegate the TCP segmentation to the
> hardware, the user has to:
>
>   - set the PKT_TX_TCP_SEG flag in mbuf->ol_flags (this flag implies
>     PKT_TX_IP_CKSUM and PKT_TX_TCP_CKSUM)
>   - fill the mbuf->hw_offload information: l2_len, l3_len, l4_len, mss
>   - calculate the pseudo header checksum and set it in the TCP header,
>     as required when doing hardware TCP checksum offload
>   - set the IP checksum to 0
>
> Compilation of DPDK and examples is tested for the following
> targets: x86_64-*-linuxapp-gcc, i686-*-linuxapp-gcc, x86_64-*-bsdapp-gcc
>
> The mbuf rework series is validated with autotests:
>
>   cd dpdk.org/
>   make install T=x86_64-default-linuxapp-gcc
>   cd x86_64-default-linuxapp-gcc/
>   modprobe uio
>   insmod kmod/igb_uio.ko
>   python ../tools/igb_uio_bind.py -b igb_uio :02:00.0
>   echo 0 > /proc/sys/kernel/randomize_va_space
>   echo 1000 >
> /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
>   echo 1000 >
> /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages
>   mount -t hugetlbfs none /mnt/huge
>   make test
>
> TSO is validated with IPv4 and IPv6 with testpmd (see the commit log of
> last patch for details).
>
> The performance non-regression has been tested with 6WINDGate fast path.
>
> Note: this patches may conflict with patch [2] which is pushed yet, but
> will probably be integrated before this series.
>
> [1] http://dpdk.org/ml/archives/dev/2013-October/thread.html#572
> [2] http://dpdk.org/ml/archives/dev/2014-April/002166.html
>

I would also like to propose changing the checksum offload flags.
Many devices can indicate good checksum in some cases but can't test
for many other types of packets. By changing the flags to be:
  PKT_RX_L4_CKSUM_GOOD and PKT_RX_IP_CKSUM_GOOD

It is then possible to support devices where some cases (IPv4 + TCP)
are supported but others are not.

This also better aligns with Linux checksum code for cases where
mbuf and meta data are being passed into kernel.
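
The "calculate the pseudo header checksum" step in the quoted list can be sketched for IPv4 as follows. This is a generic illustration with hypothetical function names, not code from the series. It returns the folded, non-complemented one's-complement sum over source address, destination address, protocol and L4 length. Whether the length must be included or left at 0 for TSO depends on the hardware and driver and is not stated in the thread, so it is left to the caller here.

```c
#include <stddef.h>
#include <stdint.h>

/* Add the 16-bit big-endian words of buf to sum (len must be even). */
static uint32_t sum16(const uint8_t *buf, size_t len, uint32_t sum)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	return sum;
}

/* IPv4 pseudo-header sum: src, dst, zero + protocol, L4 length.
 * The result is folded to 16 bits and NOT complemented; it is the
 * numeric value of the checksum field, to be stored in network
 * byte order in the TCP header. */
static uint16_t ipv4_phdr_sum(const uint8_t src[4], const uint8_t dst[4],
			      uint8_t proto, uint16_t l4_len)
{
	uint32_t sum = 0;

	sum = sum16(src, 4, sum);
	sum = sum16(dst, 4, sum);
	sum += proto;    /* the high byte of this 16-bit word is zero */
	sum += l4_len;

	/* fold the carries back into the low 16 bits */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}
```

For 192.168.0.1 to 192.168.0.2, protocol 6 (TCP) and an L4 length of 20, the words 0xc0a8, 0x0001, 0xc0a8, 0x0002, 0x0006 and 0x0014 add up to 0x1816d, which folds to 0x816e.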
[dpdk-dev] [PATCH] malloc: fix rte_free run time in O(n) free blocks
Hi Thomas,

>Some patches like this one are not yet reviewed because efforts were focused
>on release 1.6.0r2. This enhancement must be integrated in 1.7.0.
>I know that patchwork service is desired and I hope it will be available soon.

I realized that you guys had been very busy with 1.6.0r2. I just wanted
to make sure that lower-priority patches didn't fall through the cracks.

>By the way, looking at librte_malloc, it seems implementation of lists could
>be simpler. Don't you think we could improve (in another patch) this whole
>code by using BSD macros for lists?

Yes, I was surprised to find the malloc code not using any kind of list
functions/macros. I am willing to rework the patch. By BSD list macros,
I believe you are referring to QUEUE(3) and sys/queue.h. Is that right?

Thanks,
Robert
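
For reference, a free list built on the sys/queue.h LIST macros could look like the sketch below. The struct and function names are hypothetical and do not come from librte_malloc. The sketch only shows that the doubly-linked LIST variant unlinks an element in constant time, without walking the list.

```c
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical free-block header; not the librte_malloc structure. */
struct free_blk {
	size_t size;
	LIST_ENTRY(free_blk) link;
};

LIST_HEAD(free_list, free_blk);

/* O(1): link a block at the head of the free list. */
static void free_list_add(struct free_list *fl, struct free_blk *b)
{
	LIST_INSERT_HEAD(fl, b, link);
}

/* O(1): unlink a block; no traversal is needed. */
static void free_list_del(struct free_blk *b)
{
	LIST_REMOVE(b, link);
}

/* O(n) walk, used here only to count the blocks. */
static int free_list_count(struct free_list *fl)
{
	struct free_blk *b;
	int n = 0;

	LIST_FOREACH(b, fl, link)
		n++;
	return n;
}
```

The head must be set up with LIST_INIT() (or LIST_HEAD_INITIALIZER) before the first insertion.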