[dpdk-dev] [PATCH v9] drivers/net:new PMD using tun/tap host interface

2016-11-29 Thread Wiles, Keith

> On Nov 29, 2016, at 3:36 PM, Aws Ismail  wrote:
> 
> I have verified that adding just a single tap device works with testpmd. But 
> as soon as I try more than one tap device, I would get a coredump, e.g.:
> 
> root@localhost:~# testpmd -c f -n 4 --socket-mem 512 
> --vdev=net_tap,iface=tap0 --vdev=net_tap,iface=tap1 -- -i
> EAL: Detected 16 lcore(s)
> EAL: Probing VFIO support...
> EAL: VFIO support initialized
> EAL: cannot open /proc/self/numa_maps, consider that all memory is in 
> socket_id 0
> PMD: Initializing pmd_tap for net_tap as dtap0
> PMD: net_tap: Create TAP Ethernet device with 32 queues on numa 0
> PMD: Initializing pmd_tap for net_tap as dtap1
> PMD: net_tap: Create TAP Ethernet device with 32 queues on numa 0
> EAL: failed to initialize net_tap device
> PANIC in rte_eal_init():
> Cannot init pmd devices
> 6: [testpmd() [0x409149]]
> 5: [/lib/libc.so.6(__libc_start_main+0xf0) [0x7f3e65fa8740]]
> 4: [testpmd() [0x408b21]]
> 3: [/usr/lib/librte_eal.so.3.1(rte_eal_init+0xe09) [0x7f3e68ceaea9]]
> 2: [/usr/lib/librte_eal.so.3.1(__rte_panic+0xc0) [0x7f3e68ce9b5a]]
> 1: [/usr/lib/librte_eal.so.3.1(rte_dump_stack+0x18) [0x7f3e68cf2078]]

It appears the call to rte_eth_dev_allocate() in the code is wrong. I originally 
passed in the tap_name variable created inside the function, but I was told I 
needed to pass in the 'name' argument that was passed into the function. 
rte_eth_dev_allocate() needs a unique name for each call, and 'name' is the same 
string for every vdev instance.

That line needs to change back to use tap_name, or someone can tell me the real 
way to handle this problem.
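
Roughly, the fix looks like the sketch below. It is only an illustration in
plain C (a mock allocator, not the driver code), showing why allocating by the
shared vdev name fails for the second device while the per-device tap_name
works:

/* Mock of an allocate-by-unique-name API such as rte_eth_dev_allocate(). */
#include <stdio.h>
#include <string.h>

#define MAX_DEVS 8

static char dev_names[MAX_DEVS][32];
static int dev_count;

static int mock_dev_allocate(const char *name)
{
    for (int i = 0; i < dev_count; i++)
        if (strcmp(dev_names[i], name) == 0)
            return -1;  /* duplicate name: refuse, like the EAL failure above */
    if (dev_count == MAX_DEVS)
        return -1;
    snprintf(dev_names[dev_count], sizeof(dev_names[0]), "%s", name);
    return dev_count++;
}

int main(void)
{
    char tap_name[32];

    /* Two --vdev=net_tap instances: the vdev "name" is the same string
     * both times, but the locally generated tap_name is unique. */
    for (int unit = 0; unit < 2; unit++) {
        snprintf(tap_name, sizeof(tap_name), "dtap%d", unit);
        printf("allocate(\"net_tap\") -> %d\n", mock_dev_allocate("net_tap"));
        printf("allocate(\"%s\")   -> %d\n", tap_name, mock_dev_allocate(tap_name));
    }
    return 0;
}

The second allocate("net_tap") returns -1, which is the failure testpmd hits;
the dtap0/dtap1 names always succeed.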

If you want a new patch I can try to get it done, but I am working on something 
else at this time and it could be a few days before I can get the patch out.

> Aborted (core dumped)
> 
> root@localhost:~#
> 
> 
> On Fri, Nov 25, 2016 at 2:38 PM, Aws Ismail  wrote:
> Keith,
> 
> This won't build when integrated with v16.11. The register macro
> prefix has been renamed. A v10 is needed.
> 
> diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
> index 7f303db..297d4b6 100644
> --- a/drivers/net/tap/rte_eth_tap.c
> +++ b/drivers/net/tap/rte_eth_tap.c
> @@ -752,5 +752,6 @@ static struct rte_vdev_driver pmd_tap_drv = {
> .remove = rte_pmd_tap_remove,
>  };
> 
> -DRIVER_REGISTER_VDEV(net_tap, pmd_tap_drv);
> -DRIVER_REGISTER_PARAM_STRING(net_tap, "iface=,speed=N");
> +RTE_PMD_REGISTER_VDEV(net_tap, pmd_tap_drv);
> +RTE_PMD_REGISTER_ALIAS(net_tap, eth_tap);
> +RTE_PMD_REGISTER_PARAM_STRING(net_tap, "iface=,speed=N");
> 
> On Mon, Nov 21, 2016 at 7:56 AM, Ferruh Yigit  
> wrote:
> > On 10/13/2016 11:03 PM, Keith Wiles wrote:
> >> The rte_eth_tap.c PMD creates a device using TUN/TAP interfaces
> >> on the local host. The PMD allows for DPDK and the host to
> >> communicate using a raw device interface on the host and in
> >> the DPDK application. The device created is a Tap device with
> >> a L2 packet header.
> >>
> >> v9 - Fix up the docs to use correct syntax
> >> v8 - Fix issue with tap_tx_queue_setup() not return zero on success.
> >> v7 - Reword the comment in common_base and fix the data->name issue
> >> v6 - fixed the checkpatch issues
> >> v5 - merge in changes from list review see related emails
> >>  fixed many minor edits
> >> v4 - merge with latest driver changes
> >> v3 - fix includes by removing ifdef for other type besides Linux
> >>  Fix the copyright notice in the Makefile
> >> v2 - merge all of the patches into one patch
> >>  Fix a typo on naming the tap device
> >>  Update the maintainers list
> >>
> >> Signed-off-by: Keith Wiles 
> >> ---
> >
> > Just a reminder, this is a new PMD and waiting for community review.
> 

Regards,
Keith



[dpdk-dev] Adding API to force freeing consumed buffers in TX ring

2016-11-22 Thread Wiles, Keith

> On Nov 21, 2016, at 9:25 AM, Richardson, Bruce wrote:
> 
> On Mon, Nov 21, 2016 at 04:06:32PM +0100, Olivier Matz wrote:
>> Hi,
>> 
>> On 11/21/2016 03:33 PM, Wiles, Keith wrote:
>>> 
>>>> On Nov 21, 2016, at 4:48 AM, Damjan Marion (damarion) wrote:
>>>> 
>>>> 
>>>> Hi,
>>>> 
>>>> Currently in VPP we do memcpy of whole packet when we need to do 
>>>> replication as we cannot know if specific buffer is transmitted
>>>> from tx ring before we update it again (i.e. l2 header rewrite).
>>>> 
>>>> Unless there is already a way to address this issue in DPDK which I'm not 
>>>> aware
>>>> of my proposal is that we provide mechanism for polling TX ring 
>>>> for consumed buffers. This can be either completely new API or 
>>>> extension of rte_eth_tx_burst (i.e. special case when nb_pkts=0).
>>>> 
>>>> This will allow us to start polling the tx ring when we expect some 
>>>> mbuf back, instead of waiting for the next tx burst (which we don't know
>>>> when it will happen) and hoping that we will reach free_threshold soon.
>>> 
>>> +1
>>> 
>>> In Pktgen I have the problem of not being able to reclaim all of the TX 
>>> mbufs to update them for the next set of packets to send. I know this is 
>>> not a common case, but I do see the case where the application needs its 
>>> mbufs freed off the TX ring. Currently you need to have at least a TX ring 
>>> size of mbufs on hand to make sure you can send to a TX ring. If you 
>>> allocate too few you run into a deadlock case as the number of mbufs  on a 
>>> TX ring does not hit the flush mark. If you are sending to multiple TX 
>>> rings on the same numa node from a single TX pool you have to 
>>> understand the total number of mbufs you need to have allocated to hit the 
>>> TX flush on each ring. Not a clean way to handle the problems as you may 
>>> have limited memory or require some logic to add more mbufs for dynamic 
>>> ports.
>>> 
>>> Anyway it would be great to have a way to clean up the TX done ring; 
>>> using nb_pkts == 0 is the simplest way, but a new API is fine too.
>>>> 
>>>> Any thoughts?
>> 
>> Yes, it looks useful to have a such API.
>> 
>> I would prefer another function instead of diverting the meaning of
>> nb_pkts. Maybe this?
>> 
>>  void rte_eth_tx_free_bufs(uint8_t port_id, uint16_t queue_id);
>> 
> 
> Third parameter for a limit(hint) of the number of bufs to free? If the
> TX ring is big, we might not want to stall other work for a long time
> while we free a huge number of buffers.

In order to move this along some, if we create the following API:

int rte_eth_tx_done_cleanup(uint8_t port_id, uint16_t queue_id, uint32_t 
free_cnt);

It returns the number of freed mbufs, or -1 if the operation is not supported 
or the parameters are invalid. A free_cnt of zero means free all possible 
mbufs; otherwise free at most the number suggested.
The free_cnt could be a uint16_t, but I do not think it matters much.

The rte_eth_tx_done_cleanup() call will return -1 if the PMD does not support 
the operation or if port_id or queue_id are invalid.

The default function pointer in the eth_dev structure would be NULL (not 
supported), so all of the drivers do not have to be updated today. We can then 
add the support as we go along.

We could have a feature-request API for tx_done support and PCTYPE, plus 
others if we want to go down that path too.
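
As a sketch of how an application like Pktgen could use it (the prototype below
is the one proposed in this thread, not an existing ethdev call, and the free
count of 64 is only an example):

#include <stdint.h>

/* Proposed API, declared here only for the sketch. */
int rte_eth_tx_done_cleanup(uint8_t port_id, uint16_t queue_id,
                            uint32_t free_cnt);

static void
reclaim_tx_mbufs(uint8_t port_id, uint16_t queue_id)
{
    /* Ask the PMD to free at most 64 already-transmitted mbufs back to
     * their pool; passing 0 would mean "free everything that is done". */
    int n = rte_eth_tx_done_cleanup(port_id, queue_id, 64);

    if (n < 0) {
        /* -1: the PMD does not support the operation, or the port/queue
         * ids are invalid; fall back to over-provisioning the mbuf pool. */
        return;
    }
    /* n mbufs are available again for building the next set of packets. */
}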

> 
>   /Bruce

Regards,
Keith



[dpdk-dev] [PATCH] doc: move testpmd guide with other tools

2016-11-10 Thread Wiles, Keith

> On Nov 10, 2016, at 5:02 PM, Thomas Monjalon  
> wrote:
> 
> 2016-11-10 16:11, Mcnamara, John:
>> I had a look at the html output before and after this patch and I don't 
>> quite agree with it. I see that you are trying to clean up and make the 
>> documentation more consistent but I don't know if this is the right way to 
>> do it.
>> 
>> The problem is that TestPMD is a bit of an outlier. It isn't a sample 
>> application and it isn't really a test application despite the name (it is 
>> more of a tester application). Also I don't think that it is a tool/utility 
>> like the other apps in the target directory (if it is seen as a tool then it 
>> should be renamed to something like dpdk-tester for consistency). Testpmd 
>> also has quite a lot of documentation, more than any of our other apps or 
>> utilities, which again makes it an outlier.
> 
> Yes testpmd is not the same kind of tool as others. It helps for tests,
> debugging and demos.
> 
> About the name, as it is packaged as part of the runtime, I think we should
> find a better name. As you said it should start with "dpdk-" and it should
> contain "net" as it does not test the crypto drivers.
> Something like dpdk-testpmd-net.

To me the name dpdk-testpmd-net is a bit long, and does testpmd really just test 
PMDs? I was thinking the name dpdk-tester is pretty short and descriptive, IMO. 
Adding net or pmd to the name does not really add anything, as DPDK is mostly 
about networking anyway. Just my $0.04 worth. 

> 
>> So my preference is to leave TestPMD in the high level index.
> 
> OK I understand your opinion.
> 
>> However, I do think the High level index should be cleaned up a bit and the 
>> items re-ordered into a more logical sequence. I'll submit a patch for that.
> 
> OK thanks

Regards,
Keith



[dpdk-dev] disable hugepages

2016-11-10 Thread Wiles, Keith

> On Nov 10, 2016, at 6:32 AM, Keren Hochman  
> wrote:
> 
> I tried using the following dpdk options:
> --no-huge --vdev eth_pcap0,rx_pcap=/t1,tx_pcap=/t2
> It worked, but the number of elements is limited, although the machine
> has enough free memory. rte_mempool_create fails when I'm trying to
> allocate more memory. Is there any limitation on the memory besides the
> machine?

DPDK will just use the standard Linux memory allocator, so there is no 
limitation in DPDK. You could be hitting a per-user limit, though; check your 
system to make sure you can allocate that much memory as a user. Try running 
the ulimit command and see what it reports.

I do not remember exactly how to change the limits except with the ulimit 
command; I may have modified the /etc/security/limits.conf file.
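
For reference, a minimal sketch of creating the pool under --no-huge (untested;
the pool size and name are only examples). If rte_pktmbuf_pool_create() fails
here, it is usually the per-user/process memory limit, not DPDK:

#include <stdio.h>
#include <rte_eal.h>
#include <rte_lcore.h>
#include <rte_mbuf.h>

int main(int argc, char **argv)
{
    /* EAL arguments normally come from the command line, e.g.:
     *   app -c 1 -n 4 --no-huge --vdev eth_pcap0,rx_pcap=/t1,tx_pcap=/t2 */
    if (rte_eal_init(argc, argv) < 0)
        return -1;

    struct rte_mempool *mp = rte_pktmbuf_pool_create("mbuf_pool",
            65535,                      /* number of mbufs */
            256,                        /* per-lcore cache size */
            0,                          /* private data size */
            RTE_MBUF_DEFAULT_BUF_SIZE,
            rte_socket_id());
    if (mp == NULL) {
        /* With --no-huge this memory comes from the normal allocator,
         * so a failure here usually means the user limit was hit. */
        printf("mbuf pool creation failed\n");
        return -1;
    }
    return 0;
}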

HTH

> 
> Thanks, Keren
> 
> On Wed, Nov 9, 2016 at 4:50 PM, Olivier Matz  
> wrote:
> 
>> Hi Keren,
>> 
>> On 11/09/2016 03:40 PM, Keren Hochman wrote:
>>> On Wed, Nov 9, 2016 at 3:40 PM, Christian Ehrhardt <
>>> christian.ehrhardt at canonical.com> wrote:
>>> 
 
 On Wed, Nov 9, 2016 at 1:55 PM, Keren Hochman <
 keren.hochman at lightcyber.com> wrote:
 
> how can I create a mempool without hugepages? My application is running
> on a
> pcap file so no huge pages are needed?
> 
 
 Not sure if that is what you really want (Debug use only), but in
>> general
 no-huge is available as EAL arg
 
 From http://pktgen.readthedocs.io/en/latest/usage_eal.html :
 
 EAL options for DEBUG use only:
  --no-huge   : Use malloc instead of hugetlbfs
 
>>> I need this option only for testing. How can I use rte_mempool_create if I
>>> use --no-huge?
>> 
>> When using --no-huge, the dpdk libraries (including mempool) allocate
>> its memory in standard memory. Just keep in mind the physical addresses
>> will be wrong, so this memory cannot be given to hw devices.
>> 
>> Regards,
>> Olivier
>> 

Regards,
Keith



[dpdk-dev] Running 2 process on the same machine

2016-11-07 Thread Wiles, Keith

> On Nov 7, 2016, at 7:28 AM, Keren Hochman  
> wrote:
> 
> Hi,
> I need to run 2 processes that use dpdk on the same machine. One uses dpdk
> drivers, and the other just reads from a pcap file. If I disable hugepages
> in the second process rte_mempool_create fails. What is the correct way to
> handle this?

If you look at the two scripts in Pktgen, pktgen-master.sh and pktgen-slave.sh, 
you will see how to set up two instances of pktgen on the same machine. You can 
also read the README.md file.

http://dpdk.org/browse/apps/pktgen-dpdk/refs/

You have to make sure you have enough memory (huge pages) allocated for both 
instances to run.

Then use --file-prefix XX to give each instance a different prefix for the huge 
page files in /dev/hugepages, if that is the location of the files on your 
system. I would remove any files in that directory first to free up the memory.

Use --socket-mem to allocate the correct amount of memory for each instance; 
this way DPDK does not consume all the pages for a given instance.

Make sure you blacklist the ports you do not want on the first instance using 
the -b option, and on the second instance blacklist the ports used by the first 
one while allowing the other ports to be used.

That should do it for most cases.
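
As a rough sketch (core lists, memory amounts and PCI addresses are just
placeholders), the EAL arguments for the two instances would look something
like this:

#include <rte_common.h>
#include <rte_eal.h>

/* Instance 1: its own cores, hugepage file prefix and memory, with the
 * other instance's port blacklisted. */
static char *inst1_argv[] = {
    "app1", "-l", "1-4", "--proc-type", "auto",
    "--file-prefix", "inst1", "--socket-mem", "256,256",
    "-b", "0000:04:00.1", NULL,
};

/* Instance 2: different cores, prefix and memory, with the first
 * instance's port blacklisted. */
static char *inst2_argv[] = {
    "app2", "-l", "5-8", "--proc-type", "auto",
    "--file-prefix", "inst2", "--socket-mem", "256,256",
    "-b", "0000:04:00.0", NULL,
};

/* Each process then calls, for its own array:
 *   rte_eal_init(RTE_DIM(inst1_argv) - 1, inst1_argv);
 */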

> 
> Thanks, Keren

Regards,
Keith



[dpdk-dev] Unable to change source MAC address of packet

2016-10-27 Thread Wiles, Keith

> On Oct 27, 2016, at 6:33 AM, Padam Jeet Singh  
> wrote:
> 
> Hi,
> 
> I am crafting a packet in which the source MAC address as set in the Ethernet 
> header is different than the transmit port's default MAC address. A packet 
> capture of the packets coming out of this port however comes with source MAC 
> address of the port's default MAC address.
> 
> Altering the destination MAC address works fine and shows up correctly in 
> packet capture.
> 
> The underlying network interface is an i210 and some logs added to the 
> eth_igb_xmit_pkts function show that the packets I have crafted indeed are 
> reaching the driver with the source MAC address set in the packet code of the 
> application.
> 
> How can I disable this automatic source MAC address setting?

The packets sent with rte_eth_tx_burst() are not forced to a given MAC address. 
If you are using something on top of DPDK like Pktgen or OVS, then it may try 
to force a source MAC address. Maybe the hardware does it, but we need to know 
the NIC being used before someone may be able to answer; I do not know of any 
Intel NICs that do that.

Is this what you are doing?

> 
> Thanks,
> Padam

Regards,
Keith



[dpdk-dev] mbuf changes

2016-10-24 Thread Wiles, Keith

> On Oct 24, 2016, at 10:49 AM, Morten Brørup  
> wrote:
> 
> First of all: Thanks for a great DPDK Userspace 2016!
> 
> 
> 
> Continuing the Userspace discussion about Olivier Matz's proposed mbuf 
> changes...
> 
> 
> 
> 1.
> 
> Stephen Hemminger had a noteworthy general comment about keeping metadata for 
> the NIC in the appropriate section of the mbuf: Metadata generated by the 
> NIC's RX handler belongs in the first cache line, and metadata required by 
> the NIC's TX handler belongs in the second cache line. This also means that 
> touching the second cache line on ingress should be avoided if possible; and 
> Bruce Richardson mentioned that for this reason m->next was zeroed on free().
> 
> 
> 
> 2.
> 
> There seemed to be consensus that the size of m->refcnt should match the size 
> of m->port because a packet could be duplicated on all physical ports for L3 
> multicast and L2 flooding.
> 
> Furthermore, although a single physical machine (i.e. a single server) with 
> 255 physical ports probably doesn't exist, it might contain more than 255 
> virtual machines with a virtual port each, so it makes sense extending these 
> mbuf fields from 8 to 16 bits.

I thought we also talked about removing the m->port from the mbuf as it is not 
really needed.

> 
> 
> 
> 3.
> 
> Someone (Bruce Richardson?) suggested moving m->refcnt and m->port to the 
> second cache line, which then generated questions from the audience about the 
> real life purpose of m->port, and if m->port could be removed from the mbuf 
> structure.
> 
> 
> 
> 4.
> 
> I suggested using offset -1 for m->refcnt, so m->refcnt becomes 0 on first 
> allocation. This is based on the assumption that other mbuf fields must be 
> zeroed at alloc()/free() anyway, so zeroing m->refcnt is cheaper than setting 
> it to 1.
> 
> Furthermore (regardless of m->refcnt offset), I suggested that it is not 
> required to modify m->refcnt when allocating and freeing the mbuf, thus 
> saving one write operation on both alloc() and free(). However, this assumes 
> that m->refcnt debugging, e.g. underrun detection, is not required.
> 
> 
> 
> 5.
> 
> And here's something new to think about:
> 
> m->next already reveals if there are more segments to a packet. Which purpose 
> does m->nb_segs serve that is not already covered by m->next?
> 
> 
> 
> 
> 
> Med venlig hilsen / kind regards
> 
> 
> 
> Morten Brørup
> 
> CTO
> 
> 
> 
> 
> 
> SmartShare Systems A/S
> 
> Tonsbakken 16-18
> 
> DK-2740 Skovlunde
> 
> Denmark
> 
> 
> 
> Office  +45 70 20 00 93
> 
> Direct  +45 89 93 50 22
> 
> Mobile +45 25 40 82 12
> 
> 
> 
> mb at smartsharesystems.com  
> 
> www.smartsharesystems.com  
> 
> 
> 

Regards,
Keith



[dpdk-dev] Project Governance and Linux Foundation

2016-10-21 Thread Wiles, Keith
Thanks, Dave, for your work and notes. Comments inline.

> On Oct 21, 2016, at 3:00 PM, Dave Neary  wrote:
> 
> Hi all,
> 
> We had a great session yesterday on this topic, I took some notes - does
> anyone who was there have any corrections, or anyone who was not have
> any comments?
> 
> Thanks,
> Dave.
> 
> Tim led the discussion, and started by outlining that he saw there were
> 3 different questions which we should treat independently:
> 
> 1. Is there a benefit to moving DPDK to a foundation?
> 2. If the answer is yes: there are two options currently proposed - a
> low overhead, independent project under the Linux Foundation (LF Lite),
> or joining fd.io as a sub-project. Which one of these is preferable, or
> is there another option to consider?
> 3. Are there any related changes we should consider in technical
> infrastructure and project governance?
> 
> I outlined some advantages I see to the Linux Foundation:
> * Pool resources for events
> * Provides some legal foresight
> * LF standing behind a project gives some companies assurances that
> there is good, open technical governance and a level playing field for
> participants
> 
> Stephen Hemminger asked if there was a sponsorship requirement. Tim
> responded that it is possible to do what Open vSwitch has done, and have
> no membership funding requirement. What that means is that any funds the
> project community wants to spend needs to be budgeted ad hoc.
> 
> A number of others (Shreyansh Jain, Matt Spencer) said they would like
> to see a formal model for non-technical engagement, legal protection for
> patent and copyright, and more clarity on the technical governance.
> 
> Vincent Jardin said that whatever happens, it is vital that DPDK remain
> an open, community-run project.
> 
> A number of people expressed interest in the change, but could not
> commit to funding.
> 
> Jerome Tollet said that he felt it was important to have better test and
> CI infrastructure, and that these cost money. He proposed that since
> fd.io already has infrastructure and a lab, that this would be an
> affordable option for doing this.
> 
> Vincent and Thomas Monjalon suggested that distributed testing was a
> better option - creating an opportunity for different people to send
> test results to a central gathering point. Thomas mentioned that
> Patchwork has a feature which allows aggregation of test results for
> specific patches now.
> 
> Tim asked if there was agreement on a move, and there was no opposition.
> Vincent suggested opening a call for proposals to have a wider range of
> choices than LF Lite or fd.io. Jim St. Leger said we have already had a
> group who evaluated options and made a proposal, and we should not re-do
> the process.
> 
> Jerome recommended that we focus on requirements and criteria for
> determining the choice: timing, governance requirements, budget, and
> hardware/infrastructure requirements. Keith Wiles suggested that there
> was a need for some budgetary requirement to show commitment of
> participating companies.

What I stated was more along the lines that if we move to the LF we do need a 
budget, and it would be great if companies want to contribute with money and/or 
people. I wanted to make sure everyone knows that anyone can contribute to the 
project for free, and that the companies putting money into the project do not 
control the technical development of DPDK. At one point a year ago it was 
thought you had to pay to play/contribute to DPDK.

I also believe we do need a budget, as the services the LF provides are not free 
and we need to be able to support the project. If we can do something like OVS 
did with zero budget and still make it work, then OK. The only problem is that I 
do not believe that model will work, but I would like to be surprised.

> 
> When asked about transferring the ownership of the domain name to Linux
> Foundation, Vincent reiterated that his main concern was keeping the
> project open, and that he did not anticipate that transferring the
> domain ownership would be an issue.
> 
> Moving on to question 2:
> 
> I said that Red Hat is happy with the technical operation of the
> project, and we don't want to see the community disrupted with toolset
> changes - and it's possible to work with projects like fd.io, OVS, and
> OPNFV to do testing of DPDK.
> 
> Representatives from Brocade, Cavium, and Linaro all voiced a preference
> for a stand-alone lightweight project - one concern voiced was that
> there is a potential perception issue with fd.io too.
> 
> Maciek K and Jerome encouraged everyone not to underestimate the
> difficulty in setting up good CI and testing processes.
> 
> To close out the meeting, Tim summarised the consensus decisions:
> 
> * We agreed to move to a foundation
> * A group will work on re-doing a budget proposal with the Linux
> Foundation - target of 4 weeks to come up with a budget proposal for the
> community
> * There is a preference for an independent project rather than being a
> sub-project
> 
> 

[dpdk-dev] [PATCH v5] drivers/net:new PMD using tun/tap host interface

2016-10-12 Thread Wiles, Keith

Regards,
Keith

> On Oct 12, 2016, at 1:19 PM, Wiles, Keith  wrote:
> 
> 
> Regards,
> Keith
> 
>> On Oct 12, 2016, at 9:56 AM, Yigit, Ferruh  wrote:
>> 
>>> +
>>> +static void
>>> +tap_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *tap_stats)
>>> +{
>>> +   unsigned i, imax;
>> 
>> checkpatch complains about not using "unsigned int"
> 
> I ran checkpatch on the patch and saw no errors reported via the 
> scripts/checkpatch.sh. Which checkpatch are you using here?

OK, the scripts/checkpatch.sh does not seem to report any failures, but running 
checkpatch.pl does report problems. Is the scripts/checkpatch.sh script supposed 
to display the errors, or what?

It seems odd for the script not to display warnings and errors, unless it is 
just for validating the patch. I would expect the script to show the problems 
normally, or at least with an option.




[dpdk-dev] [PATCH v5] drivers/net:new PMD using tun/tap host interface

2016-10-12 Thread Wiles, Keith

Regards,
Keith

> On Oct 12, 2016, at 9:56 AM, Yigit, Ferruh  wrote:
> 
> On 10/11/2016 10:51 PM, Keith Wiles wrote:
>> The rte_eth_tap.c PMD creates a device using TUN/TAP interfaces
>> on the local host. The PMD allows for DPDK and the host to
>> communicate using a raw device interface on the host and in
>> the DPDK application. The device created is a Tap device with
>> a L2 packet header.
>> 
>> v5 - merge in changes from list review see related emails.
>> fixed checkpatch issues and many minor edits
>> v4 - merge with latest driver changes
>> v3 - fix includes by removing ifdef for other type besides Linux.
>> Fix the copyright notice in the Makefile
>> v2 - merge all of the patches into one patch.
>> Fix a typo on naming the tap device.
>> Update the maintainers list
>> 
>> Signed-off-by: Keith Wiles 
>> ---
> 
> <..>
> 
>> diff --git a/config/common_base b/config/common_base
>> index f5d2eff..356c631 100644
>> --- a/config/common_base
>> +++ b/config/common_base
>> @@ -592,3 +592,8 @@ CONFIG_RTE_APP_TEST_RESOURCE_TAR=n
>> CONFIG_RTE_TEST_PMD=y
>> CONFIG_RTE_TEST_PMD_RECORD_CORE_CYCLES=n
>> CONFIG_RTE_TEST_PMD_RECORD_BURST_STATS=n
>> +
>> +#
>> +# Set TAP PMD to 'n' as it is only supported in Linux for now.
> 
> This comment moves to the final .config and creates confusion, can we
> remove it if you don't have a strong opinion to keep it?

What do you mean, the statement is confusing or causes problems?

> 
> <..>
> 
>> diff --git a/doc/guides/nics/tap.rst b/doc/guides/nics/tap.rst
>> new file mode 100644
>> index 000..eed81ec
>> --- /dev/null
>> +++ b/doc/guides/nics/tap.rst
> 
> <..>
> 
>> +.. code-block:: console
>> +
>> +   The interfaced name can be changed by adding the iface=foo0
>> +   e.g. --vdev=eth_tap,iface=foo0 --vdev=eth_tap,iface=foo1, ...
> 
> For all file:
> %s/eth_tap/net_tap/g, there are multiple lines with this usage

Missed that one.

> 
> 
> <..>
> 
>> diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
>> new file mode 100644
>> index 000..c13aa1b
>> --- /dev/null
>> +++ b/drivers/net/tap/rte_eth_tap.c
> 
> <..>
> 
>> +
>> +struct tap_info {
>> +char name[RTE_ETH_NAME_MAX_LEN]; /* Interface name supplied/given */
>> +int speed;   /* Speed of interface */
>> +};
> 
> This struct can go away, it is not used, since with the updated code
> rte_pmd_tap_probe() used tap_name and speed as seperate variables
> instead of struct.
> 

OK.

> 
> <..>
> 
>> +
>> +/* If the name is different that new name as default */
>> +if (name && strcmp(name, ifr.ifr_name))
>> +snprintf(name, RTE_ETH_NAME_MAX_LEN-1, "%s", ifr.ifr_name);
> 
> syntax, space around "-"
> 
> <..>
> 
>> +
>> +static void
>> +tap_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *tap_stats)
>> +{
>> +unsigned i, imax;
> 
> checkpatch complains about not using "unsigned int"

I ran checkpatch on the patch and saw no errors reported via the 
scripts/checkpatch.sh. Which checkpatch are you using here?
> 
> 
> <..>
> 
>> +static int
>> +tap_setup_queue(struct rte_eth_dev *dev,
>> +struct pmd_internals *internals,
>> +uint16_t qid)
>> +{
>> +struct rx_queue *rx = &internals->rxq[qid];
>> +struct tx_queue *tx = &internals->txq[qid];
>> +int fd;
>> +
>> +if ((fd = rx->fd) < 0)
>> +if ((fd = tx->fd) < 0) {
>> +RTE_LOG(INFO, PMD, "Add queue to TAP %s for qid %d\n",
>> +dev->data->name, qid);
>> +if ((fd = tun_alloc(dev->data->name)) < 0) {
> 
> checkpatch complain about assignment in the if condition
> 
> 
> <..>
> 
>> +/* Now get the space available for data in the mbuf */
>> +buf_size = (uint16_t) (rte_pktmbuf_data_room_size(mp) -
> 
> syntax, no space after cast
> 
> 
> <..>
> 
>> +/* Create the first Tap device */
>> +if ((fd = tun_alloc(tap_name)) < 0) {
> 
> checkpatch complains about assignment in if condition
> 
>> +RTE_LOG(ERR, PMD, "tun_alloc(%s) failed\n", dev->data->name);
>> +rte_free(pmd);
> 
> rte_free(data) or goto error_exit; ?
> 
>> +rte_eth_dev_release_port(dev);
>> +return -EINVAL;
>> +}
>> +
>> +/* Presetup the fds to -1 as being not working */
>> +for(i = 0; i < RTE_PMD_TAP_MAX_QUEUES; i++) {
>> +pmd->fds[i] = -1;
>> +pmd->rxq[i].fd = -1;
>> +pmd->txq[i].fd = -1;
>> +}
>> +
>> +/* Take the TUN/TAP fd and place in the first location */
>> +pmd->rxq[0].fd = fd;
>> +pmd->txq[0].fd = fd;
>> +pmd->fds[0] = fd;
>> +
>> +if (pmd_mac_address(fd, dev, &pmd->eth_addr) < 0) {
>> +rte_free(pmd);
> 
> rte_free(data) or goto error_exit; ?
> 
> 
> <..>
> 
>> +static int
>> +set_interface_name(const char *key __rte_unused,
>> +   const char *value,
>> +   void *extra_args)
>> +{
>> +char *name = (char *)extra_args;
>> +
>> +if (value)
>> +

[dpdk-dev] [PATCH v4] drivers/net:new PMD using tun/tap host interface

2016-10-11 Thread Wiles, Keith

Regards,
Keith

> On Oct 11, 2016, at 6:49 AM, Yigit, Ferruh  wrote:
> 
> On 10/4/2016 3:45 PM, Keith Wiles wrote:
>> The rte_eth_tap.c PMD creates a device using TUN/TAP interfaces
>> on the local host. The PMD allows for DPDK and the host to
>> communicate using a raw device interface on the host and in
>> the DPDK application. The device created is a Tap device with
>> a L2 packet header.
>> 

Will try to ship out a v5 soon.
>> v4 - merge with latest driver changes
>> v3 - fix includes by removing ifdef for other type besides Linux.
>> Fix the copyright notice in the Makefile
>> v2 - merge all of the patches into one patch.
>> Fix a typo on naming the tap device.
>> Update the maintainers list
>> 
>> Signed-off-by: Keith Wiles 
>> ---
>> MAINTAINERS |   5 +
>> config/common_linuxapp  |   2 +
>> doc/guides/nics/tap.rst |  84 
>> drivers/net/Makefile|   1 +
>> drivers/net/tap/Makefile|  57 +++
>> drivers/net/tap/rte_eth_tap.c   | 866 
>> 
>> drivers/net/tap/rte_pmd_tap_version.map |   4 +
>> mk/rte.app.mk   |   1 +
>> 8 files changed, 1020 insertions(+)
>> create mode 100644 doc/guides/nics/tap.rst
>> create mode 100644 drivers/net/tap/Makefile
>> create mode 100644 drivers/net/tap/rte_eth_tap.c
>> create mode 100644 drivers/net/tap/rte_pmd_tap_version.map
>> 
> <>
>> diff --git a/config/common_linuxapp b/config/common_linuxapp
>> index 2483dfa..59a2053 100644
>> --- a/config/common_linuxapp
>> +++ b/config/common_linuxapp
>> @@ -44,3 +44,5 @@ CONFIG_RTE_LIBRTE_PMD_VHOST=y
>> CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y
>> CONFIG_RTE_LIBRTE_POWER=y
>> CONFIG_RTE_VIRTIO_USER=y
>> +CONFIG_RTE_LIBRTE_PMD_TAP=y
> 
> According existing config items, a default value of a config option
> should go to config/common_base, and environment specific config file
> overwrites it if required.
> So this option needs to be added into config/common_base too as disabled
> by default.

Added the define to common_base as 'n', plus a comment that it is Linux only.

> 
>> +CONFIG_RTE_PMD_TAP_MAX_QUEUES=32

Moved this to the .c file as a define.
> 
> Is the number of max queues really needs to be a config option, I assume
> in normal use case user won't need to update this and will use single
> queue, if that is true what about pushing this into source code to not
> make config file more complex?
> 
>> diff --git a/doc/guides/nics/tap.rst b/doc/guides/nics/tap.rst
> 
> <...>
> 
>> +.. code-block:: console
>> +
>> +   The interfaced name can be changed by adding the iface=foo0
>> +   e.g. --vedv=eth_tap,iface=foo0 --vdev=eth_tap,iface=foo1, ...
> 
> s/vedv/vdev
> eth_tap needs to be net_tap as part of unifying device names work

Fixed.
> 
> <...>
> 
>> diff --git a/drivers/net/Makefile b/drivers/net/Makefile
>> index bc93230..b4afa98 100644
>> --- a/drivers/net/Makefile
>> +++ b/drivers/net/Makefile
>> @@ -55,6 +55,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_THUNDERX_NICVF_PMD) += thunderx
>> DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio
>> DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += vmxnet3
>> DIRS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += xenvirt
>> +DIRS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += tap
> 
> Rest of the PMDs sorted alphabetically, please do same.

Done.
> 
>> 
>> ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
>> DIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += vhost
> 
> <...>
> 
>> diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
> 
> <...>
> 
>> +
>> +static const char *drivername = "Tap PMD";
>> +static int tap_unit = 0;
> 
> No need to initialize to zero.

Fixed
> 
> <...>
> 
>> +
>> +struct pmd_internals {
>> +char name[RTE_ETH_NAME_MAX_LEN];/* Internal Tap device name */
>> +uint16_t nb_queues; /* Number of queues supported */
>> +uint16_t pad0;
> 
> Why this padding? Is it reserved?

Removed pad0. I just like to know about the gaps in structures, which is the reason it was there.
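
A small illustration of what I mean (plain C, not the actual TAP PMD struct,
assuming the usual 4-byte alignment for uint32_t):

#include <stdint.h>

struct with_hidden_gap {
    uint16_t nb_queues;     /* offset 0 */
    /* 2 bytes of silent padding inserted here by the compiler */
    uint32_t if_index;      /* offset 4 */
};

struct with_explicit_gap {
    uint16_t nb_queues;     /* offset 0 */
    uint16_t pad0;          /* offset 2: the gap is visible in the source */
    uint32_t if_index;      /* offset 4 */
};

/* Both structs are 8 bytes; only the second shows the hole in the code. */
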
> 
>> +struct ether_addr eth_addr; /* Mac address of the device port */
>> +
>> +int if_index;   /* IF_INDEX for the port */
>> +int fds[RTE_PMD_TAP_MAX_QUEUES]; /* List of all file descriptors */
>> +
>> +struct rx_queue rxq[RTE_PMD_TAP_MAX_QUEUES];/* List of RX queues */
>> +struct tx_queue txq[RTE_PMD_TAP_MAX_QUEUES];/* List of TX queues */
>> +};
>> +
>> +/*
>> + * Tun/Tap allocation routine
>> + *
>> + * name is the number of the interface to use, unless NULL to take the host
>> + * supplied name.
>> + */
>> +static int
>> +tun_alloc(char * name)
> 
> char *name

Fixed.
> 
> <...>
> 
>> +
>> +/* Always set the fiile descriptor to non-blocking */
> 
> s/fiile/file
> 
>> +if (fcntl(fd, F_SETFL, O_NONBLOCK) < 0) {
>> +RTE_LOG(ERR, PMD, "Unable to set to nonblocking\n");
>> +perror("F_SETFL, NONBLOCK");
>> +goto error;
>> +}
>> +
>> +/* If the name is different that new name as default 

[dpdk-dev] [PATCH v4] drivers/net:new PMD using tun/tap host interface

2016-10-11 Thread Wiles, Keith

Regards,
Keith

> On Oct 11, 2016, at 7:28 AM, Yigit, Ferruh  wrote:
> 
> On 10/4/2016 3:45 PM, Keith Wiles wrote:
>> +/*
>> + * Open a TAP interface device.
>> + */
>> +static int
>> +rte_pmd_tap_devinit(const char *name, const char *params)
>> +{
>> +int ret = 0;
>> +struct rte_kvargs *kvlist;
>> +struct tap_info tap_info;
>> +
>> +/* Setup default values */
>> +memset(&tap_info, 0, sizeof(tap_info));
>> +
>> +tap_info.speed = ETH_SPEED_NUM_10G;
>> +snprintf(tap_info.name, sizeof(tap_info.name), "dtap%d", tap_unit++);
>> +
>> +if ((params == NULL) || (params[0] == '\0')) {
>> +RTE_LOG(INFO, PMD, "Initializing pmd_tap for %s\n", name);
>> +
>> +ret = eth_dev_tap_create(name, &tap_info);
>> +goto leave;
>> +}
>> +
>> +RTE_LOG(INFO, PMD, "Initialize %s with params (%s)\n", name, params);
>> +
>> +kvlist = rte_kvargs_parse(params, valid_arguments);
>> +if (!kvlist) {
>> +ret = eth_dev_tap_create(name, &tap_info);
>> +goto leave;
>> +}
>> +
>> +if (rte_kvargs_count(kvlist, ETH_TAP_SPEED_ARG) == 1) {
>> +ret = rte_kvargs_process(kvlist, ETH_TAP_SPEED_ARG,
>> + &set_interface_speed, &tap_info);
>> +if (ret < 0)
>> +goto leave;
>> +} else
>> +set_interface_speed(NULL, NULL, &tap_info);
>> +
>> +if (rte_kvargs_count(kvlist, ETH_TAP_IFACE_ARG) == 1) {
>> +ret = rte_kvargs_process(kvlist, ETH_TAP_IFACE_ARG,
>> + &set_interface_name, &tap_info);
>> +if (ret < 0)
>> +goto leave;
>> +} else
>> +set_interface_name(NULL, NULL, (void *)&tap_info);
> 
> Also there must be a eth_dev_tap_create() call after this point to use
> tap_info struct with custom values, right?
> "--vdev eth_tap0,iface=foo0" parameter shouldn't be working with this
> code, right?

Removed the extra code.

> 
>> +
>> +rte_kvargs_free(kvlist);
>> +
>> +leave:
>> +if (ret == -1)
>> +RTE_LOG(INFO, PMD, "Failed to create pmd_tap for %s\n", name);
>> +
>> +return ret;
>> +}
> 



[dpdk-dev] [PATCH v4] drivers/net:new PMD using tun/tap host interface

2016-10-11 Thread Wiles, Keith

Regards,
Keith

> On Oct 11, 2016, at 6:30 AM, Michał Mirosław  wrote:
> 
> 2016-10-04 16:45 GMT+02:00, Keith Wiles :
>> The rte_eth_tap.c PMD creates a device using TUN/TAP interfaces
>> on the local host. The PMD allows for DPDK and the host to
>> communicate using a raw device interface on the host and in
>> the DPDK application. The device created is a Tap device with
>> a L2 packet header.
> [...]
>> +static uint16_t
>> +pmd_rx_burst(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
>> +{
>> +int len, n;
>> +struct rte_mbuf *mbuf;
>> +struct rx_queue *rxq = queue;
>> +struct pollfd pfd;
>> +uint16_t num_rx;
>> +unsigned long num_rx_bytes = 0;
>> +
>> +pfd.events = POLLIN;
>> +pfd.fd = rxq->fd;
>> +for (num_rx = 0; num_rx < nb_pkts; ) {
>> +n = poll(&pfd, 1, 0);
>> +
>> +if (n <= 0)
>> +break;
>> +
> 
> Considering that syscalls are rather expensive, it would be cheaper to
> allocate an mbuf here and free it when read() returns -1 instead of
> calling poll() to check whether a packet is waiting. This way you
> save a syscall per packet and replace one syscall with one mbuf free
> per poll.

I made this change, but saw no performance difference between the two methods. 
Removing poll() seems reasonable as it is simpler. My guess is that TAP is 
already so slow that the change does not show up. Anyone wanting to use TAP as 
a high-performance interface is going to be surprised; I believe the best use 
case for the TAP interface is the control or exception path.
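
For reference, the read()-based receive loop looks roughly like the sketch
below (an illustration of the approach, not the exact driver code; the helper
name is made up, the mbuf calls are the normal API):

#include <unistd.h>
#include <rte_mbuf.h>

static uint16_t
tap_rx_sketch(int fd, struct rte_mempool *mp,
              struct rte_mbuf **bufs, uint16_t nb_pkts)
{
    uint16_t num_rx = 0;

    while (num_rx < nb_pkts) {
        struct rte_mbuf *mbuf = rte_pktmbuf_alloc(mp);
        if (mbuf == NULL)
            break;

        /* The fd is non-blocking, so read() returns <= 0 when the TAP
         * device has nothing for us. */
        ssize_t len = read(fd, rte_pktmbuf_mtod(mbuf, char *),
                           rte_pktmbuf_tailroom(mbuf));
        if (len <= 0) {
            rte_pktmbuf_free(mbuf); /* one mbuf free replaces one poll() */
            break;
        }
        mbuf->data_len = (uint16_t)len;
        mbuf->pkt_len = (uint32_t)len;
        bufs[num_rx++] = mbuf;
    }
    return num_rx;
}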

> 
> Best Regards,
> Michał Mirosław



[dpdk-dev] [PATCH] log: do not drop debug logs at compile time

2016-10-03 Thread Wiles, Keith

Regards,
Keith

> On Oct 3, 2016, at 10:37 AM, Olivier Matz  wrote:
> 
> 
> 
> On 10/03/2016 05:27 PM, Wiles, Keith wrote:
>> 
>> Regards,
>> Keith
>> 
>>> On Oct 3, 2016, at 10:02 AM, Olivier Matz  wrote:
>>> 
>>> Hi Keith,
>>> 
>>> On 09/30/2016 05:48 PM, Wiles, Keith wrote:
>>>>> On Sep 30, 2016, at 4:33 AM, Thomas Monjalon wrote:
>>>>> 
>>>>> 2016-09-16 09:43, Olivier Matz:
>>>>>> Today, all logs whose level is lower than INFO are dropped at
>>>>>> compile-time. This prevents from enabling debug logs at runtime using
>>>>>> --log-level=8.
>>>>>> 
>>>>>> The rationale was to remove debug logs from the data path at
>>>>>> compile-time, avoiding a test at run-time.
>>>>>> 
>>>>>> This patch changes the behavior of RTE_LOG() to avoid the compile-time
>>>>>> optimization, and introduces the RTE_LOG_DP() macro that has the same
>>>>>> behavior than the previous RTE_LOG(), for the rare cases where debug
>>>>>> logs are in the data path.
>>>>>> 
>>>>>> So it is now possible to enable debug logs at run-time by just
>>>>>> specifying --log-level=8. Some drivers still have special compile-time
>>>>>> options to enable more debug log. Maintainers may consider to
>>>>>> remove/reduce them.
>>>>>> 
>>>>>> Signed-off-by: Olivier Matz 
>>>>> 
>>>>> I think it is a good change.
>>>>> However I'm not sure we should take it for 16.11 as it was sent late and
>>>>> there is no review comment.
>>>>> It is neither really a fix nor really a feature.
>>>>> If there are some +1, and no opinions against, it will go in 16.11.
>>>>> Note that some drivers would need some changes to fully benefit of
>>>>> debug logs enabled at run-time.
>>>> 
>>>> Would this be easier to add a new LOG level instead say DEBUG_DATAPATH and 
>>>> then change the RTE_LOG to exclude the new log level?
>>>> 
>>>> 
>>> 
>>> The log levels are quite standard, I don't feel it would be very clear
>>> to have a new level for that. It would also prevent to have different
>>> log level inside data path.
>> 
>> I am not following you here. Having one more log level for DEBUG in the data 
>> path is not a big change and you can still have any other log level in the 
>> data path or anyplace else for that matter.
> 
> Adding a new log level is not a big change, you are right.
> But to me it looks confusing to have DEBUG, INFO, ..., WARNING, ERROR,
> plus a DEBUG_DATAPATH. For instance, how do you compare levels? Or if
> your log stream forwards logs to syslog, you cannot do a 1:1 mapping
> with standard syslog levels.

Doing a 1:1 mapping is not a big problem unless you are trying to compare to old 
logs, which may be the case the first time; after that it should not be a 
problem.

> 
> What makes you feel it's easier to add a log level instead of adding a
> new RTE_LOG_DP() function?

It seems to me the log levels are for displaying logs at different levels; 
adding a new macro that does not log is just a hack because we do not have a 
log level for the data path. This is why I would like to see a log level added 
and not a new macro.

It also appears the new RTE_LOG() will always be in the code as you moved the 
test to the RTE_LOG_DP() macro. This would mean all RTE_LOG() in the code will 
always call rte_log(), correct?

If we use a new DEBUG_DP level (maybe DATAPATH is a better log level name) we 
can use the same macro as before and modify only the level. This way the 
compiler can remove any log that is below the default RTE_LOG_LEVEL. I see 
keeping the rte_log() calls as a potential performance problem or code bloat 
when you really want to remove them all.

The DATAPATH log level would be above (smaller number) than DEBUG in the enum 
list. To remove all debug logs just set the RTE_LOG_LEVEL to RTE_LOG_DATAPATH.
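
A simplified illustration of the compile-time mechanism (generic macros, not
the real rte_log.h ones, and just one possible numbering for the new level):

#include <stdio.h>

enum {                          /* smaller number = higher severity */
    LOG_ERR = 4,
    LOG_INFO = 7,
    LOG_DEBUG = 8,
    LOG_DATAPATH = 9            /* proposed level for data-path logs */
};

#define LOG_LEVEL LOG_DEBUG     /* build-time threshold, like RTE_LOG_LEVEL */

/* One macro for every level: the comparison is a compile-time constant, so
 * the LOG(DATAPATH, ...) call below is removed entirely by the compiler. */
#define LOG(l, ...)                                 \
    do {                                            \
        if (LOG_ ## l <= LOG_LEVEL)                 \
            printf(__VA_ARGS__);                    \
    } while (0)

int main(void)
{
    LOG(INFO, "kept: at or below the build-time level\n");
    LOG(DATAPATH, "stripped: above the build-time level\n");
    return 0;
}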


> 
> 
> Regards,
> Olivier



[dpdk-dev] [PATCH] log: do not drop debug logs at compile time

2016-10-03 Thread Wiles, Keith

Regards,
Keith

> On Oct 3, 2016, at 10:02 AM, Olivier Matz  wrote:
> 
> Hi Keith,
> 
> On 09/30/2016 05:48 PM, Wiles, Keith wrote:
>>> On Sep 30, 2016, at 4:33 AM, Thomas Monjalon  
>>> wrote:
>>> 
>>> 2016-09-16 09:43, Olivier Matz:
>>>> Today, all logs whose level is lower than INFO are dropped at
>>>> compile-time. This prevents from enabling debug logs at runtime using
>>>> --log-level=8.
>>>> 
>>>> The rationale was to remove debug logs from the data path at
>>>> compile-time, avoiding a test at run-time.
>>>> 
>>>> This patch changes the behavior of RTE_LOG() to avoid the compile-time
>>>> optimization, and introduces the RTE_LOG_DP() macro that has the same
>>>> behavior than the previous RTE_LOG(), for the rare cases where debug
>>>> logs are in the data path.
>>>> 
>>>> So it is now possible to enable debug logs at run-time by just
>>>> specifying --log-level=8. Some drivers still have special compile-time
>>>> options to enable more debug log. Maintainers may consider to
>>>> remove/reduce them.
>>>> 
>>>> Signed-off-by: Olivier Matz 
>>> 
>>> I think it is a good change.
>>> However I'm not sure we should take it for 16.11 as it was sent late and
>>> there is no review comment.
>>> It is neither really a fix nor really a feature.
>>> If there are some +1, and no opinions against, it will go in 16.11.
>>> Note that some drivers would need some changes to fully benefit of
>>> debug logs enabled at run-time.
>> 
>> Would this be easier to add a new LOG level instead say DEBUG_DATAPATH and 
>> then change the RTE_LOG to exclude the new log level?
>> 
>> 
> 
> The log levels are quite standard, I don't feel it would be very clear
> to have a new level for that. It would also prevent to have different
> log level inside data path.

I am not following you here. Having one more log level for DEBUG in the data 
path is not a big change, and you can still have any other log level in the 
data path or anyplace else for that matter.

> 
> Regards,
> Olivier



[dpdk-dev] [PATCH 3/3] drivers/net:build support for new tap device driver

2016-09-16 Thread Wiles, Keith

Regards,
Keith

> On Sep 16, 2016, at 2:36 AM, Panu Matilainen  wrote:
> 
> On 09/15/2016 05:10 PM, Keith Wiles wrote:
>> Signed-off-by: Keith Wiles 
>> ---
>> config/common_linuxapp | 3 +++
>> drivers/net/Makefile   | 1 +
>> mk/rte.app.mk  | 1 +
>> 3 files changed, 5 insertions(+)
>> 
>> diff --git a/config/common_linuxapp b/config/common_linuxapp
>> index 2483dfa..704c01c 100644
>> --- a/config/common_linuxapp
>> +++ b/config/common_linuxapp
>> @@ -44,3 +44,6 @@ CONFIG_RTE_LIBRTE_PMD_VHOST=y
>> CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y
>> CONFIG_RTE_LIBRTE_POWER=y
>> CONFIG_RTE_VIRTIO_USER=y
>> +CONFIG_RTE_LIBRTE_PMD_TAP=y
>> +CONFIG_RTE_PMD_TAP_MAX_QUEUES=32
>> +
>> diff --git a/drivers/net/Makefile b/drivers/net/Makefile
>> index bc93230..b4afa98 100644
>> --- a/drivers/net/Makefile
>> +++ b/drivers/net/Makefile
>> @@ -55,6 +55,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_THUNDERX_NICVF_PMD) += thunderx
>> DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio
>> DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += vmxnet3
>> DIRS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += xenvirt
>> +DIRS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += tap
>> 
>> ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
>> DIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += vhost
>> diff --git a/mk/rte.app.mk b/mk/rte.app.mk
>> index 1a0095b..bd1d10f 100644
>> --- a/mk/rte.app.mk
>> +++ b/mk/rte.app.mk
>> @@ -129,6 +129,7 @@ ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
>> _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_VHOST)  += -lrte_pmd_vhost
>> endif # $(CONFIG_RTE_LIBRTE_VHOST)
>> _LDLIBS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD)+= -lrte_pmd_vmxnet3_uio
>> +_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_TAP)+= -lrte_pmd_tap
>> 
>> ifeq ($(CONFIG_RTE_LIBRTE_CRYPTODEV),y)
>> _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AESNI_MB)   += -lrte_pmd_aesni_mb
>> 
> 
> Splitting the Makefile and config changes into a separate patch makes no 
> sense at all in this case. Just do it in the patch introducing the driver.
> 
> And actually, ditto for documentation.

OK, will do, but I thought other patches followed something like this, as I 
thought I was following the normal process. Without someplace defining the real 
process it is hard to know the right way to send patches. :-(

I will send a new single patch later today with one little typo I fixed.

> 
>   - Panu -



[dpdk-dev] [PATCH 2/3] docs:tun/tap PMD information

2016-09-15 Thread Wiles, Keith

Regards,
Keith

> On Sep 15, 2016, at 9:13 AM, Wiles, Keith  wrote:
> 
> self Nak - just noticed the copyright notices are wrong. 

My mistake, I was looking at the wrong file; the headers appear correct.

> 
> Regards,
> Keith
> 
>> On Sep 15, 2016, at 9:10 AM, Keith Wiles  wrote:
>> 
>> Signed-off-by: Keith Wiles 
>> ---
>> doc/guides/nics/tap.rst | 84 
>> +
>> 1 file changed, 84 insertions(+)
>> create mode 100644 doc/guides/nics/tap.rst
>> 
>> diff --git a/doc/guides/nics/tap.rst b/doc/guides/nics/tap.rst
>> new file mode 100644
>> index 000..072def8
>> --- /dev/null
>> +++ b/doc/guides/nics/tap.rst
>> @@ -0,0 +1,84 @@
>> +..  BSD LICENSE
>> +Copyright(c) 2016 Intel Corporation. All rights reserved.
>> +All rights reserved.
>> +
>> +Redistribution and use in source and binary forms, with or without
>> +modification, are permitted provided that the following conditions
>> +are met:
>> +
>> +* Redistributions of source code must retain the above copyright
>> +notice, this list of conditions and the following disclaimer.
>> +* Redistributions in binary form must reproduce the above copyright
>> +notice, this list of conditions and the following disclaimer in
>> +the documentation and/or other materials provided with the
>> +distribution.
>> +* Neither the name of Intel Corporation nor the names of its
>> +contributors may be used to endorse or promote products derived
>> +from this software without specific prior written permission.
>> +
>> +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
>> +"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
>> +LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
>> +A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
>> +OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
>> +SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
>> +LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
>> +DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
>> +THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
>> +(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
>> +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
>> +
>> +Tun/Tap Poll Mode Driver
>> +
>> +
>> +The rte_eth_tap.c PMD creates a device using TUN/TAP interfaces on the local
>> +host. The PMD allows for DPDK and the host to communicate using a raw device
>> +interface on the host and in the DPDK application.
>> +
>> +The device created is a TAP device, which sends/receives packet in a raw 
>> format
>> +with a L2 header. The usage for a TAP PMD is for connectivity to the local 
>> host
>> +using a TAP interface. When the TAP PMD is initialized it will create a 
>> number
>> +of tap devices in the host accessed via 'ifconfig -a' or 'ip' command. The
>> +commands can be used to assign and query the virtual like device.
>> +
>> +These TAP interfaces can be used with wireshark or tcpdump or Pktgen-DPDK 
>> along
>> +with being able to be used as a network connection to the DPDK application. 
>> The
>> +method enable one or more interfaces is to use the --vdev=eth_tap option on 
>> the
>> +DPDK application  command line. Each --vdev=eth_tap option give will create 
>> an
>> +interface named dtap0, dtap1, ... and so forth.
>> +
>> +.. code-block:: console
>> +
>> +   The interfaced name can be changed by adding the iface=foo0
>> +   e.g. --vedv=eth_tap,iface=foo0 --vdev=eth_tap,iface=foo1, ...
>> +
>> +.. code-block:: console
>> +
>> +   Also the speed of the interface can be changed from 10G to whatever 
>> number
>> +   needed, but the interface does not enforce that speed.
>> +   e.g. --vdev=eth_tap,iface=foo0,speed=25000
>> +
>> +After the DPDK application is started you can send and receive packets on 
>> the
>> +interface using the standard rx_burst/tx_burst APIs in DPDK. From the host 
>> point
>> +of view you can use any host tool like tcpdump, wireshark, ping, Pktgen and
>> +others to communicate with the DPDK application. The DPDK application may 
>> not
>> +understand network protocols like IPv4/6, UDP or TCP unless the application 
>> has
>> +been written to 

[dpdk-dev] [PATCH 2/3] docs:tun/tap PMD information

2016-09-15 Thread Wiles, Keith
self Nak - just noticed the copyright notices are wrong. 

Regards,
Keith

> On Sep 15, 2016, at 9:10 AM, Keith Wiles  wrote:
> 
> Signed-off-by: Keith Wiles 
> ---
> doc/guides/nics/tap.rst | 84 +
> 1 file changed, 84 insertions(+)
> create mode 100644 doc/guides/nics/tap.rst
> 
> diff --git a/doc/guides/nics/tap.rst b/doc/guides/nics/tap.rst
> new file mode 100644
> index 000..072def8
> --- /dev/null
> +++ b/doc/guides/nics/tap.rst
> @@ -0,0 +1,84 @@
> +..  BSD LICENSE
> +Copyright(c) 2016 Intel Corporation. All rights reserved.
> +All rights reserved.
> +
> +Redistribution and use in source and binary forms, with or without
> +modification, are permitted provided that the following conditions
> +are met:
> +
> +* Redistributions of source code must retain the above copyright
> +notice, this list of conditions and the following disclaimer.
> +* Redistributions in binary form must reproduce the above copyright
> +notice, this list of conditions and the following disclaimer in
> +the documentation and/or other materials provided with the
> +distribution.
> +* Neither the name of Intel Corporation nor the names of its
> +contributors may be used to endorse or promote products derived
> +from this software without specific prior written permission.
> +
> +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> +"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> +LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> +A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> +OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> +SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> +LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> +DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> +THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> +(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> +
> +Tun/Tap Poll Mode Driver
> +
> +
> +The rte_eth_tap.c PMD creates a device using TUN/TAP interfaces on the local
> +host. The PMD allows for DPDK and the host to communicate using a raw device
> +interface on the host and in the DPDK application.
> +
> +The device created is a TAP device, which sends/receives packet in a raw 
> format
> +with a L2 header. The usage for a TAP PMD is for connectivity to the local 
> host
> +using a TAP interface. When the TAP PMD is initialized it will create a 
> number
> +of tap devices in the host accessed via 'ifconfig -a' or 'ip' command. The
> +commands can be used to assign and query the virtual like device.
> +
> +These TAP interfaces can be used with wireshark or tcpdump or Pktgen-DPDK 
> along
> +with being able to be used as a network connection to the DPDK application. 
> The
> +method enable one or more interfaces is to use the --vdev=eth_tap option on 
> the
> +DPDK application  command line. Each --vdev=eth_tap option give will create 
> an
> +interface named dtap0, dtap1, ... and so forth.
> +
> +.. code-block:: console
> +
> +   The interfaced name can be changed by adding the iface=foo0
> +   e.g. --vedv=eth_tap,iface=foo0 --vdev=eth_tap,iface=foo1, ...
> +
> +.. code-block:: console
> +
> +   Also the speed of the interface can be changed from 10G to whatever number
> +   needed, but the interface does not enforce that speed.
> +   e.g. --vdev=eth_tap,iface=foo0,speed=25000
> +
> +After the DPDK application is started you can send and receive packets on the
> +interface using the standard rx_burst/tx_burst APIs in DPDK. From the host 
> point
> +of view you can use any host tool like tcpdump, wireshark, ping, Pktgen and
> +others to communicate with the DPDK application. The DPDK application may not
> +understand network protocols like IPv4/6, UDP or TCP unless the application 
> has
> +been written to understand these protocols.
> +
> +If you need the interface as a real network interface meaning running and has
> +a valid IP address then you can do this with the following commands:
> +
> +.. code-block:: console
> +
> +   sudo ip link set dtap0 up; sudo ip addr add 192.168.0.250/24 dev dtap0
> +   sudo ip link set dtap1 up; sudo ip addr add 192.168.1.250/24 dev dtap1
> +
> +Please change the IP addresses as you see fit.
> +
> +If routing is enabled on the host you can also communicate with the DPDK App
> +over the internet via a standard socket layer application as long as you 
> account
> +for the protocol handing in the application.
> +
> +If you have a Network Stack in your DPDK application or something like it you
> +can utilize that stack to handle the network protocols. Plus you would be 
> able
> +to address the interface using an 

[dpdk-dev] Run 2 different testpmd from the same machine

2016-09-12 Thread Wiles, Keith

Regards,
Keith

> On Sep 12, 2016, at 6:18 AM, Keren Hochman  
> wrote:
> 
> Hi,
> I tried to run 2 instances of testpmd from the same machine but received a
> message: "Cannot get hugepage information" when I tried to run the second
> instance. Is there a way to disable hugepages or allow two instances to
> access it? Thanks. keren

When running two or more DPDK application instances you need to make sure the 
resources are split up correctly. You did not supply the command lines being 
used, but I will try to explain how it is done.

First, memory or huge pages must be allocated to each instance using 
--socket-mem 128,128, or --socket-mem 128 if you only have one socket in your 
system. Make sure you have enough huge pages allocated in the /etc/sysctl.conf 
file for both instances. With --socket-mem 128,128 you are giving 256MB to one 
instance, and if the second instance uses --socket-mem 256,256 then 512MB, 
which means you need enough huge pages in the system to cover 256MB + 512MB.

Next, the huge page files in the /dev/hugepages directory must have different 
prefixes, by using the --file-prefix option to give a different file prefix to 
each instance. If you have already run a DPDK instance once without the option, 
please run 'sudo rm -fr /dev/hugepages/*' to release the current huge pages.

Next you need to make sure you blacklist the ports not used by an instance, 
using the -b option on that instance's command line. Each instance needs to 
blacklist the ports it is not using. This seems the easiest to me, but you 
could look into using the whitelist option as well.

Next make sure you allocate different cores to each instance using the -c or -l 
option; the -l option is a bit easier to read, IMO.

Next use --proc-type auto in both instances just to be clear. This could be 
optional, I think.

I hope this helps. You can also pull down Pktgen and look at the 
pktgen-master.sh and pktgen-slave.sh scripts and modify them for your needs. 
http://dpdk.org/download 



[dpdk-dev] rte_kni.ko with lo_mode=lo_mode_ring

2016-09-02 Thread Wiles, Keith
Hi Gary,

Regards,
Keith

> On Sep 2, 2016, at 8:39 AM, Mussar, Gary  wrote:
> 
> The pktgen docs state that the rte_kni.ko should be loaded with 
> lo_mode=lo_mode_ring however the source in dpdk master does not appear to 
> understand this value. This wasn't an issue in the past since the kni code 
> would simply disable lo_mode if you passed in an unknown value.
> 
> The Ubuntu 4.4.0-36-generic kernel no longer loads the module if the passed 
> value is not in the table of known values (the 4.4.0-34-generic kernel would 
> still load the module).
> 
> I have 2 questions:
> 1)
> Why does pktgen specify using a value for lo_mode that the source does not 
> support? Is lo_mode_ring something that is coming but not yet submitted?

The KNI support and kernel module is more of a DPDK issue than a Pktgen one; 
Pktgen is just the application on top of DPDK. I do not know the reason 
lo_mode_ring is not working correctly. Look in the MAINTAINERS file to see who 
maintains the KNI code.

> 
> 2)
> Is the Ubuntu kernel being over aggressive in parameter checking or is this a 
> "good" thing? If it is good, then should the pktgen docs be indicating that 
> an unsupported value be used for lo_mode?

I assume the Ubuntu kernel is attempting to protect itself. Each time the 
kernel is updated it is a good thing to recompile DPDK and the KNI kernel 
module, or the module may not load because of a version mismatch.

> 
> Gary



[dpdk-dev] [PATCH] doc/guides: add info on how to enable QAT

2016-08-30 Thread Wiles, Keith

Regards,
Keith

> On Aug 30, 2016, at 8:46 AM, Thomas Monjalon  
> wrote:
> 
> 2016-08-30 14:26, Eoin Breen:
>> --- a/doc/guides/cryptodevs/qat.rst
>> +++ b/doc/guides/cryptodevs/qat.rst
>> @@ -78,6 +78,11 @@ Installation
>> To use the DPDK QAT PMD an SRIOV-enabled QAT kernel driver is required. The
>> VF devices exposed by this driver will be used by QAT PMD.
>> 
>> +To enable QAT in DPDK you must change the ./config/common_base file. Change 
>> the
>> +line 'CONFIG_RTE_LIBRTE_PMD_QAT=n' to 'CONFIG_RTE_LIBRTE_PMD_QAT=y' to do 
>> this.
> 
> No, the recommended way is to change the value in the generated config
> file (.config).

The way I have been changing the default configuration options is to copy a 
config/defconfig_XYZ file, such as defconfig_x86_64-native-linuxapp-gcc, to a 
new name, say defconfig_x86_64-qat-linuxapp-gcc. Then edit that file and add 
CONFIG_RTE_LIBRTE_PMD_QAT=y at the bottom. Then run 'make install 
T=x86_64-qat-linuxapp-gcc -j'.
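
For reference, the flow described above looks roughly like this (the "qat" 
target name is just an example):

  cp config/defconfig_x86_64-native-linuxapp-gcc \
     config/defconfig_x86_64-qat-linuxapp-gcc
  echo "CONFIG_RTE_LIBRTE_PMD_QAT=y" >> config/defconfig_x86_64-qat-linuxapp-gcc
  make install T=x86_64-qat-linuxapp-gcc -j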

Is this not a better way to build a new configuration for a specific purpose?

> 
> PS: please avoid confidential disclaimer



[dpdk-dev] Huge ring allocation

2016-08-25 Thread Wiles, Keith

Regards,
Keith

> On Aug 25, 2016, at 1:05 AM, Gregory Etelson  wrote:
> 
> Hello,
> 
> I have a DPDK process that creates pktmbuf pool with 2_000_000 objects
> In CentOS 6.x x86-64, ring size of this pool is 16MB
> 8 2MB hugepages required to hold such ring.

Have you tried using 1G huge pages? Is that an option?

> In some cases, hugepages are too fragmented and there are no 8 contiguous 
> hugepages for the ring.
> As the result, the process has enough hugepages memory, but ring allocation 
> fails.

Another issue is that sometimes huge pages are allocated after the system has 
booted; to get contiguous memory you need to reserve the number of huge pages 
very early in the boot process. This means adding a line to the 
/etc/sysctl.conf file instead of poking the value later:

vm.nr_hugepages=XXX
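
For example (1024 is an arbitrary page count -- size it for your pools):

  echo "vm.nr_hugepages=1024" | sudo tee -a /etc/sysctl.conf
  sudo sysctl -p
  grep -i huge /proc/meminfo     # verify the pages were reserved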

> I use a workaround for these cases and create a ring with standard OS 
> allocation routines.
> Is there another way I could use unclaimed dpdk hugepages for that allocation 
> ?
> 
> Regards,
> Gregory



[dpdk-dev] help

2016-08-19 Thread Wiles, Keith

Regards,
Keith

> On Aug 19, 2016, at 1:09 PM, Stephen Hemminger  networkplumber.org> wrote:
> 
> On Fri, 19 Aug 2016 13:11:48 +
>  wrote:
> 
>> Hi ,
>> 
>> I needs to enable SMP(symmetrical multiprocessing) in DPDK PMD NIC Port .
>> 
>> Please let me know the command to get the same
>> 
>> Thanks
>> jayachandran
>> The information contained in this electronic message and any attachments to 
>> this message are intended for the exclusive use of the addressee(s) and may 
>> contain proprietary, confidential or privileged information. If you are not 
>> the intended recipient, you should not disseminate, distribute or copy this 
>> e-mail. Please notify the sender immediately and destroy all copies of this 
>> message and any attachments. WARNING: Computer viruses can be transmitted 
>> via email. The recipient should check this email and any attachments for the 
>> presence of viruses. The company accepts no liability for any damage caused 
>> by any virus transmitted by this email. www.wipro.com
> 
> Thanks for your interest. Please direct all user questions to the DPDK user 
> mailing list users at dpdk.org.
> 
> The DPDK is inherently SMP, read the documentation about how lcore's are 
> setup.

Here is a good link to read the DPDK docs 
https://readthedocs.org/projects/dpdk/ 

> 
> Also don't add corporate signature's like this on open mailing lists.
> Because of the proprietary label, the email would be ignored by many users.
> 



[dpdk-dev] [RFC 0/4] Use Google Test as DPDK unit test framework

2016-08-04 Thread Wiles, Keith

> On Aug 4, 2016, at 2:47 PM, Jim Murphy  wrote:
> 
> Hi,
> 
> We are looking at using our existing test environment for our DPDK
> applications that will run on our build servers. Hughpages therefore is an
> issue. What is involved in running DPDK without huge pages?

The command line option --no-huge should work. Note the two dashes in front.
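
For example, something like the following should run testpmd without huge 
pages (a sketch only -- the memory size and core list are arbitrary here):

  testpmd --no-huge -m 512 -l 0-3 -n 4 -- -i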

> 
> Thanks,
> 
> Jim
> 
> 
> On Wed, Aug 3, 2016 at 1:46 PM, Ming Zhao(??) 
> wrote:
> 
>> googletest is a very nice test framework and we use it very
>> extensively in our company(Luminate Wireless), together with gmock.
>> 
>> I understand the resistance from the maintainers that are concerned
>> about introducing a C++ dependency to a pure C code base. The approach
>> we take doesn't require any change to the dpdk core, instead we just
>> use things like a mock PMD(through gmock framework) to allow mocking
>> the RX/TX code path, disabling huge page usage in test so that the
>> test can be easily launched without worrying about huge page
>> collision, etc.
>> 
>> Personally I highly recommend using googletest plus some basic test
>> cases, which removes a lot of boilerplate and let the developers focus
>> the test itself.
>> 
>> On Wed, Aug 3, 2016 at 2:57 AM, Doherty, Declan
>>  wrote:
>>> 
>>> 
 -Original Message-
>>> ...
 You are not advocating but the unit test must be written in C++.
 I don't think it is a good idea to force people to write and maintain
>> the tests
 in a different language than the code it tests.
>>> 
>>> I know where you are coming from on this point, and I general would
>> agree if
>>> it were not for the advantages you get from C++ test framework. Having
>> worked with
>>> multiple C and C++ frameworks, I've found that one of the biggest
>> advantages of the
>>> C++ frameworks is the amount of boilerplate code they can save you from
>> writing. Also
>>> nearly all of C frameworks I've used make use macros to the point that
>> they look more like
>>> objective C than C. In general I feel that even if the test code is
>> written in C++ the code itself
>>> should be simple enough that someone with even a passing knowledge of
>> C++ could easily
>>> understand the intent of the test code.
>>> 
> Some of the major advantages of google test that I see over
>> continuing to use
 the
> current test include giving a consist feel to all tests, a powerful
>> test
> execution framework which allow individual test suites or tests to be
>> specified
> from the command line, support for a standard xunit output which can
>> be
 integrated
> into a continuous build systems, and a very powerful mocking library
> which allows much more control over testing failure conditions.
 
 It would be interesting to better describe in details what is missing
>> currently
 and what such a framework can bring.
 (I agree there is a huge room for improvements on unit tests)
>>> 
>>> Some of the things I've come across include:
>>> No standard output format to integrated with continuous regression
>> systems
>>> No ability to specify specific unit tests or groups of tests to run from
>> the command line
>>> No standard set of test assertions used across the test suites.
>>> No standard setup and teardown functions across test suites, state from
>> previous test
>>> suite can break current
>>> Requirement to use a python script to orchestrate test runs.
>>> No support for mocking functionality.
>>> 
>>> I know that none of the above couldn't be fixed in our current test
>> application, but I would
>>> question if it is effort worthwhile when we take an off the shelf
>> framework, which does all
>>> those things and a whole lot more, which has been test and used in a
>> huge variety of
>>> projects.
>>> 
>>> I certainly willing to look at other frameworks both C and C++ but I yet
>> to find a C framework
>>> which come close to the usability and flexibility of the popular C++
>> ones.
>>> 
>>> 
>>> 
>> 



[dpdk-dev] [PATCH] validate_abi: build faster by augmenting make with job count

2016-08-01 Thread Wiles, Keith

>> Neil
>> 
> Sorry for the delayed response, I've been on vacation.
> 
>> Your modified copy of make has no bearing on the topic we are taking about 
>> customers using dpdk in standard distros right?
>> 
> !?  I really don't know what you're saying here.  My only reason for 
> commenting
> on my copy of make was to consider an explination for why make -j 0 might be
> working for me, and not for you.  I've since uninstalled and resinalled make 
> on
> my system, and my copy now behaves as yours does
> 
> But thats all really irrelevant.  I don't know what you mean by this not 
> having
> any bearing on the conversation since we're talking about customers using dpdk
> in a distro.  We're not really talking about that at all (if we were using 
> make
> would be a moot point, since distro users tend to only use pre-compiled
> binaries).  Were talking about accelerating the build process when comparing
> ABIs on two different dpdk versions, thats all.

Neil, (I am trying to not read your style of text as condescending and I will 
try to not do that as well)

Not everyone uses DPDK from prebuilt libraries and we need to support them as 
well, correct?

> 
>> Seems odd to me to send this out with 0 or lspci as it may fail because of 
>> no lspci and will fail on all Ubuntu systems. 
>> 
> Again, don't really understand what your saying.  If you look at the patch,
> neither of your assertions are true.  With this patch and no other change, the
> validate_abi script behaves exactly as it does now.  The only thing I've done 
> is
> add a check for the DPDK_MAKE_JOBS environment variable, and if its not set,
> either:
> 
> a) Set DPDK_MAKE_JOBS to 1 if lscpu doesn't exist on the system
> b) Set DPDK_MAKE_JOBS to the number of online cpus if lscpu does exist
> 
> All of that gets overridden if you manually set DPDK_MAKE_JOBS to something
> else.
> 
> seems pretty straightforward to me.

At this point, do we need to add yet another environment variable to get the 
correct behavior from DPDK? DPDK is very simple to build today, and I worry we 
keep adding special variables to build it. Can we just use a cleaner default 
instead of adding more and more special build requirements? Adding this one is 
fine, but it also means the customer must read the docs to find this new 
variable.

DPDK should be buildable with the least amount of documentation to read; 
developers can read the docs in more depth later. I am just looking at how a 
developer can get started building DPDK without an RTFM problem. At some point 
they need to read the docs, possibly to run the examples, but building DPDK 
should be very simple, IMO.

> 
>> If we ship with 1 then why even bother the adding code and if I have to edit 
>> the file or some other method to get better compile performance then why 
>> bother as well.
>> 
> Please stop and think for a minute.  Why would you need to edit a file to 
> change
> anything?  If lscpu exists, then everything happens automatically.  If it
> doesn't you can still just run:
> export DPDK_MAKE_JOBS=$BIGNUMBER; validate_abi.sh 

Please do not add extra environment variables; we would start to get to the 
point of having so many pre-build requirements that it becomes private 
knowledge just to build DPDK, or a huge setup/RTFM problem.

> 
> and it works fine.  If ubuntu has some other utiilty besides lscpu to parse 
> cpu
> counts, fine, we can add that in as a feature, but I don't see why that should
> stop other, non-ubuntu systems from taking advantage of this.
> 
>> Setting the value to some large number does not make any sense to me and if 
>> I have to edit file every time or maintain a patch just seems silly. 
>> 
> Goodness Keith, stop for just a minute with the editing the file train your 
> on.
> Its an environment variable, you don't have to edit a file to change it.

Yes Neil, you also need to stop and think about what you are placing on the 
developer just to build DPDK. This one little problem is not the real issue to 
me, but a symptom of a growing problem in DPDK around how DPDK is built and 
the amount of knowledge or setup it requires to do this one simple task.

> 
>> It just seems easier to set it to -j and not use lspci at all. This way we 
>> all win as I am not following your logic at all.
>> 
> This doesn't even make any sense.  If you set it to -j then you get make 
> issuing
> jobs at the max rate make can issue them, which is potentially fine, but may 
> not
> be what developers want in the event they don't want this script to monopolize
> their system.  The way its written now lets people get reasonable behavior in
> most cases, and opt-in to the extreeme case should they desire.  That makes 
> far
> more sense to me, then just chewing up as much cpu as possible all the time.

I only suggest -j because it would give the developer the best build 
performance without requiring lscpu or setting up yet another build 
environment variable. The lscpu command does not exist on all systems today 
and other non-Linux based 

[dpdk-dev] Application framework vs. library

2016-08-01 Thread Wiles, Keith

> On Aug 1, 2016, at 5:04 AM, Thomas Monjalon  
> wrote:
> 
> Hi,
> 
> Thanks for explaining your context.
> Your concerns are known under the term "usability enhancements".
> As they require some API changes, we won't make them in the release 16.11.
> But we must work on it now to be able to announce the API changes for 17.02
> before the 16.11 release.
> 
> 2016-08-01 10:18, Steffen.Bauch at rohde-schwarz.com:
>> Concerning the integration we try to cope with the circumstance that
>> DPDK is more comparable to an application framework and less to a
>> library.
> 
> I think DPDK should be easier to use and considered as a normal library.
> 
> 
> [...]
>> Logging behavior
>> 
>> Opening a connection to the system logger for our program is within the
>> responsibility of our specific logging module that is also used in the
>> proprietary parts of the application. For this reason openlog() should
>> not be called in our use case. In addition, using rte_openlog_stream()
>> is not usable before rte_eal_init() as logging init will overwrite the
>> used log stream. For this reason a large part of the init logs can not
>> be handled by a custom log stream. Btw, after removal of the log caching,
>> is there a fundamental difference between early log and normal logging?
> 
> You're right. Now that the log history is removed, we can rework the log
> initialization and reconsider early logging.
> 
>> A possible approach for the future could be to initialize rte_logs.file
>> to NULL (in fact it is, as it is static) and only set it during
>> initialization if it is NULL with the goal to be able to use
>> rte_openlog_stream() before rte_eal_init(). As the openlog() call is
>> only relevant when the default log stream is used (and architecture
>> dependent!) I introduced a specific entry point for calling openlog. The
>> main complexity to this changeset is added due to two reasons. First
>> reason is the circumstance that rte_eal_common_log_init() is used several
>> times during early logging and real logging initialization. Second
>> aspect is that a user might revert to the use of the default_log_stream
>> after a custom log stream has be used straight from the beginning (even
>> before rte_eal_init()). A dedicated changeset towards v16.07 for this
>> improvement is attached for review, feedback and possible inclusion.
> 
> Before entering in the proposed design for a custom log handler, we should
> discuss the desired features in DPDK logging.
> Is there some developer who would like to see the logs improved to be
> dynamically tuned in run-time live?
> Or should we just allow a custom log handler and provide only a default
> minimal handler?

I would suggest we have a custom log handler for applications that have their 
own logging system. I believe we still need a logging system in DPDK for the 
complete, non-library type of system. DPDK was designed as a complete system 
(or appears that way), but I understand moving toward a library design is 
reasonable. I just do not want to lose the complete-system design; we should 
be able to have both.

The only downside to a library design is that testing becomes more complex, 
IMO, as we do not have standard configuration(s) and "complete" applications. 
DPDK is a complex system by design, and it can become difficult to test the 
library interactions if we do not have a clean way to test the complete set of 
libraries.
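
For what it is worth, redirecting DPDK logs today already looks roughly like 
the sketch below, using the existing rte_openlog_stream() (the file name is 
just an example); the discussion above is about making this usable before 
rte_eal_init() as well:

  #include <stdio.h>
  #include <rte_log.h>

  static FILE *app_log;

  static int
  app_redirect_dpdk_log(void)
  {
          app_log = fopen("/var/log/myapp-dpdk.log", "a");
          if (app_log == NULL)
                  return -1;
          /* from here on, DPDK log output is written to our file */
          return rte_openlog_stream(app_log);
  }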

> 
> 
>> Process termination behavior
>> 
>> As the DPDK functionality is only a small part of the whole application
>> ownership of the running process is not with DPDK and it is not allowed
>> to cancel the running process as a result of an error. For this reason,
>> the current changeset instead of calling rte_panic or rte_exit returns
>> an error code and handles it in the calling function. Unfortunately this
>> change was only implemented in rte_eal_init() and not in other places
>> (drivers, libs ...), so there is currently no safety that an unknown
>> code part terminates the whole application on the spot. Future changes
>> and patches could change the error handling in a more thorough approach,
>> but I would like to have some feedback and opinions before I start the
>> heavy-lifting work in over 700 code locations. Even some interfaces
>> might be affected, as lots of functions return void now,
> 
> Yes, please let's remove every panic/exit calls from DPDK.
> The only reason it is still there is that nobody took the time to remove
> these traces from the early support of bare metal environment (now dropped).

Removing the panic calls is fine, but the patch just replaces them with return 
-1, which will not do, as you need to know why the calls failed. Replacing 
them with returning an error code and calling a log output would be much more 
reasonable.

> 
> 
>> Thread creation
>> 
>> In our application thread creation and setting the affinity is already
>> handled in a proprietary module. Frames are polled and processed
>> in non-EAL pthreads. For this 

[dpdk-dev] dpdk-pktgen how to show more than 4 ports in one page?

2016-07-30 Thread Wiles, Keith
> On Jul 30, 2016, at 1:03 AM, linhaifeng  wrote:
> 
> hi
> 
> I use 6 ports to send pkts in VM, but can only 4 ports work, how to enable 
> more ports to work?
> 
In the help screen, the command 'ppp [1-6]' sets the number of ports per page.




[dpdk-dev] [PATCH] validate_abi: build faster by augmenting make with job count

2016-07-24 Thread Wiles, Keith


Sent from my iPhone

> On Jul 21, 2016, at 1:34 PM, Neil Horman  wrote:
> 
>> On Thu, Jul 21, 2016 at 03:22:45PM +0000, Wiles, Keith wrote:
>> 
>>> On Jul 21, 2016, at 10:06 AM, Neil Horman  wrote:
>>> 
>>> On Thu, Jul 21, 2016 at 02:09:19PM +, Wiles, Keith wrote:
>>>> 
>>>>> On Jul 21, 2016, at 8:54 AM, Neil Horman  wrote:
>>>>> 
>>>>> On Wed, Jul 20, 2016 at 10:32:28PM +, Wiles, Keith wrote:
>>>>>> 
>>>>>>> On Jul 20, 2016, at 3:16 PM, Neil Horman  
>>>>>>> wrote:
>>>>>>> 
>>>>>>> On Wed, Jul 20, 2016 at 07:47:32PM +, Wiles, Keith wrote:
>>>>>>>> 
>>>>>>>>> On Jul 20, 2016, at 12:48 PM, Neil Horman  
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> On Wed, Jul 20, 2016 at 07:40:49PM +0200, Thomas Monjalon wrote:
>>>>>>>>>> 2016-07-20 13:09, Neil Horman:
>>>>>>>>>>> From: Neil Horman 
>>>>>>>>>>> 
>>>>>>>>>>> John Mcnamara and I were discussing enhacing the validate_abi 
>>>>>>>>>>> script to build
>>>>>>>>>>> the dpdk tree faster with multiple jobs.  Theres no reason not to 
>>>>>>>>>>> do it, so this
>>>>>>>>>>> implements that requirement.  It uses a MAKE_JOBS variable that can 
>>>>>>>>>>> be set by
>>>>>>>>>>> the user to limit the job count.  By default the job count is set 
>>>>>>>>>>> to the number
>>>>>>>>>>> of online cpus.
>>>>>>>>>> 
>>>>>>>>>> Please could you use the variable name DPDK_MAKE_JOBS?
>>>>>>>>>> This name is already used in scripts/test-build.sh.
>>>>>>>>> Sure
>>>>>>>>> 
>>>>>>>>>>> +if [ -z "$MAKE_JOBS" ]
>>>>>>>>>>> +then
>>>>>>>>>>> +# This counts the number of cpus on the system
>>>>>>>>>>> +MAKE_JOBS=`lscpu -p=cpu | grep -v "#" | wc -l`
>>>>>>>>>>> +fi
>>>>>>>>>> 
>>>>>>>>>> Is lscpu common enough?
>>>>>>>>> I'm not sure how to answer that.  lscpu is part of the util-linux 
>>>>>>>>> package, which
>>>>>>>>> is part of any base install.  Theres a variant for BSD, but I'm not 
>>>>>>>>> sure how
>>>>>>>>> common it is there.
>>>>>>>>> Neil
>>>>>>>>> 
>>>>>>>>>> Another acceptable default would be just "-j" without any number.
>>>>>>>>>> It would make the number of jobs unlimited.
>>>>>>>> 
>>>>>>>> I think the best is just use -j as it tries to use the correct number 
>>>>>>>> of jobs based on the number of cores, right?
>>>>>>> -j with no argument (or -j 0), is sort of, maybe what you want.  With 
>>>>>>> either of
>>>>>>> those options, make will just issue jobs as fast as it processes 
>>>>>>> dependencies.
>>>>>>> Dependent on how parallel the build is, that can lead to tons of 
>>>>>>> waiting process
>>>>>>> (i.e. more than your number of online cpus), which can actually hurt 
>>>>>>> your build
>>>>>>> time.
>>>>>> 
>>>>>> I read the manual and looked at the code, which supports your statement. 
>>>>>> (I think I had some statement on stack overflow and the last time I 
>>>>>> believe anything on the internet :-) I have not seen a lot of 
>>>>>> differences in compile times with -j on my system. Mostly I suspect it 
>>>>>> is the number of paths in the dependency, cores and memory on the system.
>>>>>> 
>>>>>> I have 72 lcores or 2 sockets, 18 cores per socket. Xeon 2.3Ghz cores.
>>>>>> 
>>>>>> $ export RTE_TARGET=x86_64-native-linuxapp-gcc 
>>>>>> 
&g

[dpdk-dev] [PATCH] validate_abi: build faster by augmenting make with job count

2016-07-21 Thread Wiles, Keith

> On Jul 21, 2016, at 10:06 AM, Neil Horman  wrote:
> 
> On Thu, Jul 21, 2016 at 02:09:19PM +0000, Wiles, Keith wrote:
>> 
>>> On Jul 21, 2016, at 8:54 AM, Neil Horman  wrote:
>>> 
>>> On Wed, Jul 20, 2016 at 10:32:28PM +, Wiles, Keith wrote:
>>>> 
>>>>> On Jul 20, 2016, at 3:16 PM, Neil Horman  wrote:
>>>>> 
>>>>> On Wed, Jul 20, 2016 at 07:47:32PM +, Wiles, Keith wrote:
>>>>>> 
>>>>>>> On Jul 20, 2016, at 12:48 PM, Neil Horman  wrote:
>>>>>>> 
>>>>>>> On Wed, Jul 20, 2016 at 07:40:49PM +0200, Thomas Monjalon wrote:
>>>>>>>> 2016-07-20 13:09, Neil Horman:
>>>>>>>>> From: Neil Horman 
>>>>>>>>> 
>>>>>>>>> John Mcnamara and I were discussing enhacing the validate_abi script 
>>>>>>>>> to build
>>>>>>>>> the dpdk tree faster with multiple jobs.  Theres no reason not to do 
>>>>>>>>> it, so this
>>>>>>>>> implements that requirement.  It uses a MAKE_JOBS variable that can 
>>>>>>>>> be set by
>>>>>>>>> the user to limit the job count.  By default the job count is set to 
>>>>>>>>> the number
>>>>>>>>> of online cpus.
>>>>>>>> 
>>>>>>>> Please could you use the variable name DPDK_MAKE_JOBS?
>>>>>>>> This name is already used in scripts/test-build.sh.
>>>>>>>> 
>>>>>>> Sure
>>>>>>> 
>>>>>>>>> +if [ -z "$MAKE_JOBS" ]
>>>>>>>>> +then
>>>>>>>>> + # This counts the number of cpus on the system
>>>>>>>>> + MAKE_JOBS=`lscpu -p=cpu | grep -v "#" | wc -l`
>>>>>>>>> +fi
>>>>>>>> 
>>>>>>>> Is lscpu common enough?
>>>>>>>> 
>>>>>>> I'm not sure how to answer that.  lscpu is part of the util-linux 
>>>>>>> package, which
>>>>>>> is part of any base install.  Theres a variant for BSD, but I'm not 
>>>>>>> sure how
>>>>>>> common it is there.
>>>>>>> Neil
>>>>>>> 
>>>>>>>> Another acceptable default would be just "-j" without any number.
>>>>>>>> It would make the number of jobs unlimited.
>>>>>> 
>>>>>> I think the best is just use -j as it tries to use the correct number of 
>>>>>> jobs based on the number of cores, right?
>>>>>> 
>>>>> -j with no argument (or -j 0), is sort of, maybe what you want.  With 
>>>>> either of
>>>>> those options, make will just issue jobs as fast as it processes 
>>>>> dependencies.
>>>>> Dependent on how parallel the build is, that can lead to tons of waiting 
>>>>> process
>>>>> (i.e. more than your number of online cpus), which can actually hurt your 
>>>>> build
>>>>> time.
>>>> 
>>>> I read the manual and looked at the code, which supports your statement. 
>>>> (I think I had some statement on stack overflow and the last time I 
>>>> believe anything on the internet :-) I have not seen a lot of differences 
>>>> in compile times with -j on my system. Mostly I suspect it is the number 
>>>> of paths in the dependency, cores and memory on the system.
>>>> 
>>>> I have 72 lcores or 2 sockets, 18 cores per socket. Xeon 2.3Ghz cores.
>>>> 
>>>> $ export RTE_TARGET=x86_64-native-linuxapp-gcc 
>>>> 
>>>> $ time make install T=${RTE_TARGET}
>>>> real   0m59.445s user  0m27.344s sys   0m7.040s
>>>> 
>>>> $ time make install T=${RTE_TARGET} -j
>>>> real   0m26.584s user  0m14.380s sys   0m5.120s
>>>> 
>>>> # Remove the x86_64-native-linuxapp-gcc
>>>> 
>>>> $ time make install T=${RTE_TARGET} -j 72
>>>> real   0m23.454s user  0m10.832s sys   0m4.664s
>>>> 
>>>> $ time make install T=${RTE_TARGET} -j 8
>>>> real   0m23.812s user  0m10.672s sys   0m4.276s
>>>> 
>>>> cd x86_64-native-linuxapp-gcc
>&g

[dpdk-dev] [PATCH] validate_abi: build faster by augmenting make with job count

2016-07-21 Thread Wiles, Keith

> On Jul 21, 2016, at 8:54 AM, Neil Horman  wrote:
> 
> On Wed, Jul 20, 2016 at 10:32:28PM +0000, Wiles, Keith wrote:
>> 
>>> On Jul 20, 2016, at 3:16 PM, Neil Horman  wrote:
>>> 
>>> On Wed, Jul 20, 2016 at 07:47:32PM +, Wiles, Keith wrote:
>>>> 
>>>>> On Jul 20, 2016, at 12:48 PM, Neil Horman  wrote:
>>>>> 
>>>>> On Wed, Jul 20, 2016 at 07:40:49PM +0200, Thomas Monjalon wrote:
>>>>>> 2016-07-20 13:09, Neil Horman:
>>>>>>> From: Neil Horman 
>>>>>>> 
>>>>>>> John Mcnamara and I were discussing enhacing the validate_abi script to 
>>>>>>> build
>>>>>>> the dpdk tree faster with multiple jobs.  Theres no reason not to do 
>>>>>>> it, so this
>>>>>>> implements that requirement.  It uses a MAKE_JOBS variable that can be 
>>>>>>> set by
>>>>>>> the user to limit the job count.  By default the job count is set to 
>>>>>>> the number
>>>>>>> of online cpus.
>>>>>> 
>>>>>> Please could you use the variable name DPDK_MAKE_JOBS?
>>>>>> This name is already used in scripts/test-build.sh.
>>>>>> 
>>>>> Sure
>>>>> 
>>>>>>> +if [ -z "$MAKE_JOBS" ]
>>>>>>> +then
>>>>>>> +   # This counts the number of cpus on the system
>>>>>>> +   MAKE_JOBS=`lscpu -p=cpu | grep -v "#" | wc -l`
>>>>>>> +fi
>>>>>> 
>>>>>> Is lscpu common enough?
>>>>>> 
>>>>> I'm not sure how to answer that.  lscpu is part of the util-linux 
>>>>> package, which
>>>>> is part of any base install.  Theres a variant for BSD, but I'm not sure 
>>>>> how
>>>>> common it is there.
>>>>> Neil
>>>>> 
>>>>>> Another acceptable default would be just "-j" without any number.
>>>>>> It would make the number of jobs unlimited.
>>>> 
>>>> I think the best is just use -j as it tries to use the correct number of 
>>>> jobs based on the number of cores, right?
>>>> 
>>> -j with no argument (or -j 0), is sort of, maybe what you want.  With 
>>> either of
>>> those options, make will just issue jobs as fast as it processes 
>>> dependencies.
>>> Dependent on how parallel the build is, that can lead to tons of waiting 
>>> process
>>> (i.e. more than your number of online cpus), which can actually hurt your 
>>> build
>>> time.
>> 
>> I read the manual and looked at the code, which supports your statement. (I 
>> think I had some statement on stack overflow and the last time I believe 
>> anything on the internet :-) I have not seen a lot of differences in compile 
>> times with -j on my system. Mostly I suspect it is the number of paths in 
>> the dependency, cores and memory on the system.
>> 
>> I have 72 lcores or 2 sockets, 18 cores per socket. Xeon 2.3Ghz cores.
>> 
>> $ export RTE_TARGET=x86_64-native-linuxapp-gcc 
>> 
>> $ time make install T=${RTE_TARGET}
>> real 0m59.445s user  0m27.344s sys   0m7.040s
>> 
>> $ time make install T=${RTE_TARGET} -j
>> real 0m26.584s user  0m14.380s sys   0m5.120s
>> 
>> # Remove the x86_64-native-linuxapp-gcc
>> 
>> $ time make install T=${RTE_TARGET} -j 72
>> real 0m23.454s user  0m10.832s sys   0m4.664s
>> 
>> $ time make install T=${RTE_TARGET} -j 8
>> real 0m23.812s user  0m10.672s sys   0m4.276s
>> 
>> cd x86_64-native-linuxapp-gcc
>> $ make clean
>> $ time make
>> real 0m28.539s user  0m9.820s sys0m3.620s
>> 
>> # Do a make clean between each build.
>> 
>> $ time make -j
>> real 0m7.217s user   0m6.532s sys0m2.332s
>> 
>> $ time make -j 8
>> real 0m8.256s user   0m6.472s sys0m2.456s
>> 
>> $ time make -j 72
>> real 0m6.866s user   0m6.184s sys0m2.216s
>> 
>> Just the real time numbers in the following table.
>> 
>> processes real Time   depdirs
>> no -j 59.4sYes
>>   -j 8 23.8sYes
>>  -j 7223.5sYes
>>-j   26.5sYes
>> 
>> no -j 28.5s No
>>   -j 8   8.

[dpdk-dev] [PATCH] validate_abi: build faster by augmenting make with job count

2016-07-20 Thread Wiles, Keith

> On Jul 20, 2016, at 3:16 PM, Neil Horman  wrote:
> 
> On Wed, Jul 20, 2016 at 07:47:32PM +0000, Wiles, Keith wrote:
>> 
>>> On Jul 20, 2016, at 12:48 PM, Neil Horman  wrote:
>>> 
>>> On Wed, Jul 20, 2016 at 07:40:49PM +0200, Thomas Monjalon wrote:
>>>> 2016-07-20 13:09, Neil Horman:
>>>>> From: Neil Horman 
>>>>> 
>>>>> John Mcnamara and I were discussing enhacing the validate_abi script to 
>>>>> build
>>>>> the dpdk tree faster with multiple jobs.  Theres no reason not to do it, 
>>>>> so this
>>>>> implements that requirement.  It uses a MAKE_JOBS variable that can be 
>>>>> set by
>>>>> the user to limit the job count.  By default the job count is set to the 
>>>>> number
>>>>> of online cpus.
>>>> 
>>>> Please could you use the variable name DPDK_MAKE_JOBS?
>>>> This name is already used in scripts/test-build.sh.
>>>> 
>>> Sure
>>> 
>>>>> +if [ -z "$MAKE_JOBS" ]
>>>>> +then
>>>>> + # This counts the number of cpus on the system
>>>>> + MAKE_JOBS=`lscpu -p=cpu | grep -v "#" | wc -l`
>>>>> +fi
>>>> 
>>>> Is lscpu common enough?
>>>> 
>>> I'm not sure how to answer that.  lscpu is part of the util-linux package, 
>>> which
>>> is part of any base install.  Theres a variant for BSD, but I'm not sure how
>>> common it is there.
>>> Neil
>>> 
>>>> Another acceptable default would be just "-j" without any number.
>>>> It would make the number of jobs unlimited.
>> 
>> I think the best is just use -j as it tries to use the correct number of 
>> jobs based on the number of cores, right?
>> 
> -j with no argument (or -j 0), is sort of, maybe what you want.  With either 
> of
> those options, make will just issue jobs as fast as it processes dependencies.
> Dependent on how parallel the build is, that can lead to tons of waiting 
> process
> (i.e. more than your number of online cpus), which can actually hurt your 
> build
> time.

I read the manual and looked at the code, which supports your statement. (I 
think I had read some statement on Stack Overflow, and that is the last time I 
believe anything on the internet :-) I have not seen a lot of difference in 
compile times with -j on my system. Mostly I suspect it depends on the number 
of paths in the dependency tree, and on the cores and memory in the system.

I have 72 lcores on 2 sockets, 18 cores per socket; Xeon 2.3 GHz cores.

$ export RTE_TARGET=x86_64-native-linuxapp-gcc 

$ time make install T=${RTE_TARGET}
real 0m59.445s  user 0m27.344s  sys 0m7.040s

$ time make install T=${RTE_TARGET} -j
real 0m26.584s  user 0m14.380s  sys 0m5.120s

# Remove the x86_64-native-linuxapp-gcc

$ time make install T=${RTE_TARGET} -j 72
real 0m23.454s  user 0m10.832s  sys 0m4.664s

$ time make install T=${RTE_TARGET} -j 8
real 0m23.812s  user 0m10.672s  sys 0m4.276s

cd x86_64-native-linuxapp-gcc
$ make clean
$ time make
real 0m28.539s  user 0m9.820s   sys 0m3.620s

# Do a make clean between each build.

$ time make -j
real 0m7.217s   user 0m6.532s   sys 0m2.332s

$ time make -j 8
real 0m8.256s   user 0m6.472s   sys 0m2.456s

$ time make -j 72
real 0m6.866s   user 0m6.184s   sys 0m2.216s

Just the real time numbers in the following table.

processes    real time    depdirs
 no -j       59.4s        Yes
 -j 8        23.8s        Yes
 -j 72       23.5s        Yes
 -j          26.5s        Yes

 no -j       28.5s        No
 -j 8         8.2s        No
 -j 72        6.8s        No
 -j           7.2s        No

Looks like the depdirs build time on my system:
$ make clean -j
$ rm .depdirs
$ time make -j
real 0m23.734s  user 0m11.228s  sys 0m4.844s

About 16 seconds, which is not a lot of savings. The difference from no -j to 
-j is large, but the difference between -j and -j N is not a huge saving. This 
leads me back to over-engineering the problem, when '-j' would work just as 
well here.

Even on my MacBook Pro i7 system the difference is not that much: 1m8s for -j 
(without the depdirs build) in a VirtualBox VM with all 4 cores and 8 GB of 
RAM, compared to 1m13s with the -j 4 option.

I just wonder if it makes a lot of sense to use cpuinfo in this case, if it 
turns out that -j works for the 80% rule.

On some other project with a lot more files, like FreeBSD or a Linux distro, 
yes, it would make a fair difference in real time.

Keith

> 
> While its fine in los of cases, its not always fine, and with this
> implementation you can still opt in to that behavior by setting 
> DPDK_MAKE_JOBS=0
> 
> Neil
> 
>> 



[dpdk-dev] [PATCH] validate_abi: build faster by augmenting make with job count

2016-07-20 Thread Wiles, Keith

> On Jul 20, 2016, at 12:48 PM, Neil Horman  wrote:
> 
> On Wed, Jul 20, 2016 at 07:40:49PM +0200, Thomas Monjalon wrote:
>> 2016-07-20 13:09, Neil Horman:
>>> From: Neil Horman 
>>> 
>>> John Mcnamara and I were discussing enhacing the validate_abi script to 
>>> build
>>> the dpdk tree faster with multiple jobs.  Theres no reason not to do it, so 
>>> this
>>> implements that requirement.  It uses a MAKE_JOBS variable that can be set 
>>> by
>>> the user to limit the job count.  By default the job count is set to the 
>>> number
>>> of online cpus.
>> 
>> Please could you use the variable name DPDK_MAKE_JOBS?
>> This name is already used in scripts/test-build.sh.
>> 
> Sure
> 
>>> +if [ -z "$MAKE_JOBS" ]
>>> +then
>>> +   # This counts the number of cpus on the system
>>> +   MAKE_JOBS=`lscpu -p=cpu | grep -v "#" | wc -l`
>>> +fi
>> 
>> Is lscpu common enough?
>> 
> I'm not sure how to answer that.  lscpu is part of the util-linux package, 
> which
> is part of any base install.  Theres a variant for BSD, but I'm not sure how
> common it is there.
> Neil
> 
>> Another acceptable default would be just "-j" without any number.
>> It would make the number of jobs unlimited.

I think the best is to just use -j, as it tries to use the correct number of 
jobs based on the number of cores, right?



[dpdk-dev] [PATCH] rte_delay_us can be replaced with user function

2016-07-19 Thread Wiles, Keith
Hi Jozef,

I have two quick comments inline.
> On Jul 19, 2016, at 7:42 AM, jozmarti at cisco.com wrote:
> 
> From: Jozef Martiniak 
> 
> when running single-core, some drivers tend to call rte_delay_us for a
> long time, and that is causing packet drops.
> Attached patch introduces 2 new functions:
> 
> void rte_delay_us_callback_register(void(*userfunc)(unsigned));
> void rte_delay_us_callback_unregister(void);
> 
> First one replaces rte_delay_us with userfunc and second one restores
> original rte_delay_us.
> Test user_delay_us is included.
> 
> Signed-off-by: Jozef Martiniak 
> ---
> app/test/test_cycles.c | 39 ++
> lib/librte_eal/common/eal_common_timer.c   | 19 +++
> lib/librte_eal/common/include/generic/rte_cycles.h | 13 
> 3 files changed, 71 insertions(+)
> 
> diff --git a/app/test/test_cycles.c b/app/test/test_cycles.c
> index f6c043a..2b44a53 100644
> --- a/app/test/test_cycles.c
> +++ b/app/test/test_cycles.c
> @@ -90,3 +90,42 @@ test_cycles(void)
> }
> 
> REGISTER_TEST_COMMAND(cycles_autotest, test_cycles);
> +
> +/*
> + * rte_delay_us_callback test
> + *
> + * - check if callback is correctly registered/unregistered
> + *
> + */
> +
> +static int pattern;
> +static void my_rte_delay_us(unsigned us)
> +{
> +pattern += us;
> +}
> +
> +static int
> +test_user_delay_us(void)
> +{
> +pattern = 0;
> +
> +rte_delay_us_callback_register(my_rte_delay_us);
> +
> +rte_delay_us(2);
> +if (pattern != 2)
> +return -1;
> +
> +rte_delay_us(3);
> +if (pattern != 5)
> +return -1;
> +
> +rte_delay_us_callback_unregister();
> +
> +rte_delay_us(3);
> +if (pattern != 5)
> +return -1;
> +
> +return 0;
> +}
> +
> +REGISTER_TEST_COMMAND(user_delay_us, test_user_delay_us);
> diff --git a/lib/librte_eal/common/eal_common_timer.c 
> b/lib/librte_eal/common/eal_common_timer.c
> index c4227cd..a982562 100644
> --- a/lib/librte_eal/common/eal_common_timer.c
> +++ b/lib/librte_eal/common/eal_common_timer.c
> @@ -47,9 +47,18 @@
> /* The frequency of the RDTSC timer resolution */
> static uint64_t eal_tsc_resolution_hz;
> 
> +/* User function which replaces rte_delay_us function */
> +static void (*rte_delay_us_override)(unsigned) = NULL;
> +
> void
> rte_delay_us(unsigned us)
> {
> + if (unlikely(rte_delay_us_override != NULL))
> + {
> + rte_delay_us_override(us);
> + return;
> + }
> +
>   const uint64_t start = rte_get_timer_cycles();
>   const uint64_t ticks = (uint64_t)us * rte_get_timer_hz() / 1E6;
>   while ((rte_get_timer_cycles() - start) < ticks)
> @@ -84,3 +93,13 @@ set_tsc_freq(void)
>   RTE_LOG(DEBUG, EAL, "TSC frequency is ~%" PRIu64 " KHz\n", freq / 1000);
>   eal_tsc_resolution_hz = freq;
> }
> +
> +void rte_delay_us_callback_register(void (*userfunc)(unsigned))
> +{
> +rte_delay_us_override = userfunc;
> +}
> +
> +void rte_delay_us_callback_unregister(void)
> +{
> +rte_delay_us_override = NULL;
> +}

I guess I would have used rte_delay_us_callback_register(NULL) to unregister, 
but this is fine.

> diff --git a/lib/librte_eal/common/include/generic/rte_cycles.h 
> b/lib/librte_eal/common/include/generic/rte_cycles.h
> index 8cc21f2..274f798 100644
> --- a/lib/librte_eal/common/include/generic/rte_cycles.h
> +++ b/lib/librte_eal/common/include/generic/rte_cycles.h
> @@ -202,4 +202,17 @@ rte_delay_ms(unsigned ms)
>   rte_delay_us(ms * 1000);
> }
> 
> +/**
> + * Replace rte_delay_us with user defined function.
> + *
> + * @param userfunc
> + *   User function which replaces rte_delay_us.
> + */
> +void rte_delay_us_callback_register(void(*userfunc)(unsigned));
> +
> +/**
> + * Unregister user callback function. Restores original rte_delay_us.
> + */
> +void rte_delay_us_callback_unregister(void);

Just a note: we need to add these two new APIs to the map file for ABI checking.

Other than these two comments, I would give this one a +1, unless someone else 
has some comments.
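
As a usage sketch of the proposed API (the nanosleep() choice is just an 
example of a non-busy-wait delay an application might prefer; it is not part 
of the patch):

  #include <time.h>
  #include <rte_cycles.h>

  static void
  app_delay_us(unsigned int us)
  {
          struct timespec ts = {
                  .tv_sec  = us / 1000000,
                  .tv_nsec = (us % 1000000) * 1000L,
          };
          /* yield the CPU instead of busy-waiting; an application could
           * also poll its queues here to avoid packet drops */
          nanosleep(&ts, NULL);
  }

  /* after rte_eal_init(): */
  rte_delay_us_callback_register(app_delay_us);
  /* ... later, to restore the default busy-wait: */
  rte_delay_us_callback_unregister();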

> +
> #endif /* _RTE_CYCLES_H_ */
> -- 
> 2.1.4
> 



[dpdk-dev] [PATCH] spinlock:move constructor function out of header

2016-07-14 Thread Wiles, Keith

> On Jul 14, 2016, at 2:59 PM, Thomas Monjalon  
> wrote:
> 
> Thanks Keith for continuing work.
> 
> 2016-07-14 14:31, Keith Wiles:
>> Moving the rte_rtm_init() constructor routine out of the
>> header file and into a new rte_spinlock.c for all archs/platforms.
>> Having constructor routines in a header file is not a good
>> place to keep these types of functions.
>> 
>> The problem is with linking 3rd party libraries when
>> an application is not linked directly to dpdk libraries, which
>> in this case causes a missing symbol in the linking phase.
>> 
>> Fixes: ba7468997ea6 ("spinlock: add HTM lock elision for x86")
>> 
>> Originally submitted by Damjan Marion 
> 
> You should keep the original Signed-off and authorship (From: field)
> with "git am".
> It is also easier to track when using -v2 --in-reply-to='?.

I can try and add that into the patch for v2.
> 
>> Signed-off-by: Keith Wiles 
>> ---
>> lib/librte_eal/bsdapp/eal/Makefile |  1 +
>> lib/librte_eal/common/arch/arm/rte_spinlock.c  | 46 
>> ++
>> lib/librte_eal/common/arch/ppc_64/rte_spinlock.c   | 46 
>> ++
>> lib/librte_eal/common/arch/tile/rte_spinlock.c | 46 
>> ++
>> lib/librte_eal/common/arch/x86/rte_spinlock.c  | 46 
>> ++
>> .../common/include/arch/x86/rte_spinlock.h | 14 ++-
>> lib/librte_eal/linuxapp/eal/Makefile   |  1 +
> 
> I am not sure we should add a .c file for each arch, given it is called only
> from arch/x86/rte_spinlock.h.

I did not like having the .c file for every arch, but the previous comment 
seemed to suggest it. I am willing to change it to any better method; just let 
me know what you think. I would prefer just one.

On a side note, I have combined the bsdapp and linuxapp directories into a 
single directory before. It is doable and it eliminates a number of duplicate 
files and code. Plus I also added support for OS X to DPDK, but I do not have 
access to any NICs on that platform yet other than virtual ones. I could 
submit it and maybe someone will write the kext to make it work. :-)



[dpdk-dev] pktgen wr_cksum error

2016-07-11 Thread Wiles, Keith
The line is using the C99 style of 'for (int i = 0; i < X; i++)'; the 
declaration of int i needs to move out of the for loop and up to the start of 
the function. I have fixed this already, but will make a special patch to 
address this problem when I can.
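
The kind of change needed is simply the following (a sketch, not the actual 
wr_mbuf.h code):

  /* before -- requires -std=c99 or -std=gnu99:
   *     for (int i = 0; i < X; i++) { ... }
   * after -- builds with the default flags: */
  int i;

  for (i = 0; i < X; i++) {
          /* ... */
  }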

> On Jul 11, 2016, at 9:09 AM, Posadas, Emerson  
> wrote:
> 
> Hello
> 
> I'm trying to build pktgen-v3.0.05 with dpdk 16.04 for with RTE_TARGET 
> x86_64-native-linuxapp-gcc. Seemts that the build is failing due to an error 
> on wr_cksum.c. Is something I can try to fix this error and build pktgen 
> successfully? Not sure if others have had this error before.
> 
> # make
> == lib
> == common
>  CC wr_cksum.o
> In file included from /root/pktgen-v3.0.05/lib/common/wr_cksum.c:102:0:
> /root/pktgen-v3.0.05/lib/common/wr_mbuf.h: In function 
> '__pktmbuf_alloc_noreset':
> /root/pktgen-v3.0.05/lib/common/wr_mbuf.h:47:2: error: 'for' loop initial 
> declarations are only allowed in C99 mode
> /root/pktgen-v3.0.05/lib/common/wr_mbuf.h:47:2: note: use option -std=c99 or 
> -std=gnu99 to compile your code
> make[3]: *** [wr_cksum.o] Error 1
> make[2]: *** [all] Error 2
> make[1]: *** [common] Error 2
> make: *** [lib] Error 2
> root at pktgen:~
> 
> 
> EP
> 



[dpdk-dev] [PATCH v3 00/11] Fix build errors related to exported headers

2016-07-08 Thread Wiles, Keith

> On Jul 8, 2016, at 4:56 AM, Yigit, Ferruh  wrote:
> 
> On 7/8/2016 9:05 AM, Adrien Mazarguil wrote:
>> On Thu, Jul 07, 2016 at 06:33:17PM +0000, Wiles, Keith wrote:
>>> 
>>>> On Jul 7, 2016, at 10:49 AM, Adrien Mazarguil >>> 6wind.com> wrote:
>>>> 
>>>> DPDK uses GNU C language extensions in most of its code base. This is fine
>>>> for internal source files whose compilation flags are controlled by DPDK,
>>>> however user applications that use exported "public" headers may experience
>>>> compilation failures when enabling strict error/standard checks (-std and
>>>> -pedantic for instance).
>>> 
>>> Do you try compiling these changes with CLANG and/or ICC compilers?
>> 
>> clang/clang++ yes, works fine. I did not try with ICC however.
> 
> I tested with icc, getting following error [1], compiler warning seems
> valid, but didn't investigate what in your patch cause this.
> 
> [1]
> .../app/test/test_table_acl.c(487): error #2405: array of elements
> containing a flexible array member is nonstandard
>struct rte_pipeline_table_entry entries[5];
>^
> 
> .../app/test/test_table_acl.c(492): error #2405: array of elements
> containing a flexible array member is nonstandard
>struct rte_pipeline_table_entry entries_ptr[5];

I am guessing it does not like the uint8_t action_data[0] in the 
rte_pipeline_table_entry structure. I can see why an array of such structures 
allocated on the stack would be non-standard in this case. Maybe a keyword 
like __extension__ or a pragma needs to be added.
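
Something along these lines is what I have in mind (a rough sketch only, not 
the real rte_pipeline structure, and whether ICC actually honors __extension__ 
for error #2405 would need to be verified):

  #include <stdint.h>

  struct entry {
          uint64_t action;
          __extension__ uint8_t action_data[0];  /* zero-length array, GNU extension */
  };

  /* the construct ICC complains about: an on-stack array of such structs */
  __extension__ struct entry entries[5];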

> 
>> 
>> Note that considering "({ ... })" is a GNU extension, compilers that do
>> support this syntax also support the GNU __extension__ keyword. As a result,
>> those that do not support this keyword most likely already cannot compile
>> DPDK at all.
>> 
> 



[dpdk-dev] [PATCH v3 00/11] Fix build errors related to exported headers

2016-07-07 Thread Wiles, Keith

> On Jul 7, 2016, at 10:49 AM, Adrien Mazarguil  
> wrote:
> 
> DPDK uses GNU C language extensions in most of its code base. This is fine
> for internal source files whose compilation flags are controlled by DPDK,
> however user applications that use exported "public" headers may experience
> compilation failures when enabling strict error/standard checks (-std and
> -pedantic for instance).

Did you try compiling these changes with the CLANG and/or ICC compilers?


[dpdk-dev] Low packet generation rate of 526kpps using pktgen-dpdk from inside VM

2016-07-05 Thread Wiles, Keith
-Original Message-
From: Abhishek Mahajan <abhishek.maha...@aricent.com>
Date: Tuesday, July 5, 2016 at 1:49 AM
To: Keith Wiles , "dev at dpdk.org" 
Cc: "Addepalli, Srinivasa R" , "Shivastava, 
RakeshX" 
Subject: RE: [dpdk-dev] Low packet generation rate of 526kpps using pktgen-dpdk 
from inside VM

Hi Keith,

I tried your core mask suggestion, still I am getting same performance values.

I just enabled logs for virtio tx in DPDK to find the problem, I am getting 
"PMD: virtio_xmit_pkts() tx: No free tx descriptors to transmit".
This seems to be virtio issue, is there any tuning parameters for increasing 
the virtio performance for using with pktgen-dpdk ?

KW: Pktgen has its own mbufs for sending; this message is from the virtio 
driver, and you need to look at the configuration options for that driver. 
Normally the TX mbufs are used directly from the application (Pktgen), but in 
this case the driver must have its own set of mbufs.

Regards
Abhishek

-Original Message-
From: Wiles, Keith [mailto:keith.wi...@intel.com]
Sent: Monday, July 04, 2016 8:03 PM
To: Abhishek Mahajan ; dev at dpdk.org
Cc: Addepalli, Srinivasa R ; RakeshX 
Contact 
Subject: Re: [dpdk-dev] Low packet generation rate of 526kpps using pktgen-dpdk 
from inside VM

-Original Message-
From: dev  on behalf of Abhishek Mahajan 
<abhishek.maha...@aricent.com>
Date: Monday, July 4, 2016 at 7:27 AM
To: "dev at dpdk.org" 
Cc: "Addepalli, Srinivasa R" , "Shivastava, 
RakeshX" 
Subject: [dpdk-dev] Low packet generation rate of 526kpps using pktgen-dpdk 
from inside VM

Hi,

With pktgen-dpdk inside VM, it is observed that packet TX rate (526Kpps with 
64byte size) is very low. How can higher packet rates be achieved ?

On enabling logs of virtio pmd in DPDK, getting following logs in syslogs:
PMD: virtio_xmit_pkts() tx: No free tx descriptors to transmit

Two PC setup having configuration as follow:
-> Xeon Processor : Intel(R) Xeon(R) CPU E5-2603 v3 @ 1.60GHz  having 6 cores.
-> RAM : 16GB
->One PC is generating traffic other is receiving the packets using pktgen 
tool. The two PC are connected directly with 10G Ethernet interface.

Generation side PC - Host
-> 2 cores dedicated to OVS-DPDK
-> 1 core is free for Linux kernel
-> OS: Ubuntu 14.04
-> Linux kernel : 3.13.0-85-generic
-> DPDK version : 2.2.0
-> OVS version : 2.5.0
-> VFIO driver for port binding

VM - Total 3 cores
-> 2 cores for pktgen
-> 1 core for Linux kernel
-> OS: Ubuntu 15.04
-> Linux kernel : 3.19.8
-> DPDK version : 16.04
-> Pktgen version : 3.0.00
-> UIO driver for port binding

DPDK, OVS and pktgen are build using - x86_64-ivshmen-linuxapp-gcc option.

VM launched through following Qemu command :

* qemu-system-x86_64 --enable-kvm -smp 3 -cpu host -m 2048 -chardev 
socket,id=charnet0,path=/var/run/openvswitch/vhost-user1 -netdev 
type=vhost-user,id=hostnet0,chardev=charnet0,vhostforce=on -device 
virtio-net-pci,netdev=hostnet0,mac=16:fa:13:26:31:1b -object 
memory-backend-file,id=mem,size=2048M,mem-path=/dev/hugepages,share=on -numa 
node,memdev=mem -mem-prealloc -hda /home/aricent/ubuntu_15_04.img

Pktgen is launched using following command:
Pktgen -c -0x3 -n 4 -m 512 -b 000:00:04.0 - p 0x3 -t -m 
"[1:1].0"

KW: I do not think this is a Pktgen problem, but please try: pktgen -c 0x6 -n 4 
-m 512 -b 0000:00:04.0 -- -T -m "[2:2].0"
Core 0 needs to be left alone for Linux, core 1 is used for the Pktgen display, 
and core 2 is used for Rx/Tx on port 0.

OVS command used:

* /usr/sbin/ovs-vswitchd --dpdk -c 0xc -n 4 -m 2048 -- 
unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info 
--mlockall --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log 
--pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach -monitor

Thanks & Regards
Abhishek
"DISCLAIMER: This message is proprietary to Aricent and is intended solely for 
the use of the individual to whom it is addressed. It may contain privileged or 
confidential information and should not be circulated or used for any purpose 
other than for what it is intended. If you have received this message in error, 
please notify the originator immediately. If you are not the intended 
recipient, you are notified that you are strictly prohibited from using, 
copying, altering, or disclosing the contents of this message. Aricent accepts 
no responsibility for loss or damage arising from the use of the information 
transmitted by this email including damage from virus."





"DISCLAIMER: This message is proprietary to Aricent and is intended solely for 
the use of the individual to whom it is addressed. It may contain privileged or 
confidential information and should not be circulated or used for any purpose 
other than for what it is intended. If you have received this message in error, 
please notify the originator

[dpdk-dev] Low packet generation rate of 526kpps using pktgen-dpdk from inside VM

2016-07-04 Thread Wiles, Keith
-Original Message-
From: dev  on behalf of Abhishek Mahajan 

Date: Monday, July 4, 2016 at 7:27 AM
To: "dev at dpdk.org" 
Cc: "Addepalli, Srinivasa R" , "Shivastava, 
RakeshX" 
Subject: [dpdk-dev] Low packet generation rate of 526kpps using pktgen-dpdk 
from inside VM

Hi,

With pktgen-dpdk inside VM, it is observed that packet TX rate (526Kpps with 
64byte size) is very low. How can higher packet rates be achieved ?

On enabling logs of virtio pmd in DPDK, getting following logs in syslogs:
PMD: virtio_xmit_pkts() tx: No free tx descriptors to transmit

Two PC setup having configuration as follow:
-> Xeon Processor : Intel(R) Xeon(R) CPU E5-2603 v3 @ 1.60GHz  having 6 cores.
-> RAM : 16GB
->One PC is generating traffic other is receiving the packets using pktgen 
tool. The two PC are connected directly with 10G Ethernet interface.

Generation side PC - Host
-> 2 cores dedicated to OVS-DPDK
-> 1 core is free for Linux kernel
-> OS: Ubuntu 14.04
-> Linux kernel : 3.13.0-85-generic
-> DPDK version : 2.2.0
-> OVS version : 2.5.0
-> VFIO driver for port binding

VM - Total 3 cores
-> 2 cores for pktgen
-> 1 core for Linux kernel
-> OS: Ubuntu 15.04
-> Linux kernel : 3.19.8
-> DPDK version : 16.04
-> Pktgen version : 3.0.00
-> UIO driver for port binding

DPDK, OVS and pktgen are build using - x86_64-ivshmen-linuxapp-gcc option.

VM launched through following Qemu command :

* qemu-system-x86_64 --enable-kvm -smp 3 -cpu host -m 2048 -chardev 
socket,id=charnet0,path=/var/run/openvswitch/vhost-user1 -netdev 
type=vhost-user,id=hostnet0,chardev=charnet0,vhostforce=on -device 
virtio-net-pci,netdev=hostnet0,mac=16:fa:13:26:31:1b -object 
memory-backend-file,id=mem,size=2048M,mem-path=/dev/hugepages,share=on -numa 
node,memdev=mem -mem-prealloc -hda /home/aricent/ubuntu_15_04.img

Pktgen is launched using following command:
Pktgen -c -0x3 -n 4 -m 512 -b 000:00:04.0 - p 0x3 -t -m 
"[1:1].0"

KW: I do not think this is a Pktgen problem, but please try: pktgen -c 0x6 -n 4 
-m 512 -b 0000:00:04.0 -- -T -m "[2:2].0"
Core 0 needs to be left alone for Linux, core 1 is used for the Pktgen display, 
and core 2 is used for Rx/Tx on port 0.

OVS command used:

* /usr/sbin/ovs-vswitchd --dpdk -c 0xc -n 4 -m 2048 -- 
unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info 
--mlockall --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log 
--pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach -monitor

Thanks & Regards
Abhishek
"DISCLAIMER: This message is proprietary to Aricent and is intended solely for 
the use of the individual to whom it is addressed. It may contain privileged or 
confidential information and should not be circulated or used for any purpose 
other than for what it is intended. If you have received this message in error, 
please notify the originator immediately. If you are not the intended 
recipient, you are notified that you are strictly prohibited from using, 
copying, altering, or disclosing the contents of this message. Aricent accepts 
no responsibility for loss or damage arising from the use of the information 
transmitted by this email including damage from virus."







[dpdk-dev] e1000: unused variable warnings with clang

2016-07-01 Thread Wiles, Keith

On 7/1/16, 12:16 PM, "dev on behalf of Wiles, Keith"  wrote:

Well, after making sure I had CCACHE off and the RTE_SDK variable set 
correctly, it does appear we have a problem building the DPDK e1000 directory 
with CLANG on my Ubuntu 16.04, updated as of today. Could someone please verify 
whether this is something in my system? Using T=x86_64-native-linuxapp-gcc does 
not have the same problem.

Using the following build line:

--

Sorry, my bad, this is NOT a problem. Today is not my day; I woke up at 4:30am.



[dpdk-dev] e1000: unused variable warnings with clang

2016-07-01 Thread Wiles, Keith
Well, after making sure I had CCACHE off and the RTE_SDK variable set 
correctly, it does appear we have a problem building the DPDK e1000 directory 
with CLANG on my Ubuntu 16.04, updated as of today. Could someone please verify 
whether this is something in my system? Using T=x86_64-native-linuxapp-gcc does 
not have the same problem.

Using the following build line:

rkwiles at supermicro (master):~/.../intel/dpdk$ CCACHE_DISABLE=true make 
install T=x86_64-native-linuxapp-clang -j
Configuration done
== Build lib
== Build lib/librte_compat
== Build lib/librte_eal
== Build lib/librte_net
== Build lib/librte_eal/common
  SYMLINK-FILE include/rte_compat.h
  SYMLINK-FILE include/rte_ip.h

---Snip---

  CC igb_ethdev.o
  CC em_ethdev.o
  CC e1000_osdep.o
  CC i40e_ethdev_vf.o
  CC igb_rxtx.o
  CC e1000_ich8lan.o
  CC null_crypto_pmd_ops.o
  CC e1000_phy.o
/work/home/rkwiles/projects/intel/dpdk/drivers/net/e1000/base/e1000_osdep.c:44:38:
 error: unused parameter 'hw' [-Werror,-Wunused-parameter]
e1000_write_pci_cfg(struct e1000_hw *hw, u32 reg, u16 *value)
 ^
/work/home/rkwiles/projects/intel/dpdk/drivers/net/e1000/base/e1000_osdep.c:44:46:
 error: unused parameter 'reg' [-Werror,-Wunused-parameter]
e1000_write_pci_cfg(struct e1000_hw *hw, u32 reg, u16 *value)
 ^
/work/home/rkwiles/projects/intel/dpdk/drivers/net/e1000/base/e1000_osdep.c:44:56:
 error: unused parameter 'value' [-Werror,-Wunused-parameter]
e1000_write_pci_cfg(struct e1000_hw *hw, u32 reg, u16 *value)
   ^
/work/home/rkwiles/projects/intel/dpdk/drivers/net/e1000/base/e1000_osdep.c:50:37:
 error: unused parameter 'hw' [-Werror,-Wunused-parameter]
e1000_read_pci_cfg(struct e1000_hw *hw, u32 reg, u16 *value)
^
/work/home/rkwiles/projects/intel/dpdk/drivers/net/e1000/base/e1000_osdep.c:50:45:
 error: unused parameter 'reg' [-Werror,-Wunused-parameter]
e1000_read_pci_cfg(struct e1000_hw *hw, u32 reg, u16 *value)
^
/work/home/rkwiles/projects/intel/dpdk/drivers/net/e1000/base/e1000_osdep.c:57:36:
 error: unused parameter 'hw' [-Werror,-Wunused-parameter]
e1000_pci_set_mwi(struct e1000_hw *hw)
   ^
/work/home/rkwiles/projects/intel/dpdk/drivers/net/e1000/base/e1000_osdep.c:62:38:
 error: unused parameter 'hw' [-Werror,-Wunused-parameter]
e1000_pci_clear_mwi(struct e1000_hw *hw)
 ^
/work/home/rkwiles/projects/intel/dpdk/drivers/net/e1000/base/e1000_osdep.c:71:42:
 error: unused parameter 'hw' [-Werror,-Wunused-parameter]
e1000_read_pcie_cap_reg(struct e1000_hw *hw, u32 reg, u16 *value)
 ^
/work/home/rkwiles/projects/intel/dpdk/drivers/net/e1000/base/e1000_osdep.c:71:50:
 error: unused parameter 'reg' [-Werror,-Wunused-parameter]
e1000_read_pcie_cap_reg(struct e1000_hw *hw, u32 reg, u16 *value)
 ^
/work/home/rkwiles/projects/intel/dpdk/drivers/net/e1000/base/e1000_osdep.c:71:60:
 error: unused parameter 'value' [-Werror,-Wunused-parameter]
e1000_read_pcie_cap_reg(struct e1000_hw *hw, u32 reg, u16 *value)
   ^
/work/home/rkwiles/projects/intel/dpdk/drivers/net/e1000/base/e1000_osdep.c:80:43:
 error: unused parameter 'hw' [-Werror,-Wunused-parameter]
e1000_write_pcie_cap_reg(struct e1000_hw *hw, u32 reg, u16 *value)
  ^
/work/home/rkwiles/projects/intel/dpdk/drivers/net/e1000/base/e1000_osdep.c:80:51:
 error: unused parameter 'reg' [-Werror,-Wunused-parameter]
e1000_write_pcie_cap_reg(struct e1000_hw *hw, u32 reg, u16 *value)
  ^
/work/home/rkwiles/projects/intel/dpdk/drivers/net/e1000/base/e1000_osdep.c:80:61:
 error: unused parameter 'value' [-Werror,-Wunused-parameter]
e1000_write_pcie_cap_reg(struct e1000_hw *hw, u32 reg, u16 *value)
^
13 errors generated.
/work/home/rkwiles/projects/intel/dpdk/mk/internal/rte.compile-pre.mk:126: 
recipe for target 'e1000_osdep.o' failed
make[6]: *** [e1000_osdep.o] Error 1
make[6]: *** Waiting for unfinished jobs
/work/home/rkwiles/projects/intel/dpdk/drivers/net/e1000/base/e1000_vf.c:161:62:
 error: unused parameter 'hw' [-Werror,-Wunused-parameter]
STATIC s32 e1000_acquire_vf(struct e1000_hw E1000_UNUSEDARG *hw)
 ^
/work/home/rkwiles/projects/intel/dpdk/drivers/net/e1000/base/e1000_vf.c:175:63:
 error: unused parameter 'hw' [-Werror,-Wunused-parameter]
STATIC void e1000_release_vf(struct e1000_hw E1000_UNUSEDARG *hw)
  ^
/work/home/rkwiles/projects/intel/dpdk/drivers/net/e1000/base/e1000_vf.c:187:65:
 error: unused parameter 'hw' 

[dpdk-dev] e1000: unused variable warnings with clang

2016-07-01 Thread Wiles, Keith

Update: I pulled a new clone and now I do not see that problem, so please 
ignore. 

Regards,
Keith







[dpdk-dev] e1000: unused variable warnings with clang

2016-07-01 Thread Wiles, Keith
I am seeing unused variable warnings/errors in the e1000/base directory when 
building the current repo with the clang compiler.

Does anyone else see this problem?

Regards,
Keith
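
For reference, the usual way an osdep-style stub silences -Wunused-parameter
under -Werror is to consume the arguments explicitly. A minimal sketch of the
generic (void) cast idiom follows; it is an illustration, not necessarily the
fix the base driver adopted (DPDK's __rte_unused / RTE_SET_USED serve the same
purpose):

#include <stddef.h>
#include <stdint.h>

typedef uint16_t u16;
typedef uint32_t u32;
struct e1000_hw;                        /* opaque for this sketch */

/* Stubbed PCI config write: the parameters are intentionally unused in this
 * environment, so reference them with (void) casts to keep clang quiet. */
static inline void
e1000_write_pci_cfg(struct e1000_hw *hw, u32 reg, u16 *value)
{
	(void)hw;
	(void)reg;
	(void)value;
}

int main(void)
{
	e1000_write_pci_cfg(NULL, 0, NULL);
	return 0;
}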



[dpdk-dev] [PATCH] mempool: rename functions with confusing names

2016-06-29 Thread Wiles, Keith

On 6/29/16, 11:00 AM, "dev on behalf of Bruce Richardson"  wrote:

>On Wed, Jun 29, 2016 at 05:55:27PM +0200, Thomas Monjalon wrote:
>> 2016-06-29 14:55, Bruce Richardson:
>> > The mempool_count and mempool_free_count behaved contrary to what their
>> > names suggested. The free_count function actually returned the number of
>> > elements that were allocated from the pool, not the number unallocated as
>> > the name implied.
>> > 
>> > Fix this by introducing two new functions to replace the old ones,
>> > * rte_mempool_unallocated_count to replace rte_mempool_count
>> > * rte_mempool_allocated_count to replace rte_mempool_free_count
>> 
>> What about available/used instead of unallocated/allocated?
>> 
>
>I don't particularly mind what the name is, to be honest. I like "avail"
>because it is shorter, but I'm a little uncertain about "used", because it
>implies that the entries are finished with i.e. like a used match, or tissue 
>:-)
>
>How about "avail/in_use"?

+1 for those names.
>
>/Bruce
>
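
To make the proposed semantics concrete, here is a small sketch; the avail and
in_use function names below assume the renaming lands as discussed in this
thread, so treat them as tentative rather than a settled API. Called with any
existing pool after EAL init, it prints the two counters side by side:

#include <stdio.h>
#include <rte_mempool.h>

/* For a pool created with N elements, avail + in_use should always equal N,
 * which is the property the rename is trying to make obvious from the names. */
static void
dump_pool_usage(const struct rte_mempool *mp)
{
	unsigned int avail  = rte_mempool_avail_count(mp);   /* still in the pool */
	unsigned int in_use = rte_mempool_in_use_count(mp);  /* handed out to the app */

	printf("%s: avail=%u in_use=%u total=%u\n",
	       mp->name, avail, in_use, avail + in_use);
}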





[dpdk-dev] FYI: Using ccache a linux compiler caching tool with DPDK

2016-06-27 Thread Wiles, Keith
FYI, Just to help document the issue here.

Using ccache on Linux can improve your compile time from a few minutes to a few 
seconds, depending on the use case.

On my machine Ubuntu 16.04 up to date as of 06/26/2016 using:
sudo apt-get install ccache

System Information:
   Model Name : Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
   sysname: Linux
   release: 4.4.0-24-generic
   version: #43-Ubuntu SMP Wed Jun 8 19:27:37 UTC 2016
   machine: x86_64

CPU Information:
   nthreads   :   2
   ncores :  18
   nsockets   :   2
   nbooks :   1
   nlcores:  72

72 lcores, 2 socket(s), 18 cores per socket, 2 hyper-thread(s) per core

Note: I have added the environment variable "CCACHE_CPP2=yes".

Builds using DPDK 16.07-rc0 on 06/27/2016
Build with "time CCACHE_DISABLE=true make install 
T=x86_64-native-linuxapp-clang" gives:

real2m52.844s
user2m32.220s
sys 0m15.548s

Rebuild with no changes, just issue the above line again:

real0m9.929s
user0m5.056s
sys 0m3.036s

If you issue the above line with -j appended it gives:

real0m4.292s
user0m5.304s
sys 0m4.348s

remove x86_64-native-linuxapp-clang build directory.
Build with "time CCACHE_DISABLE=true make install 
T=x86_64-native-linuxapp-clang -j" gives:

real0m29.681s
user3m18.104s
sys 0m34.876s

cd x86_64-native-linuxapp-clang directory
make clean
time CCACHE_DISABLE=true make

real3m13.992s
user2m45.044s
sys 0m21.784s

make clean
make  # just to fill the ccache
make clean
time make

real0m15.066s
user0m6.776s
sys 0m3.684s

make clean
time make -j

real0m4.527s
user0m7.060s
sys 0m5.428s


As you can see the performance gain is very nice, and ccache does seem to work 
correctly except in a few cases.

It appears ccache and clang have a few problems unless the environment variable 
is set globally:
export CCACHE_CPP2=yes

Without this environment variable clang starts to complain about a number of 
issues. If you see a compile problem please rebuild with ccache disabled using 
"sudo CCACHE_DISABLE=true ..." or uninstall ccache with "sudo apt-get remove ccache".

It is possible that GCC will also have compiler issues, but I do not seem to 
see any, just try without ccache to verify.

Here is a link to help explain more details:
See e.g. 
http://petereisentraut.blogspot.com/2011/09/ccache-and-clang-part-2.html

If the build works then I have not found any issues with using ccache, except a 
faster build time. YMMV.

Regards,
Keith





[dpdk-dev] [PATCH v2 2/2] fix building with clang-3.8.0 compiler

2016-06-27 Thread Wiles, Keith
On 6/27/16, 11:40 AM, "Richardson, Bruce"  wrote:

>On Mon, Jun 27, 2016 at 05:29:59PM +0100, Wiles, Keith wrote:
>> On 6/27/16, 7:58 AM, "dev on behalf of Wiles, Keith" > dpdk.org on behalf of keith.wiles at intel.com> wrote:
>> 
>> >
>> >On 6/27/16, 3:46 AM, "Richardson, Bruce"  
>> >wrote:
>> >
>> >>On Sun, Jun 26, 2016 at 10:54:12AM -0500, Keith Wiles wrote:
>> >>> Latest clang compiler 3.8.0 on latest update of Ubuntu
>> >>> creates a few more warnings on -Warray-bounds and extra
>> >>> () around 'if' expressions.
>> >>> 
>> >>> Signed-off-by: Keith Wiles 
>> >>> ---
>> >>>  app/test-pmd/Makefile| 3 +++
>> >>>  app/test/Makefile| 3 +++
>> >>>  drivers/net/bonding/Makefile | 4 
>> >>>  drivers/net/fm10k/Makefile   | 2 ++
>> >>>  drivers/net/i40e/Makefile| 2 ++
>> >>>  lib/librte_cmdline/Makefile  | 6 ++
>> >>>  lib/librte_eal/linuxapp/eal/Makefile | 8 
>> >>>  7 files changed, 28 insertions(+)
>> >>> 
>> >>All the fixes in this patch seem to be just disabling the compiler 
>> >>warnings, which
>> >>should really be the last resort in cases like this. Can some of the 
>> >>issues be
>> >>fixed by actually fixing the issues in the code?
>> >
>> >I did look at the code to fix the problem, because I could not see one:
>> >
>> >/work/home/rkwiles/projects/intel/dpdk/app/test-pmd/cmdline.c:3357:2140: 
>> >error: array index 3 is past the end of the array (which contains 3 
>> >elements) [-Werror,-Warray-bounds]
>> >  if (!__extension__ ({ size_t __s1_len, __s2_len; (__builtin_constant_p 
>> > (res->proto) && __builtin_constant_p ("ip") && (__s1_len = 
>> > __builtin_strlen (res->proto), __s2_len = __builtin_strlen ("ip"), 
>> > (!((size_t)(const void *)((res->proto) + 1) - (size_t)(const void 
>> > *)(res->proto) == 1) || __s1_len >= 4) && (!((size_t)(const void *)(("ip") 
>> > + 1) - (size_t)(const void *)("ip") == 1) || __s2_len >= 4)) ? 
>> > __builtin_strcmp (res->proto, "ip") : (__builtin_constant_p (res->proto) 
>> > && ((size_t)(const void *)((res->proto) + 1) - (size_t)(const void 
>> > *)(res->proto) == 1) && (__s1_len = __builtin_strlen (res->proto), 
>> > __s1_len < 4) ? (__builtin_constant_p ("ip") && ((size_t)(const void 
>> > *)(("ip") + 1) - (size_t)(const void *)("ip") == 1) ? __builtin_strcmp 
>> > (res->proto, "ip") : (__extension__ ({ const unsigned char *__s2 = (const 
>> > unsigned char *) (const char *) ("ip"); int __result = (((const unsigned 
>> > char *) (const char *) (res->proto))[0] - __s2[0]); if (__s1_len > 0 && 
>> > __result == 0) { __result = (((const unsigned char *) (const char *) 
>> > (res->proto))[1] - __s2[1]); if (__s1_len > 1 && __result == 0) { __result 
>> > = (((const unsigned char *) (const char *) (res->proto))[2] - __s2[2]); if 
>> > (__s1_len > 2 && __result == 0) __result = (((const unsigned char *) 
>> > (const char *) (res->proto))[3] - __s2[3]); } } __result; }))) : 
>> > (__builtin_constant_p ("ip") && ((size_t)(const void *)(("ip") + 1) - 
>> > (size_t)(const void *)("ip") == 1) && (__s2_len = __builtin_strlen ("ip"), 
>> > __s2_len < 4) ? (__builtin_constant_p (res->proto) && ((size_t)(const void 
>> > *)((res->proto) + 1) - (size_t)(const void *)(res->proto) == 1) ? 
>> > __builtin_strcmp (res->proto, "ip") : (- (__extension__ ({ const unsigned 
>> > char *__s2 = (const unsigned char *) (const char *) (res->proto); int 
>> > __result = (((const unsigned char *) (const char *) ("ip"))[0] - __s2[0]); 
>> > if (__s2_len > 0 && __result == 0) { __result = (((const unsigned char *) 
>> > (const char *) ("ip"))[1] - __s2[1]); if (__s2_len > 1 && __result == 0) { 
>> > __result = (((const unsigned char *) (const char *) ("ip"))[2] - __s2[2]); 
>> > if (__s2_len > 2 && __result == 0) __result = (((const unsigned char *) 
>> > (const char *) ("ip"))[3] - __s2[3]); } } __res

[dpdk-dev] [PATCH v2 2/2] fix building with clang-3.8.0 compiler

2016-06-27 Thread Wiles, Keith
On 6/27/16, 7:58 AM, "dev on behalf of Wiles, Keith"  wrote:

>
>On 6/27/16, 3:46 AM, "Richardson, Bruce"  wrote:
>
>>On Sun, Jun 26, 2016 at 10:54:12AM -0500, Keith Wiles wrote:
>>> Latest clang compiler 3.8.0 on latest update of Ubuntu
>>> creates a few more warnings on -Warray-bounds and extra
>>> () around 'if' expressions.
>>> 
>>> Signed-off-by: Keith Wiles 
>>> ---
>>>  app/test-pmd/Makefile| 3 +++
>>>  app/test/Makefile| 3 +++
>>>  drivers/net/bonding/Makefile | 4 
>>>  drivers/net/fm10k/Makefile   | 2 ++
>>>  drivers/net/i40e/Makefile| 2 ++
>>>  lib/librte_cmdline/Makefile  | 6 ++
>>>  lib/librte_eal/linuxapp/eal/Makefile | 8 
>>>  7 files changed, 28 insertions(+)
>>> 
>>All the fixes in this patch seem to be just disabling the compiler warnings, 
>>which
>>should really be the last resort in cases like this. Can some of the issues be
>>fixed by actually fixing the issues in the code?
>
>I did look at the code to fix the problem, because I could not see one:
>
>/work/home/rkwiles/projects/intel/dpdk/app/test-pmd/cmdline.c:3357:2140: 
>error: array index 3 is past the end of the array (which contains 3 elements) 
>[-Werror,-Warray-bounds]
>  if (!__extension__ ({ size_t __s1_len, __s2_len; (__builtin_constant_p 
> (res->proto) && __builtin_constant_p ("ip") && (__s1_len = __builtin_strlen 
> (res->proto), __s2_len = __builtin_strlen ("ip"), (!((size_t)(const void 
> *)((res->proto) + 1) - (size_t)(const void *)(res->proto) == 1) || __s1_len 
> >= 4) && (!((size_t)(const void *)(("ip") + 1) - (size_t)(const void *)("ip") 
> == 1) || __s2_len >= 4)) ? __builtin_strcmp (res->proto, "ip") : 
> (__builtin_constant_p (res->proto) && ((size_t)(const void *)((res->proto) + 
> 1) - (size_t)(const void *)(res->proto) == 1) && (__s1_len = __builtin_strlen 
> (res->proto), __s1_len < 4) ? (__builtin_constant_p ("ip") && ((size_t)(const 
> void *)(("ip") + 1) - (size_t)(const void *)("ip") == 1) ? __builtin_strcmp 
> (res->proto, "ip") : (__extension__ ({ const unsigned char *__s2 = (const 
> unsigned char *) (const char *) ("ip"); int __result = (((const unsigned char 
> *) (const char *) (res->proto))[0] - __s2[0]); if (__s1_len > 0 && __result 
> == 0) { __result = (((const unsigned char *) (const char *) (res->proto))[1] 
> - __s2[1]); if (__s1_len > 1 && __result == 0) { __result = (((const unsigned 
> char *) (const char *) (res->proto))[2] - __s2[2]); if (__s1_len > 2 && 
> __result == 0) __result = (((const unsigned char *) (const char *) 
> (res->proto))[3] - __s2[3]); } } __result; }))) : (__builtin_constant_p 
> ("ip") && ((size_t)(const void *)(("ip") + 1) - (size_t)(const void *)("ip") 
> == 1) && (__s2_len = __builtin_strlen ("ip"), __s2_len < 4) ? 
> (__builtin_constant_p (res->proto) && ((size_t)(const void *)((res->proto) + 
> 1) - (size_t)(const void *)(res->proto) == 1) ? __builtin_strcmp (res->proto, 
> "ip") : (- (__extension__ ({ const unsigned char *__s2 = (const unsigned char 
> *) (const char *) (res->proto); int __result = (((const unsigned char *) 
> (const char *) ("ip"))[0] - __s2[0]); if (__s2_len > 0 && __result == 0) { 
> __result = (((const unsigned char *) (const char *) ("ip"))[1] - __s2[1]); if 
> (__s2_len > 1 && __result == 0) { __result = (((const unsigned char *) (const 
> char *) ("ip"))[2] - __s2[2]); if (__s2_len > 2 && __result == 0) __result = 
> (((const unsigned char *) (const char *) ("ip"))[3] - __s2[3]); } } __result; 
> } : __builtin_strcmp (res->proto, "ip"; })) {
>
>Here is the line of code for that one:
>if (!strcmp(res->proto, "ip")) {
>
>The -Wno-parentheses-equality problem gives the output here:
>
>/work/home/rkwiles/projects/intel/dpdk/lib/librte_cmdline/cmdline_cirbuf.c:288:19:
> error: equality comparison with extraneous parentheses 
>[-Werror,-Wparentheses-equality]
> if (((cbuf)->len == 0)) {
>
>The line is:
>
>if (CIRBUF_IS_EMPTY(cbuf)) {
>
>This one is in cmdline_cirbuf.h, which can be changed, but I do not think we 
>need to remove the parentheses.
>
>I will look at some of other solution, so I rejected the patch.

I found the cause of the compile errors I am seeing when building with clang 
and shared libraries.

The x86_64-linux-gnu/bits/string2.h header file is getting included from 
string.h, which would mean __GNUC__ is defined even though this is the clang 
compiler. After much investigation it turns out "ccache" is the problem here. 
If ccache is enabled with clang builds, __GNUC__ gets defined somehow; I never 
did find the location.

Just a warning: it appears "ccache" for caching object files is not compatible 
with DPDK builds in all cases.

>
>
>>
>>/Bruce
>>
>
>
>
>
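
As a footnote to the -Wparentheses-equality report quoted above, here is a
minimal reproduction of the pattern; the struct layout and macro name are
assumptions that mirror cmdline_cirbuf, and clang 3.8 flags the expansion
while newer compilers may not:

#include <stdio.h>

struct cirbuf { unsigned int len; };

/* Shaped like CIRBUF_IS_EMPTY(): the macro body already wraps the comparison
 * in one set of parentheses. */
#define CIRBUF_IS_EMPTY(cbuf) ((cbuf)->len == 0)

int main(void)
{
	struct cirbuf cb = { .len = 0 };

	/* Expands to if (((&cb)->len == 0)): an equality comparison inside
	 * "extraneous" parentheses, which is what -Wparentheses-equality
	 * complains about even though the call site looks perfectly normal. */
	if (CIRBUF_IS_EMPTY(&cb))
		printf("empty\n");
	return 0;
}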





[dpdk-dev] [PATCH] mbuf:rearrange mbuf to be more mbuf chain friendly

2016-06-25 Thread Wiles, Keith

On 6/25/16, 10:29 AM, "dev on behalf of Keith Wiles"  wrote:

>Move the next pointer to the first cacheline of the rte_mbuf structure
>and move the offload values to the second cacheline to give better
>performance to applications using chained mbufs.
>
>Enabled by a configuration option CONFIG_RTE_MBUF_CHAIN_FRIENDLY default
>is set to No.
>
>Signed-off-by: Keith Wiles 

nak, thought I had based these on the current master ...
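
For readers skimming the commit message above, here is a toy layout showing the
intent of the change; the field names and sizes are illustrative only, not the
real rte_mbuf definition:

#include <stddef.h>
#include <stdint.h>

#define CL 64	/* assumed cacheline size */

struct seg {
	/* First cacheline: fields touched for every segment, including the
	 * chain pointer, so walking a chained packet stays in one line. */
	void       *buf_addr;
	uint32_t    pkt_len;
	uint16_t    data_len;
	struct seg *next;

	/* Second cacheline: offload metadata pushed off the hot path. */
	uint64_t    tx_offload __attribute__((aligned(CL)));
	uint64_t    timestamp;
} __attribute__((aligned(CL)));

_Static_assert(offsetof(struct seg, next) < CL &&
	       offsetof(struct seg, tx_offload) == CL,
	       "next lives in cacheline 0, offloads in cacheline 1");

/* A chain walk now touches only cacheline 0 of each segment. */
static unsigned int
chain_len(const struct seg *s)
{
	unsigned int n = 0;

	for (; s != NULL; s = s->next)
		n++;
	return n;
}

int main(void)
{
	struct seg a = { .next = NULL };

	return chain_len(&a) == 1 ? 0 : 1;
}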



[dpdk-dev] [PATCH] scripts: relax line length check for fixed commit

2016-06-25 Thread Wiles, Keith
On 6/24/16, 4:30 AM, "dev on behalf of Bruce Richardson"  wrote:

>On Fri, Jun 24, 2016 at 12:44:18AM +0200, Thomas Monjalon wrote:
>> It is better to keep the line "Fixes:" longer than 75 characters
>> than splitting.
>> 
>> Signed-off-by: Thomas Monjalon 
>
>Definite +1

Yes, +2 for me.

>
>Acked-by: Bruce Richardson 
>
>





[dpdk-dev] [PATCH v16 0/3] mempool: add mempool handler feature

2016-06-24 Thread Wiles, Keith

On 6/23/16, 11:22 PM, "dev on behalf of Thomas Monjalon"  wrote:

>> David Hunt (2):
>>   mempool: support mempool handler operations
>>   app/test: test mempool handler
>>   mbuf: make default mempool ops configurable at build
>
>Applied, thanks for the nice feature
>
>I'm sorry David, the revision record is v17 ;)

Quick David, make two more updates to the patch ...

Thanks David and Great work !!!
>
>





[dpdk-dev] pktgen counters not

2016-06-20 Thread Wiles, Keith
Most of the counters come from the NIC hardware, but I have to do the different 
sizes in software. If the packet was received then it will get counted; that 
works on my machine and many others. Unless I broke something in Pktgen it 
would mean to me that the packets are not getting received by the hardware or 
something else is going on.

Regards,
Keith
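
For context on the "different sizes in software" remark, here is a sketch of
how per-size buckets like the ones in the display quoted below are typically
counted on the receive path; this is illustrative only, not Pktgen's actual
code:

#include <stdint.h>

struct size_stats {
	uint64_t sz_64;
	uint64_t sz_65_127;
	uint64_t sz_128_255;
	uint64_t sz_256_511;
	uint64_t sz_512_1023;
	uint64_t sz_1024_1518;
};

/* Bucket boundaries match the columns in the quoted pktgen screen. */
static void
count_size(struct size_stats *s, uint16_t wire_len)
{
	if (wire_len <= 64)
		s->sz_64++;
	else if (wire_len <= 127)
		s->sz_65_127++;
	else if (wire_len <= 255)
		s->sz_128_255++;
	else if (wire_len <= 511)
		s->sz_256_511++;
	else if (wire_len <= 1023)
		s->sz_512_1023++;
	else
		s->sz_1024_1518++;
}

int main(void)
{
	struct size_stats st = { 0 };

	count_size(&st, 64);	/* lands in the 64-byte bucket */
	count_size(&st, 1500);	/* lands in the 1024-1518 bucket */
	return (st.sz_64 == 1 && st.sz_1024_1518 == 1) ? 0 : 1;
}

A counter kept this way only moves when the software RX path actually sees the
packet, which is why empty buckets usually point at packets never reaching the
application.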



On 6/20/16, 4:16 PM, "dev on behalf of Posadas, Emerson"  wrote:

>Hello
>
>I'm running the pktgen application to do some benchmarking. My current issue 
>with pktgen is that I don't get any packet counted against the packet size on 
>the application even after sending some traffic. 
>
>  Flags:Port:   P--:0   P--:1
>Link State  : 
>TotalRate
>Pkts/s Max/Rx   : 0/0 0/0 
>0/0
>   Max/Tx   :  14194206/0  14194254/0  
> 28388460/0
>MBits/s Rx/Tx   : 0/0 0/0 
>0/0
>Broadcast   :   0   0
>Multicast   :   0   0
>  64 Bytes  :   0   0
>  65-127:   0   0
>  128-255   :   0   0
>  256-511   :   0   0
>  512-1023  :   0   0
>  1024-1518 :   0   0
>
>I'm using DPDK 16.04 with pktgen 3.0.00 and using the following pktgen 
>configuration flags:
>./app/app/x86_64-native-linuxapp-gcc/app/pktgen -c 0x1ff  -n 2 -- -P -m 
>"[1-4].0, [5-8].1"
>
>My NIC device is a dual port 10GbE Intel Corporation 82599EB PCIe card. 
>
>EP
>
>





[dpdk-dev] Performance hit - NICs on different CPU sockets

2016-06-16 Thread Wiles, Keith

On 6/16/16, 3:16 PM, "dev on behalf of Wiles, Keith"  wrote:

>
>On 6/16/16, 3:00 PM, "Take Ceara"  wrote:
>
>>On Thu, Jun 16, 2016 at 9:33 PM, Wiles, Keith  
>>wrote:
>>> On 6/16/16, 1:20 PM, "Take Ceara"  wrote:
>>>
>>>>On Thu, Jun 16, 2016 at 6:59 PM, Wiles, Keith  
>>>>wrote:
>>>>>
>>>>> On 6/16/16, 11:56 AM, "dev on behalf of Wiles, Keith" >>>> dpdk.org on behalf of keith.wiles at intel.com> wrote:
>>>>>
>>>>>>
>>>>>>On 6/16/16, 11:20 AM, "Take Ceara"  wrote:
>>>>>>
>>>>>>>On Thu, Jun 16, 2016 at 5:29 PM, Wiles, Keith  
>>>>>>>wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> Right now I do not know what the issue is with the system. Could be 
>>>>>>>> too many Rx/Tx ring pairs per port and limiting the memory in the 
>>>>>>>> NICs, which is why you get better performance when you have 8 core per 
>>>>>>>> port. I am not really seeing the whole picture and how DPDK is 
>>>>>>>> configured to help more. Sorry.
>>>>>>>
>>>>>>>I doubt that there is a limitation wrt running 16 cores per port vs 8
>>>>>>>cores per port as I've tried with two different machines connected
>>>>>>>back to back each with one X710 port and 16 cores on each of them
>>>>>>>running on that port. In that case our performance doubled as
>>>>>>>expected.
>>>>>>>
>>>>>>>>
>>>>>>>> Maybe seeing the DPDK command line would help.
>>>>>>>
>>>>>>>The command line I use with ports 01:00.3 and 81:00.3 is:
>>>>>>>./warp17 -c 0xF3   -m 32768 -w :81:00.3 -w :01:00.3 --
>>>>>>>--qmap 0.0x003FF003F0 --qmap 1.0x0FC00FFC00
>>>>>>>
>>>>>>>Our own qmap args allow the user to control exactly how cores are
>>>>>>>split between ports. In this case we end up with:
>>>>>>>
>>>>>>>warp17> show port map
>>>>>>>Port 0[socket: 0]:
>>>>>>>   Core 4[socket:0] (Tx: 0, Rx: 0)
>>>>>>>   Core 5[socket:0] (Tx: 1, Rx: 1)
>>>>>>>   Core 6[socket:0] (Tx: 2, Rx: 2)
>>>>>>>   Core 7[socket:0] (Tx: 3, Rx: 3)
>>>>>>>   Core 8[socket:0] (Tx: 4, Rx: 4)
>>>>>>>   Core 9[socket:0] (Tx: 5, Rx: 5)
>>>>>>>   Core 20[socket:0] (Tx: 6, Rx: 6)
>>>>>>>   Core 21[socket:0] (Tx: 7, Rx: 7)
>>>>>>>   Core 22[socket:0] (Tx: 8, Rx: 8)
>>>>>>>   Core 23[socket:0] (Tx: 9, Rx: 9)
>>>>>>>   Core 24[socket:0] (Tx: 10, Rx: 10)
>>>>>>>   Core 25[socket:0] (Tx: 11, Rx: 11)
>>>>>>>   Core 26[socket:0] (Tx: 12, Rx: 12)
>>>>>>>   Core 27[socket:0] (Tx: 13, Rx: 13)
>>>>>>>   Core 28[socket:0] (Tx: 14, Rx: 14)
>>>>>>>   Core 29[socket:0] (Tx: 15, Rx: 15)
>>>>>>>
>>>>>>>Port 1[socket: 1]:
>>>>>>>   Core 10[socket:1] (Tx: 0, Rx: 0)
>>>>>>>   Core 11[socket:1] (Tx: 1, Rx: 1)
>>>>>>>   Core 12[socket:1] (Tx: 2, Rx: 2)
>>>>>>>   Core 13[socket:1] (Tx: 3, Rx: 3)
>>>>>>>   Core 14[socket:1] (Tx: 4, Rx: 4)
>>>>>>>   Core 15[socket:1] (Tx: 5, Rx: 5)
>>>>>>>   Core 16[socket:1] (Tx: 6, Rx: 6)
>>>>>>>   Core 17[socket:1] (Tx: 7, Rx: 7)
>>>>>>>   Core 18[socket:1] (Tx: 8, Rx: 8)
>>>>>>>   Core 19[socket:1] (Tx: 9, Rx: 9)
>>>>>>>   Core 30[socket:1] (Tx: 10, Rx: 10)
>>>>>>>   Core 31[socket:1] (Tx: 11, Rx: 11)
>>>>>>>   Core 32[socket:1] (Tx: 12, Rx: 12)
>>>>>>>   Core 33[socket:1] (Tx: 13, Rx: 13)
>>>>>>>   Core 34[socket:1] (Tx: 14, Rx: 14)
>>>>>>>   Core 35[socket:1] (Tx: 15, Rx: 15)
>>>>>>
>>>>>>On each socket you have 10 physical cores or 20 lcores per socket for 40 
>>>>>>lcores total.
>>>>>>
>>>>>>The above is listing the LCORES (or hyper-threads) and not COREs, which I 
>>>>>>

[dpdk-dev] Performance hit - NICs on different CPU sockets

2016-06-16 Thread Wiles, Keith

On 6/16/16, 3:00 PM, "Take Ceara"  wrote:

>On Thu, Jun 16, 2016 at 9:33 PM, Wiles, Keith  wrote:
>> On 6/16/16, 1:20 PM, "Take Ceara"  wrote:
>>
>>>On Thu, Jun 16, 2016 at 6:59 PM, Wiles, Keith  
>>>wrote:
>>>>
>>>> On 6/16/16, 11:56 AM, "dev on behalf of Wiles, Keith" >>> dpdk.org on behalf of keith.wiles at intel.com> wrote:
>>>>
>>>>>
>>>>>On 6/16/16, 11:20 AM, "Take Ceara"  wrote:
>>>>>
>>>>>>On Thu, Jun 16, 2016 at 5:29 PM, Wiles, Keith  
>>>>>>wrote:
>>>>>>
>>>>>>>
>>>>>>> Right now I do not know what the issue is with the system. Could be too 
>>>>>>> many Rx/Tx ring pairs per port and limiting the memory in the NICs, 
>>>>>>> which is why you get better performance when you have 8 core per port. 
>>>>>>> I am not really seeing the whole picture and how DPDK is configured to 
>>>>>>> help more. Sorry.
>>>>>>
>>>>>>I doubt that there is a limitation wrt running 16 cores per port vs 8
>>>>>>cores per port as I've tried with two different machines connected
>>>>>>back to back each with one X710 port and 16 cores on each of them
>>>>>>running on that port. In that case our performance doubled as
>>>>>>expected.
>>>>>>
>>>>>>>
>>>>>>> Maybe seeing the DPDK command line would help.
>>>>>>
>>>>>>The command line I use with ports 01:00.3 and 81:00.3 is:
>>>>>>./warp17 -c 0xF3   -m 32768 -w :81:00.3 -w :01:00.3 --
>>>>>>--qmap 0.0x003FF003F0 --qmap 1.0x0FC00FFC00
>>>>>>
>>>>>>Our own qmap args allow the user to control exactly how cores are
>>>>>>split between ports. In this case we end up with:
>>>>>>
>>>>>>warp17> show port map
>>>>>>Port 0[socket: 0]:
>>>>>>   Core 4[socket:0] (Tx: 0, Rx: 0)
>>>>>>   Core 5[socket:0] (Tx: 1, Rx: 1)
>>>>>>   Core 6[socket:0] (Tx: 2, Rx: 2)
>>>>>>   Core 7[socket:0] (Tx: 3, Rx: 3)
>>>>>>   Core 8[socket:0] (Tx: 4, Rx: 4)
>>>>>>   Core 9[socket:0] (Tx: 5, Rx: 5)
>>>>>>   Core 20[socket:0] (Tx: 6, Rx: 6)
>>>>>>   Core 21[socket:0] (Tx: 7, Rx: 7)
>>>>>>   Core 22[socket:0] (Tx: 8, Rx: 8)
>>>>>>   Core 23[socket:0] (Tx: 9, Rx: 9)
>>>>>>   Core 24[socket:0] (Tx: 10, Rx: 10)
>>>>>>   Core 25[socket:0] (Tx: 11, Rx: 11)
>>>>>>   Core 26[socket:0] (Tx: 12, Rx: 12)
>>>>>>   Core 27[socket:0] (Tx: 13, Rx: 13)
>>>>>>   Core 28[socket:0] (Tx: 14, Rx: 14)
>>>>>>   Core 29[socket:0] (Tx: 15, Rx: 15)
>>>>>>
>>>>>>Port 1[socket: 1]:
>>>>>>   Core 10[socket:1] (Tx: 0, Rx: 0)
>>>>>>   Core 11[socket:1] (Tx: 1, Rx: 1)
>>>>>>   Core 12[socket:1] (Tx: 2, Rx: 2)
>>>>>>   Core 13[socket:1] (Tx: 3, Rx: 3)
>>>>>>   Core 14[socket:1] (Tx: 4, Rx: 4)
>>>>>>   Core 15[socket:1] (Tx: 5, Rx: 5)
>>>>>>   Core 16[socket:1] (Tx: 6, Rx: 6)
>>>>>>   Core 17[socket:1] (Tx: 7, Rx: 7)
>>>>>>   Core 18[socket:1] (Tx: 8, Rx: 8)
>>>>>>   Core 19[socket:1] (Tx: 9, Rx: 9)
>>>>>>   Core 30[socket:1] (Tx: 10, Rx: 10)
>>>>>>   Core 31[socket:1] (Tx: 11, Rx: 11)
>>>>>>   Core 32[socket:1] (Tx: 12, Rx: 12)
>>>>>>   Core 33[socket:1] (Tx: 13, Rx: 13)
>>>>>>   Core 34[socket:1] (Tx: 14, Rx: 14)
>>>>>>   Core 35[socket:1] (Tx: 15, Rx: 15)
>>>>>
>>>>>On each socket you have 10 physical cores or 20 lcores per socket for 40 
>>>>>lcores total.
>>>>>
>>>>>The above is listing the LCORES (or hyper-threads) and not COREs, which I 
>>>>>understand some like to think they are interchangeable. The problem is the 
>>>>>hyper-threads are logically interchangeable, but not performance wise. If 
>>>>>you have two run-to-completion threads on a single physical core each on a 
>>>>>different hyper-thread of that core [0,1], then the second lcore or thread 
>>>>>(1) on that physica

[dpdk-dev] Performance hit - NICs on different CPU sockets

2016-06-16 Thread Wiles, Keith

On 6/16/16, 11:56 AM, "dev on behalf of Wiles, Keith"  wrote:

>
>On 6/16/16, 11:20 AM, "Take Ceara"  wrote:
>
>>On Thu, Jun 16, 2016 at 5:29 PM, Wiles, Keith  
>>wrote:
>>
>>>
>>> Right now I do not know what the issue is with the system. Could be too 
>>> many Rx/Tx ring pairs per port and limiting the memory in the NICs, which 
>>> is why you get better performance when you have 8 core per port. I am not 
>>> really seeing the whole picture and how DPDK is configured to help more. 
>>> Sorry.
>>
>>I doubt that there is a limitation wrt running 16 cores per port vs 8
>>cores per port as I've tried with two different machines connected
>>back to back each with one X710 port and 16 cores on each of them
>>running on that port. In that case our performance doubled as
>>expected.
>>
>>>
>>> Maybe seeing the DPDK command line would help.
>>
>>The command line I use with ports 01:00.3 and 81:00.3 is:
>>./warp17 -c 0xF3   -m 32768 -w :81:00.3 -w :01:00.3 --
>>--qmap 0.0x003FF003F0 --qmap 1.0x0FC00FFC00
>>
>>Our own qmap args allow the user to control exactly how cores are
>>split between ports. In this case we end up with:
>>
>>warp17> show port map
>>Port 0[socket: 0]:
>>   Core 4[socket:0] (Tx: 0, Rx: 0)
>>   Core 5[socket:0] (Tx: 1, Rx: 1)
>>   Core 6[socket:0] (Tx: 2, Rx: 2)
>>   Core 7[socket:0] (Tx: 3, Rx: 3)
>>   Core 8[socket:0] (Tx: 4, Rx: 4)
>>   Core 9[socket:0] (Tx: 5, Rx: 5)
>>   Core 20[socket:0] (Tx: 6, Rx: 6)
>>   Core 21[socket:0] (Tx: 7, Rx: 7)
>>   Core 22[socket:0] (Tx: 8, Rx: 8)
>>   Core 23[socket:0] (Tx: 9, Rx: 9)
>>   Core 24[socket:0] (Tx: 10, Rx: 10)
>>   Core 25[socket:0] (Tx: 11, Rx: 11)
>>   Core 26[socket:0] (Tx: 12, Rx: 12)
>>   Core 27[socket:0] (Tx: 13, Rx: 13)
>>   Core 28[socket:0] (Tx: 14, Rx: 14)
>>   Core 29[socket:0] (Tx: 15, Rx: 15)
>>
>>Port 1[socket: 1]:
>>   Core 10[socket:1] (Tx: 0, Rx: 0)
>>   Core 11[socket:1] (Tx: 1, Rx: 1)
>>   Core 12[socket:1] (Tx: 2, Rx: 2)
>>   Core 13[socket:1] (Tx: 3, Rx: 3)
>>   Core 14[socket:1] (Tx: 4, Rx: 4)
>>   Core 15[socket:1] (Tx: 5, Rx: 5)
>>   Core 16[socket:1] (Tx: 6, Rx: 6)
>>   Core 17[socket:1] (Tx: 7, Rx: 7)
>>   Core 18[socket:1] (Tx: 8, Rx: 8)
>>   Core 19[socket:1] (Tx: 9, Rx: 9)
>>   Core 30[socket:1] (Tx: 10, Rx: 10)
>>   Core 31[socket:1] (Tx: 11, Rx: 11)
>>   Core 32[socket:1] (Tx: 12, Rx: 12)
>>   Core 33[socket:1] (Tx: 13, Rx: 13)
>>   Core 34[socket:1] (Tx: 14, Rx: 14)
>>   Core 35[socket:1] (Tx: 15, Rx: 15)
>
>On each socket you have 10 physical cores or 20 lcores per socket for 40 
>lcores total.
>
>The above is listing the LCORES (or hyper-threads) and not COREs, which I 
>understand some like to think they are interchangeable. The problem is the 
>hyper-threads are logically interchangeable, but not performance wise. If you 
>have two run-to-completion threads on a single physical core each on a 
>different hyper-thread of that core [0,1], then the second lcore or thread (1) 
>on that physical core will only get at most about 30-20% of the CPU cycles. 
>Normally it is much less, unless you tune the code to make sure each thread is 
>not trying to share the internal execution units, but some internal execution 
>units are always shared.
>
>To get the best performance when hyper-threading is enable is to not run both 
>threads on a single physical core, but only run one hyper-thread-0.
>
>In the table below the table lists the physical core id and each of the lcore 
>ids per socket. Use the first lcore per socket for the best performance:
>Core 1 [1, 21][11, 31]
>Use lcore 1 or 11 depending on the socket you are on.
>
>The info below is most likely the best performance and utilization of your 
>system. If I got the values right ...
>
>./warp17 -c 0x0FFFe0   -m 32768 -w :81:00.3 -w :01:00.3 --
>--qmap 0.0x0003FE --qmap 1.0x0FFE00
>
>Port 0[socket: 0]:
>   Core 2[socket:0] (Tx: 0, Rx: 0)
>   Core 3[socket:0] (Tx: 1, Rx: 1)
>   Core 4[socket:0] (Tx: 2, Rx: 2)
>   Core 5[socket:0] (Tx: 3, Rx: 3)
>   Core 6[socket:0] (Tx: 4, Rx: 4)
>   Core 7[socket:0] (Tx: 5, Rx: 5)
>   Core 8[socket:0] (Tx: 6, Rx: 6)
>   Core 9[socket:0] (Tx: 7, Rx: 7)
>
>8 cores on first socket leaving 0-1 lcores for Linux.

9 cores and leaving the first core or two lcores for Linux
>
>Port 1[socket: 1]:
>   Core 10[socket:1] (Tx: 0, Rx: 0)
>   Core 11[socket:1] (Tx: 1, Rx: 1)
>   Core 12[socket:1] (Tx: 2, Rx: 2)
>   Core 13[socket:1] (Tx: 3, Rx: 3)
>   C

[dpdk-dev] Performance hit - NICs on different CPU sockets

2016-06-16 Thread Wiles, Keith

On 6/16/16, 11:20 AM, "Take Ceara"  wrote:

>On Thu, Jun 16, 2016 at 5:29 PM, Wiles, Keith  wrote:
>
>>
>> Right now I do not know what the issue is with the system. Could be too many 
>> Rx/Tx ring pairs per port and limiting the memory in the NICs, which is why 
>> you get better performance when you have 8 core per port. I am not really 
>> seeing the whole picture and how DPDK is configured to help more. Sorry.
>
>I doubt that there is a limitation wrt running 16 cores per port vs 8
>cores per port as I've tried with two different machines connected
>back to back each with one X710 port and 16 cores on each of them
>running on that port. In that case our performance doubled as
>expected.
>
>>
>> Maybe seeing the DPDK command line would help.
>
>The command line I use with ports 01:00.3 and 81:00.3 is:
>./warp17 -c 0xF3   -m 32768 -w :81:00.3 -w :01:00.3 --
>--qmap 0.0x003FF003F0 --qmap 1.0x0FC00FFC00
>
>Our own qmap args allow the user to control exactly how cores are
>split between ports. In this case we end up with:
>
>warp17> show port map
>Port 0[socket: 0]:
>   Core 4[socket:0] (Tx: 0, Rx: 0)
>   Core 5[socket:0] (Tx: 1, Rx: 1)
>   Core 6[socket:0] (Tx: 2, Rx: 2)
>   Core 7[socket:0] (Tx: 3, Rx: 3)
>   Core 8[socket:0] (Tx: 4, Rx: 4)
>   Core 9[socket:0] (Tx: 5, Rx: 5)
>   Core 20[socket:0] (Tx: 6, Rx: 6)
>   Core 21[socket:0] (Tx: 7, Rx: 7)
>   Core 22[socket:0] (Tx: 8, Rx: 8)
>   Core 23[socket:0] (Tx: 9, Rx: 9)
>   Core 24[socket:0] (Tx: 10, Rx: 10)
>   Core 25[socket:0] (Tx: 11, Rx: 11)
>   Core 26[socket:0] (Tx: 12, Rx: 12)
>   Core 27[socket:0] (Tx: 13, Rx: 13)
>   Core 28[socket:0] (Tx: 14, Rx: 14)
>   Core 29[socket:0] (Tx: 15, Rx: 15)
>
>Port 1[socket: 1]:
>   Core 10[socket:1] (Tx: 0, Rx: 0)
>   Core 11[socket:1] (Tx: 1, Rx: 1)
>   Core 12[socket:1] (Tx: 2, Rx: 2)
>   Core 13[socket:1] (Tx: 3, Rx: 3)
>   Core 14[socket:1] (Tx: 4, Rx: 4)
>   Core 15[socket:1] (Tx: 5, Rx: 5)
>   Core 16[socket:1] (Tx: 6, Rx: 6)
>   Core 17[socket:1] (Tx: 7, Rx: 7)
>   Core 18[socket:1] (Tx: 8, Rx: 8)
>   Core 19[socket:1] (Tx: 9, Rx: 9)
>   Core 30[socket:1] (Tx: 10, Rx: 10)
>   Core 31[socket:1] (Tx: 11, Rx: 11)
>   Core 32[socket:1] (Tx: 12, Rx: 12)
>   Core 33[socket:1] (Tx: 13, Rx: 13)
>   Core 34[socket:1] (Tx: 14, Rx: 14)
>   Core 35[socket:1] (Tx: 15, Rx: 15)

On each socket you have 10 physical cores or 20 lcores per socket for 40 lcores 
total.

The above is listing the LCORES (or hyper-threads) and not COREs, which I 
understand some like to think are interchangeable. The problem is that the 
hyper-threads are logically interchangeable, but not performance-wise. If you 
have two run-to-completion threads on a single physical core, each on a 
different hyper-thread of that core [0,1], then the second lcore or thread (1) 
on that physical core will only get at most about 20-30% of the CPU cycles. 
Normally it is much less, unless you tune the code to make sure each thread is 
not trying to share the internal execution units, but some internal execution 
units are always shared.

To get the best performance when hyper-threading is enabled, do not run both 
threads on a single physical core; only run on hyper-thread 0.

The table below lists the physical core id and each of the lcore ids per 
socket. Use the first lcore per socket for the best performance:
Core 1 [1, 21][11, 31]
Use lcore 1 or 11 depending on the socket you are on.

The info below is most likely the best performance and utilization of your 
system. If I got the values right ...

./warp17 -c 0x0FFFe0   -m 32768 -w :81:00.3 -w :01:00.3 --
--qmap 0.0x0003FE --qmap 1.0x0FFE00

Port 0[socket: 0]:
   Core 2[socket:0] (Tx: 0, Rx: 0)
   Core 3[socket:0] (Tx: 1, Rx: 1)
   Core 4[socket:0] (Tx: 2, Rx: 2)
   Core 5[socket:0] (Tx: 3, Rx: 3)
   Core 6[socket:0] (Tx: 4, Rx: 4)
   Core 7[socket:0] (Tx: 5, Rx: 5)
   Core 8[socket:0] (Tx: 6, Rx: 6)
   Core 9[socket:0] (Tx: 7, Rx: 7)

8 cores on first socket leaving 0-1 lcores for Linux.

Port 1[socket: 1]:
   Core 10[socket:1] (Tx: 0, Rx: 0)
   Core 11[socket:1] (Tx: 1, Rx: 1)
   Core 12[socket:1] (Tx: 2, Rx: 2)
   Core 13[socket:1] (Tx: 3, Rx: 3)
   Core 14[socket:1] (Tx: 4, Rx: 4)
   Core 15[socket:1] (Tx: 5, Rx: 5)
   Core 16[socket:1] (Tx: 6, Rx: 6)
   Core 17[socket:1] (Tx: 7, Rx: 7)
   Core 18[socket:1] (Tx: 8, Rx: 8)
   Core 19[socket:1] (Tx: 9, Rx: 9)

All 10 cores on the second socket.

++Keith

>
>Just for reference, the cpu_layout script shows:
>$ $RTE_SDK/tools/cpu_layout.py
>
>Core and Socket Information (as reported by '/proc/cpuinfo')
>
>
>cores =  [0, 1, 2, 3, 4, 8, 9, 10, 11
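
To double-check the mask arithmetic in the messages above, here is a tiny
helper (a sketch, not part of DPDK or WARP17) that turns a list of lcore ids
into the hex values used for -c and --qmap; the example lcore sets in main()
are assumptions for illustration:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* OR together one bit per lcore id to form an EAL-style coremask. */
static uint64_t
lcore_mask(const unsigned int *lcores, unsigned int n)
{
	uint64_t mask = 0;
	unsigned int i;

	for (i = 0; i < n; i++)
		mask |= UINT64_C(1) << lcores[i];
	return mask;
}

int main(void)
{
	/* Example: lcores 2-9 on socket 0 and lcores 10-19 on socket 1. */
	const unsigned int s0[] = { 2, 3, 4, 5, 6, 7, 8, 9 };
	const unsigned int s1[] = { 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 };

	printf("socket 0 mask: 0x%" PRIx64 "\n", lcore_mask(s0, 8));   /* 0x3fc */
	printf("socket 1 mask: 0x%" PRIx64 "\n", lcore_mask(s1, 10));  /* 0xffc00 */
	return 0;
}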

[dpdk-dev] Performance hit - NICs on different CPU sockets

2016-06-16 Thread Wiles, Keith
On 6/16/16, 9:36 AM, "Take Ceara"  wrote:

>Hi Keith,
>
>On Tue, Jun 14, 2016 at 3:47 PM, Wiles, Keith  wrote:
>>>> Normally the limitation is in the hardware, basically how the PCI bus is 
>>>> connected to the CPUs (or sockets). How the PCI buses are connected to the 
>>>> system depends on the Mother board design. I normally see the buses 
>>>> attached to socket 0, but you could have some of the buses attached to the 
>>>> other sockets or all on one socket via a PCI bridge device.
>>>>
>>>> No easy way around the problem if some of your PCI buses are split or all 
>>>> on a single socket. Need to look at your system docs or look at lspci it 
>>>> has an option to dump the PCI bus as an ASCII tree, at least on Ubuntu.
>>>
>>>This is the motherboard we use on our system:
>>>
>>>http://www.supermicro.com/products/motherboard/Xeon/C600/X10DRX.cfm
>>>
>>>I need to swap some NICs around (as now we moved everything on socket
>>>1) before I can share the lspci output.
>>
>> FYI: the option for lspci is "lspci -tv", but maybe more options too.
>>
>
>I retested with two 10G X710 ports connected back to back:
>port 0: :01:00.3 - socket 0
>port 1: :81:00.3 - socket 1

Please provide the output from tools/cpu_layout.py.

>
>I ran the following scenarios:
>- assign 16 threads from CPU 0 on socket 0 to port 0 and 16 threads
>from CPU 1 to port 1 => setup rate of 1.6M sess/s
>- assign only the 16 threads from CPU0 for both ports (so 8 threads on
>socket 0 for port 0 and 8 threads on socket 0 for port 1) => setup
>rate of 3M sess/s
>- assign only the 16 threads from CPU1 for both ports (so 8 threads on
>socket 1 for port 0 and 8 threads on socket 1 for port 1) => setup
>rate of 3M sess/s
>
>I also tried a scenario with two machines connected back to back each
>of which had a NIC on socket 1. I assigned 16 threads from socket 1 on
>each machine to the port and performance scaled to 6M sess/s as
>expected.
>
>I double checked all our memory allocations and, at least in the
>tested scenario, we never use memory that's not on the same socket as
>the core.
>
>I pasted below the output of lspci -tv. I see that :01:00.3 and
>:81:00.3 are connected to different PCI bridges but on each of
>those bridges there are also "Intel Corporation Xeon E7 v3/Xeon E5
>v3/Core i7 DMA Channel " devices.
>
>It would be great if you could also take a look in case I
>missed/misunderstood something.
>
>Thanks,
>Dumitru
>
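
As a practical check for the question above (which socket a given NIC hangs
off), Linux exposes the answer in sysfs. A small sketch follows, assuming the
kernel provides the numa_node attribute (it reports -1 when it does not know);
the PCI address is taken from the message above as an example:

#include <stdio.h>

int main(void)
{
	const char *bdf = "0000:81:00.3";	/* example address from the thread */
	char path[128];
	FILE *f;
	int node = -1;

	snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/numa_node", bdf);
	f = fopen(path, "r");
	if (f == NULL) {
		perror(path);
		return 1;
	}
	if (fscanf(f, "%d", &node) != 1)
		node = -1;
	fclose(f);

	printf("%s is on NUMA node %d\n", bdf, node);
	return 0;
}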


[dpdk-dev] Performance hit - NICs on different CPU sockets

2016-06-14 Thread Wiles, Keith

On 6/14/16, 2:46 AM, "Take Ceara"  wrote:

>Hi Keith,
>
>On Mon, Jun 13, 2016 at 9:35 PM, Wiles, Keith  wrote:
>>
>> On 6/13/16, 9:07 AM, "dev on behalf of Take Ceara" > on behalf of dumitru.ceara at gmail.com> wrote:
>>
>>>Hi,
>>>
>>>I'm reposting here as I didn't get any answers on the dpdk-users mailing 
>>>list.
>>>
>>>We're working on a stateful traffic generator (www.warp17.net) using
>>>DPDK and we would like to control two XL710 NICs (one on each socket)
>>>to maximize CPU usage. It looks that we run into the following
>>>limitation:
>>>
>>>http://dpdk.org/doc/guides/linux_gsg/nic_perf_intel_platform.html
>>>section 7.2, point 3
>>>
>>>We completely split memory/cpu/NICs across the two sockets. However,
>>>the performance with a single CPU and both NICs on the same socket is
>>>better.
>>>Why do all the NICs have to be on the same socket, is there a
>>>driver/hw limitation?
>>
>> Normally the limitation is in the hardware, basically how the PCI bus is 
>> connected to the CPUs (or sockets). How the PCI buses are connected to the 
>> system depends on the Mother board design. I normally see the buses attached 
>> to socket 0, but you could have some of the buses attached to the other 
>> sockets or all on one socket via a PCI bridge device.
>>
>> No easy way around the problem if some of your PCI buses are split or all on 
>> a single socket. Need to look at your system docs or look at lspci it has an 
>> option to dump the PCI bus as an ASCII tree, at least on Ubuntu.
>
>This is the motherboard we use on our system:
>
>http://www.supermicro.com/products/motherboard/Xeon/C600/X10DRX.cfm
>
>I need to swap some NICs around (as now we moved everything on socket
>1) before I can share the lspci output.

FYI: the option for lspci is "lspci -tv", but maybe more options too.

>
>Thanks,
>Dumitru
>





[dpdk-dev] Performance hit - NICs on different CPU sockets

2016-06-13 Thread Wiles, Keith

On 6/13/16, 9:07 AM, "dev on behalf of Take Ceara"  wrote:

>Hi,
>
>I'm reposting here as I didn't get any answers on the dpdk-users mailing list.
>
>We're working on a stateful traffic generator (www.warp17.net) using
>DPDK and we would like to control two XL710 NICs (one on each socket)
>to maximize CPU usage. It looks that we run into the following
>limitation:
>
>http://dpdk.org/doc/guides/linux_gsg/nic_perf_intel_platform.html
>section 7.2, point 3
>
>We completely split memory/cpu/NICs across the two sockets. However,
>the performance with a single CPU and both NICs on the same socket is
>better.
>Why do all the NICs have to be on the same socket, is there a
>driver/hw limitation?

Normally the limitation is in the hardware, basically how the PCI bus is 
connected to the CPUs (or sockets). How the PCI buses are connected to the 
system depends on the motherboard design. I normally see the buses attached to 
socket 0, but you could have some of the buses attached to the other sockets or 
all on one socket via a PCI bridge device.

There is no easy way around the problem if some of your PCI buses are split or 
all on a single socket. You need to look at your system docs, or look at lspci; 
it has an option to dump the PCI bus as an ASCII tree, at least on Ubuntu.
>
>Thanks,
>Dumitru Ceara
>





[dpdk-dev] [PATCH v3 1/6] mk: sort drivers in static application link list

2016-06-10 Thread Wiles, Keith

On 6/10/16, 1:32 PM, "dev on behalf of Ferruh Yigit"  wrote:

>From: Thomas Monjalon 
>
>Just a clean up to prepare next patches.

One thing you have done or will do with these patches is create a set of groups 
(PMD, Core, LIB, ...) and sort the items in each group. I was thinking it may 
be more useful to use different names for the groups instead of _LDLIBS, e.g. 
_PMD, _CORE, _LIB (this one may not work), _XYZ, then create _LDLIBS from these 
groups. I know it is another change and sorry I did not catch it before. This 
only makes sense if we need, say, different PMD libs in different places within 
the _LDLIBS variable. Something to think about if you have to make changes again.

>
>Signed-off-by: Thomas Monjalon 
>---
> mk/rte.app.mk | 45 +
> 1 file changed, 17 insertions(+), 28 deletions(-)
>
>diff --git a/mk/rte.app.mk b/mk/rte.app.mk
>index b84b56d..07be17c 100644
>--- a/mk/rte.app.mk
>+++ b/mk/rte.app.mk
>@@ -124,57 +124,46 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_EAL)+= -lrte_eal
> _LDLIBS-$(CONFIG_RTE_LIBRTE_CMDLINE)+= -lrte_cmdline
> _LDLIBS-$(CONFIG_RTE_LIBRTE_CFGFILE)+= -lrte_cfgfile
> _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_BOND)   += -lrte_pmd_bond
>-
> _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT)+= -lrte_pmd_xenvirt
> 
> ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),n)
> # plugins (link only if static libraries)
> 
>-_LDLIBS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD)+= -lrte_pmd_vmxnet3_uio
>-_LDLIBS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += -lrte_pmd_virtio
>+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET)  += -lrte_pmd_af_packet
> _LDLIBS-$(CONFIG_RTE_LIBRTE_BNX2X_PMD)  += -lrte_pmd_bnx2x
> _LDLIBS-$(CONFIG_RTE_LIBRTE_CXGBE_PMD)  += -lrte_pmd_cxgbe
>+_LDLIBS-$(CONFIG_RTE_LIBRTE_E1000_PMD)  += -lrte_pmd_e1000
>+_LDLIBS-$(CONFIG_RTE_LIBRTE_ENA_PMD)+= -lrte_pmd_ena
> _LDLIBS-$(CONFIG_RTE_LIBRTE_ENIC_PMD)   += -lrte_pmd_enic
>-_LDLIBS-$(CONFIG_RTE_LIBRTE_I40E_PMD)   += -lrte_pmd_i40e
> _LDLIBS-$(CONFIG_RTE_LIBRTE_FM10K_PMD)  += -lrte_pmd_fm10k
>+_LDLIBS-$(CONFIG_RTE_LIBRTE_I40E_PMD)   += -lrte_pmd_i40e
> _LDLIBS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD)  += -lrte_pmd_ixgbe
>-_LDLIBS-$(CONFIG_RTE_LIBRTE_E1000_PMD)  += -lrte_pmd_e1000
>-_LDLIBS-$(CONFIG_RTE_LIBRTE_ENA_PMD)+= -lrte_pmd_ena
> _LDLIBS-$(CONFIG_RTE_LIBRTE_MLX4_PMD)   += -lrte_pmd_mlx4
> _LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_PMD)   += -lrte_pmd_mlx5
>-_LDLIBS-$(CONFIG_RTE_LIBRTE_NFP_PMD)+= -lrte_pmd_nfp
>-_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_SZEDATA2)   += -lrte_pmd_szedata2
> _LDLIBS-$(CONFIG_RTE_LIBRTE_MPIPE_PMD)  += -lrte_pmd_mpipe
>-_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_RING)   += -lrte_pmd_ring
>+_LDLIBS-$(CONFIG_RTE_LIBRTE_NFP_PMD)+= -lrte_pmd_nfp
>+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_NULL)   += -lrte_pmd_null
> _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_PCAP)   += -lrte_pmd_pcap
>-_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET)  += -lrte_pmd_af_packet
> _LDLIBS-$(CONFIG_RTE_LIBRTE_QEDE_PMD)   += -lrte_pmd_qede
>-_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_NULL)   += -lrte_pmd_null
>+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_RING)   += -lrte_pmd_ring
>+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_SZEDATA2)   += -lrte_pmd_szedata2
>+_LDLIBS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += -lrte_pmd_virtio
>+ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
>+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_VHOST)  += -lrte_pmd_vhost
>+endif # $(CONFIG_RTE_LIBRTE_VHOST)
>+_LDLIBS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD)+= -lrte_pmd_vmxnet3_uio
> 
> ifeq ($(CONFIG_RTE_LIBRTE_CRYPTODEV),y)
>-_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_QAT)+= -lrte_pmd_qat
> _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AESNI_MB)   += -lrte_pmd_aesni_mb
>-_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AESNI_GCM)   += -lrte_pmd_aesni_gcm
>+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AESNI_MB)   += 
>-L$(AESNI_MULTI_BUFFER_LIB_PATH) -lIPSec_MB
>+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AESNI_GCM)  += -lrte_pmd_aesni_gcm
>+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AESNI_GCM)  += 
>-L$(AESNI_MULTI_BUFFER_LIB_PATH) -lIPSec_MB
> _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_NULL_CRYPTO) += -lrte_pmd_null_crypto
>-
>-# AESNI MULTI BUFFER / GCM PMDs are dependent on the IPSec_MB library
>-ifeq ($(CONFIG_RTE_LIBRTE_PMD_AESNI_MB),y)
>-_LDLIBS-y += -L$(AESNI_MULTI_BUFFER_LIB_PATH) -lIPSec_MB
>-else ifeq ($(CONFIG_RTE_LIBRTE_PMD_AESNI_GCM),y)
>-_LDLIBS-y += -L$(AESNI_MULTI_BUFFER_LIB_PATH) -lIPSec_MB
>-endif
>-
>-# SNOW3G PMD is dependent on the LIBSSO library
>+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_QAT)+= -lrte_pmd_qat
> _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_SNOW3G) += -lrte_pmd_snow3g
> _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_SNOW3G) += -L$(LIBSSO_PATH)/build -lsso
> endif # CONFIG_RTE_LIBRTE_CRYPTODEV
> 
>-ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
>-
>-_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_VHOST)  += -lrte_pmd_vhost
>-
>-endif # $(CONFIG_RTE_LIBRTE_VHOST)
>-
> endif # ! $(CONFIG_RTE_BUILD_SHARED_LIB)
> 
> _LDLIBS-y += $(EXECENV_LDLIBS)
>-- 
>2.5.5
>
>





[dpdk-dev] [PATCH] mk: generate internal library dependencies from DEPDIRS-y automatically

2016-06-07 Thread Wiles, Keith
On 6/7/16, 9:19 AM, "dev on behalf of Thomas Monjalon"  wrote:

>2016-06-07 15:07, Bruce Richardson:
>> On Tue, Jun 07, 2016 at 03:00:45PM +0200, Thomas Monjalon wrote:
>> > 2016-06-07 14:36, Christian Ehrhardt:
>> > > But I still struggle to see how to fix the circular dependency between
>> > > librte_eal and librte_mempool.
>> > 
>> > Why is there a circular dependency?
>> > Only because of logs using mempool?
>> > 
>> > > Maybe now is a time to look at this part of the original threads again to
>> > > eventually get apps less overlinked?
>> > > => http://www.dpdk.org/ml/archives/dev/2016-May/039441.html
>> > > My naive suggestions in generalized form can be found there (no answer 
>> > > yet):
>> > > =>
>> > > http://stackoverflow.com/questions/37351699/how-to-create-both-so-files-for-two-circular-depending-libraries
>> > 
>> > I would prefer removing the circular dependency.
>> > Maybe we can rewrite the code to not use mempool or move it outside of EAL.
>> 
>> Or else we can take the attitude that the mempools and the rings are just a 
>> core
>> part of DPDK and move them and the EAL into a dpdk_core library at link time.
>> Having the code separate in the git tree is good, but I'm not sure having
>> the resulting object files being in separate .a/.so files is particularly 
>> useful.
>> I can't see someone wanting to use one without the other.
>
>EAL could be used as an abstraction layer on top of systems and platforms.
>And I think keeping things separated and layered help to maintain a design
>easy to understand.

One of the other areas I see as an issue when building DPDK is that the headers 
are only linked into place as the makefiles are executed, causing a dependency 
on build order. If you have a circular dependency when building libs or core 
code, we have to link directly. Would this problem be helped if we had a 
two-pass build system, one pass to link the headers and then one to build the 
libraries?
>





[dpdk-dev] [RFC] Yet another option for DPDK options

2016-06-03 Thread Wiles, Keith

On 6/3/16, 2:18 PM, "Neil Horman"  wrote:

>On Fri, Jun 03, 2016 at 07:07:50PM +, Wiles, Keith wrote:
>> On 6/3/16, 2:00 PM, "dev on behalf of Wiles, Keith" > on behalf of keith.wiles at intel.com> wrote:
>> 
>> >On 6/3/16, 1:52 PM, "Arnon Warshavsky" mailto:arnon at 
>> >qwilt.com>> wrote:
>> >
>> >
>> >
>> >On Fri, Jun 3, 2016 at 9:38 PM, Neil Horman > >tuxdriver.com<mailto:nhorman at tuxdriver.com>> wrote:
>> >On Fri, Jun 03, 2016 at 06:29:13PM +, Wiles, Keith wrote:
>> >>
>> >> On 6/3/16, 12:44 PM, "Neil Horman" > >> tuxdriver.com<mailto:nhorman at tuxdriver.com>> wrote:
>> >>
>> >> >On Fri, Jun 03, 2016 at 04:04:14PM +, Wiles, Keith wrote:
>> >> >> Sorry, I deleted all of the text as it was getting a bit long.
>> >> >>
>> >> >> Here are my thoughts as of now, which is a combination of many 
>> >> >> suggestions I read from everyone's emails. I hope this is not too hard 
>> >> >> to understand.
>> >> >>
>> >> >> - Break out the current command line options out of the DPDK common 
>> >> >> code and move into a new lib.
>> >> >>   - At this point I was thinking of keeping the rte_eal_init(args, 
>> >> >> argv) API and just have it pass the args/argv to the new lib to create 
>> >> >> the data storage.
>> >> >>  - Maybe move the rte_eal_init() API to the new lib or keep it in 
>> >> >> the common eal code. Do not want to go hog wild.
>> >> >>   - The rte_eal_init(args, argv) would then call to the new API 
>> >> >> rte_eal_initialize(void), which in turn queries the data storage. 
>> >> >> (still thinking here)
>> >> >These three items seem to be the exact opposite of my suggestion.  The 
>> >> >point of
>> >> >this change was to segregate the parsing of configuration away from the
>> >> >initalization dpdk using that configurtion.  By keeping rte_eal_init in 
>> >> >such a
>> >> >way that the command line is directly passed into it, you've not changed 
>> >> >that
>> >> >implicit binding to command line options.
>> >>
>> >> Neil,
>> >>
>> >> You maybe reading the above wrong or I wrote it wrong, which is a high 
>> >> possibility. I want to move the command line parsing out of DPDK an into 
>> >> a library, but I still believe I need to provide some backward 
>> >> compatibility for ABI and to reduce the learning curve. The current 
>> >> applications can still call the rte_eal_init(), which then calls the new 
>> >> lib parser for dpdk command line options and then calls 
>> >> rte_eal_initialize() or move to the new API rte_eal_initialize() preceded 
>> >> by a new library call to parse the old command line args. At some point 
>> >> we can deprecate the rte_eal_init() if we think it is reasonable.
>> >>
>> >> >
>> >> >I can understand if you want to keep rte_eal_init as is for ABI 
>> >> >purposes, but
>> >> >then you should create an rte_eal_init2(foo), where foo is some handle 
>> >> >to in
>> >> >memory parsed configuration, so that applications can preform that 
>> >> >separation.
>> >>
>> >> I think you describe what I had planned here. The rte_eal_initialize() 
>> >> routine is the new rte_eal_init2() API and the rte_eal_init() was only 
>> >> for backward compatibility was my thinking. I figured the argument to 
>> >> rte_eal_initialize() would be something to be decided, but it will mostly 
>> >> likely be some type of pointer to the storage.
>> >>
>> >> I hope that clears that up, but let me know.
>> >>
>> >yes, that clarifies your thinking, and I agree with it.  Thank you!
>> >Neil
>> >
>> >> ++Keith
>> >>
>> >> >
>> >> >Neil
>> >> >
>> >> >>   - The example apps args needs to be passed to the examples as is for 
>> >> >> now, then we can convert them one at a time if needed.
>> >> >>
>> >> >> - I would like to keep the storage of the data separate from the file 
>> >> >> parser as they can use the "set" routines to bu

[dpdk-dev] [RFC] Yet another option for DPDK options

2016-06-03 Thread Wiles, Keith
On 6/3/16, 2:00 PM, "dev on behalf of Wiles, Keith"  wrote:

>On 6/3/16, 1:52 PM, "Arnon Warshavsky" mailto:arnon at 
>qwilt.com>> wrote:
>
>
>
>On Fri, Jun 3, 2016 at 9:38 PM, Neil Horman tuxdriver.com<mailto:nhorman at tuxdriver.com>> wrote:
>On Fri, Jun 03, 2016 at 06:29:13PM +, Wiles, Keith wrote:
>>
>> On 6/3/16, 12:44 PM, "Neil Horman" mailto:nhorman 
>> at tuxdriver.com>> wrote:
>>
>> >On Fri, Jun 03, 2016 at 04:04:14PM +, Wiles, Keith wrote:
>> >> Sorry, I deleted all of the text as it was getting a bit long.
>> >>
>> >> Here are my thoughts as of now, which is a combination of many 
>> >> suggestions I read from everyone's emails. I hope this is not too hard to 
>> >> understand.
>> >>
>> >> - Break out the current command line options out of the DPDK common code 
>> >> and move into a new lib.
>> >>   - At this point I was thinking of keeping the rte_eal_init(args, argv) 
>> >> API and just have it pass the args/argv to the new lib to create the data 
>> >> storage.
>> >>  - Maybe move the rte_eal_init() API to the new lib or keep it in the 
>> >> common eal code. Do not want to go hog wild.
>> >>   - The rte_eal_init(args, argv) would then call to the new API 
>> >> rte_eal_initialize(void), which in turn queries the data storage. (still 
>> >> thinking here)
>> >These three items seem to be the exact opposite of my suggestion.  The 
>> >point of
>> >this change was to segregate the parsing of configuration away from the
>> >initalization dpdk using that configurtion.  By keeping rte_eal_init in 
>> >such a
>> >way that the command line is directly passed into it, you've not changed 
>> >that
>> >implicit binding to command line options.
>>
>> Neil,
>>
>> You maybe reading the above wrong or I wrote it wrong, which is a high 
>> possibility. I want to move the command line parsing out of DPDK an into a 
>> library, but I still believe I need to provide some backward compatibility 
>> for ABI and to reduce the learning curve. The current applications can still 
>> call the rte_eal_init(), which then calls the new lib parser for dpdk 
>> command line options and then calls rte_eal_initialize() or move to the new 
>> API rte_eal_initialize() preceded by a new library call to parse the old 
>> command line args. At some point we can deprecate the rte_eal_init() if we 
>> think it is reasonable.
>>
>> >
>> >I can understand if you want to keep rte_eal_init as is for ABI purposes, 
>> >but
>> >then you should create an rte_eal_init2(foo), where foo is some handle to in
>> >memory parsed configuration, so that applications can preform that 
>> >separation.
>>
>> I think you describe what I had planned here. The rte_eal_initialize() 
>> routine is the new rte_eal_init2() API and the rte_eal_init() was only for 
>> backward compatibility was my thinking. I figured the argument to 
>> rte_eal_initialize() would be something to be decided, but it will mostly 
>> likely be some type of pointer to the storage.
>>
>> I hope that clears that up, but let me know.
>>
>yes, that clarifies your thinking, and I agree with it.  Thank you!
>Neil
>
>> ++Keith
>>
>> >
>> >Neil
>> >
>> >>   - The example apps args needs to be passed to the examples as is for 
>> >> now, then we can convert them one at a time if needed.
>> >>
>> >> - I would like to keep the storage of the data separate from the file 
>> >> parser as they can use the "set" routines to build the data storage up.
>> >>   - Keeping them split allows for new parsers to be created, while 
>> >> keeping the data storage from changing.
>> >> - The rte_cfg code could be modified to use the new configuration if 
>> >> someone wants to take on that task ...
>> >>
>> >> - Next is the data storage and how we can access the data in a clean 
>> >> simple way.
>> >> - I want to have some simple level of hierarchy in the data.
>> >>   - Having a string containing at least two levels "primary:secondary".
>> >>  - Primary string is something like "EAL" or "Pktgen" or "testpmd" to 
>> >> divide the data storage into logical major groups.
>> >> - The primary allows us to have groups and then we can have 
>> >> common secondary strings in differe

[dpdk-dev] [RFC] Yet another option for DPDK options

2016-06-03 Thread Wiles, Keith
On 6/3/16, 1:52 PM, "Arnon Warshavsky" mailto:arnon at 
qwilt.com>> wrote:



On Fri, Jun 3, 2016 at 9:38 PM, Neil Horman mailto:nhorman at tuxdriver.com>> wrote:
On Fri, Jun 03, 2016 at 06:29:13PM +, Wiles, Keith wrote:
>
> On 6/3/16, 12:44 PM, "Neil Horman" mailto:nhorman 
> at tuxdriver.com>> wrote:
>
> >On Fri, Jun 03, 2016 at 04:04:14PM +, Wiles, Keith wrote:
> >> Sorry, I deleted all of the text as it was getting a bit long.
> >>
> >> Here are my thoughts as of now, which is a combination of many suggestions 
> >> I read from everyone's emails. I hope this is not too hard to understand.
> >>
> >> - Break out the current command line options out of the DPDK common code 
> >> and move into a new lib.
> >>   - At this point I was thinking of keeping the rte_eal_init(args, argv) 
> >> API and just have it pass the args/argv to the new lib to create the data 
> >> storage.
> >>  - Maybe move the rte_eal_init() API to the new lib or keep it in the 
> >> common eal code. Do not want to go hog wild.
> >>   - The rte_eal_init(args, argv) would then call to the new API 
> >> rte_eal_initialize(void), which in turn queries the data storage. (still 
> >> thinking here)
> >These three items seem to be the exact opposite of my suggestion.  The point 
> >of
> >this change was to segregate the parsing of configuration away from the
> >initalization dpdk using that configurtion.  By keeping rte_eal_init in such 
> >a
> >way that the command line is directly passed into it, you've not changed that
> >implicit binding to command line options.
>
> Neil,
>
> You maybe reading the above wrong or I wrote it wrong, which is a high 
> possibility. I want to move the command line parsing out of DPDK an into a 
> library, but I still believe I need to provide some backward compatibility 
> for ABI and to reduce the learning curve. The current applications can still 
> call the rte_eal_init(), which then calls the new lib parser for dpdk command 
> line options and then calls rte_eal_initialize() or move to the new API 
> rte_eal_initialize() preceded by a new library call to parse the old command 
> line args. At some point we can deprecate the rte_eal_init() if we think it 
> is reasonable.
>
> >
> >I can understand if you want to keep rte_eal_init as is for ABI purposes, but
> >then you should create an rte_eal_init2(foo), where foo is some handle to in
> >memory parsed configuration, so that applications can preform that 
> >separation.
>
> I think you describe what I had planned here. The rte_eal_initialize() 
> routine is the new rte_eal_init2() API and the rte_eal_init() was only for 
> backward compatibility was my thinking. I figured the argument to 
> rte_eal_initialize() would be something to be decided, but it will mostly 
> likely be some type of pointer to the storage.
>
> I hope that clears that up, but let me know.
>
yes, that clarifies your thinking, and I agree with it.  Thank you!
Neil

> ++Keith
>
> >
> >Neil
> >
> >>   - The example apps args needs to be passed to the examples as is for 
> >> now, then we can convert them one at a time if needed.
> >>
> >> - I would like to keep the storage of the data separate from the file 
> >> parser as they can use the "set" routines to build the data storage up.
> >>   - Keeping them split allows for new parsers to be created, while keeping 
> >> the data storage from changing.
> >> - The rte_cfg code could be modified to use the new configuration if 
> >> someone wants to take on that task ...
> >>
> >> - Next is the data storage and how we can access the data in a clean 
> >> simple way.
> >> - I want to have some simple level of hierarchy in the data.
> >>   - Having a string containing at least two levels ?primary:secondary?.
> >>  - Primary string is something like ?EAL? or ?Pktgen? or ?testpmd? to 
> >> divide the data storage into logical major groups.
> >> - The primary allows us to have groups and then we can have common 
> >> secondary strings in different groups if needed.
> >>  - Secondary string can be whatever the developer of that group would 
> >> like e.g. simple "EAL:foobar", two levels "testpmd:foo.bar"
> >>
> >>   - The secondary string is treated as a single string if it has a 
> >> hierarchy or not, but referencing a single value in the data storage.
> >>  - Key value pairs (KVP) or a hashmap data store.
> >> - The key here is the whol

[dpdk-dev] [RFC] Yet another option for DPDK options

2016-06-03 Thread Wiles, Keith

On 6/3/16, 12:44 PM, "Neil Horman"  wrote:

>On Fri, Jun 03, 2016 at 04:04:14PM +, Wiles, Keith wrote:
>> Sorry, I deleted all of the text as it was getting a bit long.
>> 
>> Here are my thoughts as of now, which is a combination of many suggestions I 
>> read from everyone's emails. I hope this is not too hard to understand.
>> 
>> - Break out the current command line options out of the DPDK common code and 
>> move into a new lib.
>>   - At this point I was thinking of keeping the rte_eal_init(args, argv) API 
>> and just have it pass the args/argv to the new lib to create the data 
>> storage.
>>  - Maybe move the rte_eal_init() API to the new lib or keep it in the 
>> common eal code. Do not want to go hog wild.
>>   - The rte_eal_init(args, argv) would then call to the new API 
>> rte_eal_initialize(void), which in turn queries the data storage. (still 
>> thinking here)
>These three items seem to be the exact opposite of my suggestion.  The point of
>this change was to segregate the parsing of configuration away from the
>initialization of dpdk using that configuration.  By keeping rte_eal_init in such a
>way that the command line is directly passed into it, you've not changed that
>implicit binding to command line options.

Neil,

You may be reading the above wrong or I wrote it wrong, which is a high 
possibility. I want to move the command line parsing out of DPDK and into a 
library, but I still believe I need to provide some backward compatibility for 
ABI and to reduce the learning curve. The current applications can still call 
the rte_eal_init(), which then calls the new lib parser for dpdk command line 
options and then calls rte_eal_initialize() or move to the new API 
rte_eal_initialize() preceded by a new library call to parse the old command 
line args. At some point we can deprecate the rte_eal_init() if we think it is 
reasonable.

>
>I can understand if you want to keep rte_eal_init as is for ABI purposes, but
>then you should create an rte_eal_init2(foo), where foo is some handle to in
>memory parsed configuration, so that applications can perform that separation.

I think you describe what I had planned here. The rte_eal_initialize() routine 
is the new rte_eal_init2() API, and keeping rte_eal_init() only for backward 
compatibility was my thinking. I figured the argument to rte_eal_initialize() 
would be something to be decided, but it will most likely be some type of 
pointer to the storage.
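
A minimal sketch of that split, using hypothetical names (eal_cfg_parse_args() and the eal_config struct are invented for illustration; rte_eal_initialize() is the proposed new entry point from this thread, not an existing DPDK API):

#include <stdio.h>
#include <stdlib.h>

struct eal_config {
    int lcore_count;         /* example option pulled out of the command line */
    const char *huge_dir;    /* example option */
};

/* New parsing library: command line -> config store (stubbed for the sketch). */
static struct eal_config *
eal_cfg_parse_args(int argc, char **argv)
{
    struct eal_config *cfg = calloc(1, sizeof(*cfg));

    if (cfg == NULL)
        return NULL;
    (void)argc;              /* a real parser would walk argc/argv here */
    (void)argv;
    cfg->lcore_count = 1;
    cfg->huge_dir = "/mnt/huge";
    return cfg;
}

/* Proposed new entry point: initialization is driven only by the config store. */
static int
rte_eal_initialize(const struct eal_config *cfg)
{
    printf("init: %d lcore(s), hugepages in %s\n",
           cfg->lcore_count, cfg->huge_dir);
    return 0;
}

/* Backward-compatible wrapper that keeps the old rte_eal_init() signature. */
int
rte_eal_init(int argc, char **argv)
{
    struct eal_config *cfg = eal_cfg_parse_args(argc, argv);

    if (cfg == NULL)
        return -1;
    return rte_eal_initialize(cfg);
}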

I hope that clears that up, but let me know.

++Keith

>
>Neil
>
>>   - The example apps args need to be passed to the examples as is for now, 
>> then we can convert them one at a time if needed.
>> 
>> - I would like to keep the storage of the data separate from the file parser 
>> as they can use the "set" routines to build the data storage up.
>>   - Keeping them split allows for new parsers to be created, while keeping 
>> the data storage from changing.
>> - The rte_cfg code could be modified to use the new configuration if someone 
>> wants to take on that task ...
>> 
>> - Next is the data storage and how we can access the data in a clean simple 
>> way.
>> - I want to have some simple level of hierarchy in the data.
>>   - Having a string containing at least two levels "primary:secondary".
>>  - Primary string is something like "EAL" or "Pktgen" or "testpmd" to 
>> divide the data storage into logical major groups.
>> - The primary allows us to have groups and then we can have common 
>> secondary strings in different groups if needed.
>>  - Secondary string can be whatever the developer of that group would 
>> like e.g. simple "EAL:foobar", two levels "testpmd:foo.bar"
>> 
>>   - The secondary string is treated as a single string if it has a hierarchy 
>> or not, but referencing a single value in the data storage.
>>  - Key value pairs (KVP) or a hashmap data store.
>> - The key here is the whole string "EAL:foobar" not just "foobar" 
>> secondary string.
>>- If we want to have the two split I am ok with that as well 
>> meaning the API would be:
>>  rte_map_get(mapObj, "EAL", "foo.bar");
>>  rte_map_set(mapObj, "EAL", "foo.bar", value);
>>- Have the primary as a different section in the data store, 
>> would allow for dumping that section maybe easier, not sure.
>>   - I am leaning toward
>>  - Not going to try splitting up the string or parse it as it is up to 
>> the developer to make it unique in the data store.
>> - Use a code design to make the strings simple to use without having typos 
>> be a problem.
>>- Not sure what the design is yet, but I do not want to have to concat 
>> two string or split strings in the code.
>> 
>> This is as far as I have gotten and got tired of typing ...
>> 
>> I hope this will satisfy most everyone's needs for now.
>> 
>> 
>> Regards,
>> Keith
>> 
>> 
>> 
>





[dpdk-dev] [RFC] Yet another option for DPDK options

2016-06-03 Thread Wiles, Keith


Regards,
Keith



On 6/3/16, 11:04 AM, "dev on behalf of Wiles, Keith"  wrote:

>Sorry, I deleted all of the text as it was getting a bit long.
>
>Here are my thoughts as of now, which is a combination of many suggestions I 
>read from everyone's emails. I hope this is not too hard to understand.
>
>- Break out the current command line options out of the DPDK common code and 
>move into a new lib.
>  - At this point I was thinking of keeping the rte_eal_init(args, argv) API 
> and just have it pass the args/argv to the new lib to create the data storage.
> - Maybe move the rte_eal_init() API to the new lib or keep it in the 
> common eal code. Do not want to go hog wild.
>  - The rte_eal_init(args, argv) would then call to the new API 
> rte_eal_initialize(void), which in turn queries the data storage. (still 
> thinking here)
>  - The example apps args need to be passed to the examples as is for now, 
> then we can convert them one at a time if needed.
>
>- I would like to keep the storage of the data separate from the file parser 
>as they can use the "set" routines to build the data storage up.
>  - Keeping them split allows for new parsers to be created, while keeping the 
> data storage from changing.
>- The rte_cfg code could be modified to use the new configuration if someone 
>wants to take on that task ...
>
>- Next is the data storage and how we can access the data in a clean simple 
>way.
>- I want to have some simple level of hierarchy in the data.
>  - Having a string containing at least two levels "primary:secondary".
> - Primary string is something like "EAL" or "Pktgen" or "testpmd" to 
> divide the data storage into logical major groups.
>- The primary allows us to have groups and then we can have common 
> secondary strings in different groups if needed.
> - Secondary string can be whatever the developer of that group would like 
> e.g. simple "EAL:foobar", two levels "testpmd:foo.bar"
>
>  - The secondary string is treated as a single string if it has a hierarchy 
> or not, but referencing a single value in the data storage.
> - Key value pairs (KVP) or a hashmap data store.
>- The key here is the whole string "EAL:foobar" not just "foobar" 
> secondary string.
>   - If we want to have the two split I am ok with that as well 
> meaning the API would be:
> rte_map_get(mapObj, "EAL", "foo.bar");
> rte_map_set(mapObj, "EAL", "foo.bar", value);
>   - Have the primary as a different section in the data store, would 
> allow for dumping that section maybe easier, not sure.
>  - I am leaning toward
A single string, but let me know your thoughts.
> - Not going to try splitting up the string or parse it as it is up to the 
> developer to make it unique in the data store.
>- Use a code design to make the strings simple to use without having typos be 
>a problem.
>   - Not sure what the design is yet, but I do not want to have to concat two 
> string or split strings in the code.
>
>This is as far as I have gotten and got tired of typing ...
>
>I hope this will satisfy most everyone's needs for now.
>
>
>Regards,
>Keith
>
>
>
>





[dpdk-dev] [RFC] Yet another option for DPDK options

2016-06-03 Thread Wiles, Keith
Sorry, I deleted all of the text as it was getting a bit long.

Here are my thoughts as of now, which is a combination of many suggestions I 
read from everyone's emails. I hope this is not too hard to understand.

- Break out the current command line options out of the DPDK common code and 
move into a new lib.
  - At this point I was thinking of keeping the rte_eal_init(args, argv) API 
and just have it pass the args/argv to the new lib to create the data storage.
 - Maybe move the rte_eal_init() API to the new lib or keep it in the 
common eal code. Do not want to go hog wild.
  - The rte_eal_init(args, argv) would then call to the new API 
rte_eal_initialize(void), which in turn queries the data storage. (still 
thinking here)
  - The example apps args need to be passed to the examples as is for now, 
then we can convert them one at a time if needed.

- I would like to keep the storage of the data separate from the file parser as 
they can use the "set" routines to build the data storage up.
  - Keeping them split allows for new parsers to be created, while keeping the 
data storage from changing.
- The rte_cfg code could be modified to use the new configuration if someone 
wants to take on that task ...

- Next is the data storage and how we can access the data in a clean simple way.
- I want to have some simple level of hierarchy in the data.
  - Having a string containing at least two levels "primary:secondary".
 - Primary string is something like "EAL" or "Pktgen" or "testpmd" to 
divide the data storage into logical major groups.
- The primary allows us to have groups and then we can have common 
secondary strings in different groups if needed.
 - Secondary string can be whatever the developer of that group would like 
e.g. simple "EAL:foobar", two levels "testpmd:foo.bar"

  - The secondary string is treated as a single string if it has a hierarchy or 
not, but referencing a single value in the data storage.
 - Key value pairs (KVP) or a hashmap data store.
- The key here is the whole string "EAL:foobar" not just "foobar" 
secondary string.
   - If we want to have the two split I am ok with that as well, meaning 
the API would be (see the sketch after this list):
 rte_map_get(mapObj, "EAL", "foo.bar");
 rte_map_set(mapObj, "EAL", "foo.bar", value);
   - Have the primary as a different section in the data store, would 
allow for dumping that section maybe easier, not sure.
  - I am leaning toward
 - Not going to try splitting up the string or parse it as it is up to the 
developer to make it unique in the data store.
- Use a code design to make the strings simple to use without having typos be a 
problem.
   - Not sure what the design is yet, but I do not want to have to concat two 
string or split strings in the code.
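
The sketch referenced above: a minimal string-keyed store in the spirit of the proposed rte_map_get()/rte_map_set() calls. These names come from this email, not from DPDK, and a trivial linked list stands in for what would really be an rte_hash table:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct map_entry {
    char key[64];                 /* the whole "primary:secondary" string */
    const char *value;
    struct map_entry *next;
};

struct rte_map {                  /* hypothetical handle, the mapObj above */
    struct map_entry *head;
};

static int
rte_map_set(struct rte_map *m, const char *group, const char *name,
            const char *value)
{
    struct map_entry *e = calloc(1, sizeof(*e));

    if (e == NULL)
        return -1;
    snprintf(e->key, sizeof(e->key), "%s:%s", group, name);
    e->value = value;
    e->next = m->head;
    m->head = e;
    return 0;
}

static const char *
rte_map_get(struct rte_map *m, const char *group, const char *name)
{
    char key[64];
    struct map_entry *e;

    snprintf(key, sizeof(key), "%s:%s", group, name);
    for (e = m->head; e != NULL; e = e->next)
        if (strcmp(e->key, key) == 0)
            return e->value;
    return NULL;
}

The parser side would then call something like rte_map_set(&cfg, "EAL", "huge-dir", "/mnt/huge") and the EAL would later call rte_map_get(&cfg, "EAL", "huge-dir"); the store never has to understand any hierarchy inside the secondary string.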

This is as far as I have gotten and got tired of typing ...

I hope this will satisfy most everyone's needs for now.


Regards,
Keith





[dpdk-dev] [RFC] Yet another option for DPDK options

2016-06-02 Thread Wiles, Keith

On 6/2/16, 12:11 PM, "Neil Horman"  wrote:

>
>1) The definition of a config structure that can be passed to rte_eal_init,
>defining the configuration for that running process

Having a configuration structure means we have to have an ABI change to that 
structure anytime we add or remove an option. I was thinking a very simple DB 
of some kind would be better. Have the code query the DB to obtain the needed 
information. The APIs used to query and set the DB need to be very easy to use 
as well.

Maybe each option can define its own structure if needed or just a simple 
variable type can be used for the basic types (int, string, bool, ...)

Would this work better in the long run, does a fixed structure still make sense?
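
For comparison, the fixed-structure approach from item (1) above might look like the sketch below; the struct name and fields are invented for illustration, and the ABI concern is exactly that every new option changes this layout:

#include <stddef.h>

struct rte_eal_app_config {
    unsigned int nb_lcores;        /* replaces -c / -l style options */
    unsigned int nb_mem_channels;  /* replaces -n */
    size_t socket_mem_mb[8];       /* replaces --socket-mem */
    const char *huge_dir;          /* replaces --huge-dir */
    int no_pci;                    /* replaces --no-pci */
};

/* hypothetical second entry point taking the parsed configuration */
int rte_eal_init2(const struct rte_eal_app_config *cfg);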

>
>2) The creation and use of an API that various DPDK libraries can use to
>retrieve that structure (or elements thereof), based on some explicit or 
>implicit
>id, so that the configuration can be used (I'm thinking here specifically of
>multiple dpdk applications using a dpdk shared library)
>
>3) The removal of the eal_parse_args code from the core dpdk library entirely,
>packaging it instead as its own library that interprets command line arguments
>as currently defined, and populates an instance of the structure defined in (1)
>
>4) Altering the Makefiles, so that the example apps link against the new 
>library
>in (3), altering the app source code to work with the config structure defined
>in (1)
>
>With those steps, I think we will remove the command line bits from the dpdk
>core, and do so without altering the user experience for any of the sample apps
>(which will demonstrate to other developers that the same can be done with 
>their
>applications).  From there we will be free to create alternate methods of
>populating the config struct defined in (1) (via JSON file, YAML, XML, or
>whatever).
>
>Neil
>
>> >> 
>> >> For the purposes of the example apps, it would seem that either JSON, 
>> >> YAML, or
>> >> the above Lua format would work just fine.
>> >
>> >+1
>> >
>> 
>> Regards,
>> ++Keith
>> 
>> 
>





[dpdk-dev] [RFC] Yet another option for DPDK options

2016-06-02 Thread Wiles, Keith
On 6/2/16, 12:11 PM, "Neil Horman"  wrote:

>On Thu, Jun 02, 2016 at 01:53:32PM +, Wiles, Keith wrote:
>> 
>> On 6/2/16, 8:19 AM, "Thomas Monjalon"  wrote:
>> 
>> >2016-06-02 06:41, Neil Horman:
>> >> I'm not sure why you're focusing on selecting a config file format at 
>> >> all.  Why
>> 
>> The reason I am now looking at formats is because I have been thinking 
>> about this issue for some time and already understand your comments. I agree 
>> with you and Thomas, which to me would be the details needing to be done as 
>> part of the project. I guess I found the config file format a lot more fun 
>> to define. :-)
>> 
>
>Sure, it is more fun to define, but I think it's likely the wrong problem to
>solve (or perhaps not even the wrong problem, but rather the less pressing
>problem).

It is not the wrong problem, just a different starting point in the overall 
problem from the changes below. Now, I believe we have come full circle to 
identify the whole problem.

Let me look at the problem some and see if I can identify a configuration 
structure.
>> 
>> Regards,
>> ++Keith





[dpdk-dev] [RFC] Yet another option for DPDK options

2016-06-02 Thread Wiles, Keith

On 6/2/16, 8:19 AM, "Thomas Monjalon"  wrote:

>2016-06-02 06:41, Neil Horman:
>> I'm not sure why you're focusing on selecting a config file format at all.  
>> Why

The reason I am now looking at formats is because I have been thinking 
about this issue for some time and already understand your comments. I agree 
with you and Thomas, which to me would be the details needing to be done as 
part of the project. I guess I found the config file format a lot more fun to 
define. :-)

>> not just focus on removing the argument parsing from the core rte_eal_init 
>> code,
>> instead passing in a configuration struct that is stored and queried per
>> application.  Leave the parsing of a config file and population of that 
>> config
>> struct as an exercize to the application developer.  That way a given
>> application can use command line options, config files, or whatever method 
>> they
>> choose, which would be in keeping with traditional application design.

Moving the code out of DPDK into multiple libraries, one for converting the 
command line to the config structure (supporting the current options) and 
possibly a config file format library that also fills the config structure, 
would give developers options. DPDK just needs to be driven by a configuration 
structure or set of APIs and not use args/argv directly.

Moving the current args/argv code out of DPDK into a library should be easy (I 
guess) and I am willing to do that work if we think it is needed today.

>> 
>> For the purposes of the example apps, it would seem that either JSON, YAML, 
>> or
>> the above Lua format would work just fine.
>
>+1
>

Regards,
++Keith




[dpdk-dev] [RFC] Yet another option for DPDK options

2016-06-01 Thread Wiles, Keith

On 6/1/16, 11:18 AM, "Richardson, Bruce"  wrote:

>On Wed, Jun 01, 2016 at 10:58:41AM -0500, Jay Rolette wrote:
>> On Wed, Jun 1, 2016 at 10:00 AM, Wiles, Keith  
>> wrote:
>> 
>> > Started from the link below, but did not want to highjack the thread.
>> > http://dpdk.org/ml/archives/dev/2016-June/040021.html
>> >
>> > I was thinking about this problem from a user perspective and command line
>> > options are very difficult to manage specifically when you have a large
>> > number of options as we have in dpdk. I see all of these options as a type
>> > of database of information for the DPDK and the application, because the
>> > application command line options are also getting very complex as well.
>> >
>> > I have been looking at a number of different options here and the
>> > direction I was thinking was using a file for the options and
>> > configurations with the data in a clean format. It could have been an INI
>> > file or JSON or XML, but they all seem to have some problems I do not like.
>> > The INI file is too flat and I wanted a hierarchy in the data, the JSON
>> > data is similar and XML is just hard to read. I wanted to be able to manage
>> > multiple applications and possibly the system the DPDK/app runs on. The problem
>> > with the above formats is they are just data and not easy to make decisions
>> > about the system and applications at runtime.
>> >
>> 
>> INI format is simplest for users to read, but if you really need hierarchy,
>> JSON will do that just fine. Not sure what you mean by "JSON data is
>> similar"...
>> 
>> 
>I'd be quite concerned if we start needing lots of hierarchies for 
>configuration.
>
>I'd really just like to see ini file format used for this because:
>* it's a well understood, simple format
>* very easily human readable and editable
>* lots of support for it in lots of languages
>* hierarchies are possible in it too - just not as easy as in other formats
>  though. [In a previous life I worked with ini files which had address
>  hierarchies 6-levels deep in them. It wasn't hard to work with]

Maybe INI will work for hierarchies, but I bet it was not super obvious without 
a lot of comments.

>* it works well with grep since you must have one value per-line
>* it allows comments

We can have comments in any format, so that is not really a deciding factor IMHO.

>* we already have a DPDK library for parsing them
>
>However, for me the biggest advantage of using something like ini is that it
>would force us to keep things simple!

Simple is good and with any of these formats you can be simple or complex, just 
depends on the usage.

If all I wanted was to run a few examples then I would say INI is just fine. I 
would like to have a configuration file that can help me understand the system 
and pick the right options for a two socket system or one socket or system with 
4 cores or 16 or running in a VM/container or not or FreeBSD or Linux or ... you 
get the picture, all with a single image (maybe not with the FreeBSD/Linux).

In a static data format you do not get these easily (maybe if you added a bunch 
of different options and then made the code figure out which ones to use), in an 
interpreted language you do get them for free.

This is why I do not want just a database of options.

>
>I'd stay away from formats like json or XML that are designed for serializing
>entire objects or structures, and look for something that allows us to just
>specify configuration values.
>
>Regards,
>/Bruce
>
>





[dpdk-dev] [RFC] kernel paramters like DPDK CLI options

2016-06-01 Thread Wiles, Keith
Not to highjack this thread I created another one ... please have a look, thanks.
http://dpdk.org/ml/archives/dev/2016-June/040079.html

Regards,
Keith

-Original Message-
From: dev  on behalf of Thomas Monjalon 

Date: Wednesday, June 1, 2016 at 9:03 AM
To: Yuanhan Liu 
Cc: "dev at dpdk.org" , Bruce Richardson , "Tan, Jianfeng" , Stephen Hemminger 
, Christian Ehrhardt , Panu Matilainen , Olivier Matz 

Subject: Re: [dpdk-dev] [RFC] kernel paramters like DPDK CLI options

>2016-06-01 21:19, Yuanhan Liu:
>> On Wed, Jun 01, 2016 at 02:39:28PM +0200, Thomas Monjalon wrote:
>> > I was thinking to implement the library options parsing in DPDK.
>> > But if the application implements its own options parsing without using
>> > the DPDK one, yes the option parsing is obviously done in the application.
>> > 
>> > > I'd say, that would work, but I see inflexibility and some drawbacks:
>> > > 
>> > > - I would assume "--pciopt" has the input style of
>> > > 
>> > >   "domain:bus:devid:func,option1,option2,..."
>> > > 
>> > >   It then looks hard to me to use it: I need figure out the
>> > >   pci id first.
>> > 
>> > What do you suggest instead of PCI id?
>> 




[dpdk-dev] [RFC] Yet another option for DPDK options

2016-06-01 Thread Wiles, Keith
Started from the link below, but did not want to highjack the thread.
http://dpdk.org/ml/archives/dev/2016-June/040021.html

I was thinking about this problem from a user perspective and command line 
options are very difficult to manage specifically when you have a large number 
of options as we have in dpdk. I see all of these options as a type of database 
of information for the DPDK and the application, because the application 
command line options are also getting very complex as well.

I have been looking at a number of different options here and the direction I 
was thinking was using a file for the options and configurations with the data 
in a clean format. It could have been an INI file or JSON or XML, but they all 
seem to have some problems I do not like. The INI file is too flat and I wanted 
a hierarchy in the data, the JSON data is similar and XML is just hard to read. 
I wanted to be able to manage multiple applications and possibly the system the 
DPDK/app runs on. The problem with the above formats is they are just data and not 
easy to make decisions about the system and applications at runtime.

If the "database" of information could be queried by the EAL, drivers and 
application then we do not need to try and create a complex command line. It 
would be nice to execute a DPDK application like this:

./some_dpdk_app --config-file dpdk-config-filename

The dpdk-config-filename could contain a lot of information and be able to 
start up multiple different applications. The dpdk-config-file could also 
include other config files to complete the configuration. The format of the 
data in the config file needs to be readable, but allow the user to put in new 
options, needs to be hierarchical in nature and have some simple functions to 
execute if required.

The solution I was thinking of is that the file information is really just a 
fragment of a scripting language, and the DPDK application contains this 
scripting language interpreter. I was looking at using Lua (lua.org) as the 
scripting language interpreter; it is small and easy to understand. Python and 
others are very big and require a lot of resources, while Lua requires very few 
system resources. Also I did not want to have to write a parser (lex/yacc). The 
other nice feature of Lua is you can create a sandbox for the code to run in and 
limit the type of system resources and APIs that can be accessed by the 
application and configuration. Lua can be trimmed down to a fairly small size 
and builds on just about any system, or we can just install Lua on the system 
without changes from an rpm or deb.

I use Lua in pktgen at this time and the interface between "C" and Lua is very 
simple and easy. Currently I include Lua in Pktgen, but I could have just used 
a system library.

The data in the config file can be data statements along with some limited code 
to make some data changes at run time without having to modify the real 
application. Here is a simple config file I wrote; some of the options do not 
make sense to be in the file at the same time, but I wanted to show all of the 
options. The mk_lcore_list() and mk_coremap() are just Lua functions we can 
preload to help convert the simple strings into real data, in this case tables 
of information. The application section could be something like pktgen = { map = { ... }, 
more_options = 1 }; this allows the same file to possibly contain many 
application configurations. Needs a bit more work.

dpdk_default = {
    lcore_mask = 0xFF00,
    lcore_list = mk_lcore_list("0-7", 10, "14-16"),
    coremap = mk_coremap("(0-7)@0,10,(14-16)@1"),
    master_lcore = 1,
    log_level = 7,
    ranks = 4,
    channels = 2,
    memory = 512,
    socket_mem = { 512, 512 },
    huge_dir = "/mnt/huge",
    base_virtaddr = 0,
    create_uio_dev = true,
    vfio_intr = "legacy",
    xen_dom0 = false,
    proc_type = "auto",
    pci_blacklist = {
        "08:00.0",
        "08:00.1",
        "09:00.0",
        "09:00.1",
        "83:00.1",
        "87:00.0",
        "87:00.1",
        "89:00.0",
        "89:00.1"
    },
    pci_whitelist = {
    },
    vdev = {
        eth_pcap0 = { iface = "eth2" },
        eth_pcap1 = { iface = "eth3" },
    },
    driver = { },
    syslog = true,
    vmware_tsc_map = false,
    file_prefix = "pg",
    huge_unlink = true,
    no_huge = false,
    no_pci = false,
    no_hpet = false,
    no_shconf = false,
}

pktgen_data = {
    map = { ... },
    more_data = 1,
}

The EAL, drivers, application, etc. would query an API to access the data and the 
application can change its options quickly without modifying the code.
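
As a rough sketch of how small the C side of that query can be (standard Lua C API only; the dpdk_default table and master_lcore field come from the example file above, and the function names here are made up):

#include <lua.h>
#include <lauxlib.h>
#include <lualib.h>

/* Sketch: run the config fragment and read dpdk_default.master_lcore. */
static int
cfg_get_int(lua_State *L, const char *table, const char *field, int *out)
{
    lua_getglobal(L, table);        /* push the table, e.g. dpdk_default */
    if (!lua_istable(L, -1)) {
        lua_pop(L, 1);
        return -1;
    }
    lua_getfield(L, -1, field);     /* push table[field] */
    *out = (int)lua_tointeger(L, -1);
    lua_pop(L, 2);                  /* pop value and table */
    return 0;
}

int
load_dpdk_config(const char *path)
{
    int master = 0;
    lua_State *L = luaL_newstate();

    luaL_openlibs(L);
    /* the mk_lcore_list()/mk_coremap() helpers mentioned above would be
     * registered here before running the file */
    if (luaL_dofile(L, path) != 0) {
        lua_close(L);
        return -1;
    }
    cfg_get_int(L, "dpdk_default", "master_lcore", &master);
    lua_close(L);
    return master;
}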

Anyway comments are welcome.

Regards,
Keith







[dpdk-dev] [dpdk-users] memory tracker in PMD level

2016-05-31 Thread Wiles, Keith
Resend to the list for others to view.

Regards,
Keith



>Please do not send HTML based text to the list only plain text emails.
>
>Regards,
>Keith
>
>
>From: SAKTHIVEL ANAND S 
>Date: Tuesday, May 31, 2016 at 10:05 AM
>To: Keith Wiles 
>Subject: Re: [dpdk-users] memory tracker in PMD level
>
>It looks like it prints tx and rx counts in terms of numbers of packets and related 
>status (error type of the dropped packets), but actually I need the mbuf status 
>which is associated with that interface. The tx and rx count is secondary for 
>me; I would like to check for memory leaks, if any, in my code.
>
KW: All TX drops are handled by the application, as the number of 
packets placed on the TX ring is returned by the tx_burst call, not in the PMD. 
On the RX side, if mbufs are not available to attach to the data then you 
can look at the mempool stats, but that is about it.
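
For reference, a small sketch of reading those counters; rte_eth_stats_get() and rte_mempool_count() are existing DPDK calls, while the port id and mempool pointer are placeholders the application already owns:

#include <stdio.h>
#include <inttypes.h>
#include <rte_ethdev.h>
#include <rte_mempool.h>

/* Sketch: print the RX drop counters for one port and how many mbufs are
 * still free in a pool, to separate wire drops from pool exhaustion. */
static void
dump_drop_counters(uint8_t port_id, struct rte_mempool *mp)
{
    struct rte_eth_stats stats;

    rte_eth_stats_get(port_id, &stats);
    printf("port %u: imissed=%" PRIu64 " ierrors=%" PRIu64 " rx_nombuf=%" PRIu64 "\n",
           (unsigned)port_id, stats.imissed, stats.ierrors, stats.rx_nombuf);
    printf("mempool %s: %u entries free\n", mp->name, rte_mempool_count(mp));
}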
>
>On Tue, May 31, 2016 at 8:29 PM, SAKTHIVEL ANAND S  
>wrote:
>Ok thanks.
>
>
>On Tue, May 31, 2016 at 8:19 PM, Wiles, Keith  wrote:
>
>From: SAKTHIVEL ANAND S 
>Date: Tuesday, May 31, 2016 at 9:45 AM
>To: Keith Wiles 
>Subject: Re: [dpdk-users] memory tracker in PMD level
>
>Hi
>
>so what is the API I should call to print those? Is it like 
>i40e_dev_stats_get()?
>
>Please have a look at the Docs in the area of rte_eth_stats_get() 
>https://readthedocs.org/projects/dpdk/
>
>Thanks
>
>
>On Tue, May 31, 2016 at 8:12 PM, Wiles, Keith  wrote:
>
>>Hi
>>
>>How can we track packet drops at the PMD level, when the receive rings are
>>full, or by some other means?
>>
>>Let's assume my processing speed is less than the receive line rate, so the PMD
>>rings are overrun and incoming packets are getting dropped. How should I
>>track this issue (status of mbufs and dropped packet counts in the PMD)? Are
>>there any APIs or tools available? Will libtrace help here?
>
>In most/all of the PMD drivers the stats will give you dropped packets at the 
>wire.
>
>>
>>any help would be much appreciated.
>>
>>--
>>Thanks
>>Sakthivel S OM
>>
>
>
>
>
>
>--
>Thanks
>Sakthivel S OM
>
>
>
>
>
>
>-- 
>Thanks
>Sakthivel S OM
>
>
>
>-- 
>Thanks
>Sakthivel S OM
>
>
>
>
>





[dpdk-dev] about memzone name size issue

2016-05-31 Thread Wiles, Keith
>Hi all,
>I found an issue in the link_bonding unit test case.
>
>When I run the model6 test case, it generates a core dump error.
>I debugged it and found the error code in this function:
>rte_mempool_create_empty(const char *name, unsigned n, unsigned elt_size,
>unsigned cache_size, unsigned private_data_size,
>int socket_id, unsigned flags)
>{
>  ...
>ret = snprintf(mz_name, sizeof(mz_name), 
> RTE_MEMPOOL_MZ_FORMAT, name);
>if (ret < 0 || ret >= (int)sizeof(mz_name)) {
>rte_errno = ENAMETOOLONG;
>goto exit_unlock;
>}
>   .
>}
>The memzone name size is only 32 bytes, but the mz_name in link_bonding is bigger 
>than 32 bytes. Could we set the memzone name size to 64 bytes?
>Thanks a lot

Having a name of 32 bytes is pretty big IMO; what would be a better reason for 
changing an internal structure, as it will take two releases to make that change?
How big is the name string you are passing into the routine?
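
As a hedged illustration of where the limit bites: the memzone name is capped at RTE_MEMZONE_NAMESIZE (32) bytes and the mempool code prepends a short prefix before the length check quoted above, so a caller can pre-check its name. The "MP_" prefix below is an assumption about what RTE_MEMPOOL_MZ_FORMAT expands to:

#include <string.h>
#include <rte_memzone.h>   /* RTE_MEMZONE_NAMESIZE */

/* Sketch: return 0 if a mempool name will pass the snprintf() length check
 * in rte_mempool_create_empty(); the "MP_" prefix is an assumption. */
static int
mempool_name_fits(const char *name)
{
    return (strlen("MP_") + strlen(name) < RTE_MEMZONE_NAMESIZE) ? 0 : -1;
}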
>





[dpdk-dev] rte_pmd_init_all is missing while upgrading from DPDK 1.6 to DPDK 1.7

2016-05-24 Thread Wiles, Keith
The PMDs are now being initialized in the EAL init call and that API 
is not required anymore. If you are adding a new driver, make sure it is up to 
date with any changes in the driver code too.

Regards,
Keith



>Hi,
>
>I have rte_pmd_init_all used in my code. While upgrading from DPDK 1.6 to
>DPDK 1.7, I see this function is missing in DPDK 1.7. I
>removed rte_pmd_init_all from my code, but I am not sure whether
>initialization is done for all poll mode drivers
>
>What is the right way to do this?. Can anyone please help in this.
>
>Thanks,
>Shiva
>





[dpdk-dev] [PATCH] Add rte_mempool_free

2016-05-16 Thread Wiles, Keith
>There is no inverse of rte_mempool_create, so this patch adds one.
>The typical usage of rte_mempool_create is to create a pool at
>initialization time and only to free it upon program exit, so an
>rte_mempool_free function at first seems to be of little value.
>However, it is very useful as a sanity check for a clean shutdown
>when used in conjunction with tools like AddressSanitizer. Further,
>the call itself verifies that all elements have been returned to
>the pool or it fails.
>
>Signed-off-by: Ben Walker 
>---
> lib/librte_mempool/rte_dom0_mempool.c | 22 +++
> lib/librte_mempool/rte_mempool.c  | 70 +++
> lib/librte_mempool/rte_mempool.h  | 41 
> 3 files changed, 133 insertions(+)
>
>diff --git a/lib/librte_mempool/rte_dom0_mempool.c 
>b/lib/librte_mempool/rte_dom0_mempool.c
>index 0d6d750..edf2d58 100644
>--- a/lib/librte_mempool/rte_dom0_mempool.c
>+++ b/lib/librte_mempool/rte_dom0_mempool.c
>@@ -131,3 +131,25 @@ rte_dom0_mempool_create(const char *name, unsigned 
>elt_num, unsigned elt_size,
> 
>   return mp;
> }
>+
>+/* free the mempool supporting Dom0 */
>+int
>+rte_dom0_mempool_free(struct rte_mempool *mp)
>+{
>+  const struct rte_memzone *mz;
>+  char mz_name[RTE_MEMZONE_NAMESIZE];
>+  int rc;
>+
>+  rc = rte_mempool_xmem_free(mp);
>+  if (rc) {
>+  return rc;
>+  }

Remove {} on single line statements.
>+
>+  snprintf(mz_name, sizeof(mz_name), RTE_MEMPOOL_OBJ_NAME, mp->name);
>+  mz = rte_memzone_lookup(mz_name);
>+  if (mz) {
>+  rte_memzone_free(mz);
>+  }

Here too.
>+
>+  return 0;
>+}
>diff --git a/lib/librte_mempool/rte_mempool.c 
>b/lib/librte_mempool/rte_mempool.c
>index 70812d9..82f645e 100644
>--- a/lib/librte_mempool/rte_mempool.c
>+++ b/lib/librte_mempool/rte_mempool.c
>@@ -638,6 +638,76 @@ exit_unlock:
>   return NULL;
> }
> 
>+#ifndef RTE_LIBRTE_XEN_DOM0
>+/* stub if DOM0 support not configured */
>+int
>+rte_dom0_mempool_free(struct rte_mempool *mp __rte_unused)
>+{
>+  rte_errno = EINVAL;
>+  return -1;

I was thinking this should just return OK or zero. The chance of it being called 
is very low and maybe it will not be called, right? If so, then do we need the 
function?
>+}
>+#endif
>+
>+int
>+rte_mempool_free(struct rte_mempool *mp)
>+{
>+  if (rte_xen_dom0_supported())
>+  return rte_dom0_mempool_free(mp);
>+  else
>+  return rte_mempool_xmem_free(mp);
>+}
>+
>+
>+/* Free the memory pool */
>+int
>+rte_mempool_xmem_free(struct rte_mempool *mp)
>+{
>+  char mz_name[RTE_MEMZONE_NAMESIZE];
>+  struct rte_mempool_list *mempool_list;
>+  struct rte_tailq_entry *te = NULL;
>+  const struct rte_memzone *mz;
>+  unsigned count;
>+
>+  if (!mp) {
>+  return 0;
>+  }
Remove the extra {}
>+
>+  count = rte_mempool_free_count(mp);
>+  if (count != 0) {
>+  /* All elements must be returned to the pool before free */
>+  return count;
>+  }

This one also does not really need the {}
>+
>+  rte_rwlock_write_lock(RTE_EAL_MEMPOOL_RWLOCK);
>+
>+  /* Free the ring associated with this mempool */
>+  if (mp->ring) {
>+  rte_ring_free(mp->ring);
>+  }

This one too.
>+
>+  /* Remove the entry from the mempool list and free it. */
>+  rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
>+  mempool_list = RTE_TAILQ_CAST(rte_mempool_tailq.head, rte_mempool_list);
>+  TAILQ_FOREACH(te, mempool_list, next) {
>+  if ((struct rte_mempool *)te->data == mp) {
>+  TAILQ_REMOVE(mempool_list, te, next);
>+  rte_free(te);
>+  break;
>+  }
>+  }
>+  rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
>+
>+  snprintf(mz_name, sizeof(mz_name), RTE_MEMPOOL_MZ_FORMAT, mp->name);
>+  mz = rte_memzone_lookup(mz_name);
>+  if (mz) {
>+  rte_memzone_free(mz);
>+  }

This one too.
>+
>+  rte_rwlock_write_unlock(RTE_EAL_MEMPOOL_RWLOCK);
>+
>+  return 0;
>+}

The big question is how do you know the mempool is not being used someplace?

>+
> /* Return the number of entries in the mempool */
> unsigned
> rte_mempool_count(const struct rte_mempool *mp)
>diff --git a/lib/librte_mempool/rte_mempool.h 
>b/lib/librte_mempool/rte_mempool.h
>index 9745bf0..26949c7 100644
>--- a/lib/librte_mempool/rte_mempool.h
>+++ b/lib/librte_mempool/rte_mempool.h
>@@ -728,6 +728,47 @@ rte_dom0_mempool_create(const char *name, unsigned n, 
>unsigned elt_size,
>   rte_mempool_obj_ctor_t *obj_init, void *obj_init_arg,
>   int socket_id, unsigned flags);
> 
>+/**
>+ * Free the memory pool created by rte_mempool_create
>+ *
>+ * All elements must be placed back in the pool prior to calling this 
>function.
>+ *
>+ * @param mp
>+ *   A pointer to the mempool structure.
>+ * @return
>+ *   0 on success. -1 on error  

[dpdk-dev] [RFC PATCH] tools:new tool for system info CPU, memory and huge pages

2016-05-16 Thread Wiles, Keith
>2016-05-16 14:11, Wiles, Keith:
>> >2016-05-13 15:48, Wiles, Keith:
>> >> I created this new tool to combine some information and use /sys/devices 
>> >> instead. What I was hoping was some of you could try this script and see 
>> >> if it works correctly. Also I was hoping to find out if this script is 
>> >> useful and what other features you would like in this type of tool.
>> >
>> >What is the intent of this script? Is it to be used for bug report?
>> >There already have some tools to display system informations. Why adding
>> >one more?
>> >Examples of useful tools: hwloc/lstopo, lspci, hugeadm.
>> 
>> I was looking to replace the cpu_layout.py tool which uses the /procfs 
>> instead of /sysfs, just figured we could then add some extra information 
>> into this script as well. Yes, we have other tools, but some people do not 
>> know or use or install these tools. I was hoping this one would be able to 
>> display a number of things to help the developer and us in helping them 
>> debug problems.
>> 
>> Stephen Hemminger sent an email about the use of sysfs instead of procfs.
>> http://dpdk.org/ml/archives/dev/2016-May/038560.html
>
>I agree that cpu_layout.py should be removed.
>Should we implement something else? Or just point to existing tools?

Well I did create something else :-) As for pointing to existing tools, we 
should do that anyway in the documentation.
>Or call existing tools from a small script?

Calling the existing tools could be a problem as they may not be installed, but 
the user could install the tools after the script points that out.

>Is it the DPDK focus to develop and maintain such tool?
I cannot answer that question per se. I can at least be the maintainer for 
this script, or someone else can.

I think we need something that pulls all of the information from the system for 
the developer to see at a quick glance to help debug his system. Also we can 
then say "run this script and post the output to the list", which is nice and 
simple as a debug aid.
>


Regards,
Keith






[dpdk-dev] [RFC PATCH] tools:new tool for system info CPU, memory and huge pages

2016-05-16 Thread Wiles, Keith
>2016-05-13 15:48, Wiles, Keith:
>> I created this new tool to combine some information and use /sys/devices 
>> instead. What I was hoping was some of you could try this script and see if 
>> it works correctly. Also I was hoping to find out if this script is useful and 
>> what other features you would like in this type of tool.
>
>What is the intent of this script? Is it to be used for bug reports?
>There are already some tools to display system information. Why add
>one more?
>Examples of useful tools: hwloc/lstopo, lspci, hugeadm.

I was looking to replace the cpu_layout.py tool which uses the /procfs instead 
of /sysfs, just figured we could then add some extra information into this 
script as well. Yes, we have other tools, but some people do not know or use or 
install these tools. I was hoping this one would be able to display a number of 
things to help the developer and us in helping them debug problems.

Stephen Hemminger sent an email about the use of sysfs instead of procfs.
http://dpdk.org/ml/archives/dev/2016-May/038560.html


>


Regards,
Keith






[dpdk-dev] [RFC PATCH] tools:new tool for system info CPU, memory and huge pages

2016-05-13 Thread Wiles, Keith
Sorry this one should have been an RFC instead of a patch.

I created this new tool to combine some information and use /sys/devices 
instead. What I was hoping was some of you could try this script and see if it 
works correctly. Also I was hoping to find out if this script is useful and what 
other features you would like in this type of tool.

I am not a Python coder, so if you have suggestions on the code please let me 
know.

Thanks

>The new tool uses /sys/devices instead of /proc directory, which
>does not exist on all systems. If the procfs is not available
>then memory and huge page information is skipped.
>
>The tool also can emit a json format in short or long form to
>allow for machine readable information.
>
>Here is the usage information:
>
>Usage: sys_info.py [options]
>
>Show the lcores layout with core id and socket(s).
>
>Options:
>--help or -h:
>Display the usage information and quit
>
>--long or -l:
>Display the information in a machine readable long format.
>
>--short or -s:
>Display the information in a machine readable short format.
>
>--pci or -p:
>Display all of the Ethernet devices in the system using 'lspci'.
>
>--version or -v:
>Display the current version number of this script.
>
>--debug or -d:
>Output some debug information.
>
>default:
>Display the information in a human readable format.
>
>Signed-off-by: Keith Wiles 
>---
> tools/sys_info.py | 551 ++
> 1 file changed, 551 insertions(+)
> create mode 100755 tools/sys_info.py
>
>diff --git a/tools/sys_info.py b/tools/sys_info.py
>new file mode 100755
>index 000..5e09d12
>--- /dev/null
>+++ b/tools/sys_info.py
>@@ -0,0 +1,551 @@
>+#! /usr/bin/python
>+#
>+#   BSD LICENSE
>+#
>+#   Copyright(c) 2016 Intel Corporation. All rights reserved.
>+#   All rights reserved.
>+#
>+#   Redistribution and use in source and binary forms, with or without
>+#   modification, are permitted provided that the following conditions
>+#   are met:
>+#
>+# * Redistributions of source code must retain the above copyright
>+#   notice, this list of conditions and the following disclaimer.
>+# * Redistributions in binary form must reproduce the above copyright
>+#   notice, this list of conditions and the following disclaimer in
>+#   the documentation and/or other materials provided with the
>+#   distribution.
>+# * Neither the name of Intel Corporation nor the names of its
>+#   contributors may be used to endorse or promote products derived
>+#   from this software without specific prior written permission.
>+#
>+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
>+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
>+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
>+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
>+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
>+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
>+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
>+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
>+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
>+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
>+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
>+#
>+
>+import os, sys
>+import getopt
>+import json
>+from os.path import exists, abspath, dirname, basename
>+
>+version = "0.1.3"
>+
>+# Global lists and flags
>+machine_readable = 0
>+show_pci = False
>+debug_flag = False
>+coremaps_flag = False
>+
>+sys_info = {}
>+coremaps = {}
>+
>+def proc_cpuinfo_path():
>+'''Return the cpu information from /proc'''
>+return "/proc/cpuinfo"
>+
>+def proc_sysinfo_path():
>+'''Return the system path string from /proc'''
>+return "/proc/sysinfo"
>+
>+def proc_meminfo_path():
>+'''Return the memory information path from /proc'''
>+return "/proc/meminfo"
>+
>+def sys_system_path():
>+'''Return the system path string from /sys'''
>+return "/sys/devices/system"
>+
>+def read_file(path, whole_file=False):
>+'''Read the first line of a file'''
>+
>+if os.access(path, os.F_OK) == False:
>+print "Path (%s) Not found" % path
>+return ""
>+
>+fd = open(path)
>+if whole_file == True:
>+lines = fd.readlines()
>+else:
>+line = fd.readline()
>+fd.close()
>+
>+if whole_file == True:
>+return lines
>+
>+return line
>+
>+def get_range(line):
>+'''Split a line and convert to low/high values'''
>+
>+line = line.strip()
>+
>+if '-' in line:
>+low, high = line.split("-")
>+elif ',' in line:
>+low, high = line.split(",")
>+else:
>+return [int(line)]
>+
>+return [int(low), int(high)]
>+
>+def 

[dpdk-dev] [PATCH] mbuf: add helpers to prefetch mbuf

2016-05-09 Thread Wiles, Keith
>diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
>index 529debb..e3ee0b3 100644
>--- a/lib/librte_mbuf/rte_mbuf.h
>+++ b/lib/librte_mbuf/rte_mbuf.h
>@@ -842,6 +842,44 @@ struct rte_mbuf {
>   uint16_t timesync;
> } __rte_cache_aligned;
> 
>+/**
>+ * Prefetch the first part of the mbuf
>+ *
>+ * The first 64 bytes of the mbuf corresponds to fields that are used early
>+ * in the receive path. If the cache line of the architecture is higher than
>+ * 64B, the second part will also be prefetched.
>+ *
>+ * @param m
>+ *   The pointer to the mbuf.
>+ */
>+static inline void
>+rte_mbuf_prefetch_part0(struct rte_mbuf *m)
>+{
>+  rte_prefetch0(&m->cacheline0);
>+}
>+
>+/**
>+ * Prefetch the second part of the mbuf
>+ *
>+ * The next 64 bytes of the mbuf corresponds to fields that are used in the
>+ * transmit path. If the cache line of the architecture is higher than 64B,
>+ * this function does nothing as it is expected that the full mbuf is
>+ * already in cache.
>+ *
>+ * @param m
>+ *   The pointer to the mbuf.
>+ */
>+static inline void
>+rte_mbuf_prefetch_part1(struct rte_mbuf *m)
>+{
>+#if RTE_CACHE_LINE_SIZE == 64
>+  rte_prefetch0(&m->cacheline1);
>+#else
>+  RTE_SET_USED(m);
>+#endif
>+}

I am not super happy with the names here, but I understand that 
rte_mbuf_prefetch_cacheline0() is a bit long. I could live with them being 
longer if that makes more sense and adds to readability.

Another idea is to have only one function for both:

enum { MBUF_CACHELINE0 = 0, MBUF_CACHELINE1, MBUF_CACHELINES }; // 
Optional enum if you want

static inline void
rte_mbuf_prefetch(struct rte_mbuf *m, unsigned cacheline)   // Make sure we 
add a comment about the constant value
{
    if (cacheline == MBUF_CACHELINE0)
        rte_prefetch0(&m->cacheline0);
    else if (cacheline == MBUF_CACHELINE1)
        rte_prefetch0(&m->cacheline1);
    else {
        rte_prefetch0(&m->cacheline0);
        rte_prefetch0(&m->cacheline1);
    }
}

I believe if you use a constant value in the call for the cacheline variable then 
the extra code should be optimized out. If not, then what about a macro instead?

#define rte_mbuf_prefetch(m, c) \
do { \
    if ((c) == MBUF_CACHELINE0) \
        rte_prefetch0(&(m)->cacheline0); \
    else if ((c) == MBUF_CACHELINE1) \
        rte_prefetch0(&(m)->cacheline1); \
    else { \
        rte_prefetch0(&(m)->cacheline0); \
        rte_prefetch0(&(m)->cacheline1); \
    } \
} while (0)

Call like this:
rte_mbuf_prefetch(m, 0);// For cacheline 0
rte_mbuf_prefetch(m, 1);// For cacheline 1
rte_mbuf_prefetch(m, 2);// For cacheline 0 and 1

We could have another routine:
rte_mbuf_prefetch_data(m, 0);   // Prefetch the first cacheline of the 
packet data.

Just a thought and I did not test the above code, so I hope it works that way. 
I noticed something like this in the linux spinlock code a few years ago.



>+
>+
> static inline uint16_t rte_pktmbuf_priv_size(struct rte_mempool *mp);
> 
> /**
>-- 
>2.8.0.rc3
>
>


Regards,
Keith






[dpdk-dev] Fwd: Rx Error when two pktgen sending packet to each other at the same time on Intel 82599ES 10G

2016-05-06 Thread Wiles, Keith
>Hi,
>
>I am doing experiments about packet classification algorithm and I found I
>always get RX Error when the throughput is too high, so I did the following
>experiments.
>
>There are two PC servers (A and B), each of them has a Intel 82599ES 10G
>with two ports(1 and 2). And they are connected to each other. I simply run
>pktgen on both server.
>
>When I start one of the server and let it generate 10Gb/s traffic each
>port, which is start A1 and A2. I can receive the 10Gb/s traffic on the
>other server's every port.
>
>When I start one port of the two servers and let them generate 10Gb/s
>traffic to each other, which is start A1 and B1. Both the port show that
>they can receive 10Gb/s and send 10Gb/s traffic.
>
>But When I start both port on both server, which is start A1, A2, B1, B2,
>each port shows it can generate 10Gb/s but it can only receive 6.7Gb/s
>traffic and the left 3.3Gb/s are considered RX Error.
>
>When I stop one of the ports, which is start A1, B1, B2, On server A I
>receive 8.1Gb/s on A1, 8.4Gb/s on A2, while A1 is sending 10Gb/s traffic,
>the traffic left is considered RX Error. I receive 10Gb/s traffic on B1,
>sending 10Gb/s traffic on B1 and B2.
>
>My parameter is -c 0xff -n 4 -- -P -m "[1:2].0, [3:4].1", but it won't
>change when I assign more lcore on RX queue.

Are the cores 1, 2, 3, 4 on different physical cores each or do they share a 
physical core?

One of the issues is that the PCI bus on some NICs has a bottleneck on the bus and 
limits the amount of traffic for a given NIC. This could be your problem, but I 
do not know if these NICs have that problem.
>
>I'm thinking it maybe is the parameter problem, such as the Hugepage or
>others, is there any solution or advices?
>


Regards,
Keith






[dpdk-dev] DPDK cpu layout script won't work on non x86

2016-05-06 Thread Wiles, Keith
>The format and layout of /proc/cpuinfo changes on different architectures,
>therefore the DPDK cpu_layout.py tool doesn't work.
>
>Could you please change the tool to use sysfs 
>/sys/devices/system/cpu/cpuN/topology/
>instead? Then the script would be portable and use the same files as the DPDK
>use internally when mapping out cores.

I will look into it and see what I can do. If someone else wants to try please 
let me know.
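
A minimal sketch of reading the topology files the request above points at (the sysfs paths are standard Linux; error handling is kept short):

#include <stdio.h>

static int
read_sysfs_int(const char *path, int *val)
{
    FILE *f = fopen(path, "r");
    int rc = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", val) == 1)
        rc = 0;
    fclose(f);
    return rc;
}

/* Sketch: core_id and physical_package_id for one lcore, straight from sysfs. */
static int
read_topology(unsigned lcore, int *core_id, int *socket_id)
{
    char path[128];

    snprintf(path, sizeof(path),
             "/sys/devices/system/cpu/cpu%u/topology/core_id", lcore);
    if (read_sysfs_int(path, core_id) != 0)
        return -1;
    snprintf(path, sizeof(path),
             "/sys/devices/system/cpu/cpu%u/topology/physical_package_id",
             lcore);
    return read_sysfs_int(path, socket_id);
}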
>
>


Regards,
Keith






[dpdk-dev] Question about memzone failure problem.

2016-04-22 Thread Wiles, Keith

I was creating an rte_ring and got the message "RING: Cannot reserve memory". I 
tracked it down to me trying to create a ring using the same name as another ring. 
While tracking down the problem I noticed, in the eal_common_memzone.c file, 
that the message for the problem was logged at the DEBUG level and I 
think it should be ERR. Should it be ERR instead of DEBUG? If so I will send a 
patch.

diff --git a/lib/librte_eal/common/eal_common_memzone.c 
b/lib/librte_eal/common/eal_common_memzone.c
index 711c845..89ffc12 100644
--- a/lib/librte_eal/common/eal_common_memzone.c
+++ b/lib/librte_eal/common/eal_common_memzone.c
@@ -142,7 +142,7 @@ memzone_reserve_aligned_thread_unsafe(const char *name, 
size_t len,

/* zone already exist */
if ((memzone_lookup_thread_unsafe(name)) != NULL) {
-   RTE_LOG(DEBUG, EAL, "%s(): memzone <%s> already exists\n",
+   RTE_LOG(ERR, EAL, "%s(): memzone <%s> already exists\n",
__func__, name);
rte_errno = EEXIST;
return NULL;


Regards,
Keith






[dpdk-dev] [PATCH] Rate as a decimal number

2016-04-21 Thread Wiles, Keith
>Currently Pktgen does not accept a decimal number for the rate. This patch 
>makes it possible to set a decimal number as the rate.
>
>Signed-off-by: Iñaki Murillo Arroyo 
>---
> app/cmd-functions.c|   71 +-
> app/cmd-functions.h|   38 +
> app/lpktgenlib.c   |2 +-
> app/pktgen-cmds.c  |   10 +-
> app/pktgen-cmds.c.orig | 2622 
> app/pktgen-cmds.h  |2 +-
> app/pktgen-port-cfg.h  |2 +-
> app/pktgen.c   |2 +-
> 8 files changed, 2737 insertions(+), 12 deletions(-)
> create mode 100644 app/pktgen-cmds.c.orig
>
>diff --git a/app/cmd-functions.c b/app/cmd-functions.c
>index b2fda7c..a009907 100644
>--- a/app/cmd-functions.c
>+++ b/app/cmd-functions.c
>@@ -1805,7 +1805,7 @@ struct cmd_set_result {
>   cmdline_fixed_string_t set;
>   cmdline_portlist_t portlist;
>   cmdline_fixed_string_t what;
>-  uint32_t value;
>+  float value;
> };
> 



Looks like you checked in the pktgen-cmds.c.orig file and it shows up here as 
the complete file. Can you remove it and resend the patch with Pktgen in 
the subject, marked as the v2 version.

Thanks

>   rate = 1;
>diff --git a/app/pktgen-cmds.c.orig b/app/pktgen-cmds.c.orig
>new file mode 100644
>index 000..83ba230
>--- /dev/null
>+++ b/app/pktgen-cmds.c.orig
>@@ -0,0 +1,2622 @@
>+/*-
>+ * Copyright (c) <2010>, Intel Corporation
>+ * All rights reserved.
>+ *
>+ * Redistribution and use in source and binary forms, with or without
>+ * modification, are permitted provided that the following conditions
>+ * are met:
>+ *
>+ * - Redistributions of source code must retain the above copyright
>+ *   notice, this list of conditions and the following disclaimer.
>+ *
>+ * - Redistributions in binary form must reproduce the above copyright
>+ *   notice, this list of conditions and the following disclaimer in
>+ *   the documentation and/or other materials provided with the
>+ *   distribution.
>+ *
>+ * - Neither the name of Intel Corporation nor the names of its
>+ *   contributors may be used to endorse or promote products derived
>+ *   from this software without specific prior written permission.
>+ *
>+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
>+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
>+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
>+ * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
>+ * COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
>+ * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
>+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
>+ * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
>+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
>+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
>+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
>+ * OF THE POSSIBILITY OF SUCH DAMAGE.
>+ */


>
>diff --git a/app/pktgen-cmds.h b/app/pktgen-cmds.h
>index 683abc3..1c50e59 100644
>--- a/app/pktgen-cmds.h
>+++ b/app/pktgen-cmds.h
>@@ -135,7 +135,7 @@ extern void pktgen_set_pkt_size(port_info_t *info, 
>uint32_t size);
> extern void pktgen_set_port_value(port_info_t *info,
>   char type,
>   uint32_t portValue);
>-extern void pktgen_set_tx_rate(port_info_t *info, uint32_t rate);
>+extern void pktgen_set_tx_rate(port_info_t *info, float rate);
> extern void pktgen_set_ipaddr(port_info_t *info, char type,
>   cmdline_ipaddr_t *ip);
> extern void pktgen_set_dst_mac(port_info_t *info, cmdline_etheraddr_t *mac);
>diff --git a/app/pktgen-port-cfg.h b/app/pktgen-port-cfg.h
>index bd5876c..a20286d 100644
>--- a/app/pktgen-port-cfg.h
>+++ b/app/pktgen-port-cfg.h
>@@ -193,7 +193,7 @@ typedef struct port_info_s {
>   uint16_t pid;   /**< Port ID value */
>   uint16_t tx_burst;  /**< Number of TX burst packets */
>   uint8_t pad0;
>-  uint8_t tx_rate;/**< Percentage rate for tx packets */
>+  float tx_rate;  /**< Percentage rate for tx packets */
>   rte_atomic32_t port_flags;  /**< Special send flags for ARP and 
> other */
> 
>   rte_atomic64_t transmit_count;  /**< Packets to transmit loaded into 
> current_tx_count */
>diff --git a/app/pktgen.c b/app/pktgen.c
>index a0705ab..a64e0ca 100644
>--- a/app/pktgen.c
>+++ b/app/pktgen.c
>@@ -143,7 +143,7 @@ pktgen_packet_rate(port_info_t *info)
>   info->tx_pps= pps;
>   info->tx_cycles = ((cpp * info->tx_burst) /
>  wr_get_port_txcnt(pktgen.l2p, info->pid));
>-  info->tx_cycles -= ff[info->tx_rate / 10];
>+  info->tx_cycles -= ff[(int)info->tx_rate / 10];
> }
> 
> /**//**
>-- 
>1.9.1
>
>


Regards,
Keith






[dpdk-dev] Pktgen rate issue

2016-04-21 Thread Wiles, Keith
>Hello Keith,
>
>I have done a patch to pktgen, but I am not sure how to submit it. I 
>would be very grateful if you could give me some advice on how to contribute.

Please have a look at the DPDK.org page on development : http://dpdk.org/dev

You could send me the patch on the current Pktgen and I can review the changes.

>
>Thank you,
>
> Iñaki Murillo
>
>El 12/04/16 a las 14:55, Wiles, Keith escribi?:
>>> Hello
>>>
>>> I have been using pktgen for a while and I realised that there is no
>>> possibility to set a rate between two whole numbers.
>>>
>>>   I looked up the source code and I found out that the rate is stored in
>>> a uint8_t. So, I made a quick-and-dirty change just to check if it is
>>> possible to get a speed between two whole numbers. It seemed to work
>>> properly.
>>>
>>>   Is there a reason to use a uint8_t instead of a float, for example?
>> The uint8_t was easier than using floats :-)
>>
>> If you have a patch for this change please send it along and I will look at 
>> adding the support.
>>
>>> Thank you,
>>>
>>> Iñaki Murillo
>>>
>>
>> Regards,
>> Keith
>>
>>
>>
>>
>
>


Regards,
Keith






[dpdk-dev] perfomance of rte_lpm rule subsystem

2016-04-20 Thread Wiles, Keith
>I just realized that my patch could be confusing. I want to emphasize that it 
>contains two completely different and independent sets of changes. One is the new 
>rule subsystem and the other is the 64-bit next hop. Maybe I should've prepared a 
>patch with only the rule changes, but I wanted to discuss first and, if there would 
>be interest, spend some time and make the final patch containing only the rule 
>changes.

Normally if you have two or more distinct changes then you need to split them up 
into different patch sets. Each change needs to be a complete patch and 
independent, unless you have an order issue with the patches.

> 
>
>Please ignore the next hop changes. They have nothing to do with the rule 
>subsystem changes. The rule changes don't need the new next hop and I don't want 
>the 64-bit next hop to be part of dpdk.
>
>>> Hi.
>>> 
>>> Doing some tests with rte_lpm (adding/deleting bgp full table rules) I
>>> noticed that
>>> the rule subsystem is very slow, even considering that it probably was never
>>> designed for use
>>> in a data forwarding plane. So I want to propose some changes to the "rule"
>>> subsystem.
>>> 
>>> I reimplemented the rule part of the lib using rte_hash, and the performance of
>>> adding/deleting routes has increased dramatically.
>>> If increasing the speed of adding/deleting routes makes sense for anybody else
>>> I would like to discuss my patch.
>>> The patch also include changes that make next_hop 64 bit, so please just
>>> ignore them. The rule changes are in the following
>>> functions only:
>>> 
>>> rte_lpm2_create
>>> 
>>> rule_find
>>> rule_add
>>> rule_delete
>>> find_previous_rule
>>> delete_depth_small
>>> delete_depth_big
>>> 
>>> rte_lpm2_add
>>> rte_lpm2_delete
>>> rte_lpm2_is_rule_present
>>> rte_lpm2_delete_all
>


Regards,
Keith






[dpdk-dev] [PATCH v2 1/1] cmdline: add any multi string mode to token string

2016-04-15 Thread Wiles, Keith
>While parsing a token string there may be several modes:
>- fixed single string
>- multi-choice single string
>- any single string
>
>This patch adds one more mode - any multi string.
>
>Signed-off-by: Piotr Azarewicz 

Does this patch also require updates to the documentation?
>
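
For context, here is a minimal sketch of how the three existing string-token
modes are declared with librte_cmdline; the command and struct names below are
made up for illustration:

#include <cmdline_parse.h>
#include <cmdline_parse_string.h>

/* Hypothetical command result structure. */
struct cmd_demo_result {
    cmdline_fixed_string_t action;   /* fixed single string */
    cmdline_fixed_string_t choice;   /* multi-choice single string */
    cmdline_fixed_string_t any;      /* any single string */
};

/* Fixed single string: only "demo" matches. */
static cmdline_parse_token_string_t cmd_demo_action =
    TOKEN_STRING_INITIALIZER(struct cmd_demo_result, action, "demo");

/* Multi-choice single string: '#' separates the accepted words. */
static cmdline_parse_token_string_t cmd_demo_choice =
    TOKEN_STRING_INITIALIZER(struct cmd_demo_result, choice, "on#off");

/* Any single string: a NULL pattern accepts one arbitrary word. The patch
 * under review adds a fourth mode that accepts several words. */
static cmdline_parse_token_string_t cmd_demo_any =
    TOKEN_STRING_INITIALIZER(struct cmd_demo_result, any, NULL);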

Regards,
Keith






[dpdk-dev] [PATCH PktGen/PCAP] let multi tx queues with pcap work

2016-04-15 Thread Wiles, Keith
Thank you.

>Currently, dpdk-pktgen will panic when assigning multiple tx queues with pcap
>packets, for example:
>./app/app/build/pktgen  -c f -n 2 -- -P -m "[0:1-3].0"  -s 0:pcap/large.pcap
>
>This patch fixes this bug by adding the qid to the memory pool name.
>
>Signed-off-by: Zhouyi Zhou 
>---
> app/pktgen-pcap.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/app/pktgen-pcap.c b/app/pktgen-pcap.c
>index 1af0a53..960294c 100644
>--- a/app/pktgen-pcap.c
>+++ b/app/pktgen-pcap.c
>@@ -337,7 +337,7 @@ pktgen_pcap_parse(pcap_info_t *pcap, port_info_t *info, 
>unsigned qid)
> 
>   wr_pcap_rewind(pcap);
> 
>-  snprintf(name, sizeof(name), "%-12s%d:%d", "PCAP TX", info->pid, 0);
>+  snprintf(name, sizeof(name), "%-12s%d:%d", "PCAP TX", info->pid, qid);
>   rte_printf_status("Process: %-*s ", 18, name);
> 
>   pkt_sizes = elt_count = i = 0;
>-- 
>1.9.1
>
>
>
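
The underlying constraint is that mempool names (and the memzones behind them)
must be unique, so each TX queue needs its own pool name. Here is a minimal
sketch of per-port, per-queue naming; the pool sizing values are illustrative
and not Pktgen's actual numbers:

#include <stdio.h>
#include <rte_mbuf.h>
#include <rte_mempool.h>

/* Hypothetical per-queue pool creation: encoding both the port id and the
 * queue id in the name avoids the duplicate-name failure seen when several
 * TX queues all try to create "PCAP TX <pid>:0". */
static struct rte_mempool *
pcap_tx_pool_create(uint16_t pid, uint16_t qid, int socket_id)
{
    char name[RTE_MEMPOOL_NAMESIZE];

    snprintf(name, sizeof(name), "PCAP_TX_%u_%u", pid, qid);

    /* 8K mbufs, 256-entry cache, default data room: illustrative only. */
    return rte_pktmbuf_pool_create(name, 8192, 256, 0,
                                   RTE_MBUF_DEFAULT_BUF_SIZE, socket_id);
}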


Regards,
Keith






[dpdk-dev] [PATCH 26/36] mempool: introduce a function to create an empty mempool

2016-04-15 Thread Wiles, Keith
>
>
>On 04/14/2016 05:57 PM, Wiles, Keith wrote:
>>> Introduce a new function rte_mempool_create_empty()
>>> that allocates a mempool that is not populated.
>>>
>>> The functions rte_mempool_create() and rte_mempool_xmem_create()
>>> now make use of it, making their code much easier to read.
>>> Currently, they are the only users of rte_mempool_create_empty()
>>> but the function will be made public in next commits.
>>>
>>> Signed-off-by: Olivier Matz 
>>> +/* create an empty mempool */
>>> +static struct rte_mempool *
>>> +rte_mempool_create_empty(const char *name, unsigned n, unsigned elt_size,
>>> +   unsigned cache_size, unsigned private_data_size,
>>> +   int socket_id, unsigned flags)
>>> {
>> 
>> When two processes need to use the same mempool, do we have a race condition 
>> with one doing a rte_mempool_create_empty() and the other process trying to 
>> use it when it finds that mempool before it is fully initialized by the 
>> first process?
>> 
>
>I'm not an expert of the dpdk multiprocess model. But I would
>say that there are a lot of possible race conditions like this
>(ex: a port is created but not started), and I assume that
>applications doing multiprocess have their synchronization.
>
>If we really want a solution in mempool, we could:
>
>- remove the TAILQ_INSERT_TAIL() from rte_mempool_create()
>- create a new function rte_mempool_share() that adds the
>  mempool in the tailq for multiprocess. This function would
>  be called at the end of rte_mempool_create(), or by the
>  user if using rte_mempool_create_empty().
>
>I may be mistaking but I don't feel it's really required. Any
>comment from a multiprocess expert is welcome though.

Yes, I agree we should have the developer handle the multiprocess 
synchronization. The only thing I can think of for us to do is provide a 
sync-point API.

Maybe instead of adding a fix for each place in DPDK, we require the developer 
to add the sync-up once he has done all of the inits in his code, or we provide 
one. It may be a minor issue, so we can ignore my comments for now.
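
For what it's worth, here is a minimal sketch of the kind of application-level
synchronization I mean; this is purely an assumption about one way an
application could do it, not a proposed DPDK API:

#include <rte_mempool.h>
#include <rte_cycles.h>

/* Hypothetical helper for a secondary process: poll the shared tailq until
 * the primary has created "name", or give up after the timeout. Populating
 * the pool is still the primary's job, so an application-level handshake
 * around that step is still needed. */
static struct rte_mempool *
wait_for_pool(const char *name, unsigned int timeout_ms)
{
    struct rte_mempool *mp;
    unsigned int waited = 0;

    while ((mp = rte_mempool_lookup(name)) == NULL) {
        if (waited++ >= timeout_ms)
            return NULL;
        rte_delay_ms(1);
    }
    return mp;
}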
>
>
>Regards,
>Olivier
>


Regards,
Keith






[dpdk-dev] [PATCH 10/36] mempool: use the list to iterate the mempool elements

2016-04-15 Thread Wiles, Keith
>Hi,
>
>On 04/14/2016 05:33 PM, Wiles, Keith wrote:
>>>
>>> static void
>>> -txq_mp2mr_mbuf_check(void *arg, void *start, void *end,
>>> -uint32_t index __rte_unused)
>>> +txq_mp2mr_mbuf_check(struct rte_mempool *mp, void *arg, void *obj,
>>> +   __rte_unused uint32_t index)
>> 
>> I have seen this use of __rte_unused and other attributes attached to 
>> variables and structures in a couple of different ways.
>> 
>> I have seen the attribute placed both before and after the variable; I 
>> prefer the attribute to be after, but I could adapt, I hope.
>> Do we have a rule about where the attribute is put in this case and others? 
>> I have seen that the attributes for structures are always at the end of the 
>> structure, which in some cases may not compile in other places.
>> 
>> I would like to suggest we place the attributes at the end of the structure, 
>> e.g. __rte_cache_aligned, and that we use __rte_unused after the 
>> variable as a style in the code.
>
>I agree the __rte_unused shouldn't have moved in this patch.
>
>About the default place of attributes, I've seen an example where
>putting it at the end didn't do what I expected:
>
>   struct {
>   int foo;
>   } __rte_cache_aligned;
>
>The expected behavior is to define a structure which is cache aligned.
>But if "rte_memory.h" is not included, it will define a structure which
>is not cache aligned, and a global variable called __rte_cache_aligned,
>without any compiler warning.
>
>The gcc doc gives some hints about where to place the attributes:
>https://gcc.gnu.org/onlinedocs/gcc/Attribute-Syntax.html

Then I suggest we start using the suggested syntax in the above link.

I did a quick read of the text and did not see where the above unused attribute 
would be placed; did someone find it?
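
As a concrete illustration, here is a small example of my own (not taken from
the GCC text) showing both parameter placements and the struct pitfall Olivier
mentioned:

#include <stdint.h>
#include <rte_common.h>   /* __rte_unused */
#include <rte_memory.h>   /* __rte_cache_aligned */

/* GCC accepts both placements for a parameter; the question is only which
 * one the DPDK style should standardize on. */
static void cb_before(__rte_unused uint32_t index) { }
static void cb_after(uint32_t index __rte_unused) { }

/* Attribute after the closing brace: works, but if the macro is not defined
 * this silently declares a variable named __rte_cache_aligned instead of
 * aligning anything. */
struct stats_tail {
    uint64_t packets;
} __rte_cache_aligned;

/* The GCC documentation also allows the attribute right after the struct
 * keyword, which fails loudly if the macro is missing. */
struct __rte_cache_aligned stats_head {
    uint64_t packets;
};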

>
>Given its length, I'm not sure the dpdk coding rules should contain
>anything more than "refer to gcc documentation" ;)
>
>Regards,
>Olivier
>


Regards,
Keith






[dpdk-dev] [PATCH 26/36] mempool: introduce a function to create an empty mempool

2016-04-14 Thread Wiles, Keith
>Introduce a new function rte_mempool_create_empty()
>that allocates a mempool that is not populated.
>
>The functions rte_mempool_create() and rte_mempool_xmem_create()
>now make use of it, making their code much easier to read.
>Currently, they are the only users of rte_mempool_create_empty()
>but the function will be made public in next commits.
>
>Signed-off-by: Olivier Matz 
>+/* create an empty mempool */
>+static struct rte_mempool *
>+rte_mempool_create_empty(const char *name, unsigned n, unsigned elt_size,
>+  unsigned cache_size, unsigned private_data_size,
>+  int socket_id, unsigned flags)
> {

When two processes need to use the same mempool, do we have a race condition 
with one doing a rte_mempool_create_empty() and the other process trying to use 
it when it finds that mempool before it is fully initialized by the first 
process?

Regards,
Keith






[dpdk-dev] [PATCH 10/36] mempool: use the list to iterate the mempool elements

2016-04-14 Thread Wiles, Keith
>
> static void
>-txq_mp2mr_mbuf_check(void *arg, void *start, void *end,
>-   uint32_t index __rte_unused)
>+txq_mp2mr_mbuf_check(struct rte_mempool *mp, void *arg, void *obj,
>+  __rte_unused uint32_t index)

I have seen this use of __rte_unused and other attributes attached to variables 
and structures in a couple of different ways.

I have seen the attribute placed both before and after the variable; I prefer 
the attribute to be after, but I could adapt, I hope.
Do we have a rule about where the attribute is put in this case and others? I 
have seen that the attributes for structures are always at the end of the 
structure, which in some cases may not compile in other places.

I would like to suggest we place the attributes at the end of the structure, 
e.g. __rte_cache_aligned, and that we use __rte_unused after the variable as a 
style in the code.

Thanks

> {
>   struct txq_mp2mr_mbuf_check_data *data = arg;
>-  struct rte_mbuf *buf =
>-  (void *)((uintptr_t)start + data->mp->header_size);
>
Regards,
Keith






[dpdk-dev] [PATCH 02/36] mempool: replace elt_size by total_elt_size

2016-04-14 Thread Wiles, Keith
>In some mempool functions, we use the size of the elements as arguments or in
>variables. There is a confusion between the size including or not including
>the header and trailer.
>
>To avoid this confusion:
>- update the API documentation
>- rename the variables and argument names as "elt_size" when the size does not
>  include the header and trailer, or else as "total_elt_size".
>
>Signed-off-by: Olivier Matz 

Acked-by: Keith Wiles 
>---
> lib/librte_mempool/rte_mempool.c | 21 +++--
> lib/librte_mempool/rte_mempool.h | 19 +++
> 2 files changed, 22 insertions(+), 18 deletions(-)
>
>diff --git a/lib/librte_mempool/rte_mempool.c 
>b/lib/librte_mempool/rte_mempool.c
>index ce78476..90b5b1b 100644
>--- a/lib/librte_mempool/rte_mempool.c
>+++ b/lib/librte_mempool/rte_mempool.c
>@@ -156,13 +156,13 @@ mempool_add_elem(struct rte_mempool *mp, void *obj, 
>uint32_t obj_idx,
>  *
>  * Given the pointer to the memory, and its topology in physical memory
>  * (the physical addresses table), iterate through the "elt_num" objects
>- * of size "total_elt_sz" aligned at "align". For each object in this memory
>+ * of size "elt_sz" aligned at "align". For each object in this memory
>  * chunk, invoke a callback. It returns the effective number of objects
>  * in this memory. */
> uint32_t
>-rte_mempool_obj_iter(void *vaddr, uint32_t elt_num, size_t elt_sz, size_t 
>align,
>-  const phys_addr_t paddr[], uint32_t pg_num, uint32_t pg_shift,
>-  rte_mempool_obj_iter_t obj_iter, void *obj_iter_arg)
>+rte_mempool_obj_iter(void *vaddr, uint32_t elt_num, size_t total_elt_sz,
>+  size_t align, const phys_addr_t paddr[], uint32_t pg_num,
>+  uint32_t pg_shift, rte_mempool_obj_iter_t obj_iter, void *obj_iter_arg)
> {
>   uint32_t i, j, k;
>   uint32_t pgn, pgf;
>@@ -178,7 +178,7 @@ rte_mempool_obj_iter(void *vaddr, uint32_t elt_num, size_t 
>elt_sz, size_t align,
>   while (i != elt_num && j != pg_num) {
> 
>   start = RTE_ALIGN_CEIL(va, align);
>-  end = start + elt_sz;
>+  end = start + total_elt_sz;
> 
>   /* index of the first page for the next element. */
>   pgf = (end >> pg_shift) - (start >> pg_shift);
>@@ -255,6 +255,7 @@ mempool_populate(struct rte_mempool *mp, size_t num, 
>size_t align,
>   mempool_obj_populate, );
> }
> 
>+/* get the header, trailer and total size of a mempool element. */
> uint32_t
> rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t flags,
>   struct rte_mempool_objsz *sz)
>@@ -332,17 +333,17 @@ rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t 
>flags,
>  * Calculate maximum amount of memory required to store given number of 
> objects.
>  */
> size_t
>-rte_mempool_xmem_size(uint32_t elt_num, size_t elt_sz, uint32_t pg_shift)
>+rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz, uint32_t 
>pg_shift)
> {
>   size_t n, pg_num, pg_sz, sz;
> 
>   pg_sz = (size_t)1 << pg_shift;
> 
>-  if ((n = pg_sz / elt_sz) > 0) {
>+  if ((n = pg_sz / total_elt_sz) > 0) {
>   pg_num = (elt_num + n - 1) / n;
>   sz = pg_num << pg_shift;
>   } else {
>-  sz = RTE_ALIGN_CEIL(elt_sz, pg_sz) * elt_num;
>+  sz = RTE_ALIGN_CEIL(total_elt_sz, pg_sz) * elt_num;
>   }
> 
>   return sz;
>@@ -362,7 +363,7 @@ mempool_lelem_iter(void *arg, __rte_unused void *start, 
>void *end,
>  * given memory footprint to store required number of elements.
>  */
> ssize_t
>-rte_mempool_xmem_usage(void *vaddr, uint32_t elt_num, size_t elt_sz,
>+rte_mempool_xmem_usage(void *vaddr, uint32_t elt_num, size_t total_elt_sz,
>   const phys_addr_t paddr[], uint32_t pg_num, uint32_t pg_shift)
> {
>   uint32_t n;
>@@ -373,7 +374,7 @@ rte_mempool_xmem_usage(void *vaddr, uint32_t elt_num, 
>size_t elt_sz,
>   va = (uintptr_t)vaddr;
>   uv = va;
> 
>-  if ((n = rte_mempool_obj_iter(vaddr, elt_num, elt_sz, 1,
>+  if ((n = rte_mempool_obj_iter(vaddr, elt_num, total_elt_sz, 1,
>   paddr, pg_num, pg_shift, mempool_lelem_iter,
>   )) != elt_num) {
>   return -(ssize_t)n;
>diff --git a/lib/librte_mempool/rte_mempool.h 
>b/lib/librte_mempool/rte_mempool.h
>index bd78df5..ca4657f 100644
>--- a/lib/librte_mempool/rte_mempool.h
>+++ b/lib/librte_mempool/rte_mempool.h
>@@ -1289,7 +1289,7 @@ struct rte_mempool *rte_mempool_lookup(const char *name);
>  * calculates header, trailer, body and total sizes of the mempool object.
>  *
>  * @param elt_size
>- *   The size of each element.
>+ *   The size of each element, without header and trailer.
>  * @param flags
>  *   The flags used for the mempool creation.
>  *   Consult rte_mempool_create() for more information about possible values.
>@@ -1315,14 +1315,15 @@ uint32_t rte_mempool_calc_obj_size(uint32_t elt_size, 
>uint32_t flags,
>  *
>  * @param elt_num
>  *   Number of elements.
>- * @param elt_sz
>- *   The size of 

[dpdk-dev] [PATCH 01/36] mempool: fix comments and style

2016-04-14 Thread Wiles, Keith
>No functional change, just fix some comments and styling issues.
>Also avoid to duplicate comments between rte_mempool_create()
>and rte_mempool_xmem_create().
>
>Signed-off-by: Olivier Matz 

Acked-by: Keith Wiles 
>---
> lib/librte_mempool/rte_mempool.c | 17 +---
> lib/librte_mempool/rte_mempool.h | 59 +---
> 2 files changed, 26 insertions(+), 50 deletions(-)
>
>diff --git a/lib/librte_mempool/rte_mempool.c 
>b/lib/librte_mempool/rte_mempool.c
>index 7a0e07e..ce78476 100644
>--- a/lib/librte_mempool/rte_mempool.c
>+++ b/lib/librte_mempool/rte_mempool.c
>@@ -152,6 +152,13 @@ mempool_add_elem(struct rte_mempool *mp, void *obj, 
>uint32_t obj_idx,
>   rte_ring_sp_enqueue(mp->ring, obj);
> }
> 
>+/* Iterate through objects at the given address
>+ *
>+ * Given the pointer to the memory, and its topology in physical memory
>+ * (the physical addresses table), iterate through the "elt_num" objects
>+ * of size "total_elt_sz" aligned at "align". For each object in this memory
>+ * chunk, invoke a callback. It returns the effective number of objects
>+ * in this memory. */
> uint32_t
> rte_mempool_obj_iter(void *vaddr, uint32_t elt_num, size_t elt_sz, size_t 
> align,
>   const phys_addr_t paddr[], uint32_t pg_num, uint32_t pg_shift,
>@@ -341,10 +348,8 @@ rte_mempool_xmem_size(uint32_t elt_num, size_t elt_sz, 
>uint32_t pg_shift)
>   return sz;
> }
> 
>-/*
>- * Calculate how much memory would be actually required with the
>- * given memory footprint to store required number of elements.
>- */
>+/* Callback used by rte_mempool_xmem_usage(): it sets the opaque
>+ * argument to the end of the object. */
> static void
> mempool_lelem_iter(void *arg, __rte_unused void *start, void *end,
>   __rte_unused uint32_t idx)
>@@ -352,6 +357,10 @@ mempool_lelem_iter(void *arg, __rte_unused void *start, 
>void *end,
>   *(uintptr_t *)arg = (uintptr_t)end;
> }
> 
>+/*
>+ * Calculate how much memory would be actually required with the
>+ * given memory footprint to store required number of elements.
>+ */
> ssize_t
> rte_mempool_xmem_usage(void *vaddr, uint32_t elt_num, size_t elt_sz,
>   const phys_addr_t paddr[], uint32_t pg_num, uint32_t pg_shift)
>diff --git a/lib/librte_mempool/rte_mempool.h 
>b/lib/librte_mempool/rte_mempool.h
>index 8595e77..bd78df5 100644
>--- a/lib/librte_mempool/rte_mempool.h
>+++ b/lib/librte_mempool/rte_mempool.h
>@@ -214,7 +214,7 @@ struct rte_mempool {
> 
> }  __rte_cache_aligned;
> 
>-#define MEMPOOL_F_NO_SPREAD  0x0001 /**< Do not spread in memory. */
>+#define MEMPOOL_F_NO_SPREAD  0x0001 /**< Do not spread among memory 
>channels. */
> #define MEMPOOL_F_NO_CACHE_ALIGN 0x0002 /**< Do not align objs on cache 
> lines.*/
> #define MEMPOOL_F_SP_PUT 0x0004 /**< Default put is 
> "single-producer".*/
> #define MEMPOOL_F_SC_GET 0x0008 /**< Default get is 
> "single-consumer".*/
>@@ -270,7 +270,8 @@ struct rte_mempool {
> /* return the header of a mempool object (internal) */
> static inline struct rte_mempool_objhdr *__mempool_get_header(void *obj)
> {
>-  return (struct rte_mempool_objhdr *)RTE_PTR_SUB(obj, sizeof(struct 
>rte_mempool_objhdr));
>+  return (struct rte_mempool_objhdr *)RTE_PTR_SUB(obj,
>+  sizeof(struct rte_mempool_objhdr));
> }
> 
> /**
>@@ -544,8 +545,9 @@ rte_mempool_create(const char *name, unsigned n, unsigned 
>elt_size,
> /**
>  * Create a new mempool named *name* in memory.
>  *
>- * This function uses ``memzone_reserve()`` to allocate memory. The
>- * pool contains n elements of elt_size. Its size is set to n.
>+ * The pool contains n elements of elt_size. Its size is set to n.
>+ * This function uses ``memzone_reserve()`` to allocate the mempool header
>+ * (and the objects if vaddr is NULL).
>  * Depending on the input parameters, mempool elements can be either allocated
>  * together with the mempool header, or an externally provided memory buffer
>  * could be used to store mempool objects. In later case, that external
>@@ -560,18 +562,7 @@ rte_mempool_create(const char *name, unsigned n, unsigned 
>elt_size,
>  * @param elt_size
>  *   The size of each element.
>  * @param cache_size
>- *   If cache_size is non-zero, the rte_mempool library will try to
>- *   limit the accesses to the common lockless pool, by maintaining a
>- *   per-lcore object cache. This argument must be lower or equal to
>- *   CONFIG_RTE_MEMPOOL_CACHE_MAX_SIZE. It is advised to choose
>- *   cache_size to have "n modulo cache_size == 0": if this is
>- *   not the case, some elements will always stay in the pool and will
>- *   never be used. The access to the per-lcore table is of course
>- *   faster than the multi-producer/consumer pool. The cache can be
>- *   disabled if the cache_size argument is set to 0; it can be useful to
>- *   avoid losing objects in cache. Note that even if not used, the
>- *   memory space for cache is always reserved in a mempool structure,
>- *   except if 

[dpdk-dev] [PATCH 00/36] mempool: rework memory allocation

2016-04-14 Thread Wiles, Keith
>
>
>On 04/14/2016 03:50 PM, Wiles, Keith wrote:
>>> This series is a rework of mempool. For those who don't want to read
>>> all the cover letter, here is a sumary:
>>>
>>> - it is not possible to allocate large mempools if there is not enough
>>>   contiguous memory, this series solves this issue
>>> - introduce new APIs with less arguments: "create, populate, obj_init"
>>> - allow to free a mempool
>>> - split code in smaller functions, will ease the introduction of ext_handler
>>> - remove test-pmd anonymous mempool creation
>>> - remove most of dom0-specific mempool code
>>> - opens the door for a eal_memory rework: we probably don't need large
>>>   contiguous memory area anymore, working with pages would work.
>>>
>>> This breaks the ABI as it was indicated in the deprecation for 16.04.
>>> The API stays almost the same, no modification is needed in examples app
>>> or in test-pmd. Only kni and mellanox drivers are slightly modified.
>>>
>>> This patch applies on top of 16.04 + v5 of Keith's patch:
>>> "mempool: reduce rte_mempool structure size"
>>
>> I have not digested this complete patch yet, but this one popped out at me 
>> as the External Memory Manager support is sitting in the wings for the 16.07 
>> release. If this causes the EMM patch to be rewritten or updated, that seems 
>> like a problem to me. Does this patch add the External Memory Manager 
>> support?
>> http://thread.gmane.org/gmane.comp.networking.dpdk.devel/32015/focus=35107
>
>I've reworked the series you are referring to, and rebased it on top
>of this series. Please see:
>http://dpdk.org/ml/archives/dev/2016-April/037509.html

Thanks I just saw that update :-)

>
>Regards,
>Olivier
>


Regards,
Keith






[dpdk-dev] [PATCH v5] mempool: reduce rte_mempool structure size

2016-04-14 Thread Wiles, Keith
>From: Keith Wiles 
>
>The rte_mempool structure is changed, which will cause an ABI change
>for this structure. Providing backward compat is not reasonable
>here as this structure is used in multiple defines/inlines.
>
>Allow mempool cache support to be dynamic depending on if the
>mempool being created needs cache support. Saves about 1.5M of
>memory used by the rte_mempool structure.
>
>Allocating small mempools which do not require cache can consume
>large amounts of memory if you have a number of these mempools.
>
>Change to be effective in release 16.07.
>
>Signed-off-by: Keith Wiles 
>Acked-by: Olivier Matz 

For the change to this patch:
Acked-by: Keith Wiles 

>---
>
>Changes in v5:
>
>- use RTE_PTR_ADD() instead of cast to (char *) to fix compilation on tilera.
>  Error log was:
>
>  rte_mempool.c: In function 'rte_mempool_xmem_create':
>  rte_mempool.c:595: error: cast increases required alignment of target type
>
>
> app/test/test_mempool.c  |  4 +--
> lib/librte_mempool/rte_mempool.c | 55 ++--
> lib/librte_mempool/rte_mempool.h | 29 ++---
> 3 files changed, 40 insertions(+), 48 deletions(-)
>
>diff --git a/app/test/test_mempool.c b/app/test/test_mempool.c
>index f0f823b..10e1fa4 100644
>--- a/app/test/test_mempool.c
>+++ b/app/test/test_mempool.c
>@@ -122,8 +122,8 @@ test_mempool_basic(void)
>   return -1;
> 
>   printf("get private data\n");
>-  if (rte_mempool_get_priv(mp) !=
>-  (char*) mp + MEMPOOL_HEADER_SIZE(mp, mp->pg_num))
>+  if (rte_mempool_get_priv(mp) != (char *)mp +
>+  MEMPOOL_HEADER_SIZE(mp, mp->pg_num, mp->cache_size))
>   return -1;
> 
>   printf("get physical address of an object\n");
>diff --git a/lib/librte_mempool/rte_mempool.c 
>b/lib/librte_mempool/rte_mempool.c
>index f8781e1..7a0e07e 100644
>--- a/lib/librte_mempool/rte_mempool.c
>+++ b/lib/librte_mempool/rte_mempool.c
>@@ -452,12 +452,8 @@ rte_mempool_xmem_create(const char *name, unsigned n, 
>unsigned elt_size,
>   /* compilation-time checks */
>   RTE_BUILD_BUG_ON((sizeof(struct rte_mempool) &
> RTE_CACHE_LINE_MASK) != 0);
>-#if RTE_MEMPOOL_CACHE_MAX_SIZE > 0
>   RTE_BUILD_BUG_ON((sizeof(struct rte_mempool_cache) &
> RTE_CACHE_LINE_MASK) != 0);
>-  RTE_BUILD_BUG_ON((offsetof(struct rte_mempool, local_cache) &
>-RTE_CACHE_LINE_MASK) != 0);
>-#endif
> #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
>   RTE_BUILD_BUG_ON((sizeof(struct rte_mempool_debug_stats) &
> RTE_CACHE_LINE_MASK) != 0);
>@@ -527,9 +523,8 @@ rte_mempool_xmem_create(const char *name, unsigned n, 
>unsigned elt_size,
>*/
>   int head = sizeof(struct rte_mempool);
>   int new_size = (private_data_size + head) % page_size;
>-  if (new_size) {
>+  if (new_size)
>   private_data_size += page_size - new_size;
>-  }
>   }
> 
>   /* try to allocate tailq entry */
>@@ -544,7 +539,8 @@ rte_mempool_xmem_create(const char *name, unsigned n, 
>unsigned elt_size,
>* store mempool objects. Otherwise reserve a memzone that is large
>* enough to hold mempool header and metadata plus mempool objects.
>*/
>-  mempool_size = MEMPOOL_HEADER_SIZE(mp, pg_num) + private_data_size;
>+  mempool_size = MEMPOOL_HEADER_SIZE(mp, pg_num, cache_size);
>+  mempool_size += private_data_size;
>   mempool_size = RTE_ALIGN_CEIL(mempool_size, RTE_MEMPOOL_ALIGN);
>   if (vaddr == NULL)
>   mempool_size += (size_t)objsz.total_size * n;
>@@ -591,8 +587,15 @@ rte_mempool_xmem_create(const char *name, unsigned n, 
>unsigned elt_size,
>   mp->cache_flushthresh = CALC_CACHE_FLUSHTHRESH(cache_size);
>   mp->private_data_size = private_data_size;
> 
>+  /*
>+   * local_cache pointer is set even if cache_size is zero.
>+   * The local_cache points to just past the elt_pa[] array.
>+   */
>+  mp->local_cache = (struct rte_mempool_cache *)
>+  RTE_PTR_ADD(mp, MEMPOOL_HEADER_SIZE(mp, pg_num, 0));
>+
>   /* calculate address of the first element for continuous mempool. */
>-  obj = (char *)mp + MEMPOOL_HEADER_SIZE(mp, pg_num) +
>+  obj = (char *)mp + MEMPOOL_HEADER_SIZE(mp, pg_num, cache_size) +
>   private_data_size;
>   obj = RTE_PTR_ALIGN_CEIL(obj, RTE_MEMPOOL_ALIGN);
> 
>@@ -606,9 +609,8 @@ rte_mempool_xmem_create(const char *name, unsigned n, 
>unsigned elt_size,
>   mp->elt_va_start = (uintptr_t)obj;
>   mp->elt_pa[0] = mp->phys_addr +
>   (mp->elt_va_start - (uintptr_t)mp);
>-
>-  /* mempool elements in a separate chunk of memory. */
>   } else {
>+  /* mempool elements in a separate chunk of memory. */
>   mp->elt_va_start = (uintptr_t)vaddr;
>  

[dpdk-dev] [PATCH v5] mempool: reduce rte_mempool structure size

2016-04-14 Thread Wiles, Keith
>From: Keith Wiles 
>
>The rte_mempool structure is changed, which will cause an ABI change
>for this structure. Providing backward compat is not reasonable
>here as this structure is used in multiple defines/inlines.
>
>Allow mempool cache support to be dynamic depending on if the
>mempool being created needs cache support. Saves about 1.5M of
>memory used by the rte_mempool structure.
>
>Allocating small mempools which do not require cache can consume
>large amounts of memory if you have a number of these mempools.
>
>Change to be effective in release 16.07.
>
>Signed-off-by: Keith Wiles 
>Acked-by: Olivier Matz 
>---
>
>Changes in v5:
>
>- use RTE_PTR_ADD() instead of cast to (char *) to fix compilation on tilera.
>  Error log was:
>
>  rte_mempool.c: In function 'rte_mempool_xmem_create':
>  rte_mempool.c:595: error: cast increases required alignment of target type
>
>
> app/test/test_mempool.c  |  4 +--
> lib/librte_mempool/rte_mempool.c | 55 ++--
> lib/librte_mempool/rte_mempool.h | 29 ++---
> 3 files changed, 40 insertions(+), 48 deletions(-)
>
>diff --git a/app/test/test_mempool.c b/app/test/test_mempool.c
>index f0f823b..10e1fa4 100644
>--- a/app/test/test_mempool.c
>+++ b/app/test/test_mempool.c
>@@ -122,8 +122,8 @@ test_mempool_basic(void)
>   return -1;
> 
>   printf("get private data\n");
>-  if (rte_mempool_get_priv(mp) !=
>-  (char*) mp + MEMPOOL_HEADER_SIZE(mp, mp->pg_num))
>+  if (rte_mempool_get_priv(mp) != (char *)mp +
>+  MEMPOOL_HEADER_SIZE(mp, mp->pg_num, mp->cache_size))

Should we not add the RTE_PTR_ADD() here as well?

>   return -1;
> 
>   printf("get physical address of an object\n");
>diff --git a/lib/librte_mempool/rte_mempool.c 
>b/lib/librte_mempool/rte_mempool.c
>index f8781e1..7a0e07e 100644
>--- a/lib/librte_mempool/rte_mempool.c
>+++ b/lib/librte_mempool/rte_mempool.c
>@@ -452,12 +452,8 @@ rte_mempool_xmem_create(const char *name, unsigned n, 
>unsigned elt_size,
>   /* compilation-time checks */
>   RTE_BUILD_BUG_ON((sizeof(struct rte_mempool) &
> RTE_CACHE_LINE_MASK) != 0);
>-#if RTE_MEMPOOL_CACHE_MAX_SIZE > 0
>   RTE_BUILD_BUG_ON((sizeof(struct rte_mempool_cache) &
> RTE_CACHE_LINE_MASK) != 0);
>-  RTE_BUILD_BUG_ON((offsetof(struct rte_mempool, local_cache) &
>-RTE_CACHE_LINE_MASK) != 0);
>-#endif
> #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
>   RTE_BUILD_BUG_ON((sizeof(struct rte_mempool_debug_stats) &
> RTE_CACHE_LINE_MASK) != 0);
>@@ -527,9 +523,8 @@ rte_mempool_xmem_create(const char *name, unsigned n, 
>unsigned elt_size,
>*/
>   int head = sizeof(struct rte_mempool);
>   int new_size = (private_data_size + head) % page_size;
>-  if (new_size) {
>+  if (new_size)
>   private_data_size += page_size - new_size;
>-  }
>   }
> 
>   /* try to allocate tailq entry */
>@@ -544,7 +539,8 @@ rte_mempool_xmem_create(const char *name, unsigned n, 
>unsigned elt_size,
>* store mempool objects. Otherwise reserve a memzone that is large
>* enough to hold mempool header and metadata plus mempool objects.
>*/
>-  mempool_size = MEMPOOL_HEADER_SIZE(mp, pg_num) + private_data_size;
>+  mempool_size = MEMPOOL_HEADER_SIZE(mp, pg_num, cache_size);
>+  mempool_size += private_data_size;
>   mempool_size = RTE_ALIGN_CEIL(mempool_size, RTE_MEMPOOL_ALIGN);
>   if (vaddr == NULL)
>   mempool_size += (size_t)objsz.total_size * n;
>@@ -591,8 +587,15 @@ rte_mempool_xmem_create(const char *name, unsigned n, 
>unsigned elt_size,
>   mp->cache_flushthresh = CALC_CACHE_FLUSHTHRESH(cache_size);
>   mp->private_data_size = private_data_size;
> 
>+  /*
>+   * local_cache pointer is set even if cache_size is zero.
>+   * The local_cache points to just past the elt_pa[] array.
>+   */
>+  mp->local_cache = (struct rte_mempool_cache *)
>+  RTE_PTR_ADD(mp, MEMPOOL_HEADER_SIZE(mp, pg_num, 0));
>+
>   /* calculate address of the first element for continuous mempool. */
>-  obj = (char *)mp + MEMPOOL_HEADER_SIZE(mp, pg_num) +
>+  obj = (char *)mp + MEMPOOL_HEADER_SIZE(mp, pg_num, cache_size) +
>   private_data_size;
>   obj = RTE_PTR_ALIGN_CEIL(obj, RTE_MEMPOOL_ALIGN);
> 
>@@ -606,9 +609,8 @@ rte_mempool_xmem_create(const char *name, unsigned n, 
>unsigned elt_size,
>   mp->elt_va_start = (uintptr_t)obj;
>   mp->elt_pa[0] = mp->phys_addr +
>   (mp->elt_va_start - (uintptr_t)mp);
>-
>-  /* mempool elements in a separate chunk of memory. */
>   } else {
>+  /* mempool elements in a separate chunk of memory. */
>   mp->elt_va_start = (uintptr_t)vaddr;
>   

[dpdk-dev] memory allocation requirements

2016-04-13 Thread Wiles, Keith
>After looking at the patches for container support, it appears that
>some changes are needed in the memory management:
>http://thread.gmane.org/gmane.comp.networking.dpdk.devel/32786/focus=32788
>
>I think it is time to collect what are the needs and expectations of
>the DPDK memory allocator. The goal is to satisfy every needs while
>cleaning the API.
>Here is a first try to start the discussion.
>
>The memory allocator has 2 classes of API in DPDK.
>First the user/application allows or requires DPDK to take over some
>memory resources of the system. The characteristics can be:
>   - numa node
>   - page size
>   - swappable or not
>   - contiguous (cannot be guaranteed) or not
>   - physical address (as root only)
>Then the drivers or other libraries use the memory through
>   - rte_malloc
>   - rte_memzone
>   - rte_mempool

Do not forget about rte_pktmbuf_create(), which relies on rte_mempool and 
rte_memzone for high performance. We need to make sure we do not break this 
area.

We need to draw a good diagram showing the relationship between memory allocation 
and the APIs; at least I need something like this to help understand the bigger 
picture. Then we can start looking at how to modify memory allocation.

What I want is to reduce the complexity of the primary APIs, using fewer 
arguments and handling the most common cases. The more complex memory 
configurations should not be hidden in the API IMO, but made more explicit, 
using APIs to configure the memory allocation in the case of 
rte_mempool_create().
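
As one concrete example of what a smaller-argument, less hidden flow could look
like, here is a rough sketch using the create/populate/obj_init split proposed
in the mempool rework series discussed in the threads above; the function names
and signatures are those of the series under review and could still change:

#include <rte_mempool.h>
#include <rte_mbuf.h>

/* Each step is a small call instead of one rte_mempool_create() with a long
 * argument list; the sizes below are illustrative. */
static struct rte_mempool *
make_mbuf_pool(int socket_id)
{
    struct rte_mempool *mp;
    struct rte_pktmbuf_pool_private priv = {
        .mbuf_data_room_size = RTE_MBUF_DEFAULT_BUF_SIZE,
    };

    /* 1. create an empty pool: no object memory is allocated yet */
    mp = rte_mempool_create_empty("sketch_pool", 8192,
                                  sizeof(struct rte_mbuf) +
                                  RTE_MBUF_DEFAULT_BUF_SIZE,
                                  256, sizeof(priv), socket_id, 0);
    if (mp == NULL)
        return NULL;

    rte_pktmbuf_pool_init(mp, &priv);

    /* 2. populate it from the default memory source */
    if (rte_mempool_populate_default(mp) < 0) {
        rte_mempool_free(mp);
        return NULL;
    }

    /* 3. run the per-object initializer on every element */
    rte_mempool_obj_iter(mp, rte_pktmbuf_init, NULL);

    return mp;
}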

>I think we can integrate the characteristics of the requested memory
>in rte_malloc. Then rte_memzone would be only a named rte_malloc.
>The rte_mempool still focus on collection of objects with cache.
>
>If a rework happens, maybe that the build options CONFIG_RTE_LIBRTE_IVSHMEM
>and CONFIG_RTE_EAL_SINGLE_FILE_SEGMENTS can be removed.
These need to be removed or at least moved to a runtime configuration.

>The Xen support should also be better integrated.

The Xen support, and external memory manager support in general, is coming in 
16.07, which I hope helps by adding a better structure around how external 
memory is managed.

>
>Currently, the first class of API is directly implemented as command line
>parameters. Please let's think of C functions first.

I would also like to start thinking in terms of a config file read on startup 
instead of command-line options. This would help create the C functions you are 
talking about, IMO.

>The EAL parameters should simply wrap some API functions and let the
>applications tune the memory initialization with a well documented API.
>
>Probably that I forget some needs, e.g. for the secondary processes.
>Please comment.

This will be a big change and will affect many applications, but I hope the 
impact can be kept to a minimum.
>


Regards,
Keith






[dpdk-dev] Pktgen rate issue

2016-04-12 Thread Wiles, Keith
>Hello
>
>I have been using pktgen for a while and I realized that there is no 
>possibility to set a rate between two whole numbers.
>
>  I looked up the source code and I found out that the rate is stored in 
>a uint8_t. So, I made a quick-and-dirty change just to check if it is 
>possible to get a speed between two whole numbers. It seemed to work 
>properly.
>
>  Is there a reason to use an uint8_t instead of a float, for example?

The uint8_t was easier than using floats :-)

If you have a patch for this change please send it along and I will look at 
adding the support.
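
For illustration, here is a minimal sketch of keeping the configured rate as a
double while still indexing an integer fudge-factor table; the field and table
names are assumptions, not Pktgen's actual code:

#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-port info: keeping the rate as a double makes values
 * such as 2.5% representable, where a uint8_t would round them away. */
struct port_rate {
    double tx_rate;          /* line-rate percentage, 0.0 - 100.0 */
};

/* Hypothetical fudge-factor table indexed in 10% steps. */
static const uint64_t ff[11] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

static uint64_t
rate_fudge(const struct port_rate *p)
{
    /* The explicit cast is required: array subscripts must be integers
     * once tx_rate becomes a floating-point value. */
    return ff[(int)p->tx_rate / 10];
}

int main(void)
{
    struct port_rate p = { .tx_rate = 2.5 };

    printf("fudge = %llu\n", (unsigned long long)rate_fudge(&p));
    return 0;
}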

>
>Thank you,
>
>Iñaki Murillo
>


Regards,
Keith






[dpdk-dev] DPDK libraries not compiling from source

2016-04-11 Thread Wiles, Keith
>Hi all,
>
>I tried compiling DPDK from source using setup.sh script in tools
>directory, but I get this error
>
>[32] Remove KNI module
>[33] Remove hugepage mappings
>
>[34] Exit Script

The way I compile DPDK is this way:

# cd 
# export RTE_SDK=`pwd`
# export RTE_TARGET=x86_64-native-linuxapp-gcc
# make install T=x86_64-native-linuxapp-gcc

Strictly speaking you do not need to set the two variables, but you do need 
them later for building the examples or, say, pktgen-dpdk.
The last command will give a warning about not being able to install, but that 
is OK unless you need the files installed. Installing the files is not 
required if you use the environment variables.

Hope that gets you started. The only concern is the 'cc' command not found, 
which makes me believe you do not have a compiler installed. The docs do talk 
about how to install the basic required tools for Ubuntu, which is what I am 
using for the above commands. After getting DPDK to build you need to set up the 
rest of the system to run DPDK, and that is all covered in the docs.

>
>Option: 14
>
>/bin/sh: line 1: cc: command not found
>cp: cannot stat
>'/root/workspace/dpdk/x86_64-native-linuxapp-gcc/.config_tmp': No such file
>or directory
>cp: cannot stat
>'/root/workspace/dpdk/x86_64-native-linuxapp-gcc/.config_tmp': No such file
>or directory
>make[5]: Nothing to be done for `depdirs'.
>Configuration done
>/root/workspace/dpdk/mk/rte.vars.mk:81: *** RTE_ARCH is not defined.  Stop.
>make[2]: *** [all] Error 2
>make[1]: *** [pre_install] Error 2
>make: *** [install] Error 2
>--
> RTE_TARGET exported as x86_64-native-linuxapp-gcc
>--
>
>Then, I set RTE_ARCH=x86_64 manually, but it did not work
>
>please suggest appropriate way to compile DPDK libraries.
>


Regards,
Keith





