Re: Request to map PCI device in a secondary that uses the --block for that device

2024-05-22 Thread Antonio Di Bacco
For completeness (vfio_res_list is NULL):

(gdb) bt
#0  0x7f75f074e849 in pci_vfio_map_resource_secondary
(dev=0x7f377c00) at ../drivers/bus/pci/linux/pci_vfio.c:901
#1  0x7f75f074ea7f in pci_vfio_map_resource (dev=0x7f377c00) at
../drivers/bus/pci/linux/pci_vfio.c:958
#2  0x7f75f0749eb5 in rte_pci_map_device (dev=0x7f377c00) at
../drivers/bus/pci/linux/pci.c:70
#3  0x7f75f0747e81 in rte_pci_probe_one_driver (dr=0x7f75f0397100
, dev=0x7f377c00) at ../drivers/bus/pci/pci_common.c:251
#4  0x7f75f074820a in pci_probe_all_drivers (dev=0x7f377c00) at
../drivers/bus/pci/pci_common.c:353
#5  0x7f75f07489b8 in pci_plug (dev=0x7f377c10) at
../drivers/bus/pci/pci_common.c:595
#6  0x7f75f17a0a88 in local_dev_probe (devargs=0x7f75e4000bac
":40:04.3", new_dev=0x7f75ea943868) at
../lib/eal/common/eal_common_dev.c:174
#7  0x7f75f17c8832 in __handle_primary_request
(param=0x7f75e4000b60) at ../lib/eal/common/hotplug_mp.c:242
#8  0x7f75f17ce78d in eal_alarm_callback (arg=0x0) at
../lib/eal/linux/eal_alarm.c:110
#9  0x7f75f17d38c2 in eal_intr_process_interrupts
(events=0x7f75ea943b10, nfds=1) at
../lib/eal/linux/eal_interrupts.c:1025
#10 0x7f75f17d3ba8 in eal_intr_handle_interrupts (pfd=16,
totalfds=2) at ../lib/eal/linux/eal_interrupts.c:1099
#11 0x7f75f17d3d8f in eal_intr_thread_main (arg=0x0) at
../lib/eal/linux/eal_interrupts.c:1171
#12 0x7f75f17b59c6 in ctrl_thread_init (arg=0x7f3640d0) at
../lib/eal/common/eal_common_thread.c:206
#13 0x7f75f1b5d802 in start_thread () from /lib64/libc.so.6
#14 0x7f75f1afd450 in clone3 () from /lib64/libc.so.6
(gdb) p vfio_res
$1 = (struct mapped_pci_resource *) 0x0
(gdb) p vfio_res_list
$2 = (struct mapped_pci_res_list *) 0x0
(gdb) p pci_addr
$3 = ":40:04.3", '\000' 
(gdb)

On Wed, May 22, 2024 at 7:03 PM Antonio Di Bacco  wrote:
>
> I found the exact line where I get the crash (the line saying TAILQ_FOREACH)
>
> /* if we're in a secondary process, just find our tailq entry */
> TAILQ_FOREACH(vfio_res, vfio_res_list, next) {
> if (rte_pci_addr_cmp(&vfio_res->pci_addr,
>  &dev->addr))
> continue;
> break;
> }
> /* if we haven't found our tailq entry, something's wrong */
> if (vfio_res == NULL) {
> RTE_LOG(ERR, EAL, "%s cannot find TAILQ entry for PCI device!\n",
> pci_addr);
> return -1;
> }
>
> On Wed, May 22, 2024 at 10:06 AM Antonio Di Bacco
>  wrote:
> >
> > Thank you.
> > In my case,  I have the same block/allow but the primary does an
> > explicit probe of a DMA engine on the processor, the secondary is
> > notified and it crashes.  It should not crash I suppose.
> > The same software is running on several machines (100) but the problem
> > is sporadic just on two of them.
> >
> > Very strange.
> >
> > Thx,
> > Antonio.
> >
> > On Wed, May 22, 2024 at 9:21 AM Dmitry Kozlyuk  
> > wrote:
> > >
> > > 2024-05-22 08:46 (UTC+0200), Antonio Di Bacco:
> > > > Is it correct that a primary requests a secondary to map a device that
> > > > the secondary explicitly blocks with the --block arg ?
> > > >
> > > > IN my case this requests of mapping creates a crash in the secondary.
> > > >
> > > > Using DPDK 21.11
> > > >
> > > > Best regards,
> > > > Antonio.
> > >
> > > Hi Antonio,
> > >
> > > "Secondary processes which requires access to physical devices in Primary
> > > process, must be passed with the same allow and block options."
> > >
> > > https://doc.dpdk.org/guides/prog_guide/multi_proc_support.html


Re: Request to map PCI device in a secondary that uses the --block for that device

2024-05-22 Thread Antonio Di Bacco
I found the exact line where I get the crash (the line saying TAILQ_FOREACH)

/* if we're in a secondary process, just find our tailq entry */
TAILQ_FOREACH(vfio_res, vfio_res_list, next) {
if (rte_pci_addr_cmp(&vfio_res->pci_addr,
 &dev->addr))
continue;
break;
}
/* if we haven't found our tailq entry, something's wrong */
if (vfio_res == NULL) {
RTE_LOG(ERR, EAL, "%s cannot find TAILQ entry for PCI device!\n",
pci_addr);
return -1;
}

On Wed, May 22, 2024 at 10:06 AM Antonio Di Bacco
 wrote:
>
> Thank you.
> In my case,  I have the same block/allow but the primary does an
> explicit probe of a DMA engine on the processor, the secondary is
> notified and it crashes.  It should not crash I suppose.
> The same software is running on several machines (100) but the problem
> is sporadic just on two of them.
>
> Very strange.
>
> Thx,
> Antonio.
>
> On Wed, May 22, 2024 at 9:21 AM Dmitry Kozlyuk  
> wrote:
> >
> > 2024-05-22 08:46 (UTC+0200), Antonio Di Bacco:
> > > Is it correct that a primary requests a secondary to map a device that
> > > the secondary explicitly blocks with the --block arg ?
> > >
> > > IN my case this requests of mapping creates a crash in the secondary.
> > >
> > > Using DPDK 21.11
> > >
> > > Best regards,
> > > Antonio.
> >
> > Hi Antonio,
> >
> > "Secondary processes which requires access to physical devices in Primary
> > process, must be passed with the same allow and block options."
> >
> > https://doc.dpdk.org/guides/prog_guide/multi_proc_support.html


Re: Failure while allocating 1GB hugepages

2024-05-22 Thread Antonio Di Bacco
That was really useful. Thx

On Fri, May 10, 2024 at 5:07 PM Dmitry Kozlyuk  wrote:
>
> 2024-05-10 11:33 (UTC+0200), Antonio Di Bacco:
> > I have 16 hugepages available per NUMA on a 4 NUMA system:
> >
> > [user@node-1 hugepages]$ cat
> > /sys/devices/system/node/*/hugepages/hugepages-1048576kB/free_hugepages
> > 16
> > 16
> > 16
> > 16
> >
> > Using the following program with dpdk 21.11, sometimes I can allocate
> > a few pages but most of the time I cannot. I tried also to remove
> > rtemap_* under /dev/hugepages.
> > rte_memzone_reserve_aligned is always supposed to use a new page?
> >
> > #include 
> > #include 
> > #include 
> >
> > #include 
> > #include 
> >
> > int main(int argc, char **argv)
> > {
> > const struct rte_memzone *mz;
> > int ret;
> > printf("pid: %d\n", getpid());
> > // Initialize EAL
> > ret = rte_eal_init(argc, argv);
> > if (ret < 0) {
> > fprintf(stderr, "Error with EAL initialization\n");
> > return -1;
> > }
> >
> > for (int socket = 0; socket < 4; socket++)
> > {
> >   for (int i = 0; i < 16; i++)
> >   {
> > // Allocate memory using rte_memzone_reserve_aligned
> > char name[32];
> > sprintf(name, "my_memzone%d-%d", i, socket);
> > mz = rte_memzone_reserve_aligned(name, 1ULL << 30, socket,
> > RTE_MEMZONE_IOVA_CONTIG, 1ULL << 30);
> >
> > if (mz == NULL) {
> >   printf("errno %s\n", rte_strerror(rte_errno));
> >   fprintf(stderr, "Memory allocation failed\n");
> >   rte_eal_cleanup();
> >   return -1;
> >   }
> >
> >   printf("Memory allocated with name %s at socket %d physical
> > address: %p, addr %p addr64 %lx size: %zu\n", name, mz->socket_id,
> > (mz->iova), mz->addr, mz->addr_64, mz->len);
> > }
> > }
> >
> > // Clean up EAL
> > rte_eal_cleanup();
> > return 0;
> > }
>
> Hi Antonio,
>
> Does it succeed without RTE_MEMZONE_IOVA_CONTIG?
> If so, does your system/app have ASLR enabled?
>
> When memzone size is 1G and hugepage size is 1G,
> two hugepages are required: one for the requested amount of memory,
> and one for memory allocator element header,
> which does not fit into the same page obviously.
> I suspect that two allocated hugepages get non-continuous IOVA
> and that's why the function fails.
> There are no useful logs in EAL to check the suspicion,
> but you can hack elem_check_phys_contig() in malloc_elem.c.
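
To check this suspicion from the application itself, a minimal sketch
(reusing the name/size/socket variables from the program above) is to
retry the failed reservation without RTE_MEMZONE_IOVA_CONTIG and compare:

mz = rte_memzone_reserve_aligned(name, 1ULL << 30, socket,
    RTE_MEMZONE_IOVA_CONTIG, 1ULL << 30);
if (mz == NULL) {
    /* retry without requesting IOVA-contiguous memory */
    mz = rte_memzone_reserve_aligned(name, 1ULL << 30, socket,
        0, 1ULL << 30);
    if (mz != NULL)
        printf("only the IOVA_CONTIG variant fails\n");
}

If the plain reservation succeeds where the IOVA_CONTIG one fails, that
points to the two hugepages getting non-contiguous IOVA, as suspected above.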


Re: Request to map PCI device in a secondary that uses the --block for that device

2024-05-22 Thread Antonio Di Bacco
Thank you.
In my case, I have the same block/allow options, but the primary does an
explicit probe of a DMA engine on the processor, the secondary is
notified, and it crashes. It should not crash, I suppose.
The same software is running on several machines (100), but the problem
is sporadic and shows up on just two of them.

Very strange.

Thx,
Antonio.

On Wed, May 22, 2024 at 9:21 AM Dmitry Kozlyuk  wrote:
>
> 2024-05-22 08:46 (UTC+0200), Antonio Di Bacco:
> > Is it correct that a primary requests a secondary to map a device that
> > the secondary explicitly blocks with the --block arg ?
> >
> > IN my case this requests of mapping creates a crash in the secondary.
> >
> > Using DPDK 21.11
> >
> > Best regards,
> > Antonio.
>
> Hi Antonio,
>
> "Secondary processes which requires access to physical devices in Primary
> process, must be passed with the same allow and block options."
>
> https://doc.dpdk.org/guides/prog_guide/multi_proc_support.html


Request to map PCI device in a secondary that uses the --block for that device

2024-05-22 Thread Antonio Di Bacco
Is it correct that a primary requests a secondary to map a device that
the secondary explicitly blocks with the --block arg?

In my case this mapping request causes a crash in the secondary.

Using DPDK 21.11

Best regards,
Antonio.


Sporadic SIGSEGV crash in secondary process, what could be the culprit?

2024-05-21 Thread Antonio Di Bacco
#0  0x7f427f1e5e51 in pci_vfio_map_resource_secondary () from
/usr/local/lib64/dpdk/pmds-22.0/librte_bus_pci.so.22.0
#1  0x7f427f1e0c0d in pci_plug.cold () from
/usr/local/lib64/dpdk/pmds-22.0/librte_bus_pci.so.22.0
#2  0x7f42801bda2f in local_dev_probe () from
/usr/local/lib64/librte_eal.so.22
#3  0x7f42801d60cc in __handle_primary_request () from
/usr/local/lib64/librte_eal.so.22
#4  0x7f42801d9cec in eal_alarm_callback () from
/usr/local/lib64/librte_eal.so.22
#5  0x7f42801dc682 in eal_intr_thread_main () from
/usr/local/lib64/librte_eal.so.22
#6  0x7f428053c802 in start_thread () from /lib64/libc.so.6
#7  0x7f42804dc450 in clone3 () from /lib64/libc.so.6

I need a clue about what could cause this crash. On one machine in
particular, once this crash appears it stays forever: if I relaunch the
primary I steadily get the same crash until I do a power off and on
(a software reboot doesn't help).
I'm on Rocky Linux.


Failure while allocating 1GB hugepages

2024-05-10 Thread Antonio Di Bacco
I have 16 hugepages available per NUMA on a 4 NUMA system:

[user@node-1 hugepages]$ cat
/sys/devices/system/node/*/hugepages/hugepages-1048576kB/free_hugepages
16
16
16
16

Using the following program with DPDK 21.11, sometimes I can allocate
a few pages but most of the time I cannot. I also tried to remove
rtemap_* under /dev/hugepages.
Is rte_memzone_reserve_aligned always supposed to use a new page?

#include 
#include 
#include 

#include 
#include 

int main(int argc, char **argv)
{
const struct rte_memzone *mz;
int ret;
printf("pid: %d\n", getpid());
// Initialize EAL
ret = rte_eal_init(argc, argv);
if (ret < 0) {
fprintf(stderr, "Error with EAL initialization\n");
return -1;
}

for (int socket = 0; socket < 4; socket++)
{
  for (int i = 0; i < 16; i++)
  {
// Allocate memory using rte_memzone_reserve_aligned
char name[32];
sprintf(name, "my_memzone%d-%d", i, socket);
mz = rte_memzone_reserve_aligned(name, 1ULL << 30, socket,
RTE_MEMZONE_IOVA_CONTIG, 1ULL << 30);

if (mz == NULL) {
  printf("errno %s\n", rte_strerror(rte_errno));
  fprintf(stderr, "Memory allocation failed\n");
  rte_eal_cleanup();
  return -1;
  }

  printf("Memory allocated with name %s at socket %d physical
address: %p, addr %p addr64 %lx size: %zu\n", name, mz->socket_id,
(mz->iova), mz->addr, mz->addr_64, mz->len);
}
}

// Clean up EAL
rte_eal_cleanup();
return 0;
}


Re: MLX5 VF stops transmitting when the Physical Function is added to a Linux bridge

2024-03-25 Thread Antonio Di Bacco
Hi Stephen,

I know that mlx5_core is bifurcated. I did another test: I just
added one of the VFs to the bridge, but I see that testpmd using other
VFs suffers the same problem.
When I add another interface to the bridge (not the one with the VFs,
but a simple 1 Gbps one named ens5f0), I see that the VFs used by
testpmd stop working.

If I give the command ifconfig ens5f0 down, then, the testpmd VF
resumes transmission.

Best regards,
Antonio.


On Mon, Mar 25, 2024 at 6:14 PM Stephen Hemminger
 wrote:
>
> On Mon, 25 Mar 2024 15:59:36 +0100
> Antonio Di Bacco  wrote:
>
> > I have a Connect X5 card (PF ens1f0np0) directly connected to another 
> > server:
> >
> > 1) create VF on PF on both servers
> > 2) change mac address of VFs to my own addressing
> > 3) start testpmd on server 1 in txonly mode to transmit to server 0
> > 4) start testpmd on server 0 in rxonly mode to receive
> > 5) everything is fine, I keep receiving packets on node-0
> >
> > Now, on server 1 I add the PF to a linux bridge, and everything's still 
> > fine.
> >
> > If I add another interface (a simple 1Gbps with no VF, ens5f0)  to the
> > linux bridge, then, I don't receive anymore packets on node-0
> >
> > If I remove the ens5f0 from the bridge or I put down the ens5f0 the
> > traffic flow restarts.
> >
> > I understand that DPDK uses the VF directly with no dependencies on
> > the kernel. How can operations on the kernel side (like adding an
> > interface to bridge) can affect the VF?
> >
> >
> > Best regards,
> > Antonio.
>
> Mellanox is bifurcated driver, so kernel and DPDK interact.
> Adding device to bridge will change MAC address.
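
Following up on the MAC remark: a small sketch to re-check, from the DPDK
side, which MAC address the PMD currently reports for the VF port (before
and after touching the bridge), assuming port_id is the testpmd port:

#include <stdio.h>
#include <rte_ethdev.h>

static void print_port_mac(uint16_t port_id)
{
    struct rte_ether_addr mac;

    /* read the MAC address currently seen by the PMD for this port */
    if (rte_eth_macaddr_get(port_id, &mac) == 0)
        printf("port %u MAC %02x:%02x:%02x:%02x:%02x:%02x\n", port_id,
            mac.addr_bytes[0], mac.addr_bytes[1], mac.addr_bytes[2],
            mac.addr_bytes[3], mac.addr_bytes[4], mac.addr_bytes[5]);
}

If the value changes when the bridge configuration changes, that matches
the bifurcated-driver interaction described above.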


MLX5 VF stops transmitting when the Physical Function is added to a Linux bridge

2024-03-25 Thread Antonio Di Bacco
I have a Connect X5 card (PF ens1f0np0) directly connected to another server:

1) create VF on PF on both servers
2) change mac address of VFs to my own addressing
3) start testpmd on server 1 in txonly mode to transmit to server 0
4) start testpmd on server 0 in rxonly mode to receive
5) everything is fine, I keep receiving packets on node-0

Now, on server 1 I add the PF to a Linux bridge, and everything's still fine.

If I add another interface (a simple 1 Gbps one with no VF, ens5f0) to the
Linux bridge, then I no longer receive packets on node-0.

If I remove ens5f0 from the bridge or I put ens5f0 down, the
traffic flow restarts.

I understand that DPDK uses the VF directly with no dependencies on
the kernel. How can operations on the kernel side (like adding an
interface to a bridge) affect the VF?


Best regards,
Antonio.


DPDK used for building programs and the one used to run the programs

2023-12-05 Thread Antonio Di Bacco
On the target machine I have a DPDK that was compiled after installing
the Mellanox 5/6 drivers.
I see that there are files related to the MLX5 PMDs on the target machine
(include files).

To compile my programs I use a container where the installed DPDK
doesn't have MLX5 support, meaning I don't find
rte_mlx5_pmds.h in the container.

Could a program compiled in the container have problems when
using MLX5 on the target machine?

Thanks,
Anna.


Re: tap device speed

2023-10-09 Thread Antonio Di Bacco
Hi Maxime,

trying to follow the steps at the link provided:


Usage:
======

1. Probe required Kernel modules
# modprobe vdpa
# modprobe vduse
# modprobe virtio-vdpa

2. Build (require vduse kernel headers to be available)
# meson build
# ninja -C build

3. Create a VDUSE device (vduse0) using Vhost PMD with
testpmd (with 4 queue pairs in this example)
# ./build/app/dpdk-testpmd --no-pci
--vdev=net_vhost0,iface=/dev/vduse/vduse0,queues=4 --log-level=*:9  --
-i --txq=4 --rxq=4

4. Attach the VDUSE device to the vDPA bus
# vdpa dev add name vduse0 mgmtdev vduse

I'm stuck at step 4, with an error:

manderwo@fetgem01:~$ sudo vdpa dev add name vduse0 mgmtdev vduse
kernel answers: Invalid argument
manderwo@fetgem01:~$ vdpa mgmtdev show
vduse:
  supported_classes block
manderwo@fetgem01:~$

Do you know what could be wrong?

Thank you in advance,
Antonio.

On Wed, Oct 4, 2023 at 9:50 AM Maxime Coquelin
 wrote:
>
>
>
> On 10/4/23 09:42, Antonio Di Bacco wrote:
> > Thank you for your info that are giving me the right heads up
> > To experiment with VDUSE and share a virtual network interface (I
> > don't have a physical NIC) between the Linux kernel and DPDK using
> > VDUSE, I'm about to follow these steps:
> >
> > Load Required Kernel Modules:
> > modprobe vduse
> > Create /dev/vdpa0 device with:
> > vdpa -d /dev/vdpa0 -n my_vdpa_driver -q queue_count
> >
> > I wonder which vdpa_driver should I use, I don't have a real NIC
> > After having this vdpa0 interface up I can run my DPDK application:
> >
> > ./my_dpdk_app --vdev "net_vdpa0,iface=/dev/vdpa0"
>
> You will use the Vhost PMD in this case (alternative is to use the Vhost
> API directly).
>
> Maybe I missed to add steps with Vhost PMD in my doc repo, I'll improve
> it but in the mean time you can refer to the steps provided in the DPDK
> VDUSE series cover letter:
> https://inbox.dpdk.org/dev/7f72500a-5317-c66d-3f36-2fd65c874...@redhat.com/T/
>
> Please read the vduse documentation on my gitlab repo anyways, it
> provides pointers on the missing Kernel patches (being upstreamed).
>
> Maxime
>
>
> > Regards,
> > Antonio.
> >
> > On Wed, Oct 4, 2023 at 9:08 AM Maxime Coquelin
> >  wrote:
> >>
> >>
> >>
> >> On 10/4/23 08:17, David Marchand wrote:
> >>> On Tue, Oct 3, 2023 at 6:01 PM Stephen Hemminger
> >>>  wrote:
> >>>>
> >>>> On Tue, 3 Oct 2023 10:49:16 +0200
> >>>> Antonio Di Bacco  wrote:
> >>>>
> >>>>> I understand, could we use another solution ? Like a memif  interface
> >>>>> in DPDK and libmemif in Linux?
> >>>>
> >>>> The issue is accessing kernel networking devices. Both virtio user
> >>>> and XDP are faster for that. Memif is for doing process to process 
> >>>> networking.
> >>>
> >>> For dpdk <-> kernel, as you are mentioning virtio-user/vhost, let me
> >>> add that there is some activity on this side, with VDUSE.
> >>>
> >>> Maxime is working on the VDUSE kernel and dpdk bits.
> >>> He gave a talk about the current status during the summit and some
> >>> performance numbers:
> >>> https://dpdksummit2023.sched.com/event/1P9xA/vduse-performance-how-fast-is-it-maxime-coquelin-red-hat
> >>>
> >>>
> >>
> >> Thanks for sharing David.
> >> I'd like just to add some more information on VDUSE if you want to
> >> experiment with VDUSE, which is still under development:
> >> https://gitlab.com/mcoquelin/vduse-doc
> >>
> >> Maxime
> >>
> >
>


Re: tap device speed

2023-10-04 Thread Antonio Di Bacco
Thank you for your info, it is giving me the right heads-up.
To experiment with VDUSE and share a virtual network interface (I
don't have a physical NIC) between the Linux kernel and DPDK, I'm
about to follow these steps:

Load Required Kernel Modules:
modprobe vduse
Create /dev/vdpa0 device with:
vdpa -d /dev/vdpa0 -n my_vdpa_driver -q queue_count

I wonder which vdpa_driver I should use, since I don't have a real NIC.
After having this vdpa0 interface up I can run my DPDK application:

./my_dpdk_app --vdev "net_vdpa0,iface=/dev/vdpa0"

Regards,
Antonio.

On Wed, Oct 4, 2023 at 9:08 AM Maxime Coquelin
 wrote:
>
>
>
> On 10/4/23 08:17, David Marchand wrote:
> > On Tue, Oct 3, 2023 at 6:01 PM Stephen Hemminger
> >  wrote:
> >>
> >> On Tue, 3 Oct 2023 10:49:16 +0200
> >> Antonio Di Bacco  wrote:
> >>
> >>> I understand, could we use another solution ? Like a memif  interface
> >>> in DPDK and libmemif in Linux?
> >>
> >> The issue is accessing kernel networking devices. Both virtio user
> >> and XDP are faster for that. Memif is for doing process to process 
> >> networking.
> >
> > For dpdk <-> kernel, as you are mentioning virtio-user/vhost, let me
> > add that there is some activity on this side, with VDUSE.
> >
> > Maxime is working on the VDUSE kernel and dpdk bits.
> > He gave a talk about the current status during the summit and some
> > performance numbers:
> > https://dpdksummit2023.sched.com/event/1P9xA/vduse-performance-how-fast-is-it-maxime-coquelin-red-hat
> >
> >
>
> Thanks for sharing David.
> I'd like just to add some more information on VDUSE if you want to
> experiment with VDUSE, which is still under development:
> https://gitlab.com/mcoquelin/vduse-doc
>
> Maxime
>


Re: tap device speed

2023-10-03 Thread Antonio Di Bacco
I understand. Could we use another solution, like a memif interface
in DPDK and libmemif in Linux?

On Mon, Oct 2, 2023 at 11:21 PM Stephen Hemminger
 wrote:
>
> On Mon, 2 Oct 2023 21:13:03 +0200
> Antonio Di Bacco  wrote:
>
> > I'm doing a test where we have a couple of tap devices, the two
> > devices are seen by testpmd that is setup in forward mode.
> >
> > On the linux side, the two tap devices are confined in different
> > network namespaces and in one namespace we have an iperf server while
> > on the other namespace the iperf client sending either UDP or TCP.
> >
> > I expected a bandwidth in the range of few gpbs while the actual
> > measured bandwidth is a few gigabits.
> >
> > I suppose I need to configure the tap devices with optimized
> > parameters but I don't know where to look for advice.
> >
> > If I try to use the loopback interface I can get something 40 gbps
> > with a command like this:
> >
> > iperf -c 127.0.0.1 -u -i 1 -b 40g -t 10 -l 4
> >
> > .
>
> Sorry TAP device is inherently slow. It requires copies to/from Linux
> kernel. You are doing well if you get 1 million packets per second.
>
> One thing to check is that checksum is not being done twice.


tap device speed

2023-10-02 Thread Antonio Di Bacco
I'm doing a test where we have a couple of tap devices; the two
devices are seen by testpmd, which is set up in forward mode.

On the linux side, the two tap devices are confined in different
network namespaces and in one namespace we have an iperf server while
on the other namespace the iperf client sending either UDP or TCP.

I expected a bandwidth in the range of a few Gbps, while the actual
measured bandwidth is a few gigabits.

I suppose I need to configure the tap devices with optimized
parameters but I don't know where to look for advice.

If I try to use the loopback interface I can get something like 40 Gbps
with a command like this:

iperf -c 127.0.0.1 -u -i 1 -b 40g -t 10 -l 4

.


Mellanox card

2023-08-12 Thread Antonio Di Bacco
I'm facing this nasty error with Mellanox cards:

mlx5_net: probe of PCI device <:BB:DD.F> aborted after
encountering an error: Cannot allocate memory

The error seems to appear when a card is probed in a secondary process.
The first time the secondary is started, the probe goes fine with no
error; then the secondary is killed and restarted. The second time
the card is probed, I get this error.

Regards,
Antonio


Re: dpdk-testpmd works but dpdk pktgen crashes on startup with MLX5 card

2023-07-13 Thread Antonio Di Bacco
The problem was on my side: the Mellanox card was on NUMA 1 but I
provided only cores on NUMA 0.

Thank you

On Wed, Jul 12, 2023 at 4:40 PM Wiles, Keith  wrote:
>
> From: Maayan Kashani 
> Date: Wednesday, July 12, 2023 at 8:57 AM
> To: Antonio Di Bacco , users@dpdk.org 
> Cc: Raslan Darawsheh , Ali Alnubani 
> Subject: RE: dpdk-testpmd works but dpdk pktgen crashes on startup with MLX5 
> card
>
> Hi, Antonio,
> Sorry for the late reply,
> Thanks for bringing this issue to our attention.
> We need to investigate it, and share more data once we have it.
>
> Regards,
> Maayan Kashani
>
> > -Original Message-
> > From: Antonio Di Bacco 
> > Sent: Wednesday, 12 July 2023 0:52
> > To: users@dpdk.org
> > Subject: dpdk-testpmd works but dpdk pktgen crashes on startup with MLX5
> > card
> >
> > External email: Use caution opening links or attachments
> >
> >
> > If I try to use dpdk-pktgen on a MLX5 card, I get this SIGSEGV
> >
> > [user@dhcp-10-84-89-229 pktgen-dpdk]$  sudo
> > LD_LIBRARY_PATH=/usr/local/lib64 ./usr/local/bin/pktgen -l50-54  -n 2  
> > --allow
> > c1:00.0 -- -P -m "52.1"
>
>
>
> Hope the format is correct I told macos outlook to reply as text, but it 
> never seems to work. ☹
>
>
>
> I noticed here you define lcores  -l 50-54, which means 50 is used for timers 
> and display output. Then 51-54 are used for ports.
>
> The one thing I see here is that you define a lcore.port mapping  of -m 
> “52.1” meaning lcore 52 and port 1. You only have 1 port, which means it 
> should be -m “52.0” the other unused lcores will be reported as not used. 
> Looks like I need to add some tests to detect this problem. ☹
>
>
>
> I hope this helps. I did not see this email as I have a filter set to detect 
> a subject line with Pktgen in the text.
>
>
> >
> > *** Copyright(c) <2010-2023>, Intel Corporation. All rights reserved.
> > *** Pktgen  created by: Keith Wiles -- >>> Powered by DPDK <<<
> >
> > 0: mlx5_pci9  1   15b3:1019/c1:00.0
> >
> >
> >
> > *** Unable to create capture memzone for socket ID 2
> > *** Unable to create capture memzone for socket ID 3
> > *** Unable to create capture memzone for socket ID 4
> > *** Unable to create capture memzone for socket ID 5
> > *** Unable to create capture memzone for socket ID 6
> > *** Unable to create capture memzone for socket ID 7
> >  repeating message
> > 
> > *** Unable to create capture memzone for socket ID 219
> > *** Unable to create capture memzone for socket ID 220
> > *** Unable to create capture memzone for socket ID 221
> > *** Unable to create capture memzone for socket ID 222
> > WARNING: Nothing to do on lcore 51: exiting
> > WARNING: Nothing to do on lcore 53: exiting
> > WARNING: Nothing to do on lcore 54: exiting
> > - Ports 0-0 of 1 Copyright(c) <2010-2023>, Intel Corporation
> >   Port:Flags:
> > Link State  :
> > Pkts/s Rx   :
> >Tx   :
> > MBits/s Rx/Tx   :
> > Pkts/s Rx Max   :
> >Tx Max   :
> > Broadcast   :
> > Multicast   :
> > Sizes 64:
> >   65-127:
> >   128-255   :
> >   256-511   :
> >   512-1023  :
> >   1024-1518 :
> > Runts/Jumbos:
> > ARP/ICMP Pkts   :
> > Errors Rx/Tx:
> > Total Rx Pkts   :
> >   Tx Pkts   :
> >   Rx/Tx MBs :
> > TCP Flags   :
> > TCP Seq/Ack :
> > Pattern Type:
> > Tx Count/% Rate :
> > Pkt Size/Rx:Tx Burst:
> > TTL/Port Src/Dest   :
> > Pkt Type:VLAN ID:
> > 802.1p CoS/DSCP/IPP :
> > VxLAN Flg/Grp/vid   :
> > IP  Destination :
> > Source  :
> > MAC Destination :
> > Source  :
> > NUMA/Vend:ID/PCI:
> > -- Pktgen 23.06.1 (DPDK 22.11.2)  Powered by DPDK  (pid:20433) 
> > 
> >
> >
> > == Pktgen got a Segment Fault
> >
> > Obtained 11 stack frames.
> > ./usr/local/bin/pktgen() [0x43f1b8]
> > /lib64/libc.so.6(+0x54df0) [0x7fe22a2a3df0]
> > ./usr/local/bin/pktgen() [0x458859]
> > ./usr/local/bin/pktgen() [0x4592cc]
> > ./usr/local/bin/pktgen() [0x43d6d9]
> > ./usr/local/bin/pktgen() [0x43d73a]
> > ./usr/local/bin/pktgen() [0x41cd10]
> > ./usr/local/bin/pktgen() [0x43f601]
> > /lib64/libc.so.6(+0x3feb0) [0x7fe22a28eeb0]
> > /lib64/libc.so.6(__libc_start_main+0x80) [0x7fe22a28ef60]
> > ./usr/local/bin/pktgen() [0x404bf5]
> >
> >
> > Testpmd works fine on the same card.
> >
> > Anyone can give me a suggestion?
> >
> > Best regards.


dpdk-testpmd works but dpdk pktgen crashes on startup with MLX5 card

2023-07-11 Thread Antonio Di Bacco
If I try to use dpdk-pktgen on a MLX5 card, I get this SIGSEGV

[user@dhcp-10-84-89-229 pktgen-dpdk]$  sudo
LD_LIBRARY_PATH=/usr/local/lib64 ./usr/local/bin/pktgen -l50-54  -n 2
 --allow c1:00.0 -- -P -m "52.1"

*** Copyright(c) <2010-2023>, Intel Corporation. All rights reserved.
*** Pktgen  created by: Keith Wiles -- >>> Powered by DPDK <<<

0: mlx5_pci9  1   15b3:1019/c1:00.0



*** Unable to create capture memzone for socket ID 2
*** Unable to create capture memzone for socket ID 3
*** Unable to create capture memzone for socket ID 4
*** Unable to create capture memzone for socket ID 5
*** Unable to create capture memzone for socket ID 6
*** Unable to create capture memzone for socket ID 7
 repeating message

*** Unable to create capture memzone for socket ID 219
*** Unable to create capture memzone for socket ID 220
*** Unable to create capture memzone for socket ID 221
*** Unable to create capture memzone for socket ID 222
WARNING: Nothing to do on lcore 51: exiting
WARNING: Nothing to do on lcore 53: exiting
WARNING: Nothing to do on lcore 54: exiting
- Ports 0-0 of 1 Copyright(c) <2010-2023>, Intel Corporation
  Port:Flags:
Link State  :
Pkts/s Rx   :
   Tx   :
MBits/s Rx/Tx   :
Pkts/s Rx Max   :
   Tx Max   :
Broadcast   :
Multicast   :
Sizes 64:
  65-127:
  128-255   :
  256-511   :
  512-1023  :
  1024-1518 :
Runts/Jumbos:
ARP/ICMP Pkts   :
Errors Rx/Tx:
Total Rx Pkts   :
  Tx Pkts   :
  Rx/Tx MBs :
TCP Flags   :
TCP Seq/Ack :
Pattern Type:
Tx Count/% Rate :
Pkt Size/Rx:Tx Burst:
TTL/Port Src/Dest   :
Pkt Type:VLAN ID:
802.1p CoS/DSCP/IPP :
VxLAN Flg/Grp/vid   :
IP  Destination :
Source  :
MAC Destination :
Source  :
NUMA/Vend:ID/PCI:
-- Pktgen 23.06.1 (DPDK 22.11.2)  Powered by DPDK  (pid:20433) 


== Pktgen got a Segment Fault

Obtained 11 stack frames.
./usr/local/bin/pktgen() [0x43f1b8]
/lib64/libc.so.6(+0x54df0) [0x7fe22a2a3df0]
./usr/local/bin/pktgen() [0x458859]
./usr/local/bin/pktgen() [0x4592cc]
./usr/local/bin/pktgen() [0x43d6d9]
./usr/local/bin/pktgen() [0x43d73a]
./usr/local/bin/pktgen() [0x41cd10]
./usr/local/bin/pktgen() [0x43f601]
/lib64/libc.so.6(+0x3feb0) [0x7fe22a28eeb0]
/lib64/libc.so.6(__libc_start_main+0x80) [0x7fe22a28ef60]
./usr/local/bin/pktgen() [0x404bf5]


Testpmd works fine on the same card.

Can anyone give me a suggestion?

Best regards.


rte_dev_probe of two GPUs

2023-07-04 Thread Antonio Di Bacco
I have a primary process that receives --allow :00:00.0 as a parameter.

My server has four GPUs, if I rte_dev_probe any of those it works fine.

When I try to rte_dev_probe two of them, the second one gives me these errors:

EAL: Cannot find device (36:00.0)
EAL: Failed to attach device on primary process

Is there a limitation on the number of GPUs that can be attached to a
primary process?


Re: Multiple rte_launch_remote multiple times on main lcore

2023-06-27 Thread Antonio Di Bacco
This is very useful. Anyway, just on main_lcore, could I launch many
pthreads (with pthread_create)?

Does this interfere with DPDK?

On Sun, Jun 25, 2023 at 4:49 PM Stephen Hemminger
 wrote:
>
> On Tue, 20 Jun 2023 17:33:59 +0200
> Antonio Di Bacco  wrote:
>
> > Is it possible to launch multiple threads on the main lcore?
> > Who will be in charge of scheduling those threads on the main lcore
> > (main lcore is isolated)?
> >
> > Not the OS I suppose.
> >
> > Thank you
>
> If you start trying to add threads like this, it will lead to
> all sorts of locking problems.  When one thread gets the lock
> and then gets preempted by the scheduler and another thread
> (bound to same lcore) tries to acquire the lock, it will spin
> and wait until the first thread is rescheduled.
>
> DPDK was designed for dedicated threads per lcore.
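
A minimal sketch of the pthread_create() variant (hypothetical helper
names; it assumes the extra threads are pinned to cores that are not used
as polling lcores, to avoid the lock-preemption problem described above):

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

static void *control_task(void *arg)
{
    /* slow-path work only: avoid DPDK spinlocks shared with lcore threads */
    (void)arg;
    return NULL;
}

static int launch_control_thread(unsigned int cpu)
{
    pthread_t tid;
    cpu_set_t set;

    if (pthread_create(&tid, NULL, control_task, NULL) != 0)
        return -1;

    /* pin the extra thread away from the busy-polling lcores */
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(tid, sizeof(set), &set);
}

Such threads are scheduled by the OS, not by DPDK, so they coexist with EAL
as long as they do not contend on busy-wait locks with the polling lcores.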


pthread_setspecific with rte eal thread

2023-06-24 Thread Antonio Di Bacco
Is it possible to use pthread_setspecific with a thread launched with
rte_eal_remote_launch?

Or is there a similar function that is more appropriate?

Best regards


Multiple rte_launch_remote multiple times on main lcore

2023-06-20 Thread Antonio Di Bacco
Is it possible to launch multiple threads on the main lcore?
Who will be in charge of scheduling those threads on the main lcore
(main lcore is isolated)?

Not the OS I suppose.

Thank you


Threads on main lcore seem to stop for several seconds

2023-06-20 Thread Antonio Di Bacco
I have a few threads running on main lcore (launched with
rte_eal_remote_launch). It seems that randomly all threads get stuck
for some reason and then they resume again after a few seconds.

What could be the problem?

Regards.


eal-intr-thread, rte_mp_handle threads of secondary running on ISOLATED cores

2023-04-21 Thread Antonio Di Bacco
I have a primary process whose
eal-intr-thread
rte_mp_handle
threads are running correctly on "non-isolated" cores.

The primary forks a secondary whose
eal-intr-thread
rte_mp_handle
threads are running on "isolated" cores.

I'm using DPDK 21.11, I believe this is a bug. What is your opinion?

BR,
Anna.


rte_eal_remote_launch or pthread_create on main lcore

2023-04-13 Thread Antonio Di Bacco
My main lcore is sitting there just waiting for some message to do
non-real-time things.
I would like to launch a thread on the main lcore.
I could use a pthread or an rte_eal thread; which one would you recommend?

Regards,
Anna


Measuring core frequency with dpdk

2023-04-10 Thread Antonio Di Bacco
Is it possible to measure the core frequency using a DPDK API? Not the
maximum or nominal frequency, but the actual number of instruction
cycles per second.

Best regards,
Anna


Prevent rte_eth_tx_burst from freeing mbufs

2023-03-30 Thread Antonio Di Bacco
Is it possible to take full control of an mbuf and prevent the mbuf from
being freed after it has been sent?
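
One commonly used approach (a hedged sketch, assuming m is a valid
struct rte_mbuf * and port_id a started port): take an extra reference on
the mbuf before transmitting, so the PMD's internal free only drops its
own reference and the application keeps ownership:

#include <rte_mbuf.h>
#include <rte_ethdev.h>

/* keep one application-owned reference across the transmit */
rte_mbuf_refcnt_update(m, 1);

if (rte_eth_tx_burst(port_id, 0, &m, 1) == 0)
    rte_mbuf_refcnt_update(m, -1); /* not queued: drop the extra reference */

/* ... later, when the application is done with the buffer ... */
rte_pktmbuf_free(m);

The PMD may still hold the mbuf in its TX ring for a while after
rte_eth_tx_burst() returns, so the data should not be rewritten until the
hardware has finished with it.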


DPDK process launched without -l or -c options

2023-03-21 Thread Antonio Di Bacco
What happens if EAL is initialized without -l or -c? Does the
process use all cores or all non-isolated cores?


Add more lcores to application after rte_eal_init has been called

2023-02-22 Thread Antonio Di Bacco
I need to add some more cores to the application after the DPDK has
already been initialised.
Is it possible?
For other resources like network cards, I managed to probe new cards
dynamically but don't understand how I can do the same for new lcores.


dpdk-test power_autotest, power_caps_autotest

2023-01-11 Thread Antonio Di Bacco
The dpdk-test is really a nice piece of software.
Today I tried to use it to "autotest" the DPDK power subsystem but I
fail to fully understand the feedback it gives (consider that I've
disabled the intel_pstate driver):

For example:

RTE>>power_autotest
POWER: Invalid Power Management Environment(0) set


RTE>>power_caps_autotest
POWER: Env isn't set yet!
POWER: Attempting to initialise ACPI cpufreq power management...
POWER: Failed to write /sys/devices/system/cpu/cpu%u/cpufreq/scaling_governor
POWER: Cannot set governor of lcore 2 to userspace
POWER: Attempting to initialise PSTAT power management...
POWER: Initialized successfully for lcore 2 power management
POWER: Capabilities 1


How to fix the "Invalid Power Management Environment(0) set" and the
Failed to write /sys/devices/system/cpu/cpu%u/cpufreq/scaling_governor
?


Re: Anonymous structs in DPDK

2022-12-14 Thread Antonio Di Bacco
Right, I'm trying to avoid PIMPL because there are a few cases where I
need to put a DPDK data type in my classes. Anyway, I will propose a
patch adding a tag name to rte_spinlock_t, which shouldn't be harmful.
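
For illustration, the kind of change I have in mind (a sketch, not the
actual patch): only a struct tag is added, so C++ code can forward-declare
"struct rte_spinlock;" without pulling in the header:

/**
 * The rte_spinlock_t type.
 */
typedef struct rte_spinlock {
    volatile int locked; /**< lock status 0 = unlocked, 1 = locked */
} rte_spinlock_t;

Existing users of rte_spinlock_t keep compiling unchanged; only the tag is new.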

Regards,
Antonio.

On Wed, Dec 14, 2022 at 9:12 AM Pavel Vazharov  wrote:
>
> Hi,
>
> In my opinion, in that case you'll need to use things like PIMPL to hide the 
> implementation in the .cpp files. (I guess you were just trying to avoid the 
> need of PIMPL by other means?)
> And if you need to pass some DPDK related types to the outside world you'll 
> need to wrap them as well. You may need type erasure here and there. It 
> depends on the final goal.
>
> Regards,
> Pavel.
>
> On Wed, Dec 14, 2022 at 10:03 AM Antonio Di Bacco  
> wrote:
>>
>> Hi,
>>
>> I need to completely isolate my application from DPDK, I'm building a
>> C++ library that encapsulates the DPDK in order that the application
>> doesn't need to include (either directly or indirectly) any DPDK
>> header file. In the library cpp files I can include rte_spinlock.h but
>> not in the .hpp files.
>>
>> Best regards.
>>
>> On Wed, Dec 14, 2022 at 1:34 AM Stephen Hemminger
>>  wrote:
>> >
>> > On Tue, 13 Dec 2022 13:55:10 +
>> > Ferruh Yigit  wrote:
>> >
>> > > On 12/13/2022 12:51 PM, Antonio Di Bacco wrote:
>> > > > I noticed that DPDK include files have a number of anonymous/unnamed 
>> > > > struct:
>> > > >
>> > > > For example:
>> > > >
>> > > > /**
>> > > >  * The rte_spinlock_t type.
>> > > >  */
>> > > > typedef struct {
>> > > > volatile int locked; /**< lock status 0 = unlocked, 1 = locked 
>> > > > */
>> > > > } rte_spinlock_t;
>> > > >
>> > > > This choice doesn't allow to use forward declaration. I need forward
>> > > > declaration because I'm using a rte_spinlock_t pointer in a C++ class
>> > > > and I don't want to include rte_spinlock.h to prevent my application
>> > > > to include it as well.
>> > > >
>> > > > Is there any reason to use unnamed structures?
>> > > >
>> > >
>> > > Hi Antonio Di,
>> > >
>> > > I don't think there is a specific reason to not use named struct, I
>> > > assume that is only because there was no need to have it.
>> > >
>> > > So if you need, you can send a simple patch to convert anonymous struct
>> > > to named struct, although I am not clear why you can't include
>> > > 'rte_spinlock.h' in the file you declare your class.
>> > >
>> > > Cheers,
>> > > ferruh
>> >
>> > Why not include rte_spinlock.h? Spinlocks are meant to be embedded
>> > in the object using it. Using spinlocks by reference adds more space
>> > and causes a cache miss.


Re: Anonymous structs in DPDK

2022-12-14 Thread Antonio Di Bacco
Hi,

I need to completely isolate my application from DPDK, I'm building a
C++ library that encapsulates the DPDK in order that the application
doesn't need to include (either directly or indirectly) any DPDK
header file. In the library cpp files I can include rte_spinlock.h but
not in the .hpp files.

Best regards.

On Wed, Dec 14, 2022 at 1:34 AM Stephen Hemminger
 wrote:
>
> On Tue, 13 Dec 2022 13:55:10 +
> Ferruh Yigit  wrote:
>
> > On 12/13/2022 12:51 PM, Antonio Di Bacco wrote:
> > > I noticed that DPDK include files have a number of anonymous/unnamed 
> > > struct:
> > >
> > > For example:
> > >
> > > /**
> > >  * The rte_spinlock_t type.
> > >  */
> > > typedef struct {
> > > volatile int locked; /**< lock status 0 = unlocked, 1 = locked */
> > > } rte_spinlock_t;
> > >
> > > This choice doesn't allow to use forward declaration. I need forward
> > > declaration because I'm using a rte_spinlock_t pointer in a C++ class
> > > and I don't want to include rte_spinlock.h to prevent my application
> > > to include it as well.
> > >
> > > Is there any reason to use unnamed structures?
> > >
> >
> > Hi Antonio Di,
> >
> > I don't think there is a specific reason to not use named struct, I
> > assume that is only because there was no need to have it.
> >
> > So if you need, you can send a simple patch to convert anonymous struct
> > to named struct, although I am not clear why you can't include
> > 'rte_spinlock.h' in the file you declare your class.
> >
> > Cheers,
> > ferruh
>
> Why not include rte_spinlock.h? Spinlocks are meant to be embedded
> in the object using it. Using spinlocks by reference adds more space
> and causes a cache miss.


Anonymous structs in DPDK

2022-12-13 Thread Antonio Di Bacco
I noticed that DPDK include files have a number of anonymous/unnamed structs:

For example:

/**
 * The rte_spinlock_t type.
 */
typedef struct {
volatile int locked; /**< lock status 0 = unlocked, 1 = locked */
} rte_spinlock_t;

This choice doesn't allow the use of forward declarations. I need a forward
declaration because I'm using an rte_spinlock_t pointer in a C++ class
and I don't want to include rte_spinlock.h, to prevent my application
from including it as well.

Is there any reason to use unnamed structures?

Thx


Re: Support for RDMA in DPDK

2022-12-02 Thread Antonio Di Bacco
Hi Cliff,

my goal is to be able to use a set of DPDK APIs to allow an
application to configure a network card (Connect-X or E810-C) to
transfer the received packets to a location in RAM with RDMA,
in order to improve latency.
Thx,
Antonio.

On Fri, Dec 2, 2022 at 5:50 PM Cliff Burdick  wrote:
>>
>>
>> Thank you for the information.
>>
>> I would need RDMA from a network card. Is there anything boiling in the pot?
>>
>> On Thu, Dec 1, 2022 at 7:24 PM Cliff Burdick  wrote:
>> >
>> > If you are looking for RDMA to GPU devices, you can use gpudev in newer 
>> > DPDK versions to do it.
>> >
>> > On Wed, Nov 30, 2022 at 2:02 AM Antonio Di Bacco  
>> > wrote:
>> >>
>> >> I would like to understand if DPDK will support officially RDMA
>> >> through a set of APIs.
>> >>
>> >> Regards.
>
>
> Hi Antonio, can you talk about what you're trying to do? The Connect-X cards 
> with gpudev can do RDMA. I'm not sure if it works or is supported on other 
> cards.
>


Re: Support for RDMA in DPDK

2022-12-02 Thread Antonio Di Bacco
Thank you for the information.

I would need RDMA from a network card. Is there anything boiling in the pot?

On Thu, Dec 1, 2022 at 7:24 PM Cliff Burdick  wrote:
>
> If you are looking for RDMA to GPU devices, you can use gpudev in newer DPDK 
> versions to do it.
>
> On Wed, Nov 30, 2022 at 2:02 AM Antonio Di Bacco  
> wrote:
>>
>> I would like to understand if DPDK will support officially RDMA
>> through a set of APIs.
>>
>> Regards.


Support for RDMA in DPDK

2022-11-30 Thread Antonio Di Bacco
I would like to understand if DPDK will officially support RDMA
through a set of APIs.

Regards.


Memory fill with dmadev

2022-11-17 Thread Antonio Di Bacco
I'm using rte_dma_fill to write a pattern into memory allocated with
rte_memzone_reserve_aligned.

I know that the dst address of the DMA should be set to the memzone->iova
field of the memory zone; anyway, could the physical address obtained using
rte_mem_virt2phy(memzone->addr) work as well?

I'm using --iova pa in rte_eal_init .
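
For reference, a minimal sketch of the memzone->iova path (assuming a
dmadev dev_id with vchan 0 already configured via rte_dma_configure() /
rte_dma_vchan_setup() and started):

#include <stdio.h>
#include <rte_memzone.h>
#include <rte_lcore.h>
#include <rte_dmadev.h>

const struct rte_memzone *mz = rte_memzone_reserve_aligned("fill_zone",
    1 << 20, rte_socket_id(), RTE_MEMZONE_IOVA_CONTIG, 1 << 20);

if (mz == NULL) {
    printf("memzone reservation failed\n");
} else {
    /* enqueue the fill and submit it immediately */
    int idx = rte_dma_fill(dev_id, 0, 0xA5A5A5A5A5A5A5A5ULL,
        mz->iova, mz->len, RTE_DMA_OP_FLAG_SUBMIT);
    if (idx < 0)
        printf("rte_dma_fill failed: %d\n", idx);
}

With --iova pa the IOVA is the physical address, so mz->iova should match
what rte_mem_virt2phy(mz->addr) returns for the start of the zone; mz->iova
remains the documented value to pass.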


AMD PTDMA support

2022-10-19 Thread Antonio Di Bacco
Is anyone using the PTDMA in DPDK?

I applied a patch from Sebastian Selwin to DPDK 21.11 (ABI 22).
Then I chose rawdev_autotest in dpdk-test, but I get this error:

### Run selftest on each available rawdev
Running Copy Tests
In test_enqueue_copies:71 - Error with rte_ptdma_completed_ops - 1
Rawdev 0 (:04:00.2) selftest: Failed
Test Failed
RTE>>


Re: rte_eth_tx_burst() always returns 0 in tight loop

2022-09-26 Thread Antonio Di Bacco
Is there any way to check whether a TX queue is full before transmitting
with rte_eth_tx_burst(), or should I rely on the return value of
rte_eth_tx_burst()?
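
One hedged option, for PMDs that implement it, is rte_eth_tx_descriptor_status(),
which reports the state of the descriptor at a given offset from where the
next packet would be placed (reusing src_port/BURST_SIZE from the loop
quoted below):

#include <errno.h>
#include <rte_ethdev.h>

/* is there room for BURST_SIZE more packets on queue 0? */
int st = rte_eth_tx_descriptor_status(src_port, 0, BURST_SIZE - 1);

if (st == RTE_ETH_TX_DESC_FULL) {
    /* that slot is still in flight: the queue cannot take a full burst now */
} else if (st == -ENOTSUP) {
    /* PMD does not implement it: fall back to the tx_burst return value */
}

Otherwise, relying on the rte_eth_tx_burst() return value (and retrying the
unsent tail) remains the usual approach.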

On Thu, Jul 7, 2022 at 4:30 PM Antonio Di Bacco  wrote:
>
> I have an E810-C card with  iavf-4.4.2.1  ice-1.8.8 drivers.
>
> My tx_free_threshold is 8 together with tx_rs_thresh.
>
> I have a tight loop sending BURSTs of 8 packets, each packet is 9014
> bytes long (8 packets take 6 usecs to be serialized).
> If I put the rte_delay_us_block to 7 then everything works fine, every
> cycle 8 packets are transmitted.
> If I lower  the rte_delay_us_block to 1 usec, then I observe that the
> first FOR cycle is ok, nb_tx is 8 as expected, then, the second cycle
> prints a 7 while all subsequent cycles print Z (zero packets sent). I
> know that 1 usec delay is too small and I expect that no packets are
> transmitted for some cycles but I don't understand why I get an nb_tx
> set to 0 forever after the first two cycles.
>
> for (;;)
> {
> rte_spinlock_lock(_conf[src_port]) ;
> const uint16_t nb_tx = rte_eth_tx_burst(src_port, 0, tx_bufs,
> BURST_SIZE);
> rte_spinlock_unlock(_conf[src_port]);
>
> rte_delay_us_block(7);  // tested with 1
>
> if (nb_tx == 0)
> printf("Z");
> else if (nb_tx < BURST_SIZE)
> printf("nb_tx %d\n", nb_tx);
>
> tx += nb_tx;
>
> if (unlikely(nb_tx < BURST_SIZE)) {
> uint16_t buf;
>
> for (buf = nb_tx; buf < BURST_SIZE; buf++)
> rte_pktmbuf_free(tx_bufs[buf]);
> }
> }
>
> On Wed, Jul 6, 2022 at 5:00 PM Stephen Hemminger
>  wrote:
> >
> > On Wed, 6 Jul 2022 09:21:28 +0200
> > Antonio Di Bacco  wrote:
> >
> > > I wonder why calling eth_dev_tx_burst in a tight loop doesn't allow to
> > > write the packets into the transmit buffer. Only solution I found is
> > > to include a small delay after the tx_burst that is less than the
> > > estimated serialization time of the packet in order to be able to
> > > saturate the ethernet line.
> > >
> > > Anyway I wonder if this is the right approach.
> > >
> > > Thx,
> > > Antonio.
> > >
> > > On Sun, Jul 3, 2022 at 10:19 PM Gábor LENCSE  wrote:
> > > >
> > > > Dear Antonio,
> > > >
> > > > According to my experience, the rte_eth_tx_burst() function reports the
> > > > packets as "sent" (by a non-zero return value), when they are still in
> > > > the transmit buffer.
> > > >
> > > > (If you are interested in the details, you can see them in Section 3.6.5
> > > > of this paper: 
> > > > http://www.hit.bme.hu/~lencse/publications/e104-b_2_128.pdf )
> > > >
> > > > Therefore, I think that the return value of 0 may mean that
> > > > rte_eth_tx_burst() can't even commit itself for the future delivery of
> > > > the packets. I could only guess why. E.g. all its resources have been
> > > > exhausted.
> > > >
> > > > Best regards,
> > > >
> > > > Gábor
> > > >
> > > >
> > > > 7/3/2022 5:57 PM keltezéssel, Antonio Di Bacco írta:
> > > > > I'm trying to send packets continuously in a  tight loop with a burst
> > > > > size of 8 and packets are 9600 bytes long.
> > > > > If I don't insert a delay after the rte_eth_tx_burst it always 
> > > > > returns 0.
> > > > >
> > > > > What's the explanation of this behaviour ?
> > > > >
> > > > > Best regards,
> > > > > Antonio.
> > > >
> >
> > Which driver? How did you set the tx_free threshold.
> > The driver will need to cleanup already transmitted packets.


Re: ABI version on build machine and target machine

2022-09-20 Thread Antonio Di Bacco
The fact that I don't have to align the DPDK version on the build system and
on the target (provided that they have the same ABI) is very reassuring.

Thank you.

On Tue, Sep 20, 2022 at 5:13 PM Stephen Hemminger
 wrote:
>
> On Tue, 20 Sep 2022 16:24:52 +0200
> Antonio Di Bacco  wrote:
>
> > I have an application that is built with version 21.11.0 of the DPDK 
> > (ABI22).
> > Now I run the application  on a target machine that has 21.11.2
> > installed, then the same ABI.
> >
> > How can I be sure that the DPDK inlined source code in 21.11.0 is the
> > same or compatible with that in 21.11.2?
> > I mean, if an inline function in version 21.11.0 has changed in
> > 21.11.2, could I see problems?
> >
> > Is this important or the fact that the two DPDKs have the same ABI
> > grants that everything will work?
>
> yes. Especially for bugfix (ie stable) releases.
> The only thing is that if a bug fix is done in an inline in 21.11.2
> then obviously you have to build with that version to get it.


ABI version on build machine and target machine

2022-09-20 Thread Antonio Di Bacco
I have an application that is built with version 21.11.0 of DPDK (ABI 22).
Now I run the application on a target machine that has 21.11.2
installed, hence the same ABI.

How can I be sure that the DPDK inlined source code in 21.11.0 is the
same or compatible with that in 21.11.2?
I mean, if an inline function in version 21.11.0 has changed in
21.11.2, could I see problems?

Is this important, or does the fact that the two DPDKs have the same ABI
guarantee that everything will work?


rte_memzone_reserve in a secondary: to call or not to call, this is the problem

2022-09-12 Thread Antonio Di Bacco
In the documentation for rte_memzone_reserve() I see that one of the
possible returned errors is E_RTE_SECONDARY, and this makes me think
that it is not possible to call this function in a secondary process.
Actually, it seems to work; is that a problem in the documentation?

Best regards,
Antonio.


Re: Launching a new primary process from a primary

2022-09-11 Thread Antonio Di Bacco
Thank you very much, Dmitry

On Sun, Sep 11, 2022 at 8:20 PM Dmitry Kozlyuk  wrote:
>
> 2022-09-11 19:55 (UTC+0200), Antonio Di Bacco:
> > Do you know where can I find the "test app".
>
> http://git.dpdk.org/dpdk/tree/app/test/process.h#n75


Deadlock while allocating memory from two different primaries

2022-09-08 Thread Antonio Di Bacco
I have a primary that spawns another primary that will only run for a
few seconds and then exit. While spawning the "son" primary (using a
system() call) the "father" primary continues to allocate memory using
rte_memzone_reserve().

After a few spawnings, I get stuck on a lock:

simple_mem_mp5542 FLOCK  WRITE* 0 0  0 /dev/hugepages2M...
simple_mem_mp5542 FLOCK  WRITE  0 0  0 /dev/hugepages2M...

During these tests I got the impression that the lock doesn't arise
from the concurrency of the calls to allocate memory but from the
concurrency of allocations and rte_eal_init of the spawned process.

This is the reference code:


#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include "mp_commands.h"

#define RTE_LOGTYPE_APP RTE_LOGTYPE_USER1

volatile int quit = 0;

static int
lcore_alloc_forever(__rte_unused void *arg)
{
unsigned lcore_id = rte_lcore_id();

printf("Starting core on primary lcore %u\n", lcore_id);
while (!quit)
{

char mem_name[64];

sprintf(mem_name, "%s%d", "mymem", rand());
rte_delay_ms(10);

const struct rte_memzone* memzone =
rte_memzone_reserve_aligned(mem_name, 1024*1024, rand()%2,
RTE_MEMZONE_256KB | RTE_MEMZONE_SIZE_HINT_ONLY |
RTE_MEMZONE_IOVA_CONTIG, RTE_CACHE_LINE_SIZE);
if (memzone == NULL)
printf("Cannot allocate\n");
rte_delay_ms(10);
//printf("lcore %u hugepages size %ld\n", lcore_id,
memzone->hugepage_sz);
rte_memzone_free(memzone);


}

return 0;
}

static int
lcore_alloc_time_limited(__rte_unused void *arg)
{
unsigned lcore_id = rte_lcore_id();
char mem_name[32];

printf("Starting core %u \n", lcore_id);
for (int i = 0; i < 200; i++)
{
sprintf(mem_name, "%s%d", "mymem", rand());
rte_delay_ms(rand() % 20);
const struct rte_memzone* memzone =
rte_memzone_reserve_aligned(mem_name, 1024*1024, rand()%2,
RTE_MEMZONE_256KB | RTE_MEMZONE_SIZE_HINT_ONLY |
RTE_MEMZONE_IOVA_CONTIG, RTE_CACHE_LINE_SIZE);
if (memzone == NULL)
printf("Cannot allocate\n");
rte_delay_ms(rand() % 20);
//printf("hugepages size %ld\n", memzone->hugepage_sz);
rte_memzone_free(memzone);

}

printf("Exiting core %u \n", lcore_id);

return 0;
}


int
main(int argc, char **argv)
{
const unsigned flags = 0;

const unsigned priv_data_sz = 0;

int ret;
unsigned lcore_id;

int numa = 0;

ret = rte_eal_init(argc, argv);
if (ret < 0)
rte_exit(EXIT_FAILURE, "Cannot init EAL\n");

argc -= ret; argv += ret;


/* Start of ring structure. 8< */
if (rte_eal_process_type() == RTE_PROC_PRIMARY)
{
RTE_LOG(INFO, APP, "Finished Process Init.\n");

int second_process = (argv[1] != NULL) && (strcmp(argv[1],
"second") == 0);

if (!second_process)
{
RTE_LCORE_FOREACH_WORKER(lcore_id) {
rte_eal_remote_launch(lcore_alloc_forever, NULL, lcore_id);
}

while (true)
{
printf("Launching process\n");
system("build/simple_mem_mp --file-prefix bb  -l 5-15
--proc-type primary -- second");

#define MAX_2M_PAGES 512
struct rte_memzone* memzones[MAX_2M_PAGES];

for (int i = 0; i < MAX_2M_PAGES; i++)
{
char mem_name[32];
sprintf(mem_name, "%s%d", "mymem", rand());
memzones[i] =
rte_memzone_reserve_aligned(mem_name, 1024*1024, rand()%2,
RTE_MEMZONE_256KB | RTE_MEMZONE_SIZE_HINT_ONLY |
RTE_MEMZONE_IOVA_CONTIG, RTE_CACHE_LINE_SIZE);
if (memzones[i] == NULL)
printf("Cannot allocate\n");
}
rte_delay_ms(rand() % 20);

for (int i = 0; i < MAX_2M_PAGES; i++)
{
rte_memzone_free(memzones[i]);
}

rte_delay_ms(1);
}

}
else
{
RTE_LCORE_FOREACH_WORKER(lcore_id) {
rte_eal_remote_launch(lcore_alloc_time_limited, NULL, lcore_id);
}

}


} else {

/* call lcore_recv() on every worker lcore */
RTE_LCORE_FOREACH_WORKER(lcore_id) {
rte_eal_remote_launch(lcore_alloc_time_limited, NULL, lcore_id);
}

printf("Secondary started\n");

}

rte_eal_mp_wait_lcore();

/* clean up the EAL */
rte_eal_cleanup();

printf("Proc type %d exiting\n", rte_eal_process_type());

return 0;
}


Re: Can a DPDK API like rte_eth_dev_set_mtu be called by a normal pthread

2022-08-30 Thread Antonio Di Bacco
 printf("Cannot allocate pktmbuf_pool\n");
return -1;
}
else
{
printf("pktmbuf_pool %p\n", pktmbuf_pool);
}
}

int port_init(int port_id)
{
struct rte_eth_conf port_conf;
int retval;
struct rte_eth_dev_info dev_info;
struct rte_eth_rxconf   rxconf;
struct rte_eth_txconf   txconf;

memset( &port_conf, 0, sizeof(struct rte_eth_conf) );


retval = rte_eth_dev_info_get(port_id, &dev_info );
if (retval != 0)
return -1;

uint16_t rx_ring_size = 1024;
uint16_t tx_ring_size = 1024;

retval = rte_eth_dev_configure(port_id, 1, 1, _conf );
if (retval != 0)
return -1;

printf("Eth start %d\n", __LINE__);

retval = rte_eth_dev_adjust_nb_rx_tx_desc(
port_id, &rx_ring_size, &tx_ring_size );
if (retval != 0)
return -1;


/* Setup the RX queue (always done, it's mandatory for mlx5 at least) */
rxconf = dev_info.default_rxconf;
retval = rte_eth_rx_queue_setup(port_id, 0,
 rx_ring_size,
 0, &rxconf,
 pktmbuf_pool );
if (retval < 0)
return -1;

if ( tx_ring_size ) {

txconf  = dev_info.default_txconf;
txconf.offloads = port_conf.txmode.offloads;
retval = rte_eth_tx_queue_setup(port_id, 0,
 tx_ring_size,
 0, &txconf );
if (retval < 0)
return -1;
}

/* Use DPDK for always setting the largest MTU we support, with
 * no need to manual mtu configuration on system interfaces.
 * DAVIDE tested on Intel ixgbe, let's see if it works with
i40e/mlx5 TODO */
rte_eth_dev_set_mtu( port_id, 9000 );

retval = rte_eth_dev_start(port_id );
if (retval < 0)
return -1;
}


int init_vf(const char* vf)
{
printf("Num ports %d\n", rte_eth_dev_count_avail());
int res = rte_dev_probe(vf);
if (res < 0) {
printf("Cannot probe\n");
return -1;
}

uint16_t port_id = 0;
res = rte_eth_dev_get_port_by_name(vf, &port_id );
if (res < 0) {
printf("Port not found\n");
return -1;
}

printf("Num ports %d\n", rte_eth_dev_count_avail());

printf("Configuring port %d\n", port_id);
init_mbufs();
port_init(port_id);

}

int
main(int argc, char **argv)
{
const unsigned flags = 0;

const unsigned priv_data_sz = 0;

int ret;
unsigned lcore_id;

int numa = 0;

ret = rte_eal_init(argc, argv);
if (ret < 0)
rte_exit(EXIT_FAILURE, "Cannot init EAL\n");


argc -= ret;
argv += ret;


const char* vf_bdf = argv[1];
printf("VF-BDF %s\n", vf_bdf);

/* Start of ring structure. 8< */
if (rte_eal_process_type() == RTE_PROC_PRIMARY)
{
//init_vf(vf_bdf);

int first_lcore = rte_get_next_lcore(-1, 1, 0);
rte_eal_remote_launch(lcore_primary, NULL, first_lcore);


char cmdline[256];
sprintf(cmdline, "build/simple_eth_tx_mp -l 30-32
--file-prefix tx --allow ':00:00.0' --proc-type auto -- %s",
vf_bdf);
system(cmdline);

rte_delay_ms(1);


} else {

printf("Secondary pid is %d, waiting if you want to attach\n",
getpid());
sleep(10);

init_vf(vf_bdf);

//
sleep(5);
int first_lcore = rte_get_next_lcore(-1, 1, 0);
rte_eal_remote_launch(lcore_secondary, NULL, first_lcore);


    printf("Secondary started\n");


}


rte_eal_mp_wait_lcore();

/* clean up the EAL */
rte_eal_cleanup();

printf("Proc type %d exiting\n", rte_eal_process_type());

return 0;
}



On Tue, Aug 23, 2022 at 2:08 PM Dmitry Kozlyuk  wrote:
>
> 2022-08-23 13:57 (UTC+0200), Antonio Di Bacco:
> > Thank you, and what if the pthread is launched by a secondary DPDK
> > process? Should rte_eth_dev_set_mtu be working ?
> > Because I'm observing a crash.
>
> Do you have a debug stack trace?
> Does PMD that you use support setting MTU? Memif doesn't, for example.
> Check for .mtu_set in struct eth_dev_ops filled in the PMD code.


Re: Secondary process stuck in rte_eal_memory_init

2022-08-24 Thread Antonio Di Bacco
Can you try launching the secondary with some delay in order not to
overlap with memory allocations done in the primary?
Is your primary allocating memory on NUMA 0 where the secondary is running?

On Tue, Aug 23, 2022 at 4:54 PM Anna Tauzzi  wrote:
>
> I have a primary process that spawns a secondary process.Primary is on NUMA 1 
> while secondary on NUMA 0.
> The secondary process starts up but when calling rte_eal_init it gets stuck 
> with this backtrace:
>
> flock()
> sync_walk()
> rte_memseg_list_walk_thread_unsafe()
> eal_memalloc_sync_with_primary()
> rte_eal_hugepage_attach()
> rte_eal_memory_init()
> rte_eal_init.cold()
>
> While starting the secondary, it is possible that the primary is allocating 
> memory on different NUMAs. I'm saying this because if in the primary I 
> replace the dpdk memory allocation function (rte_zalloc...) with a plain 
> memalign I don't get this problem.
>
>
>


Re: Can a DPDK API like rte_eth_dev_set_mtu be called by a normal pthread

2022-08-23 Thread Antonio Di Bacco
Thank you, and what if the pthread is launched by a secondary DPDK
process? Should rte_eth_dev_set_mtu be working?
Because I'm observing a crash.


On Tue, Aug 23, 2022 at 12:22 PM Dmitry Kozlyuk
 wrote:
>
> 2022-08-23 11:25 (UTC+0200), Antonio Di Bacco:
> > I have a DPDK process that also creates a normal pthread, is there
> > anything wrong to call a DPDK api from this normal pthread?
>
> Nothing wrong, this is allowed.
>
> You may be interested in rte_thread_register(),
> if you want to consider this thread as an lcore.
> Also, rte_socket_id() will return SOCKET_ID_ANY
> unless the thread is registered, for example.
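
A minimal sketch of Dmitry's suggestion (assuming a plain pthread created
by the application after rte_eal_init() has finished):

#include <stdio.h>
#include <rte_lcore.h>

static void *app_thread(void *arg)
{
    (void)arg;
    /* before registration, rte_socket_id() reports SOCKET_ID_ANY */
    printf("socket before register: %d\n", (int)rte_socket_id());

    if (rte_thread_register() == 0)
        printf("registered as lcore %u, socket %d\n",
            rte_lcore_id(), (int)rte_socket_id());

    /* DPDK calls such as rte_eth_dev_set_mtu() are allowed either way */
    return NULL;
}

Registration is only needed if the thread should behave like an lcore; it
is not a prerequisite for simply calling DPDK APIs from the thread.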


Can a DPDK API like rte_eth_dev_set_mtu be called by a normal pthread

2022-08-23 Thread Antonio Di Bacco
I have a DPDK process that also creates a normal pthread, is there
anything wrong to call a DPDK api from this normal pthread?


PMD for FPGA based network device

2022-08-15 Thread Antonio Di Bacco
I need to write a PMD for an FPGA based network device based on Xilinx
FPGA (Ultrascale).

Just to start I tried to load a dummy bitstream that has a status
register with a fixed value and a bunch of writable registers.

Then, I bound the vfio-pci to the device (10ee:903f)  because I plan
to be able to use "user space DMA" in my PMD.

I was able to mmap the fpga registers and read the status register. I
can write and read back the writable registers but, when I stop my
process and relaunch it, I find that the registers are all cleared
(really bad).
I tried to mmap the fpga using resource0 under sysfs and I'm able to
write and read back the registers correctly and the registers are
persistent (good).
So it is not a problem related to the bitstream: either I am doing
something wrong with the VFIO mmap, or the device is reset when I start
my C code to do the mmap.
I also tried disabling IOMMU when loading the VFIO-PCI driver with no luck.

I don't know if this is the right place to ask this question but I
know there are many knowledgeable people reading this forum and I
cannot find many examples of using VFIO.

This is the script I use to bind vfio:

VID="10ee"
DID="903f"
DBDF=":"`lspci -n | grep -E ${VID}:${DID} | cut -d ' ' -f1`
echo ${DBDF}
ROOT_DBDF=":3a:00.0"
readlink /sys/bus/pci/devices/${DBDF}/iommu_group
GROUP=`readlink /sys/bus/pci/devices/${DBDF}/iommu_group | rev | cut
-d '/' -f1 | rev`
echo "GROUP " ${GROUP}
# unbind the ROOT device makes the group viable
echo ${ROOT_DBDF}  > /sys/bus/pci/devices/${ROOT_DBDF}/driver/unbind; sleep 1
echo ${DBDF}  > /sys/bus/pci/devices/${DBDF}/driver/unbind; sleep 1
echo ${VID} ${DID} > /sys/bus/pci/drivers/vfio-pci/remove_id; sleep 1
echo ${VID} ${DID} > /sys/bus/pci/drivers/vfio-pci/new_id; sleep 1
echo 8086 2030 > /sys/bus/pci/drivers/vfio-pci/new_id; sleep 1
ls -l /sys/bus/pci/devices/${DBDF}/iommu_group/devices; sleep 1

chmod 660 /dev/vfio/vfio


WHILE this is the C code I use to do the mmap:

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

#include <linux/vfio.h>

int main(int argc, char** argv)
{

int container, group, device, i;
struct vfio_group_status group_status =
{ .argsz = sizeof(group_status) };
struct vfio_iommu_type1_info iommu_info = { .argsz =
sizeof(iommu_info) };
struct vfio_iommu_type1_dma_map dma_map = { .argsz = sizeof(dma_map) };
struct vfio_device_info device_info = { .argsz = sizeof(device_info) };

/* Create a new container */
container = open("/dev/vfio/vfio", O_RDWR);

if (ioctl(container, VFIO_GET_API_VERSION) != VFIO_API_VERSION)
printf("Unknown API version\n");
/* Unknown API version */

if (!ioctl(container, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU))
printf("Doesn't support IOMMU driver we want\n");

/* Open the group */
group = open("/dev/vfio/69", O_RDWR);

/* Test the group is viable and available */
ioctl(group, VFIO_GROUP_GET_STATUS, &group_status);

if (!(group_status.flags & VFIO_GROUP_FLAGS_VIABLE))
printf("Group is not viable\n");
/* Group is not viable (ie, not all devices bound for vfio) */

/* Add the group to the container */
ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);

/* Enable the IOMMU model we want */
ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU);

/* Get addition IOMMU info */
ioctl(container, VFIO_IOMMU_GET_INFO, &iommu_info);

/* Allocate some space and setup a DMA mapping */
/*
dma_map.vaddr = mmap(0, 1024 * 1024, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
dma_map.size = 1024 * 1024;
dma_map.iova = 0;
dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;

ioctl(container, VFIO_IOMMU_MAP_DMA, &dma_map);
*/
/* Get a file descriptor for the device */
device = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, ":3b:00.0");

/* Test and setup the device */
ioctl(device, VFIO_DEVICE_GET_INFO, &device_info);

printf("NUM REGIONS %d\n", device_info.num_regions);

struct vfio_region_info regs[64];
for (i = 0; i < device_info.num_regions; i++) {
regs[i].argsz = sizeof(struct vfio_region_info);
regs[i].index = i;

ioctl(device, VFIO_DEVICE_GET_REGION_INFO, &regs[i]);

printf("region %d flags %08x offset %lld size %lld\n",
i, regs[i].flags, regs[i].offset, regs[i].size);
/* Setup mappings... read/write offsets, mmaps
* For PCI devices, config space is a region */
}

volatile uint8_t* ptr = mmap(0, regs[0].size, PROT_READ |
PROT_WRITE, MAP_SHARED, device, 0);

printf("addr %p\n", ptr);

printf("reg 0x38000 %08x\n", *(uint32_t*)(ptr + 0x38000));

{
uint32_t ival = 

rte_memzone_reserve_aligned always allocates 1GB hugepages irrespective of the size I request

2022-07-19 Thread Antonio Di Bacco
I'm working on a system that has the following hugepages configuration:
##
sudo dpdk-hugepages.py -s
Node Pages Size Total
2    256   2Mb    512Mb
2    8     1Gb    8Gb
0    256   2Mb    512Mb
0    8     1Gb    8Gb
3    256   2Mb    512Mb
3    8     1Gb    8Gb
1    256   2Mb    512Mb
1    8     1Gb    8Gb
Hugepages mounted on /dev/hugepages
##

If I try to allocate memory with
rte_memzone_reserve_aligned(mem_name, 1024*1024, socket_id,
RTE_MEMZONE_2MB | RTE_MEMZONE_SIZE_HINT_ONLY |
RTE_MEMZONE_IOVA_CONTIG, RTE_CACHE_LINE_SIZE):

it always allocates the memory on a 1GB hugepage.

If I remove RTE_MEMZONE_SIZE_HINT_ONLY flag, the allocation fails.


Regards.
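
A quick way to confirm which page size actually ended up backing the zone
is to look at the hugepage_sz field of the returned memzone; a minimal
sketch, assuming mz is the non-NULL pointer returned by
rte_memzone_reserve_aligned():

#include <inttypes.h>
#include <stdio.h>
#include <rte_memzone.h>

/* Print which page size ended up backing a reserved memzone. */
static void print_zone_page_size(const struct rte_memzone *mz)
{
    printf("memzone %s: len=%zu hugepage_sz=%" PRIu64 " socket=%d\n",
           mz->name, mz->len, mz->hugepage_sz, (int)mz->socket_id);
}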


Re: Getting mbuf pointer from buf_addr pointer

2022-07-12 Thread Antonio Di Bacco
I was asked to reproduce the exact same behaviour of a library that
used VMDQ queues without DPDK.
Now this library offered an API to allocate a buffer for a packet and
returned an address where the application could write its data.

void* vmdqNew (vmdqUidPOOLt* p);

I implemented the same API to allocate an MBUF from a pool and I also
return the right address as a void* but then to transmit the packet I
need the address of the MBUF for the tx_burst API and I don't have it.

On Tue, Jul 12, 2022 at 10:32 AM Dmitry Kozlyuk
 wrote:
>
> 2022-07-11 11:21 (UTC+0200), Antonio Di Bacco:
> > Is there any API that allows me to get the MBUF pointer given its
> > buf_addr pointer?
>
> No.
>
> How did you come across this need?
> Sounds like an app design flaw.
> What is the original problem you're solving?
>
> P.S. It may be possible in specific cases when you know (from code)
> how mempool sets buf_addr for a given mbuf,
> and that PMDs don't change buf_addr with the used set of offloads.
> Apparently, this approach would be extremely brittle and dangerous.
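
For reference, a minimal sketch of the brittle back-calculation mentioned
in the P.S. above, assuming the default rte_pktmbuf_pool_create() layout
in which buf_addr points right after the mbuf header and its private
area, and that no PMD has changed buf_addr:

#include <rte_common.h>
#include <rte_mbuf.h>

/* Recover the owning mbuf from a buf_addr handed out by a default
 * pktmbuf pool. Fragile: it relies on the standard layout
 *   [struct rte_mbuf][private area][buf_addr ...]
 * set up by rte_pktmbuf_init(). */
static struct rte_mbuf *
mbuf_from_buf_addr(struct rte_mempool *mp, void *buf_addr)
{
    uint16_t priv_size = rte_pktmbuf_priv_size(mp);

    return (struct rte_mbuf *)RTE_PTR_SUB(buf_addr,
            sizeof(struct rte_mbuf) + priv_size);
}

If the library hands out the packet data address (rte_pktmbuf_mtod())
rather than buf_addr itself, the headroom would also have to be
subtracted, which makes the approach even more fragile.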


Getting mbuf pointer from buf_addr pointer

2022-07-11 Thread Antonio Di Bacco
Is there any API that allows me to get the MBUF pointer given its
buf_addr pointer?

Regards,
Antonio.


xxx_DESCS_PER_LOOP workaround

2022-07-07 Thread Antonio Di Bacco
I'm using the ice PMD driver and, luckily, before going mad, I
discovered that an rte_eth_rx_burst with a burst size of 1 will never
succeed even if packets have arrived on the network card. This is due
to the fact that ICE_DESCS_PER_LOOP is set to 4 or 8 and then you need
to use a burst size not lower than 4 or 8.
Now, I have an application where a single packet (9000 bytes) has to
be delivered through the network with the lowest possible latency.

Is it possible to workaround this limitation by accepting the
performance degradation?
I read that there is a "runtime" devargs option (-a ,rx_low_latency=1)
but I'm not sure whether this would be a workaround.
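
For reference, a sketch of the usual workaround: keep the nb_pkts argument
at or above the vector path's minimum and let the driver return fewer
packets; a single packet is still delivered promptly because the burst
size is only an upper bound. RX_BURST and the names below are placeholders:

#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define RX_BURST 32   /* comfortably above ICE_DESCS_PER_LOOP */

/* Poll loop: the array size is only an upper bound, so a single arrived
 * packet is still returned by the next rte_eth_rx_burst() call. */
static void poll_port(uint16_t port_id, uint16_t queue_id)
{
    struct rte_mbuf *pkts[RX_BURST];

    for (;;) {
        uint16_t nb = rte_eth_rx_burst(port_id, queue_id, pkts, RX_BURST);

        for (uint16_t i = 0; i < nb; i++) {
            /* handle pkts[i] ... */
            rte_pktmbuf_free(pkts[i]);
        }
    }
}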


Re: rte_eth_tx_burst() always returns 0 in tight loop

2022-07-07 Thread Antonio Di Bacco
I have an E810-C card with  iavf-4.4.2.1  ice-1.8.8 drivers.

My tx_free_threshold is 8 together with tx_rs_thresh.

I have a tight loop sending BURSTs of 8 packets, each packet is 9014
bytes long (8 packets take 6 usecs to be serialized).
If I put the rte_delay_us_block to 7 then everything works fine, every
cycle 8 packets are transmitted.
If I lower  the rte_delay_us_block to 1 usec, then I observe that the
first FOR cycle is ok, nb_tx is 8 as expected, then, the second cycle
prints a 7 while all subsequent cycles print Z (zero packets sent). I
know that 1 usec delay is too small and I expect that no packets are
transmitted for some cycles but I don't understand why I get an nb_tx
set to 0 forever after the first two cycles.

for (;;)
{
rte_spinlock_lock(_conf[src_port]) ;
const uint16_t nb_tx = rte_eth_tx_burst(src_port, 0, tx_bufs,
BURST_SIZE);
rte_spinlock_unlock(_conf[src_port]);

rte_delay_us_block(7);  // tested with 1

if (nb_tx == 0)
printf("Z");
else if (nb_tx < BURST_SIZE)
printf("nb_tx %d\n", nb_tx);

tx += nb_tx;

if (unlikely(nb_tx < BURST_SIZE)) {
uint16_t buf;

for (buf = nb_tx; buf < BURST_SIZE; buf++)
rte_pktmbuf_free(tx_bufs[buf]);
}
}

On Wed, Jul 6, 2022 at 5:00 PM Stephen Hemminger
 wrote:
>
> On Wed, 6 Jul 2022 09:21:28 +0200
> Antonio Di Bacco  wrote:
>
> > I wonder why calling eth_dev_tx_burst in a tight loop doesn't allow to
> > write the packets into the transmit buffer. Only solution I found is
> > to include a small delay after the tx_burst that is less than the
> > estimated serialization time of the packet in order to be able to
> > saturate the ethernet line.
> >
> > Anyway I wonder if this is the right approach.
> >
> > Thx,
> > Antonio.
> >
> > On Sun, Jul 3, 2022 at 10:19 PM Gábor LENCSE  wrote:
> > >
> > > Dear Antonio,
> > >
> > > According to my experience, the rte_eth_tx_burst() function reports the
> > > packets as "sent" (by a non-zero return value), when they are still in
> > > the transmit buffer.
> > >
> > > (If you are interested in the details, you can see them in Section 3.6.5
> > > of this paper: 
> > > http://www.hit.bme.hu/~lencse/publications/e104-b_2_128.pdf )
> > >
> > > Therefore, I think that the return value of 0 may mean that
> > > rte_eth_tx_burst() can't even commit itself for the future delivery of
> > > the packets. I could only guess why. E.g. all its resources have been
> > > exhausted.
> > >
> > > Best regards,
> > >
> > > Gábor
> > >
> > >
> > > 7/3/2022 5:57 PM keltezéssel, Antonio Di Bacco írta:
> > > > I'm trying to send packets continuously in a  tight loop with a burst
> > > > size of 8 and packets are 9600 bytes long.
> > > > If I don't insert a delay after the rte_eth_tx_burst it always returns 
> > > > 0.
> > > >
> > > > What's the explanation of this behaviour ?
> > > >
> > > > Best regards,
> > > > Antonio.
> > >
>
> Which driver? How did you set the tx_free threshold.
> The driver will need to cleanup already transmitted packets.
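
For reference, a sketch of how tx_rs_thresh/tx_free_thresh could be set
explicitly when the queue is created; the values are only illustrative
and each PMD documents its own constraints on them:

#include <rte_ethdev.h>

/* Set up one TX queue with explicit RS/free thresholds, starting from
 * the device defaults so the other txconf fields stay valid. */
static int setup_txq(uint16_t port_id, uint16_t queue_id, uint16_t nb_desc,
                     unsigned int socket_id)
{
    struct rte_eth_dev_info dev_info;
    struct rte_eth_txconf txconf;
    int ret;

    ret = rte_eth_dev_info_get(port_id, &dev_info);
    if (ret != 0)
        return ret;

    txconf = dev_info.default_txconf;
    txconf.tx_rs_thresh = 32;    /* illustrative; see the PMD docs */
    txconf.tx_free_thresh = 32;  /* illustrative; see the PMD docs */

    return rte_eth_tx_queue_setup(port_id, queue_id, nb_desc,
                                  socket_id, &txconf);
}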


Re: Shared memory between two primary DPDK processes

2022-07-07 Thread Antonio Di Bacco
You are right, process 1 is always allocating 1GB even if I request
only 10MB, and memseg->hugepage_sz is 1GB.
When I use rte_memseg_get_fd / rte_memseg_get_fd_offset I get an FD and an
offset of 0, which is correct because that's the offset of the memseg, not
of the memzone.
To access the memory allocated by process 1 I need to take into
account also the offset between memzone and memseg.
And then I need to add (memzone->iova -  memseg->iova) to the address
returned by mmap.






On Thu, Jul 7, 2022 at 2:26 AM Dmitry Kozlyuk  wrote:
>
> 2022-07-07 00:14 (UTC+0200), Antonio Di Bacco:
> > Dear Dmitry,
> >
> > I tried to follow this approach and if I allocate 1GB on primary
> > process number 1, then I can mmap that memory on the primary process
> > number 2.
> > I also tried to convert the virt addr of the allocation made in
> > primary 1 to phys and then I converted the virt addr returned by mmap
> > in primary 2 and I got the same phys addr.
> >
> > Unfortunately, if I try to allocated only 10 MB for example in primary
> > 1, then mmap in primary 2 succeeds but it seems that this virt addr
> > doesn't correspond to the same phys memory as in primary 1.
> >
> > In the primary 2, the mmap is used like this:
> >
> > int flags = MAP_SHARED | MAP_HUGETLB ;
> >
> > uint64_t* addr = (uint64_t*) mmap(NULL, sz, PROT_READ|PROT_WRITE,
> > flags, my_mem_fd, off);
>
> Hi Antonio,
>
> From `man 2 mmap`:
>
>Huge page (Huge TLB) mappings
>For  mappings that employ huge pages, the requirements for the
>arguments of mmap() and munmap() differ somewhat from the requirements
>for mappings that use the native system page size.
>
>For mmap(), offset must be a multiple of the underlying huge page
>size.  The system automatically aligns length to be a  multiple  of
>the underlying huge page size.
>
>For munmap(), addr, and length must both be a multiple of the
>underlying huge page size.
>
> Probably process 1 maps a 1 GB hugepage:
> DPDK does so if 1 GB hugepages are used even if you only allocate 10 MB.
> You can examine memseg to see what size it is (not memzone!).
> Hugepage size is a property of each mounted HugeTBL filesystem.
> It determines which kernel pool to use.
> Pools are over different sets of physical pages.
> This means that the kernel doesn't allow to map given page frames
> as 1 GB and 2 MB hugepages at the same time via hugetlbfs.
> I'm surprised mmap() works at all in your case
> and suspect that it is mapping 2 MB hugepages in process 2.
>
> The solution may be, in process 2:
>
> base_offset = RTE_ALIGN_FLOOR(offset, hugepage_size)
> map_addr = mmap(fd, size=hugepage_size, offset=base_offset)
> addr = RTE_PTR_ADD(map_addr, offset - base_offset)
>
> Note that if [offset; offset+size) crosses a hugepage boundary,
> you have to map more than one page.
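
Putting the two sides together, a rough sketch under the assumptions above
(the fd, offsets and hugepage size are exchanged out-of-band between the
two primaries, and the zone does not cross a hugepage boundary):

#include <stddef.h>
#include <sys/mman.h>
#include <rte_common.h>
#include <rte_memory.h>
#include <rte_memzone.h>

/* Primary 1: export the fd and offsets that describe where mz lives. */
static int export_zone(const struct rte_memzone *mz,
                       int *fd, size_t *seg_off, size_t *zone_off)
{
    const struct rte_memseg *ms = rte_mem_virt2memseg(mz->addr, NULL);

    if (ms == NULL)
        return -1;
    *fd = rte_memseg_get_fd(ms);
    if (*fd < 0 || rte_memseg_get_fd_offset(ms, seg_off) < 0)
        return -1;
    /* where the memzone starts inside its memseg */
    *zone_off = (size_t)RTE_PTR_DIFF(mz->addr, ms->addr);
    return 0;
}

/* Primary 2: map the hugepage that contains the zone and return a
 * pointer to the zone itself. */
static void *import_zone(int fd, size_t seg_off, size_t zone_off,
                         size_t hugepage_sz)
{
    /* mmap() on hugetlbfs needs a hugepage-aligned offset */
    size_t base_off = RTE_ALIGN_FLOOR(seg_off + zone_off, hugepage_sz);
    void *map = mmap(NULL, hugepage_sz, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, (off_t)base_off);

    if (map == MAP_FAILED)
        return NULL;
    return RTE_PTR_ADD(map, seg_off + zone_off - base_off);
}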


Re: Shared memory between two primary DPDK processes

2022-07-06 Thread Antonio Di Bacco
Dear Dmitry,

I tried to follow this approach and if I allocate 1GB on primary
process number 1, then I can mmap that memory on the primary process
number 2.
I also tried to convert the virt addr of the allocation made in
primary 1 to phys and then I converted the virt addr returned by mmap
in primary 2 and I got the same phys addr.

Unfortunately, if I try to allocate only 10 MB for example in primary
1, then mmap in primary 2 succeeds but it seems that this virt addr
doesn't correspond to the same phys memory as in primary 1.

In the primary 2, the mmap is used like this:

int flags = MAP_SHARED | MAP_HUGETLB ;

uint64_t* addr = (uint64_t*) mmap(NULL, sz, PROT_READ|PROT_WRITE,
flags, my_mem_fd, off);

On Fri, Apr 8, 2022 at 3:26 PM Dmitry Kozlyuk  wrote:
>
> 2022-04-08 14:31 (UTC+0200), Antonio Di Bacco:
> > I know that it is possible to share memory between a primary and secondary
> > process using rte_memzone_reserve_aligned to allocate memory in primary
> > that is "seen" also by the secondary. If we have two primary processes
> > (started with different file-prefix) the same approach is not feasible. I
> > wonder how to share a chunk of memory hosted on a hugepage between two
> > primaries.
> >
> > Regards.
>
> Hi Antonio,
>
> Correction: all hugepages allocated by DPDK are shared
> between primary and secondary processes, not only memzones.
>
> I assume we're talking about processes within one host,
> because your previous similar question was about sharing memory between hosts
> (as we have discussed offline), which is out of scope for DPDK.
>
> As for the question directly, you need to map the same part of the same file
> in the second primary as the hugepage is mapped from in the first primary.
> I don't recommend to work with file paths, because their management
> is not straightforward (--single-file-segments, for one) and is undocumented.
>
> There is a way to share DPDK memory segment file descriptors.
> Although public, this DPDK API is dangerous in the sense that you must
> clearly understand what you're doing and how DPDK works.
> Hence the question: what is the task you need this sharing for?
> Maybe there is a simpler way.
>
> 1. In the first primary:
>
> mz = rte_memzone_reserve()
> ms = rte_mem_virt2memseg(mz->addr)
> fd = rte_memseg_get_fd(ms)
> offset = rte_memseg_get_fd_offset(ms)
>
> 2. Use Unix domain sockets with SCM_RIGHTS
>to send "fd" and "offset" to the second primary.
>
> 3. In the second primary, after receiving "fd" and "offset":
>
> flags = MAP_SHARED | MAP_HUGE | (30 << MAP_HUGE_SHIFT)
> addr = mmap(fd, offset, flags)
>
> Note that "mz" may consist of multiple "ms" depending on the sizes
> of the zone and hugepages, and on the zone alignment.
> Also "addr" may (and probably will) differ from "mz->addr".
> It is possible to pass "mz->addr" and try to force it,
> like DPDK does for primary/secondary.
>
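
For step 2, a compact sketch of passing the fd plus offset over a
connected AF_UNIX socket with SCM_RIGHTS; error handling is minimal and
the framing (a single size_t) is arbitrary:

#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send one fd plus an offset over a connected AF_UNIX socket. */
static int send_fd(int sock, int fd, size_t offset)
{
    char cbuf[CMSG_SPACE(sizeof(int))];
    struct iovec iov = { .iov_base = &offset, .iov_len = sizeof(offset) };
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = cbuf, .msg_controllen = sizeof(cbuf),
    };
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);

    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) < 0 ? -1 : 0;
}

/* Receive them on the other side. */
static int recv_fd(int sock, int *fd, size_t *offset)
{
    char cbuf[CMSG_SPACE(sizeof(int))];
    struct iovec iov = { .iov_base = offset, .iov_len = sizeof(*offset) };
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = cbuf, .msg_controllen = sizeof(cbuf),
    };
    struct cmsghdr *cmsg;

    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    memcpy(fd, CMSG_DATA(cmsg), sizeof(int));
    return 0;
}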


Re: rte_eth_tx_burst() always returns 0 in tight loop

2022-07-06 Thread Antonio Di Bacco
I wonder why calling eth_dev_tx_burst in a tight loop doesn't allow to
write the packets into the transmit buffer. Only solution I found is
to include a small delay after the tx_burst that is less than the
estimated serialization time of the packet in order to be able to
saturate the ethernet line.

Anyway I wonder if this is the right approach.

Thx,
Antonio.

On Sun, Jul 3, 2022 at 10:19 PM Gábor LENCSE  wrote:
>
> Dear Antonio,
>
> According to my experience, the rte_eth_tx_burst() function reports the
> packets as "sent" (by a non-zero return value), when they are still in
> the transmit buffer.
>
> (If you are interested in the details, you can see them in Section 3.6.5
> of this paper: http://www.hit.bme.hu/~lencse/publications/e104-b_2_128.pdf )
>
> Therefore, I think that the return value of 0 may mean that
> rte_eth_tx_burst() can't even commit itself for the future delivery of
> the packets. I could only guess why. E.g. all its resources have been
> exhausted.
>
> Best regards,
>
> Gábor
>
>
> 7/3/2022 5:57 PM keltezéssel, Antonio Di Bacco írta:
> > I'm trying to send packets continuously in a  tight loop with a burst
> > size of 8 and packets are 9600 bytes long.
> > If I don't insert a delay after the rte_eth_tx_burst it always returns 0.
> >
> > What's the explanation of this behaviour ?
> >
> > Best regards,
> > Antonio.
>


rte_eth_tx_burst() always returns 0 in tight loop

2022-07-03 Thread Antonio Di Bacco
I'm trying to send packets continuously in a  tight loop with a burst
size of 8 and packets are 9600 bytes long.
If I don't insert a delay after the rte_eth_tx_burst it always returns 0.

What's the explanation of this behaviour ?

Best regards,
Antonio.


Re: Large interruptions for EAL thread running on isol core

2022-06-24 Thread Antonio Di Bacco
Thank you, didn't know that "System management units" could steal
CPU!!! That is scary

On Thu, Jun 23, 2022 at 8:42 PM Stephen Hemminger
 wrote:
>
> On Thu, 23 Jun 2022 20:03:02 +0200
> Antonio Di Bacco  wrote:
>
> > I'm running a DPDK thread on an isolated core. I also set some  flags
> > that could help keeping the core at rest on linux like: nosoftlockup
> > nohz_full rcu_nocbs irqaffinity.
> >
> > Unfortunately the thread gets some interruptions that stop the thread
> > for about 20-30 microseconds. This seems small, but my application
> > suffers a lot.
> >
> > I also tried to use  rte_thread_set_priority that indeed has a strong
> > effect but unfortunately creates problems to Linux (like network not
> > working).
> >
> > Is there any other knob that could help running the DPDK thread with
> > minimum or no interruptions at all?
>
> Look with perf and see what is happening.
> First check for interrupt affinity.
> Don't try real time priority.
>
> The other thing to look for would be any BIOS settings.
> Some system management units can take away CPU silently for polling
> some internal housekeeping.


Re: Large interruptions for EAL thread running on isol core

2022-06-24 Thread Antonio Di Bacco
Yes, Hyperthreading is disabled.

On Thu, Jun 23, 2022 at 8:07 PM Omer Yamac  wrote:
>
> Hi Antonio,
>
> Did you try to disable logical cores on CPU?
>
> On 23.06.2022 21:03, Antonio Di Bacco wrote:
> > I'm running a DPDK thread on an isolated core. I also set some  flags
> > that could help keeping the core at rest on linux like: nosoftlockup
> > nohz_full rcu_nocbs irqaffinity.
> >
> > Unfortunately the thread gets some interruptions that stop the thread
> > for about 20-30 microseconds. This seems small, but my application
> > suffers a lot.
> >
> > I also tried to use  rte_thread_set_priority that indeed has a strong
> > effect but unfortunately creates problems to Linux (like network not
> > working).
> >
> > Is there any other knob that could help running the DPDK thread with
> > minimum or no interruptions at all?


Large VF not supported on most of my Intel cards

2022-06-24 Thread Antonio Di Bacco
I have several different Intel ethernet cards. On each one I try the
following steps:

1) Enable 4 SRIOVs on one port
2) Bind the first VF with DPDK (sudo dpdk-devbind.py --bind vfio-pci
:xx:yy:z)
3) Launch a DPDK application that uses the VF and then calls:

rte_eth_dev_configure(port,
  0, /* nb rx queue */
  64, /* nb tx queue */
  ) < 0);

rte_eth_dev_set_mtu(port, 9600);

for (i = 0; i < NB_TX_QUEUES; i++)
rte_eth_tx_queue_setup(port,
i,
4096,  /* queue size */
cpus->numacore,
);

I always get a "Large VF not supported" error for these cards:
* XXV710 for 25GbE SFP28  (firmware-version: 8.70 0x8000c3eb 1.3179.0)
* X710/X557-AT 10GBASE-T

but these cards ARE working fine:
* 82599ES 10-Gigabit SFI/SFP+
* E810-C for QSFP

Consider that I installed the latest Intel drivers on my Linux box:
* i40e-2.19.3
* iavf-4.4.2.1
* ice-1.8.9


Best regards


Large interruptions for EAL thread running on isol core

2022-06-23 Thread Antonio Di Bacco
I'm running a DPDK thread on an isolated core. I also set some  flags
that could help keeping the core at rest on linux like: nosoftlockup
nohz_full rcu_nocbs irqaffinity.

Unfortunately the thread gets some interruptions that stop the thread
for about 20-30 microseconds. This seems small, but my application
suffers a lot.

I also tried to use  rte_thread_set_priority that indeed has a strong
effect but unfortunately creates problems to Linux (like network not
working).

Is there any other knob that could help running the DPDK thread with
minimum or no interruptions at all?


Purpose of memory channels option (-n) in DPDK

2022-05-25 Thread Antonio Di Bacco
If the processor spreads contiguous memory accesses onto different
channels in hardware thanks to the memory controller(s) that handles
multiple channels, then what is the purpose of the -n option in DPDK? Is
it not redundant?


Re: Optimizing memory access with DPDK allocated memory

2022-05-25 Thread Antonio Di Bacco
Wonderful tool, now it is completely clear, I do not have a bottleneck
on the DDR but on the core to DDR interface.

Single core results:
Command line parameters: mlc --max_bandwidth -H -k3
ALL Reads:   9239.05
3:1 Reads-Writes :  13348.68
2:1 Reads-Writes :  14360.44
1:1 Reads-Writes :  13792.73


Two cores:
Command line parameters: mlc --max_bandwidth -H -k3-4
ALL Reads: 24666.55
3:1 Reads-Writes :  30905.30
2:1 Reads-Writes :  32256.26
1:1 Reads-Writes :  37349.44


Eight cores:
Command line parameters: mlc --max_bandwidth -H -k3-10
ALL Reads: 78109.94
3:1 Reads-Writes :  62105.06
2:1 Reads-Writes :  59628.81
1:1 Reads-Writes :  55320.34



On Wed, May 25, 2022 at 12:55 PM Kinsella, Ray  wrote:
>
> Hi Antonio,
>
> If it is an Intel Platform you are using.
> You can take a look at the Intel Memory Latency Checker.
> https://www.intel.com/content/www/us/en/developer/articles/tool/intelr-memory-latency-checker.html
>
> (don't be fooled by the name, it does measure bandwidth).
>
> Ray K
>
> -Original Message-
> From: Antonio Di Bacco 
> Sent: Wednesday 25 May 2022 08:30
> To: Stephen Hemminger 
> Cc: users@dpdk.org
> Subject: Re: Optimizing memory access with DPDK allocated memory
>
> Just to add some more info that could possibly be useful to someone.
> Even if a processor has many memory channels; there is also another parameter 
> to take into consideration, a given "core" cannot exploit all the memory 
> bandwidth available.
> For example for a DDR4 2933 MT/s with 4 channels:
> the memory bandwidth is   2933 X 8 (# of bytes of width) X 4 (# of
> channels) = 93,866.88 MB/s bandwidth, or 94 GB/s but a single core (according 
> to my tests with DPDK process writing a 1GB hugepage) is about 12 GB/s (with 
> a block size exceeding the L3 cache size).
>
> Can anyone confirm that ?
>
> On Mon, May 23, 2022 at 3:16 PM Antonio Di Bacco  
> wrote:
> >
> > Got feedback from a guy working on HPC with DPDK and he told me that
> > with dpdk mem-test (don't know where to find it) I should be doing
> > 16GB/s with DDR4 (2666) per channel. In my case with 6 channels I
> > should be doing 90GB/s  that would be amazing!
> >
> > On Sat, May 21, 2022 at 11:42 AM Antonio Di Bacco
> >  wrote:
> > >
> > > I read a couple of articles
> > > (https://www.thomas-krenn.com/en/wiki/Optimize_memory_performance_of
> > > _Intel_Xeon_Scalable_systems?xtxsearchselecthit=1
> > > and this
> > > https://www.exxactcorp.com/blog/HPC/balance-memory-guidelines-for-in
> > > tel-xeon-scalable-family-processors)
> > > and I understood a little bit more.
> > >
> > > If the XEON memory controller is able to spread contiguous memory
> > > accesses onto different channels in hardware (as Stepphen correctly
> > > stated), then, how DPDK with option -n can benefit an application?
> > > I also coded a test application to write a 1GB hugepage and
> > > calculate time needed but, equipping an additional two DIMM on two
> > > unused channels of my available six channels motherboard (X11DPi-NT)
> > > , I didn't observe any improvement. This is strange because adding
> > > two channels to the 4 already equipped should make a noticeable
> > > difference.
> > >
> > > For reference this is the small program for allocating and writing memory.
> > > https://github.com/adibacco/simple_mp_mem_2
> > > and the results with 4 memory channels:
> > > https://docs.google.com/spreadsheets/d/1mDoKYLMhMMKDaOS3RuGEnpPgRNKu
> > > ZOy4lMIhG-1N7B8/edit?usp=sharing
> > >
> > >
> > > On Fri, May 20, 2022 at 5:48 PM Stephen Hemminger
> > >  wrote:
> > > >
> > > > On Fri, 20 May 2022 10:34:46 +0200 Antonio Di Bacco
> > > >  wrote:
> > > >
> > > > > Let us say I have two memory channels each one with its own 16GB
> > > > > memory module, I suppose the first memory channel will be used
> > > > > when addressing physical memory in the range 0 to 0x4  
> > > > > and the second when addressing physical memory in the range 0x4  
> > > > >  to  0x7  .
> > > > > Correct?
> > > > > Now, I need to have a 2GB buffer with one "writer" and one
> > > > > "reader", the writer writes on half of the buffer (call it A)
> > > > > and, in the meantime, the reader reads on the other half (B).
> > > > > When the writer finishes writing its half buffe

Re: Optimizing memory access with DPDK allocated memory

2022-05-25 Thread Antonio Di Bacco
Just to add some more info that could possibly be useful to someone.
Even if a processor has many memory channels, there is another parameter
to take into consideration: a given "core" cannot exploit all the memory
bandwidth available.
For example for a DDR4 2933 MT/s with 4 channels:
the memory bandwidth is   2933 X 8 (# of bytes of width) X 4 (# of
channels) = 93,866.88 MB/s bandwidth, or 94 GB/s but a single core
(according to my tests with DPDK process writing a 1GB hugepage) is
about 12 GB/s (with a block size exceeding the L3 cache size).

Can anyone confirm that ?

On Mon, May 23, 2022 at 3:16 PM Antonio Di Bacco  wrote:
>
> Got feedback from a guy working on HPC with DPDK and he told me that
> with dpdk mem-test (don't know where to find it) I should be doing
> 16GB/s with DDR4 (2666) per channel. In my case with 6 channels I
> should be doing 90GB/s  that would be amazing!
>
> On Sat, May 21, 2022 at 11:42 AM Antonio Di Bacco
>  wrote:
> >
> > I read a couple of articles
> > (https://www.thomas-krenn.com/en/wiki/Optimize_memory_performance_of_Intel_Xeon_Scalable_systems?xtxsearchselecthit=1
> > and this 
> > https://www.exxactcorp.com/blog/HPC/balance-memory-guidelines-for-intel-xeon-scalable-family-processors)
> > and I understood a little bit more.
> >
> > If the XEON memory controller is able to spread contiguous memory
> > accesses onto different channels in hardware (as Stepphen correctly
> > stated), then, how DPDK with option -n can benefit an application?
> > I also coded a test application to write a 1GB hugepage and calculate
> > time needed but, equipping an additional two DIMM on two unused
> > channels of my available six channels motherboard (X11DPi-NT) , I
> > didn't observe any improvement. This is strange because adding two
> > channels to the 4 already equipped should make a noticeable
> > difference.
> >
> > For reference this is the small program for allocating and writing memory.
> > https://github.com/adibacco/simple_mp_mem_2
> > and the results with 4 memory channels:
> > https://docs.google.com/spreadsheets/d/1mDoKYLMhMMKDaOS3RuGEnpPgRNKuZOy4lMIhG-1N7B8/edit?usp=sharing
> >
> >
> > On Fri, May 20, 2022 at 5:48 PM Stephen Hemminger
> >  wrote:
> > >
> > > On Fri, 20 May 2022 10:34:46 +0200
> > > Antonio Di Bacco  wrote:
> > >
> > > > Let us say I have two memory channels each one with its own 16GB memory
> > > > module, I suppose the first memory channel will be used when addressing
> > > > physical memory in the range 0 to 0x4   and the second when
> > > > addressing physical memory in the range 0x4   to  0x7  .
> > > > Correct?
> > > > Now, I need to have a 2GB buffer with one "writer" and one "reader", the
> > > > writer writes on half of the buffer (call it A) and, in the meantime, 
> > > > the
> > > > reader reads on the other half (B). When the writer finishes writing its
> > > > half buffer (A), signal it to the reader and they swap,  the reader 
> > > > starts
> > > > to read from A and writer starts to write to B.
> > > > If I allocate the whole buffer (on two 1GB hugepages) across the two 
> > > > memory
> > > > channels, one half of the buffer is allocated on the end of first 
> > > > channel
> > > > while the other half is allocated on the start of the second memory
> > > > channel, would this increase performances compared to the whole buffer
> > > > allocated within the same memory channel?
> > >
> > > Most systems just interleave memory chips based on number of filled slots.
> > > This is handled by BIOS before kernel even starts.
> > > The DPDK has a number of memory channels parameter and what it does
> > > is try and optimize memory allocation by spreading.
> > >
> > > Looks like you are inventing your own limited version of what memif does.


Re: Optimizing memory access with DPDK allocated memory

2022-05-23 Thread Antonio Di Bacco
Got feedback from a guy working on HPC with DPDK and he told me that
with dpdk mem-test (don't know where to find it) I should be doing
16GB/s with DDR4 (2666) per channel. In my case with 6 channels I
should be doing 90GB/s  that would be amazing!

On Sat, May 21, 2022 at 11:42 AM Antonio Di Bacco
 wrote:
>
> I read a couple of articles
> (https://www.thomas-krenn.com/en/wiki/Optimize_memory_performance_of_Intel_Xeon_Scalable_systems?xtxsearchselecthit=1
> and this 
> https://www.exxactcorp.com/blog/HPC/balance-memory-guidelines-for-intel-xeon-scalable-family-processors)
> and I understood a little bit more.
>
> If the XEON memory controller is able to spread contiguous memory
> accesses onto different channels in hardware (as Stepphen correctly
> stated), then, how DPDK with option -n can benefit an application?
> I also coded a test application to write a 1GB hugepage and calculate
> time needed but, equipping an additional two DIMM on two unused
> channels of my available six channels motherboard (X11DPi-NT) , I
> didn't observe any improvement. This is strange because adding two
> channels to the 4 already equipped should make a noticeable
> difference.
>
> For reference this is the small program for allocating and writing memory.
> https://github.com/adibacco/simple_mp_mem_2
> and the results with 4 memory channels:
> https://docs.google.com/spreadsheets/d/1mDoKYLMhMMKDaOS3RuGEnpPgRNKuZOy4lMIhG-1N7B8/edit?usp=sharing
>
>
> On Fri, May 20, 2022 at 5:48 PM Stephen Hemminger
>  wrote:
> >
> > On Fri, 20 May 2022 10:34:46 +0200
> > Antonio Di Bacco  wrote:
> >
> > > Let us say I have two memory channels each one with its own 16GB memory
> > > module, I suppose the first memory channel will be used when addressing
> > > physical memory in the range 0 to 0x4   and the second when
> > > addressing physical memory in the range 0x4   to  0x7  .
> > > Correct?
> > > Now, I need to have a 2GB buffer with one "writer" and one "reader", the
> > > writer writes on half of the buffer (call it A) and, in the meantime, the
> > > reader reads on the other half (B). When the writer finishes writing its
> > > half buffer (A), signal it to the reader and they swap,  the reader starts
> > > to read from A and writer starts to write to B.
> > > If I allocate the whole buffer (on two 1GB hugepages) across the two 
> > > memory
> > > channels, one half of the buffer is allocated on the end of first channel
> > > while the other half is allocated on the start of the second memory
> > > channel, would this increase performances compared to the whole buffer
> > > allocated within the same memory channel?
> >
> > Most systems just interleave memory chips based on number of filled slots.
> > This is handled by BIOS before kernel even starts.
> > The DPDK has a number of memory channels parameter and what it does
> > is try and optimize memory allocation by spreading.
> >
> > Looks like you are inventing your own limited version of what memif does.


Re: Optimizing memory access with DPDK allocated memory

2022-05-21 Thread Antonio Di Bacco
I read a couple of articles
(https://www.thomas-krenn.com/en/wiki/Optimize_memory_performance_of_Intel_Xeon_Scalable_systems?xtxsearchselecthit=1
and this 
https://www.exxactcorp.com/blog/HPC/balance-memory-guidelines-for-intel-xeon-scalable-family-processors)
and I understood a little bit more.

If the XEON memory controller is able to spread contiguous memory
accesses onto different channels in hardware (as Stephen correctly
stated), then how can DPDK with the -n option benefit an application?
I also coded a test application to write a 1GB hugepage and calculate
the time needed but, after equipping an additional two DIMMs on two
unused channels of my available six-channel motherboard (X11DPi-NT), I
didn't observe any improvement. This is strange because adding two
channels to the 4 already equipped should make a noticeable
difference.

For reference this is the small program for allocating and writing memory.
https://github.com/adibacco/simple_mp_mem_2
and the results with 4 memory channels:
https://docs.google.com/spreadsheets/d/1mDoKYLMhMMKDaOS3RuGEnpPgRNKuZOy4lMIhG-1N7B8/edit?usp=sharing


On Fri, May 20, 2022 at 5:48 PM Stephen Hemminger
 wrote:
>
> On Fri, 20 May 2022 10:34:46 +0200
> Antonio Di Bacco  wrote:
>
> > Let us say I have two memory channels each one with its own 16GB memory
> > module, I suppose the first memory channel will be used when addressing
> > physical memory in the range 0 to 0x4   and the second when
> > addressing physical memory in the range 0x4   to  0x7  .
> > Correct?
> > Now, I need to have a 2GB buffer with one "writer" and one "reader", the
> > writer writes on half of the buffer (call it A) and, in the meantime, the
> > reader reads on the other half (B). When the writer finishes writing its
> > half buffer (A), signal it to the reader and they swap,  the reader starts
> > to read from A and writer starts to write to B.
> > If I allocate the whole buffer (on two 1GB hugepages) across the two memory
> > channels, one half of the buffer is allocated on the end of first channel
> > while the other half is allocated on the start of the second memory
> > channel, would this increase performances compared to the whole buffer
> > allocated within the same memory channel?
>
> Most systems just interleave memory chips based on number of filled slots.
> This is handled by BIOS before kernel even starts.
> The DPDK has a number of memory channels parameter and what it does
> is try and optimize memory allocation by spreading.
>
> Looks like you are inventing your own limited version of what memif does.


DPDK:mlx5_common: No Verbs device matches PCI device

2022-05-20 Thread Antonio Di Bacco
Installed a Mellanox ConnectX5 as below and created two VFs, but when
starting DPDK I see this error:

DPDK:EAL: Probe PCI driver: mlx5_pci (15b3:1018) device: :3b:00.3
(socket 0)
DPDK:mlx5_common: No Verbs device matches PCI device :3b:00.3, are
kernel drivers loaded?
DPDK:mlx5_common: Verbs device not found: :3b:00.3
DPDK:mlx5_common: Failed to initialize device context.

Drivers are loaded:

user@metis:~$ lsmod | grep ml
mlx5_ib   348160  0
ib_uverbs 155648  2 irdma,mlx5_ib
ib_core   368640  3 irdma,ib_uverbs,mlx5_ib
mlx5_core1400832  1 mlx5_ib
mlxfw  32768  1 mlx5_core
psample20480  1 mlx5_core
tls   102400  1 mlx5_core
pci_hyperv_intf16384  1 mlx5_core

and this is installation procedure.

sudo  ./MLNX_OFED_LINUX-5.6-1.0.3.3-ubuntu22.04-x86_64/mlnxofedinstall
--dpdk --upstream-libs

Installing ofed-scripts-5.6...
Installing mstflint-4.16.1...
Installing mlnx-tools-5.2.0...
Installing mlnx-ofed-kernel-utils-5.6...
Installing mlnx-ofed-kernel-dkms-5.6...

Installing rdma-core-56mlnx40...
Installing libibverbs1-56mlnx40...
Installing ibverbs-utils-56mlnx40...
Installing ibverbs-providers-56mlnx40...
Installing libibverbs-dev-56mlnx40...
Installing librdmacm1-56mlnx40...
Installing rdmacm-utils-56mlnx40...
Installing librdmacm-dev-56mlnx40...
Installing libibumad3-56mlnx40...
Installing ibacm-56mlnx40...
Installing python3-pyverbs-56mlnx40...
Selecting previously unselected package mlnx-fw-updater.
(Reading database ... 208346 files and directories currently installed.)
Preparing to unpack .../mlnx-fw-updater_5.6-1.0.3.3_amd64.deb ...
Unpacking mlnx-fw-updater (5.6-1.0.3.3) ...
Setting up mlnx-fw-updater (5.6-1.0.3.3) ...

Added 'RUN_FW_UPDATER_ONBOOT=no to /etc/infiniband/openib.conf

Initializing...
Attempting to perform Firmware update...
Querying Mellanox devices firmware ...
Querying Mellanox devices firmware ...

Device #1:
--

  Device Type:  ConnectX5
  Part Number:  MCX516A-CCA_Ax
  Description:  ConnectX-5 EN network interface card; 100GbE dual-port
QSFP28; PCIe3.0 x16; tall bracket; ROHS R6
  PSID: MT_12
  PCI Device Name:  3b:00.0
  Base GUID:b8599f0300b7cebe
  Base MAC: b8599fb7cebe
  Versions:        Current         Available
     FW            16.33.1048      16.33.1048
     PXE           3.6.0502        3.6.0502
     UEFI          14.26.0017      14.26.0017

  Status:   Up to date


Optimizing memory access with DPDK allocated memory

2022-05-20 Thread Antonio Di Bacco
Let us say I have two memory channels each one with its own 16GB memory
module, I suppose the first memory channel will be used when addressing
physical memory in the range 0 to 0x4   and the second when
addressing physical memory in the range 0x4   to  0x7  .
Correct?
Now, I need to have a 2GB buffer with one "writer" and one "reader", the
writer writes on half of the buffer (call it A) and, in the meantime, the
reader reads on the other half (B). When the writer finishes writing its
half buffer (A), signal it to the reader and they swap,  the reader starts
to read from A and writer starts to write to B.
If I allocate the whole buffer (on two 1GB hugepages) across the two memory
channels, one half of the buffer is allocated on the end of first channel
while the other half is allocated on the start of the second memory
channel, would this increase performances compared to the whole buffer
allocated within the same memory channel?


Re: DPDK performances surprise

2022-05-19 Thread Antonio Di Bacco
This tool seems awesome!!! Better than VTUNE?

On Thu, May 19, 2022 at 10:29 AM Kinsella, Ray 
wrote:

> I’d say that is likely yes.
>
>
>
> FYI - pcm-memory is very handy tool for looking at memory traffic.
>
> https://github.com/opcm/pcm
>
>
>
> Thanks,
>
>
>
> Ray K
>
>
>
> *From:* Sanford, Robert 
> *Sent:* Wednesday 18 May 2022 17:53
> *To:* Antonio Di Bacco ; users@dpdk.org
> *Subject:* Re: DPDK performances surprise
>
>
>
> My guess is that most of the packet data has a short life in the L3 cache
> (before being overwritten by newer packets), but is never flushed to memory.
>
>
>
> *From: *Antonio Di Bacco 
> *Date: *Wednesday, May 18, 2022 at 12:40 PM
> *To: *"users@dpdk.org" 
> *Subject: *DPDK performances surprise
>
>
>
> I recently read a performance test where l2fwd was able to receive packets
> (8000B) from a 100 Gbps card, swap the L2 addresses and send them back to
> the same port to be received by an ethernet analyzer. The throughput
> achieved was close to 100 Gbps on a XEON machine (Intel(R) Xeon(R) Platinum
> 8176 CPU @ 2.10GHz) . This is the same processor I have and I know that, if
> I try to write around 8000B to the attached DDR4 (2666MT/s) on an allocated
> 1GB hugepage, I get a maximum throughput of around 20GB/s.
>
>
>
> Now, a 100 Gbps can generate a flow of around 12 GB/s, these packets have
> to be written to the DDR and then read back to swap L2 addresses and this
> leads to a cumulative bandwidth on the DDR that is around 2x12 GB/s and is
> more than the 20GB/s of available bandwidth on the DDR4.
>
>
>
> How can this be possible ?
>


Re: DPDK performances surprise

2022-05-19 Thread Antonio Di Bacco
The packets are 8000B long.

On Wed, May 18, 2022 at 7:04 PM Stephen Hemminger <
step...@networkplumber.org> wrote:

> On Wed, 18 May 2022 16:53:04 +
> "Sanford, Robert"  wrote:
>
> > My guess is that most of the packet data has a short life in the L3
> cache (before being overwritten by newer packets), but is never flushed to
> memory.
> >
> > From: Antonio Di Bacco 
> > Date: Wednesday, May 18, 2022 at 12:40 PM
> > To: "users@dpdk.org" 
> > Subject: DPDK performances surprise
> >
> > I recently read a performance test where l2fwd was able to receive
> packets (8000B) from a 100 Gbps card, swap the L2 addresses and send them
> back to the same port to be received by an ethernet analyzer. The
> throughput achieved was close to 100 Gbps on a XEON machine (Intel(R)
> Xeon(R) Platinum 8176 CPU @ 2.10GHz) . This is the same processor I have
> and I know that, if I try to write around 8000B to the attached DDR4
> (2666MT/s) on an allocated 1GB hugepage, I get a maximum throughput of
> around 20GB/s.
> >
> > Now, a 100 Gbps can generate a flow of around 12 GB/s, these packets
> have to be written to the DDR and then read back to swap L2 addresses and
> this leads to a cumulative bandwidth on the DDR that is around 2x12 GB/s
> and is more than the 20GB/s of available bandwidth on the DDR4.
> >
> > How can this be possible ?
>
> Likely cache effects from DDIO. What is your packet size.
>


DPDK performances surprise

2022-05-18 Thread Antonio Di Bacco
I recently read a performance test where l2fwd was able to receive packets
(8000B) from a 100 Gbps card, swap the L2 addresses and send them back to
the same port to be received by an ethernet analyzer. The throughput
achieved was close to 100 Gbps on a XEON machine (Intel(R) Xeon(R) Platinum
8176 CPU @ 2.10GHz) . This is the same processor I have and I know that, if
I try to write around 8000B to the attached DDR4 (2666MT/s) on an allocated
1GB hugepage, I get a maximum throughput of around 20GB/s.

Now, a 100 Gbps can generate a flow of around 12 GB/s, these packets have
to be written to the DDR and then read back to swap L2 addresses and this
leads to a cumulative bandwidth on the DDR that is around 2x12 GB/s and is
more than the 20GB/s of available bandwidth on the DDR4.

How can this be possible ?


Re: AMD EPYC 7713 64-Core: cannot enable turbo boost

2022-04-28 Thread Antonio Di Bacco
Right now we are using an Intel XEON and, to be able to enable turbo boost,
we had to change a setting (Power performance tuning) in the BIOS, setting
it to "OS control EPB". Is there something similar to do in the BIOS of EPYC?



Il giorno mar 26 apr 2022 alle ore 18:25 Antonio Di Bacco <
a.dibacco...@gmail.com> ha scritto:

> That's fantastic:
>
> # sudo cpupower -c 99 frequency-info info
> analyzing CPU 99:
>   driver: acpi-cpufreq
>   CPUs which run at the same hardware frequency: 99
>   CPUs which need to have their frequency coordinated by software: 99
>   maximum transition latency:  Cannot determine or is not supported.
>   hardware limits: 1.50 GHz - 3.72 GHz
>   available frequency steps:  2.00 GHz, 1.70 GHz, 1.50 GHz
>   available cpufreq governors: conservative ondemand userspace powersave
> performance schedutil
>   current policy: frequency should be within 1.50 GHz and 3.72 GHz.
>   The governor "userspace" may decide which speed to use
>   within this range.
>   current CPU frequency: 2.00 GHz (asserted by call to hardware)
>   boost state support:
> Supported: yes
> Active: yes
> Boost States: 0
> Total States: 3
> Pstate-P0:  2000MHz
> Pstate-P1:  1700MHz
> Pstate-P2:  1500MHz
>
> Il giorno mar 26 apr 2022 alle ore 12:00 Tummala, Sivaprasad <
> sivaprasad.tumm...@amd.com> ha scritto:
>
>> [AMD Official Use Only - General]
>>
>>
>>
>> Hi Antonio,
>>
>>
>>
>> We are looking into this. Could you please share the below info:
>>
>> # cpupower -c  frequency-info info
>>
>> analyzing CPU :
>>
>>   driver: acpi-cpufreq
>>
>>
>>
>>
>>
>> *From:* Antonio Di Bacco 
>> *Sent:* Monday, April 25, 2022 10:43 PM
>> *To:* users@dpdk.org
>> *Subject:* AMD EPYC 7713 64-Core: cannot enable turbo boost
>>
>>
>>
>> [CAUTION: External Email]
>>
>> Trying to enable turbo boost on EPYC 7713 with this APIS:
>>
>>
>>
>>   rte_power_init(lcore)
>>   rte_power_freq_enable_turbo(lcore)
>>  rte_power_freq_max(lcore)
>>
>>
>>
>>  I receive this messages:
>>
>>
>>
>> DPDK:POWER: Env isn't set yet!
>> DPDK:POWER: Attempting to initialise ACPI cpufreq power management...
>> DPDK:POWER: Power management governor of lcore 99 has been set to
>> 'userspace' successfully
>> DPDK:POWER: Initialized successfully for lcore 99 power management
>> DPDK:POWER: Failed to enable turbo on lcore 99
>>
>


Re: AMD EPYC 7713 64-Core: cannot enable turbo boost

2022-04-26 Thread Antonio Di Bacco
That's fantastic:

# sudo cpupower -c 99 frequency-info info
analyzing CPU 99:
  driver: acpi-cpufreq
  CPUs which run at the same hardware frequency: 99
  CPUs which need to have their frequency coordinated by software: 99
  maximum transition latency:  Cannot determine or is not supported.
  hardware limits: 1.50 GHz - 3.72 GHz
  available frequency steps:  2.00 GHz, 1.70 GHz, 1.50 GHz
  available cpufreq governors: conservative ondemand userspace powersave
performance schedutil
  current policy: frequency should be within 1.50 GHz and 3.72 GHz.
  The governor "userspace" may decide which speed to use
  within this range.
  current CPU frequency: 2.00 GHz (asserted by call to hardware)
  boost state support:
Supported: yes
Active: yes
Boost States: 0
Total States: 3
Pstate-P0:  2000MHz
Pstate-P1:  1700MHz
Pstate-P2:  1500MHz

Il giorno mar 26 apr 2022 alle ore 12:00 Tummala, Sivaprasad <
sivaprasad.tumm...@amd.com> ha scritto:

> [AMD Official Use Only - General]
>
>
>
> Hi Antonio,
>
>
>
> We are looking into this. Could you please share the below info:
>
> # cpupower -c  frequency-info info
>
> analyzing CPU :
>
>   driver: acpi-cpufreq
>
>
>
>
>
> *From:* Antonio Di Bacco 
> *Sent:* Monday, April 25, 2022 10:43 PM
> *To:* users@dpdk.org
> *Subject:* AMD EPYC 7713 64-Core: cannot enable turbo boost
>
>
>
> [CAUTION: External Email]
>
> Trying to enable turbo boost on EPYC 7713 with this APIS:
>
>
>
>   rte_power_init(lcore)
>   rte_power_freq_enable_turbo(lcore)
>  rte_power_freq_max(lcore)
>
>
>
>  I receive this messages:
>
>
>
> DPDK:POWER: Env isn't set yet!
> DPDK:POWER: Attempting to initialise ACPI cpufreq power management...
> DPDK:POWER: Power management governor of lcore 99 has been set to
> 'userspace' successfully
> DPDK:POWER: Initialized successfully for lcore 99 power management
> DPDK:POWER: Failed to enable turbo on lcore 99
>


AMD EPYC 7713 64-Core: cannot enable turbo boost

2022-04-25 Thread Antonio Di Bacco
Trying to enable turbo boost on EPYC 7713 with this APIS:

  rte_power_init(lcore)
  rte_power_freq_enable_turbo(lcore)
 rte_power_freq_max(lcore)

 I receive this messages:

DPDK:POWER: Env isn't set yet!
DPDK:POWER: Attempting to initialise ACPI cpufreq power management...
DPDK:POWER: Power management governor of lcore 99 has been set to
'userspace' successfully
DPDK:POWER: Initialized successfully for lcore 99 power management
DPDK:POWER: Failed to enable turbo on lcore 99


Re: AMD NPS BIOS change is not reflected onto NUMA domains detected by DPDK 21.11

2022-04-22 Thread Antonio Di Bacco
This was it !!!

Thanks.

Il giorno ven 22 apr 2022 alle ore 14:49 Tummala, Sivaprasad <
sivaprasad.tumm...@amd.com> ha scritto:

> [AMD Official Use Only]
>
>
>
> Hi Antonio,
>
>
>
> You need to change the BIOS parameter “L3 Cache as NUMA Domain” to
> “Disable”.
>
> With this, each L3 cache will not be reported as a NUMA domain (/NUMA
> node).
>
>
>
> Thank you!
>
>
>
> *From:* Antonio Di Bacco 
> *Sent:* Friday, April 22, 2022 10:22 AM
> *To:* users@dpdk.org
> *Subject:* AMD NPS BIOS change is not reflected onto NUMA domains
> detected by DPDK 21.11
>
>
>
> Using an  AMD EPYC 7713 64-Core Processor (there are two on the server)
> for a total of 128 cores.
>
>
>
> If I change the NPS (numa per socket) in BIOS and power cycle the system
> and then I launch a DPDK application, I always get this message:
>
>
>
> EAL: Detected CPU lcores: 128
> EAL: Detected NUMA nodes: 16
>
>
>
> NPS is able to change the number of NUMA but DPDK always detects 16 NUMA
> domains.
>
> I tried with NPS1, NPS2, NPS4.
>
>
>
>
>
>
>


AMD NPS BIOS change is not reflected onto NUMA domains detected by DPDK 21.11

2022-04-22 Thread Antonio Di Bacco
Using an  AMD EPYC 7713 64-Core Processor (there are two on the server) for
a total of 128 cores.

If I change the NPS (numa per socket) in BIOS and power cycle the system
and then I launch a DPDK application, I always get this message:

EAL: Detected CPU lcores: 128
EAL: Detected NUMA nodes: 16

NPS is able to change the number of NUMA but DPDK always detects 16 NUMA
domains.
I tried with NPS1, NPS2, NPS4.


Re: Shared memory between two primary DPDK processes

2022-04-18 Thread Antonio Di Bacco
Another info to add:

The process that allocates the 1GB page has this map:
antodib@Ubuntu-20.04-5:: /proc> sudo cat /proc/27812/maps | grep huge
14000-18000 rw-s  00:46 97193
 /dev/huge1G/rtemap_0

while the process that maps the 1GB page (--file-prefix p2) has these maps;
is it getting a new page?
antodib@Ubuntu-20.04-5:: /proc> sudo cat /proc/27906/maps | grep huge
14000-18000 rw-s  00:46 113170
/dev/huge1G/p2map_0
7f7bc000-7f7c rw-s  00:46 97193
 /dev/huge1G/rtemap_0

Il giorno lun 18 apr 2022 alle ore 19:34 Antonio Di Bacco <
a.dibacco...@gmail.com> ha scritto:

> At the end I tried the pidfd_getfd syscall that is working really fine and
> giving me back a "clone" fd of an fd in that was opened from another
> process. I tested it opening a text file in the first process  and after
> cloning the fd , I could really read the file also in the second process.
> Now the weird thing:
> 1) In the first process I allocate- a huge page, then get the fd
> 2) In the second process I get my "clone" fd and do an mmap, it works but
> if I write on that memory, the first process cannot see what I wrote
>
> int second_process(int remote_pid, int remote_mem_fd) {
>
> printf("remote_pid %d remote_mem_fd %d\n", remote_pid,
> remote_mem_fd);
> int pidfd = syscall(__NR_pidfd_open, remote_pid, 0);
>
> int my_mem_fd = syscall(438, pidfd, remote_mem_fd, 0);
> printf("my_mem_fd %d\n", my_mem_fd);   // This is nice
>
> int flags = MAP_SHARED | MAP_HUGETLB | (30 << MAP_HUGE_SHIFT);
> uint64_t* addr = (uint64_t*) mmap(NULL, 1024 * 1024 * 1024,
> PROT_READ|PROT_WRITE, flags, my_mem_fd, 0);
> if (addr == -1)
>     perror("mmap");
> *addr = 0x0101010102020202;
> }
>
>
> Il giorno gio 14 apr 2022 alle ore 21:51 Antonio Di Bacco <
> a.dibacco...@gmail.com> ha scritto:
>
>>
>>
>> Il giorno gio 14 apr 2022 alle ore 21:01 Dmitry Kozlyuk <
>> dmitry.kozl...@gmail.com> ha scritto:
>>
>>> 2022-04-14 10:20 (UTC+0200), Antonio Di Bacco:
>>> [...]
>>> > Ok, after having a look to memif I managed to exchange the fd  between
>>> the
>>> > two processes and it works.
>>> > Anyway the procedure seems a little bit clunky and I think I'm going
>>> to use
>>> > the new SYSCALL pidfd_getfd
>>> > to achieve the same result.  In your opinion this method (getfd_pidfd)
>>> > could also work if the two DPDK processes
>>> > are inside different docker containers?
>>>
>>> Honestly, I've just learned about pidfd_getfd() from you.
>>> But I know that containers use PID namespaces, so there's a question
>>> how you will obtain the pidfd of a process in another container.
>>>
>>> In general, any method of sharing FD will work.
>>> Remember that you also need offset and size.
>>> Given that some channel is required to share those,
>>> I think Unix domain socket is still the preferred way.
>>>
>>> > Or is there another mechanims like using handles to hugepages present
>>> in
>>> > the filesystem to share between two
>>> > different containers?
>>>
>>> FD is needed for mmap().
>>> You need to either pass the FD or open() the same hugepage file by path.
>>> I advise against using paths because they are not a part of DPDK API
>>> contract.
>>>
>>
>> Thank you very much Dmitry, your answers are always enlightening.
>> I'm going to ask a different question on the dpdk.org about the best
>> practice to share memory between two dpdk processes running in different
>> containers.
>>
>


Re: Shared memory between two primary DPDK processes

2022-04-18 Thread Antonio Di Bacco
In the end I tried the pidfd_getfd syscall, which is working really fine and
gives me back a "clone" of an fd that was opened in another process. I tested
it by opening a text file in the first process and, after cloning the fd, I
could really read the file also in the second process.
Now the weird thing:
1) In the first process I allocate a huge page, then get the fd
2) In the second process I get my "clone" fd and do an mmap, it works but
if I write on that memory, the first process cannot see what I wrote

int second_process(int remote_pid, int remote_mem_fd) {

printf("remote_pid %d remote_mem_fd %d\n", remote_pid,
remote_mem_fd);
int pidfd = syscall(__NR_pidfd_open, remote_pid, 0);

int my_mem_fd = syscall(438, pidfd, remote_mem_fd, 0);
printf("my_mem_fd %d\n", my_mem_fd);   // This is nice

int flags = MAP_SHARED | MAP_HUGETLB | (30 << MAP_HUGE_SHIFT);
uint64_t* addr = (uint64_t*) mmap(NULL, 1024 * 1024 * 1024,
PROT_READ|PROT_WRITE, flags, my_mem_fd, 0);
if (addr == MAP_FAILED)
perror("mmap");
*addr = 0x0101010102020202;
}


Il giorno gio 14 apr 2022 alle ore 21:51 Antonio Di Bacco <
a.dibacco...@gmail.com> ha scritto:

>
>
> Il giorno gio 14 apr 2022 alle ore 21:01 Dmitry Kozlyuk <
> dmitry.kozl...@gmail.com> ha scritto:
>
>> 2022-04-14 10:20 (UTC+0200), Antonio Di Bacco:
>> [...]
>> > Ok, after having a look to memif I managed to exchange the fd  between
>> the
>> > two processes and it works.
>> > Anyway the procedure seems a little bit clunky and I think I'm going to
>> use
>> > the new SYSCALL pidfd_getfd
>> > to achieve the same result.  In your opinion this method (getfd_pidfd)
>> > could also work if the two DPDK processes
>> > are inside different docker containers?
>>
>> Honestly, I've just learned about pidfd_getfd() from you.
>> But I know that containers use PID namespaces, so there's a question
>> how you will obtain the pidfd of a process in another container.
>>
>> In general, any method of sharing FD will work.
>> Remember that you also need offset and size.
>> Given that some channel is required to share those,
>> I think Unix domain socket is still the preferred way.
>>
>> > Or is there another mechanims like using handles to hugepages present in
>> > the filesystem to share between two
>> > different containers?
>>
>> FD is needed for mmap().
>> You need to either pass the FD or open() the same hugepage file by path.
>> I advise against using paths because they are not a part of DPDK API
>> contract.
>>
>
> Thank you very much Dmitry, your answers are always enlightening.
> I'm going to ask a different question on the dpdk.org about the best
> practice to share memory between two dpdk processes running in different
> containers.
>


Best practice to share memory between two DPDK processes in different docker containers

2022-04-14 Thread Antonio Di Bacco
I imagine that there should be a way of sharing memory between two DPDK
processes running in different docker containers. Probably the two DPDK
processes could mmap the same hugepage? Probably we could pass pid and fd
from one process to the other and (bypassing pid namespaces) be able to
mmap the same memory?
I wonder if DPDK architects thought about this scenario.


Re: Shared memory between two primary DPDK processes

2022-04-14 Thread Antonio Di Bacco
Il giorno gio 14 apr 2022 alle ore 21:01 Dmitry Kozlyuk <
dmitry.kozl...@gmail.com> ha scritto:

> 2022-04-14 10:20 (UTC+0200), Antonio Di Bacco:
> [...]
> > Ok, after having a look at memif I managed to exchange the fd between
> > the two processes and it works.
> > Anyway the procedure seems a little bit clunky and I think I'm going to
> > use the new syscall pidfd_getfd to achieve the same result. In your
> > opinion, could this method (pidfd_getfd) also work if the two DPDK
> > processes are inside different docker containers?
>
> Honestly, I've just learned about pidfd_getfd() from you.
> But I know that containers use PID namespaces, so there's a question
> how you will obtain the pidfd of a process in another container.
>
> In general, any method of sharing FD will work.
> Remember that you also need offset and size.
> Given that some channel is required to share those,
> I think Unix domain socket is still the preferred way.
>
> > Or is there another mechanism, like using handles to hugepages present in
> > the filesystem, to share between two different containers?
>
> FD is needed for mmap().
> You need to either pass the FD or open() the same hugepage file by path.
> I advise against using paths because they are not a part of DPDK API
> contract.
>

Thank you very much Dmitry, your answers are always enlightening.
I'm going to ask a separate question on dpdk.org about the best practice for
sharing memory between two DPDK processes running in different containers.


Re: Shared memory between two primary DPDK processes

2022-04-14 Thread Antonio Di Bacco
On Mon, Apr 11, 2022 at 7:30 PM Dmitry Kozlyuk <dmitry.kozl...@gmail.com> wrote:

> 2022-04-11 15:03 (UTC+0200), Antonio Di Bacco:
> > I did a short program where a  primary (--file-prefix=p1) allocates a
> > memzone and generates a file descriptor that is passed to another primary
> > (--file-prefix=p2) .
> > The process P2 tries to mmap the memory but I get an error (Bad file
> > descriptor):
> >
> > uint64_t* addr = mmap(NULL, 1024*1024*1024, PROT_READ, flags,
> > mem_fd, 0);
> > if (addr == -1)
> > perror("mmap");
>
> How do you pass the FD?
> Memif does the same thing under the hood, so you should be able too.
>


Ok, after having a look to memif I managed to exchange the fd  between the
two processes and it works.
Anyway the procedure seems a little bit clunky and I think I'm going to use
the new SYSCALL pidfd_getfd
to achieve the same result.  In your opinion this method (getfd_pidfd)
could also work if the two DPDK processes
are inside different docker containers?
Or is there another mechanims like using handles to hugepages present in
the filesystem to share between two
different containers?


Re: Shared memory between two primary DPDK processes

2022-04-11 Thread Antonio Di Bacco
I wrote a short program where a primary (--file-prefix=p1) allocates a
memzone and generates a file descriptor that is passed to another primary
(--file-prefix=p2).
The process P2 tries to mmap the memory but I get an error (Bad file
descriptor):

uint64_t *addr = mmap(NULL, 1024 * 1024 * 1024, PROT_READ, flags,
                      mem_fd, 0);
if (addr == MAP_FAILED)
    perror("mmap");
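
A file descriptor number copied over as a plain integer is not valid in the
other process, which would explain the EBADF; the descriptor itself has to be
transferred, for example over a Unix domain socket with SCM_RIGHTS. A minimal
sketch of the sending side (the helper name and socket setup are illustrative,
not part of DPDK):

/* Hypothetical helper: send one fd from P1 to P2 over a connected AF_UNIX
 * socket. The receiver gets a new fd number referring to the same file. */
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

static int send_fd(int sock, int fd)
{
    char dummy = 'x';
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union { struct cmsghdr align; char buf[CMSG_SPACE(sizeof(int))]; } u;
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = u.buf, .msg_controllen = sizeof(u.buf),
    };
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;      /* this is what actually moves the fd */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) < 0 ? -1 : 0;
}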

On Fri, Apr 8, 2022 at 11:08 PM Antonio Di Bacco <a.dibacco...@gmail.com> wrote:

>
>
> On Fri, Apr 8, 2022 at 3:26 PM Dmitry Kozlyuk <dmitry.kozl...@gmail.com> wrote:
>
>> 2022-04-08 14:31 (UTC+0200), Antonio Di Bacco:
>> > I know that it is possible to share memory between a primary and
>> > secondary process using rte_memzone_reserve_aligned to allocate memory
>> > in primary that is "seen" also by the secondary. If we have two primary
>> > processes (started with different file-prefix) the same approach is not
>> > feasible. I wonder how to share a chunk of memory hosted on a hugepage
>> > between two primaries.
>> >
>> > Regards.
>>
>> Hi Antonio,
>>
>> Correction: all hugepages allocated by DPDK are shared
>> between primary and secondary processes, not only memzones.
>>
>> I assume we're talking about processes within one host,
>> because your previous similar question was about sharing memory between
>> hosts
>> (as we have discussed offline), which is out of scope for DPDK.
>>
>> As for the question directly, you need to map the same part of the same
>> file
>> in the second primary as the hugepage is mapped from in the first primary.
>> I don't recommend to work with file paths, because their management
>> is not straightforward (--single-file-segments, for one) and is
>> undocumented.
>>
>> There is a way to share DPDK memory segment file descriptors.
>> Although public, this DPDK API is dangerous in the sense that you must
>> clearly understand what you're doing and how DPDK works.
>> Hence the question: what is the task you need this sharing for?
>> Maybe there is a simpler way.
>>
>> 1. In the first primary:
>>
>> mz = rte_memzone_reserve()
>> ms = rte_mem_virt2memseg(mz->addr)
>> fd = rte_memseg_get_fd(ms)
>> offset = rte_memseg_get_fd_offset(ms)
>>
>> 2. Use Unix domain sockets with SCM_RIGHTS
>>    to send "fd" and "offset" to the second primary.
>>
>> 3. In the second primary, after receiving "fd" and "offset":
>>
>> flags = MAP_SHARED | MAP_HUGE | (30 << MAP_HUGE_SHIFT)
>> addr = mmap(fd, offset, flags)
>>
>> Note that "mz" may consist of multiple "ms" depending on the sizes
>> of the zone and hugepages, and on the zone alignment.
>> Also "addr" may (and probably will) differ from "mz->addr".
>> It is possible to pass "mz->addr" and try to force it,
>> like DPDK does for primary/secondary.
>>
>
>
> Thank you Dmitry, it is really incredible how deep your knowledge is. I
> will give it a try.
>


Re: Shared memory between two primary DPDK processes

2022-04-08 Thread Antonio Di Bacco
On Fri, Apr 8, 2022 at 4:36 PM Ferruh Yigit <ferruh.yi...@xilinx.com> wrote:

> On 4/8/2022 2:26 PM, Dmitry Kozlyuk wrote:
> >
> >
> > 2022-04-08 14:31 (UTC+0200), Antonio Di Bacco:
> >> I know that it is possible to share memory between a primary and
> >> secondary process using rte_memzone_reserve_aligned to allocate memory
> >> in primary that is "seen" also by the secondary. If we have two primary
> >> processes (started with different file-prefix) the same approach is not
> >> feasible. I wonder how to share a chunk of memory hosted on a hugepage
> >> between two primaries.
> >>
> >> Regards.
> >
> > Hi Antonio,
> >
> > Correction: all hugepages allocated by DPDK are shared
> > between primary and secondary processes, not only memzones.
> >
> > I assume we're talking about processes within one host,
> > because your previous similar question was about sharing memory between
> > hosts (as we have discussed offline), which is out of scope for DPDK.
> >
> > As for the question directly, you need to map the same part of the same
> > file in the second primary as the hugepage is mapped from in the first
> > primary.
> > I don't recommend to work with file paths, because their management
> > is not straightforward (--single-file-segments, for one) and is
> > undocumented.
> >
> > There is a way to share DPDK memory segment file descriptors.
> > Although public, this DPDK API is dangerous in the sense that you must
> > clearly understand what you're doing and how DPDK works.
> > Hence the question: what is the task you need this sharing for?
> > Maybe there is a simpler way.
> >
> > 1. In the first primary:
> >
> >  mz = rte_memzone_reserve()
> >  ms = rte_mem_virt2memseg(mz->addr)
> >  fd = rte_memseg_get_fd(ms)
> >  offset = rte_memseg_get_fd_offset(ms)
> >
> > 2. Use Unix domain sockets with SCM_RIGHTS
> > to send "fd" and "offset" to the second primary.
> >
> > 3. In the second primary, after receiving "fd" and "offset":
> >
> >  flags = MAP_SHARED | MAP_HUGE | (30 << MAP_HUGE_SHIFT)
> >  addr = mmap(fd, offset, flags)
> >
> > Note that "mz" may consist of multiple "ms" depending on the sizes
> > of the zone and hugepages, and on the zone alignment.
> > Also "addr" may (and probably will) differ from "mz->addr".
> > It is possible to pass "mz->addr" and try to force it,
> > like DPDK does for primary/secondary.
> >
>
> Also 'net/memif' driver can be used:
> https://doc.dpdk.org/guides/nics/memif.html


Yes, I know about memif. Our application currently uses a chunk of shared
memory: a primary process writes to it and a secondary reads from it. Now the
secondary will become a primary (sort of a promotion), and memif would be
fine, but the paradigm would have to change a little compared to the
shared-memory approach.
Memif is an interface over shared memory; we would need the opposite, a
shared memory over a network interface.

Thank you.


Re: Shared memory between two primary DPDK processes

2022-04-08 Thread Antonio Di Bacco
On Fri, Apr 8, 2022 at 3:26 PM Dmitry Kozlyuk <dmitry.kozl...@gmail.com> wrote:

> 2022-04-08 14:31 (UTC+0200), Antonio Di Bacco:
> > I know that it is possible to share memory between a primary and secondary
> > process using rte_memzone_reserve_aligned to allocate memory in primary
> > that is "seen" also by the secondary. If we have two primary processes
> > (started with different file-prefix) the same approach is not feasible. I
> > wonder how to share a chunk of memory hosted on a hugepage between two
> > primaries.
> >
> > Regards.
>
> Hi Antonio,
>
> Correction: all hugepages allocated by DPDK are shared
> between primary and secondary processes, not only memzones.
>
> I assume we're talking about processes within one host,
> because your previous similar question was about sharing memory between
> hosts
> (as we have discussed offline), which is out of scope for DPDK.
>
> As for the question directly, you need to map the same part of the same
> file
> in the second primary as the hugepage is mapped from in the first primary.
> I don't recommend to work with file paths, because their management
> is not straightforward (--single-file-segments, for one) and is
> undocumented.
>
> There is a way to share DPDK memory segment file descriptors.
> Although public, this DPDK API is dangerous in the sense that you must
> clearly understand what you're doing and how DPDK works.
> Hence the question: what is the task you need this sharing for?
> Maybe there is a simpler way.
>
> 1. In the first primary:
>
> mz = rte_memzone_reserve()
> ms = rte_mem_virt2memseg(mz->addr)
> fd = rte_memseg_get_fd(ms)
> offset = rte_memseg_get_fd_offset(ms)
>
> 2. Use Unix domain sockets with SCM_RIGHTS
>    to send "fd" and "offset" to the second primary.
>
> 3. In the second primary, after receiving "fd" and "offset":
>
> flags = MAP_SHARED | MAP_HUGE | (30 << MAP_HUGE_SHIFT)
> addr = mmap(fd, offset, flags)
>
> Note that "mz" may consist of multiple "ms" depending on the sizes
> of the zone and hugepages, and on the zone alignment.
> Also "addr" may (and probably will) differ from "mz->addr".
> It is possible to pass "mz->addr" and try to force it,
> like DPDK does for primary/secondary.
>


Thank you Dmitry, it is really incredible how deep your knowledge is. I
will give it a try.
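
The three steps quoted above are pseudocode; a sketch of what the receiving
side (steps 2 and 3) could look like once "fd" and "offset" have arrived over
the Unix socket. The helper name, the mapping length and the 1 GiB hugepage
flag are assumptions, not from the original mail:

/* Hypothetical receiver: take the fd via SCM_RIGHTS, then map it at the
 * offset reported by rte_memseg_get_fd_offset() in the first primary. */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

static void *map_remote_segment(int sock, off_t offset, size_t len)
{
    char dummy;
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union { struct cmsghdr align; char buf[CMSG_SPACE(sizeof(int))]; } u;
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = u.buf, .msg_controllen = sizeof(u.buf),
    };
    if (recvmsg(sock, &msg, 0) < 0)
        return NULL;

    int fd;
    memcpy(&fd, CMSG_DATA(CMSG_FIRSTHDR(&msg)), sizeof(int));

    /* 30 << MAP_HUGE_SHIFT selects 1 GiB pages, as in the quoted recipe. */
    void *addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_SHARED | MAP_HUGETLB | (30 << MAP_HUGE_SHIFT),
                      fd, offset);
    if (addr == MAP_FAILED)
        perror("mmap");
    return addr == MAP_FAILED ? NULL : addr;
}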


Shared memory between two primary DPDK processes

2022-04-08 Thread Antonio Di Bacco
I know that it is possible to share memory between a primary and secondary
process using rte_memzone_reserve_aligned to allocate memory in primary
that is "seen" also by the secondary. If we have two primary processes
(started with different file-prefix) the same approach is not feasible. I
wonder how to share a chunk of memory hosted on a hugepage between two
primaries.

Regards.


Re: Fastest and easiest method to share mem between two DPDK processes

2022-04-04 Thread Antonio Di Bacco
On Mon, Apr 4, 2022 at 6:10 PM Dmitry Kozlyuk <dmitry.kozl...@gmail.com> wrote:

> 2022-04-04 16:04 (UTC+0200), Antonio Di Bacco:
> > I have a Primary (Pa) and a secondary (Sa). Pa allocates memory that it
> > shares with Sa.
> > Now I also have another Primary (Pb). I need to allocate some memory in
> > Pa or Sa that has to be shared with Pb, or the reverse. Is this a
> > feasible configuration?
>
> Please tell more about the use case.
> Do you want to share an arbitrary block or part of DPDK memory?
> What the shared memory contains and how it will be used w.r.t. DPDK?
>

The memory will contain some signal samples; we would like the samples to be
on a hugepage (1 GB) shared between Sa and Pb.
Sa will write and Pb will read.


Fastest and easiest method to share mem between two DPDK processes

2022-04-04 Thread Antonio Di Bacco
I have a Primary (Pa) and a secondary (Sa). Pa allocates memory that it
shares with Sa.
Now I also have another Primary (Pb). I need to allocate some memory in Pa
or Sa that has to be shared with Pb, or the reverse. Is this a feasible
configuration?


MEMIF usage with dpdk-replay

2022-03-20 Thread Antonio Di Bacco
I'm using a dpdk-replay application (which reads from a pcap and sends to a
port) and I'm passing the parameter --vdev net_memif so that the application
sends packets to the memif PMD interface. Before launching dpdk-replay I
launch testpmd like this:

dpdk-testpmd -l 4-5 --proc-type=primary --file-prefix=pmd1 --vdev=net_memif

dpdk-replay uses the following code to start the port, but unfortunately the
link status is always down and rte_eth_tx_burst doesn't send anything.

/* Configure for each port (ethernet device), the number of rx queues &
tx queues */
if (rte_eth_dev_configure(port,
                          0, /* nb rx queue */
                          NB_TX_QUEUES, /* nb tx queue */
                          &eth_conf /* struct rte_eth_conf; original argument lost by the archive, shown as a placeholder */
                          ) < 0) {
    fprintf(stderr, "DPDK: RTE ETH Ethernet device configuration failed\n");
    return (-1);
}

/* Then allocate and set up the transmit queues for this Ethernet
device  */
for (i = 0; i < NB_TX_QUEUES; i++) {
    ret = rte_eth_tx_queue_setup(port,
                                 i,
                                 TX_QUEUE_SIZE,
                                 cpus->numacore,
                                 NULL /* tx conf; original argument lost by the archive */);
    if (ret < 0) {
        fprintf(stderr, "DPDK: RTE ETH Ethernet device tx queue %i setup failed: %s",
                i, strerror(-ret));
        return (ret);
    }
}

/* Start the ethernet device */
if (rte_eth_dev_start(port) < 0) {
fprintf(stderr, "DPDK: RTE ETH Ethernet device start failed\n");
return (-1);
}

/* Get link status and display it. */
rte_eth_link_get(port, &eth_link);
if (eth_link.link_status) {
printf(" Link up - speed %u Mbps - %s\n",
   eth_link.link_speed,
   (eth_link.link_duplex == ETH_LINK_FULL_DUPLEX) ?
   "full-duplex" : "half-duplex\n");
} else {
printf("Link down\n");
}
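
One aside worth adding here: a memif port normally reports link up only after
the peer has connected over the memif control socket, so both sides need
matching memif arguments and the link may legitimately read down right after
rte_eth_dev_start(). A hedged sketch that polls the link for a while instead
of checking it once (the timeout is arbitrary):

/* Poll the link for up to ~5 seconds; uses only stable ethdev calls. */
#include <rte_ethdev.h>
#include <rte_cycles.h>

static int wait_for_link(uint16_t port)
{
    struct rte_eth_link link;
    int i;

    for (i = 0; i < 50; i++) {
        rte_eth_link_get_nowait(port, &link);
        if (link.link_status == ETH_LINK_UP)
            return 0;
        rte_delay_ms(100);
    }
    return -1;   /* peer never connected */
}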


Re: Connecting two DPDK applications through fake virtual functions

2022-03-17 Thread Antonio Di Bacco
Really? I know that the application is using a PMD driver for Intel card,
will this driver work with memif too?



On Thu, Mar 17, 2022 at 5:44 PM Dmitry Kozlyuk <dmitry.kozl...@gmail.com> wrote:

> 2022-03-17 16:55 (UTC+0100), Antonio Di Bacco:
> > Unfortunately I cannot change the applications but I only can create some
> > fake VFs and connect them with software.
> > Could OVS come to the rescue?
>
> You don't need to modify app code to communicate via shared memory:
> https://doc.dpdk.org/guides/nics/memif.html
>
> >
> > On Thu, Mar 17, 2022 at 2:27 PM Timothy Wood wrote:
> >
> > > One option is to modify the applications to use DPDK's multi-process
> > > support: https://doc.dpdk.org/guides/prog_guide/multi_proc_support.html
> > > Essentially you would have one app read from the real port and then
> > > write data to a software queue in shared memory. Instead of having the
> > > second app read from a port it would read from the queue.
> > >
> > > If you want to build more elaborate combinations of functions, check
> out
> > > our OpenNetVM research project which focused on high performance NF
> > > chaining: http://sdnfv.github.io/onvm/
> > >
> > > ---
> > > Timothy Wood, Ph. D.
> > > he/him/his
> > > Associate Professor
> > > Department of Computer Science
> > > The George Washington University
> > > http://www.seas.gwu.edu/~timwood
> > >
> > >
> > > On Thu, Mar 17, 2022 at 5:29 AM Antonio Di Bacco <a.dibacco...@gmail.com> wrote:
> > >
> > >> I have two DPDK applications that are using virtual functions built on
> > >> top of two physical functions that correspond to the two ports of a
> > >> 25 Gbps ethernet card. The two physical ports are connected one to the
> > >> other with an optic fiber.
> > >> Now, I would like to realize the same setup but without using a
> > >> physical 25 Gbps card, I wonder if this is possible.
> > >>
> > >>
>
>


Re: Connecting two DPDK applications through fake virtual functions

2022-03-17 Thread Antonio Di Bacco
Unfortunately I cannot change the applications; I can only create some fake
VFs and connect them with software.
Could OVS come to the rescue?

On Thu, Mar 17, 2022 at 2:27 PM Timothy Wood wrote:

> One option is to modify the applications to use DPDK's multi-process
> support: https://doc.dpdk.org/guides/prog_guide/multi_proc_support.html
> Essentially you would have one app read from the real port and then write
> data to a software queue in shared memory. Instead of having the second app
> read from a port it would read from the queue.
>
> If you want to build more elaborate combinations of functions, check out
> our OpenNetVM research project which focused on high performance NF
> chaining: http://sdnfv.github.io/onvm/
>
> ---
> Timothy Wood, Ph. D.
> he/him/his
> Associate Professor
> Department of Computer Science
> The George Washington University
> http://www.seas.gwu.edu/~timwood
>
>
> On Thu, Mar 17, 2022 at 5:29 AM Antonio Di Bacco 
> wrote:
>
>> I have two DPDK applications that are using virtual functions built on
>> top of two physical functions that correspond to the two ports of a 25 Gbps
>> ethernet card. The two physical ports are connected one to the other with
>> an optic fiber.
>> Now, I would like to realize the same setup but without using a physical
>> 25 Gbps card, I wonder if this is possible.
>>
>>
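
For completeness, the "software queue in shared memory" suggested above maps
naturally onto an rte_ring shared through DPDK multi-process support; a
minimal sketch (the ring name and size are illustrative):

/* Hypothetical handoff queue: the primary creates it, the secondary
 * (--proc-type=secondary, same --file-prefix) looks it up by name. */
#include <rte_lcore.h>
#include <rte_ring.h>
#include <rte_mbuf.h>

#define RING_NAME "pkt_handoff"

/* Primary, after rte_eal_init(): */
static struct rte_ring *create_queue(void)
{
    return rte_ring_create(RING_NAME, 4096, rte_socket_id(),
                           RING_F_SP_ENQ | RING_F_SC_DEQ);
}

/* Secondary, after rte_eal_init(): */
static struct rte_ring *attach_queue(void)
{
    return rte_ring_lookup(RING_NAME);
}

/* Producer: hand off a burst of mbufs instead of sending them to a port. */
static inline unsigned int
handoff(struct rte_ring *r, struct rte_mbuf **pkts, unsigned int n)
{
    return rte_ring_enqueue_burst(r, (void **)pkts, n, NULL);
}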


Connecting two DPDK applications through fake virtual functions

2022-03-17 Thread Antonio Di Bacco
I have two DPDK applications that use virtual functions built on top of two
physical functions corresponding to the two ports of a 25 Gbps Ethernet card.
The two physical ports are connected to each other with an optical fiber.
Now I would like to realize the same setup without using a physical 25 Gbps
card; I wonder if this is possible.


DPDK on isolated cores but I still see interrupts

2022-03-01 Thread Antonio Di Bacco
I am trying to run a DPDK application on an x86_64 machine (Ubuntu 20.04) on
isolated cores.
I expected no interrupts on the isolated cores, but I still see a lot of CAL
(*function call*) interrupts and LOC (*local timer*) interrupts. Is there any
setting in DPDK to stop those too?

Regards.


Re: Max size for rte_mempool_create

2022-02-09 Thread Antonio Di Bacco
Thanks! I already reserve hugepages from the kernel command line: 6 hugepages
of 1 GB. Is there any other reason for the ENOMEM?

On Wed, 9 Feb 2022 at 22:44, Stephen Hemminger 
wrote:

> On Wed, 9 Feb 2022 22:20:34 +0100
> Antonio Di Bacco  wrote:
>
> > I have a system with two numa sockets. Each numa socket has 8GB of RAM.
> > I reserve a total of 6 hugepages (1G).
> >
> > When I try to create a mempool (API rte_mempool_create) of 530432 mbufs
> > (each one with 9108 bytes) I get a ENOMEM error.
> >
> > In theory this mempool should be around 4.8GB and the hugepages are
> enough
> > to hold it.
> > Why is this failing ?
>
> This is likely because the hugepages have to be contiguous and
> the kernel has to have that many free pages (especially true with 1G pages).
> Therefore it is recommended to configure and reserve hugepages
> on the kernel command line during boot.
>


Max size for rte_mempool_create

2022-02-09 Thread Antonio Di Bacco
I have a system with two NUMA sockets. Each NUMA socket has 8 GB of RAM.
I reserve a total of 6 hugepages (1 GB each).

When I try to create a mempool (API rte_mempool_create) of 530432 mbufs
(each one of 9108 bytes) I get an ENOMEM error.

In theory this mempool should be around 4.8 GB and the hugepages are enough
to hold it.
Why is this failing?
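
A rough back-of-the-envelope check (assuming 9108 B is the element size
passed to rte_mempool_create):

    530432 objects x 9108 B ≈ 4.83 GB of element data,
    plus per-object mempool headers and per-page overhead.

All of that has to come from hugepages on the single NUMA socket passed as
socket_id to rte_mempool_create. If the six 1 GB pages end up split between
the two sockets (for example 3 + 3), only about 3 GB is available on that
socket, which would already explain the ENOMEM.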


Accessing TLS for EAL threads

2021-11-17 Thread Antonio Di Bacco
I need to emulate pthread_setspecific and pthread_getspecific for EAL
threads. I can't find any suitable APIs in DPDK to access the TLS and to get
and set keys.

I launched a number of threads using rte_eal_remote_launch, but I can't find
any API that allows me to access the TLS of those threads.
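
For what it's worth, a hedged sketch of one way to get TLS-like behaviour for
EAL threads using DPDK's per-lcore variables (this assumes the threads are
EAL lcores launched via rte_eal_remote_launch, not externally created
threads; the variable and function names are illustrative):

/* Per-lcore variables behave like TLS for EAL threads. */
#include <rte_per_lcore.h>

RTE_DEFINE_PER_LCORE(void *, my_ctx);      /* one slot per EAL thread */

static int worker(void *arg)
{
    RTE_PER_LCORE(my_ctx) = arg;           /* ~ pthread_setspecific() */
    void *ctx = RTE_PER_LCORE(my_ctx);     /* ~ pthread_getspecific() */
    (void)ctx;
    return 0;
}

/* launched as: rte_eal_remote_launch(worker, per_lcore_arg, lcore_id); */

Recent DPDK releases also have an experimental rte_thread_key_create() /
rte_thread_value_set() / rte_thread_value_get() API in rte_thread.h that
mirrors pthread keys more directly, if memory serves.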






Accessing EAL threads TLS

2021-11-17 Thread Antonio Di Bacco
I need to emulate pthread_setspecific and pthread_getspecific for EAL
threads. I can't find any suitable APIs in DPDK to access the TLS and to get
and set keys.

I launched a number of threads using rte_eal_remote_launch, but I can't find
any API that allows me to access the TLS of those threads.