Re: Bug in rte_mempool_do_generic_get?

2023-02-24 Thread Harris, James R


> On Feb 24, 2023, at 6:56 AM, Honnappa Nagarahalli 
>  wrote:
> 
> 
> 
>> -Original Message-
>> From: Morten Brørup 
>> Sent: Friday, February 24, 2023 6:13 AM
>> To: Harris, James R ; dev@dpdk.org
>> Subject: RE: Bug in rte_mempool_do_generic_get?
>> 
>>> 
>> 
>>> If you have a mempool with 2048 objects, shouldn't 4 cores each be able to 
>>> do a 512 buffer bulk get, regardless of the configured cache size?
>> 
>> No, the scenario you described above is the expected behavior. I think it is
>> documented somewhere that objects in the caches are unavailable for other
>> cores, but now I cannot find where this is documented.
>> 

Thanks Morten.

Yeah, I think it is documented somewhere, but I also couldn’t find it.  I was 
aware of cores not being able to allocate from another core’s cache.  My 
surprise was that, in a pristine new mempool, 4 cores could not each do one 
initial 512 buffer bulk get.  But I also see that even before the a2833ecc5 
patch, the cache would get populated on gets smaller than the cache size, in 
addition to the buffers requested by the user.  So if the cache size is 256 and 
a bulk get is for 128 buffers, it pulls 384 buffers from the backing pool - 128 
for the caller, another 256 to prefill the cache.  Your patch makes this cache 
filling consistent between the less-than-cache-size and 
greater-than-or-equal-to-cache-size cases.
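
To make that pre-patch behavior concrete, here is a rough sketch of the refill 
logic described above.  This is simplified pseudocode for illustration only - 
not the actual rte_mempool_do_generic_get() implementation - for a get of n 
objects where n is smaller than the configured cache size:

/* Simplified illustration of the cache-refill behavior (not the real DPDK
 * code); error handling omitted. */
if (n <= cache->size) {
        if (cache->len < n) {
                /* Pull enough from the backing pool to satisfy the request
                 * and also top the cache up to its configured size.  With
                 * cache->size = 256, cache->len = 0 and n = 128, this pulls
                 * 384 objects. */
                unsigned int refill = n + (cache->size - cache->len);

                rte_mempool_ops_dequeue_bulk(mp, &cache->objs[cache->len], refill);
                cache->len += refill;
        }
        /* Hand out n objects from the tail of the cache; the rest stay cached. */
        memcpy(obj_table, &cache->objs[cache->len - n], n * sizeof(void *));
        cache->len -= n;
}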

>> Furthermore, since the effective per-core cache size is 1.5 * configured 
>> cache
>> size, a configured cache size of 256 may leave up to 384 objects in each per-
>> core cache.
>> 
>> With 4 cores, you can expect up to 3 * 384 = 1152 objects sitting in the
>> caches of other cores. If you want to be able to pull 512 objects with each
>> core, the pool size should be 4 * 512 + 1152 objects.
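
To put the math above in code form - purely illustrative, these names are not 
part of any DPDK API:

#define NUM_CORES   4
#define BURST_SIZE  512   /* objects each core needs to get in one bulk get */
#define CACHE_SIZE  256   /* configured per-core mempool cache size */

/* Each per-core cache can effectively hold up to 1.5 * CACHE_SIZE objects,
 * so in the worst case the other cores' caches hide this many objects: */
unsigned int hidden = (NUM_CORES - 1) * (CACHE_SIZE * 3 / 2);   /* 3 * 384 = 1152 */

/* Minimum pool size so every core can still pull its full burst: */
unsigned int pool_size = NUM_CORES * BURST_SIZE + hidden;       /* 2048 + 1152 = 3200 */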
> Maybe we should document this in the mempool library?
> 

Maybe.  But the case I described here is a bit wonky - SPDK should never have 
been specifying a non-zero cache here in the first place.  We only noticed this 
change in behavior because we were creating the mempool with a cache when we 
shouldn’t have.

-Jim




Bug in rte_mempool_do_generic_get?

2023-02-23 Thread Harris, James R
Hi,

I’ve tracked down a regression in SPDK to DPDK commit a2833ecc5 (“mempool: fix 
get objects from mempool with cache”).

Here’s an example that demonstrates the problem:

Allocate mempool with 2048 buffers and cache size 256.
Core 0 allocates 512 buffers.  Mempool pulls 512 + 256 buffers from backing 
ring, returns 512 of them to caller, puts the other 256 in core 0 cache.  
Backing ring now has 1280 buffers.
Core 1 allocates 512 buffers.  Mempool pulls 512 + 256 buffers from backing 
ring, returns 512 of them to caller, puts the other 256 in core 1 cache.  
Backing ring now has 512 buffers.
Core 2 allocates 512 buffers.  Mempool pulls remaining 512 buffers from backing 
ring and returns all of them to caller.  Backing ring now has 0 buffers.
Core 3 tries to allocate 512 buffers and it fails.

In the SPDK case, we don’t really need or use the mempool cache in this case, 
so changing the cache size to 0 fixes the problem and is what we’re going to 
move forward with.
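
For anyone hitting the same issue, a minimal sketch of that workaround (the 
pool name and sizes here are illustrative):

/* Create the pool with cache_size = 0 so every get/put goes straight to the
 * backing ring and no objects are held back in per-core caches. */
struct rte_mempool *mp;

mp = rte_mempool_create("io_buf_pool",
                        2048,          /* number of objects */
                        4096,          /* object size in bytes */
                        0,             /* cache_size = 0: no per-core cache */
                        0,             /* private data size */
                        NULL, NULL,    /* pool constructor + arg */
                        NULL, NULL,    /* object constructor + arg */
                        SOCKET_ID_ANY, 0);
if (mp == NULL)
        rte_exit(EXIT_FAILURE, "Cannot create mempool\n");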

But the behavior did cause a regression, so I thought I’d mention it here.  If 
you have a mempool with 2048 objects, shouldn’t 4 cores each be able to do a 
512 buffer bulk get, regardless of the configured cache size?

Regards,

Jim Harris




Re: [RFC v3 00/26] Bus and device cleanup for 22.11

2022-09-23 Thread Harris, James R



> On Sep 23, 2022, at 12:13 AM, David Marchand  
> wrote:
> 
> On Thu, Sep 22, 2022 at 12:29 AM Harris, James R
>  wrote:
>> 
>> 
>> So sticking with the internal header copies for now, this works fine
>> for LTS and non-LTS releases.  What about stable releases (i.e. 22.11.1)?
>> IIUC, these header files could change in a stable release since they are
>> no longer part of the ABI?
> 
> In theory, yes they can change.
> But this is a price to pay, as no one seems willing to invest and
> maintain a stable API for PCI drivers.

Understood.  SPDK is preparing for this possibility; I just wanted to confirm
it was necessary.

> I just noticed that some parts of the cleanups I had proposed have
> been merged in SPDK.
> Next time, I prefer getting some feedback from SPDK community before
> my SoB is applied (or stripped) on modified patches.

My apologies, David.  The RTE_DEV_FOREACH cleanup was a nice one and an obvious
improvement.  I believe it was applied without any modifications (except for
fuzz offsets), but we still should have given you a heads-up.

-Jim





Re: [RFC v3 00/26] Bus and device cleanup for 22.11

2022-09-21 Thread Harris, James R


> On Aug 30, 2022, at 8:09 AM, David Marchand  wrote:
> 
> On Mon, Aug 29, 2022 at 7:12 PM Walker, Benjamin
>  wrote:
 Can we keep rte_pci_register(), or a new variation of it that keeps
 the rte_pci_driver structure hidden?  Hiding rte_pci_register() would
 mean SPDK can no longer work with a packaged DPDK.  Or the DPDK
 packages would need to set enable_driver_sdk which I suspect is not the
>>> intent.
>>> 
>>> What do you think if SPDK maintains a copy of the internal headers?
>>> 
>>> The internal API are not supposed to change that often, but we (DPDK) won't
>>> guarantee it.
>>> This would still put some maintenance burden on SPDK but I think it is a 
>>> good
>>> compromise.
>>> 
>> 
>> Would these internal symbols be considered part of the public/official ABI? 
>> When
> 
> What do you mean by "public/official"?
> If you mean the "stable" ABI (as described in the ABI policy document
> and for which compatibility is preserved across minor versions of the
> ABI), the answer is no: internal symbols are not part of it.
> 
> 
>> SPDK goes to dynamically load a shared DPDK library, how can we detect
>> whether it's a version that we support linking against?
> 
> The runtime version of a DPDK library is available via rte_version().
> 
> 
> As for the PCI drivers that SPDK wants to register in DPDK, what do
> you think if SPDK people added and maintained a "generic" PCI driver
> in DPDK.
> This driver would expose a new API (which can not re-expose internal
> structures, like rte_pci_driver and consorts) and ensure its ABI is
> maintained in the long term.
> This makes me think of pci-stub, but in DPDK.
> 
> I did not think too much about it and I don't understand what SPDK
> requires, but is there something wrong with this approach?

The stub driver idea is interesting. In the past we’ve needed access to more
than what a stub driver would provide - for example, to work around kernel
vfio + pci bugs when we probe 24 NVMe SSDs behind a PCIe switch that is
hot-inserted, or conversely to handle hot removal of that same switch.  So
I’m worried that even if we do the stub driver, we would still end up running
into a case where we need these header file copies anyway.  Not ruling the stub
driver out, but I think it's a longer-term goal, not something we would have
ready for 22.11 obviously.

So sticking with the internal header copies for now, this works fine
for LTS and non-LTS releases.  What about stable releases (i.e. 22.11.1)?
IIUC, these header files could change in a stable release since they are
no longer part of the ABI?

-Jim




Re: [RFC v3 00/26] Bus and device cleanup for 22.11

2022-08-04 Thread Harris, James R


> On Jul 28, 2022, at 8:26 AM, David Marchand  wrote:
> 
> This is a PoC for hiding the rte_bus, rte_driver and rte_device objects.
> And mark associated driver only API as internal.
> 
> A good amount of the patches are preparation work on rte_bus.h,
> rte_dev.h, rte_devargs.h and rte_eal.h headers, removing dependencies
> between them.
> 
> PCI bus specific handling are removed from testpmd, unit tests and
> examples.
> 
> After this series, driver-only API headers for registering to buses are
> not exported anymore, unless the enable_driver_sdk meson option is
> selected.
> 
> New accessors for rte_bus, rte_driver and rte_device have been added,
> marked with an experimental tag first when introducing them, and later
> in the series marked as stable since external users will want to use
> those drop-in replacements right away.
> 
> A check is added to ensure we won't pollute app/ and examples/ again,
> though some unit tests are left intentionnally untouched as they test
> some internals of DPDK.
> 
> Comments welcome.

Hi David,

I certainly understand wanting to hide those objects to avoid ABI
maintenance.  SPDK is using fields directly from some of those structures
today, but I think all of them could be replaced with associated helper
functions.  This was discussed a bit late last year related to Chenbo’s
earlier patch set:

https://www.mail-archive.com/dev@dpdk.org/msg222560.html

It looks like SPDK never got back to the DPDK project on exactly what APIs
we would need, to see if DPDK could still support a minimal API without
requiring the enable_driver_sdk flag.  I’m generating an exact list, but it
may be rendered moot depending on the answer to the next question...

Can we keep rte_pci_register(), or a new variation of it that keeps the
rte_pci_driver structure hidden?  Hiding rte_pci_register() would mean
SPDK can no longer work with a packaged DPDK.  Or the DPDK packages
would need to set enable_driver_sdk which I suspect is not the intent.

If you’re open to this, we (SPDK) can make a proposal on exactly what we
would need to keep SPDK working with a standard DPDK build.

Thanks,

Jim
 



Re: [dpdk-dev] [PATCH v2 0/7] Removal of PCI bus ABIs

2021-10-11 Thread Harris, James R


On 10/11/21, 5:55 AM, "Thomas Monjalon"  wrote:

11/10/2021 08:58, Xia, Chenbo:
> From: Liu, Changpeng 
> > Another issue raised by Jim Harris is that for distro packaged DPDK, 
since
> > this option isn't enabled by default, this will not allow SPDK
> > to use the distro packaged DPDK after this release.
> 
> I think for this problem, we have two options: enable driver sdk by 
default or
> let OSV configure the option when building distros. I'm fine with either 
option.

The meson option enable_driver_sdk is described as "Install headers to 
build drivers."
Standard development packages should provide headers to build an 
application.
This option is for projects extending DPDK drivers out of the tree.
The preferred option is to develop drivers inside DPDK.

If a project needs the special option enable_driver_sdk,
1/ it is not following the recommended approach,
2/ it has to manage the burden of driver compatibility with DPDK,
3/ it can compile DPDK itself.

So I think we neither need to make it a default, nor force distros to 
enable it.
Am I missing something?

Hi Thomas,

This preference for developing PCI drivers inside of DPDK seems to be very 
recent.  enable_driver_sdk was only added in DPDK 21.05, and then for building 
out-of-tree ethdev drivers.  But DPDK has always enabled building out-of-tree 
PCI drivers with its default build configuration - SPDK has relied on these 
APIs since its inception.

We have always viewed DPDK as being a very useful toolkit for building 
userspace drivers (especially storage drivers!) that aren't part of DPDK 
itself.  We hope that continues to be the case.

All of that being said, SPDK already compiles DPDK itself as the default 
configuration. We maintain a DPDK fork for patches that have not yet hit DPDK 
upstream. If this gets merged we can document that users building DPDK 
themselves must set enable_driver_sdk. We can also document to our users that 
SPDK may not build against distro DPDK packages, once distros pick up these 
changes.

-Jim




Re: [dpdk-dev] [PATCH v2 0/7] Removal of PCI bus ABIs

2021-10-04 Thread Harris, James R
Adding Changpeng Liu from SPDK side.

On 10/4/21, 6:48 AM, "David Marchand"  wrote:

On Thu, Sep 30, 2021 at 10:45 AM David Marchand
 wrote:
> On Wed, Sep 29, 2021 at 9:38 AM Xia, Chenbo  wrote:
> > @David, could you help me understand what is the compile error in 
Fedora 31?
> > DPDK_compile_spdk failure is expected as the header name for SPDK is 
changed,
> > I am not sure if it's the same error...
>
> The error log is odd (no compilation "backtrace").
> You'll need to test spdk manually I guess.

Tried your series with SPDK (w/o and w/ enable_driver_sdk).
I think the same, and the error is likely due to the file rename.

$ make
  CC lib/env_dpdk/env.o
In file included from env.c:39:0:
env_internal.h:64:25: error: field ‘driver’ has incomplete type
  struct rte_pci_driver  driver;
 ^
env_internal.h:75:59: warning: ‘struct rte_pci_device’ declared inside
parameter list [enabled by default]
 int pci_device_init(struct rte_pci_driver *driver, struct
rte_pci_device *device);
   ^
env_internal.h:75:59: warning: its scope is only this definition or
declaration, which is probably not what you want [enabled by default]
env_internal.h:76:28: warning: ‘struct rte_pci_device’ declared inside
parameter list [enabled by default]
 int pci_device_fini(struct rte_pci_device *device);
^
env_internal.h:89:38: warning: ‘struct rte_pci_device’ declared inside
parameter list [enabled by default]
 void vtophys_pci_device_added(struct rte_pci_device *pci_device);
  ^
env_internal.h:96:40: warning: ‘struct rte_pci_device’ declared inside
parameter list [enabled by default]
 void vtophys_pci_device_removed(struct rte_pci_device *pci_device);
^
make[2]: *** [env.o] Error 1
make[1]: *** [env_dpdk] Error 2
make: *** [lib] Error 2



So basically, SPDK needs some updates since it has its own pci drivers.
I copied some SPDK folks for info.

*Disclaimer* I only checked it links fine against my 21.11 dpdk env,
and did not test the other cases:

diff --git a/dpdkbuild/Makefile b/dpdkbuild/Makefile
index d51b1a6e5..0e666735d 100644
--- a/dpdkbuild/Makefile
+++ b/dpdkbuild/Makefile
@@ -166,6 +166,7 @@ all: $(SPDK_ROOT_DIR)/dpdk/build-tmp
 $(SPDK_ROOT_DIR)/dpdk/build-tmp: $(SPDK_ROOT_DIR)/mk/cc.mk
$(SPDK_ROOT_DIR)/include/spdk/config.h
$(Q)rm -rf $(SPDK_ROOT_DIR)/dpdk/build 
$(SPDK_ROOT_DIR)/dpdk/build-tmp
$(Q)cd "$(SPDK_ROOT_DIR)/dpdk"; CC="$(SUB_CC)" meson
--prefix="$(MESON_PREFIX)" --libdir lib -Dc_args="$(DPDK_CFLAGS)"
-Dc_link_args="$(DPDK_LDFLAGS)" $(DPDK_OPTS)
-Ddisable_drivers="$(shell echo $(DPDK_DISABLED_DRVERS) | sed -E "s/
+/,/g")" build-tmp
+   $(Q)! meson configure build-tmp | grep -qw enable_driver_sdk
|| meson configure build-tmp -Denable_driver_sdk=true
$(Q)sed $(SED_INPLACE_FLAG) 's/#define RTE_EAL_PMD_PATH
.*/#define RTE_EAL_PMD_PATH ""/g'
$(SPDK_ROOT_DIR)/dpdk/build-tmp/rte_build_config.h
$(Q) \
# TODO Meson build adds libbsd dependency when it's available.
This means any app will be \
diff --git a/lib/env_dpdk/env.mk b/lib/env_dpdk/env.mk
index cc7db8aab..e24c6942f 100644
--- a/lib/env_dpdk/env.mk
+++ b/lib/env_dpdk/env.mk
@@ -172,6 +172,12 @@ DPDK_PRIVATE_LINKER_ARGS += -lnuma
 endif
 endif

+ifneq (,$(wildcard $(DPDK_INC_DIR)/rte_build_config.h))
+ifneq (,$(shell grep -e "define RTE_HAS_LIBARCHIVE 1"
$(DPDK_INC_DIR)/rte_build_config.h))
+DPDK_PRIVATE_LINKER_ARGS += -larchive
+endif
+endif
+
 ifeq ($(OS),Linux)
 DPDK_PRIVATE_LINKER_ARGS += -ldl
 endif
diff --git a/lib/env_dpdk/env_internal.h b/lib/env_dpdk/env_internal.h
index 2303f432c..24b377545 100644
--- a/lib/env_dpdk/env_internal.h
+++ b/lib/env_dpdk/env_internal.h
@@ -43,13 +43,18 @@
 #include 
 #include 
 #include 
-#include 
 #include 

 #if RTE_VERSION < RTE_VERSION_NUM(19, 11, 0, 0)
 #error RTE_VERSION is too old! Minimum 19.11 is required.
 #endif

+#if RTE_VERSION < RTE_VERSION_NUM(21, 11, 0, 0)
+#include 
+#else
+#include 
+#endif
+
 /* x86-64 and ARM userspace virtual addresses use only the low 48 bits 
[0..47],
  * which is enough to cover 256 TB.
  */



-- 
David Marchand




[dpdk-dev] Bug with commit 64051bb1 (devargs: unify scratch buffer storage)

2021-04-16 Thread Harris, James R
Hi,

SPDK has identified a regression with commit 64051bb1 (devargs: unify scratch 
buffer storage).  The issue seems to be with this part of the patch:

@@ -276,15 +287,8 @@ rte_devargs_insert(struct rte_devargs **da)
if (strcmp(listed_da->bus->name, (*da)->bus->name) == 0 &&
strcmp(listed_da->name, (*da)->name) == 0) {
/* device already in devargs list, must be updated */
-   listed_da->type = (*da)->type;
-   listed_da->policy = (*da)->policy;
-   free(listed_da->args);
-   listed_da->args = (*da)->args;
-   listed_da->bus = (*da)->bus;
-   listed_da->cls = (*da)->cls;
-   listed_da->bus_str = (*da)->bus_str;
-   listed_da->cls_str = (*da)->cls_str;
-   listed_da->data = (*da)->data;
+   rte_devargs_reset(listed_da);
+   *listed_da = **da;
/* replace provided devargs with found one */
free(*da);
*da = listed_da;


Previously the data members were copied one-by-one, preserving the pointers in 
the listed_da’s TAILQ_ENTRY.  But after this patch, rte_devargs_reset() zeroes 
the entire rte_devargs structure, including the pointers in the TAILQ_ENTRY.  
If we do a subsequent rte_devargs_remove() on this same entry, we segfault 
since the TAILQ_ENTRY’s pointers are invalid.  There could be similar segfaults 
with any subsequent rte_devargs_insert() calls that require iterating the 
global list of devargs entries.

rte_devargs_insert() could manually copy the TAILQ_ENTRY pointers to *da before 
calling rte_devargs_reset() – that at least fixes the SPDK regression.  But 
it’s not clear to me how many of the other rte_devargs_reset() callsites added 
by this patch also need to be changed in some way.
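
A minimal sketch of that idea, assuming the TAILQ_ENTRY field in rte_devargs is 
named next (untested, and not a submitted patch):

/* Carry the list linkage over to *da before resetting, so the struct
 * assignment below restores it in listed_da. */
(*da)->next = listed_da->next;   /* preserve the TAILQ_ENTRY pointers */
rte_devargs_reset(listed_da);
*listed_da = **da;               /* linkage comes back via the copy */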

Thanks,

-Jim





Re: [dpdk-dev] FW: [SPDK] Re: Potential defect in pci_unplug()

2020-10-13 Thread Harris, James R

On 10/12/20, 9:31 PM, "Niu, Yawei"  wrote:

Thanks a lot for the reply, Gaetan!

So seems we have to explicitly use failsafe PMD to deal with this re-plug 
case? I'm not quite sure if that'll cooperate with current SPDK hotplug code 
seamlessly.

I think it would be great if SPDK/DPDK can fix it internally (reconstruct 
the devargs on re-plug somehow?) and provide a transparent hotplug capability 
to apps.

Thanks
-Niu

Yes, thanks Gaëtan.  It makes sense that DPDK only honors the PCI whitelist for 
the specific devices that were present at those BDFs at init time.

Niu - the failsafe PMD isn't needed here.  I think you can get what you need by 
avoiding the whitelist completely.  Let's continue that part of the discussion 
on the SPDK mailing list.

Thanks,

-Jim
 

On 2020/10/10, 12:10 AM, "Gaëtan Rivet"  wrote:

Hello,

On 29/09/20 01:15 +, Niu, Yawei wrote:
>
    >
    > On 2020/9/28, 11:44 PM, "Harris, James R"  
wrote:
>
> Hi Niu,
>
> I agree, this doesn't look right.  Could you file an SPDK issue 
for this to make sure we track it?  And then try sending an e-mail to the DPDK 
mailing list?  I'm open to submitting a patch to our DPDK submodule short-term, 
but only if we get agreement with DPDK community that this fix is correct.
>
> Thanks,
>
> Jim
>
>
> On 9/28/20, 12:17 AM, "Niu, Yawei"  wrote:
>
> Hi,
>
> In the pci_unplug() (dpdk/drivers/bus/pci/pci_common.c), why 
do we call rte_devargs_remove() to remove the saved device args? That looks a 
defect to me, since that’ll result in the hot removed device failed to be 
detected when it’s plugged back (when white list option was provided), the 
situation is described following:
>
>   1.  App starts with white list option provided, let’s 
suppose the device in white list is: :81:00.0;
>   2.  rte_devargs_add() is called to add this device arg on 
rte_eal_init();
>   3.  Issue “echo 1 > 
/sys/bus/pci/devices/\:81\:00.0/remove” to hot remove the device;
>   4.  pci_unplug() is called to remove the device, and 
rte_devargs_remove() is called, so this device arg for white list is remove too;
>   5.  Issue “echo 1 > /sys/bus/pci/rescan” then 
“PCI_WHITELIST=”:81:00.0” spdk/script/setup.sh” to do hot plug;
>   6.  In pci_scan_one(), new dev is generated, however, the 
dev->device.devargs is set NULL since the device args has been removed on 
device unplug;
>   7.  rte_pci_probe() does white list scan, however, it 
unexpectedly skips the device because the devargs of the device is NULL now;
>
> I don’t fully understand the DPDK code, but this 
rte_devargs_remove() in pci_unplug() doesn’t make sense to me (when I commented 
it out, seems everything will work as expected). Is this a glitch or I missed 
anything?

This is not a glitch, the behavior is intended.
When a device is unplugged from the system, there is no expectation that
it will reappear at the same PCI address.

When an application receives the RMV event for the device, it is alerted
that the device was removed. It is its responsibility then to detect
when a device it wants to use is added back. It can do so via udev on
linux for example, but it is tightly coupled with the operating system
or even the distribution. BSD and windows won't work that way.

This is the reason this responsibility was left to the application.

The failsafe PMD attempts to offer transparent hotplug capability to
existing applications. It had to deal with this exact issue. The 
solution
used there has been to use either:

  * the exec() sub-device type, that takes a shell command as
parameter, which should return a valid device after successful
execution. Intention is for the user to declare a script that will
detect device hotplug on the system using other matches than PCI
address (as it is unreliable), such as MAC address.

  * the fd() sub-device type, which is a pipe where another piece of 
code
will send device ID into. This is what vdev_netvsc does currently:
it detects on its own valid NetVSC devices and when one is suitable,
it sends its PCI id in this pipe to failsafe driver.

See https://doc.dpdk.org/guides/nics/fail_safe.html
section 55.3.1. Fail-safe command line parameters for documentation.

When a new device is hotplugged into DPDK, it is described by an
 

Re: [dpdk-dev] [PATCH v2] pci/linux: copy new id for inserted device

2020-10-12 Thread Harris, James R


On 10/7/20, 8:06 AM, "Thomas Monjalon"  wrote:

Hi Jim,

Sorry I see nobody reviewed your patch.

Jim Harris  wrote:
> + memcpy(&dev2->id, &dev->id, sizeof(dev2->id));
[...]
> + memcmp(&dev2->id, &dev->id, sizeof(dev2->id)))

Why using memcpy and memcmp instead of simple assignment and comparison?

Direct assignment and comparison would work too.  I did see some similar cases, 
though, using memcpy for rte_pci_addr (which is similar to rte_pci_id) in 
linux/pci_uio.c and linux/pci_vfio.c.  It wasn't clear to me whether direct 
assignment/comparison for structures was the norm for DPDK.

I'm happy to send a v2 with a direct assignment/comparison though if that is 
preferred.
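
For reference, a quick sketch of the direct version (a fragment only, assuming 
dev and dev2 as in the patch; the comparison has to be field by field since C 
has no struct equality operator):

/* Direct assignment instead of memcpy() ... */
dev2->id = dev->id;

/* ... and field-by-field comparison instead of memcmp(). */
int id_changed = dev2->id.class_id            != dev->id.class_id ||
                 dev2->id.vendor_id           != dev->id.vendor_id ||
                 dev2->id.device_id           != dev->id.device_id ||
                 dev2->id.subsystem_vendor_id != dev->id.subsystem_vendor_id ||
                 dev2->id.subsystem_device_id != dev->id.subsystem_device_id;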

-Jim





Re: [dpdk-dev] Questions on DPDK pkg-config

2020-09-24 Thread Harris, James R


On 9/24/20, 2:27 AM, "Bruce Richardson"  wrote:

On Wed, Sep 23, 2020 at 07:21:16PM +0100, Harris, James R wrote:
>Hi,
> 
> 
>SPDK would like to use DPDK’s pkg-config files rather than rolling our
>own linker arguments.  I’m running into a couple of issues – one looks
>like a bug, the other is not as clear.
> 
> 
>First is the use of --as-needed
>([1]https://github.com/DPDK/dpdk/commit/b98447077b0609750c10b84b7b2e7be
>0c8504fad).  We have run into problems with this in the past,
>specifically related to the rte_mempool_ring MEMPOOL_REGISTER_OPS
>constructor functions.  --as-needed results in similar behavior to
>omitting --whole-archive when using static libraries – meaning the
>constructor function doesn’t get called and we cannot create any
>mempools since there are no registered mempool_ops.  I think this would
>also affect other uses of constructor functions within DPDK when using
>pkg-config, but this rte_mempool_ring one is the only that would impact
>SPDK so it’s the only one I’ve checked.
> 
> 
>Second is that rte_bus_pci is not included in the pkg-config output.
>SPDK relies on rte_bus_pci since we have our own DPDK drivers for
>things like nvme, virtio-blk and virtio-scsi and need access to
>rte_pci_read_config, rte_pci_write_config and rte_pci_register.
>Perhaps we could add bus_pci to the dpdk_libraries in lib/meson.build?
> 
> 
>Any help would be appreciated.
> 
> 
>Thanks,
> 
> 
>-Jim
> 
Hi Jim,

what command are you using to get the libs etc. from pkg-config? For static
linking you need to pass the --static flag which will include all the libs
and drivers, including the pci bus driver. [See output below from my system 
after
running "ninja install"]. The --as-needed is required to prevent the shared
libs also being linked into a static build in this case.

/Bruce



Yes, the output from pkg-config works fine for the static library use case.  I 
also see librte_bus_pci.a get emitted when specifying --static.

I chatted with Bruce offline about this, and we (mostly Bruce) root caused the 
issue with the shared library use case.  SPDK is using a non-standard DESTDIR 
and then not specifying the PMD directory via the -d command line argument.  So 
this needs to be fixed on the SPDK side.

Thanks,

Jim








[dpdk-dev] Questions on DPDK pkg-config

2020-09-23 Thread Harris, James R
Hi,

SPDK would like to use DPDK’s pkg-config files rather than rolling our own 
linker arguments.  I’m running into a couple of issues – one looks like a bug, 
the other is not as clear.

First is the use of --as-needed 
(https://github.com/DPDK/dpdk/commit/b98447077b0609750c10b84b7b2e7be0c8504fad). 
 We have run into problems with this in the past, specifically related to the 
rte_mempool_ring MEMPOOL_REGISTER_OPS constructor functions.  --as-needed 
results in similar behavior to omitting --whole-archive when using static 
libraries – meaning the constructor function doesn’t get called and we cannot 
create any mempools since there are no registered mempool_ops.  I think this 
would also affect other uses of constructor functions within DPDK when using 
pkg-config, but this rte_mempool_ring one is the only that would impact SPDK so 
it’s the only one I’ve checked.

Second is that rte_bus_pci is not included in the pkg-config output.  SPDK 
relies on rte_bus_pci since we have our own DPDK drivers for things like nvme, 
virtio-blk and virtio-scsi and need access to rte_pci_read_config, 
rte_pci_write_config and rte_pci_register.  Perhaps we could add bus_pci to the 
dpdk_libraries in lib/meson.build?

Any help would be appreciated.

Thanks,

-Jim


Re: [dpdk-dev] [PATCH] contigmem: cleanup properly when load fails

2020-03-19 Thread Harris, James R


On 3/19/20, 5:54 AM, "David Marchand"  wrote:

On Tue, Mar 10, 2020 at 10:32 AM Bruce Richardson
 wrote:
>
> On Mon, Mar 09, 2020 at 03:00:25AM -0700, Jim Harris wrote:
> > If contigmem is not able to allocate all of the
> > requested buffers, it frees whatever buffers were
> > able to be allocated up until that point.
> >
> > But the pointers are not set to NULL in that case.
> > After the load fails, the FreeBSD kernel will
> > immediately call the contigmem unload handler, which
> > tries to free the buffers again since the pointers
> > were not set to NULL.
> >
> > It's not clear that we should just rely on the unload
> > handler getting called after load failure. So let's
> > keep the existing cleanup code in the load handler,
> > but explicitly set the pointers to NULL after freeing
> > them.

Can you check this Fixes is correct?

Fixes: 5f51eca22489 ("contigmem: free allocated memory on error")
Cc: sta...@dpdk.org

Yes - that's correct.  Thanks!

-Jim




Re: [dpdk-dev] [PATCH] vfio: free mp_reply msgs in failure cases

2019-10-14 Thread Harris, James R


> On Oct 14, 2019, at 7:47 AM, David Marchand  wrote:
> 
> On Mon, Oct 14, 2019 at 3:49 PM Harris, James R
>  wrote:
>> On 10/14/19, 4:18 AM, "David Marchand"  wrote:
>> 
>>>On Fri, Aug 16, 2019 at 9:19 PM Jim Harris  
>>> wrote:
>>> 
>>> The code checks both rte_mp_request_sync() return
>>> code and that the number of messages in the reply
>>> equals 1.  If rte_mp_request_sync() succeeds but
>>> there was more than one message, those messages
>>> would get leaked.
>>> 
>>> Found via code review by Anatoly Burakov of patches
>>> that used the vhost code as a template for using
>>> rte_mp_request_sync().
>> 
>>The patch looks fine, I just want to make sure its title reflect what it 
>> fixes.
>>Can you give some insights of how common this issue is? If there are
>>known cases where it happens?
>> 
>> Hi David,
>> 
>> I don't think this issue is common at all.  I don't have any known cases in 
>> mind - it was only found via code inspection.
> 
> Anatoly, Jim,
> 
> Not really inspired for the title, what do you think of:
> vfio: fix potential leak with multiprocess
> 
> Plus, it deserves a Fixes: line.
> Fixes: 83a73c5fef66 ("vfio: use generic multi-process channel")
> Cc: sta...@dpdk.org
> 
> If you are okay with this, I will do the change when applying.

I am ok with those changes. Thanks!

> 
> -- 
> David Marchand
> 


Re: [dpdk-dev] [PATCH] vfio: free mp_reply msgs in failure cases

2019-10-14 Thread Harris, James R


On 10/14/19, 4:18 AM, "David Marchand"  wrote:

On Fri, Aug 16, 2019 at 9:19 PM Jim Harris  wrote:
>
> The code checks both rte_mp_request_sync() return
> code and that the number of messages in the reply
> equals 1.  If rte_mp_request_sync() succeeds but
> there was more than one message, those messages
> would get leaked.
>
> Found via code review by Anatoly Burakov of patches
> that used the vhost code as a template for using
> rte_mp_request_sync().

The patch looks fine, I just want to make sure its title reflect what it 
fixes.
Can you give some insights of how common this issue is? If there are
known cases where it happens?

Hi David,

I don't think this issue is common at all.  I don't have any known cases in 
mind - it was only found via code inspection.

I might have spotted another issue (could be worth a followup patch
later if confirmed), please see below.

>
> Signed-off-by: Jim Harris 
> ---
>  lib/librte_eal/linux/eal/eal_vfio.c |   16 
>  1 file changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/lib/librte_eal/linux/eal/eal_vfio.c 
b/lib/librte_eal/linux/eal/eal_vfio.c
> index 501c74f23..d9541b122 100644
> --- a/lib/librte_eal/linux/eal/eal_vfio.c
> +++ b/lib/librte_eal/linux/eal/eal_vfio.c

[snip]

> @@ -1021,7 +1021,7 @@ int
>  vfio_get_default_container_fd(void)
>  {
> struct rte_mp_msg mp_req, *mp_rep;
> -   struct rte_mp_reply mp_reply;
> +   struct rte_mp_reply mp_reply = {0};
> struct timespec ts = {.tv_sec = 5, .tv_nsec = 0};
> struct vfio_mp_param *p = (struct vfio_mp_param *)mp_req.param;
>
> @@ -1049,9 +1049,9 @@ vfio_get_default_container_fd(void)
> free(mp_reply.msgs);
> return mp_rep->fds[0];

Do we have a use after free on mp_rep which points to &mp_reply.msgs[0] ?

You're right.  It needs to save mp_rep->fds[0] into a local variable before we 
free that array.  That would be a good follow-up patch!
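
Roughly like this (a sketch of the follow-up, not a submitted patch):

/* Copy the fd out of the reply before freeing the message array, so we
 * don't return a value read from freed memory. */
int container_fd = mp_rep->fds[0];

free(mp_reply.msgs);
return container_fd;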

-Jim

> }
> -   free(mp_reply.msgs);
> }
>
> +   free(mp_reply.msgs);
> RTE_LOG(ERR, EAL, "  cannot request default container fd\n");
> return -1;
>  }



-- 
David Marchand





Re: [dpdk-dev] [PATCH] vhost: add __rte_experimental to rte_vhost_va_from_guest_pa

2019-09-23 Thread Harris, James R


On 9/23/19, 8:41 AM, "Yigit, Ferruh"  wrote:

On 9/18/2019 2:12 PM, Maxime Coquelin wrote:
> 
> 
> On 8/20/19 11:37 AM, Jim Harris wrote:
>> This function is listed under EXPERIMENTAL in the
>> rte_vhost_version.map, so it needs to be marked
>> with __rte_experimental in the header file as well.
>>
>> Found by check-experimental-syms.sh when trying to compile
>> DPDK with -finstrument-functions.  This script didn't
>> catch this in the normal case, since the function is
>> declared __rte_always_inline.
>>
>> Signed-off-by: Jim Harris 
>> ---
>>  lib/librte_vhost/rte_vhost.h |1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
>> index 7fb172912..fc27bc21e 100644
>> --- a/lib/librte_vhost/rte_vhost.h
>> +++ b/lib/librte_vhost/rte_vhost.h
>> @@ -225,6 +225,7 @@ rte_vhost_gpa_to_vva(struct rte_vhost_memory *mem, 
uint64_t gpa)
>>   * @return
>>   *  the host virtual address on success, 0 on failure
>>   */
>> +__rte_experimental
>>  static __rte_always_inline uint64_t
>>  rte_vhost_va_from_guest_pa(struct rte_vhost_memory *mem,
>> uint64_t gpa, 
uint64_t *len)
>>
> 
> Fixed commit message to comply with check-git-log tool.
> 
> Applied to dpdk-next-virtio/master.

This is breaking the 'vhost_scsi' sample application since it is using this
experimental API [1].
Build system should be updated to say sample application is allowed to using
experimental APIs.
Can you please send a new version?

Thanks Ferruh.  I missed that.  Does this mean adding the following to 
examples/vhost_scsi/Makefile?

CFLAGS += -DALLOW_EXPERIMENTAL_API

And this to examples/vhost_scsi/meson.build?

allow_experimental_apis = true

Maxime - please let me know if you need a patch from me that you'll apply to 
dpdk-next-virtio/master, or if you would prefer to send such a patch yourself.

Thanks,

-Jim
 



Re: [dpdk-dev] [PATCH] vfio: free mp_reply msgs in failure cases

2019-08-20 Thread Harris, James R



> On Aug 20, 2019, at 6:13 AM, Burakov, Anatoly  
> wrote:
> 
>> On 16-Aug-19 1:13 PM, Jim Harris wrote:
>> The code checks both rte_mp_request_sync() return
>> code and that the number of messages in the reply
>> equals 1.  If rte_mp_request_sync() succeeds but
>> there was more than one message, those messages
>> would get leaked.
>> Found via code review by Anatoly Burakov of patches
>> that used the vhost code as a template for using
>> rte_mp_request_sync().
>> Signed-off-by: Jim Harris 
>> ---
>>  lib/librte_eal/linux/eal/eal_vfio.c |   16 
>>  1 file changed, 8 insertions(+), 8 deletions(-)
>> diff --git a/lib/librte_eal/linux/eal/eal_vfio.c 
>> b/lib/librte_eal/linux/eal/eal_vfio.c
>> index 501c74f23..d9541b122 100644
>> --- a/lib/librte_eal/linux/eal/eal_vfio.c
>> +++ b/lib/librte_eal/linux/eal/eal_vfio.c
>> @@ -264,7 +264,7 @@ vfio_open_group_fd(int iommu_group_num)
>>  int vfio_group_fd;
>>  char filename[PATH_MAX];
>>  struct rte_mp_msg mp_req, *mp_rep;
>> -struct rte_mp_reply mp_reply;
>> +struct rte_mp_reply mp_reply = {0};
>>  struct timespec ts = {.tv_sec = 5, .tv_nsec = 0};
>>  struct vfio_mp_param *p = (struct vfio_mp_param *)mp_req.param;
>>  @@ -320,9 +320,9 @@ vfio_open_group_fd(int iommu_group_num)
>>  RTE_LOG(ERR, EAL, "  bad VFIO group fd\n");
>>  vfio_group_fd = 0;
>>  }
>> -free(mp_reply.msgs);
>>  }
>>  +free(mp_reply.msgs);
> 
> That's not quite correct. This fixes the problem of missing free() when 
> nb_received mismatches, but this /adds/ a problem of doing an unnecessary 
> free() when rte_mp_request_sync() returns -1. Same for other places, i 
> believe.

This would just resolve to free(NULL) in the -1 case.

Jim

> 
> -- 
> Thanks,
> Anatoly


Re: [dpdk-dev] [PATCH v3 1/2] timer: use rte_mp_msg to pass TSC hz to secondary procs

2019-08-19 Thread Harris, James R


On 8/16/19, 12:19 PM, "dev on behalf of Jim Harris"  wrote:

rte_eal_init() is much faster in secondary processes since
hugepages don't need to be zeroed.  But there's still
non-trivial delays in the timer subsystem initialization
due to the 100ms sleep used to calculate TSC hz.  So use
the rte_mp_msg framework to allow secondary processes
to get the TSC hz from the primary process.

This cuts rte_eal_init() execution time in a secondary
process from 165ms to 66ms in my test program.

I'm withdrawing this patch.  Some Linux distros (Ubuntu) build msr as a module, 
and if it's not loaded, the code will go down the sleep path.  The (much) 
simpler solution is to just modprobe msr.



Signed-off-by: Jim Harris 
---
 lib/librte_eal/common/eal_common_timer.c |   70 
+-
 1 file changed, 68 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_timer.c 
b/lib/librte_eal/common/eal_common_timer.c
index 145543de7..3eb0309f5 100644
--- a/lib/librte_eal/common/eal_common_timer.c
+++ b/lib/librte_eal/common/eal_common_timer.c
@@ -15,9 +15,16 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "eal_private.h"
 
+#define EAL_TIMER_MP "eal_timer_mp_sync"
+
+struct timer_mp_param {
+   uint64_t tsc;
+};
+
 /* The frequency of the RDTSC timer resolution */
 static uint64_t eal_tsc_resolution_hz;
 
@@ -74,8 +81,8 @@ estimate_tsc_freq(void)
return RTE_ALIGN_MUL_NEAR(rte_rdtsc() - start, CYC_PER_10MHZ);
 }
 
-void
-set_tsc_freq(void)
+static void
+set_tsc_freq_primary(void)
 {
uint64_t freq;
 
@@ -89,6 +96,65 @@ set_tsc_freq(void)
eal_tsc_resolution_hz = freq;
 }
 
+static void
+set_tsc_freq_secondary(void)
+{
+   struct rte_mp_msg mp_req = {0};
+   struct rte_mp_reply mp_reply = {0};
+   struct timer_mp_param *r;
+   struct timespec ts = {.tv_sec = 1, .tv_nsec = 0};
+
+   strcpy(mp_req.name, EAL_TIMER_MP);
+   if (rte_mp_request_sync(&mp_req, &mp_reply, &ts) || 
mp_reply.nb_received != 1) {
+   /* We weren't able to get the tsc hz from the primary process.  
So we will
+* just calculate it here in the secondary process instead.
+*/
+   set_tsc_freq_primary();
+   free(mp_reply.msgs);
+   return;
+   }
+
+   r = (struct timer_mp_param *)mp_reply.msgs[0].param;
+   eal_tsc_resolution_hz = r->tsc;
+   free(mp_reply.msgs);
+}
+
+static int
+timer_mp_primary(__attribute__((unused)) const struct rte_mp_msg *msg, 
const void *peer)
+{
+   struct rte_mp_msg reply;
+   struct timer_mp_param *r = (struct timer_mp_param *)reply.param;
+
+   memset(&reply, 0, sizeof(reply));
+   r->tsc = eal_tsc_resolution_hz;
+   strcpy(reply.name, EAL_TIMER_MP);
+   reply.len_param = sizeof(*r);
+
+   return rte_mp_reply(&reply, peer);
+}
+
+void
+set_tsc_freq(void)
+{
+   int rc;
+
+   /* We use a 100ms timer to calculate the TSC hz.  We can save this 
100ms in
+* secondary processes, by getting the TSC hz from the primary process.
+* So register an mp_action callback in the primary process, which 
secondary
+* processes will use to get the TSC hz.
+*/
+   if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+   set_tsc_freq_primary();
+   rc = rte_mp_action_register(EAL_TIMER_MP, timer_mp_primary);
+   if (rc) {
+   RTE_LOG(WARNING, EAL, "Could not register mp_action - 
secondary "
+   " processes will calculate TSC 
independently.\n");
+   }
+   } else {
+   set_tsc_freq_secondary();
+   }
+}
+
 void rte_delay_us_callback_register(void (*userfunc)(unsigned int))
 {
rte_delay_us = userfunc;





Re: [dpdk-dev] [PATCH 1/2] timer: use rte_mp_msg to pass TSC hz to secondary procs

2019-08-16 Thread Harris, James R


On 8/16/19, 12:56 AM, "Burakov, Anatoly"  wrote:




> @@ -89,6 +96,65 @@ set_tsc_freq(void)
>   eal_tsc_resolution_hz = freq;
>   }
>   
> +static void
> +set_tsc_freq_secondary(void)
> +{
> + struct rte_mp_msg mp_req;
> + struct rte_mp_reply mp_reply;
> + struct timer_mp_param *r;
> + struct timespec ts = {.tv_sec = 1, .tv_nsec = 0};
> +
> + memset(&mp_req, 0, sizeof(mp_req));
> + strcpy(mp_req.name, EAL_TIMER_MP);
> + if (rte_mp_request_sync(&mp_req, &mp_reply, &ts) || 
mp_reply.nb_received != 1) {

If rte_mp_request_sync returns 0 but mp_reply.nb_receieved isn't set to 
1, you'll be leaking mp_reply.msgs.


Ha - of course you're right.  I didn't notice this when using the VFIO code as 
a reference for
how to code this.  I'll respin this patch and then push patches to fix the VFIO 
code separately.

Thanks,

-Jim



Re: [dpdk-dev] Aligned rte_mempool for storage applications

2019-03-26 Thread Harris, James R


On 3/26/19, 11:34 AM, "Howell, Seth"  wrote:

Hi Vipin,

Thanks for your quick reply. I will respond to your queries in order.
1. Yes, in at least one case we have buffers of size 4096 bytes. Some of 
our other buffers are much larger (>64KiB)
2. These buffers are used in the I/O path, so performance is very 
important. Allocating and freeing a buffer each time we use it could be pretty 
costly.

I think Vipin may have been suggesting allocating one (or multiple) very large 
buffers, and then splitting those buffers on 4KB boundaries in SPDK.  If so, that 
would still require SPDK to develop its own mempool-like feature to hold those 
buffers.  We'd really like to use the DPDK rte_mempool implementation rather 
than inventing our own.

3. Could you describe the idea of an indirect buffer in more detail? I 
don't think I quite understand that concept. I know we couldn't use mbufs 
because we often have buffers that are larger than 64k. I think there are more 
reasons we don't use the mbuf structure in our use case, but am not familiar 
with all of them. Maybe Jim can explain those in more detail. 

SPDK doesn't use rte_mbufs (except when absolutely required for things like 
DPDK cryptodev/compressdev).  Most of that data structure is filled with 
network packet related fields that would never be used for storage.  We could 
create our own very small data structure and do something similar to Vipin's 
indirect mbuf suggestion.  And I think this is what Vipin was starting to 
allude to in query #2.

It would be less optimal than a native aligned mempool because we'd be adding 
an extra pointer dereference on every get from the mempool - but probably only 
slightly less optimal.  Seth - let's sync up offline and see if we can quickly 
collect some benchmarking data to measure the performance impact of this extra 
dereference.
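
To make the indirect-buffer idea concrete, here is a rough sketch (our naming, 
not an existing SPDK or DPDK API) of what such a descriptor and its mempool 
object initializer could look like:

/* The mempool objects are small descriptors, each pointing at a separately
 * allocated 4 KiB-aligned data buffer.  Getting a buffer then costs one
 * extra pointer dereference compared to a natively aligned mempool. */
struct buf_desc {
        void *aligned_buf;   /* 4 KiB-aligned payload buffer */
};

static void
buf_desc_init(struct rte_mempool *mp, void *opaque, void *obj, unsigned obj_idx)
{
        struct buf_desc *d = obj;

        (void)mp; (void)opaque; (void)obj_idx;
        d->aligned_buf = rte_malloc(NULL, 4096, 4096);   /* aligned allocation */
}

/* buf_desc_init would be passed as the obj_init callback to
 * rte_mempool_create(), or applied via rte_mempool_obj_iter(). */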

Thanks Vipin - this definitely gives us an alternative direction to investigate 
that we hadn't considered.

-Jim



Thanks,

Seth
-Original Message-
From: Varghese, Vipin 
Sent: Monday, March 25, 2019 7:53 PM
To: Harris, James R ; Howell, Seth 
; dev@dpdk.org
Subject: RE: Aligned rte_mempool for storage applications

Hi Seth,

If I may I would like to suggest and ask a query on the mempool alignment 
details. Please find my suggestion and query inline to the email.

Snipped
> 
> In SPDK, we use the rte_mempool struct for many internal structure 
> collections. The per-thread cache and ease of allocation of mempools 
> are very useful features.
> Some of the collections we store in SPDK are pools of I/O buffers. 
> Typically, these pools contain elements of at least 4096 bytes, and we 
> would like them to be aligned to 4k for performance reasons.
Query-1> is the total memory required to be 4096 only (data portion)?

> 
> [Jim] Just to clarify Seth's point - the performance reasons are 
> specifically to avoid wasteful memcopies.  The vast majority of NVMe 
> SSDs in the market today do not have full scatter/gather support - 
> rather they only support something called PRP (Physical Region Pages) 
> which require all scatter gather elements except the first to be 4KB 
> aligned.  There are other storage interfaces such as Linux AIO that also 
impose alignment restrictions.
> 
> -Jim
> 
> 
> Currently, the rte_mempool API doesn't support aligned mempool 
> objects. This means that when we allocate a 4k buffer and want it 
> aligned to 4k, we actually need to allocate an 8k buffer and calculate 
> an offset into it each time we want to use it.
Query-2> why not create contiguous 4K aligned memory with rte_malloc?

> We recently did a proof of concept using the rte_mempool_ops hook 
> where we allocated a mempool and populated it with aligned entries. 
> This allowed us to retrieve aligned addresses directly from 
> rte_mempool_get(), but didn't help with the allocation size.
> Because the rte_mempool struct assumes that each element has a 
> header attached to it, we still need to live up to that assumption for 
> each object we create in a mempool. This means that the actual size of 
> a buffer becomes 4k + 24 bytes. In order to get to our next aligned 
> address, we need to add about 4k of padding to each element.
> Modifying the current rte_mempool struct to allow entries without 
> headers seems impossible since it would break rte_mempool_for_obj_iter 
> and rte_mempool_from_obj. However I still think there is a lot of 
> benefit to be gained from a mempool structure that supports aligned 
objects without headers.
> I am w

Re: [dpdk-dev] Aligned rte_mempool for storage applications

2019-03-25 Thread Harris, James R


On 3/25/19, 2:06 PM, "Howell, Seth"  wrote:

Hello,

In SPDK, we use the rte_mempool struct for many internal structure 
collections. The per-thread cache and ease of allocation of mempools are very 
useful features.
Some of the collections we store in SPDK are pools of I/O buffers. 
Typically, these pools contain elements of at least 4096 bytes, and we would 
like them to be aligned to 4k for performance reasons.

[Jim] Just to clarify Seth's point - the performance reasons are specifically 
to avoid wasteful memcopies.  The vast majority of NVMe SSDs in the market 
today do not have full scatter/gather support - rather they only support 
something called PRP (Physical Region Pages) which require all scatter gather 
elements except the first to be 4KB aligned.  There are other storage 
interfaces such as Linux AIO that also impose alignment restrictions.

-Jim


Currently, the rte_mempool API doesn't support aligned mempool objects. 
This means that when we allocate a 4k buffer and want it aligned to 4k, we 
actually need to allocate an 8k buffer and calculate an offset into it each 
time we want to use it.
We recently did a proof of concept using the rte_mempool_ops hook where we 
allocated a mempool and populated it with aligned entries. This allowed us to 
retrieve aligned addresses directly from rte_mempool_get(), but didn't help 
with the allocation size.
Because the rte_mempool struct assumes that each element has a header 
attached to it, we still need to live up to that assumption for each object we 
create in a mempool. This means that the actual size of a buffer becomes 4k + 
24 bytes. In order to get to our next aligned address, we need to add about 4k 
of padding to each element.
Modifying the current rte_mempool struct to allow entries without headers 
seems impossible since it would break rte_mempool_for_obj_iter and 
rte_mempool_from_obj. However I still think there is a lot of benefit to be 
gained from a mempool structure that supports aligned objects without headers.
I am wondering if DPDK would be open to us introducing an 
rte_mempool_aligned structure. This structure would essentially be a wrapper 
around a regular mempool struct. However, it would not require headers or 
trailers for each object in the pool.

This structure would only be applicable to a subset of mempools with the 
following characteristics:
1. mempools for which the following flags were set: 
MEMPOOL_F_NO_CACHE_ALIGNED, MEMPOOL_F_NO_IOVA_CONTIG , MEMPOOL_F_NO_SPREAD
2. mempools that do not require the use of the following functions 
rte_mempool_from_obj (requires a pointer to the mp in the header of each obj), 
rte_mempool_for_obj_iter.
3. Any attempt to create this object when RTE_LIBRTE_MEMPOOL_DEBUG was 
enabled would necessarily fail since we can't check the header cookies.

My thought would be that we could implement this data structure in a header 
and it would look something like this:

struct rte_mempool_aligned {
    struct rte_mempool mp;
    size_t obj_alignment;
};

The rest of the functions in the header would primarily be wrappers around 
the original functions. Most functions (rte_mempool_alloc, rte_mempool_free, 
rte_mempool_enqueue/dequeue, rte_mempool_get_count, etc.) could be implemented 
directly as wrappers, and others such as rte_mempool_create and the populate 
functions would have to be re-implemented to some degree in the new header. The 
remaining functions (check_cookies, for_obj_iter) would not be implemented in 
the rte_mempool_aligned.h file. 

Would the community be welcoming of a new rte_mempool_aligned struct? If 
you don't feel like this would be the way to go, are there other options in 
DPDK for creating a pool of pre-allocated aligned objects? 

Thank you,

Seth Howell






[dpdk-dev] [dpdk-users] rte_zmalloc() returning non-zeroed memory on FreeBSD

2016-08-16 Thread Harris, James R


> -Original Message-
> From: Gonzalez Monroy, Sergio
> Sent: Tuesday, August 16, 2016 12:37 AM
> To: Harris, James R; Thomas Monjalon; users at dpdk.org; dev at dpdk.org;
> Richardson, Bruce
> Cc: Verkamp, Daniel
> Subject: Re: [dpdk-dev] [dpdk-users] rte_zmalloc() returning non-zeroed
> memory on FreeBSD
> 
> On 15/08/2016 18:23, Harris, James R wrote:
> >



> > The problem is that the FreeBSD contigmem driver does not re-zero the
> huge
> > pages each time they are mmap'd - they are only zeroed when contigmem
> > initially loads.  I will push a patch for this shortly.
> 
> So that is the case where we run the app more than once, right?
> I missed that, I only ran it once.

Correct - it works OK the first time.  The problem only occurs if you run the 
app
more than once (unless you unload and reload contigmem before running the
app a second time).

-Jim





[dpdk-dev] [dpdk-users] rte_zmalloc() returning non-zeroed memory on FreeBSD

2016-08-15 Thread Harris, James R


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Thomas Monjalon
> Sent: Thursday, August 11, 2016 12:05 AM
> To: users at dpdk.org; dev at dpdk.org; Gonzalez Monroy, Sergio; Richardson,
> Bruce
> Cc: Verkamp, Daniel
> Subject: Re: [dpdk-dev] [dpdk-users] rte_zmalloc() returning non-zeroed
> memory on FreeBSD
> 
> Hi,
> 
> 2016-08-10 23:30, Verkamp, Daniel:
> > It seems that with DPDK 16.07, rte_zmalloc() and related functions no
> > longer return zeroed memory reliably on FreeBSD.
> >
> > I notice that commit b78c9175118f7d61022ddc5c62ce54a1bd73cea5 ("mem:
> do
> > not zero out memory on zmalloc") removed the explicit memset() that
> used
> > to ensure the buffer was zeroed; its log message says:
> >
> > "Zeroing out memory on rte_zmalloc_socket is not required anymore since
> > all allocated memory is already zeroed."
> 
> On Linux, the memory is zeroed by the kernel.
> Then the zero value is maintained in the rte_malloc pool by rte_free.
> 
> > However, I don't see how this is guaranteed (at least for FreeBSD), and
> > it is not true in practice.  I've attached a minimized reproducer program -
> > running it twice in a row fails reliably for me.
> >
> > Is there a missing step in FreeBSD, or is it a more general problem for
> > other platforms?
> 
> I guess the initial value from the kernel has been verified only on Linux.
> We could re-add a memset for FreeBSD.

The problem is that the FreeBSD contigmem driver does not re-zero the huge
pages each time they are mmap'd - they are only zeroed when contigmem
initially loads.  I will push a patch for this shortly.