Re: [PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)
On Tue, Sep 24, 2019 at 11:19:29PM +0200, Vlastimil Babka wrote: > On 9/23/19 7:51 PM, Darrick J. Wong wrote: > > On Mon, Sep 23, 2019 at 07:17:10PM +0200, David Sterba wrote: > >> On Mon, Sep 23, 2019 at 06:36:32PM +0200, Vlastimil Babka wrote: > >>> So if anyone thinks this is a good idea, please express it (preferably > >>> in a formal way such as Acked-by), otherwise it seems the patch will be > >>> dropped (due to a private NACK, apparently). > > > > Oh, I didn't realize that *some* of us are allowed the > > privilege of gutting a patch via private NAK without any of that open > > development discussion incovenience. > > > > As far as XFS is concerned I merged Dave's series that checks the > > alignment of io memory allocations and falls back to vmalloc if the > > alignment won't work, because I got tired of scrolling past the endless > > discussion and bug reports and inaction spanning months. > > I think it's a big fail of kmalloc API that you have to do that, and > especially with vmalloc, which has the overhead of setting up page > tables, and it's a waste for allocation requests smaller than page size. > I wish we could have nice things. I don't think the problem here is the code. The problem here is that we have a dysfunctional development community and there are no processes we can follow to ensure architectural problems in core subsystems are addressed in a timely manner... And this criticism isn't just of the mm/ here - this alignment problem is exacerbated by exactly the same issue on the block layer side. i.e. the block layer and drivers have -zero- bounds checking to catch these sorts of things and the block layer maintainer will not accept patches for runtime checks that would catch these issues and make them instantly visible to us. These are not code problems: we can fix the problems with code (and I have done so to demonstrate "this is how we do what you say is impossible"). The problem here is people in positions of control/power are repeatedly demonstrating an inability to compromise to reach a solution that works for everyone. It's far better for us just to work around bullshit like this in XFS now, then when the core subsystems get they act together years down the track we can remove the workaround from XFS. Users don't care how we fix the problem, they just want it fixed. If that means we have to route around dysfunctional developer groups, then we'll just have to do that Cheers, Dave. -- Dave Chinner da...@fromorbit.com
Re: [PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)
On Tue, Sep 03, 2019 at 08:13:45PM +, Christopher Lameter wrote: > On Sat, 31 Aug 2019, Matthew Wilcox wrote: > > > > The current behavior without special alignment for these caches has been > > > in the wild for over a decade. And this is now coming up? > > > > In the wild ... and rarely enabled. When it is enabled, it may or may > > not be noticed as data corruption, or tripping other debugging asserts. > > Users then turn off the rare debugging option. > > Its enabled in all full debug session as far as I know. Fedora for > example has been running this for ages to find breakage in device drivers > etc etc. Are you telling me nobody uses the ramdisk driver on fedora? Because that's one of the affected drivers. > > > If there is an exceptional alignment requirement then that needs to be > > > communicated to the allocator. A special flag or create a special > > > kmem_cache or something. > > > > The only way I'd agree to that is if we deliberately misalign every > > allocation that doesn't have this special flag set. Because right now, > > breakage happens everywhere when these debug options are enabled, and > > the very people who need to be helped are being hurt by the debugging. > > That is customarily occurring for testing by adding "slub_debug" to the > kernel commandline (or adding debug kernel options) and since my > information is that this is done frequently (and has been for over a > decade now) I am having a hard time believing the stories of great > breakage here. These drivers were not tested with debugging on before? > Never ran with a debug kernel? Whatever is being done is clearly not enough to trigger the bug. So how about it? Create an option to slab/slub to always return misaligned memory.
Re: [PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)
On Sat, 31 Aug 2019, Matthew Wilcox wrote: > > The current behavior without special alignment for these caches has been > > in the wild for over a decade. And this is now coming up? > > In the wild ... and rarely enabled. When it is enabled, it may or may > not be noticed as data corruption, or tripping other debugging asserts. > Users then turn off the rare debugging option. Its enabled in all full debug session as far as I know. Fedora for example has been running this for ages to find breakage in device drivers etc etc. > > If there is an exceptional alignment requirement then that needs to be > > communicated to the allocator. A special flag or create a special > > kmem_cache or something. > > The only way I'd agree to that is if we deliberately misalign every > allocation that doesn't have this special flag set. Because right now, > breakage happens everywhere when these debug options are enabled, and > the very people who need to be helped are being hurt by the debugging. That is customarily occurring for testing by adding "slub_debug" to the kernel commandline (or adding debug kernel options) and since my information is that this is done frequently (and has been for over a decade now) I am having a hard time believing the stories of great breakage here. These drivers were not tested with debugging on before? Never ran with a debug kernel?
[PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)
In most configurations, kmalloc() happens to return naturally aligned (i.e. aligned to the block size itself) blocks for power of two sizes. That means some kmalloc() users might unknowingly rely on that alignment, until stuff breaks when the kernel is built with e.g. CONFIG_SLUB_DEBUG or CONFIG_SLOB, and blocks stop being aligned. Then developers have to devise workaround such as own kmem caches with specified alignment [1], which is not always practical, as recently evidenced in [2]. The topic has been discussed at LSF/MM 2019 [3]. Adding a 'kmalloc_aligned()' variant would not help with code unknowingly relying on the implicit alignment. For slab implementations it would either require creating more kmalloc caches, or allocate a larger size and only give back part of it. That would be wasteful, especially with a generic alignment parameter (in contrast with a fixed alignment to size). Ideally we should provide to mm users what they need without difficult workarounds or own reimplementations, so let's make the kmalloc() alignment to size explicitly guaranteed for power-of-two sizes under all configurations. What this means for the three available allocators? * SLAB object layout happens to be mostly unchanged by the patch. The implicitly provided alignment could be compromised with CONFIG_DEBUG_SLAB due to redzoning, however SLAB disables redzoning for caches with alignment larger than unsigned long long. Practically on at least x86 this includes kmalloc caches as they use cache line alignment, which is larger than that. Still, this patch ensures alignment on all arches and cache sizes. * SLUB layout is also unchanged unless redzoning is enabled through CONFIG_SLUB_DEBUG and boot parameter for the particular kmalloc cache. With this patch, explicit alignment is guaranteed with redzoning as well. This will result in more memory being wasted, but that should be acceptable in a debugging scenario. * SLOB has no implicit alignment so this patch adds it explicitly for kmalloc(). The potential downside is increased fragmentation. While pathological allocation scenarios are certainly possible, in my testing, after booting a x86_64 kernel+userspace with virtme, around 16MB memory was consumed by slab pages both before and after the patch, with difference in the noise. [1] https://lore.kernel.org/linux-btrfs/c3157c8e8e0e7588312b40c853f65c02fe6c957a.1566399731.git.christophe.le...@c-s.fr/ [2] https://lore.kernel.org/linux-fsdevel/20190225040904.5557-1-ming@redhat.com/ [3] https://lwn.net/Articles/787740/ Signed-off-by: Vlastimil Babka --- Documentation/core-api/memory-allocation.rst | 4 ++ include/linux/slab.h | 4 ++ mm/slab_common.c | 11 - mm/slob.c| 42 +++- 4 files changed, 49 insertions(+), 12 deletions(-) diff --git a/Documentation/core-api/memory-allocation.rst b/Documentation/core-api/memory-allocation.rst index 7744aa3bf2e0..27c54854b508 100644 --- a/Documentation/core-api/memory-allocation.rst +++ b/Documentation/core-api/memory-allocation.rst @@ -98,6 +98,10 @@ limited. The actual limit depends on the hardware and the kernel configuration, but it is a good practice to use `kmalloc` for objects smaller than page size. +The address of a chunk allocated with `kmalloc` is aligned to at least +ARCH_KMALLOC_MINALIGN bytes. For sizes of power of two bytes, the +alignment is also guaranteed to be at least to the respective size. + For large allocations you can use :c:func:`vmalloc` and :c:func:`vzalloc`, or directly request pages from the page allocator. The memory allocated by `vmalloc` and related functions is diff --git a/include/linux/slab.h b/include/linux/slab.h index 56c9c7eed34e..0d4c26395785 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -493,6 +493,10 @@ static __always_inline void *kmalloc_large(size_t size, gfp_t flags) * kmalloc is the normal method of allocating memory * for objects smaller than page size in the kernel. * + * The allocated object address is aligned to at least ARCH_KMALLOC_MINALIGN + * bytes. For @size of power of two bytes, the alignment is also guaranteed + * to be at least to the size. + * * The @flags argument may be one of the GFP flags defined at * include/linux/gfp.h and described at * :ref:`Documentation/core-api/mm-api.rst ` diff --git a/mm/slab_common.c b/mm/slab_common.c index 929c02a90fba..b9ba93ad5c7f 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -993,10 +993,19 @@ void __init create_boot_cache(struct kmem_cache *s, const char *name, unsigned int useroffset, unsigned int usersize) { int err; + unsigned int align = ARCH_KMALLOC_MINALIGN; s->name = name; s->size = s->object_size = size; - s->align = calculate_alignment(flags, ARCH_KMALLOC_MINALIGN, size); + + /* +* For power of two sizes, guarantee