Re: [PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)

2019-09-24 Thread Dave Chinner
On Tue, Sep 24, 2019 at 11:19:29PM +0200, Vlastimil Babka wrote:
> On 9/23/19 7:51 PM, Darrick J. Wong wrote:
> > On Mon, Sep 23, 2019 at 07:17:10PM +0200, David Sterba wrote:
> >> On Mon, Sep 23, 2019 at 06:36:32PM +0200, Vlastimil Babka wrote:
> >>> So if anyone thinks this is a good idea, please express it (preferably
> >>> in a formal way such as Acked-by), otherwise it seems the patch will be
> >>> dropped (due to a private NACK, apparently).
> > 
> > Oh, I didn't realize that *some* of us are allowed the
> > privilege of gutting a patch via private NAK without any of that open
> > development discussion inconvenience.
> > 
> > As far as XFS is concerned I merged Dave's series that checks the
> > alignment of io memory allocations and falls back to vmalloc if the
> > alignment won't work, because I got tired of scrolling past the endless
> > discussion and bug reports and inaction spanning months.
> 
> I think it's a big fail of kmalloc API that you have to do that, and
> especially with vmalloc, which has the overhead of setting up page
> tables, and it's a waste for allocation requests smaller than page size.
> I wish we could have nice things.
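(For reference, the fallback Darrick merged looks roughly like the sketch
below -- illustrative only, with io_alloc_aligned as a made-up name rather
than the actual XFS helper: try kmalloc() first, and fall back to
vmalloc(), which is always page aligned, when the returned buffer does not
satisfy the required I/O alignment.)

#include <linux/slab.h>
#include <linux/vmalloc.h>

static void *io_alloc_aligned(size_t size, unsigned long align_mask)
{
	void *ptr;

	ptr = kmalloc(size, GFP_KERNEL);
	if (ptr) {
		/* good alignment, e.g. from a naturally aligned slab */
		if (!((unsigned long)ptr & align_mask))
			return ptr;
		/* misaligned, e.g. under slub_debug red zoning */
		kfree(ptr);
	}
	/* vmalloc() memory is page aligned, at page table setup cost */
	return vmalloc(size);
}

(Callers would then free with kvfree(), which handles both cases.)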

I don't think the problem here is the code. The problem here is that
we have a dysfunctional development community and there are no
processes we can follow to ensure architectural problems in core
subsystems are addressed in a timely manner...

And this criticism isn't just of the mm/ here - this alignment
problem is exacerbated by exactly the same issue on the block layer
side. i.e. the block layer and drivers have -zero- bounds checking
to catch these sorts of things and the block layer maintainer will
not accept patches for runtime checks that would catch these issues
and make them instantly visible to us.
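(The sort of runtime check being asked for might look like the sketch
below -- purely illustrative, not actual block layer code or any rejected
patch: warn whenever a bio segment violates the queue's DMA alignment
before the request is issued.)

#include <linux/blkdev.h>
#include <linux/bio.h>

static void check_bio_alignment(struct request_queue *q, struct bio *bio)
{
	unsigned long mask = queue_dma_alignment(q);
	struct bio_vec bv;
	struct bvec_iter iter;

	bio_for_each_segment(bv, bio, iter)
		WARN_ONCE(bv.bv_offset & mask,
			  "misaligned bio segment: offset %u mask 0x%lx\n",
			  bv.bv_offset, mask);
}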

These are not code problems: we can fix the problems with code (and
I have done so to demonstrate "this is how we do what you say is
impossible").  The problem here is people in positions of
control/power are repeatedly demonstrating an inability to
compromise to reach a solution that works for everyone.

It's far better for us just to work around bullshit like this in XFS
now; then, when the core subsystems get their act together years down
the track, we can remove the workaround from XFS. Users don't care
how we fix the problem, they just want it fixed. If that means we
have to route around dysfunctional developer groups, then we'll just
have to do that.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)

2019-09-03 Thread Matthew Wilcox
On Tue, Sep 03, 2019 at 08:13:45PM +, Christopher Lameter wrote:
> On Sat, 31 Aug 2019, Matthew Wilcox wrote:
> 
> > > The current behavior without special alignment for these caches has been
> > > in the wild for over a decade. And this is now coming up?
> >
> > In the wild ... and rarely enabled.  When it is enabled, it may or may
> > not be noticed as data corruption, or tripping other debugging asserts.
> > Users then turn off the rare debugging option.
> 
> It's enabled in all full debug sessions as far as I know. Fedora for
> example has been running this for ages to find breakage in device drivers
> etc.

Are you telling me nobody uses the ramdisk driver on Fedora?  Because
that's one of the affected drivers.

> > > If there is an exceptional alignment requirement then that needs to be
> > > communicated to the allocator. A special flag or create a special
> > > kmem_cache or something.
> >
> > The only way I'd agree to that is if we deliberately misalign every
> > allocation that doesn't have this special flag set.  Because right now,
> > breakage happens everywhere when these debug options are enabled, and
> > the very people who need to be helped are being hurt by the debugging.
> 
> That is customarily done for testing by adding "slub_debug" to the
> kernel command line (or enabling debug kernel options), and since my
> information is that this is done frequently (and has been for over a
> decade now), I have a hard time believing the stories of great
> breakage here. Were these drivers never tested with debugging on?
> Never run with a debug kernel?

Whatever is being done is clearly not enough to trigger the bug.  So how
about it?  Create an option for slab/slub to always return misaligned
memory.
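
(One conceivable shape for such an option, as a sketch only -- neither
the hook nor the config symbol exists: pad every kmalloc object and hand
back a pointer shifted by the minimum alignment, so any code silently
relying on natural alignment trips immediately.  Freeing would of course
need the reverse shift.)

/*
 * Hypothetical hook in the allocator's kmalloc path; the object would
 * have been sized with ARCH_KMALLOC_MINALIGN of slack.
 */
static inline void *kmalloc_debug_misalign(void *obj)
{
	if (IS_ENABLED(CONFIG_SLUB_DEBUG_MISALIGN))
		obj += ARCH_KMALLOC_MINALIGN;
	return obj;
}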



Re: [PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)

2019-09-03 Thread Christopher Lameter
On Sat, 31 Aug 2019, Matthew Wilcox wrote:

> > The current behavior without special alignment for these caches has been
> > in the wild for over a decade. And this is now coming up?
>
> In the wild ... and rarely enabled.  When it is enabled, it may or may
> not be noticed as data corruption, or tripping other debugging asserts.
> Users then turn off the rare debugging option.

It's enabled in all full debug sessions as far as I know. Fedora for
example has been running this for ages to find breakage in device drivers
etc.

> > If there is an exceptional alignment requirement then that needs to be
> > communicated to the allocator. A special flag or create a special
> > kmem_cache or something.
>
> The only way I'd agree to that is if we deliberately misalign every
> allocation that doesn't have this special flag set.  Because right now,
> breakage happens everywhere when these debug options are enabled, and
> the very people who need to be helped are being hurt by the debugging.

That is customarily done for testing by adding "slub_debug" to the
kernel command line (or enabling debug kernel options), and since my
information is that this is done frequently (and has been for over a
decade now), I have a hard time believing the stories of great
breakage here. Were these drivers never tested with debugging on?
Never run with a debug kernel?


[PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)

2019-08-26 Thread Vlastimil Babka
In most configurations, kmalloc() happens to return naturally aligned (i.e.
aligned to the block size itself) blocks for power-of-two sizes. That means
some kmalloc() users might unknowingly rely on that alignment, until stuff
breaks when the kernel is built with e.g. CONFIG_SLUB_DEBUG or CONFIG_SLOB,
and blocks stop being aligned. Then developers have to devise workarounds such
as their own kmem caches with specified alignment [1], which is not always
practical, as recently evidenced in [2].
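
For illustration, the workaround in [1] boils down to roughly this sketch
(the cache name is made up; kmem_cache_create()'s third argument is the
requested alignment):

#include <linux/slab.h>

static struct kmem_cache *my_cache;

static int __init my_cache_init(void)
{
	/* align the 512-byte objects to their own size */
	my_cache = kmem_cache_create("my-512b-cache", 512, 512, 0, NULL);
	return my_cache ? 0 : -ENOMEM;
}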

The topic has been discussed at LSF/MM 2019 [3]. Adding a 'kmalloc_aligned()'
variant would not help with code unknowingly relying on the implicit alignment.
For slab implementations it would either require creating more kmalloc caches,
or allocating a larger size and only giving back part of it. That would be
wasteful, especially with a generic alignment parameter (in contrast with a
fixed alignment to size).

Ideally we should provide mm users what they need without difficult
workarounds or their own reimplementations, so let's make the kmalloc()
alignment to size explicitly guaranteed for power-of-two sizes under all
configurations. What does this mean for the three available allocators?
(A caller-visible sketch follows the per-allocator summary below.)

* SLAB object layout happens to be mostly unchanged by the patch. The
  implicitly provided alignment could be compromised by CONFIG_DEBUG_SLAB due
  to redzoning; however, SLAB disables redzoning for caches with alignment
  larger than unsigned long long. In practice, on at least x86, this includes
  kmalloc caches, as they use cache line alignment, which is larger than that.
  Still, this patch ensures alignment on all arches and cache sizes.

* SLUB layout is also unchanged unless redzoning is enabled through
  CONFIG_SLUB_DEBUG and a boot parameter for the particular kmalloc cache.
  With this patch, explicit alignment is guaranteed with redzoning as well.
  This will result in more memory being wasted, but that should be acceptable
  in a debugging scenario.

* SLOB has no implicit alignment so this patch adds it explicitly for
  kmalloc(). The potential downside is increased fragmentation. While
  pathological allocation scenarios are certainly possible, in my testing,
  after booting an x86_64 kernel+userspace with virtme, around 16MB of memory
  was consumed by slab pages both before and after the patch, with the
  difference in the noise.
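
For callers, the guarantee means that a check like the following sketch
(not part of the patch) can no longer fire for power-of-two sizes,
regardless of allocator and debug options:

#include <linux/kernel.h>
#include <linux/slab.h>

static void *alloc_sector_buf(void)
{
	void *buf = kmalloc(512, GFP_KERNEL);

	/* 512 is a power of two, so natural alignment is now guaranteed */
	WARN_ON(buf && !IS_ALIGNED((unsigned long)buf, 512));
	return buf;
}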

[1] 
https://lore.kernel.org/linux-btrfs/c3157c8e8e0e7588312b40c853f65c02fe6c957a.1566399731.git.christophe.le...@c-s.fr/
[2] 
https://lore.kernel.org/linux-fsdevel/20190225040904.5557-1-ming@redhat.com/
[3] https://lwn.net/Articles/787740/

Signed-off-by: Vlastimil Babka 
---
 Documentation/core-api/memory-allocation.rst |  4 ++++
 include/linux/slab.h                         |  4 ++++
 mm/slab_common.c                             | 11 ++++++++++-
 mm/slob.c                                    | 42 +++++++++++++++++++++++++++++-----------
 4 files changed, 49 insertions(+), 12 deletions(-)

diff --git a/Documentation/core-api/memory-allocation.rst b/Documentation/core-api/memory-allocation.rst
index 7744aa3bf2e0..27c54854b508 100644
--- a/Documentation/core-api/memory-allocation.rst
+++ b/Documentation/core-api/memory-allocation.rst
@@ -98,6 +98,10 @@ limited. The actual limit depends on the hardware and the kernel
 configuration, but it is a good practice to use `kmalloc` for objects
 smaller than page size.
 
+The address of a chunk allocated with `kmalloc` is aligned to at least
+ARCH_KMALLOC_MINALIGN bytes. For power-of-two sizes, the alignment is
+also guaranteed to be at least the respective size.
+
 For large allocations you can use :c:func:`vmalloc` and
 :c:func:`vzalloc`, or directly request pages from the page
 allocator. The memory allocated by `vmalloc` and related functions is
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 56c9c7eed34e..0d4c26395785 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -493,6 +493,10 @@ static __always_inline void *kmalloc_large(size_t size, gfp_t flags)
  * kmalloc is the normal method of allocating memory
  * for objects smaller than page size in the kernel.
  *
+ * The allocated object address is aligned to at least ARCH_KMALLOC_MINALIGN
+ * bytes. For a power-of-two @size, the alignment is also guaranteed to
+ * be at least the size.
+ *
  * The @flags argument may be one of the GFP flags defined at
  * include/linux/gfp.h and described at
  * :ref:`Documentation/core-api/mm-api.rst `
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 929c02a90fba..b9ba93ad5c7f 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -993,10 +993,19 @@ void __init create_boot_cache(struct kmem_cache *s, const char *name,
 			      unsigned int useroffset, unsigned int usersize)
 {
 	int err;
+	unsigned int align = ARCH_KMALLOC_MINALIGN;
 
 	s->name = name;
 	s->size = s->object_size = size;
-	s->align = calculate_alignment(flags, ARCH_KMALLOC_MINALIGN, size);
+
+	/*
+	 * For power of two sizes, guarantee