Re: [PATCH] riscv: fix bugon.cocci warnings

2021-02-28 Thread Pekka Enberg
On Sun, Feb 28, 2021 at 12:10:22PM +0100, Julia Lawall wrote:
> From: kernel test robot 
> 
> Use BUG_ON instead of an if condition followed by BUG.
> 
> Generated by: scripts/coccinelle/misc/bugon.cocci
> 
> Fixes: c22b0bcb1dd0 ("riscv: Add kprobes supported")
> CC: Guo Ren 
> Reported-by: kernel test robot 
> Signed-off-by: kernel test robot 
> Signed-off-by: Julia Lawall 

Reviewed-by: Pekka Enberg 


Re: [PATCH] riscv: Fixup CONFIG_GENERIC_TIME_VSYSCALL

2021-01-03 Thread Pekka Enberg
On Sat, Jan 2, 2021 at 3:26 PM  wrote:
>
> From: Guo Ren 
>
> This patch fixes commit ad5d112 ("riscv: use vDSO common flow to
> reduce the latency of the time-related functions").
> 
> The GENERIC_TIME_VSYSCALL should be CONFIG_GENERIC_TIME_VSYSCALL,
> or vgettimeofday won't work.
>
> Signed-off-by: Guo Ren 
> Cc: Atish Patra 
> Cc: Palmer Dabbelt 
> Cc: Vincent Chen 

Reviewed-by: Pekka Enberg 


Re: [PATCH] riscv: mm: abort uaccess retries upon fatal signal

2020-12-30 Thread Pekka Enberg
On Wed, Dec 30, 2020 at 4:40 PM  wrote:
> From: Guo Ren 
>
> Pick up the patch from the 'Link' below, originally by Mark Rutland.
> This keeps us in line with x86, arm, arm64, arc, sh, and powerpc.
>
> Link: 
> https://lore.kernel.org/linux-arm-kernel/1499782763-31418-1-git-send-email-mark.rutl...@arm.com/
> Signed-off-by: Guo Ren 
> Cc: Mark Rutland 
> Cc: Pekka Enberg 
> Cc: Palmer Dabbelt 

Reviewed-by: Pekka Enberg 


Re: [PATCH v2 0/2] Let illegal access to user-space memory die

2020-12-02 Thread Pekka Enberg
Hi Eric,

On Thu, Dec 3, 2020 at 8:51 AM Eric Lin  wrote:
>
> Accesses to user-space memory without calling the uaccess routines
> lead to a hang in the page fault handler. Like arm64, we let such
> accesses die earlier in the page fault handler.
>
> Changes in v2:
> -Add a die_kernel_fault() helper
> -Split one long line code into two

Please also make no_context() use the new helper. Other than that:

Reviewed-by: Pekka Enberg 


Re: [PATCH] riscv/mm: Prevent kernel module access user-space memory without uaccess routines

2020-11-30 Thread Pekka Enberg
On Mon, Nov 30, 2020 at 7:33 AM Eric Lin  wrote:
>
> In the page fault handler, an access to user-space memory
> without get/put_user() or copy_from/to_user() routines is
> not resolved properly. Like arm and other architectures,
> we need to let it die earlier in the page fault handler.

Fix looks good to me. Can you elaborate on how you found the issue and
how the bug manifests itself?

>
> Signed-off-by: Eric Lin 
> Cc: Alan Kao 
> ---
>  arch/riscv/mm/fault.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/arch/riscv/mm/fault.c b/arch/riscv/mm/fault.c
> index 3c8b9e433c67..a452cfa266a2 100644
> --- a/arch/riscv/mm/fault.c
> +++ b/arch/riscv/mm/fault.c
> @@ -232,6 +232,9 @@ asmlinkage void do_page_fault(struct pt_regs *regs)
> if (user_mode(regs))
> flags |= FAULT_FLAG_USER;
>
> +   if (!user_mode(regs) && addr < TASK_SIZE && unlikely(!(regs->status & SR_SUM)))
> +   die(regs, "Accessing user space memory without uaccess routines\n");

Let's introduce a die_kernel_fault() helper (similar to arm64, for
example) to ensure same semantics for the different kernel faults. You
can extract the helper from no_context().
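
Something along these lines is what I have in mind (an untested sketch
modeled on arm64's helper; it assumes riscv's die() and the REG_FMT
format macro, and the exact message is of course up to you):

static void die_kernel_fault(const char *msg, unsigned long addr,
			     struct pt_regs *regs)
{
	bust_spinlocks(1);

	pr_alert("Unable to handle kernel %s at virtual address " REG_FMT "\n",
		 msg, addr);

	bust_spinlocks(0);
	die(regs, "Oops");
	do_exit(SIGKILL);
}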

> +
> perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, addr);
>
> if (cause == EXC_STORE_PAGE_FAULT)
> --
> 2.17.0


Re: [PATCH v3] riscv: fix pfn_to_virt err in do_page_fault().

2020-10-26 Thread Pekka Enberg
On Mon, Oct 26, 2020 at 08:26:54PM +0800, liush wrote:
> From: Liu Shaohua 
> 
> The argument to pfn_to_virt() should be the pfn, not the value of CSR_SATP.
> 
> Reviewed-by: Palmer Dabbelt 
> Reviewed-by: Anup Patel 
> Signed-off-by: liush 

Reviewed-by: Pekka Enberg 

> ---
>  arch/riscv/mm/fault.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/riscv/mm/fault.c b/arch/riscv/mm/fault.c
> index 1359e21..3c8b9e4 100644
> --- a/arch/riscv/mm/fault.c
> +++ b/arch/riscv/mm/fault.c
> @@ -86,6 +86,7 @@ static inline void vmalloc_fault(struct pt_regs *regs, int code, unsigned long a
>   pmd_t *pmd, *pmd_k;
>   pte_t *pte_k;
>   int index;
> + unsigned long pfn;
>  
>   /* User mode accesses just cause a SIGSEGV */
>   if (user_mode(regs))
> @@ -100,7 +101,8 @@ static inline void vmalloc_fault(struct pt_regs *regs, int code, unsigned long a
>* of a task switch.
>*/
>   index = pgd_index(addr);
> - pgd = (pgd_t *)pfn_to_virt(csr_read(CSR_SATP)) + index;
> + pfn = csr_read(CSR_SATP) & SATP_PPN;
> + pgd = (pgd_t *)pfn_to_virt(pfn) + index;
>   pgd_k = init_mm.pgd + index;
>  
>   if (!pgd_present(*pgd_k)) {
> -- 
> 2.7.4
> 


Re: [PATCH -next] riscv/mm/fault: fix old-style-declaration warning

2020-09-09 Thread Pekka Enberg
Hi,

On Wed, Sep 9, 2020 at 2:20 PM Wei Yongjun  wrote:
>
> gcc report build warning as follows:
>
> arch/riscv/mm/fault.c:81:1: warning:
>  'inline' is not at beginning of declaration [-Wold-style-declaration]
>81 | static void inline vmalloc_fault(struct pt_regs *regs, int code, unsigned long addr)
>   | ^~
>
> This commit fixes it by moving 'inline' after 'static'.
>
> Signed-off-by: Wei Yongjun 

Thanks for the fix, but Palmer already committed a similar fix from me to for-next:

https://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git/commit/?h=for-next&id=6f36a9635082e1d6910bc7853d0c9fd12d7890b5

- Pekka


Re: [PATCH v4 0/3] Get cache information from userland

2020-08-31 Thread Pekka Enberg
For the series:

Reviewed-by: Pekka Enberg 

- Pekka


Re: [PATCH v3 1/3] riscv: Set more data to cacheinfo

2020-08-31 Thread Pekka Enberg
Hi,

On Mon, Aug 31, 2020 at 9:15 AM Zong Li  wrote:
> If sets is one, it means that the cache is fully associative, so we
> don't need to fill in the number of ways; we just keep it as zero.
> Here we want to catch the fully associative case first and make the
> if expression fail at the beginning. We might also rewrite it as
> follows:

[snip]

Thanks for the explanation! The rewritten version is much easier to
read, at least to me.

- Pekka


Re: [PATCH v3 3/3] riscv: Add cache information in AUX vector

2020-08-30 Thread Pekka Enberg
On Fri, Aug 28, 2020 at 10:09 AM Zong Li  wrote:
> +uintptr_t get_cache_geometry(u32 level, enum cache_type type)
> +{
> +   struct cacheinfo *this_leaf = get_cacheinfo(level, type);
> +   uintptr_t ret = (this_leaf->ways_of_associativity << 16 |
> +this_leaf->coherency_line_size);

You are dereferencing "this_leaf" without checking if it's NULL here.

> +
> +   return this_leaf ? ret : 0;
> +}
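
Something like this (untested sketch, reusing the names from your
patch) would avoid the problem:

uintptr_t get_cache_geometry(u32 level, enum cache_type type)
{
	struct cacheinfo *this_leaf = get_cacheinfo(level, type);

	/* Bail out before touching the leaf if the lookup failed. */
	if (!this_leaf)
		return 0;

	return this_leaf->ways_of_associativity << 16 |
	       this_leaf->coherency_line_size;
}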

- Pekka


Re: [PATCH v3 1/3] riscv: Set more data to cacheinfo

2020-08-30 Thread Pekka Enberg
Hi,

On Fri, Aug 28, 2020 at 10:09 AM Zong Li  wrote:
>
> Set cacheinfo.{size,sets,line_size} for each cache node, then we can
> get this information from userland through the auxiliary vector.
>
> Signed-off-by: Zong Li 
> ---
>  arch/riscv/kernel/cacheinfo.c | 59 ++-
>  1 file changed, 44 insertions(+), 15 deletions(-)
>
> diff --git a/arch/riscv/kernel/cacheinfo.c b/arch/riscv/kernel/cacheinfo.c
> index bd0f122965c3..8b85abfbd77a 100644
> --- a/arch/riscv/kernel/cacheinfo.c
> +++ b/arch/riscv/kernel/cacheinfo.c
> @@ -25,12 +25,46 @@ cache_get_priv_group(struct cacheinfo *this_leaf)
> return NULL;
>  }
>
> -static void ci_leaf_init(struct cacheinfo *this_leaf,
> -struct device_node *node,
> -enum cache_type type, unsigned int level)
> +static void ci_leaf_init(struct cacheinfo *this_leaf, enum cache_type type,
> +unsigned int level, unsigned int size,
> +unsigned int sets, unsigned int line_size)
>  {
> this_leaf->level = level;
> this_leaf->type = type;
> +   this_leaf->size = size;
> +   this_leaf->number_of_sets = sets;
> +   this_leaf->coherency_line_size = line_size;
> +
> +   /*
> +* If the cache is fully associative, there is no need to
> +* check the other properties.
> +*/
> +   if (!(sets == 1) && (sets > 0 && size > 0 && line_size > 0))

Can you explain what this is attempting to do? AFAICT, the if
expression can be reduced to "sets > 1 && size > 0 && line_size > 0",
but what do you mean with the comment about fully associative caches?

> +   this_leaf->ways_of_associativity = (size / sets) / line_size;
> +}
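
For what it's worth, if the intent is "skip fully associative caches",
something like this sketch of the same logic would read better to me:

	/*
	 * A single set means the cache is fully associative: leave
	 * ways_of_associativity as zero and skip the calculation.
	 */
	if (sets > 1 && size > 0 && line_size > 0)
		this_leaf->ways_of_associativity = (size / sets) / line_size;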


Re: [PATCH v2 17/23] riscv: use asm-generic/mmu_context.h for no-op implementations

2020-08-26 Thread Pekka Enberg
On Wed, Aug 26, 2020 at 5:54 PM Nicholas Piggin  wrote:
>
> Cc: Paul Walmsley 
> Cc: Palmer Dabbelt 
> Cc: Albert Ou 
> Cc: linux-ri...@lists.infradead.org
> Acked-by: Palmer Dabbelt 
> Signed-off-by: Nicholas Piggin 

Reviewed-by: Pekka Enberg 


[RFC PATCH] x86/mm/fault: Inline page fault paths to reduce kernel text size

2020-08-25 Thread Pekka Enberg
From: Pekka Enberg 

The commit 92181f190b649f7ef2b79cbf5c00f26ccc66da2a ("x86: optimise
x86's do_page_fault (C entry point for the page fault path)") from 2009
shows significant stack savings when infrequent page fault handling
paths are moved out of line with the "noinline" annotation, with some increase
in kernel text size.

However, a decade of GCC improvements (and changes in the code) has
eliminated such wins in stack usage:

With "noinline":

0b30 :
 b30:   41 57   push   %r15
 b32:   41 56   push   %r14
 b34:   41 55   push   %r13
 b36:   49 89 d5mov%rdx,%r13
 b39:   41 54   push   %r12
 b3b:   49 89 fcmov%rdi,%r12
 b3e:   55  push   %rbp
 b3f:   48 89 f5mov%rsi,%rbp
 b42:   53  push   %rbx
 b43:   65 48 8b 04 25 00 00mov%gs:0x0,%rax
 b4a:   00 00
 b4c:   48 83 ec 18 sub$0x18,%rsp

With "inline":

08a0 :
 8a0:   41 57   push   %r15
 8a2:   41 56   push   %r14
 8a4:   41 55   push   %r13
 8a6:   49 89 d5mov%rdx,%r13
 8a9:   41 54   push   %r12
 8ab:   49 89 fcmov%rdi,%r12
 8ae:   55  push   %rbp
 8af:   48 89 f5mov%rsi,%rbp
 8b2:   53  push   %rbx
 8b3:   65 48 8b 04 25 00 00mov%gs:0x0,%rax
 8ba:   00 00
 8bc:   48 83 ec 18 sub$0x18,%rsp

So all we're left with is the increase in kernel text:

  add/remove: 1/5 grow/shrink: 2/1 up/down: 1049/-1144 (-95)
  Function                      old     new   delta
  spurious_kernel_fault.part      -     559    +559
  do_user_addr_fault            988    1446    +458
  do_kern_addr_fault            103     135     +32
  bad_area_nosemaphore           13       -     -13
  bad_area                       66       -     -66
  pgtable_bad                   107       -    -107
  mm_fault_error                199       -    -199
  bad_area_access_error         226       -    -226
  spurious_kernel_fault         553      20    -533
  Total: Before=7359, After=7264, chg -1.29%

Therefore, let's inline the page fault handling paths to reduce kernel
text size.

Cc: Dave Hansen 
Cc: Andy Lutomirski 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: "H. Peter Anvin" 
Signed-off-by: Pekka Enberg 
---
 arch/x86/mm/fault.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 35f1498e9832..b8893684d80c 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -533,7 +533,7 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long ad
dump_pagetable(address);
 }
 
-static noinline void
+static inline void
 pgtable_bad(struct pt_regs *regs, unsigned long error_code,
unsigned long address)
 {
@@ -577,7 +577,7 @@ static void set_signal_archinfo(unsigned long address,
tsk->thread.cr2 = address;
 }
 
-static noinline void
+static inline void
 no_context(struct pt_regs *regs, unsigned long error_code,
   unsigned long address, int signal, int si_code)
 {
@@ -788,7 +788,7 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
no_context(regs, error_code, address, SIGSEGV, si_code);
 }
 
-static noinline void
+static inline void
 bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
 unsigned long address)
 {
@@ -809,7 +809,7 @@ __bad_area(struct pt_regs *regs, unsigned long error_code,
__bad_area_nosemaphore(regs, error_code, address, pkey, si_code);
 }
 
-static noinline void
+static inline void
 bad_area(struct pt_regs *regs, unsigned long error_code, unsigned long address)
 {
__bad_area(regs, error_code, address, 0, SEGV_MAPERR);
@@ -832,7 +832,7 @@ static inline bool bad_area_access_from_pkeys(unsigned long error_code,
return false;
 }
 
-static noinline void
+static inline void
 bad_area_access_error(struct pt_regs *regs, unsigned long error_code,
  unsigned long address, struct vm_area_struct *vma)
 {
@@ -905,7 +905,7 @@ do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address,
force_sig_fault(SIGBUS, BUS_ADRERR, (void __user *)address);
 }
 
-static noinline void
+static inline void
 mm_fault_error(struct pt_regs *regs, unsigned long error_code,
	       unsigned long address, vm_fault_t fault)

Re: [PATCH v2 0/3] mm/slub: Fix count_partial() problem

2020-08-20 Thread Pekka Enberg
On Mon, Aug 10, 2020 at 3:18 PM Xunlei Pang  wrote:
>
> v1->v2:
> - Improved changelog and variable naming for PATCH 1~2.
> - PATCH3 adds per-cpu counter to avoid performance regression
>   in concurrent __slab_free().
>
> [Testing]
> On my 32-cpu 2-socket physical machine:
> Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
> perf stat --null --repeat 10 -- hackbench 20 thread 2
>
> == original, not patched
>   19.211637055 seconds time elapsed    ( +-  0.57% )
>
> == patched with patch1~2
>  Performance counter stats for 'hackbench 20 thread 2' (10 runs):
>
>   21.731833146 seconds time elapsed    ( +-  0.17% )
>
> == patched with patch1~3
>  Performance counter stats for 'hackbench 20 thread 2' (10 runs):
>
>   19.112106847 seconds time elapsed    ( +-  0.64% )
>
>
> Xunlei Pang (3):
>   mm/slub: Introduce two counters for partial objects
>   mm/slub: Get rid of count_partial()
>   mm/slub: Use percpu partial free counter
>
>  mm/slab.h |   2 +
>  mm/slub.c | 124 +++---
>  2 files changed, 89 insertions(+), 37 deletions(-)

We probably need to wrap the counters under CONFIG_SLUB_DEBUG because
AFAICT all the code that uses them is also wrapped under it.

An alternative approach for this patch would be to somehow make the
lock in count_partial() more granular, but I don't know how feasible
that actually is.

Anyway, I am OK with this approach:

Reviewed-by: Pekka Enberg 

You still need to convince Christoph, though, because he had
objections over this approach.

- Pekka


Re: [PATCH 1/2] mm/slub: Introduce two counters for the partial objects

2020-08-20 Thread Pekka Enberg
Hi Christopher,

On Tue, Aug 11, 2020 at 3:52 PM Christopher Lameter  wrote:
>
> On Fri, 7 Aug 2020, Pekka Enberg wrote:
>
> > Why do you consider this to be a fast path? This is all partial list
> > accounting when we allocate/deallocate a slab, no? Just like
> > ___slab_alloc() says, I assumed this to be the slow path... What am I
> > missing?
>
> I thought these were per object counters? If you just want to count the
> number of slabs then you do not need the lock at all. We already have a
> counter for the number of slabs.

The patch attempts to speed up count_partial(), which holds on to the
"n->list_lock" (with IRQs off) for the whole duration it takes to walk
the partial slab list:

spin_lock_irqsave(&n->list_lock, flags);
list_for_each_entry(page, &n->partial, slab_list)
x += get_count(page);
spin_unlock_irqrestore(&n->list_lock, flags);

It's counting the number of *objects*, but the counters are only
updated in bulk when we add/remove a slab to/from the partial list.
The counter updates are therefore *not* in the fast-path AFAICT.
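
With the patch, the walk above becomes a plain read, roughly like this
(a sketch only; count_partial_free() is a made-up name and I'm ignoring
the get_count() callback interface of the real count_partial()):

static unsigned long count_partial_free(struct kmem_cache_node *n)
{
	/*
	 * pfree_objects is maintained under n->list_lock as slabs
	 * enter and leave the partial list, so no iteration is
	 * needed here.
	 */
	return atomic_long_read(&n->pfree_objects);
}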

Xunlei, please correct me if I'm reading your patches wrong.

On Tue, Aug 11, 2020 at 3:52 PM Christopher Lameter  wrote:
> > No objections to alternative fixes, of course, but wrapping the
> > counters under CONFIG_DEBUG seems like just hiding the actual issue...
>
> CONFIG_DEBUG is on by default. It just compiles in the debug code and
> disables it so we can enable it with a kernel boot option. This is because
> we have had numerous issues in the past with "production" kernels that
> could not be recompiled with debug options. So just running the prod
> kernel with another option will allow you to find hard to debug issues in
a full scale production deployment with potentially proprietary modules
> etc.

Yeah, it's been too long since I last looked at the code and did not
realize even count_partial() is wrapped in CONFIG_DEBUG. So by all
means, let's also wrap the counters with that too.

- Pekka


Re: [PATCH] mm: sort freelist by rank number

2020-08-19 Thread Pekka Enberg
Hi KyongHo and David,

On 07.08.20 09:08, Pekka Enberg wrote:
> > > I think having more knowledge of DRAM controller details in the OS
> > > would be potentially beneficial for better page allocation policy, so
> > > maybe try come up with something more generic, even if the fallback to
> > > providing this information is a kernel command line option.

On Tue, Aug 18, 2020 at 12:02 PM Cho KyongHo  wrote:
> I looked for a way to deliver detailed DRAM information through ACPI,
> ATAG or something similar, but I didn't find one.

Perhaps that's still the case today then.

In the context of this patch, what I am hoping is that we'd turn the
"dram_rank_granule" parameter -- even if it's still a manually
configured thing -- into an abstraction that we can later extend. IOW,
something similar to what the "energy model"
(kernel/power/energy_model.c) today is.

On Mon, Aug 10, 2020 at 09:32:18AM +0200, David Hildenbrand wrote:
> I guess one universal approach is by measuring access times ... not what
> we might be looking for :)

I don't think it's an unreasonable approach, but measurement accuracy
and speed will determine if it's actually practical. There are
examples of this kind of approach elsewhere too. For example, MCTOP
[1], which aims to provide topology-aware OS abstractions, uses cache
latency measurements to infer the topology.

1. https://timharris.uk/papers/2017-eurosys-mctop.pdf

Regards,

- Pekka


Re: [PATCH -next] x86/mpparse: remove duplicate include

2020-08-19 Thread Pekka Enberg
On Wed, Aug 19, 2020 at 07:29:10PM +0800, Wang Hai wrote:
> Remove asm/io_apic.h which is included more than once
> 
> Reported-by: Hulk Robot 
> Signed-off-by: Wang Hai 

Reviewed-by: Pekka Enberg 


Re: [PATCH -next] mm: slab: Remove duplicate include

2020-08-18 Thread Pekka Enberg
On Tue, Aug 18, 2020 at 2:43 PM YueHaibing  wrote:
>
> Remove duplicate header which is included twice.
>
> Signed-off-by: YueHaibing 

Reviewed-by: Pekka Enberg 


Re: [PATCH] mm/slub: fix missing ALLOC_SLOWPATH stat when bulk alloc

2020-08-10 Thread Pekka Enberg
On Tue, Aug 11, 2020 at 5:25 AM  wrote:
>
> From: Abel Wu 
>
> The ALLOC_SLOWPATH statistic is missing in bulk allocation now.
> Fix it by updating the statistic in the allocation slow path.
>
> Signed-off-by: Abel Wu 

Reviewed-by: Pekka Enberg 


Re: Odd-sized kmem_cache_alloc and slub_debug=Z

2020-08-07 Thread Pekka Enberg
Hi Marco and Kees,

On Fri, Aug 07, 2020 at 08:06PM +0300, Pekka Enberg wrote:
> > Anything interesting in your .config? The fault does not reproduce
> > with 5.8.0 + x86-64 defconfig.

On Fri, Aug 7, 2020 at 8:18 PM Marco Elver  wrote:
> It's quite close to defconfig, just some extra options for my test
> environment. But none that I'd imagine change this behaviour -- but
> maybe I missed something. I've attached my config. Also, just in case,
> I'm on mainline from Tuesday: 2324d50d051ec0f14a548e78554fb02513d6dcef.

Yeah, it reproduces with defconfig too, as long as you remember to
pass "slub_debug=Z"... :-/

The following seems to be the culprit:

commit 3202fa62fb43087387c65bfa9c100feffac74aa6
Author: Kees Cook 
Date:   Wed Apr 1 21:04:27 2020 -0700

slub: relocate freelist pointer to middle of object

Reverting this commit and one of its follow-up fixes from Kees from
v5.8 makes the issue go away for me. Btw, please note that caches with
size 24 and larger do not trigger this bug, so the issue is that with
small enough object size, we're stomping on allocator metadata (I
assume part of the freelist).

- Pekka


Re: [PATCH 1/2] mm/slub: Introduce two counters for the partial objects

2020-08-07 Thread Pekka Enberg
Hi Christopher,

On Fri, 7 Aug 2020, Pekka Enberg wrote:
> > I think we can just default to the counters. After all, if I
> > understood correctly, we're talking about up to 100 ms time period
> > with IRQs disabled when count_partial() is called. As this is
> > triggerable from user space, that's a performance bug whatever way you
> > look at it.

On Fri, Aug 7, 2020 at 4:02 PM Christopher Lameter  wrote:
> Well yes under extreme conditions and this is only happening for sysfs
> counter retrieval.

You will likely get some stall even in less extreme conditions, and in
any case, the kernel should not allow user space to trigger such a
stall.

On Fri, Aug 7, 2020 at 4:02 PM Christopher Lameter  wrote:
> There could be other solutions to this. This solution here is penalizing
> every hotpath slab allocation for the sake of relatively infrequently
> used counter monitoring. There is the possibility of not traversing the
> list and simply estimating the value based on the number of slab pages
> allocated on that node.

Why do you consider this to be a fast path? This is all partial list
accounting when we allocate/deallocate a slab, no? Just like
___slab_alloc() says, I assumed this to be the slow path... What am I
missing?

No objections to alternative fixes, of course, but wrapping the
counters under CONFIG_DEBUG seems like just hiding the actual issue...

- Pekka


Re: Odd-sized kmem_cache_alloc and slub_debug=Z

2020-08-07 Thread Pekka Enberg
Hi Marco,

On Fri, Aug 7, 2020 at 7:07 PM Marco Elver  wrote:
> I found that the below debug-code using kmem_cache_alloc(), when using
> slub_debug=Z, results in the following crash:
>
> general protection fault, probably for non-canonical address 
> 0xcca41caea170:  [#1] PREEMPT SMP PTI
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.8.0+ #1
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 
> 04/01/2014
> RIP: 0010:freelist_dereference mm/slub.c:272 [inline]
> RIP: 0010:get_freepointer mm/slub.c:278 [inline]
> RIP: 0010:deactivate_slab+0x54/0x460 mm/slub.c:2111
> Code: 8b bc c7 e0 00 00 00 48 85 d2 0f 84 00 01 00 00 49 89 d5 31 c0 48 89 44 24 08 66 66 2e 0f 1f 84 00 00 00 00 00 90 44 8b 43 20 <4b> 8b 44 05 00 48 85 c0 0f 84 1e 01 00 00 4c 89 ed 49 89 c5 8b 43
> RSP: :a7e03e18 EFLAGS: 00010046
> RAX:  RBX: a3a41c972340 RCX: 
> RDX: cca41caea160 RSI: e7c6a072ba80 RDI: a3a41c972340
> RBP: a3a41caea008 R08: 0010 R09: a3a41caea01d
> R10: a7f8dc50 R11: a68f44c0 R12: a3a41c972340
> R13: cca41caea160 R14: e7c6a072ba80 R15: a3a41c96d540
> FS:  () GS:a3a41fc0() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: a3a051c01000 CR3: 00045140a001 CR4: 00770ef0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> PKRU: 
> Call Trace:
>  ___slab_alloc+0x336/0x340 mm/slub.c:2690
>  __slab_alloc mm/slub.c:2714 [inline]
>  slab_alloc_node mm/slub.c:2788 [inline]
>  slab_alloc mm/slub.c:2832 [inline]
>  kmem_cache_alloc+0x135/0x200 mm/slub.c:2837
>  start_kernel+0x3d6/0x44e init/main.c:1049
>  secondary_startup_64+0xb6/0xc0 arch/x86/kernel/head_64.S:243
>
> Any ideas what might be wrong?
>
> This does not crash when redzones are not enabled.
>
> Thanks,
> -- Marco
>
> -- >8 --
>
> diff --git a/init/main.c b/init/main.c
> index 15bd0efff3df..f4aa5bb3f2ec 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -1041,6 +1041,16 @@ asmlinkage __visible void __init start_kernel(void)
> sfi_init_late();
> kcsan_init();
>
> +   /* DEBUG CODE */
> +   {
> +   struct kmem_cache *c = kmem_cache_create("test", 21, 1, 0, NULL);
> +   char *buf;
> +   BUG_ON(!c);
> +   buf = kmem_cache_alloc(c, GFP_KERNEL);
> +   kmem_cache_free(c, buf);
> +   kmem_cache_destroy(c);
> +   }
> +
> /* Do the rest non-__init'ed, we're now alive */
> arch_call_rest_init();

Anything interesting in your .config? The fault does not reproduce
with 5.8.0 + x86-64 defconfig.

- Pekka


Re: [PATCH 1/2] mm/slub: Introduce two counters for the partial objects

2020-08-07 Thread Pekka Enberg
On Thu, Aug 6, 2020 at 3:42 PM Vlastimil Babka  wrote:
>
> On 7/2/20 10:32 AM, Xunlei Pang wrote:
> > The node list_lock in count_partial() spends a long time iterating
> > in the case of large partial page lists, which can cause a
> > thundering herd effect on the list_lock contention, e.g. it causes
> > business response-time jitters when accessing "/proc/slabinfo"
> > in our production environments.
> >
> > This patch introduces two counters to maintain the actual number
> > of partial objects dynamically instead of iterating the partial
> > page lists with list_lock held.
> >
> > New counters of kmem_cache_node are: pfree_objects, ptotal_objects.
> > The main operations are under list_lock in slow path, its performance
> > impact is minimal.
> >
> > Co-developed-by: Wen Yang 
> > Signed-off-by: Xunlei Pang 
>
> This or similar things seem to be reported every few months now, last time was
> here [1] AFAIK. The solution was to just stop counting at some point.
>
> Shall we perhaps add these counters under CONFIG_SLUB_DEBUG then and be done
> with it? If anyone needs the extreme performance and builds without
> CONFIG_SLUB_DEBUG, I'd assume they also don't have userspace programs reading
> /proc/slabinfo periodically anyway?

I think we can just default to the counters. After all, if I
understood correctly, we're talking about up to 100 ms time period
with IRQs disabled when count_partial() is called. As this is
triggerable from user space, that's a performance bug whatever way you
look at it.

Whoever needs to eliminate these counters from fast-path, can wrap
them in a CONFIG_MAKE_SLABINFO_EXTREMELY_SLOW option.

So for this patch, with updated information about the severity of the
problem, and the hackbench numbers:

Acked-by: Pekka Enberg 

Christoph, others, any objections?

- Pekka


Re: [PATCH] mm: sort freelist by rank number

2020-08-07 Thread Pekka Enberg
Hi Cho and David,

On Mon, Aug 3, 2020 at 10:57 AM David Hildenbrand  wrote:
>
> On 03.08.20 08:10, pullip@samsung.com wrote:
> > From: Cho KyongHo 
> >
> > LPDDR5 introduces a rank switch delay. If three successive DRAM accesses
> > happen and the first and second ones access one rank while the last
> > access happens on the other rank, the latency of the last access will
> > be longer than the second one.
> > To address this penalty, we can sort the freelist so that a specific
> > rank is allocated prior to another rank. We expect the page allocator
> > can allocate the pages from the same rank successively with this
> > change. It will hopefully improve the proportion of consecutive
> > memory accesses to the same rank.
>
> This certainly needs performance numbers to justify ... and I am sorry,
> "hopefully improves" is not a valid justification :)
>
> I can imagine that this works well initially, when there hasn't been a
> lot of memory fragmentation going on. But quickly after your system is
> under stress, I doubt this will be very useful. Prove me wrong. ;)
>
> ... I dislike this manual setting of "dram_rank_granule". Yet another mm
> feature that can only be enabled by a magic command line parameter where
> users have to guess the right values.
>
> (side note, there have been similar research approaches to improve
> energy consumption by switching off ranks when not needed).

I was thinking of the exact same thing. PALLOC [1] comes to mind, but
perhaps there are more recent ones?

I also dislike the manual knob, but is there a way for the OS to
detect this by itself? My (perhaps outdated) understanding was that
the DRAM address mapping scheme, for example, is not exposed to the
OS.

I think having more knowledge of DRAM controller details in the OS
would be potentially beneficial for better page allocation policy, so
maybe try come up with something more generic, even if the fallback to
providing this information is a kernel command line option.

[1] http://cs-people.bu.edu/rmancuso/files/papers/palloc-rtas2014.pdf

- Pekka


Re: [PATCH v2 1/2] riscv: Fix build warning for mm/init

2020-07-16 Thread Pekka Enberg
On Thu, Jul 16, 2020 at 4:06 PM Zong Li  wrote:
>
> Add the static keyword to resource_init(); this function is only used
> in this object file.
>
> The warning message as follow (with W=1 build):
>
> arch/riscv/mm/init.c:520:13:
> warning: no previous prototype for 'resource_init' [-Wmissing-prototypes]
>
> Signed-off-by: Zong Li 

Reviewed-by: Pekka Enberg 

- Pekka


Re: [PATCH 2/2] riscv: fix build warning of mm/pageattr

2020-07-16 Thread Pekka Enberg
On Thu, Jul 16, 2020 at 10:11 AM Pekka Enberg  wrote:
>
> On Thu, Jul 16, 2020 at 9:16 AM Zong Li  wrote:
> >
> > Add header for missing prototype. Also, the static keyword should be
> > at the beginning of the declaration.
> >
> > Signed-off-by: Zong Li 
>
> Which prototype is missing?

Aah, never mind, you mention them in the cover letter. I think patch
description would be a better place to ensure they end up in git logs.

For both patches:

Reviewed-by: Pekka Enberg 

- Pekka


Re: [PATCH 2/2] riscv: fix build warning of mm/pageattr

2020-07-16 Thread Pekka Enberg
On Thu, Jul 16, 2020 at 9:16 AM Zong Li  wrote:
>
> Add header for missing prototype. Also, the static keyword should be
> at the beginning of the declaration.
>
> Signed-off-by: Zong Li 

Which prototype is missing?

- Pekka


Re: [PATCH 17/26] mm/riscv: Use general page fault accounting

2020-07-14 Thread Pekka Enberg
Hi Palmer,

On Sat, Jul 11, 2020 at 10:43 PM Palmer Dabbelt  wrote:
> This still slightly changes the accounting numbers, but I don't think it does
> so in a way that's meaningful enough to care about.  SIGBUS is the only one
> that might happen frequently enough to notice, I doubt anyone cares about
> whether faults are accounted for during OOM.

What do you mean?

AFAICT, this patch _fixes_ the accounting to be correct on page fault
restarts. I sent a patch to fix this up some time ago, but of course
this generic solution is better:

http://lists.infradead.org/pipermail/linux-riscv/2020-June/000775.html

- Pekka


Re: [PATCH 1/2] mm/slub: Introduce two counters for the partial objects

2020-07-07 Thread Pekka Enberg
Hi!

(Sorry for the delay, I missed your response.)

On Fri, Jul 3, 2020 at 12:38 PM xunlei  wrote:
>
> On 2020/7/2 PM 7:59, Pekka Enberg wrote:
> > On Thu, Jul 2, 2020 at 11:32 AM Xunlei Pang  
> > wrote:
> >> The node list_lock in count_partial() spend long time iterating
> >> in case of large amount of partial page lists, which can cause
> >> thunder herd effect to the list_lock contention, e.g. it cause
> >> business response-time jitters when accessing "/proc/slabinfo"
> >> in our production environments.
> >
> > Would you have any numbers to share to quantify this jitter? I have no
>
> We have HSF RT (High-speed Service Framework Response-Time) monitors; the
> RT figures fluctuated randomly, then we deployed a tool detecting "irq
> off" and "preempt off" to dump the culprit's calltrace, capturing
> list_lock costs of up to 100ms with irqs off triggered by "ss"; this
> also caused network timeouts.

Thanks for the follow up. This sounds like a good enough motivation
for this patch, but please include it in the changelog.

> > objections to this approach, but I think the original design
> > deliberately made reading "/proc/slabinfo" more expensive to avoid
> > atomic operations in the allocation/deallocation paths. It would be
> > good to understand what is the gain of this approach before we switch
> > to it. Maybe even run some slab-related benchmark (not sure if there's
> > something better than hackbench these days) to see if the overhead of
> > this approach shows up.
>
> I thought that before, but most atomic operations are serialized by the
> list_lock. Another possible way is to hold list_lock in __slab_free(),
> then these two counters can be changed from atomic to long.
>
> I also have no idea what's the standard SLUB benchmark for the
> regression test, any specific suggestion?

I don't know what people use these days. When I did benchmarking in
the past, hackbench and netperf were known to be slab-allocation
intensive macro-benchmarks. Christoph also had some SLUB
micro-benchmarks, but I don't think we ever merged them into the tree.

- Pekka


Re: [PATCH v2] mm/page_alloc: skip setting nodemask when we are in interrupt

2020-07-06 Thread Pekka Enberg
On Mon, Jul 6, 2020 at 5:59 AM Muchun Song  wrote:
> When we are in the interrupt context, the allocation is unrelated to
> the current task context. If we use the current task's mems_allowed,
> we can fail to alloc pages in the fast path and fall back to slow
> path memory allocation when the current node (i.e., the nodes in the
> current task's mems_allowed) does not have enough memory to allocate.
> In this case, it slows down the memory allocation speed of interrupt
> context. So we can skip setting the nodemask to allow any node
> to allocate memory, so that fast path allocation can succeed.
>
> Signed-off-by: Muchun Song 

Reviewed-by: Pekka Enberg 


Re: [PATCH V1 0/5] riscv: Add k/uprobe supported

2020-07-04 Thread Pekka Enberg
Hi Guo,

On Sat, Jul 4, 2020 at 6:34 AM  wrote:
> > > There is no single step exception in riscv ISA, so utilize ebreak to
> > > simulate. Some pc related instructions couldn't be executed out of line
> > > and some system/fence instructions couldn't be a trace site at all.
> > > So we give out a reject list and simulate list in decode-insn.c.

On Sat, Jul 4, 2020 at 2:40 PM Pekka Enberg  wrote:
> > Can you elaborate on what you mean by this? Why would you need a
> > single-step facility for kprobes? Is it for executing the instruction
> > that was replaced with a probe breakpoint?

On Sat, Jul 4, 2020 at 5:55 PM Guo Ren  wrote:
> It's the single-step exception, not single-step facility!

Aah, right, I didn't read the specification carefully enough. Thanks
for taking the time to clarify this!

FWIW, for the whole series:

Reviewed-by: Pekka Enberg 

- Pekka


Re: [PATCH V1 0/5] riscv: Add k/uprobe supported

2020-07-03 Thread Pekka Enberg
On Sat, Jul 4, 2020 at 6:34 AM  wrote:
> The patchset includes kprobe/uprobe support and some related fixups.

Nice!

On Sat, Jul 4, 2020 at 6:34 AM  wrote:
> There is no single step exception in riscv ISA, so utilize ebreak to
> simulate. Some pc related instructions couldn't be executed out of line
> and some system/fence instructions couldn't be a trace site at all.
> So we give out a reject list and simulate list in decode-insn.c.

Can you elaborate on what you mean by this? Why would you need a
single-step facility for kprobes? Is it for executing the instruction
that was replaced with a probe breakpoint?

Also, the "Debug Specification" [1] specifies a single-step facility
for RISC-V -- why is that not useful for implementing kprobes?

1. https://riscv.org/specifications/debug-specification/

- Pekka


Re: [PATCH RESEND] mm/page_alloc: skip setting nodemask when we are in interrupt

2020-07-03 Thread Pekka Enberg
On 03.07.20 08:34, Pekka Enberg wrote:
> > if (cpusets_enabled()) {
> > *alloc_mask |= __GFP_HARDWALL;
> > if (!in_interrupt() && !ac->nodemask)
> > ac->nodemask = &cpuset_current_mems_allowed;
> > else
> > *alloc_flags |= ALLOC_CPUSET;
> > }

On Fri, Jul 3, 2020 at 10:20 AM David Hildenbrand  wrote:
> ^ looks much cleaner as well. Do we want to add a summarizing comment?

I see no harm in adding one. I'm sure the next person starting a
journey in the maze some call the page allocator will appreciate it.

- Pekka


Re: [PATCH RESEND] mm/page_alloc: skip setting nodemask when we are in interrupt

2020-07-02 Thread Pekka Enberg
On Fri, Jul 3, 2020 at 9:14 AM Muchun Song  wrote:
>
> When we are in the interrupt context, the allocation is unrelated to
> the current task context. If we use the current task's mems_allowed,
> we can fail to alloc pages in the fast path and fall back to slow
> path memory allocation when the current node (i.e., the nodes in the
> current task's mems_allowed) does not have enough memory to allocate.
> In this case, it slows down the memory allocation speed of interrupt
> context. So we can skip setting the nodemask to allow any node
> to allocate memory, so that fast path allocation can succeed.
>
> Signed-off-by: Muchun Song 
> ---
>  mm/page_alloc.c | 8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index b48336e20bdcd..a6c36cd557d1d 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4726,10 +4726,12 @@ static inline bool prepare_alloc_pages(gfp_t gfp_mask, unsigned int order,
>
> if (cpusets_enabled()) {
> *alloc_mask |= __GFP_HARDWALL;
> -   if (!ac->nodemask)
> -   ac->nodemask = &cpuset_current_mems_allowed;
> -   else
> +   if (!ac->nodemask) {
> +   if (!in_interrupt())
> +   ac->nodemask = &cpuset_current_mems_allowed;

If !ac->nodemask and in_interrupt(), the ALLOC_CPUSET flag is not set,
which bypasses the __cpuset_zone_allowed() check for allocations.
This works fine because, in the in_interrupt() case, the function
allows allocation on any zone/node.

> +   } else {
> *alloc_flags |= ALLOC_CPUSET;
> +   }
> }

However, if you write the condition as follows:

if (cpusets_enabled()) {
*alloc_mask |= __GFP_HARDWALL;
if (!in_interrupt() && !ac->nodemask)
ac->nodemask = &cpuset_current_mems_allowed;
else
*alloc_flags |= ALLOC_CPUSET;
}

then the code is future-proof in case of __cpuset_zone_allowed() is
one day extended to support IRQ context too (it probably should
eventually respect IRQ SMP affinity).

>
> fs_reclaim_acquire(gfp_mask);
> --
> 2.11.0
>
>


Re: [PATCH 1/2] mm/slub: Introduce two counters for the partial objects

2020-07-02 Thread Pekka Enberg
On Thu, Jul 2, 2020 at 11:32 AM Xunlei Pang  wrote:
> The node list_lock in count_partial() spends a long time iterating
> in the case of large partial page lists, which can cause a
> thundering herd effect on the list_lock contention, e.g. it causes
> business response-time jitters when accessing "/proc/slabinfo"
> in our production environments.

Would you have any numbers to share to quantify this jitter? I have no
objections to this approach, but I think the original design
deliberately made reading "/proc/slabinfo" more expensive to avoid
atomic operations in the allocation/deallocation paths. It would be
good to understand what is the gain of this approach before we switch
to it. Maybe even run some slab-related benchmark (not sure if there's
something better than hackbench these days) to see if the overhead of
this approach shows up.

> This patch introduces two counters to maintain the actual number
> of partial objects dynamically instead of iterating the partial
> page lists with list_lock held.
>
> New counters of kmem_cache_node are: pfree_objects, ptotal_objects.
> The main operations are under list_lock in slow path, its performance
> impact is minimal.
>
> Co-developed-by: Wen Yang 
> Signed-off-by: Xunlei Pang 
> ---
>  mm/slab.h |  2 ++
>  mm/slub.c | 38 +-
>  2 files changed, 39 insertions(+), 1 deletion(-)
>
> diff --git a/mm/slab.h b/mm/slab.h
> index 7e94700..5935749 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -616,6 +616,8 @@ struct kmem_cache_node {
>  #ifdef CONFIG_SLUB
> unsigned long nr_partial;
> struct list_head partial;
> +   atomic_long_t pfree_objects; /* partial free objects */
> +   atomic_long_t ptotal_objects; /* partial total objects */

You could rename these to "nr_partial_free_objs" and
"nr_partial_total_objs" for readability.

- Pekka


Re: [PATCH v3] mm, slab: Check GFP_SLAB_BUG_MASK before alloc_pages in kmalloc_order

2020-07-01 Thread Pekka Enberg
On Wed, Jul 1, 2020 at 6:19 PM Long Li  wrote:
>
> kmalloc cannot allocate memory from HIGHMEM.  Allocating large amounts
> of memory currently bypasses the check and will simply leak the memory
> when page_address() returns NULL.  To fix this, factor the
> GFP_SLAB_BUG_MASK check out of slab & slub, and call it from
> kmalloc_order() as well. In order to make the code clear, the warning
> message is put in one place.
>
> Signed-off-by: Long Li 

Reviewed-by: Pekka Enberg 


Re: [PATCH 17/26] mm/riscv: Use general page fault accounting

2020-07-01 Thread Pekka Enberg
Hi Peter,

On Sat, Jun 27, 2020 at 1:36 AM Peter Xu  wrote:
> Use the general page fault accounting by passing regs into handle_mm_fault().
> It naturally solve the issue of multiple page fault accounting when page fault
> retry happened.

I sent a patch to fix up riscv page fault accounting some days ago:

http://lists.infradead.org/pipermail/linux-riscv/2020-June/000775.html

However, your fix is obviously even better. For the generic and riscv parts:

Reviewed-by: Pekka Enberg 

- Pekka


Re: [PATCH 0/8] mm: cleanup usage of <asm/pgalloc.h>

2020-06-29 Thread Pekka Enberg
On Sat, Jun 27, 2020 at 5:35 PM Mike Rapoport  wrote:
> Most architectures have very similar versions of pXd_alloc_one() and
> pXd_free_one() for intermediate levels of page table.
> These patches add generic versions of these functions in
>  and enable use of the generic functions where
> appropriate.

Very nice cleanup series to the page table code!

FWIW:

Reviewed-by: Pekka Enberg 


Re: [PATCH] mm: Fix kobject memleak in SLUB

2019-04-27 Thread Pekka Enberg

On 28/04/2019 2.40, Tobin C. Harding wrote:

Currently an error return from kobject_init_and_add() is not followed by
a call to kobject_put().  This means there is a memory leak.

Add a call to kobject_put() in the error path of kobject_init_and_add().

Signed-off-by: Tobin C. Harding 


Reviewed-by: Pekka Enberg 


Re: [PATCH 0/1] mm: Remove the SLAB allocator

2019-04-11 Thread Pekka Enberg

Hi,

On 4/11/19 10:55 AM, Michal Hocko wrote:

Please please have it more rigorous then what happened when SLUB was
forced to become a default


This is the hard part.

Even if you are able to show that SLUB is as fast as SLAB for all the 
benchmarks you run, there's bound to be that one workload where SLUB 
regresses. You will then have people complaining about that (rightly so) 
and you're again stuck with two allocators.


To move forward, I think we should look at possible *pathological* cases 
where we think SLAB might have an advantage. For example, SLUB had much 
more difficulties with remote CPU frees than SLAB. Now I don't know if 
this is the case, but it should be easy to construct a synthetic 
benchmark to measure this.


For example, have a userspace process that does networking, which is 
often memory allocation intensive, so that we know that SKBs traverse 
between CPUs. You can do this by making sure that the NIC queues are 
mapped to CPU N (so that network softirqs have to run on that CPU) but 
the process is pinned to CPU M.


It's, of course, worth thinking about other pathological cases too. 
Workloads that cause large allocations is one. Workloads that cause lots 
of slab cache shrinking is another.


- Pekka


Re: [PATCH v4] kmemleak: survive in a low-memory situation

2019-03-28 Thread Pekka Enberg

Hi Catalin,

On 27/03/2019 2.59, Qian Cai wrote:

Unless there is a brave soul to reimplement kmemleak to embed its
metadata into the tracked memory itself in the foreseeable future, this
provides a good balance between enabling kmemleak in a low-memory
situation and not introducing too much hackiness into the existing
code for now.


On Thu, Mar 28, 2019 at 08:05:31AM +0200, Pekka Enberg wrote:

Unfortunately I am not that brave soul, but I'm wondering what the
complication here is? It shouldn't be too hard to teach calculate_sizes() in
SLUB about a new SLAB_KMEMLEAK flag that reserves spaces for the metadata.


On 28/03/2019 12.30, Catalin Marinas wrote:

I don't think it's the calculate_sizes() that's the hard part. The way
kmemleak is designed assumes that the metadata has a longer lifespan
than the slab object it is tracking (and refcounted via
get_object/put_object()). We'd have to replace some of the
rcu_read_(un)lock() regions with a full kmemleak_lock together with a
few more tweaks to allow the release of kmemleak_lock during memory
scanning (which can take minutes; so it needs to be safe w.r.t. metadata
freeing, currently relying on a deferred RCU freeing).


Right.

I think SLUB already supports delaying object freeing because of KASAN
(see the slab_free_freelist_hook() function), so the issue with metadata
outliving the object is solvable (although it will consume more memory).


I can't say I remember enough details from kmemleak to comment on the 
locking complications you point out, though.


- Pekka


Re: [PATCH v4] kmemleak: survive in a low-memory situation

2019-03-27 Thread Pekka Enberg

Hi,

On 27/03/2019 2.59, Qian Cai wrote:

Unless there is a brave soul to reimplement kmemleak to embed its
metadata into the tracked memory itself in the foreseeable future, this
provides a good balance between enabling kmemleak in a low-memory
situation and not introducing too much hackiness into the existing
code for now.


Unfortunately I am not that brave soul, but I'm wondering what the 
complication here is? It shouldn't be too hard to teach 
calculate_sizes() in SLUB about a new SLAB_KMEMLEAK flag that reserves 
spaces for the metadata.


- Pekka


Re: [PATCH] slub: untag object before slab end

2019-02-12 Thread Pekka Enberg

On 13/02/2019 4.05, Qian Cai wrote:

get_freepointer() could return NULL if there are no more free objects in
the slab. However, it could return a tagged pointer (like
0x2200) with KASAN_SW_TAGS which would escape the NULL
object checking in check_valid_pointer() and trigger errors below, so
untag the object before checking for a NULL object there.


Reviewed-by: Pekka Enberg 


Re: [PATCH] mm/slab: Increase width of first /proc/slabinfo column

2019-02-01 Thread Pekka Enberg

Hi,

On 01/02/2019 4.34, Christopher Lameter wrote:

On Fri, 1 Feb 2019, Tobin C. Harding wrote:


Currently when displaying /proc/slabinfo if any cache names are too long
then the output columns are not aligned.  We could do something fancy to
get the maximum length of any cache name in the system or we could just
increase the hardcoded width.  Currently it is 17 characters.  Monitors
are wide these days so let's just increase it to 30 characters.


Hmm.. I wonder if there are any tools that depend on the field width here?



It's possible, but it's more likely that userspace parses by whitespace 
because it's easier to write it that way.


At least procps, which is used by slabtop, is prepared to parse a cache 
name of 128 characters. See the scanf() call in parse_slabinfo20() 
function in proc/slab.c of procps:


  http://procps.sourceforge.net/

Of course, testing with slabtop would make sense.

- Pekka


Re: [PATCH 0/3] slub: Do trivial comments fixes

2019-01-30 Thread Pekka Enberg

On 31/01/2019 6.10, Tobin C. Harding wrote:

From: "Tobin C. Harding" 

Hi Christopher,

Here is a trivial patchset to wet my toes. This is my first patchset to
mm; if there are some mm-specific nuances in relation to when in the dev
cycle (if ever) minor (*cough* trivial) patchsets are acceptable,
please say so.

This patchset fixes comments strings in the SLUB subsystem.

As per discussion at LCA I am working on getting my head around the SLUB
allocator.  If you specifically do *not* want me to do minor clean up
while I'm reading please say so, I will not be offended.


For the series:

Reviewed-by: Pekka Enberg 


Re: [PATCH] mm: Prevent mapping slab pages to userspace

2019-01-30 Thread Pekka Enberg

On 25/01/2019 19.38, Matthew Wilcox wrote:

It's never appropriate to map a page allocated by SLAB into userspace.
A buggy device driver might try this, or an attacker might be able to
find a way to make it happen.

Signed-off-by: Matthew Wilcox 


Acked-by: Pekka Enberg 

A WARN_ON_ONCE() would be nice here to let those buggy drivers know that 
they will no longer work.
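
For example (sketch; WARN_ON_ONCE() evaluates to its condition, so this
keeps the bail-out behavior while complaining once):

	if (PageAnon(page) || WARN_ON_ONCE(PageSlab(page)))
		goto out;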



---
  mm/memory.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index e11ca9dd823f..ce8c90b752be 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1451,7 +1451,7 @@ static int insert_page(struct vm_area_struct *vma, 
unsigned long addr,
spinlock_t *ptl;
  
  	retval = -EINVAL;

-   if (PageAnon(page))
+   if (PageAnon(page) || PageSlab(page))
goto out;
retval = -ENOMEM;
flush_dcache_page(page);



Re: [PATCH] mm/slub.c: freelist is ensured to be NULL when new_slab() fails

2018-12-29 Thread Pekka Enberg

On 29/12/2018 8.25, Peng Wang wrote:

new_slab_objects() will return immediately if freelist is not NULL.

  if (freelist)
  return freelist;

One more assignment operation could be avoided.

Signed-off-by: Peng Wang 


Reviewed-by: Pekka Enberg 


---
  mm/slub.c | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 36c0befeebd8..cf2ef4ababff 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2463,8 +2463,7 @@ static inline void *new_slab_objects(struct kmem_cache *s, gfp_t flags,
stat(s, ALLOC_SLAB);
c->page = page;
*pc = c;
-   } else
-   freelist = NULL;
+   }
  
  	return freelist;

  }



Re: [PATCH] mm, slab: reschedule cache_reap() on the same CPU

2018-04-11 Thread Pekka Enberg

On 11/04/2018 10.00, Vlastimil Babka wrote:

cache_reap() is initially scheduled in start_cpu_timer() via
schedule_delayed_work_on(). But then the next iterations are scheduled via
schedule_delayed_work(), i.e. using WORK_CPU_UNBOUND.

Thus since commit ef557180447f ("workqueue: schedule WORK_CPU_UNBOUND work on
wq_unbound_cpumask CPUs") there is no guarantee the future iterations will run
on the originally intended cpu, although it's still preferred. I was able to
demonstrate this with /sys/module/workqueue/parameters/debug_force_rr_cpu.
IIUC, it may also happen due to migrating timers in nohz context. As a result,
some cpu's would be calling cache_reap() more frequently and others never.

This patch uses schedule_delayed_work_on() with the current cpu when scheduling
the next iteration.

Signed-off-by: Vlastimil Babka 
Fixes: ef557180447f ("workqueue: schedule WORK_CPU_UNBOUND work on wq_unbound_cpumask CPUs")
CC: 
Cc: Joonsoo Kim 
Cc: David Rientjes 
Cc: Pekka Enberg 
Cc: Christoph Lameter 
Cc: Tejun Heo 
Cc: Lai Jiangshan 
Cc: John Stultz 
Cc: Thomas Gleixner 
Cc: Stephen Boyd 


Acked-by: Pekka Enberg 


---
  mm/slab.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/slab.c b/mm/slab.c
index 9095c3945425..a76006aae857 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -4074,7 +4074,8 @@ static void cache_reap(struct work_struct *w)
next_reap_node();
  out:
/* Set up the next iteration */
-   schedule_delayed_work(work, round_jiffies_relative(REAPTIMEOUT_AC));
+   schedule_delayed_work_on(smp_processor_id(), work,
+   round_jiffies_relative(REAPTIMEOUT_AC));
  }
  
  void get_slabinfo(struct kmem_cache *cachep, struct slabinfo *sinfo)




Re: [PATCH 0/4] zswap: Optimize compressed pool memory utilization

2016-08-17 Thread Pekka Enberg
On Wed, Aug 17, 2016 at 1:03 PM, Srividya Desireddy wrote:
> This series of patches optimize the memory utilized by zswap for storing
> the swapped out pages.
>
> Zswap is a cache which compresses the pages that are being swapped out
> and stores them into a dynamically allocated RAM-based memory pool.
> Experiments have shown that around 10-15% of pages stored in zswap are
> duplicates which results in 10-12% more RAM required to store these
> duplicate compressed pages. Around 10-20% of pages stored in zswap
> are zero-filled pages, but these pages are handled as normal pages by
> compressing and allocating memory in the pool.
>
> The following patch-set optimizes memory utilized by zswap by avoiding the
> storage of duplicate pages and zero-filled pages in zswap compressed memory
> pool.
>
> Patch 1/4: zswap: Share zpool memory of duplicate pages
> This patch shares compressed pool memory of the duplicate pages. When a new
> page is requested for swap-out to zswap; search for an identical page in
> the pages already stored in zswap. If an identical page is found then share
> the compressed page data of the identical page with the new page. This
> avoids allocation of memory in the compressed pool for a duplicate page.
> This feature is tested on devices with 1GB, 2GB and 3GB RAM by executing
> performance test at low memory conditions. Around 15-20% of the pages
> swapped are duplicate of the pages existing in zswap, resulting in 15%
> saving of zswap memory pool when compared to the baseline version.
>
> Test Parameters          Baseline     With patch   Improvement
> Total RAM                955MB        955MB
> Available RAM            254MB        269MB        15MB
> Avg. App entry time      2.469sec     2.207sec     7%
> Avg. App close time      1.151sec     1.085sec     6%
> Apps launched in 1sec    5            12           7
>
> There is little overhead in zswap store function due to the search
> operation for finding duplicate pages. However, if duplicate page is
> found it saves the compression and allocation time of the page. The average
> overhead per zswap_frontswap_store() function call in the experimental
> device is 9us. There is no overhead in case of zswap_frontswap_load()
> operation.
>
> Patch 2/4: zswap: Enable/disable sharing of duplicate pages at runtime
> This patch adds a module parameter to enable or disable the sharing of
> duplicate zswap pages at runtime.
>
> Patch 3/4: zswap: Zero-filled pages handling
> This patch checks if a page to be stored in zswap is a zero-filled page
> (i.e. contents of the page are all zeros). If such page is found,
> compression and allocation of memory for the compressed page is avoided
> and instead the page is just marked as zero-filled page.
> Although, compressed size of a zero-filled page using LZO compressor is
> very less (52 bytes including zswap_header), this patch saves compression
> and allocation time during store operation and decompression time during
> zswap load operation for zero-filled pages. Experiments have shown that
> around 10-20% of pages stored in zswap are zero-filled.

Aren't zero-filled pages already handled by patch 1/4 as their
contents match? So the overall memory saving is 52 bytes?

- Pekka


Re: [PATCH 3/4] zswap: Zero-filled pages handling

2016-08-17 Thread Pekka Enberg
On Wed, Aug 17, 2016 at 1:18 PM, Srividya Desireddy wrote:
>> This patch adds a check in zswap_frontswap_store() to identify a zero-filled
>> page before compression of the page. If the page is a zero-filled page, set
>> zswap_entry.zeroflag and skip the compression of the page and allocation
>> of memory in zpool. In zswap_frontswap_load(), check if the zeroflag is
>> set for the page in zswap_entry. If the flag is set, memset the page with
>> zero. This saves the decompression time during load.
>>
>> The overall overhead caused due to zero-filled page check is very minimal
>> when compared to the time saved by avoiding compression and allocation in
>> case of zero-filled pages. The load time of a zero-filled page is reduced
>> by 80% when compared to baseline.

On Wed, Aug 17, 2016 at 3:25 PM, Pekka Enberg  wrote:
> AFAICT, that's an overall improvement only if there are a lot of
> zero-filled pages because it's just overhead for pages that we *need*
> to compress, no? So I suppose the question is, are there a lot of
> zero-filled pages that we need to swap and why is that the case?

I suppose reading your cover letter would have been helpful before
sending out my email:

"Experiments have shown that around 10-15% of pages stored in zswap are
duplicates which results in 10-12% more RAM required to store these
duplicate compressed pages."

But I still don't understand why we have zero-filled pages that we are
swapping out.

- Pekka


Re: [PATCH 3/4] zswap: Zero-filled pages handling

2016-08-17 Thread Pekka Enberg
On Wed, Aug 17, 2016 at 1:18 PM, Srividya Desireddy wrote:
> This patch adds a check in zswap_frontswap_store() to identify a zero-filled
> page before compression of the page. If the page is a zero-filled page, set
> zswap_entry.zeroflag and skip the compression of the page and allocation
> of memory in zpool. In zswap_frontswap_load(), check if the zeroflag is
> set for the page in zswap_entry. If the flag is set, memset the page with
> zero. This saves the decompression time during load.
>
> The overall overhead caused due to zero-filled page check is very minimal
> when compared to the time saved by avoiding compression and allocation in
> case of zero-filled pages. The load time of a zero-filled page is reduced
> by 80% when compared to baseline.

AFAICT, that's an overall improvement only if there are a lot of
zero-filled pages because it's just overhead for pages that we *need*
to compress, no? So I suppose the question is, are there a lot of
zero-filled pages that we need to swap and why is that the case?
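
(For reference, my understanding is that the check being added amounts
to something like this sketch, where page_is_zero_filled() is a made-up
name, scanning the mapped page before compression:)

static bool page_is_zero_filled(void *ptr)
{
	unsigned long *word = ptr;
	unsigned long i;

	/* Any non-zero word means the page needs real compression. */
	for (i = 0; i < PAGE_SIZE / sizeof(*word); i++) {
		if (word[i])
			return false;
	}
	return true;
}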

> @@ -1314,6 +1347,13 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset,
> }
> spin_unlock(&tree->lock);
>
> +   if (entry->zeroflag) {
> +   dst = kmap_atomic(page);
> +   memset(dst, 0, PAGE_SIZE);
> +   kunmap_atomic(dst);
> +   goto freeentry;
> +   }

Don't we need the same thing in zswap_writeback_entry() for the
ZSWAP_SWAPCACHE_NEW case?

> +
> /* decompress */
> dlen = PAGE_SIZE;
> src = (u8 *)zpool_map_handle(entry->pool->zpool, 
> entry->zhandle->handle,
> @@ -1327,6 +1367,7 @@ static int zswap_frontswap_load(unsigned type, pgoff_t 
> offset,
> zpool_unmap_handle(entry->pool->zpool, entry->zhandle->handle);
> BUG_ON(ret);

- Pekka
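
For readers following the thread: the store-side test under discussion
amounts to a full scan of the page before compression. A minimal sketch
of such a check, with a helper name of my own choosing rather than
anything from the patch:

#include <linux/highmem.h>
#include <linux/string.h>

/*
 * Sketch only: memchr_inv() returns NULL when every byte in the range
 * equals the given value, so a NULL result means the page is entirely
 * zero-filled.
 */
static bool page_is_zero_filled(struct page *page)
{
        void *data = kmap_atomic(page);
        bool zero = !memchr_inv(data, 0, PAGE_SIZE);

        kunmap_atomic(data);
        return zero;
}

The scan touches the whole page but runs at memory speed, which is
consistent with the "very minimal overhead" claim above when compared
to an LZO compression pass over the same data.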


Re: [PATCH, REGRESSION v3] mm: make apply_to_page_range more robust

2016-01-21 Thread Pekka Enberg

On 01/22/2016 01:12 AM, David Rientjes wrote:
NACK to your patch as it is just covering up buggy code silently. The 
problem needs to be addressed in change_memory_common() to return if 
there is no size to change (numpages == 0). It's a two line fix to 
that function. 


So add a WARN_ON there to *warn* about the situations. There's really no 
need to BUG_ON here.


- Pekka


Re: [PATCH] mm: slab: convert slab_is_available to boolean

2015-09-15 Thread Pekka Enberg

On 9/15/15 8:50 PM, Denis Kirjanov wrote:

A good candidate to return a boolean result

Signed-off-by: Denis Kirjanov 


Reviewed-by: Pekka Enberg 


[PATCH 04/14] cxgb4: Use kvfree() in t4_free_mem()

2015-05-15 Thread Pekka Enberg
Use kvfree() instead of open-coding it.

Cc: Hariprasad S 
Signed-off-by: Pekka Enberg 
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c 
b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 803d91b..9143917 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -1121,10 +1121,7 @@ void *t4_alloc_mem(size_t size)
  */
 void t4_free_mem(void *addr)
 {
-   if (is_vmalloc_addr(addr))
-   vfree(addr);
-   else
-   kfree(addr);
+   kvfree(addr);
 }
 
 /* Send a Work Request to write the filter at a specified index.  We construct
-- 
2.1.0
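
For context, the kvfree() that this series converts call sites to is the
same dispatch centralized in one place; at the time of the series it
reads essentially as follows (mm/util.c):

/* Free memory that came from either kmalloc() or vmalloc(). */
void kvfree(const void *addr)
{
        if (is_vmalloc_addr(addr))
                vfree(addr);
        else
                kfree(addr);
}

One audited branch instead of a dozen hand-rolled copies is the whole
point of the conversion.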



[PATCH 07/14] drm/nouveau/gem: Use kvfree() in u_free()

2015-05-15 Thread Pekka Enberg
Use kvfree() instead of open-coding it.

Cc: David Airlie 
Signed-off-by: Pekka Enberg 
---
 drivers/gpu/drm/nouveau/nouveau_gem.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c 
b/drivers/gpu/drm/nouveau/nouveau_gem.c
index 0e690bf..af1ee51 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -555,10 +555,7 @@ nouveau_gem_pushbuf_validate(struct nouveau_channel *chan,
 static inline void
 u_free(void *addr)
 {
-   if (!is_vmalloc_addr(addr))
-   kfree(addr);
-   else
-   vfree(addr);
+   kvfree(addr);
 }
 
 static inline void *
-- 
2.1.0



[PATCH 03/14] cxgb3: Use kvfree() in cxgb_free_mem()

2015-05-15 Thread Pekka Enberg
Use kvfree() instead of open-coding it.

Cc: Santosh Raspatur 
Signed-off-by: Pekka Enberg 
---
 drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c 
b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c
index b0cbb2b..76684dc 100644
--- a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c
+++ b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c
@@ -1169,10 +1169,7 @@ void *cxgb_alloc_mem(unsigned long size)
  */
 void cxgb_free_mem(void *addr)
 {
-   if (is_vmalloc_addr(addr))
-   vfree(addr);
-   else
-   kfree(addr);
+   kvfree(addr);
 }
 
 /*
-- 
2.1.0



[PATCH 09/14] IB/ehca: Use kvfree() in ipz_queue_{cd}tor()

2015-05-15 Thread Pekka Enberg
Use kvfree() instead of open-coding it.

Cc: Hoang-Nam Nguyen 
Cc: Christoph Raisch 
Signed-off-by: Pekka Enberg 
---
 drivers/infiniband/hw/ehca/ipz_pt_fn.c | 10 ++
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/hw/ehca/ipz_pt_fn.c 
b/drivers/infiniband/hw/ehca/ipz_pt_fn.c
index 8d59451..7ffc748 100644
--- a/drivers/infiniband/hw/ehca/ipz_pt_fn.c
+++ b/drivers/infiniband/hw/ehca/ipz_pt_fn.c
@@ -245,10 +245,7 @@ int ipz_queue_ctor(struct ehca_pd *pd, struct ipz_queue 
*queue,
 ipz_queue_ctor_exit0:
ehca_gen_err("Couldn't alloc pages queue=%p "
 "nr_of_pages=%x",  queue, nr_of_pages);
-   if (is_vmalloc_addr(queue->queue_pages))
-   vfree(queue->queue_pages);
-   else
-   kfree(queue->queue_pages);
+   kvfree(queue->queue_pages);
 
return 0;
 }
@@ -270,10 +267,7 @@ int ipz_queue_dtor(struct ehca_pd *pd, struct ipz_queue 
*queue)
free_page((unsigned long)queue->queue_pages[i]);
}
 
-   if (is_vmalloc_addr(queue->queue_pages))
-   vfree(queue->queue_pages);
-   else
-   kfree(queue->queue_pages);
+   kvfree(queue->queue_pages);
 
return 1;
 }
-- 
2.1.0



[PATCH 08/14] drivers/input/evdev.c: Use kvfree() in evdev_release()

2015-05-15 Thread Pekka Enberg
Use kvfree() instead of open-coding it.

Cc: Dmitry Torokhov 
Signed-off-by: Pekka Enberg 
---
 drivers/input/evdev.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/input/evdev.c b/drivers/input/evdev.c
index a18f41b..9d35499 100644
--- a/drivers/input/evdev.c
+++ b/drivers/input/evdev.c
@@ -422,10 +422,7 @@ static int evdev_release(struct inode *inode, struct file 
*file)
 
evdev_detach_client(evdev, client);
 
-   if (is_vmalloc_addr(client))
-   vfree(client);
-   else
-   kfree(client);
+   kvfree(client);
 
evdev_close_device(evdev);
 
-- 
2.1.0



[PATCH 01/14] ipc/util.c: Use kvfree() in ipc_rcu_free()

2015-05-15 Thread Pekka Enberg
Use kvfree() instead of open-coding it.

Signed-off-by: Pekka Enberg 
---
 ipc/util.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/ipc/util.c b/ipc/util.c
index ff3323e..537a41c 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -467,10 +467,7 @@ void ipc_rcu_free(struct rcu_head *head)
 {
struct ipc_rcu *p = container_of(head, struct ipc_rcu, rcu);
 
-   if (is_vmalloc_addr(p))
-   vfree(p);
-   else
-   kfree(p);
+   kvfree(p);
 }
 
 /**
-- 
2.1.0



[PATCH 10/14] target: Use kvfree() in session alloc and free

2015-05-15 Thread Pekka Enberg
Use kvfree() instead of open-coding it.

Cc: "Nicholas A. Bellinger" 
Signed-off-by: Pekka Enberg 
---
 drivers/target/target_core_transport.c | 10 ++
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/target/target_core_transport.c 
b/drivers/target/target_core_transport.c
index 3fe5cb2..9c058c0 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -280,10 +280,7 @@ int transport_alloc_session_tags(struct se_session 
*se_sess,
if (rc < 0) {
pr_err("Unable to init se_sess->sess_tag_pool,"
" tag_num: %u\n", tag_num);
-   if (is_vmalloc_addr(se_sess->sess_cmd_map))
-   vfree(se_sess->sess_cmd_map);
-   else
-   kfree(se_sess->sess_cmd_map);
+   kvfree(se_sess->sess_cmd_map);
se_sess->sess_cmd_map = NULL;
return -ENOMEM;
}
@@ -490,10 +487,7 @@ void transport_free_session(struct se_session *se_sess)
 {
if (se_sess->sess_cmd_map) {
percpu_ida_destroy(&se_sess->sess_tag_pool);
-   if (is_vmalloc_addr(se_sess->sess_cmd_map))
-   vfree(se_sess->sess_cmd_map);
-   else
-   kfree(se_sess->sess_cmd_map);
+   kvfree(se_sess->sess_cmd_map);
}
kmem_cache_free(se_sess_cache, se_sess);
 }
-- 
2.1.0



[PATCH 12/14] dm: Use kvfree() in dm_kvfree()

2015-05-15 Thread Pekka Enberg
Use kvfree() instead of open-coding it.

Cc: Alasdair Kergon 
Cc: Mike Snitzer 
Signed-off-by: Pekka Enberg 
---
 drivers/md/dm-stats.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/md/dm-stats.c b/drivers/md/dm-stats.c
index f478a4c..492fe6a 100644
--- a/drivers/md/dm-stats.c
+++ b/drivers/md/dm-stats.c
@@ -160,10 +160,7 @@ static void dm_kvfree(void *ptr, size_t alloc_size)
 
free_shared_memory(alloc_size);
 
-   if (is_vmalloc_addr(ptr))
-   vfree(ptr);
-   else
-   kfree(ptr);
+   kvfree(ptr);
 }
 
 static void dm_stat_free(struct rcu_head *head)
-- 
2.1.0



[PATCH 11/14] libcxgbi: Use kvfree() in cxgbi_free_big_mem()

2015-05-15 Thread Pekka Enberg
Use kvfree() instead of open-coding it.

Cc: "James E.J. Bottomley" 
Signed-off-by: Pekka Enberg 
---
 drivers/scsi/cxgbi/libcxgbi.h | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/scsi/cxgbi/libcxgbi.h b/drivers/scsi/cxgbi/libcxgbi.h
index aba1af7..c2eb7ea 100644
--- a/drivers/scsi/cxgbi/libcxgbi.h
+++ b/drivers/scsi/cxgbi/libcxgbi.h
@@ -685,10 +685,7 @@ static inline void *cxgbi_alloc_big_mem(unsigned int size,
 
 static inline void cxgbi_free_big_mem(void *addr)
 {
-   if (is_vmalloc_addr(addr))
-   vfree(addr);
-   else
-   kfree(addr);
+   kvfree(addr);
 }
 
 static inline void cxgbi_set_iscsi_ipv4(struct cxgbi_hba *chba, __be32 ipaddr)
-- 
2.1.0



[PATCH 13/14] bcache: Use kvfree() in various places

2015-05-15 Thread Pekka Enberg
Use kvfree() instead of open-coding it.

Cc: Kent Overstreet 
Signed-off-by: Pekka Enberg 
---
 drivers/md/bcache/super.c | 10 ++
 drivers/md/bcache/util.h  | 10 ++
 2 files changed, 4 insertions(+), 16 deletions(-)

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 4dd2bb7..94980bf 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -760,14 +760,8 @@ static void bcache_device_free(struct bcache_device *d)
bio_split_pool_free(&d->bio_split_hook);
if (d->bio_split)
bioset_free(d->bio_split);
-   if (is_vmalloc_addr(d->full_dirty_stripes))
-   vfree(d->full_dirty_stripes);
-   else
-   kfree(d->full_dirty_stripes);
-   if (is_vmalloc_addr(d->stripe_sectors_dirty))
-   vfree(d->stripe_sectors_dirty);
-   else
-   kfree(d->stripe_sectors_dirty);
+   kvfree(d->full_dirty_stripes);
+   kvfree(d->stripe_sectors_dirty);
 
closure_debug_destroy(&d->cl);
 }
diff --git a/drivers/md/bcache/util.h b/drivers/md/bcache/util.h
index 98df757..1d04c48 100644
--- a/drivers/md/bcache/util.h
+++ b/drivers/md/bcache/util.h
@@ -52,10 +52,7 @@ struct closure;
 
 #define free_heap(heap)
\
 do {   \
-   if (is_vmalloc_addr((heap)->data))  \
-   vfree((heap)->data);\
-   else\
-   kfree((heap)->data);\
+   kvfree((heap)->data);   \
(heap)->data = NULL;\
 } while (0)
 
@@ -163,10 +160,7 @@ do {   
\
 
 #define free_fifo(fifo)
\
 do {   \
-   if (is_vmalloc_addr((fifo)->data))  \
-   vfree((fifo)->data);\
-   else\
-   kfree((fifo)->data);\
+   kvfree((fifo)->data);   \
(fifo)->data = NULL;\
 } while (0)
 
-- 
2.1.0



[PATCH 14/14] NTFS: Use kvfree() in ntfs_free()

2015-05-15 Thread Pekka Enberg
Use kvfree() instead of open-coding it.

Cc: Anton Altaparmakov 
Signed-off-by: Pekka Enberg 
---
 fs/ntfs/malloc.h | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/fs/ntfs/malloc.h b/fs/ntfs/malloc.h
index a44b14c..ab172e5 100644
--- a/fs/ntfs/malloc.h
+++ b/fs/ntfs/malloc.h
@@ -85,12 +85,7 @@ static inline void *ntfs_malloc_nofs_nofail(unsigned long 
size)
 
 static inline void ntfs_free(void *addr)
 {
-   if (!is_vmalloc_addr(addr)) {
-   kfree(addr);
-   /* free_page((unsigned long)addr); */
-   return;
-   }
-   vfree(addr);
+   kvfree(addr);
 }
 
 #endif /* _LINUX_NTFS_MALLOC_H */
-- 
2.1.0



[PATCH 06/14] drm: Use kvfree() in drm_free_large()

2015-05-15 Thread Pekka Enberg
Use kvfree() instead of open-coding it.

Cc: David Airlie 
Signed-off-by: Pekka Enberg 
---
 include/drm/drm_mem_util.h | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/include/drm/drm_mem_util.h b/include/drm/drm_mem_util.h
index 19a2404..e42495a 100644
--- a/include/drm/drm_mem_util.h
+++ b/include/drm/drm_mem_util.h
@@ -56,10 +56,7 @@ static __inline__ void *drm_malloc_ab(size_t nmemb, size_t 
size)
 
 static __inline void drm_free_large(void *ptr)
 {
-   if (!is_vmalloc_addr(ptr))
-   return kfree(ptr);
-
-   vfree(ptr);
+   kvfree(ptr);
 }
 
 #endif
-- 
2.1.0



[PATCH 02/14] kernel/relay.c: Use kvfree() in relay_free_page_array()

2015-05-15 Thread Pekka Enberg
Use kvfree() instead of open-coding it.

Signed-off-by: Pekka Enberg 
---
 kernel/relay.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/kernel/relay.c b/kernel/relay.c
index e9dbaeb..0b4570c 100644
--- a/kernel/relay.c
+++ b/kernel/relay.c
@@ -81,10 +81,7 @@ static struct page **relay_alloc_page_array(unsigned int 
n_pages)
  */
 static void relay_free_page_array(struct page **array)
 {
-   if (is_vmalloc_addr(array))
-   vfree(array);
-   else
-   kfree(array);
+   kvfree(array);
 }
 
 /**
-- 
2.1.0



[PATCH 05/14] ceph: Use kvfree() in ceph_put_page_vector()

2015-05-15 Thread Pekka Enberg
Use kvfree instead of open-coding it.

Cc: "Yan, Zheng" 
Cc: Sage Weil 
Signed-off-by: Pekka Enberg 
---
 net/ceph/pagevec.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/net/ceph/pagevec.c b/net/ceph/pagevec.c
index 096d914..d4f5f22 100644
--- a/net/ceph/pagevec.c
+++ b/net/ceph/pagevec.c
@@ -51,10 +51,7 @@ void ceph_put_page_vector(struct page **pages, int 
num_pages, bool dirty)
set_page_dirty_lock(pages[i]);
put_page(pages[i]);
}
-   if (is_vmalloc_addr(pages))
-   vfree(pages);
-   else
-   kfree(pages);
+   kvfree(pages);
 }
 EXPORT_SYMBOL(ceph_put_page_vector);
 
-- 
2.1.0



Re: [PATCH v2] perf kmem: Show warning when trying to run stat without record

2015-05-05 Thread Pekka Enberg

On 05/05/2015 05:07 PM, Arnaldo Carvalho de Melo wrote:

Em Tue, May 05, 2015 at 09:58:12AM +0900, Namhyung Kim escreveu:

Sometimes one can mistakenly run perf kmem stat without perf kmem
record beforehand, or with a different configuration, like recording --slab
and stat --page.  Show a warning message like below to inform the user:

   # perf kmem stat --page --caller
   Not found page events.  Have you run 'perf kmem record --page' before?

Acked-by: Pekka Enberg 
Signed-off-by: Namhyung Kim 

Thanks, applied.

I just found the messages a bit odd souding, perhaps:

# perf kmem stat --page --caller
No page allocation events found.  Have you run 'perf kmem record --page'?

Pekka?


Sure, that sounds less confusing.

- Pekka


Re: [PATCH 0/13] Parallel struct page initialisation v4

2015-04-28 Thread Pekka Enberg
On Tue, Apr 28, 2015 at 5:36 PM, Mel Gorman  wrote:
> Struct page initialisation had been identified as one of the reasons why
> large machines take a long time to boot. Patches were posted a long time ago
> to defer initialisation until they were first used.  This was rejected on
> the grounds it should not be necessary to hurt the fast paths. This series
> reuses much of the work from that time but defers the initialisation of
> memory to kswapd so that one thread per node initialises memory local to
> that node.
>
> After applying the series and setting the appropriate Kconfig variable I
> see this in the boot log on a 64G machine
>
> [7.383764] kswapd 0 initialised deferred memory in 188ms
> [7.404253] kswapd 1 initialised deferred memory in 208ms
> [7.411044] kswapd 3 initialised deferred memory in 216ms
> [7.411551] kswapd 2 initialised deferred memory in 216ms
>
> On a 1TB machine, I see
>
> [8.406511] kswapd 3 initialised deferred memory in 1116ms
> [8.428518] kswapd 1 initialised deferred memory in 1140ms
> [8.435977] kswapd 0 initialised deferred memory in 1148ms
> [8.437416] kswapd 2 initialised deferred memory in 1148ms
>
> Once booted the machine appears to work as normal. Boot times were measured
> from the time shutdown was called until ssh was available again.  In the
> 64G case, the boot time savings are negligible. On the 1TB machine, the
> savings were 16 seconds.

FWIW,

Acked-by: Pekka Enberg 

for the whole series.

- Pekka


Re: [PATCHSET 0/6] perf kmem: Implement page allocation analysis (v7)

2015-04-13 Thread Pekka Enberg
On Tue, Apr 14, 2015 at 5:52 AM, Namhyung Kim  wrote:
> Currently perf kmem command only analyzes SLAB memory allocation.  And
> I'd like to introduce page allocation analysis also.  Users can use
>  --slab and/or --page option to select it.  If none of these options
>  are used, it does slab allocation analysis for backward compatibility.

Nice addition!

Acked-by: Pekka Enberg 

for the whole series.

- Pekka


Re: [PATCH v6 2/4] perf tools: add Java demangling support

2015-03-31 Thread Pekka Enberg
Hi Stephane,

On Tue, Mar 31, 2015 at 1:19 AM, Stephane Eranian  wrote:
> +#define BASE_ENT(c, n) [c-'A']=n
> +static const char *base_types['Z'-'A' + 1]={
> +   BASE_ENT('B', "byte" ),
> +   BASE_ENT('C', "char" ),
> +   BASE_ENT('D', "double" ),
> +   BASE_ENT('F', "float" ),
> +   BASE_ENT('I', "int" ),
> +   BASE_ENT('J', "long" ),
> +   BASE_ENT('S', "short" ),
> +   BASE_ENT('Z', "bool" ),

It's "boolean", not "bool" in JVM speak.

> +static char *
> +__demangle_java_sym(const char *str, const char *end, char *buf, int maxlen, 
> int mode)
> +{
> +   int rlen = 0;
> +   int array = 0;
> +   int narg = 0;
> +   const char *q;
> +
> +   if (!end)
> +   end = str + strlen(str);
> +
> +   for (q = str; q != end; q++) {
> +
> +   if (rlen == (maxlen - 1))
> +   break;
> +
> +   switch (*q) {
> +   case 'L':
> +   if (mode == MODE_PREFIX || mode == MODE_CTYPE) {
> +   if (mode == MODE_CTYPE) {
> +   if (narg)
> +   rlen += scnprintf(buf + rlen, 
> maxlen - rlen, ", ");
> +   narg++;
> +   }
> +   rlen += scnprintf(buf + rlen, maxlen - rlen, 
> "class ");
> +   if (mode == MODE_PREFIX)
> +   mode = MODE_CLASS;
> +   } else
> +   buf[rlen++] = *q;
> +   break;

This looks odd to me. "L" marks the beginning of a class name and
it's terminated by ";". You could just strchr() the terminator and
simply copy the name to "buf" and drop cases ';', '/', and the default
label from the switch statement.
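
Roughly something like the following sketch of that suggestion; the
function name and buffer handling are illustrative, not the actual perf
code:

#include <stddef.h>
#include <string.h>

/*
 * Copy a JVM internal class name into buf, converting '/' separators
 * to '.', with str pointing just past the 'L' marker. Returns a
 * pointer past the terminating ';', or NULL for a malformed
 * descriptor.
 */
static const char *copy_class_name(const char *str, char *buf, size_t maxlen)
{
        const char *end = strchr(str, ';');
        size_t i, len;

        if (!end)
                return NULL;
        len = (size_t)(end - str);
        if (len >= maxlen)
                len = maxlen - 1;
        for (i = 0; i < len; i++)
                buf[i] = str[i] == '/' ? '.' : str[i];
        buf[i] = '\0';
        return end + 1;
}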

> +char *
> +java_demangle_sym(const char *str, int flags)
> +{

[snip]

> +   /*
> +* expansion factor estimated to 3x
> +*/
> +   len = strlen(str) * 3 + 1;
> +   buf = malloc(len);
> +   if (!buf)
> +   return NULL;

Truncated symbols are lame. Can't you use realloc() to ensure that
never happens?

- Pekka


Re: stand-alone kvmtool

2015-03-01 Thread Pekka Enberg

On 2/26/15 1:02 PM, Alex Bennée wrote:

If you can have it all it would be nice to preserve buildability all
through your history for bisecting (and the moon on a stick please ;-)

Is the dependency on the kernel sources something that has been stable
over the projects history or something that's been declining/increasing
over time?


AFAICT, it's been mostly stable after the initial sweep to use 
kernel-provided data structures such as lists and rbtrees.


- Pekka


Re: stand-alone kvmtool

2015-02-23 Thread Pekka Enberg

Hi,

On 2/18/15 5:50 PM, Will Deacon wrote:

Thanks for doing this. Since it looks unlikely that kvmtool will ever be
merged back into the kernel tree, it makes sense to cut the dependency
in my opinion.


I am certainly OK with a standalone repository which preserves the 
history. Will, would you like to take over the proposed new repository 
and put it somewhere on git.kernel.org, perhaps?


- Pekka


Re: [PATCH v3 1/5] slab: Correct size_index table before replacing the bootstrap kmem_cache_node.

2015-02-05 Thread Pekka Enberg


On 02/04/2015 11:06 PM, Daniel Sanders wrote:

This patch moves the initialization of the size_index table slightly
earlier so that the first few kmem_cache_node's can be safely allocated
when KMALLOC_MIN_SIZE is large.

There are currently two ways to generate indices into kmalloc_caches
(via kmalloc_index() and via the size_index table in slab_common.c)
and on some arches (possibly only MIPS) they potentially disagree with
each other until create_kmalloc_caches() has been called. It seems
that the intention is that the size_index table is a fast equivalent
to kmalloc_index() and that create_kmalloc_caches() patches the table
to return the correct value for the cases where kmalloc_index()'s
if-statements apply.

The failing sequence was:
* kmalloc_caches contains NULL elements
* kmem_cache_init initialises the element that 'struct
   kmem_cache_node' will be allocated to. For 32-bit Mips, this is a
   56-byte struct and kmalloc_index returns KMALLOC_SHIFT_LOW (7).
* init_list is called which calls kmalloc_node to allocate a 'struct
   kmem_cache_node'.
* kmalloc_slab selects the kmem_caches element using
   size_index[size_index_elem(size)]. For MIPS, size is 56, and the
   expression returns 6.
* This element of kmalloc_caches is NULL and allocation fails.
* If it had not already failed, it would have called
   create_kmalloc_caches() at this point which would have changed
   size_index[size_index_elem(size)] to 7.

Signed-off-by: Daniel Sanders 
Cc: Christoph Lameter 
Cc: Pekka Enberg 
Cc: David Rientjes 
Cc: Joonsoo Kim 
Cc: Andrew Morton 
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org


Acked-by: Pekka Enberg 
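
To make the failing sequence concrete: mm/slab_common.c maps a size to a
cache index with a plain table lookup, while kmalloc_index() clamps
small sizes up to KMALLOC_SHIFT_LOW. The numbers below follow the MIPS
example from the changelog:

/* As in mm/slab_common.c: table index for a given allocation size. */
static inline int size_index_elem(size_t bytes)
{
        return (bytes - 1) / 8;
}

/*
 * With KMALLOC_MIN_SIZE == 128 (hence KMALLOC_SHIFT_LOW == 7) on
 * 32-bit MIPS, and a 56-byte struct kmem_cache_node:
 *
 *   kmalloc_index(56)               -> 7 (clamped to the minimum cache)
 *   size_index[size_index_elem(56)] -> size_index[6] -> 6 (stale value)
 *
 * so the bootstrap kmalloc_node() hits the still-NULL
 * kmalloc_caches[6] until create_kmalloc_caches() patches the table.
 */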


Re: [PATCH 1/5] LLVMLinux: Correct size_index table before replacing the bootstrap kmem_cache_node.

2015-02-04 Thread Pekka Enberg
On Wed, Feb 4, 2015 at 10:38 PM, Daniel Sanders
 wrote:
> I don't believe the bug to be LLVM specific but GCC doesn't normally 
> encounter the problem. I haven't been able to identify exactly what GCC is 
> doing better (probably inlining) but it seems that GCC is managing to 
> optimize  to the point that it eliminates the problematic allocations. This 
> theory is supported by the fact that GCC can be made to fail in the same way 
> by changing inline, __inline, __inline__, and __always_inline in 
> include/linux/compiler-gcc.h such that they don't actually inline things.

OK, makes sense. Please include that explanation in the changelog and
drop use proper "slab" prefix instead of the confusing "LLVMLinux"
prefix in the subject line.

- Pekka


Re: [PATCH 1/5] LLVMLinux: Correct size_index table before replacing the bootstrap kmem_cache_node.

2015-02-04 Thread Pekka Enberg

On 2/3/15 3:37 PM, Daniel Sanders wrote:

This patch moves the initialization of the size_index table slightly
earlier so that the first few kmem_cache_node's can be safely allocated
when KMALLOC_MIN_SIZE is large.


The patch looks OK to me but how is this related to LLVM?

- Pekka


Re: [PATCH -mm v2 1/3] slub: never fail to shrink cache

2015-01-28 Thread Pekka Enberg

On 1/28/15 6:31 PM, Christoph Lameter wrote:

On Wed, 28 Jan 2015, Vladimir Davydov wrote:


This patch therefore makes __kmem_cache_shrink() allocate the array on
stack instead of calling kmalloc, which may fail. The array size is
chosen to be equal to 32, because most SLUB caches store not more than
32 objects per slab page. Slab pages with <= 32 free objects are sorted
using the array by the number of objects in use and promoted to the head
of the partial list, while slab pages with > 32 free objects are left in
the end of the list without any ordering imposed on them.

Acked-by: Christoph Lameter 

Acked-by: Pekka Enberg 
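
A sketch of the per-node pass this gives, simplified from the patch
(locking and the empty-slab discard path are omitted; field usage
follows the SLUB of this era):

#define SHRINK_PROMOTE_MAX      32

/*
 * Bucket partial slabs by free object count, then splice the buckets
 * back so the fullest slabs sit at the head of the partial list.
 * Slabs with SHRINK_PROMOTE_MAX or more free objects stay at the
 * tail, unordered, exactly as described above.
 */
static void shrink_partial_sketch(struct kmem_cache_node *n)
{
        struct list_head promote[SHRINK_PROMOTE_MAX];
        struct page *page, *t;
        int free, i;

        for (i = 0; i < SHRINK_PROMOTE_MAX; i++)
                INIT_LIST_HEAD(promote + i);

        list_for_each_entry_safe(page, t, &n->partial, lru) {
                free = page->objects - page->inuse;
                if (free < SHRINK_PROMOTE_MAX)
                        list_move(&page->lru, promote + free);
        }

        /* Fullest buckets (fewest free objects) are spliced last,
         * i.e. end up at the head. */
        for (i = SHRINK_PROMOTE_MAX - 1; i >= 0; i--)
                list_splice(promote + i, &n->partial);
}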


Re: [PATCH 3/7] slub: Do not use c->page on free

2014-12-10 Thread Pekka Enberg
On Wed, Dec 10, 2014 at 7:08 PM, Christoph Lameter  wrote:
>> > +{
>> > +   long d = p - page->address;
>> > +
>> > +   return d > 0 && d < (1 << MAX_ORDER) && d < (compound_order(page) 
>> > << PAGE_SHIFT);
>> > +}
>>
>> Can you elaborate on what this is doing? I don't really understand it.
>
> Checks if the pointer points to the slab page. Also it tres to avoid
> having to call compound_order needlessly. Not sure if that optimization is
> worth it.

Aah, it's the (1 << MAX_ORDER) optimization that confused me. Perhaps
add a comment there to make it more obvious?

I'm fine with the optimization:

Reviewed-by: Pekka Enberg 

- Pekka
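
For the record, an annotated version along the lines requested; the
comments restate the explanation above rather than quoting the patch:

static bool same_slab_page(struct kmem_cache *s, struct page *page, void *p)
{
        long d = p - page->address;     /* byte offset into the slab page */

        return d > 0 &&                 /* p lies past the page start */
               d < (1 << MAX_ORDER) &&  /* cheap filter so obvious misses
                                         * skip the compound_order() call */
               d < (compound_order(page) << PAGE_SHIFT); /* exact bound */
}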


Re: [PATCH 6/7] slub: Drop ->page field from kmem_cache_cpu

2014-12-10 Thread Pekka Enberg
On Wed, Dec 10, 2014 at 6:30 PM, Christoph Lameter  wrote:
> Dropping the page field is possible since the page struct address
> of an object or a freelist pointer can now always be calculated from
> the address. No freelist pointer will be NULL anymore so use
> NULL to signify the condition that the current cpu has no
> percpu slab attached to it.
>
> Signed-off-by: Christoph Lameter 

Reviewed-by: Pekka Enberg 


Re: [PATCH 5/7] slub: Use end_token instead of NULL to terminate freelists

2014-12-10 Thread Pekka Enberg
On Wed, Dec 10, 2014 at 6:30 PM, Christoph Lameter  wrote:
> Ending a list with NULL means that the termination of a list is the same
> for all slab pages. The pointers of freelists otherwise always are
> pointing to the address space of the page. Make termination of a
> list possible by setting the lowest bit in the freelist address
> and use the start address of a page if no other address is available
> for list termination.
>
> This will allow us to determine the page struct address from a
> freelist pointer in the future.
>
> Signed-off-by: Christoph Lameter 

Reviewed-by: Pekka Enberg 


Re: [PATCH 4/7] slub: Avoid using the page struct address in allocation fastpath

2014-12-10 Thread Pekka Enberg
On Wed, Dec 10, 2014 at 6:30 PM, Christoph Lameter  wrote:
> We can use virt_to_page there and only invoke the costly function if
> actually a node is specified and we have to check the NUMA locality.
>
> Increases the cost of allocating on a specific NUMA node but then that
> was never cheap since we may have to dump our caches and retrieve memory
> from the correct node.
>
> Signed-off-by: Christoph Lameter 
>
> Index: linux/mm/slub.c
> ===
> --- linux.orig/mm/slub.c2014-12-09 12:27:49.414686959 -0600
> +++ linux/mm/slub.c 2014-12-09 12:27:49.414686959 -0600
> @@ -2097,6 +2097,15 @@ static inline int node_match(struct page
> return 1;
>  }
>
> +static inline int node_match_ptr(void *p, int node)
> +{
> +#ifdef CONFIG_NUMA
> +   if (!p || (node != NUMA_NO_NODE && page_to_nid(virt_to_page(p)) != 
> node))

You already test that object != NULL before calling node_match_ptr().

> +   return 0;
> +#endif
> +   return 1;
> +}
> +
>  #ifdef CONFIG_SLUB_DEBUG
>  static int count_free(struct page *page)
>  {
> @@ -2410,7 +2419,7 @@ redo:
>
> object = c->freelist;
> page = c->page;
> -   if (unlikely(!object || !node_match(page, node))) {
> +   if (unlikely(!object || !node_match_ptr(object, node))) {
> object = __slab_alloc(s, gfpflags, node, addr, c);
> stat(s, ALLOC_SLOWPATH);
> } else {
>
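
For illustration, with the redundant NULL test hoisted out (the caller's
"!object || ..." already short-circuits on NULL), the helper would
reduce to this sketch:

static inline int node_match_ptr(void *p, int node)
{
#ifdef CONFIG_NUMA
        /* Only the NUMA locality check remains. */
        if (node != NUMA_NO_NODE && page_to_nid(virt_to_page(p)) != node)
                return 0;
#endif
        return 1;
}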


Re: [PATCH 3/7] slub: Do not use c->page on free

2014-12-10 Thread Pekka Enberg
On Wed, Dec 10, 2014 at 6:30 PM, Christoph Lameter  wrote:
> Avoid using the page struct address on free by just doing an
> address comparison. That is easily doable now that the page address
> is available in the page struct and we already have the page struct
> address of the object to be freed calculated.
>
> Signed-off-by: Christoph Lameter 
>
> Index: linux/mm/slub.c
> ===
> --- linux.orig/mm/slub.c2014-12-09 12:25:45.770405462 -0600
> +++ linux/mm/slub.c 2014-12-09 12:25:45.766405582 -0600
> @@ -2625,6 +2625,13 @@ slab_empty:
> discard_slab(s, page);
>  }
>
> +static bool same_slab_page(struct kmem_cache *s, struct page *page, void *p)

Why are you passing a pointer to struct kmem_cache here? You don't
seem to use it.

> +{
> +   long d = p - page->address;
> +
> +   return d > 0 && d < (1 << MAX_ORDER) && d < (compound_order(page) << 
> PAGE_SHIFT);
> +}

Can you elaborate on what this is doing? I don't really understand it.

- Pekka


Re: [PATCH 2/7] slub: Use page-mapping to store address of page frame like done in SLAB

2014-12-10 Thread Pekka Enberg
On Wed, Dec 10, 2014 at 6:30 PM, Christoph Lameter  wrote:
> SLAB uses the mapping field of the page struct to store a pointer to the
> begining of the objects in the page frame. Use the same field to store
> the address of the objects in SLUB as well. This allows us to avoid a
> number of invocations of page_address(). Those are mostly only used for
> debugging though so this should have no performance benefit.
>
> Signed-off-by: Christoph Lameter 

Reviewed-by: Pekka Enberg 


Re: [PATCH 1/7] slub: Remove __slab_alloc code duplication

2014-12-10 Thread Pekka Enberg
On Wed, Dec 10, 2014 at 6:30 PM, Christoph Lameter  wrote:
> Somehow the two branches in __slab_alloc do the same thing.
> Unify them.
>
> Signed-off-by: Christoph Lameter 

Reviewed-by: Pekka Enberg 


Re: [PATCH v2] slab: Fix nodeid bounds check for non-contiguous node IDs

2014-12-01 Thread Pekka Enberg

On 12/1/14 6:28 AM, Paul Mackerras wrote:

---
v2: include the oops message in the patch description

  mm/slab.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/slab.c b/mm/slab.c
index eb2b2ea..f34e053 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3076,7 +3076,7 @@ static void *cache_alloc_node(struct kmem_cache 
*cachep, gfp_t flags,
void *obj;
int x;
  
-	VM_BUG_ON(nodeid > num_online_nodes());
+	VM_BUG_ON(nodeid < 0 || nodeid >= MAX_NUMNODES);
n = get_node(cachep, nodeid);
BUG_ON(!n);


Reviewed-by: Pekka Enberg 



Re: [PATCH] SLAB Maintainer update

2014-06-05 Thread Pekka Enberg

On 06/04/2014 10:20 PM, Christoph Lameter wrote:

As discussed in various threads on the side:


Remove one inactive maintainer, add two new ones and update
my email address. Plus add Andrew. And fix the glob to include
files like mm/slab_common.c

Signed-off-by: Christoph Lameter 


Acked-by: Pekka Enberg 


Re: slab_common: fix the check for duplicate slab names

2014-05-23 Thread Pekka Enberg

On 05/23/2014 11:16 PM, Mike Snitzer wrote:

On Tue, Mar 25 2014 at  2:07pm -0400,
Christoph Lameter  wrote:


On Tue, 25 Mar 2014, Mike Snitzer wrote:


This patch still isn't upstream.  Who should be shepherding it to Linus?

Pekka usually does that.

Acked-by: Christoph Lameter 

This still hasn't gotten upstream.

Pekka, any chance you can pick it up?  Here it is in dm-devel's
kernel.org patchwork: https://patchwork.kernel.org/patch/3768901/

(Though it looks like it needs to be rebased due to the recent commit
794b1248, should Mikulas rebase and re-send?)


I applied it and fixed the conflict by hand.

Please double-check commit 694617474e33b8603fc76e090ed7d09376514b1a in 
my tree:


https://git.kernel.org/cgit/linux/kernel/git/penberg/linux.git/

- Pekka


Re: randconfig build error with next-20140512, in mm/slub.c

2014-05-12 Thread Pekka Enberg

On 05/12/2014 09:47 PM, Christoph Lameter wrote:

A patch was posted today for this issue.


AFAICT, it's coming from -mm. Andrew, can you pick up the fix?


Date: Mon, 12 May 2014 09:36:30 -0300
From: Fabio Estevam 
To: a...@linux-foundation.org
Cc: linux...@kvack.org, feste...@gmail.com, Fabio Estevam
,Christoph Lameter , David Rientjes 
, Pekka Enberg 
Subject: [PATCH] mm: slub: Place count_partial() outside CONFIG_SLUB_DEBUG if 
block


On Mon, 12 May 2014, Jim Davis wrote:


Building with the attached random configuration file,

mm/slub.c: In function ‘show_slab_objects’:
mm/slub.c:4361:5: error: implicit declaration of function ‘count_partial’ [-Werr
or=implicit-function-declaration]
  x = count_partial(n, count_total);
  ^
cc1: some warnings being treated as errors
make[1]: *** [mm/slub.o] Error 1

>




Re: [PATCH] slab: Fix off by one in object max number tests.

2014-05-05 Thread Pekka Enberg
On Tue, May 6, 2014 at 6:32 AM, Linus Torvalds
 wrote:
> On Mon, May 5, 2014 at 8:25 PM, David Miller  wrote:
>>
>>> Sam Ravnborg  wrote:

 There is a related patch in this area which I think is not yet applied.

 See: https://lkml.org/lkml/2014/4/18/28

 Maybe this is related.
>>>
>>> Thanks, I'm testing with that patch added as well.
>>
>> I can confirm that this fixes my problems completely.
>
> Ugh. That patch has been out two weeks already. Pekka?

Sorry, I missed that completely.

> I guess I'll apply both patches directly.

Thanks!


[PROBLEM] perf report --gtk doesn't work

2014-04-25 Thread Pekka Enberg
Hello,

I'm seeing the following with v3.15-rc2:

$ ~/bin/perf report --gtk
GTK browser requested but could not find libperf-gtk.so

The library file is in $HOME/lib64 and perf attempts to look it up.
However, printing out dlerror() output shows the following:

[penberg@localhost hornet]$ ~/bin/perf report --gtk
/home/penberg/lib64/libperf-gtk.so: undefined symbol: sort_sym
GTK browser requested but could not find libperf-gtk.so

I'm not sure how this is supposed to work. sort_sym() is defined in
the 'perf' executable so how is libperf-gtk.so going to see it?

- Pekka


Re: [PATCH 4/4] kmemcheck: Switch to using kernel disassembler

2014-04-15 Thread Pekka Enberg

On 04/14/2014 08:44 PM, Sasha Levin wrote:

kmemcheck used to do its own basic instruction decoding, which is
just a duplication of the work done in arch/x86/lib/insn.c.

Instead, switch it to using the already existing disassembler, and
switch the magic opcode numbers into something meaningful.

Signed-off-by: Sasha Levin 


Vegard probably should take a closer look at this but:

Acked-by: Pekka Enberg 


[GIT PULL] SLAB changes for v3.15-rc1

2014-04-13 Thread Pekka Enberg
Hi Linus,

Please pull the latest SLAB tree from:

  git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux.git slab/next

The biggest change is byte-sized freelist indices which reduces slab freelist
memory usage:

  https://lkml.org/lkml/2013/12/2/64

Pekka

-->
The following changes since commit 38dbfb59d1175ef458d006556061adeaa8751b72:

  Linux 3.14-rc1 (2014-02-02 16:42:13 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux.git slab/next

for you to fetch changes up to 34bf6ef94a835a8f1d8abd3e7d38c6c08d205867:

  mm: slab/slub: use page->list consistently instead of page->lru (2014-04-11 
10:06:06 +0300)


Dave Hansen (1):
  mm: slab/slub: use page->list consistently instead of page->lru

Jianyu Zhan (1):
  mm/slab.c: cleanup outdated comments and unify variables naming

Joe Perches (1):
  slab: Make allocations with GFP_ZERO slightly more efficient

Joonsoo Kim (7):
  slab: factor out calculate nr objects in cache_estimate
  slab: introduce helper functions to get/set free object
  slab: restrict the number of objects in a slab
  slab: introduce byte sized index for the freelist of a slab
  slab: make more slab management structure off the slab
  slub: fix high order page allocation problem with __GFP_NOFAIL
  slab: fix wrongly used macro

 include/linux/mm_types.h |   3 +-
 include/linux/slab.h |  11 +++
 mm/slab.c| 183 +++
 mm/slob.c|  10 +--
 mm/slub.c|   5 +-
 5 files changed, 128 insertions(+), 84 deletions(-)


Re: oops in slab/leaks_show

2014-04-11 Thread Pekka Enberg

On 03/11/2014 10:30 AM, Joonsoo Kim wrote:

-8<-
From ff6fe77fb764ca5bf8705bf53d07d38e4111e84c Mon Sep 17 00:00:00 2001
From: Joonsoo Kim 
Date: Tue, 11 Mar 2014 14:14:25 +0900
Subject: [PATCH] slab: remove kernel_map_pages() optimization in slab
  poisoning

If CONFIG_DEBUG_PAGEALLOC is enabled, the slab poisoning functionality uses
kernel_map_pages(), instead of real poisoning, to detect memory corruption
with low overhead. But, in that case, the slab leak detector triggers an oops.
The reason is that the slab leak detector accesses all active objects, especially
including objects in cpu slab caches, to get the caller information.
These objects are already unmapped via kernel_map_pages() to detect memory
corruption, so an oops can be triggered.

Following is oops message reported from Dave.

It blew up when something tried to read /proc/slab_allocators
(Just cat it, and you should see the oops below)

   Oops:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
   Modules linked in: fuse hidp snd_seq_dummy tun rfcomm bnep llc2 af_key 
can_raw ipt_ULOG can_bcm nfnetlink scsi_transport_iscsi nfc caif_socket caif 
af_802154 phonet af_rxrpc can pppoe pppox ppp_generic
   +slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx p8023 psnap p8022 
llc ax25 cfg80211 xfs coretemp hwmon x86_pkg_temp_thermal kvm_intel kvm 
crct10dif_pclmul crc32c_intel ghash_clmulni_intel
   +libcrc32c usb_debug microcode snd_hda_codec_hdmi snd_hda_codec_realtek 
snd_hda_codec_generic pcspkr btusb bluetooth 6lowpan_iphc rfkill snd_hda_intel 
snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm
   +snd_timer e1000e snd ptp shpchp soundcore pps_core serio_raw
   CPU: 1 PID: 9386 Comm: trinity-c33 Not tainted 3.14.0-rc5+ #131
   task: 8801aa46e890 ti: 880076924000 task.ti: 880076924000
   RIP: 0010:[]  [] handle_slab+0x8a/0x180
   RSP: 0018:880076925de0  EFLAGS: 00010002
   RAX: 1000 RBX:  RCX: 5ce85ce7
   RDX: ea00079be100 RSI: 1000 RDI: 880107458000
   RBP: 880076925e18 R08: 0001 R09: 
   R10:  R11: 000f R12: 8801e6f84000
   R13: ea00079be100 R14: 880107458000 R15: 88022bb8d2c0
   FS:  7fb769e45740() GS:88024d04() knlGS:
   CS:  0010 DS:  ES:  CR0: 80050033
   CR2: 8801e6f84ff8 CR3: a22db000 CR4: 001407e0
   DR0: 02695000 DR1: 02695000 DR2: 
   DR3:  DR6: fffe0ff0 DR7: 00070602
   Stack:
8802339dcfc0 88022bb8d2c0 880107458000 88022bb8d2c0
8802339dd008 8802339dcfc0 ea00079be100 880076925e68
aa1ad9be 880203fe4f00 88022bb8d318 76925e98
   Call Trace:
[] leaks_show+0xce/0x240
[] seq_read+0x28e/0x490
[] proc_reg_read+0x3d/0x80
[] vfs_read+0x9b/0x160
[] SyS_read+0x58/0xb0
[] tracesys+0xd4/0xd9
   Code: f5 00 00 00 0f 1f 44 00 00 48 63 c8 44 3b 0c 8a 0f 84 e3 00 00 00 83 c0 01 
44 39 c0 72 eb 41 f6 47 1a 01 0f 84 e9 00 00 00 89 f0 <4d> 8b 4c 04 f8 4d 85 c9 
0f 84 88 00 00 00 49 8b 7e 08 4d 8d 46
   RIP  [] handle_slab+0x8a/0x180
RSP 
   CR2: 8801e6f84ff8

There are two solutions to fix the problem. One is to disable
CONFIG_DEBUG_SLAB_LEAK if CONFIG_DEBUG_PAGEALLOC=y. The other is to remove
the kernel_map_pages() optimization in slab poisoning. I think the
second one is better, since we keep all the functionality at the cost of
some more overhead. Slab poisoning is already a heavy operation, so adding
more overhead doesn't weaken its value.

Reported-by: Dave Jones 
Signed-off-by: Joonsoo Kim 


Joonsoo, can you please resend against slab/next? I'm seeing some 
rejects here.


- Pekka
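
For background on the shortcut being removed: with
CONFIG_DEBUG_PAGEALLOC, the debug free path in mm/slab.c takes roughly
this shape (simplified here; the patch makes the poison_obj() branch
unconditional):

static void poison_or_unmap(struct kmem_cache *cachep, void *objp)
{
        if ((cachep->size % PAGE_SIZE) == 0 && OFF_SLAB(cachep))
                /*
                 * Unmap page-sized objects instead of poisoning them
                 * so that stray accesses fault immediately. These
                 * unmapped objects are exactly what the leak detector
                 * later walks, producing the oops above.
                 */
                kernel_map_pages(virt_to_page(objp),
                                 cachep->size / PAGE_SIZE, 0);
        else
                poison_obj(cachep, objp, POISON_FREE);
}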


Re: [patch] x86, kmemcheck: Use kstrtoint() instead of sscanf()

2014-03-03 Thread Pekka Enberg

On 02/20/2014 12:14 AM, David Rientjes wrote:

Kmemcheck should use the preferred interface for parsing command line
arguments, kstrto*(), rather than sscanf() itself.  Use it appropriately.

Signed-off-by: David Rientjes 


Acked-by: Pekka Enberg 

Andrew, can you pick this up?


---
  arch/x86/mm/kmemcheck/kmemcheck.c | 8 +++-
  1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/kmemcheck/kmemcheck.c 
b/arch/x86/mm/kmemcheck/kmemcheck.c
--- a/arch/x86/mm/kmemcheck/kmemcheck.c
+++ b/arch/x86/mm/kmemcheck/kmemcheck.c
@@ -78,10 +78,16 @@ early_initcall(kmemcheck_init);
   */
  static int __init param_kmemcheck(char *str)
  {
+   int val;
+   int ret;
+
if (!str)
return -EINVAL;
  
-	sscanf(str, "%d", &kmemcheck_enabled);
+	ret = kstrtoint(str, 0, &val);
+   if (ret)
+   return ret;
+   kmemcheck_enabled = val;
return 0;
  }
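
The behavioural difference is worth spelling out: unlike sscanf("%d"),
kstrtoint() validates the entire string and reports failure instead of
silently parsing a prefix. A small sketch of the semantics:

#include <linux/kernel.h>

static int __init kstrtoint_demo(void)
{
        int val;

        /* Base 0 auto-detects decimal, octal ("0...") and hex ("0x..."). */
        if (kstrtoint("42", 0, &val))
                return -EINVAL;                 /* not taken: val == 42 */

        /* Trailing garbage is rejected rather than ignored. */
        return kstrtoint("42x", 0, &val);       /* returns -EINVAL */
}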
  




Re: [PATCH 02/13] perf ui/gtk: Reuse generic __hpp__fmt() code

2014-03-03 Thread Pekka Enberg

On 03/03/2014 03:14 AM, Namhyung Kim wrote:

The __hpp__color_fmt used in the gtk code can be replace by the
generic code with small change in print_fn callback.

This is a preparation to upcoming changes and no functional changes
intended.

Cc: Jiri Olsa 
Cc: Pekka Enberg 
Signed-off-by: Namhyung Kim 


Acked-by: Pekka Enberg 


Re: Memory allocator semantics

2014-02-11 Thread Pekka Enberg
On Tue, Feb 11, 2014 at 2:14 PM, Paul E. McKenney
 wrote:
> In contrast, from kfree() to a kmalloc() returning some of the kfree()ed
> memory, I believe the kfree()/kmalloc() implementation must do any needed
> synchronization and ordering.  But that is a different set of examples,
> for example, this one:
>
> CPU 0   CPU 1
> p->a = 42;  q = kmalloc(...); /* returning p */
> kfree(p);   q->a = 5;
> BUG_ON(q->a != 5);
>
> Unlike the situation with (A), (B), and (C), in this case I believe
> that it is kfree()'s and kmalloc()'s responsibility to ensure that
> the BUG_ON() never triggers.
>
> Make sense?

I'm not sure...

It's the caller's responsibility not to touch "p" after it's handed over to
kfree() - otherwise that's a "use-after-free" error.  If there's some reordering
going on here, I'm tempted to blame the caller for lack of locking.

   Pekka


Re: Memory allocator semantics

2014-02-11 Thread Pekka Enberg
Hi Paul,

On Sun, Feb 9, 2014 at 4:00 AM, Paul E. McKenney
 wrote:
> From what I can see, (A) works by accident, but is kind of useless because
> you allocate and free the memory without touching it.  (B) and (C) are the
> lightest touches I could imagine, and as you say, both are bad.  So I
> believe that it is reasonable to prohibit (A).
>
> Or is there some use for (A) that I am missing?

So again, there's nothing in (A) that the memory allocator is
concerned about.  kmalloc() makes no guarantees whatsoever about the
visibility of "r1" across CPUs.  If you're saying that there's an
implicit barrier between kmalloc() and kfree(), that's an unintended
side-effect, not a design decision AFAICT.

 Pekka


Re: Memory allocator semantics

2014-02-08 Thread Pekka Enberg

Hi Paul,

On 01/02/2014 10:33 PM, Paul E. McKenney wrote:

From what I can see, the Linux-kernel's SLAB, SLOB, and SLUB memory
allocators would deal with the following sort of race:

A.  CPU 0: r1 = kmalloc(...); ACCESS_ONCE(gp) = r1;

CPU 1: r2 = ACCESS_ONCE(gp); if (r2) kfree(r2);

However, my guess is that this should be considered an accident of the
current implementation rather than a feature.  The reason for this is
that I cannot see how you would usefully do (A) above without also allowing
(B) and (C) below, both of which look to me to be quite destructive:

B.  CPU 0: r1 = kmalloc(...);  ACCESS_ONCE(shared_x) = r1;

 CPU 1: r2 = ACCESS_ONCE(shared_x); if (r2) kfree(r2);

CPU 2: r3 = ACCESS_ONCE(shared_x); if (r3) kfree(r3);

This results in the memory being on two different freelists.

C.  CPU 0: r1 = kmalloc(...);  ACCESS_ONCE(shared_x) = r1;

CPU 1: r2 = ACCESS_ONCE(shared_x); r2->a = 1; r2->b = 2;

CPU 2: r3 = ACCESS_ONCE(shared_x); if (r3) kfree(r3);

CPU 3: r4 = kmalloc(...);  r4->s = 3; r4->t = 4;

This results in the memory being used by two different CPUs,
each of which believe that they have sole access.

But I thought I should ask the experts.

So, am I correct that kernel hackers are required to avoid "drive-by"
kfree()s of kmalloc()ed memory?


So to be completely honest, I don't understand what race in (A)
concerns the *memory allocator*.  I also don't see what the memory
allocator can do in (B) and (C), which look like double-free and
use-after-free, respectively, to me. :-)


  Pekka
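
As an aside on what correct (A)-style publication asks of the callers,
independent of the allocator: the hand-off itself needs ordering, and
the consumer needs exclusive ownership before freeing. A minimal sketch
(my illustration, not from the thread):

#include <linux/slab.h>

struct foo {
        int a;
};

static struct foo *gp;

/* Publisher: initialise first, then publish with release semantics so
 * the store to p->a is ordered before the pointer becomes visible. */
static void publish(void)
{
        struct foo *p = kmalloc(sizeof(*p), GFP_KERNEL);

        if (!p)
                return;
        p->a = 42;
        smp_store_release(&gp, p);
}

/* Consumer: xchg() claims the pointer exclusively, so two racing
 * consumers cannot both free it (the double-free hazard of (B)), and
 * its full ordering guarantees that q->a reads as 42. */
static void consume(void)
{
        struct foo *q = xchg(&gp, NULL);

        if (q) {
                BUG_ON(q->a != 42);
                kfree(q);
        }
}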


Re: [PATCH v3 5/5] slab: make more slab management structure off the slab

2014-02-08 Thread Pekka Enberg
On Fri, Dec 13, 2013 at 9:03 AM, Joonsoo Kim  wrote:
> Hello, Pekka.
>
> Below is updated patch for 5/5 in this series.
> Now I get acks from Christoph to all patches in this series.
> So, could you merge this patchset? :)
> If you want to resend wholeset with proper ack, I will do it
> with pleasure.

Applied, thanks!


[GIT PULL] SLAB changes for v3.14-rc1

2014-02-02 Thread Pekka Enberg
Hi Linus,

Please pull the latest SLAB tree from:

  git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux.git slab/next

It contains random bug fixes that have accumulated in my inbox over
the past few months.

Pekka

-->
The following changes since commit 6ce4eac1f600b34f2f7f58f9cd8f0503d79e42ae:

  Linux 3.13-rc1 (2013-11-22 11:30:55 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux.git slab/next

for you to fetch changes up to cb8ee1a3d429f8898972c869dd4792afb04e961a:

  mm: Fix warning on make htmldocs caused by slab.c (2014-01-31 13:52:25 +0200)


Dave Hansen (2):
  mm: sl[uo]b: fix misleading comments
  mm: slub: work around unneeded lockdep warning

Li Zefan (1):
  slub: Fix calculation of cpu slabs

Masanari Iida (1):
  mm: Fix warning on make htmldocs caused by slab.c

Peter Zijlstra (1):
  slub: use lockdep_assert_held

Randy Dunlap (1):
  slab.h: remove duplicate kmalloc declaration and fix kernel-doc warnings

Tetsuo Handa (1):
  slub: Fix possible format string bug.

 include/linux/slab.h | 110 +++
 mm/slab.c|   2 +-
 mm/slub.c|  56 +++---
 3 files changed, 85 insertions(+), 83 deletions(-)

