Re: [PATCH 31/35] powerpc: docs: cxl.rst: mark two section titles as such

2020-04-08 Thread Mauro Carvalho Chehab
On Thu, 9 Apr 2020 10:37:52 +1000
Andrew Donnellan wrote:

> On 9/4/20 1:46 am, Mauro Carvalho Chehab wrote:
> > The User API chapter contains two sub-chapters. Mark them as
> > such.
> > 
> > Signed-off-by: Mauro Carvalho Chehab   
> 
> Thanks.
> 
> Though the other subsections in this file use - rather than ^, 
> what's the difference?

ReST syntax allows several different markup symbols for titles. It
assigns levels dynamically: the first style it finds becomes level 1,
the second one level 2, and so on.

Since this patch adds the "^" markup before the first "-" one, the file now has:

===
level 1
===

level 2
===

level 3
^^^

level 4
---
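
The rule above (levels are handed out in order of first appearance, and an
overline+underline style counts as different from an underline-only style
with the same character) can be sketched in C as follows. This is a
simplified model, not the docutils parser: struct adornment and
assign_levels() are hypothetical, and unlike the real parser it does not
reject inconsistent nesting.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_STYLES 16

/* One title adornment style: the punctuation character used, and whether
 * the title also has an overline (overline+underline is a distinct style). */
struct adornment {
	char ch;
	bool overline;
};

/*
 * Assign a section level to each title, in document order: a style gets
 * the next free level the first time it is seen and keeps it afterwards.
 * levels[i] receives the 1-based level of titles[i]. Returns the number of
 * distinct styles, or -1 if there are more than MAX_STYLES.
 */
static int assign_levels(const struct adornment *titles, size_t n, int *levels)
{
	struct adornment seen[MAX_STYLES];
	size_t nseen = 0;

	for (size_t i = 0; i < n; i++) {
		size_t j = 0;

		while (j < nseen && !(seen[j].ch == titles[i].ch &&
				      seen[j].overline == titles[i].overline))
			j++;
		if (j == nseen) {		/* first time this style appears */
			if (nseen == MAX_STYLES)
				return -1;
			seen[nseen++] = titles[i];
		}
		levels[i] = (int)j + 1;
	}
	return (int)nseen;
}
```

So in a file where "-" is the third style met, "-" is level 3; once a "^"
title appears before it, "^" takes level 3 and "-" becomes level 4.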


> 
> Acked-by: Andrew Donnellan 
> 
> > ---
> >   Documentation/powerpc/cxl.rst | 2 ++
> >   1 file changed, 2 insertions(+)
> > 
> > diff --git a/Documentation/powerpc/cxl.rst b/Documentation/powerpc/cxl.rst
> > index 920546d81326..d2d77057610e 100644
> > --- a/Documentation/powerpc/cxl.rst
> > +++ b/Documentation/powerpc/cxl.rst
> > @@ -133,6 +133,7 @@ User API
> >   
> >   
> >   1. AFU character devices
> > +
> >   
> >   For AFUs operating in AFU directed mode, two character device
> >   files will be created. /dev/cxl/afu0.0m will correspond to a
> > @@ -395,6 +396,7 @@ read
> >   
> >   
> >   2. Card character device (powerVM guest only)
> > +^
> >   
> >   In a powerVM guest, an extra character device is created for the
> >   card. The device is only used to write (flash) a new image on the
> >   
> 



Thanks,
Mauro


Re: [PATCH v5 13/21] powerpc/xmon: Use a function for reading instructions

2020-04-08 Thread Jordan Niethe
On Thu, Apr 9, 2020 at 3:04 PM Balamuruhan S  wrote:
>
> On Wed, 2020-04-08 at 12:18 +1000, Jordan Niethe wrote:
> > On Tue, Apr 7, 2020 at 9:31 PM Balamuruhan S  wrote:
> > > On Mon, 2020-04-06 at 18:09 +1000, Jordan Niethe wrote:
> > > > Currently in xmon, mread() is used for reading instructions. In
> > > > preparation for prefixed instructions, create and use a new function,
> > > > mread_instr(), especially for reading instructions.
> > > >
> > > > Signed-off-by: Jordan Niethe 
> > > > ---
> > > > v5: New to series, separated from "Add prefixed instructions to
> > > > instruction data type"
> > > > ---
> > > >  arch/powerpc/xmon/xmon.c | 24 
> > > >  1 file changed, 20 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
> > > > index 5e3949322a6c..6f4cf01a58c1 100644
> > > > --- a/arch/powerpc/xmon/xmon.c
> > > > +++ b/arch/powerpc/xmon/xmon.c
> > > > @@ -125,6 +125,7 @@ extern unsigned int bpt_table[NBPTS * BPT_WORDS];
> > > >  static int cmds(struct pt_regs *);
> > > >  static int mread(unsigned long, void *, int);
> > > >  static int mwrite(unsigned long, void *, int);
> > > > +static int mread_instr(unsigned long, struct ppc_inst *);
> > > >  static int handle_fault(struct pt_regs *);
> > > >  static void byterev(unsigned char *, int);
> > > >  static void memex(void);
> > > > @@ -899,7 +900,7 @@ static void insert_bpts(void)
> > > >   for (i = 0; i < NBPTS; ++i, ++bp) {
> > > >   if ((bp->enabled & (BP_TRAP|BP_CIABR)) == 0)
> > > >   continue;
> > > > - if (mread(bp->address, , 4) != 4) {
> > > > + if (!mread_instr(bp->address, )) {
> > >
> > > Are these checks made based on whether `ppc_inst_len()` returns bool from
> > > mread_instr()?
> > No, it was meant to return the length itself, with a length of 0
> > indicating an error. I will need to fix that.
>
>
> I doubt it would ever return 0: whether or not we read the instruction,
> ppc_inst_len() would always return sizeof(struct ppc_inst).
Yes, sorry, I meant that I would have to change the function so that it
returns 0 on failure.
>
> Can we do something like this:
>
> static int
> mread_instr(unsigned long adrs, struct ppc_inst *instr)
> {
> 	int size = 0;	/* stays 0 if the read faults */
>
> 	if (setjmp(bus_error_jmp) == 0) {
> 		catch_memory_errors = 1;
> 		sync();
> 		*instr = ppc_inst_read((struct ppc_inst *)adrs);
> 		sync();
> 		/* wait a little while to see if we get a machine check */
> 		__delay(200);
> 		size = ppc_inst_len(*instr);
> 	}
> 	catch_memory_errors = 0;
> 	return size;
> }
Yeah that looks right.
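
The convention agreed here (return the instruction length on success and 0
if the access faulted) can be tried out in user space with setjmp() and
longjmp(). This is a sketch, not xmon code: struct inst, inst_len(),
load_inst() and read_instr() are hypothetical stand-ins, and the "fault" is
simulated by load_inst() jumping back on a NULL address instead of a real
machine check.

```c
#include <setjmp.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for xmon's bus_error_jmp and catch_memory_errors. */
static jmp_buf bus_error_jmp;
static volatile int catch_memory_errors;

/* Hypothetical fixed-size instruction type (no prefixed instructions). */
struct inst {
	uint32_t val;
};

static int inst_len(struct inst x)
{
	(void)x;
	return (int)sizeof(struct inst);
}

/* Simulated load: a NULL address "faults" by jumping back to the
 * setjmp() point, as xmon's fault handling would on a bus error. */
static struct inst load_inst(const struct inst *p)
{
	if (p == NULL)
		longjmp(bus_error_jmp, 1);
	return *p;
}

/* Returns the length of the instruction read, or 0 if the read faulted. */
static int read_instr(const struct inst *addr, struct inst *out)
{
	int size = 0;	/* only set once the load has completed */

	if (setjmp(bus_error_jmp) == 0) {
		catch_memory_errors = 1;
		*out = load_inst(addr);
		size = inst_len(*out);
	}
	catch_memory_errors = 0;
	return size;
}
```

Callers can then keep testing the result for truth, e.g.
`if (!read_instr(addr, &ins))`, which matches how the converted call sites
in the patch use mread_instr().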
>
> -- Bala
> > > -- Bala
> > >
> > >
> > > >   printf("Couldn't read instruction at %lx, "
> > > >  "disabling breakpoint there\n", bp-
> > > > >address);
> > > >   bp->enabled = 0;
> > > > @@ -949,7 +950,7 @@ static void remove_bpts(void)
> > > >   for (i = 0; i < NBPTS; ++i, ++bp) {
> > > >   if ((bp->enabled & (BP_TRAP|BP_CIABR)) != BP_TRAP)
> > > >   continue;
> > > > - if (mread(bp->address, , 4) == 4
> > > > + if (mread_instr(bp->address, )
> > > >   && ppc_inst_equal(instr, ppc_inst(bpinstr))
> > > >   && patch_instruction(
> > > >   (struct ppc_inst *)bp->address, ppc_inst_read(bp-
> > > > > instr)) != 0)
> > > > @@ -1165,7 +1166,7 @@ static int do_step(struct pt_regs *regs)
> > > >   force_enable_xmon();
> > > >   /* check we are in 64-bit kernel mode, translation enabled */
> > > >   if ((regs->msr & (MSR_64BIT|MSR_PR|MSR_IR)) == (MSR_64BIT|MSR_IR))
> > > > {
> > > > - if (mread(regs->nip, , 4) == 4) {
> > > > + if (mread_instr(regs->nip, )) {
> > > >   stepped = emulate_step(regs, instr);
> > > >   if (stepped < 0) {
> > > >   printf("Couldn't single-step %s
> > > > instruction\n",
> > > > @@ -1332,7 +1333,7 @@ static long check_bp_loc(unsigned long addr)
> > > >   printf("Breakpoints may only be placed at kernel
> > > > addresses\n");
> > > >   return 0;
> > > >   }
> > > > - if (!mread(addr, , sizeof(instr))) {
> > > > + if (!mread_instr(addr, )) {
> > > >   printf("Can't read instruction at address %lx\n", addr);
> > > >   return 0;
> > > >   }
> > > > @@ -2125,6 +2126,21 @@ mwrite(unsigned long adrs, void *buf, int size)
> > > >   return n;
> > > >  }
> > > >
> > > > +static int
> > > > +mread_instr(unsigned long adrs, struct ppc_inst *instr)
> > > > +{
> > > > + if (setjmp(bus_error_jmp) == 0) {
> > > > + catch_memory_errors = 1;
> > > > + sync();
> > > > + *instr = ppc_inst_read((struct ppc_inst *)adrs);
> > > > 

Re: [PATCH v5 13/21] powerpc/xmon: Use a function for reading instructions

2020-04-08 Thread Balamuruhan S
On Wed, 2020-04-08 at 12:18 +1000, Jordan Niethe wrote:
> On Tue, Apr 7, 2020 at 9:31 PM Balamuruhan S  wrote:
> > On Mon, 2020-04-06 at 18:09 +1000, Jordan Niethe wrote:
> > > Currently in xmon, mread() is used for reading instructions. In
> > > preparation for prefixed instructions, create and use a new function,
> > > mread_instr(), especially for reading instructions.
> > > 
> > > Signed-off-by: Jordan Niethe 
> > > ---
> > > v5: New to series, separated from "Add prefixed instructions to
> > > instruction data type"
> > > ---
> > >  arch/powerpc/xmon/xmon.c | 24 
> > >  1 file changed, 20 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
> > > index 5e3949322a6c..6f4cf01a58c1 100644
> > > --- a/arch/powerpc/xmon/xmon.c
> > > +++ b/arch/powerpc/xmon/xmon.c
> > > @@ -125,6 +125,7 @@ extern unsigned int bpt_table[NBPTS * BPT_WORDS];
> > >  static int cmds(struct pt_regs *);
> > >  static int mread(unsigned long, void *, int);
> > >  static int mwrite(unsigned long, void *, int);
> > > +static int mread_instr(unsigned long, struct ppc_inst *);
> > >  static int handle_fault(struct pt_regs *);
> > >  static void byterev(unsigned char *, int);
> > >  static void memex(void);
> > > @@ -899,7 +900,7 @@ static void insert_bpts(void)
> > >   for (i = 0; i < NBPTS; ++i, ++bp) {
> > >   if ((bp->enabled & (BP_TRAP|BP_CIABR)) == 0)
> > >   continue;
> > > - if (mread(bp->address, , 4) != 4) {
> > > + if (!mread_instr(bp->address, )) {
> > 
> > Are these checks made based on whether `ppc_inst_len()` returns bool from
> > mread_instr()?
> No, it was meant to return the length itself, with a length of 0
> indicating an error. I will need to fix that.


I doubt it would ever return 0: whether or not we read the instruction,
ppc_inst_len() would always return sizeof(struct ppc_inst).

Can we do something like this:

static int
mread_instr(unsigned long adrs, struct ppc_inst *instr)
{
	int size = 0;	/* stays 0 if the read faults */

	if (setjmp(bus_error_jmp) == 0) {
		catch_memory_errors = 1;
		sync();
		*instr = ppc_inst_read((struct ppc_inst *)adrs);
		sync();
		/* wait a little while to see if we get a machine check */
		__delay(200);
		size = ppc_inst_len(*instr);
	}
	catch_memory_errors = 0;
	return size;
}

-- Bala
> > -- Bala
> > 
> > 
> > >   printf("Couldn't read instruction at %lx, "
> > >  "disabling breakpoint there\n", bp-
> > > >address);
> > >   bp->enabled = 0;
> > > @@ -949,7 +950,7 @@ static void remove_bpts(void)
> > >   for (i = 0; i < NBPTS; ++i, ++bp) {
> > >   if ((bp->enabled & (BP_TRAP|BP_CIABR)) != BP_TRAP)
> > >   continue;
> > > - if (mread(bp->address, , 4) == 4
> > > + if (mread_instr(bp->address, )
> > >   && ppc_inst_equal(instr, ppc_inst(bpinstr))
> > >   && patch_instruction(
> > >   (struct ppc_inst *)bp->address, ppc_inst_read(bp-
> > > > instr)) != 0)
> > > @@ -1165,7 +1166,7 @@ static int do_step(struct pt_regs *regs)
> > >   force_enable_xmon();
> > >   /* check we are in 64-bit kernel mode, translation enabled */
> > >   if ((regs->msr & (MSR_64BIT|MSR_PR|MSR_IR)) == (MSR_64BIT|MSR_IR))
> > > {
> > > - if (mread(regs->nip, , 4) == 4) {
> > > + if (mread_instr(regs->nip, )) {
> > >   stepped = emulate_step(regs, instr);
> > >   if (stepped < 0) {
> > >   printf("Couldn't single-step %s
> > > instruction\n",
> > > @@ -1332,7 +1333,7 @@ static long check_bp_loc(unsigned long addr)
> > >   printf("Breakpoints may only be placed at kernel
> > > addresses\n");
> > >   return 0;
> > >   }
> > > - if (!mread(addr, , sizeof(instr))) {
> > > + if (!mread_instr(addr, )) {
> > >   printf("Can't read instruction at address %lx\n", addr);
> > >   return 0;
> > >   }
> > > @@ -2125,6 +2126,21 @@ mwrite(unsigned long adrs, void *buf, int size)
> > >   return n;
> > >  }
> > > 
> > > +static int
> > > +mread_instr(unsigned long adrs, struct ppc_inst *instr)
> > > +{
> > > + if (setjmp(bus_error_jmp) == 0) {
> > > + catch_memory_errors = 1;
> > > + sync();
> > > + *instr = ppc_inst_read((struct ppc_inst *)adrs);
> > > + sync();
> > > + /* wait a little while to see if we get a machine check */
> > > + __delay(200);
> > > + }
> > > + catch_memory_errors = 0;
> > > + return ppc_inst_len(*instr);
> > > +}
> > > +
> > >  static int fault_type;
> > >  static int fault_except;
> > >  static char *fault_chars[] = { "--", "**", "##" };



Re: [PATCH v5 05/21] powerpc: Use a function for getting the instruction op code

2020-04-08 Thread Jordan Niethe
On Thu, Apr 9, 2020 at 4:21 AM Segher Boessenkool
 wrote:
>
> Hi!
>
> On Mon, Apr 06, 2020 at 06:09:20PM +1000, Jordan Niethe wrote:
> > +static inline int ppc_inst_opcode(u32 x)
> > +{
> > + return x >> 26;
> > +}
>
> Maybe you should have "primary opcode" in this function name?
Thanks, that is a good idea.
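
For context on the naming: in the Power ISA the top 6 bits of an
instruction word are the primary opcode, and many instructions that share
a primary opcode (31 in particular) are told apart by an extended opcode
elsewhere in the word, so "opcode" alone is ambiguous. A minimal sketch
using the hypothetical name ppc_inst_primary_opcode():

```c
#include <stdint.h>

/* Primary opcode: the most-significant 6 bits of a 32-bit instruction word
 * (bits 0-5 in the ISA's big-endian bit numbering). */
static inline unsigned int ppc_inst_primary_opcode(uint32_t x)
{
	return x >> 26;
}
```

For example, mflr r0 (0x7c0802a6) yields 31, li r3,0 (0x38600000, an
addi) yields 14, and blr (0x4e800020) yields 19.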
>
>
> Segher


Re: [RFC PATCH v0 0/5] powerpc/mm/radix: Memory unplug fixes

2020-04-08 Thread Bharata B Rao
On Mon, Apr 06, 2020 at 09:19:20AM +0530, Bharata B Rao wrote:
> Memory unplug has a few bugs which I had attempted to fix earlier
> at https://lists.ozlabs.org/pipermail/linuxppc-dev/2019-July/194087.html
> 
> Now with Leonardo's patch for PAPR changes that add a separate flag bit
> to LMB flags for explicitly identifying hot-removable memory
> (https://lore.kernel.org/linuxppc-dev/f55a7b65a43cc9dc7b22385cf9960f8b11d5ce2e.ca...@linux.ibm.com/T/#t),
> a few other issues around memory unplug on radix can be fixed. This
> series is a combination of those fixes.
> 
> This series works on top of above mentioned Leonardo's patch.
> 
> Bharata B Rao (5):
>   powerpc/pseries/hotplug-memory: Set DRCONF_MEM_HOTREMOVABLE for
> hot-plugged mem
>   powerpc/mm/radix: Create separate mappings for hot-plugged memory
>   powerpc/mm/radix: Fix PTE/PMD fragment count for early page table
> mappings
>   powerpc/mm/radix: Free PUD table when freeing pagetable
>   powerpc/mm/radix: Remove split_kernel_mapping()

3/5 in this series fixes a long-standing bug, and multiple versions of it
have been posted outside of this series earlier.

4/5 fixes a memory leak.

I included the above two in this series because, with the patches that
explicitly mark the hotplugged memory (1/5 and 2/5), reproducing the bug
fixed by 3/5 becomes easier.

Hence 3/5 and 4/5 can also be considered standalone fixes.

Regards,
Bharata.



Re: Linux-next POWER9 NULL pointer NIP since 1st Apr.

2020-04-08 Thread Qian Cai



> On Apr 7, 2020, at 9:30 AM, Steven Rostedt  wrote:
> 
> On Tue, 7 Apr 2020 09:01:10 -0400
> Qian Cai  wrote:
> 
>> + Steven
>> 
>>> On Apr 7, 2020, at 8:42 AM, Michael Ellerman  wrote:
>>> 
>>> Qian Cai  writes:  
 Ever since 1st Apr, linux-next has been triggering a NULL pointer NIP on
 POWER9 (below) with this config:
 
 https://raw.githubusercontent.com/cailca/linux-mm/master/powerpc.config
 
 It takes a while to reproduce, so before I bury myself in bisecting, I'm
 just sending a heads-up to see if anyone spots anything obvious.
 
 [  206.744625][T13224] LTP: starting fallocate04
 [  207.601583][T27684] /dev/zero: Can't open blockdev
 [  208.674301][T27684] EXT4-fs (loop0): mounting ext3 file system using 
 the ext4 subsystem
 [  208.680347][T27684] BUG: Unable to handle kernel instruction fetch 
 (NULL pointer?)
 [  208.680383][T27684] Faulting instruction address: 0x
 [  208.680406][T27684] Oops: Kernel access of bad area, sig: 11 [#1]
 [  208.680439][T27684] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=256 
 DEBUG_PAGEALLOC NUMA PowerNV
 [  208.680474][T27684] Modules linked in: ext4 crc16 mbcache jbd2 loop 
 kvm_hv kvm ip_tables x_tables xfs sd_mod bnx2x ahci libahci mdio tg3 
 libata libphy firmware_class dm_mirror dm_region_hash dm_log dm_mod
 [  208.680576][T27684] CPU: 117 PID: 27684 Comm: fallocate04 Tainted: G
 W 5.6.0-next-20200401+ #288
 [  208.680614][T27684] NIP:   LR: c008102c0048 CTR: 
 
 [  208.680657][T27684] REGS: c000200361def420 TRAP: 0400   Tainted: G  
   W  (5.6.0-next-20200401+)
 [  208.680700][T27684] MSR:  90004280b033 
   CR: 4208  XER: 2004
 [  208.680760][T27684] CFAR: c0081032c494 IRQMASK: 0 
 [  208.680760][T27684] GPR00: c05ac3f8 c000200361def6b0 
 c165c200 c00020107dae0bd0 
 [  208.680760][T27684] GPR04:  0400 
   
 [  208.680760][T27684] GPR08: c000200361def6e8 c008102c0040 
 7fff c1614e80 
 [  208.680760][T27684] GPR12:  c000201fff671280 
  0002 
 [  208.680760][T27684] GPR16: 0002 00040001 
 c00020030f5a1000 c00020030f5a1548 
 [  208.680760][T27684] GPR20: c15fbad8 c168c654 
 c000200361def818 c05b4c10 
 [  208.680760][T27684] GPR24:  c008103365b8 
 c00020107dae0bd0 0400 
 [  208.680760][T27684] GPR28: c168c3a8  
   
 [  208.681014][T27684] NIP [] 0x0
 [  208.681065][T27684] LR [c008102c0048] ext4_iomap_end+0x8/0x30 
 [ext4]  
>>> 
>>> That LR looks like it's pointing to the return from _mcount in
>>> ext4_iomap_end(), which means we have probably crashed in ftrace
>>> somewhere.
>>> 
>>> Did you have tracing enabled when you ran the test? Or does it do
>>> tracing itself?  
>> 
>> Yes, it runs ftrace first, before running LTP to trigger it:
>> 
>> https://github.com/cailca/linux-mm/blob/master/test.sh
>> 
>> echo function > /sys/kernel/debug/tracing/current_tracer
>> echo nop > /sys/kernel/debug/tracing/current_tracer
>> 
>> There is another crash, even with a non-NULL NIP, but there the symbol behaves weirdly:
>> 
>> # ./scripts/faddr2line vmlinux sysctl_net_busy_read+0x0/0x4
>> skipping sysctl_net_busy_read address at 0xc16804ac due to 
>> non-function symbol of type 'D'
>> 
>> [  148.110969][T13115] LTP: starting chown04_16
>> [  148.255048][T13380] kernel tried to execute exec-protected page 
>> (c16804ac) - exploit attempt? (uid: 0)
>> [  148.255099][T13380] BUG: Unable to handle kernel instruction fetch
>> [  148.255122][T13380] Faulting instruction address: 0xc16804ac
>> [  148.255136][T13380] Oops: Kernel access of bad area, sig: 11 [#1]
>> [  148.255157][T13380] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=256 
>> DEBUG_PAGEALLOC NUMA PowerNV
>> [  148.255171][T13380] Modules linked in: loop kvm_hv kvm xfs sd_mod bnx2x 
>> mdio ahci tg3 libahci libphy libata firmware_class dm_mirror dm_region_hash 
>> dm_log dm_mod
>> [  148.255213][T13380] CPU: 45 PID: 13380 Comm: chown04_16 Tainted: G
>> W 5.6.0+ #7
>> [  148.255236][T13380] NIP:  c16804ac LR: c0080fa60408 CTR: 
>> c16804ac
>> [  148.255250][T13380] REGS: c010a6fafa00 TRAP: 0400   Tainted: G
>> W  (5.6.0+)
>> [  148.255281][T13380] MSR:  900010009033   CR: 
>> 84000248  XER: 2004
>> [  148.255310][T13380] CFAR: c0080fa66534 IRQMASK: 0 
>> [  148.255310][T13380] GPR00: c0973268 c010a6fafc90 
>> c1648200  
>> [  148.255310][T13380] GPR04: c00d8a22dc00 c010a6fafd30 
>> b5e98331 00012c9f 
>> [  

Re: [PATCH v1 1/2] powerpc/pseries/hotplug-memory: stop checking is_mem_section_removable()

2020-04-08 Thread piliu



On 04/08/2020 10:46 AM, Baoquan He wrote:
> Add Pingfan to CC since he usually handles ppc related bugs for RHEL.
> 
> On 04/07/20 at 03:54pm, David Hildenbrand wrote:
>> In commit 53cdc1cb29e8 ("drivers/base/memory.c: indicate all memory
>> blocks as removable"), the user space interface to compute whether a memory
>> block can be offlined (exposed via
>> /sys/devices/system/memory/memoryX/removable) has effectively been
>> deprecated. We want to remove the leftovers of the kernel implementation.
> 
> Pingfan, can you have a look at this change on PPC? Please feel free to
> give comments if you have any concerns, or offer an ack if it's OK with you.
> 
>>
>> When offlining a memory block (mm/memory_hotplug.c:__offline_pages()),
>> we'll start by:
>> 1. Testing if it contains any holes, and reject if so
>> 2. Testing if pages belong to different zones, and reject if so
>> 3. Isolating the page range, checking if it contains any unmovable pages
>>
>> Using is_mem_section_removable() before trying to offline is not only racy,
>> it can easily result in false positives/negatives. Let's stop manually
>> checking is_mem_section_removable(), and let device_offline() handle it
>> completely instead. We can remove the racy is_mem_section_removable()
>> implementation next.
>>
>> We now take more locks (e.g., memory hotplug lock when offlining and the
>> zone lock when isolating), but maybe we should optimize that
>> implementation instead if this ever becomes a real problem (after all,
>> memory unplug is already an expensive operation). We started using
>> is_mem_section_removable() in commit 51925fb3c5c9 ("powerpc/pseries:
>> Implement memory hotplug remove in the kernel"), with the initial
>> hotremove support of lmbs.
>>
>> Cc: Nathan Fontenot 
>> Cc: Michael Ellerman 
>> Cc: Benjamin Herrenschmidt 
>> Cc: Paul Mackerras 
>> Cc: Michal Hocko 
>> Cc: Andrew Morton 
>> Cc: Oscar Salvador 
>> Cc: Baoquan He 
>> Cc: Wei Yang 
>> Signed-off-by: David Hildenbrand 
>> ---
>>  .../platforms/pseries/hotplug-memory.c| 26 +++
>>  1 file changed, 3 insertions(+), 23 deletions(-)
>>
>> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c 
>> b/arch/powerpc/platforms/pseries/hotplug-memory.c
>> index b2cde1732301..5ace2f9a277e 100644
>> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
>> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
>> @@ -337,39 +337,19 @@ static int pseries_remove_mem_node(struct device_node 
>> *np)
>>  
>>  static bool lmb_is_removable(struct drmem_lmb *lmb)
>>  {
>> -int i, scns_per_block;
>> -bool rc = true;
>> -unsigned long pfn, block_sz;
>> -u64 phys_addr;
>> -
>>  if (!(lmb->flags & DRCONF_MEM_ASSIGNED))
>>  return false;
>>  
>> -block_sz = memory_block_size_bytes();
>> -scns_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE;
>> -phys_addr = lmb->base_addr;
>> -
>>  #ifdef CONFIG_FA_DUMP
>>  /*
>>   * Don't hot-remove memory that falls in fadump boot memory area
>>   * and memory that is reserved for capturing old kernel memory.
>>   */
>> -if (is_fadump_memory_area(phys_addr, block_sz))
>> +if (is_fadump_memory_area(lmb->base_addr, memory_block_size_bytes()))
>>  return false;
>>  #endif
>> -
>> -for (i = 0; i < scns_per_block; i++) {
>> -pfn = PFN_DOWN(phys_addr);
>> -if (!pfn_in_present_section(pfn)) {
>> -phys_addr += MIN_MEMORY_BLOCK_SIZE;
>> -continue;
>> -}
>> -
>> -rc = rc && is_mem_section_removable(pfn, PAGES_PER_SECTION);
>> -phys_addr += MIN_MEMORY_BLOCK_SIZE;
>> -}
>> -
>> -return rc;
>> +/* device_offline() will determine if we can actually remove this lmb */
>> +return true;
So I think this swaps the check-then-do sequence. At the least, it breaks
dlpar_memory_remove_by_count(). Removing is_mem_section_removable() is
doable, but it needs more effort to rearrange the code.

Thanks,
Pingfan
>>  }
>>  
>>  static int dlpar_add_lmb(struct drmem_lmb *);
>> -- 
>> 2.25.1
>>
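
Pingfan's concern can be illustrated with a toy model: a caller that first
counts candidates with a pre-check and bails out early, before touching
anything, only works if the pre-check is meaningful. If the pre-check
accepts every assigned block, the count can pass while the actual removals
later fail, leaving a partial removal. Everything below (struct toy_block,
remove_by_count(), the two pre-checks) is hypothetical and simplified; it
is not the code of dlpar_memory_remove_by_count().

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy memory block: whether it is assigned to us, and whether offlining
 * it would actually succeed. */
struct toy_block {
	bool assigned;
	bool offlinable;
	bool removed;
};

typedef bool (*precheck_fn)(const struct toy_block *);

/* A pre-check that predicts the outcome of offlining. */
static bool precheck_accurate(const struct toy_block *b)
{
	return b->assigned && b->offlinable;
}

/* A pre-check that defers everything to the offline attempt. */
static bool precheck_assigned_only(const struct toy_block *b)
{
	return b->assigned;
}

/*
 * Check-then-do: count candidates first and refuse early, with no side
 * effects, if there are not enough; otherwise try to remove 'want' blocks.
 * Returns how many blocks were actually removed.
 */
static size_t remove_by_count(struct toy_block *b, size_t n, size_t want,
			      precheck_fn ok)
{
	size_t avail = 0, done = 0;

	for (size_t i = 0; i < n; i++)
		if (ok(&b[i]))
			avail++;
	if (avail < want)
		return 0;

	for (size_t i = 0; i < n && done < want; i++) {
		if (!ok(&b[i]) || !b[i].offlinable)
			continue;	/* skipped, or the offline attempt fails */
		b[i].removed = true;
		done++;
	}
	return done;
}
```

With one offlinable block out of three and a request for two, the accurate
pre-check refuses up front, while the assigned-only one proceeds and
removes a single block, which a real caller would then have to roll back.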



[PATCHv2 2/2] powerpc/pseries: update device tree before ejecting hotplug uevents

2020-04-08 Thread Pingfan Liu
A bug was observed on pseries when taking the following steps on RHEL:
-1. drmgr -c mem -r -q 5
-2. echo c > /proc/sysrq-trigger

And then, the failure looks like:
kdump: saving to /sysroot//var/crash/127.0.0.1-2020-01-16-02:06:14/
kdump: saving vmcore-dmesg.txt
kdump: saving vmcore-dmesg.txt complete
kdump: saving vmcore
 Checking for memory holes : [  0.0 %] /
   Checking for memory holes : [100.0 %] |  
 Excluding unnecessary pages   : [100.0 %] \
   Copying data  : [  0.3 %] -  
eta: 38s[   44.337636] hash-mmu: mm: Hashing failure ! EA=0x7fffba40 
access=0x8004 current=makedumpfile
[   44.337663] hash-mmu: trap=0x300 vsid=0x13a109c ssize=1 base psize=2 
psize 2 pte=0xc0005504
[   44.337677] hash-mmu: mm: Hashing failure ! EA=0x7fffba40 
access=0x8004 current=makedumpfile
[   44.337692] hash-mmu: trap=0x300 vsid=0x13a109c ssize=1 base psize=2 
psize 2 pte=0xc0005504
[   44.337708] makedumpfile[469]: unhandled signal 7 at 7fffba40 nip 
7fffbbc4d7fc lr 00011356ca3c code 2
[   44.338548] Core dump to |/bin/false pipe failed
/lib/kdump-lib-initramfs.sh: line 98:   469 Bus error   
$CORE_COLLECTOR /proc/vmcore 
$_mp/$KDUMP_PATH/$HOST_IP-$DATEDIR/vmcore-incomplete
kdump: saving vmcore failed

* Root cause *
  After analysis, it turns out that in the current implementation, when an
lmb is hot-removed, the KOBJ_REMOVE uevent is emitted before the dt is
updated, because __remove_memory() is called before drmem_update_dt().
So in the kdump kernel, read_from_oldmem() resorts to
pSeries_lpar_hpte_insert() to install an hpte, which fails with -2 due to a
non-existent pfn. Finally, low_hash_fault() raises SIGBUS to the process,
as can be seen from the "Bus error" above.

From the viewpoint of a listener and a publisher, the publisher notifies the
listener before the data is ready. This introduces a problem where udev
launches kexec-tools (due to KOBJ_REMOVE) and loads a stale dt before it is
updated. Then, in the capture kernel, makedumpfile accesses memory based
on the stale dt info and hits a SIGBUS error due to a non-existent lmb.
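
The ordering problem above is the usual "publish before notify" rule. A
toy model (all names hypothetical; it only mimics the ordering, not real
uevents or the device tree) shows why the listener captures a stale view
in the old order and a consistent one in the new order:

```c
#include <stdbool.h>

/* dt_assigned stands in for the lmb's state in the device tree;
 * snapshot_assigned is what the listener (udev/kexec-tools) captured. */
struct toy_state {
	bool dt_assigned;
	bool snapshot_assigned;
};

/* The listener reads the published state when it is notified. */
static void notify_listener(struct toy_state *s)
{
	s->snapshot_assigned = s->dt_assigned;
}

static void update_dt(struct toy_state *s)
{
	s->dt_assigned = false;
}

/* Old order: the removal emits the event before the dt is updated. */
static void remove_old_order(struct toy_state *s)
{
	notify_listener(s);	/* KOBJ_REMOVE from __remove_memory() */
	update_dt(s);		/* drmem_update_dt() afterwards */
}

/* New order: update the dt first, then emit the event. */
static void remove_new_order(struct toy_state *s)
{
	update_dt(s);
	notify_listener(s);
}
```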

* Fix *
  To fix this issue, update the dt before __remove_memory(), and apply the
same rule in the hot-add path.

This introduces an extra dt update for each involved lmb during hotplug,
but that should be fine since drmem_update_dt() is a memory-based operation
and hotplug is not a hot path.

Signed-off-by: Pingfan Liu 
Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Hari Bathini 
Cc: Leonardo Bras  
Cc: Libor Pechacek  
Cc: Nathan Fontenot  
To: linuxppc-dev@lists.ozlabs.org
Cc: ke...@lists.infradead.org
---
v1 -> v2: improve commit message, and add more detail about the SIGBUS failure path
 arch/powerpc/platforms/pseries/hotplug-memory.c | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c 
b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 4bd9004..72cd4a5 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -394,6 +394,9 @@ static int dlpar_remove_lmb(struct drmem_lmb *lmb)
invalidate_lmb_associativity_index(lmb);
lmb_clear_nid(lmb);
lmb->flags &= ~DRCONF_MEM_ASSIGNED;
+   rtas_hp_event = true;
+   drmem_update_dt();
+   rtas_hp_event = false;
 
__remove_memory(nid, base_addr, block_sz);
 
@@ -667,6 +670,9 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
 
lmb_set_nid(lmb);
lmb->flags |= DRCONF_MEM_ASSIGNED;
+   rtas_hp_event = true;
+   drmem_update_dt();
+   rtas_hp_event = false;
 
block_sz = memory_block_size_bytes();
 
@@ -685,6 +691,9 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
invalidate_lmb_associativity_index(lmb);
lmb_clear_nid(lmb);
lmb->flags &= ~DRCONF_MEM_ASSIGNED;
+   rtas_hp_event = true;
+   drmem_update_dt();
+   rtas_hp_event = false;
 
__remove_memory(nid, base_addr, block_sz);
}
@@ -941,12 +950,6 @@ int dlpar_memory(struct pseries_hp_errorlog *hp_elog)
break;
}
 
-   if (!rc) {
-   rtas_hp_event = true;
-   rc = drmem_update_dt();
-   rtas_hp_event = false;
-   }
-
unlock_device_hotplug();
return rc;
 }
-- 
2.7.5



[PATCHv2 1/2] powerpc/pseries: group lmb operation and memblock's

2020-04-08 Thread Pingfan Liu
This patch prepares for the next patch, which swaps the order of the KOBJ_
uevent and the dt update.

It has no functional effect; it just groups the lmb operations and the
memblock operations so that the dt update can be inserted easily, and
makes the next patch easier to review.
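
The reason the reordered dlpar_remove_lmb() copies nid and base_addr into
locals is that the lmb bookkeeping is now cleared before
__remove_memory(), so the lmb can no longer supply the original node id.
In this toy model, lmb_clear_nid() is assumed to reset the node id (here
to -1); struct toy_lmb and the toy_* helpers are hypothetical:

```c
/* Toy model of the reordering in this patch (hypothetical names). */
struct toy_lmb {
	unsigned long base_addr;
	int nid;
	unsigned int flags;
};

#define TOY_ASSIGNED 0x1u

/* Models lmb_clear_nid() resetting the node id (assumption). */
static void toy_clear_nid(struct toy_lmb *lmb)
{
	lmb->nid = -1;
}

/* Records what the toy memory removal was asked to remove. */
static int removed_nid;
static unsigned long removed_base;

static void toy_remove_memory(int nid, unsigned long base)
{
	removed_nid = nid;
	removed_base = base;
}

static void toy_remove_lmb(struct toy_lmb *lmb)
{
	/* Copy what the removal needs before the lmb is invalidated. */
	unsigned long base_addr = lmb->base_addr;
	int nid = lmb->nid;

	toy_clear_nid(lmb);
	lmb->flags &= ~TOY_ASSIGNED;
	/* The next patch can insert the dt update here, before removal. */
	toy_remove_memory(nid, base_addr);
}
```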

Signed-off-by: Pingfan Liu 
Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Hari Bathini 
Cc: Leonardo Bras 
Cc: Libor Pechacek 
Cc: Nathan Fontenot 
To: linuxppc-dev@lists.ozlabs.org
Cc: ke...@lists.infradead.org
---
 arch/powerpc/platforms/pseries/hotplug-memory.c | 26 -
 1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c 
b/arch/powerpc/platforms/pseries/hotplug-memory.c
index b2cde17..4bd9004 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -377,7 +377,8 @@ static int dlpar_add_lmb(struct drmem_lmb *);
 static int dlpar_remove_lmb(struct drmem_lmb *lmb)
 {
unsigned long block_sz;
-   int rc;
+   phys_addr_t base_addr;
+   int rc, nid;
 
if (!lmb_is_removable(lmb))
return -EINVAL;
@@ -386,17 +387,19 @@ static int dlpar_remove_lmb(struct drmem_lmb *lmb)
if (rc)
return rc;
 
+   base_addr = lmb->base_addr;
+   nid = lmb->nid;
block_sz = pseries_memory_block_size();
 
-   __remove_memory(lmb->nid, lmb->base_addr, block_sz);
-
-   /* Update memory regions for memory remove */
-   memblock_remove(lmb->base_addr, block_sz);
-
invalidate_lmb_associativity_index(lmb);
lmb_clear_nid(lmb);
lmb->flags &= ~DRCONF_MEM_ASSIGNED;
 
+   __remove_memory(nid, base_addr, block_sz);
+
+   /* Update memory regions for memory remove */
+   memblock_remove(base_addr, block_sz);
+
return 0;
 }
 
@@ -663,6 +666,8 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
}
 
lmb_set_nid(lmb);
+   lmb->flags |= DRCONF_MEM_ASSIGNED;
+
block_sz = memory_block_size_bytes();
 
/* Add the memory */
@@ -674,11 +679,14 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
 
rc = dlpar_online_lmb(lmb);
if (rc) {
-   __remove_memory(lmb->nid, lmb->base_addr, block_sz);
+   int nid = lmb->nid;
+   phys_addr_t base_addr = lmb->base_addr;
+
invalidate_lmb_associativity_index(lmb);
lmb_clear_nid(lmb);
-   } else {
-   lmb->flags |= DRCONF_MEM_ASSIGNED;
+   lmb->flags &= ~DRCONF_MEM_ASSIGNED;
+
+   __remove_memory(nid, base_addr, block_sz);
}
 
return rc;
-- 
2.7.5



Re: [PATCH 17/28] mm: remove the prot argument from vm_map_ram

2020-04-08 Thread Gao Xiang
On Wed, Apr 08, 2020 at 01:59:15PM +0200, Christoph Hellwig wrote:
> This is always PAGE_KERNEL - for long term mappings with other properties
> vmap should be used.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/gpu/drm/i915/gem/selftests/mock_dmabuf.c   | 2 +-
>  drivers/media/common/videobuf2/videobuf2-dma-sg.c  | 3 +--
>  drivers/media/common/videobuf2/videobuf2-vmalloc.c | 3 +--
>  fs/erofs/decompressor.c| 2 +-

For EROFS part,

Acked-by: Gao Xiang 

Thanks,
Gao Xiang



Re: [PATCH V2 0/3] mm/debug: Add more arch page table helper tests

2020-04-08 Thread Anshuman Khandual


On 04/08/2020 05:45 PM, Gerald Schaefer wrote:
> On Wed, 8 Apr 2020 12:41:51 +0530
> Anshuman Khandual  wrote:
> 
> [...]
>>>   

>>>> Something like this instead.
>>>>
>>>> pte_t pte = READ_ONCE(*ptep);
>>>> pte = pte_mkhuge(__pte((pte_val(pte) | RANDOM_ORVALUE) & PMD_MASK));
>>>>
>>>> We cannot use mk_pte_phys() as it is defined only on some platforms
>>>> without any generic fallback for others.
>>>
>>> Oh, didn't know that, sorry. What about using mk_pte() instead, at least
>>> it would result in a present pte:
>>>
>>> pte = pte_mkhuge(mk_pte(phys_to_page(RANDOM_ORVALUE & PMD_MASK), prot));  
>>
>> Lets use mk_pte() here but can we do this instead
>>
>> paddr = (__pfn_to_phys(pfn) | RANDOM_ORVALUE) & PMD_MASK;
>> pte = pte_mkhuge(mk_pte(phys_to_page(paddr), prot));
>>
> 
> Sure, that will also work.
> 
> BTW, this RANDOM_ORVALUE is not really very random, the way it is
> defined. For s390 we already changed it to mask out some arch bits,
> but I guess there are other archs and bits that would always be
> set with this "not so random" value, and I wonder if/how that would
> affect all the tests using this value, see also below.

RANDOM_ORVALUE is a constant that was added to make sure that the page
table entries hold some non-zero value before pxx_clear() is called on
them, followed by a pxx_none() check. It is currently used only in the
pxx_clear_tests() tests, hence there is no impact on the existing tests.

> 
>>>
>>> And if you also want to do some with the existing value, which seems
>>> to be an empty pte, then maybe just check if writing and reading that
>>> value with set_huge_pte_at() / huge_ptep_get() returns the same,
>>> i.e. initially w/o RANDOM_ORVALUE.
>>>
>>> So, in combination, like this (BTW, why is the barrier() needed, it
>>> is not used for the other set_huge_pte_at() calls later?):  
>>
>> Ahh, missed that; will add them. Earlier we faced a problem without it
>> after set_pte_at() in a test on the powerpc (64) platform, hence I added
>> it here just to be extra careful.
>>
>>>
>>> @@ -733,24 +733,28 @@ static void __init hugetlb_advanced_test
>>> struct page *page = pfn_to_page(pfn);
>>> pte_t pte = READ_ONCE(*ptep);
>>>  
>>> -   pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
>>> +   set_huge_pte_at(mm, vaddr, ptep, pte);
>>> +   WARN_ON(!pte_same(pte, huge_ptep_get(ptep)));
>>> +
>>> +   pte = pte_mkhuge(mk_pte(phys_to_page(RANDOM_ORVALUE & PMD_MASK), 
>>> prot));
>>> set_huge_pte_at(mm, vaddr, ptep, pte);
>>> barrier();
>>> WARN_ON(!pte_same(pte, huge_ptep_get(ptep)));
>>>
>>> This would actually add a new test "write empty pte with
>>> set_huge_pte_at(), then verify with huge_ptep_get()", which happens
>>> to trigger a warning on s390 :-)  
>>
>> On arm64 as well, which checks pte_present() in set_huge_pte_at().
>> But the pte_present() check is not there in every set_huge_pte_at()
>> implementation, especially without __HAVE_ARCH_HUGE_SET_HUGE_PTE_AT.
>> Hence I wonder whether we should add this new test here, as it will keep
>> giving warnings on s390 and arm64 (at least).
> 
> Hmm, interesting. I forgot about huge swap / migration, which is not
> (and probably cannot be) supported on s390. The pte_present() check
> on arm64 seems to check for such huge swap / migration entries,
> according to the comment.
> 
> The new test "write empty pte with set_huge_pte_at(), then verify
> with huge_ptep_get()" would then probably trigger the
> WARN_ON(!pte_present(pte)) in arm64 code. So I guess "writing empty
> ptes with set_huge_pte_at()" is not really a valid use case in practice,
> or else you would have seen this warning before. In that case, it
> might not be a good idea to add this test.

Got it.

> 
> I also do wonder now, why the original test with
> "pte = __pte(pte_val(pte) | RANDOM_ORVALUE);"
> did not also trigger that warning on arm64. On s390 this test failed
> exactly because the constructed pte was not present (initially empty,
> or'ing RANDOM_ORVALUE does not make it present for s390). I guess this
> just worked by chance on arm64, because the bits from RANDOM_ORVALUE
> also happened to mark the pte present for arm64.

That is correct. RANDOM_ORVALUE has the PTE_PROT_NONE bit set, which makes
the PTE pass the pte_present() test.

On arm64 platform,

#define pte_present(pte)  (!!(pte_val(pte) & (PTE_VALID | PTE_PROT_NONE)))
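To make the "worked by chance" point concrete, here is a small userspace sketch of the arm64 check above. It assumes PTE_VALID is bit 0 and PTE_PROT_NONE is bit 58, and it uses a hypothetical DEMO_RANDOM_ORVALUE modeled on "all bits from bit 4 upwards" (the s390-safe mask used by the test); none of these names are the kernel's own definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed arm64 bit positions; illustration only, not copied from the kernel. */
#define DEMO_PTE_VALID      (UINT64_C(1) << 0)
#define DEMO_PTE_PROT_NONE  (UINT64_C(1) << 58)

/* Hypothetical stand-in for RANDOM_ORVALUE: every bit from bit 4 upwards. */
#define DEMO_RANDOM_ORVALUE (~UINT64_C(0) << 4)

/* Same shape as the arm64 pte_present() macro quoted above. */
static bool demo_pte_present(uint64_t pteval)
{
	return !!(pteval & (DEMO_PTE_VALID | DEMO_PTE_PROT_NONE));
}
```

An empty entry (0) is not present, but 0 | DEMO_RANDOM_ORVALUE is, purely because the OR'ed mask happens to include bit 58 (PTE_PROT_NONE) even though it never touches PTE_VALID.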

> 
> This brings us back to the question above, regarding the "randomness"
> of RANDOM_ORVALUE. Not really sure what the intention behind that was,
> but maybe it would make sense to restrict this RANDOM_ORVALUE to
> non-arch-specific bits, i.e. only bits that would be part of the
> address value within a page table entry? Or was it intentionally
> chosen to also mess with other bits?

As mentioned before, RANDOM_ORVALUE only served to make a given page table
entry contain non-zero values before it gets cleared. AFAICS we should
not need RANDOM_ORVALUE for the HugeTLB test here. I believe 

Re: [PATCH 31/35] powerpc: docs: cxl.rst: mark two section titles as such

2020-04-08 Thread Andrew Donnellan

On 9/4/20 1:46 am, Mauro Carvalho Chehab wrote:
> The User API chapter contains two sub-chapters. Mark them as
> such.
> 
> Signed-off-by: Mauro Carvalho Chehab 

Thanks.

Though the other subsections in this file use - rather than ^, 
what's the difference?


Acked-by: Andrew Donnellan 


> ---
>   Documentation/powerpc/cxl.rst | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/Documentation/powerpc/cxl.rst b/Documentation/powerpc/cxl.rst
> index 920546d81326..d2d77057610e 100644
> --- a/Documentation/powerpc/cxl.rst
> +++ b/Documentation/powerpc/cxl.rst
> @@ -133,6 +133,7 @@ User API
>   
>   
>   1. AFU character devices
> +
>   
>   For AFUs operating in AFU directed mode, two character device
>   files will be created. /dev/cxl/afu0.0m will correspond to a
> @@ -395,6 +396,7 @@ read
>   
>   
>   2. Card character device (powerVM guest only)
> +^
>   
>   In a powerVM guest, an extra character device is created for the
>   card. The device is only used to write (flash) a new image on the



--
Andrew Donnellan  OzLabs, ADL Canberra
a...@linux.ibm.com IBM Australia Limited



Re: [PATCH v3 1/1] ppc/crash: Reset spinlocks during crash

2020-04-08 Thread Paul Mackerras
On Wed, Apr 08, 2020 at 10:21:29PM +1000, Michael Ellerman wrote:
> 
> We should be able to just allocate the rtas_args on the stack, it's only
> ~80 odd bytes. And then we can use rtas_call_unlocked() which doesn't
> take the global lock.

Do we instantiate a 64-bit RTAS these days, or is it still 32-bit?
In the old days we had to make sure the RTAS argument buffer was
below the 4GB point.  If that's still necessary then perhaps putting
rtas_args inside the PACA would be the way to go.

Paul.


Re: [PATCH v3 1/1] ppc/crash: Reset spinlocks during crash

2020-04-08 Thread Leonardo Bras
On Wed, 2020-04-08 at 22:21 +1000, Michael Ellerman wrote:
[...]
> > According with LoPAR, for both of these rtas-calls, we have:
> > 
> > For the PowerPC External Interrupt option: The call must be reentrant
> > to the number of processors on the platform.
> > For the PowerPC External Interrupt option: The argument call buffer for
> > each simultaneous call must be physically unique.
> 
> Oh well spotted. Where is that in the doc?
> 
> > Which I think means this rtas-calls can be done simultaneously.
> 
> I think so too. I'll read PAPR in the morning and make sure.
> 
> > Would it mean that busting the rtas.lock for these calls would be safe?
> 
> What would be better is to make those specific calls not take the global
> RTAS lock to begin with.
> 
> We should be able to just allocate the rtas_args on the stack, it's only
> ~80 odd bytes. And then we can use rtas_call_unlocked() which doesn't
> take the global lock.

Hello Michael,

I did the simplest possible version of this change:
http://patchwork.ozlabs.org/patch/1268371/

There I create rtas_call_reentrant() and use it in place of rtas_call()
for every call of "ibm,int-on", "ibm,int-off", "ibm,get-xive" and
"ibm,set-xive".

At first, I was planning on creating a function that tells whether the
requested token is one of the above, so the common or reentrant version
could be chosen automatically. But it seemed like unnecessary overhead,
since there are very few such calls and they are very straightforward.

What do you think of this?

Best regards,
Leonardo Bras
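The difference between a shared, lock-protected argument buffer and a per-call stack buffer can be sketched in userspace. Everything here (struct demo_args, demo_firmware_enter(), the pthread mutex standing in for rtas.lock) is hypothetical and much simpler than the real struct rtas_args; in particular, it ignores Paul's question about whether a 32-bit RTAS needs the buffer below 4GB, which a stack buffer may not guarantee.

```c
#include <pthread.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct rtas_args. */
struct demo_args {
	int token;
	int nargs;
	int nret;
	uint32_t args[16];	/* inputs followed by return slots */
};

/* Hypothetical firmware entry: echoes the first input back as its status. */
static void demo_firmware_enter(struct demo_args *a)
{
	a->args[a->nargs] = a->args[0];
}

/* Non-reentrant style: one shared buffer, serialised by a global lock. */
static struct demo_args shared_args;
static pthread_mutex_t shared_lock = PTHREAD_MUTEX_INITIALIZER;

static int demo_call_locked(int token, uint32_t in0)
{
	int ret;

	pthread_mutex_lock(&shared_lock);
	memset(&shared_args, 0, sizeof(shared_args));
	shared_args.token = token;
	shared_args.nargs = 1;
	shared_args.nret = 1;
	shared_args.args[0] = in0;
	demo_firmware_enter(&shared_args);
	ret = (int)shared_args.args[shared_args.nargs];
	pthread_mutex_unlock(&shared_lock);
	return ret;
}

/*
 * Reentrant style: each call gets its own buffer on the caller's stack,
 * so no lock is taken and a CPU that died holding shared_lock cannot
 * block this path.
 */
static int demo_call_reentrant(int token, uint32_t in0)
{
	struct demo_args a;

	memset(&a, 0, sizeof(a));
	a.token = token;
	a.nargs = 1;
	a.nret = 1;
	a.args[0] = in0;
	demo_firmware_enter(&a);
	return (int)a.args[a.nargs];
}
```

The reentrant variant still works while the shared lock is held by someone else, which is the crash-time scenario the patch is aiming at.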




Re: usb: gadget: fsl_udc_core: Checking for a failed platform_get_irq() call in fsl_udc_probe()

2020-04-08 Thread Li Yang
On Wed, Apr 8, 2020 at 9:19 AM Markus Elfring  wrote:
>
> Hello,
>
> I have taken another look at the implementation of the function 
> “fsl_udc_probe”.
> A software analysis approach points the following source code out for
> further development considerations.
> https://elixir.bootlin.com/linux/v5.6.2/source/drivers/usb/gadget/udc/fsl_udc_core.c#L2443
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/usb/gadget/udc/fsl_udc_core.c?id=f5e94d10e4c468357019e5c28d48499f677b284f#n2442
>
> udc_controller->irq = platform_get_irq(pdev, 0);
> if (!udc_controller->irq) {
> ret = -ENODEV;
> goto err_iounmap;
> }
>
>
> The software documentation is providing the following information
> for the used programming interface.
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/base/platform.c?id=f5e94d10e4c468357019e5c28d48499f677b284f#n221
> https://elixir.bootlin.com/linux/v5.6.2/source/drivers/base/platform.c#L202
>
> “…
>  * Return: IRQ number on success, negative error number on failure.
> …”
>
> Would you like to reconsider the shown condition check?

Thanks for the finding.  This is truly a software issue that needs to
be fixed.  Would you submit a patch for it, or do you want us to fix it?

Regards,
Leo
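A corrected check could look like the following sketch. demo_check_irq() is a hypothetical helper that takes the value returned by platform_get_irq(); treating 0 as -ENODEV is a defensive choice on our part, since the quoted kernel-doc only promises a negative errno on failure.

```c
#include <errno.h>

/*
 * Takes the value returned by platform_get_irq() and returns either the
 * IRQ number (> 0) or a negative errno for the probe routine to return.
 */
static int demo_check_irq(int irq)
{
	/*
	 * "if (!irq)" only catches 0 and lets negative error codes such as
	 * -ENXIO or -EPROBE_DEFER through as if they were valid IRQs.
	 */
	if (irq <= 0)
		return irq ? irq : -ENODEV;

	return irq;
}
```

In fsl_udc_probe() the equivalent would be to store the returned error in ret and take the existing err_iounmap path instead of always returning -ENODEV.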


[PATCH 1/1] powerpc/rtas: Implement reentrant rtas call

2020-04-08 Thread Leonardo Bras
Implement rtas_call_reentrant() for reentrant rtas-calls:
"ibm,int-on", "ibm,int-off", "ibm,get-xive" and "ibm,set-xive".

In LoPAPR Version 1.1 (March 24, 2016), sections 7.3.10.1 to 7.3.10.4,
items 2 and 3 say:

2 - For the PowerPC External Interrupt option: The * call must be
reentrant to the number of processors on the platform.
3 - For the PowerPC External Interrupt option: The * argument call
buffer for each simultaneous call must be physically unique.

So these rtas-calls can be made without taking the lock, as long as
each call uses a different buffer.

This can be useful to avoid deadlocks during a crash, where rtas-calls
are needed but some other thread crashed while holding rtas.lock.

Signed-off-by: Leonardo Bras 
---
 arch/powerpc/include/asm/rtas.h |  1 +
 arch/powerpc/kernel/rtas.c  | 21 +
 arch/powerpc/sysdev/xics/ics-rtas.c | 22 +++---
 3 files changed, 33 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 3c1887351c71..1ad1c85dab5e 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -352,6 +352,7 @@ extern struct rtas_t rtas;
 extern int rtas_token(const char *service);
 extern int rtas_service_present(const char *service);
 extern int rtas_call(int token, int, int, int *, ...);
+int rtas_call_reentrant(int token, int nargs, int nret, int *outputs, ...);
 void rtas_call_unlocked(struct rtas_args *args, int token, int nargs,
int nret, ...);
 extern void __noreturn rtas_restart(char *cmd);
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index c5fa251b8950..85e7511afe25 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -483,6 +483,27 @@ int rtas_call(int token, int nargs, int nret, int *outputs, ...)
 }
 EXPORT_SYMBOL(rtas_call);
 
+/*
+ * Used for reentrant rtas calls.
+ * According to LoPAR documentation, only "ibm,int-on", "ibm,int-off",
+ * "ibm,get-xive" and "ibm,set-xive" are currently reentrant.
+ * Reentrant calls need their own rtas_args buffer, so not using rtas.args.
+ */
+int rtas_call_reentrant(int token, int nargs, int nret, int *outputs, ...)
+{
+   va_list list;
+   struct rtas_args rtas_args;
+
+   if (!rtas.entry || token == RTAS_UNKNOWN_SERVICE)
+   return -1;
+
+   va_start(list, outputs);
+   va_rtas_call_unlocked(&rtas_args, token, nargs, nret, list);
+   va_end(list);
+
+   return be32_to_cpu(rtas_args.rets[0]);
+}
+
 /* For RTAS_BUSY (-2), delay for 1 millisecond.  For an extended busy status
  * code of 990n, perform the hinted delay of 10^n (last digit) milliseconds.
  */
diff --git a/arch/powerpc/sysdev/xics/ics-rtas.c b/arch/powerpc/sysdev/xics/ics-rtas.c
index 6aabc74688a6..4cf18000f07c 100644
--- a/arch/powerpc/sysdev/xics/ics-rtas.c
+++ b/arch/powerpc/sysdev/xics/ics-rtas.c
@@ -50,8 +50,8 @@ static void ics_rtas_unmask_irq(struct irq_data *d)
 
server = xics_get_irq_server(d->irq, irq_data_get_affinity_mask(d), 0);
 
-   call_status = rtas_call(ibm_set_xive, 3, 1, NULL, hw_irq, server,
-   DEFAULT_PRIORITY);
+   call_status = rtas_call_reentrant(ibm_set_xive, 3, 1, NULL, hw_irq,
+ server, DEFAULT_PRIORITY);
if (call_status != 0) {
printk(KERN_ERR
"%s: ibm_set_xive irq %u server %x returned %d\n",
@@ -60,7 +60,7 @@ static void ics_rtas_unmask_irq(struct irq_data *d)
}
 
/* Now unmask the interrupt (often a no-op) */
-   call_status = rtas_call(ibm_int_on, 1, 1, NULL, hw_irq);
+   call_status = rtas_call_reentrant(ibm_int_on, 1, 1, NULL, hw_irq);
if (call_status != 0) {
printk(KERN_ERR "%s: ibm_int_on irq=%u returned %d\n",
__func__, hw_irq, call_status);
@@ -91,7 +91,7 @@ static void ics_rtas_mask_real_irq(unsigned int hw_irq)
if (hw_irq == XICS_IPI)
return;
 
-   call_status = rtas_call(ibm_int_off, 1, 1, NULL, hw_irq);
+   call_status = rtas_call_reentrant(ibm_int_off, 1, 1, NULL, hw_irq);
if (call_status != 0) {
printk(KERN_ERR "%s: ibm_int_off irq=%u returned %d\n",
__func__, hw_irq, call_status);
@@ -99,8 +99,8 @@ static void ics_rtas_mask_real_irq(unsigned int hw_irq)
}
 
/* Have to set XIVE to 0xff to be able to remove a slot */
-   call_status = rtas_call(ibm_set_xive, 3, 1, NULL, hw_irq,
-   xics_default_server, 0xff);
+   call_status = rtas_call_reentrant(ibm_set_xive, 3, 1, NULL, hw_irq,
+ xics_default_server, 0xff);
if (call_status != 0) {
printk(KERN_ERR "%s: ibm_set_xive(0xff) irq=%u returned %d\n",
__func__, hw_irq, call_status);
@@ -131,7 +131,7 @@ static int ics_rtas_set_affinity(struct 

[PATCH 3/3] selftests/powerpc: ensure PMC reads are set and ordered on count_pmc

2020-04-08 Thread Desnes A. Nunes do Rosario
Function count_pmc() needs a memory barrier to ensure that PMC reads are
fully consistent. Without it, the pmc56_overflow test can occasionally
fail: depending on the system workload, PMC5 and PMC6 can hold stale values
from the time the counters were frozen and turned back on. These stale
values are then accounted as overflows and make the test fail.

=
test: pmc56_overflow
...
ebb_state:
...
>>pmc[5] count = 0xfd4cbc8c
>>pmc[6] count = 0xddd8b3b6
HW state:
MMCR0 0x8400 FC PMAE
MMCR2 0x
EBBHR 0x10003f68
BESCR 0x8000 GE
...
PMC5  0x
PMC6  0x
SIAR  0x10003398
...
  [3]: register SPRN_PMC2  = 0x8003
  [4]: register SPRN_PMC5  = 0x
  [5]: register SPRN_PMC6  = 0x
  [6]: register SPRN_PMC2  = 0x8003
>>[7]: register SPRN_PMC5  = 0x8f21266d
>>[8]: register SPRN_PMC6  = 0x0da80f8d
  [9]: register SPRN_PMC2  = 0x8003
>>[10]: register SPRN_PMC5  = 0x6e2b961f
>>[11]: register SPRN_PMC6  = 0xd030a429
  [12]: register SPRN_PMC2  = 0x8003
  [13]: register SPRN_PMC5  = 0x
  [14]: register SPRN_PMC6  = 0x
...
PMC5/6 overflow 2
[FAIL] Test FAILED on line 87
failure: pmc56_overflow
=

Signed-off-by: Desnes A. Nunes do Rosario 
---
 tools/testing/selftests/powerpc/pmu/ebb/ebb.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/tools/testing/selftests/powerpc/pmu/ebb/ebb.c b/tools/testing/selftests/powerpc/pmu/ebb/ebb.c
index bf6f25dfcf7b..6199f3cea0f9 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/ebb.c
+++ b/tools/testing/selftests/powerpc/pmu/ebb/ebb.c
@@ -258,6 +258,10 @@ int count_pmc(int pmc, uint32_t sample_period)
start_value = pmc_sample_period(sample_period);
 
val = read_pmc(pmc);
+
+   /* Ensure pmc value is consistent between freezes */
+   mb();
+
if (val < start_value)
ebb_state.stats.negative++;
else
-- 
2.21.1



[PATCH, RESEND, 1/3] selftests/powerpc: Use write_pmc instead of count_pmc to reset PMCs on ebb tests

2020-04-08 Thread Desnes A. Nunes do Rosario
Using count_pmc() instead of write_pmc() to reset PMCs adds an extra count
to ebb_state.stats.pmc_count[PMC_INDEX(pmc)]. This extra count can
occasionally invalidate results, such as the cycles_test results shown
below, where ebb_check_count() failed with an "above upper limit" error
because of the extra ebb_state.stats.pmc_count.

Furthermore, this extra count also shows up as an extra PMC1 trace_log
entry in the output of the cycles test (as well as of pmc56_overflow_test):

==
   ...
   [21]: counter = 8
   [22]: register SPRN_MMCR0 = 0x8080
   [23]: register SPRN_PMC1  = 0x8004
   [24]: counter = 9
   [25]: register SPRN_MMCR0 = 0x8080
   [26]: register SPRN_PMC1  = 0x8004
   [27]: counter = 10
   [28]: register SPRN_MMCR0 = 0x8080
   [29]: register SPRN_PMC1  = 0x8004
>> [30]: register SPRN_PMC1  = 0x451e
PMC1 count (0x28546) above upper limit 0x283e8 (+0x15e)
[FAIL] Test FAILED on line 52
failure: cycles
==
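The double counting can be modeled in a few lines. This is a simplified, hypothetical model of the selftest accounting: the PMC is a plain variable, and the start value is assumed to be 0x80000000 minus the sample period, which may not match pmc_sample_period() exactly.

```c
#include <stdint.h>

/* Hypothetical overflow point; the start value is this minus the period. */
#define DEMO_MAX_PMC 0x80000000u

static uint32_t demo_pmc;		/* stands in for SPRN_PMC1 */
static uint64_t demo_pmc_count;		/* stands in for stats.pmc_count[0] */
static int demo_negative;		/* stands in for stats.negative */

static uint32_t demo_start_value(uint32_t period)
{
	return DEMO_MAX_PMC - period;
}

/* Reset only: no accounting. */
static void demo_write_pmc(uint32_t val)
{
	demo_pmc = val;
}

/* Same shape as count_pmc(): accumulate the delta, then reset. */
static void demo_count_pmc(uint32_t period)
{
	uint32_t start = demo_start_value(period);
	uint32_t val = demo_pmc;

	if (val < start)
		demo_negative++;
	else
		demo_pmc_count += val - start;

	demo_write_pmc(start);
}
```

If the counter advances a little after the last EBB, resetting it with demo_write_pmc() leaves the total unchanged, while resetting it with demo_count_pmc() adds those leftover cycles on top, which is how the total can end up above the upper limit.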

[desnesn: reflow of original comment]
Signed-off-by: Desnes A. Nunes do Rosario 
---
 .../powerpc/pmu/ebb/back_to_back_ebbs_test.c |  2 +-
 .../testing/selftests/powerpc/pmu/ebb/cycles_test.c  |  2 +-
 .../powerpc/pmu/ebb/cycles_with_freeze_test.c|  2 +-
 .../powerpc/pmu/ebb/cycles_with_mmcr2_test.c |  2 +-
 tools/testing/selftests/powerpc/pmu/ebb/ebb.c|  2 +-
 .../powerpc/pmu/ebb/ebb_on_willing_child_test.c  |  2 +-
 .../selftests/powerpc/pmu/ebb/lost_exception_test.c  |  2 +-
 .../selftests/powerpc/pmu/ebb/multi_counter_test.c   | 12 ++--
 .../selftests/powerpc/pmu/ebb/multi_ebb_procs_test.c |  2 +-
 .../selftests/powerpc/pmu/ebb/pmae_handling_test.c   |  2 +-
 .../selftests/powerpc/pmu/ebb/pmc56_overflow_test.c  |  2 +-
 11 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/tools/testing/selftests/powerpc/pmu/ebb/back_to_back_ebbs_test.c b/tools/testing/selftests/powerpc/pmu/ebb/back_to_back_ebbs_test.c
index a2d7b0e3dca9..f133ab425f10 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/back_to_back_ebbs_test.c
+++ b/tools/testing/selftests/powerpc/pmu/ebb/back_to_back_ebbs_test.c
@@ -91,7 +91,7 @@ int back_to_back_ebbs(void)
ebb_global_disable();
ebb_freeze_pmcs();
 
-   count_pmc(1, sample_period);
+   write_pmc(1, pmc_sample_period(sample_period));
 
dump_ebb_state();
 
diff --git a/tools/testing/selftests/powerpc/pmu/ebb/cycles_test.c b/tools/testing/selftests/powerpc/pmu/ebb/cycles_test.c
index bc893813483e..14a399a64729 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/cycles_test.c
+++ b/tools/testing/selftests/powerpc/pmu/ebb/cycles_test.c
@@ -42,7 +42,7 @@ int cycles(void)
ebb_global_disable();
ebb_freeze_pmcs();
 
-   count_pmc(1, sample_period);
+   write_pmc(1, pmc_sample_period(sample_period));
 
dump_ebb_state();
 
diff --git a/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_freeze_test.c b/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_freeze_test.c
index dcd351d20328..0f2089f6f82c 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_freeze_test.c
+++ b/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_freeze_test.c
@@ -99,7 +99,7 @@ int cycles_with_freeze(void)
ebb_global_disable();
ebb_freeze_pmcs();
 
-   count_pmc(1, sample_period);
+   write_pmc(1, pmc_sample_period(sample_period));
 
dump_ebb_state();
 
diff --git a/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_mmcr2_test.c b/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_mmcr2_test.c
index 94c99c12c0f2..a8f3bee04cd8 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_mmcr2_test.c
+++ b/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_mmcr2_test.c
@@ -71,7 +71,7 @@ int cycles_with_mmcr2(void)
ebb_global_disable();
ebb_freeze_pmcs();
 
-   count_pmc(1, sample_period);
+   write_pmc(1, pmc_sample_period(sample_period));
 
dump_ebb_state();
 
diff --git a/tools/testing/selftests/powerpc/pmu/ebb/ebb.c b/tools/testing/selftests/powerpc/pmu/ebb/ebb.c
index dfbc5c3ad52d..bf6f25dfcf7b 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/ebb.c
+++ b/tools/testing/selftests/powerpc/pmu/ebb/ebb.c
@@ -396,7 +396,7 @@ int ebb_child(union pipe read_pipe, union pipe write_pipe)
ebb_global_disable();
ebb_freeze_pmcs();
 
-   count_pmc(1, sample_period);
+   write_pmc(1, pmc_sample_period(sample_period));
 
dump_ebb_state();
 
diff --git a/tools/testing/selftests/powerpc/pmu/ebb/ebb_on_willing_child_test.c b/tools/testing/selftests/powerpc/pmu/ebb/ebb_on_willing_child_test.c
index ca2f7d729155..513812cdcca1 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/ebb_on_willing_child_test.c
+++ b/tools/testing/selftests/powerpc/pmu/ebb/ebb_on_willing_child_test.c
@@ -38,7 +38,7 @@ static int victim_child(union pipe read_pipe, union pipe write_pipe)

[PATCH, RESEND, 0/3] selftests/powerpc: A few fixes on powerpc selftests

2020-04-08 Thread Desnes A. Nunes do Rosario
This patch series contains a few fixes for the powerpc selftests (the first
and second patches are being resent).

The first fix deals with the extra counts on PMC resets, which not only
show up in the trace_logs but can also invalidate the results of a few
selftests. The second fix properly handles the Performance Monitor Alert
Enable (PMAE) bit in MMCR0 when counter freezing is disabled in the
cycles_with_freeze_test selftest. Lastly, the third fix adds a memory
barrier to count_pmc() to ensure read consistency of PMCs. This is
necessary because these values are usually accounted in ebb handlers to
validate test results, such as the overflow values in pmc56_overflow_test.

Desnes A. Nunes do Rosario (2):
  selftests/powerpc: Use write_pmc instead of count_pmc to reset PMCs on
ebb tests
  selftests/powerpc: ensure PMC reads are set and ordered on count_pmc

Gustavo Romero (1):
  selftests/powerpc: enable performance alerts when freezing counters on
cycles_with_freeze_test selftest

 .../powerpc/pmu/ebb/back_to_back_ebbs_test.c |  2 +-
 .../testing/selftests/powerpc/pmu/ebb/cycles_test.c  |  2 +-
 .../powerpc/pmu/ebb/cycles_with_freeze_test.c|  4 ++--
 .../powerpc/pmu/ebb/cycles_with_mmcr2_test.c |  2 +-
 tools/testing/selftests/powerpc/pmu/ebb/ebb.c|  6 +-
 .../powerpc/pmu/ebb/ebb_on_willing_child_test.c  |  2 +-
 .../selftests/powerpc/pmu/ebb/lost_exception_test.c  |  2 +-
 .../selftests/powerpc/pmu/ebb/multi_counter_test.c   | 12 ++--
 .../selftests/powerpc/pmu/ebb/multi_ebb_procs_test.c |  2 +-
 .../selftests/powerpc/pmu/ebb/pmae_handling_test.c   |  2 +-
 .../selftests/powerpc/pmu/ebb/pmc56_overflow_test.c  |  2 +-
 11 files changed, 21 insertions(+), 17 deletions(-)

-- 
2.21.1



[PATCH, RESEND, 2/3] selftests/powerpc: enable performance alerts when freezing counters on cycles_with_freeze_test selftest

2020-04-08 Thread Desnes A. Nunes do Rosario
From: Gustavo Romero 

When counter freezing is disabled by clearing the MMCR0 FC bit, the MMCR0
PMAE bit must also be set if a Performance Monitor Alert (and the
corresponding Performance Monitor Interrupt) is still wanted when an
enabled condition or event occurs.

This is the case for the cycles_with_freeze_test selftest, since MMCR0 PMAE
gets disabled as a result of using the PMU to trigger EBBs. This can make
the test loop until it is killed by the test harness timeout (2500 ms):
with the MMCR0 PMAE bit disabled, no further EBBs happen, so ebb_count is
never incremented again.

Fixes: 3752e453f6bafd7 ("selftests/powerpc: Add tests of PMU EBBs")
Signed-off-by: Gustavo Romero 
[desnesn: Only set MMCR0_PMAE when disabling MMCR0_FC, reflow comment]
Signed-off-by: Desnes A. Nunes do Rosario 
---
 .../testing/selftests/powerpc/pmu/ebb/cycles_with_freeze_test.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_freeze_test.c b/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_freeze_test.c
index 0f2089f6f82c..d368199144fb 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_freeze_test.c
+++ b/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_freeze_test.c
@@ -81,7 +81,7 @@ int cycles_with_freeze(void)
{
counters_frozen = false;
mb();
-   mtspr(SPRN_MMCR0, mfspr(SPRN_MMCR0) & ~MMCR0_FC);
+   mtspr(SPRN_MMCR0, (mfspr(SPRN_MMCR0) & ~MMCR0_FC) | MMCR0_PMAE);
 
FAIL_IF(core_busy_loop());
 
-- 
2.21.1
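The one-line MMCR0 change above can be checked in isolation. The bit values are assumptions (FC = 0x80000000, PMAE = 0x04000000), consistent with the "MMCR0 0x8400... FC PMAE" dump in patch 3/3; the two demo helpers are hypothetical and only model the register as a plain integer.

```c
#include <stdint.h>

/* Assumed MMCR0 bit values. */
#define DEMO_MMCR0_FC   UINT64_C(0x80000000)
#define DEMO_MMCR0_PMAE UINT64_C(0x04000000)

/* Old behaviour: unfreeze only; PMAE stays however the EBB left it. */
static uint64_t demo_unfreeze_old(uint64_t mmcr0)
{
	return mmcr0 & ~DEMO_MMCR0_FC;
}

/* Patched behaviour: unfreeze and re-arm the performance monitor alert. */
static uint64_t demo_unfreeze_new(uint64_t mmcr0)
{
	return (mmcr0 & ~DEMO_MMCR0_FC) | DEMO_MMCR0_PMAE;
}
```

Starting from a frozen MMCR0 whose PMAE bit has been cleared, the old expression leaves alerts disabled, which is why the test could spin until the harness timeout.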



Re: [PATCH 03/35] docs: fix broken references to text files

2020-04-08 Thread Paul E. McKenney
On Wed, Apr 08, 2020 at 05:45:55PM +0200, Mauro Carvalho Chehab wrote:
> Several references got broken due to txt to ReST conversion.
> 
> Several of them can be automatically fixed with:
> 
>   scripts/documentation-file-ref-check --fix
> 
> Reviewed-by: Mathieu Poirier  # 
> hwtracing/coresight/Kconfig
> Signed-off-by: Mauro Carvalho Chehab 

For the memory-barriers.txt portions:

Reviewed-by: Paul E. McKenney 

> ---
>  Documentation/memory-barriers.txt|  2 +-
>  Documentation/process/submit-checklist.rst   |  2 +-
>  .../translations/it_IT/process/submit-checklist.rst  |  2 +-
>  Documentation/translations/ko_KR/memory-barriers.txt |  2 +-
>  .../translations/zh_CN/filesystems/sysfs.txt |  2 +-
>  .../translations/zh_CN/process/submit-checklist.rst  |  2 +-
>  Documentation/virt/kvm/arm/pvtime.rst|  2 +-
>  Documentation/virt/kvm/devices/vcpu.rst  |  2 +-
>  Documentation/virt/kvm/hypercalls.rst|  4 ++--
>  arch/powerpc/include/uapi/asm/kvm_para.h |  2 +-
>  drivers/gpu/drm/Kconfig  |  2 +-
>  drivers/gpu/drm/drm_ioctl.c  |  2 +-
>  drivers/hwtracing/coresight/Kconfig  |  2 +-
>  fs/fat/Kconfig   |  8 
>  fs/fuse/Kconfig  |  2 +-
>  fs/fuse/dev.c|  2 +-
>  fs/overlayfs/Kconfig |  6 +++---
>  include/linux/mm.h   |  4 ++--
>  include/uapi/linux/ethtool_netlink.h |  2 +-
>  include/uapi/rdma/rdma_user_ioctl_cmds.h |  2 +-
>  mm/gup.c | 12 ++--
>  virt/kvm/arm/vgic/vgic-mmio-v3.c |  2 +-
>  virt/kvm/arm/vgic/vgic.h |  4 ++--
>  23 files changed, 36 insertions(+), 36 deletions(-)
> 
> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
> index e1c355e84edd..eaabc3134294 100644
> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -620,7 +620,7 @@ because the CPUs that the Linux kernel supports don't do writes
>  until they are certain (1) that the write will actually happen, (2)
>  of the location of the write, and (3) of the value to be written.
>  But please carefully read the "CONTROL DEPENDENCIES" section and the
> -Documentation/RCU/rcu_dereference.txt file:  The compiler can and does
> +Documentation/RCU/rcu_dereference.rst file:  The compiler can and does
>  break dependencies in a great many highly creative ways.
>  
>   CPU 1 CPU 2
> diff --git a/Documentation/process/submit-checklist.rst b/Documentation/process/submit-checklist.rst
> index 8e56337d422d..3f8e9d5d95c2 100644
> --- a/Documentation/process/submit-checklist.rst
> +++ b/Documentation/process/submit-checklist.rst
> @@ -107,7 +107,7 @@ and elsewhere regarding submitting Linux kernel patches.
>  and why.
>  
>  26) If any ioctl's are added by the patch, then also update
> -``Documentation/ioctl/ioctl-number.rst``.
> +``Documentation/userspace-api/ioctl/ioctl-number.rst``.
>  
>  27) If your modified source code depends on or uses any of the kernel
>  APIs or features that are related to the following ``Kconfig`` symbols,
> diff --git a/Documentation/translations/it_IT/process/submit-checklist.rst b/Documentation/translations/it_IT/process/submit-checklist.rst
> index 995ee69fab11..3e575502690f 100644
> --- a/Documentation/translations/it_IT/process/submit-checklist.rst
> +++ b/Documentation/translations/it_IT/process/submit-checklist.rst
> @@ -117,7 +117,7 @@ sottomissione delle patch, in particolare
>  sorgenti che ne spieghi la logica: cosa fanno e perché.
>  
>  25) Se la patch aggiunge nuove chiamate ioctl, allora aggiornate
> -``Documentation/ioctl/ioctl-number.rst``.
> +``Documentation/userspace-api/ioctl/ioctl-number.rst``.
>  
> 26) Se il codice che avete modificato dipende o usa una qualsiasi interfaccia o
>  funzionalità del kernel che è associata a uno dei seguenti simboli
> diff --git a/Documentation/translations/ko_KR/memory-barriers.txt b/Documentation/translations/ko_KR/memory-barriers.txt
> index 2e831ece6e26..e50fe6541335 100644
> --- a/Documentation/translations/ko_KR/memory-barriers.txt
> +++ b/Documentation/translations/ko_KR/memory-barriers.txt
> @@ -641,7 +641,7 @@ P 는 짝수 번호 캐시 라인에 저장되어 있고, 변수 B 는 홀수 
>  리눅스 커널이 지원하는 CPU 들은 (1) 쓰기가 정말로 일어날지, (2) 쓰기가 어디에
>  이루어질지, 그리고 (3) 쓰여질 값을 확실히 알기 전까지는 쓰기를 수행하지 않기
>  때문입니다.  하지만 "컨트롤 의존성" 섹션과
> -Documentation/RCU/rcu_dereference.txt 파일을 주의 깊게 읽어 주시기 바랍니다:
> +Documentation/RCU/rcu_dereference.rst 파일을 주의 깊게 읽어 주시기 바랍니다:
>  컴파일러는 매우 창의적인 많은 방법으로 종속성을 깰 수 있습니다.
>  
>   CPU 1 CPU 2
> diff --git a/Documentation/translations/zh_CN/filesystems/sysfs.txt 
> 

Re: [PATCH] soc: fsl: dpio: avoid stack usage warning

2020-04-08 Thread Russell King - ARM Linux admin
On Wed, Apr 08, 2020 at 08:58:16PM +0200, Arnd Bergmann wrote:
> A 1024 byte variable on the stack will warn on any 32-bit architecture
> during compile-testing, and is generally a bad idea anyway:
> 
> fsl/dpio/dpio-service.c: In function 'dpaa2_io_service_enqueue_multiple_desc_fq':
> fsl/dpio/dpio-service.c:495:1: error: the frame size of 1032 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
> 
> There are currently no callers of this function, so I cannot tell whether
> dynamic memory allocation is allowed once callers are added. Change
> it to kcalloc for now, if anyone gets a warning about calling this in
> atomic context after they start using it, they can fix it later.
> 
> Fixes: 9d98809711ae ("soc: fsl: dpio: Adding QMAN multiple enqueue interface")
> Signed-off-by: Arnd Bergmann 
> ---
>  drivers/soc/fsl/dpio/dpio-service.c | 18 +-
>  1 file changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/soc/fsl/dpio/dpio-service.c b/drivers/soc/fsl/dpio/dpio-service.c
> index cd4f6410e8c2..ff0ef8cbdbff 100644
> --- a/drivers/soc/fsl/dpio/dpio-service.c
> +++ b/drivers/soc/fsl/dpio/dpio-service.c
> @@ -478,12 +478,17 @@ int dpaa2_io_service_enqueue_multiple_desc_fq(struct dpaa2_io *d,
>   const struct dpaa2_fd *fd,
>   int nb)
>  {
> - int i;
> - struct qbman_eq_desc ed[32];
> + struct qbman_eq_desc *ed = kcalloc(sizeof(struct qbman_eq_desc), 32, GFP_KERNEL);

I think you need to rearrange this to be more compliant with the coding
style.

> + int i, ret;
> +
> + if (!ed)
> + return -ENOMEM;
>  
>   d = service_select(d);
> - if (!d)
> - return -ENODEV;
> + if (!d) {
> + ret = -ENODEV;
> + goto out;
> + }
>  
>   for (i = 0; i < nb; i++) {
>   qbman_eq_desc_clear(&ed[i]);
> @@ -491,7 +496,10 @@ int dpaa2_io_service_enqueue_multiple_desc_fq(struct dpaa2_io *d,
>   qbman_eq_desc_set_fq(&ed[i], fqid[i]);
>   }
>  
> - return qbman_swp_enqueue_multiple_desc(d->swp, &ed[0], fd, nb);
> + ret = qbman_swp_enqueue_multiple_desc(d->swp, &ed[0], fd, nb);
> +out:
> + kfree(ed);
> + return ret;
>  }
>  EXPORT_SYMBOL(dpaa2_io_service_enqueue_multiple_desc_fq);
>  
> -- 
> 2.26.0
> 
> 

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 10.2Mbps down 587kbps up
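One way to address the coding-style remark is to split the declaration from the allocation and pass the element count first, as kcalloc(n, size, flags) expects. The userspace sketch below uses calloc() in place of kcalloc(); struct demo_eq_desc is a hypothetical 32-byte stand-in for struct qbman_eq_desc (32 x 32 bytes gives the 1024 bytes from the warning), and the nb bounds check and service_ok flag are hypothetical additions that the original function does not have.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical 32-byte stand-in for struct qbman_eq_desc. */
struct demo_eq_desc {
	unsigned char bytes[32];
};

#define DEMO_MAX_DESC 32

static int demo_enqueue_multiple(int service_ok, const unsigned int *fqid,
				 int nb)
{
	struct demo_eq_desc *ed;
	int i, ret;

	if (nb < 0 || nb > DEMO_MAX_DESC)	/* hypothetical bounds check */
		return -EINVAL;

	/* Count first, then element size, mirroring kcalloc(n, size, flags). */
	ed = calloc(DEMO_MAX_DESC, sizeof(*ed));
	if (!ed)
		return -ENOMEM;

	if (!service_ok) {			/* models service_select() failing */
		ret = -ENODEV;
		goto out;
	}

	for (i = 0; i < nb; i++)
		memcpy(ed[i].bytes, &fqid[i], sizeof(fqid[i]));

	ret = nb;				/* pretend all nb descriptors were enqueued */
out:
	free(ed);
	return ret;
}
```

The single "out:" label keeps one free() on every path after the allocation, matching the structure of the patch.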


[PATCH] soc: fsl: dpio: fix incorrect pointer conversions

2020-04-08 Thread Arnd Bergmann
Building dpio for 32 bit shows a new compiler warning from converting
a pointer to a u64:

drivers/soc/fsl/dpio/qbman-portal.c: In function 'qbman_swp_enqueue_multiple_desc_direct':
drivers/soc/fsl/dpio/qbman-portal.c:870:14: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
  870 |  addr_cena = (uint64_t)s->addr_cena;

The variable is not used anywhere, so removing the assignment seems
to be the correct workaround. After spotting what seemed to be
some confusion about address spaces, I ran the file through sparse,
which showed more warnings:

drivers/soc/fsl/dpio/qbman-portal.c:756:42: warning: incorrect type in argument 1 (different address spaces)
drivers/soc/fsl/dpio/qbman-portal.c:756:42:expected void const volatile [noderef]  *addr
drivers/soc/fsl/dpio/qbman-portal.c:756:42:got unsigned int [usertype] *[assigned] p
drivers/soc/fsl/dpio/qbman-portal.c:902:42: warning: incorrect type in argument 1 (different address spaces)
drivers/soc/fsl/dpio/qbman-portal.c:902:42:expected void const volatile [noderef]  *addr
drivers/soc/fsl/dpio/qbman-portal.c:902:42:got unsigned int [usertype] *[assigned] p

Here, the problem is passing a token from memremap() into __raw_readl(),
which is only defined to work on MMIO addresses, not on RAM. Turning
this into a simple pointer dereference avoids this warning as well.

Fixes: 3b2abda7d28c ("soc: fsl: dpio: Replace QMAN array mode with ring mode enqueue")
Signed-off-by: Arnd Bergmann 
---
 drivers/soc/fsl/dpio/qbman-portal.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/soc/fsl/dpio/qbman-portal.c b/drivers/soc/fsl/dpio/qbman-portal.c
index d1f49caa5b13..804b8ba9bf5c 100644
--- a/drivers/soc/fsl/dpio/qbman-portal.c
+++ b/drivers/soc/fsl/dpio/qbman-portal.c
@@ -753,7 +753,7 @@ int qbman_swp_enqueue_multiple_mem_back(struct qbman_swp *s,
if (!s->eqcr.available) {
eqcr_ci = s->eqcr.ci;
p = s->addr_cena + QBMAN_CENA_SWP_EQCR_CI_MEMBACK;
-   s->eqcr.ci = __raw_readl(p) & full_mask;
+   s->eqcr.ci = *p & full_mask;
s->eqcr.available = qm_cyc_diff(s->eqcr.pi_ring_size,
eqcr_ci, s->eqcr.ci);
if (!s->eqcr.available) {
@@ -823,7 +823,6 @@ int qbman_swp_enqueue_multiple_desc_direct(struct qbman_swp *s,
const uint32_t *cl;
uint32_t eqcr_ci, eqcr_pi, half_mask, full_mask;
int i, num_enqueued = 0;
-   uint64_t addr_cena;
 
half_mask = (s->eqcr.pi_ci_mask>>1);
full_mask = s->eqcr.pi_ci_mask;
@@ -867,7 +866,6 @@ int qbman_swp_enqueue_multiple_desc_direct(struct qbman_swp *s,
 
/* Flush all the cacheline without load/store in between */
eqcr_pi = s->eqcr.pi;
-   addr_cena = (uint64_t)s->addr_cena;
for (i = 0; i < num_enqueued; i++)
eqcr_pi++;
s->eqcr.pi = eqcr_pi & full_mask;
@@ -901,7 +899,7 @@ int qbman_swp_enqueue_multiple_desc_mem_back(struct qbman_swp *s,
if (!s->eqcr.available) {
eqcr_ci = s->eqcr.ci;
p = s->addr_cena + QBMAN_CENA_SWP_EQCR_CI_MEMBACK;
-   s->eqcr.ci = __raw_readl(p) & full_mask;
+   s->eqcr.ci = *p & full_mask;
s->eqcr.available = qm_cyc_diff(s->eqcr.pi_ring_size,
eqcr_ci, s->eqcr.ci);
if (!s->eqcr.available)
-- 
2.26.0



[PATCH] soc: fsl: dpio: avoid stack usage warning

2020-04-08 Thread Arnd Bergmann
A 1024 byte variable on the stack will warn on any 32-bit architecture
during compile-testing, and is generally a bad idea anyway:

fsl/dpio/dpio-service.c: In function 'dpaa2_io_service_enqueue_multiple_desc_fq':
fsl/dpio/dpio-service.c:495:1: error: the frame size of 1032 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

There are currently no callers of this function, so I cannot tell whether
dynamic memory allocation is allowed once callers are added. Change it to
kcalloc for now; if anyone gets a warning about calling this in atomic
context after they start using it, they can fix it later.

Fixes: 9d98809711ae ("soc: fsl: dpio: Adding QMAN multiple enqueue interface")
Signed-off-by: Arnd Bergmann 
---
 drivers/soc/fsl/dpio/dpio-service.c | 18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/soc/fsl/dpio/dpio-service.c b/drivers/soc/fsl/dpio/dpio-service.c
index cd4f6410e8c2..ff0ef8cbdbff 100644
--- a/drivers/soc/fsl/dpio/dpio-service.c
+++ b/drivers/soc/fsl/dpio/dpio-service.c
@@ -478,12 +478,17 @@ int dpaa2_io_service_enqueue_multiple_desc_fq(struct dpaa2_io *d,
const struct dpaa2_fd *fd,
int nb)
 {
-   int i;
-   struct qbman_eq_desc ed[32];
+   struct qbman_eq_desc *ed = kcalloc(sizeof(struct qbman_eq_desc), 32, GFP_KERNEL);
+   int i, ret;
+
+   if (!ed)
+   return -ENOMEM;
 
d = service_select(d);
-   if (!d)
-   return -ENODEV;
+   if (!d) {
+   ret = -ENODEV;
+   goto out;
+   }
 
for (i = 0; i < nb; i++) {
qbman_eq_desc_clear(&ed[i]);
@@ -491,7 +496,10 @@ int dpaa2_io_service_enqueue_multiple_desc_fq(struct dpaa2_io *d,
qbman_eq_desc_set_fq(&ed[i], fqid[i]);
}
 
-   return qbman_swp_enqueue_multiple_desc(d->swp, &ed[0], fd, nb);
+   ret = qbman_swp_enqueue_multiple_desc(d->swp, &ed[0], fd, nb);
+out:
+   kfree(ed);
+   return ret;
 }
 EXPORT_SYMBOL(dpaa2_io_service_enqueue_multiple_desc_fq);
 
-- 
2.26.0



Re: [PATCH v5 18/21] powerpc64: Add prefixed instructions to instruction data type

2020-04-08 Thread Christophe Leroy




On 08/04/2020 at 20:11, Segher Boessenkool wrote:
> On Mon, Apr 06, 2020 at 12:25:27PM +0200, Christophe Leroy wrote:
> > > if (ppc_inst_prefixed(x) != ppc_inst_prefixed(y))
> > > return false;
> > > else if (ppc_inst_prefixed(x))
> > > return !memcmp(&x, &y, sizeof(struct ppc_inst));
> > 
> > Are we sure memcmp() is a good candidate for the comparison ? Can we do
> > simpler ? Especially, I understood a prefixed instruction is a 64 bits
> > properly aligned instruction, can we do a simple u64 compare ? Or is GCC
> > intelligent enough to do that without calling memcmp() function which is
> > heavy ?
> 
> A prefixed insn is *not* 8-byte aligned, it is 4-byte aligned, fwiw.

Ah, yes, I read too fast https://patchwork.ozlabs.org/patch/1266721/

It's not 64 bits, it is 64 bytes.

> memcmp() isn't as heavy as you fear, not with a non-ancient GCC at least.
> But this could be written in a nicer way, sure :-)
> 
> Segher

Christophe


Re: [PATCH 1/1] powerpc/crash: Use NMI context for printk after crashing other CPUs

2020-04-08 Thread Leonardo Bras
On Wed, 2020-04-08 at 22:13 +1000, Michael Ellerman wrote:
[...]
> Added context:
> 
>   printk(KERN_EMERG "Sending IPI to other CPUs\n");
> 
>   if (crash_wake_offline)
>   ncpus = num_present_cpus() - 1;
> 
> >  
> > crash_send_ipi(crash_ipi_callback);
> > smp_wmb();
> > +   printk_nmi_enter();
>   
> Why did you decide to put it there, rather than at the start of
> default_machine_crash_shutdown() like I did?
> 
> The printk() above could have already deadlocked if another CPU is stuck
> with the logbuf lock held.

Oh, I thought the CPUs would start crashing after crash_send_ipi(), so
only printk() after that would possibly deadlock.

I was not able to see how the printk() above would deadlock, but I see
no problem adding that at the start of the function.

Best regards,
Leonardo Bras




Re: [PATCH v5 05/21] powerpc: Use a function for getting the instruction op code

2020-04-08 Thread Segher Boessenkool
Hi!

On Mon, Apr 06, 2020 at 06:09:20PM +1000, Jordan Niethe wrote:
> +static inline int ppc_inst_opcode(u32 x)
> +{
> + return x >> 26;
> +}

Maybe you should have "primary opcode" in this function name?


Segher
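As a sketch of what such a helper computes (and of the "primary opcode" naming suggested above), the userspace C below assumes a 32-bit instruction word already in host byte order; `ppc_primary_opcode()` and `ppc_is_prefix_word()` are hypothetical names, not the kernel's. The primary opcode is the top 6 bits of the word, and in Power ISA v3.1 the first word of a prefixed instruction has primary opcode 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: the primary opcode is the most-significant 6 bits
 * of a 32-bit instruction word (bits 0-5 in the ISA's big-endian bit
 * numbering), so the result is always in 0..63.
 */
static inline unsigned int ppc_primary_opcode(uint32_t insn)
{
	return insn >> 26;
}

/* Power ISA v3.1: a prefix word is identified by primary opcode 1. */
static inline bool ppc_is_prefix_word(uint32_t insn)
{
	return ppc_primary_opcode(insn) == 1;
}
```

For example, `0x38600000` (`li r3,0`, an `addi`) has primary opcode 14, `0x7c0802a6` (`mflr r0`) has 31, and `0x4e800020` (`blr`) has 19.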


Re: [PATCH v5 18/21] powerpc64: Add prefixed instructions to instruction data type

2020-04-08 Thread Segher Boessenkool
On Mon, Apr 06, 2020 at 12:25:27PM +0200, Christophe Leroy wrote:
> > if (ppc_inst_prefixed(x) != ppc_inst_prefixed(y))
> > return false;
> > else if (ppc_inst_prefixed(x))
> > return !memcmp(&x, &y, sizeof(struct ppc_inst));
> 
> Are we sure memcmp() is a good candidate for the comparison ? Can we do 
> simpler ? Especially, I understood a prefixed instruction is a 64 bits 
> properly aligned instruction, can we do a simple u64 compare ? Or is GCC 
> intelligent enough to do that without calling memcmp() function which is 
> heavy ?

A prefixed insn is *not* 8-byte aligned, it is 4-byte aligned, fwiw.

memcmp() isn't as heavy as you fear, not with a non-ancient GCC at least.
But this could be written in a nicer way, sure :-)


Segher
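One possible "nicer way" is to compare the 32-bit words directly instead of calling memcmp(). The sketch below is a userspace approximation: the `struct ppc_inst` layout (a `val` word plus a `suffix` word) is an assumption modelled on the series under review, and this `ppc_inst_equal()` is illustrative rather than the series' actual implementation. Comparing two `uint32_t` fields also avoids assuming 8-byte alignment, since prefixed instructions are only 4-byte aligned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: first word, plus a suffix that only matters when prefixed. */
struct ppc_inst {
	uint32_t val;
	uint32_t suffix;
};

/* Primary opcode 1 marks the prefix word of a prefixed instruction. */
static inline bool ppc_inst_prefixed(struct ppc_inst x)
{
	return (x.val >> 26) == 1;
}

/*
 * Field-by-field comparison: no memcmp() and no 64-bit load.
 * Equal first words imply both are prefixed or both are not,
 * so the suffix is only compared for prefixed instructions.
 */
static inline bool ppc_inst_equal(struct ppc_inst x, struct ppc_inst y)
{
	if (x.val != y.val)
		return false;
	return !ppc_inst_prefixed(x) || x.suffix == y.suffix;
}
```

For a non-prefixed instruction any stale data in `suffix` is ignored, which matches the behaviour of the quoted code (memcmp() is only reached for prefixed instructions).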


Re: [PATCH v3 1/1] ppc/crash: Reset spinlocks during crash

2020-04-08 Thread Leonardo Bras
On Wed, 2020-04-08 at 22:21 +1000, Michael Ellerman wrote:
> We should be able to just allocate the rtas_args on the stack, it's only
> ~80 odd bytes. And then we can use rtas_call_unlocked() which doesn't
> take the global lock.

At this point, would it be a problem using kmalloc? 

Best regards,




Re: [PATCH v3 1/1] ppc/crash: Reset spinlocks during crash

2020-04-08 Thread Leonardo Bras
On Wed, 2020-04-08 at 22:21 +1000, Michael Ellerman wrote:
[...]
> > On the other hand, busting the rtas.lock could be dangerous, because
> > it's code we can't control.
> > 
> > According to LoPAR, for both of these rtas-calls, we have:
> > 
> > For the PowerPC External Interrupt option: The call must be reentrant
> > to the number of processors on the platform.
> > For the PowerPC External Interrupt option: The argument call buffer for
> > each simultaneous call must be physically unique.
> 
> Oh well spotted. Where is that in the doc?

In the current LoPAR available on OpenPower Foundation, it's on page
170, '7.3.10.2 ibm,set-xive' and '7.3.10.3 ibm,int-off'.

> > Which I think means these rtas-calls can be done simultaneously.
> 
> I think so too. I'll read PAPR in the morning and make sure.
> 
> > Would it mean that busting the rtas.lock for these calls would be safe?
> 
> What would be better is to make those specific calls not take the global
> RTAS lock to begin with.
> 
> We should be able to just allocate the rtas_args on the stack, it's only
> ~80 odd bytes. And then we can use rtas_call_unlocked() which doesn't
> take the global lock.

Good idea. I will try getting some work done on this.

Best regards,




Section mismatch in reference from the function .early_init_mmu() to the function .init.text:.radix__early_init_mmu() after PowerPC updates 5.7-1

2020-04-08 Thread Christian Zigotzky

Hello,

Since the PowerPC updates 5.7-1 we have the following issue during the 
linking of vmlinux:


MODPOST vmlinux.o
WARNING: modpost: vmlinux.o(.text.unlikely+0x1a0): Section mismatch in reference from the function .early_init_mmu() to the function .init.text:.radix__early_init_mmu()

The function .early_init_mmu() references
the function __init .radix__early_init_mmu().
This is often because .early_init_mmu lacks a __init
annotation or the annotation of .radix__early_init_mmu is wrong.

WARNING: modpost: vmlinux.o(.text.unlikely+0x1ac): Section mismatch in reference from the function .early_init_mmu() to the function .init.text:.hash__early_init_mmu()

The function .early_init_mmu() references
the function __init .hash__early_init_mmu().
This is often because .early_init_mmu lacks a __init
annotation or the annotation of .hash__early_init_mmu is wrong.

---

But the kernel works without any problems after the linking.

I reverted the following commits:

70fbdfef4ba63eeef83b2c94eac9a5a9f913e442 -- sysfs: remove redundant 
__compat_only_sysfs_link_entry_to_kobj fn


d38c07afc356ddebaa3ed8ecb3f553340e05c969 -- Merge tag 'powerpc-5.7-1' of 
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux


And after that the linking of vmlinux works again.

Please check the PowerPC updates 5.7-1.

Thanks,
Christian


Re: decruft the vmalloc API

2020-04-08 Thread Christoph Hellwig
On Wed, Apr 08, 2020 at 05:03:24PM +0100, Russell King - ARM Linux admin wrote:
> I haven't read all your patches yet.
> 
> Have you tested it on 32-bit ARM, where the module area is located
> _below_ PAGE_OFFSET and outside of the vmalloc area?

I have not tested it.  However existing in-kernel users that use
different areas (and we have quite a few of those) have not been
changed at all.  I think the arm32 module loader (like various other
module loaders) falls into that category.


Re: decruft the vmalloc API

2020-04-08 Thread Russell King - ARM Linux admin
On Wed, Apr 08, 2020 at 01:58:58PM +0200, Christoph Hellwig wrote:
> Hi all,
> 
> Peter noticed that with some dumb luck you can toast the kernel address
> space with exported vmalloc symbols.
> 
> I used this as an opportunity to decruft the vmalloc.c API and make it
> much more systematic.  This also removes any chance to create vmalloc
> mappings outside the designated areas or using executable permissions
> from modules.  Besides that it removes more than 300 lines of code.

I haven't read all your patches yet.

Have you tested it on 32-bit ARM, where the module area is located
_below_ PAGE_OFFSET and outside of the vmalloc area?

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 10.2Mbps down 587kbps up


[Bug 207129] PowerMac G4 DP (5.6.2 debug kernel + inline KASAN) freezes shortly after booting with "do_IRQ: stack overflow: 1760"

2020-04-08 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=207129

--- Comment #6 from Erhard F. (erhar...@mailbox.org) ---
Yes, precisely summarized! Thanks for your efforts!

CONFIG_KASAN is only available on x86_64, though, not on 32-bit x86 AFAIK.


[PATCH] powerpc/kasan: Fix stack overflow by increasing THREAD_SHIFT

2020-04-08 Thread Christophe Leroy
When CONFIG_KASAN is selected, the stack usage is increased.

In the same way as x86 and arm64 architectures, increase
THREAD_SHIFT when CONFIG_KASAN is selected.

Fixes: 2edb16efc899 ("powerpc/32: Add KASAN support")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=207129
Reported-by: 
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 05d20a8d6581..511f9bbc 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -771,6 +771,7 @@ config THREAD_SHIFT
range 13 15
default "15" if PPC_256K_PAGES
default "14" if PPC64
+   default "14" if KASAN
default "13"
help
  Used to define the stack size. The default is almost always what you
-- 
2.25.0
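To make the effect of the patch concrete: on powerpc the kernel stack size is `1 << THREAD_SHIFT` bytes, and when several `default` lines are visible Kconfig uses the first one. The userspace model below (function names are hypothetical) walks the defaults in the order the patch leaves them, showing that a 32-bit KASAN build moves from 13 (8 KiB) to 14 (16 KiB), while 64-bit and 256K-page configurations are unchanged.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the THREAD_SHIFT defaults after the patch.
 * Kconfig picks the first "default" whose condition holds; an explicit
 * user setting within the 13..15 range would override it.
 */
static unsigned int default_thread_shift(bool ppc_256k_pages, bool ppc64,
					 bool kasan)
{
	if (ppc_256k_pages)
		return 15;
	if (ppc64)
		return 14;
	if (kasan)		/* the line added by this patch */
		return 14;
	return 13;
}

/* Stack size in bytes: THREAD_SIZE = 1 << THREAD_SHIFT. */
static unsigned long thread_size(unsigned int thread_shift)
{
	return 1UL << thread_shift;
}
```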



Re: [PATCH 05/35] docs: filesystems: fix renamed references

2020-04-08 Thread David Sterba
On Wed, Apr 08, 2020 at 05:45:57PM +0200, Mauro Carvalho Chehab wrote:
> Some filesystem references got broken by a previous patch
> series I submitted. Address those.
> 
> Signed-off-by: Mauro Carvalho Chehab 

For

>  fs/affs/Kconfig | 2 +-

Acked-by: David Sterba 


[PATCH 05/35] docs: filesystems: fix renamed references

2020-04-08 Thread Mauro Carvalho Chehab
Some filesystem references got broken by a previous patch
series I submitted. Address those.

Signed-off-by: Mauro Carvalho Chehab 
---
 Documentation/ABI/stable/sysfs-devices-node | 2 +-
 Documentation/ABI/testing/procfs-smaps_rollup   | 2 +-
 Documentation/admin-guide/cpu-load.rst  | 2 +-
 Documentation/admin-guide/nfs/nfsroot.rst   | 2 +-
 Documentation/driver-api/driver-model/device.rst| 4 ++--
 Documentation/driver-api/driver-model/overview.rst  | 2 +-
 Documentation/filesystems/dax.txt   | 2 +-
 Documentation/filesystems/dnotify.txt   | 2 +-
 Documentation/filesystems/ramfs-rootfs-initramfs.rst| 2 +-
 Documentation/filesystems/sysfs.rst | 2 +-
 Documentation/powerpc/firmware-assisted-dump.rst| 2 +-
 Documentation/process/adding-syscalls.rst   | 2 +-
 .../translations/it_IT/process/adding-syscalls.rst  | 2 +-
 Documentation/translations/zh_CN/filesystems/sysfs.txt  | 6 +++---
 drivers/base/core.c | 2 +-
 drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h | 2 +-
 fs/Kconfig  | 2 +-
 fs/Kconfig.binfmt   | 2 +-
 fs/adfs/Kconfig | 2 +-
 fs/affs/Kconfig | 2 +-
 fs/afs/Kconfig  | 6 +++---
 fs/bfs/Kconfig  | 2 +-
 fs/cramfs/Kconfig   | 2 +-
 fs/ecryptfs/Kconfig | 2 +-
 fs/hfs/Kconfig  | 2 +-
 fs/hpfs/Kconfig | 2 +-
 fs/isofs/Kconfig| 2 +-
 fs/namespace.c  | 2 +-
 fs/notify/inotify/Kconfig   | 2 +-
 fs/ntfs/Kconfig | 2 +-
 fs/ocfs2/Kconfig| 2 +-
 fs/proc/Kconfig | 4 ++--
 fs/romfs/Kconfig| 2 +-
 fs/sysfs/dir.c  | 2 +-
 fs/sysfs/file.c | 2 +-
 fs/sysfs/mount.c| 2 +-
 fs/sysfs/symlink.c  | 2 +-
 fs/sysv/Kconfig | 2 +-
 fs/udf/Kconfig  | 2 +-
 include/linux/kobject.h | 2 +-
 include/linux/kobject_ns.h  | 2 +-
 include/linux/relay.h   | 2 +-
 include/linux/sysfs.h   | 2 +-
 kernel/relay.c  | 2 +-
 lib/kobject.c   | 4 ++--
 45 files changed, 52 insertions(+), 52 deletions(-)

diff --git a/Documentation/ABI/stable/sysfs-devices-node b/Documentation/ABI/stable/sysfs-devices-node
index df8413cf1468..484fc04bcc25 100644
--- a/Documentation/ABI/stable/sysfs-devices-node
+++ b/Documentation/ABI/stable/sysfs-devices-node
@@ -54,7 +54,7 @@ Date: October 2002
 Contact:   Linux Memory Management list 
 Description:
Provides information about the node's distribution and memory
-   utilization. Similar to /proc/meminfo, see Documentation/filesystems/proc.txt
+   utilization. Similar to /proc/meminfo, see Documentation/filesystems/proc.rst
 
 What:  /sys/devices/system/node/nodeX/numastat
 Date:  October 2002
diff --git a/Documentation/ABI/testing/procfs-smaps_rollup b/Documentation/ABI/testing/procfs-smaps_rollup
index 274df44d8b1b..046978193368 100644
--- a/Documentation/ABI/testing/procfs-smaps_rollup
+++ b/Documentation/ABI/testing/procfs-smaps_rollup
@@ -11,7 +11,7 @@ Description:
Additionally, the fields Pss_Anon, Pss_File and Pss_Shmem
are not present in /proc/pid/smaps.  These fields represent
the sum of the Pss field of each type (anon, file, shmem).
-   For more details, see Documentation/filesystems/proc.txt
+   For more details, see Documentation/filesystems/proc.rst
and the procfs man page.
 
Typical output looks like this:
diff --git a/Documentation/admin-guide/cpu-load.rst b/Documentation/admin-guide/cpu-load.rst
index 2d01ce43d2a2..ebdecf864080 100644
--- a/Documentation/admin-guide/cpu-load.rst
+++ b/Documentation/admin-guide/cpu-load.rst
@@ -105,7 +105,7 @@ References
 --
 
 - 

[PATCH 03/35] docs: fix broken references to text files

2020-04-08 Thread Mauro Carvalho Chehab
Several references got broken due to txt to ReST conversion.

Several of them can be automatically fixed with:

scripts/documentation-file-ref-check --fix

Reviewed-by: Mathieu Poirier  # hwtracing/coresight/Kconfig
Signed-off-by: Mauro Carvalho Chehab 
---
 Documentation/memory-barriers.txt|  2 +-
 Documentation/process/submit-checklist.rst   |  2 +-
 .../translations/it_IT/process/submit-checklist.rst  |  2 +-
 Documentation/translations/ko_KR/memory-barriers.txt |  2 +-
 .../translations/zh_CN/filesystems/sysfs.txt |  2 +-
 .../translations/zh_CN/process/submit-checklist.rst  |  2 +-
 Documentation/virt/kvm/arm/pvtime.rst|  2 +-
 Documentation/virt/kvm/devices/vcpu.rst  |  2 +-
 Documentation/virt/kvm/hypercalls.rst|  4 ++--
 arch/powerpc/include/uapi/asm/kvm_para.h |  2 +-
 drivers/gpu/drm/Kconfig  |  2 +-
 drivers/gpu/drm/drm_ioctl.c  |  2 +-
 drivers/hwtracing/coresight/Kconfig  |  2 +-
 fs/fat/Kconfig   |  8 
 fs/fuse/Kconfig  |  2 +-
 fs/fuse/dev.c|  2 +-
 fs/overlayfs/Kconfig |  6 +++---
 include/linux/mm.h   |  4 ++--
 include/uapi/linux/ethtool_netlink.h |  2 +-
 include/uapi/rdma/rdma_user_ioctl_cmds.h |  2 +-
 mm/gup.c | 12 ++--
 virt/kvm/arm/vgic/vgic-mmio-v3.c |  2 +-
 virt/kvm/arm/vgic/vgic.h |  4 ++--
 23 files changed, 36 insertions(+), 36 deletions(-)

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index e1c355e84edd..eaabc3134294 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -620,7 +620,7 @@ because the CPUs that the Linux kernel supports don't do writes
 until they are certain (1) that the write will actually happen, (2)
 of the location of the write, and (3) of the value to be written.
 But please carefully read the "CONTROL DEPENDENCIES" section and the
-Documentation/RCU/rcu_dereference.txt file:  The compiler can and does
+Documentation/RCU/rcu_dereference.rst file:  The compiler can and does
 break dependencies in a great many highly creative ways.
 
CPU 1 CPU 2
diff --git a/Documentation/process/submit-checklist.rst b/Documentation/process/submit-checklist.rst
index 8e56337d422d..3f8e9d5d95c2 100644
--- a/Documentation/process/submit-checklist.rst
+++ b/Documentation/process/submit-checklist.rst
@@ -107,7 +107,7 @@ and elsewhere regarding submitting Linux kernel patches.
 and why.
 
 26) If any ioctl's are added by the patch, then also update
-``Documentation/ioctl/ioctl-number.rst``.
+``Documentation/userspace-api/ioctl/ioctl-number.rst``.
 
 27) If your modified source code depends on or uses any of the kernel
 APIs or features that are related to the following ``Kconfig`` symbols,
diff --git a/Documentation/translations/it_IT/process/submit-checklist.rst b/Documentation/translations/it_IT/process/submit-checklist.rst
index 995ee69fab11..3e575502690f 100644
--- a/Documentation/translations/it_IT/process/submit-checklist.rst
+++ b/Documentation/translations/it_IT/process/submit-checklist.rst
@@ -117,7 +117,7 @@ sottomissione delle patch, in particolare
 sorgenti che ne spieghi la logica: cosa fanno e perché.
 
 25) Se la patch aggiunge nuove chiamate ioctl, allora aggiornate
-``Documentation/ioctl/ioctl-number.rst``.
+``Documentation/userspace-api/ioctl/ioctl-number.rst``.
 
 26) Se il codice che avete modificato dipende o usa una qualsiasi interfaccia o
 funzionalità del kernel che è associata a uno dei seguenti simboli
diff --git a/Documentation/translations/ko_KR/memory-barriers.txt b/Documentation/translations/ko_KR/memory-barriers.txt
index 2e831ece6e26..e50fe6541335 100644
--- a/Documentation/translations/ko_KR/memory-barriers.txt
+++ b/Documentation/translations/ko_KR/memory-barriers.txt
@@ -641,7 +641,7 @@ P 는 짝수 번호 캐시 라인에 저장되어 있고, 변수 B 는 홀수 
 리눅스 커널이 지원하는 CPU 들은 (1) 쓰기가 정말로 일어날지, (2) 쓰기가 어디에
 이루어질지, 그리고 (3) 쓰여질 값을 확실히 알기 전까지는 쓰기를 수행하지 않기
 때문입니다.  하지만 "컨트롤 의존성" 섹션과
-Documentation/RCU/rcu_dereference.txt 파일을 주의 깊게 읽어 주시기 바랍니다:
+Documentation/RCU/rcu_dereference.rst 파일을 주의 깊게 읽어 주시기 바랍니다:
 컴파일러는 매우 창의적인 많은 방법으로 종속성을 깰 수 있습니다.
 
CPU 1 CPU 2
diff --git a/Documentation/translations/zh_CN/filesystems/sysfs.txt b/Documentation/translations/zh_CN/filesystems/sysfs.txt
index ee1f37da5b23..a15c3ebdfa82 100644
--- a/Documentation/translations/zh_CN/filesystems/sysfs.txt
+++ b/Documentation/translations/zh_CN/filesystems/sysfs.txt
@@ -281,7 +281,7 @@ drivers/ 包含了每个已为特定总线上的设备而挂载的驱动程序
 假定驱动没有跨越多个总线类型)。
 
 fs/ 包含了一个为文件系统设立的目录。现在每个想要导出属性的文件系统必须
-在 fs/ 

[PATCH 31/35] powerpc: docs: cxl.rst: mark two section titles as such

2020-04-08 Thread Mauro Carvalho Chehab
The User API chapter contains two sub-chapters. Mark them as
such.

Signed-off-by: Mauro Carvalho Chehab 
---
 Documentation/powerpc/cxl.rst | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/powerpc/cxl.rst b/Documentation/powerpc/cxl.rst
index 920546d81326..d2d77057610e 100644
--- a/Documentation/powerpc/cxl.rst
+++ b/Documentation/powerpc/cxl.rst
@@ -133,6 +133,7 @@ User API
 
 
 1. AFU character devices
+
 
 For AFUs operating in AFU directed mode, two character device
 files will be created. /dev/cxl/afu0.0m will correspond to a
@@ -395,6 +396,7 @@ read
 
 
 2. Card character device (powerVM guest only)
+^
 
 In a powerVM guest, an extra character device is created for the
 card. The device is only used to write (flash) a new image on the
-- 
2.25.2



[PATCH 00/35] Documentation fixes for Kernel 5.8

2020-04-08 Thread Mauro Carvalho Chehab
Hi Jon,

I have a large list of patches this time for the Documentation/. So, I'm
starting to send them a little earlier. However, they are meant to be applied
after the end of the merge window. They're based on today's linux-next,
which has only 49 patches pending to be applied upstream touching
Documentation/, so I don't expect many conflicts if applied early in the
-rc cycle.

Most of the patches here were already submitted but haven't been
merged into linux-next yet, so it seems that nobody has picked them up.

In any case, most of the patches here are independent of
the others.

The number of doc build warnings has been rising over time.
The main goal of this series is to get rid of most Sphinx warnings
and other errors.

Patches 1 to 5: fix broken references detected by this tool:

./scripts/documentation-file-ref-check

The other patches fix other random errors due to tags being
mis-interpreted or mis-used.

You'll notice that several patches touch kernel-doc scripts.
IMHO, some of the warnings are actually due to kernel-doc being
too pedantic. So, I ended up improving some things in the toolset
to make it smarter. That's the case for these patches:

docs: scripts/kernel-doc: accept blank lines on parameter description
scripts: kernel-doc: accept negation like !@var
scripts: kernel-doc: proper handle @foo->bar()

The last 4 patches address problems with PDF building.

The first one addresses a conflict that will arise during the merge
window: Documentation/media will be removed. Instead of
just dropping it from the list of PDF documents, I opted to drop the
entire list, as conf.py will auto-generate it from the sources:

docs: LaTeX/PDF: drop list of documents

Also, right now, PDF output is broken due to a namespace conflict
in I2C (two PDF outputs there would have the same name).

docs: i2c: rename i2c.svg to i2c_bus.svg

The third PDF patch is not really a fix, but it helps a lot in identifying
whether the build succeeded, by placing the final PDF output in
a separate dir:

docs: Makefile: place final pdf docs on a separate dir

Finally, the last one addresses a problem present since the first supported
Sphinx version, which also impacts PDF output: basically, while nested tables
are valid ReST notation, the toolset only started supporting
them in PDF output in version 2.4:

docs: update recommended Sphinx version to 2.4.4

PS.: Due to the large number of Cc's, I opted to keep a smaller
set of Cc's in this first e-mail (only e-mails with the "L:" tag from the
MAINTAINERS file).

Mauro Carvalho Chehab (35):
  MAINTAINERS: dt: update display/allwinner file entry
  docs: dt: fix broken reference to phy-cadence-torrent.yaml
  docs: fix broken references to text files
  docs: fix broken references for ReST files that moved around
  docs: filesystems: fix renamed references
  docs: amu: supress some Sphinx warnings
  docs: arm64: booting.rst: get rid of some warnings
  docs: pci: boot-interrupts.rst: improve html output
  futex: get rid of a kernel-docs build warning
  firewire: firewire-cdev.hL get rid of a docs warning
  scripts: kernel-doc: proper handle @foo->bar()
  lib: bitmap.c: get rid of some doc warnings
  ata: libata-core: fix a doc warning
  fs: inode.c: get rid of docs warnings
  docs: ras: get rid of some warnings
  docs: ras: don't need to repeat twice the same thing
  docs: watch_queue.rst: supress some Sphinx warnings
  scripts: kernel-doc: accept negation like !@var
  docs: infiniband: verbs.c: fix some documentation warnings
  docs: scripts/kernel-doc: accept blank lines on parameter description
  docs: spi: spi.h: fix a doc building warning
  docs: drivers: fix some warnings at base/platform.c when building docs
  docs: fusion: mptbase.c: get rid of a doc build warning
  docs: mm: slab.h: fix a broken cross-reference
  docs mm: userfaultfd.rst: use ``foo`` for literals
  docs: mm: userfaultfd.rst: use a cross-reference for a section
  docs: vm: index.rst: add an orphan doc to the building system
  docs: dt: qcom,dwc3.txt: fix cross-reference for a converted file
  MAINTAINERS: dt: fix pointers for ARM Integrator, Versatile and
RealView
  docs: dt: fix a broken reference for a file converted to json
  powerpc: docs: cxl.rst: mark two section titles as such
  docs: LaTeX/PDF: drop list of documents
  docs: i2c: rename i2c.svg to i2c_bus.svg
  docs: Makefile: place final pdf docs on a separate dir
  docs: update recommended Sphinx version to 2.4.4

 Documentation/ABI/stable/sysfs-devices-node   |   2 +-
 Documentation/ABI/testing/procfs-smaps_rollup |   2 +-
 Documentation/Makefile|   6 +-
 Documentation/PCI/boot-interrupts.rst |  34 +--
 Documentation/admin-guide/cpu-load.rst|   2 +-
 Documentation/admin-guide/mm/userfaultfd.rst  | 209 +-
 Documentation/admin-guide/nfs/nfsroot.rst |   2 +-
 Documentation/admin-guide/ras.rst |  18 +-
 Documentation/arm64/amu.rst   

Re: [PATCH 02/28] staging: android: ion: use vmap instead of vm_map_ram

2020-04-08 Thread Christoph Hellwig
On Wed, Apr 08, 2020 at 08:48:33PM +0800, Hillf Danton wrote:
> > -   void *addr = vm_map_ram(pages, num, -1, pgprot);
> > +   void *addr = vmap(pages, num, VM_MAP);
> 
> A merge glitch?
> 
> void *vmap(struct page **pages, unsigned int count,
>  unsigned long flags, pgprot_t prot)

Yes, thanks for the headsup, you were as fast as the build bot :)

Fixed now.


Re: [PATCH 10/28] mm: only allow page table mappings for built-in zsmalloc

2020-04-08 Thread Randy Dunlap
On 4/8/20 8:36 AM, Christoph Hellwig wrote:
> On Wed, Apr 08, 2020 at 08:15:19AM -0700, Matthew Wilcox wrote:
>  config ZSMALLOC_PGTABLE_MAPPING
>   bool "Use page table mapping to access object in zsmalloc"
> - depends on ZSMALLOC
> + depends on ZSMALLOC=y

 It's a bool so this shouldn't matter... not needed.
>>>
>>> My mm/Kconfig has:
>>>
>>> config ZSMALLOC
>>> tristate "Memory allocator for compressed pages"
>>> depends on MMU
>>>
>>> which I think means it can be modular, no?
>>
>> Randy means that ZSMALLOC_PGTABLE_MAPPING is a bool, so I think hch's patch
>> is wrong ... if ZSMALLOC is 'm' then ZSMALLOC_PGTABLE_MAPPING would become
>> 'n' instead of 'y'.
> 
> In Linus' tree you can select PGTABLE_MAPPING=y with ZSMALLOC=m,
> and that fits my understanding of the kbuild language.  With this
> patch I can't anymore.
> 

Makes sense. thanks.

-- 
~Randy



Re: [PATCH 10/28] mm: only allow page table mappings for built-in zsmalloc

2020-04-08 Thread Randy Dunlap
On 4/8/20 8:15 AM, Matthew Wilcox wrote:
> On Wed, Apr 08, 2020 at 05:12:03PM +0200, Peter Zijlstra wrote:
>> On Wed, Apr 08, 2020 at 08:01:00AM -0700, Randy Dunlap wrote:
>>> Hi,
>>>
>>> On 4/8/20 4:59 AM, Christoph Hellwig wrote:
 diff --git a/mm/Kconfig b/mm/Kconfig
 index 36949a9425b8..614cc786b519 100644
 --- a/mm/Kconfig
 +++ b/mm/Kconfig
 @@ -702,7 +702,7 @@ config ZSMALLOC
  
  config ZSMALLOC_PGTABLE_MAPPING
bool "Use page table mapping to access object in zsmalloc"
 -  depends on ZSMALLOC
 +  depends on ZSMALLOC=y
>>>
>>> It's a bool so this shouldn't matter... not needed.
>>
>> My mm/Kconfig has:
>>
>> config ZSMALLOC
>>  tristate "Memory allocator for compressed pages"
>>  depends on MMU
>>
>> which I think means it can be modular, no?

ack. I misread it.

> Randy means that ZSMALLOC_PGTABLE_MAPPING is a bool, so I think hch's patch
> is wrong ... if ZSMALLOC is 'm' then ZSMALLOC_PGTABLE_MAPPING would become
> 'n' instead of 'y'.

sigh, I wish that I had meant that. :)

thanks.

-- 
~Randy



Re: [PATCH 10/28] mm: only allow page table mappings for built-in zsmalloc

2020-04-08 Thread Christoph Hellwig
On Wed, Apr 08, 2020 at 08:15:19AM -0700, Matthew Wilcox wrote:
> > > >  config ZSMALLOC_PGTABLE_MAPPING
> > > > bool "Use page table mapping to access object in zsmalloc"
> > > > -   depends on ZSMALLOC
> > > > +   depends on ZSMALLOC=y
> > > 
> > > It's a bool so this shouldn't matter... not needed.
> > 
> > My mm/Kconfig has:
> > 
> > config ZSMALLOC
> > tristate "Memory allocator for compressed pages"
> > depends on MMU
> > 
> > which I think means it can be modular, no?
> 
> Randy means that ZSMALLOC_PGTABLE_MAPPING is a bool, so I think hch's patch
> is wrong ... if ZSMALLOC is 'm' then ZSMALLOC_PGTABLE_MAPPING would become
> 'n' instead of 'y'.

In Linus' tree you can select PGTABLE_MAPPING=y with ZSMALLOC=m,
and that fits my understanding of the kbuild language.  With this
patch I can't anymore.
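Christoph's point follows from how Kconfig evaluates these expressions. The simplified C model below (all names hypothetical; it ignores MODULES=n and other corner cases) encodes two rules of the Kconfig language: `SYM=y` evaluates to `y` only when the values are equal and to `n` otherwise, and a `bool` symbol whose dependencies evaluate to `m` may still be set to `y`.

```c
#include <assert.h>

/* Tristate values, ordered so that n < m < y. */
enum tristate { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

/* "SYM=VAL": y when the two values are equal, n otherwise. */
static enum tristate tri_equal(enum tristate a, enum tristate b)
{
	return a == b ? TRI_Y : TRI_N;
}

/*
 * Highest value a bool symbol may take given its dependency result:
 * a bool cannot be m, so an m dependency is promoted to y.
 */
static enum tristate bool_upper_limit(enum tristate dep)
{
	return dep == TRI_M ? TRI_Y : dep;
}
```

With ZSMALLOC=m, `depends on ZSMALLOC` gives `bool_upper_limit(TRI_M) == TRI_Y`, so ZSMALLOC_PGTABLE_MAPPING=y stays selectable, whereas `depends on ZSMALLOC=y` gives `bool_upper_limit(tri_equal(TRI_M, TRI_Y)) == TRI_N`, so the option is forced off.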


Re: [PATCH 18/28] mm: enforce that vmap can't map pages executable

2020-04-08 Thread Christoph Hellwig
On Wed, Apr 08, 2020 at 01:38:36PM +0100, Mark Rutland wrote:
> > +static inline pgprot_t pgprot_nx(pgprot_t prot)
> > +{
> > +   return __pgprot(pgprot_val(prot) | _PAGE_NX);
> > +}
> > +#define pgprot_nx pgprot_nx
> > +
> >  #ifdef CONFIG_X86_PAE
> 
> I reckon for arm64 we can do similar in our :
> 
> #define pgprot_nx(prot) \
>   __pgprot_modify(prot, 0, PTE_PXN)
> 
> ... matching the style of our existing pgprot_*() modifier helpers.

I've added that for the next version with attribution to you.
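The `#define pgprot_nx pgprot_nx` line in the x86 hunk is a common kernel idiom: an architecture provides its own helper and announces it, so a generic header can supply a fallback under `#ifndef`. Below is a userspace sketch of that pattern. `pgprot_t`, `__pgprot()` and `pgprot_val()` are simplified stand-ins, bit 63 (x86's NX bit) is an assumption for illustration, and the fallback body is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel's page-protection type. */
typedef struct { uint64_t pgprot; } pgprot_t;
#define __pgprot(x)	((pgprot_t){ (x) })
#define pgprot_val(x)	((x).pgprot)
#define _PAGE_NX	(1ULL << 63)	/* assumption: x86 NX bit */

/* "Architecture" header: real helper, then the self-referencing define. */
static inline pgprot_t pgprot_nx(pgprot_t prot)
{
	return __pgprot(pgprot_val(prot) | _PAGE_NX);
}
#define pgprot_nx pgprot_nx

/*
 * "Generic" header: fallback used only when no override exists.
 * It is skipped here because pgprot_nx is already defined above.
 */
#ifndef pgprot_nx
#define pgprot_nx(prot) (prot)
#endif
```

Because the object-like macro expands to its own name, calls such as `pgprot_nx(p)` still reach the inline function, and applying it twice leaves the value unchanged.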


Re: [Bug 206203] kmemleak reports various leaks in drivers/of/unittest.c

2020-04-08 Thread Frank Rowand
Hi Michael,

On 4/7/20 10:13 PM, Michael Ellerman wrote:
> bugzilla-dae...@bugzilla.kernel.org writes:
>> https://bugzilla.kernel.org/show_bug.cgi?id=206203
>>
>> Erhard F. (erhar...@mailbox.org) changed:
>>
>>            What    |Removed                     |Added
>> ----------------------------------------------------------------------------
>>  Attachment #286801|0                           |1
>>         is obsolete|                            |
>>
>> --- Comment #10 from Erhard F. (erhar...@mailbox.org) ---
>> Created attachment 288189
>>   --> https://bugzilla.kernel.org/attachment.cgi?id=288189=edit
>> kmemleak output (kernel 5.6.2, Talos II)
> 
> These are all in or triggered by the of unittest code AFAICS.
> Content of the log reproduced below.
> 
> Frank/Rob, are these memory leaks expected?

Thanks for the report.  I'll look at each one.

-Frank


> 
> cheers
> 
> 
> unreferenced object 0xc007eb89ca58 (size 192):
>   comm "swapper/0", pid 1, jiffies 4294878935 (age 824.747s)
>   hex dump (first 32 bytes):
> c0 00 00 00 00 d9 21 38 00 00 00 00 00 00 00 00  ..!8
> c0 00 00 07 ec 97 80 08 00 00 00 00 00 00 00 00  
>   backtrace:
> [<07b50c76>] .__of_node_dup+0x38/0x1c0
> [] .of_unittest_changeset+0x13c/0xa20
> [<925a8013>] .of_unittest+0x1ba0/0x3778
> [] .do_one_initcall+0x7c/0x420
> [] .kernel_init_freeable+0x318/0x3d8
> [<01b957ee>] .kernel_init+0x14/0x168
> [<1fe347b5>] .ret_from_kernel_thread+0x58/0x68
> unreferenced object 0xc007ec978008 (size 8):
>   comm "swapper/0", pid 1, jiffies 4294878935 (age 824.747s)
>   hex dump (first 8 bytes):
> 6e 31 00 6b 6b 6b 6b a5  n1..
>   backtrace:
> [] .kstrdup+0x44/0xb0
> [] .__of_node_dup+0x50/0x1c0
> [] .of_unittest_changeset+0x13c/0xa20
> [<925a8013>] .of_unittest+0x1ba0/0x3778
> [] .do_one_initcall+0x7c/0x420
> [] .kernel_init_freeable+0x318/0x3d8
> [<01b957ee>] .kernel_init+0x14/0x168
> [<1fe347b5>] .ret_from_kernel_thread+0x58/0x68
> unreferenced object 0xc007eb89e318 (size 192):
>   comm "swapper/0", pid 1, jiffies 4294878935 (age 824.747s)
>   hex dump (first 32 bytes):
> c0 00 00 00 00 d9 21 38 00 00 00 00 00 00 00 00  ..!8
> c0 00 00 07 ec 97 ab 08 00 00 00 00 00 00 00 00  
>   backtrace:
> [<07b50c76>] .__of_node_dup+0x38/0x1c0
> [<881dc9c4>] .of_unittest_changeset+0x194/0xa20
> [<925a8013>] .of_unittest+0x1ba0/0x3778
> [] .do_one_initcall+0x7c/0x420
> [] .kernel_init_freeable+0x318/0x3d8
> [<01b957ee>] .kernel_init+0x14/0x168
> [<1fe347b5>] .ret_from_kernel_thread+0x58/0x68
> unreferenced object 0xc007ec97ab08 (size 8):
>   comm "swapper/0", pid 1, jiffies 4294878935 (age 824.747s)
>   hex dump (first 8 bytes):
> 6e 32 00 6b 6b 6b 6b a5  n2..
>   backtrace:
> [] .kstrdup+0x44/0xb0
> [] .__of_node_dup+0x50/0x1c0
> [<881dc9c4>] .of_unittest_changeset+0x194/0xa20
> [<925a8013>] .of_unittest+0x1ba0/0x3778
> [] .do_one_initcall+0x7c/0x420
> [] .kernel_init_freeable+0x318/0x3d8
> [<01b957ee>] .kernel_init+0x14/0x168
> [<1fe347b5>] .ret_from_kernel_thread+0x58/0x68
> unreferenced object 0xc007eb89e528 (size 192):
>   comm "swapper/0", pid 1, jiffies 4294878935 (age 824.747s)
>   hex dump (first 32 bytes):
> c0 00 00 07 ec 97 bd d8 00 00 00 00 00 00 00 00  
> c0 00 00 07 ec 97 b3 18 00 00 00 00 00 00 00 00  
>   backtrace:
> [<07b50c76>] .__of_node_dup+0x38/0x1c0
> [] .of_unittest_changeset+0x1ec/0xa20
> [<925a8013>] .of_unittest+0x1ba0/0x3778
> [] .do_one_initcall+0x7c/0x420
> [] .kernel_init_freeable+0x318/0x3d8
> [<01b957ee>] .kernel_init+0x14/0x168
> [<1fe347b5>] .ret_from_kernel_thread+0x58/0x68
> unreferenced object 0xc007ec97b318 (size 8):
>   comm "swapper/0", pid 1, jiffies 4294878935 (age 824.747s)
>   hex dump (first 8 bytes):
> 6e 32 31 00 6b 6b 6b a5  n21.kkk.
>   backtrace:
> [] .kstrdup+0x44/0xb0
> [] .__of_node_dup+0x50/0x1c0
> [] .of_unittest_changeset+0x1ec/0xa20
> [<925a8013>] .of_unittest+0x1ba0/0x3778
> [] .do_one_initcall+0x7c/0x420
> [] .kernel_init_freeable+0x318/0x3d8
> [<01b957ee>] .kernel_init+0x14/0x168
> [<1fe347b5>] .ret_from_kernel_thread+0x58/0x68
> 

Re: [PATCH 10/28] mm: only allow page table mappings for built-in zsmalloc

2020-04-08 Thread Matthew Wilcox
On Wed, Apr 08, 2020 at 05:12:03PM +0200, Peter Zijlstra wrote:
> On Wed, Apr 08, 2020 at 08:01:00AM -0700, Randy Dunlap wrote:
> > Hi,
> > 
> > On 4/8/20 4:59 AM, Christoph Hellwig wrote:
> > > diff --git a/mm/Kconfig b/mm/Kconfig
> > > index 36949a9425b8..614cc786b519 100644
> > > --- a/mm/Kconfig
> > > +++ b/mm/Kconfig
> > > @@ -702,7 +702,7 @@ config ZSMALLOC
> > >  
> > >  config ZSMALLOC_PGTABLE_MAPPING
> > >   bool "Use page table mapping to access object in zsmalloc"
> > > - depends on ZSMALLOC
> > > + depends on ZSMALLOC=y
> > 
> > It's a bool so this shouldn't matter... not needed.
> 
> My mm/Kconfig has:
> 
> config ZSMALLOC
>   tristate "Memory allocator for compressed pages"
>   depends on MMU
> 
> which I think means it can be modular, no?

Randy means that ZSMALLOC_PGTABLE_MAPPING is a bool, so I think hch's patch
is wrong ... if ZSMALLOC is 'm' then ZSMALLOC_PGTABLE_MAPPING would become
'n' instead of 'y'.
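The dependency difference can be checked with a small model. This is a simplified sketch, not the code in scripts/kconfig. It assumes the Kconfig rule that a bool whose dependency evaluates to m is promoted to y, while the comparison `ZSMALLOC=y` evaluates to y only when ZSMALLOC itself is y. The names `dep_plain`, `dep_equals_y` and `bool_value` are hypothetical.

```c
#include <assert.h>

/* Simplified tristate model (not scripts/kconfig code): n < m < y. */
enum tristate { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

/* "depends on ZSMALLOC": the dependency limit is ZSMALLOC's own value. */
static enum tristate dep_plain(enum tristate zsmalloc)
{
	return zsmalloc;
}

/* "depends on ZSMALLOC=y": a comparison, which is y only if ZSMALLOC is y. */
static enum tristate dep_equals_y(enum tristate zsmalloc)
{
	return zsmalloc == TRI_Y ? TRI_Y : TRI_N;
}

/*
 * Value of a bool symbol the user set to y, given its dependency limit.
 * A bool cannot be m, so a limit of m is promoted to y (assumed rule).
 */
static enum tristate bool_value(enum tristate user, enum tristate limit)
{
	enum tristate v;

	assert(user != TRI_M);	/* bools only take n or y */
	v = user < limit ? user : limit;
	if (v == TRI_M)
		v = TRI_Y;
	return v;
}
```

With ZSMALLOC=m, `bool_value(TRI_Y, dep_plain(TRI_M))` stays y, while `bool_value(TRI_Y, dep_equals_y(TRI_M))` becomes n, which is the change in behaviour described above.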


Re: [PATCH 10/28] mm: only allow page table mappings for built-in zsmalloc

2020-04-08 Thread Peter Zijlstra
On Wed, Apr 08, 2020 at 08:01:00AM -0700, Randy Dunlap wrote:
> Hi,
> 
> On 4/8/20 4:59 AM, Christoph Hellwig wrote:
> > diff --git a/mm/Kconfig b/mm/Kconfig
> > index 36949a9425b8..614cc786b519 100644
> > --- a/mm/Kconfig
> > +++ b/mm/Kconfig
> > @@ -702,7 +702,7 @@ config ZSMALLOC
> >  
> >  config ZSMALLOC_PGTABLE_MAPPING
> > bool "Use page table mapping to access object in zsmalloc"
> > -   depends on ZSMALLOC
> > +   depends on ZSMALLOC=y
> 
> It's a bool so this shouldn't matter... not needed.

My mm/Kconfig has:

config ZSMALLOC
tristate "Memory allocator for compressed pages"
depends on MMU

which I think means it can be modular, no?


Re: [PATCH 10/28] mm: only allow page table mappings for built-in zsmalloc

2020-04-08 Thread Randy Dunlap
Hi,

On 4/8/20 4:59 AM, Christoph Hellwig wrote:
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 36949a9425b8..614cc786b519 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -702,7 +702,7 @@ config ZSMALLOC
>  
>  config ZSMALLOC_PGTABLE_MAPPING
>   bool "Use page table mapping to access object in zsmalloc"
> - depends on ZSMALLOC
> + depends on ZSMALLOC=y

It's a bool so this shouldn't matter... not needed.

>   help
> By default, zsmalloc uses a copy-based object mapping method to
> access allocations that span two pages. However, if a particular


-- 
~Randy



Re: [PATCH 09/28] mm: rename CONFIG_PGTABLE_MAPPING to CONFIG_ZSMALLOC_PGTABLE_MAPPING

2020-04-08 Thread Randy Dunlap
On 4/8/20 4:59 AM, Christoph Hellwig wrote:
> Rename the Kconfig variable to clarify the scope.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/arm/configs/omap2plus_defconfig | 2 +-
>  include/linux/zsmalloc.h | 2 +-
>  mm/Kconfig   | 2 +-
>  mm/zsmalloc.c| 8 
>  4 files changed, 7 insertions(+), 7 deletions(-)
> 

Looks good. Thanks.

Acked-by: Randy Dunlap 


-- 
~Randy



[Bug 207129] PowerMac G4 DP (5.6.2 debug kernel + inline KASAN) freezes shortly after booting with "do_IRQ: stack overflow: 1760"

2020-04-08 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=207129

--- Comment #5 from Christophe Leroy (christophe.le...@c-s.fr) ---
Ok, so as a summary:
- With CONFIG_THREAD_SHIFT = 13 and CONFIG_DEBUG_STACKOVERFLOW, the system gets
stuck
- With CONFIG_THREAD_SHIFT = 13 and without CONFIG_DEBUG_STACKOVERFLOW, stack
overflow is not really detected until it gets into kernel text !!!
- With CONFIG_THREAD_SHIFT = 14 it runs fine
- With CONFIG_VMAP_STACK, the automatic restart doesn't work
- Without CONFIG_VMAP_STACK, the automatic restart works

So I'll send a patch to set CONFIG_THREAD_SHIFT to 14 when CONFIG_KASAN is
selected. x86 and arm64 already do that.
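For scale, THREAD_SIZE is 1 << THREAD_SHIFT bytes, so the proposed change doubles the stack from 8 KiB to 16 KiB. The sketch below uses the values from the summary above. The helper names are hypothetical and this is not the actual Kconfig change.

```c
#include <assert.h>

/*
 * Hypothetical helper mirroring the proposed rule: THREAD_SHIFT 14 when
 * KASAN is enabled, otherwise 13 (the values from the summary above).
 */
static unsigned int demo_thread_shift(int kasan_enabled)
{
	return kasan_enabled ? 14u : 13u;
}

/* THREAD_SIZE is 1 << THREAD_SHIFT bytes. */
static unsigned long demo_thread_size(unsigned int shift)
{
	assert(shift < 8 * sizeof(unsigned long));
	return 1UL << shift;
}
```

`demo_thread_size(demo_thread_shift(0))` gives 8192 and `demo_thread_size(demo_thread_shift(1))` gives 16384.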

And I'll try to investigate the other points when I have time.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

usb: gadget: fsl_udc_core: Checking for a failed platform_get_irq() call in fsl_udc_probe()

2020-04-08 Thread Markus Elfring
Hello,

I have taken another look at the implementation of the function “fsl_udc_probe”.
A software analysis approach pointed out the following source code for
further development consideration.
https://elixir.bootlin.com/linux/v5.6.2/source/drivers/usb/gadget/udc/fsl_udc_core.c#L2443
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/usb/gadget/udc/fsl_udc_core.c?id=f5e94d10e4c468357019e5c28d48499f677b284f#n2442

udc_controller->irq = platform_get_irq(pdev, 0);
if (!udc_controller->irq) {
ret = -ENODEV;
goto err_iounmap;
}


The software documentation is providing the following information
for the used programming interface.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/base/platform.c?id=f5e94d10e4c468357019e5c28d48499f677b284f#n221
https://elixir.bootlin.com/linux/v5.6.2/source/drivers/base/platform.c#L202

“…
 * Return: IRQ number on success, negative error number on failure.
…”

Would you like to reconsider the shown condition check?

Regards,
Markus


Re: [PATCH 28/28] s390: use __vmalloc_node in stack_alloc

2020-04-08 Thread Christian Borntraeger



On 08.04.20 13:59, Christoph Hellwig wrote:
> stack_alloc can use a slightly higher level vmalloc function.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/s390/kernel/setup.c | 9 +++--
>  1 file changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/s390/kernel/setup.c b/arch/s390/kernel/setup.c
> index 36445dd40fdb..0f0b140b5558 100644
> --- a/arch/s390/kernel/setup.c
> +++ b/arch/s390/kernel/setup.c
> @@ -305,12 +305,9 @@ void *restart_stack __section(.data);
>  unsigned long stack_alloc(void)
>  {
>  #ifdef CONFIG_VMAP_STACK
> - return (unsigned long)
> - __vmalloc_node_range(THREAD_SIZE, THREAD_SIZE,
> -  VMALLOC_START, VMALLOC_END,
> -  THREADINFO_GFP,
> -  PAGE_KERNEL, 0, NUMA_NO_NODE,
> -  __builtin_return_address(0));
> + return (unsigned long)__vmalloc_node(THREAD_SIZE, THREAD_SIZE,
> + THREADINFO_GFP, NUMA_NO_NODE,
> + __builtin_return_address(0));

Looks sane.

Acked-by: Christian Borntraeger 


>  #else
>   return __get_free_pages(GFP_KERNEL, THREAD_SIZE_ORDER);
>  #endif
> 



Re: [PATCH 27/28] s390: use __vmalloc_node in alloc_vm_stack

2020-04-08 Thread Christian Borntraeger
On 08.04.20 13:59, Christoph Hellwig wrote:
> alloc_vm_stack can use a slightly higher level vmalloc function.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/powerpc/kernel/irq.c | 5 ++---

wrong subject (power vs s390)

>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
> index a25ed47087ee..4518fb1d6bf4 100644
> --- a/arch/powerpc/kernel/irq.c
> +++ b/arch/powerpc/kernel/irq.c
> @@ -735,9 +735,8 @@ void do_IRQ(struct pt_regs *regs)
>  
>  static void *__init alloc_vm_stack(void)
>  {
> - return __vmalloc_node_range(THREAD_SIZE, THREAD_ALIGN, VMALLOC_START,
> - VMALLOC_END, THREADINFO_GFP, PAGE_KERNEL,
> -  0, NUMA_NO_NODE, (void*)_RET_IP_);
> + return __vmalloc_node(THREAD_SIZE, THREAD_ALIGN, THREADINFO_GFP,
> +   NUMA_NO_NODE, (void *)_RET_IP_);
>  }
>  
>  static void __init vmap_irqstack_init(void)
> 



Re: [PATCH 02/28] staging: android: ion: use vmap instead of vm_map_ram

2020-04-08 Thread Greg KH
On Wed, Apr 08, 2020 at 01:59:00PM +0200, Christoph Hellwig wrote:
> vm_map_ram can keep mappings around after the vm_unmap_ram.  Using that
> with non-PAGE_KERNEL mappings can lead to all kinds of aliasing issues.
> 
> Signed-off-by: Christoph Hellwig 

Acked-by: Greg Kroah-Hartman 


Re: [PATCH 0/2] powerpc: Remove support for ppc405/440 Xilinx platforms

2020-04-08 Thread Arnd Bergmann
On Wed, Apr 8, 2020 at 2:04 PM Michael Ellerman  wrote:
> Benjamin Herrenschmidt  writes:
> > On Fri, 2020-04-03 at 15:59 +1100, Michael Ellerman wrote:
> >> Benjamin Herrenschmidt  writes:
> > IBM still put 40x cores inside POWER chips no ?
>
> Oh yeah that's true. I guess most folks don't know that, or that they
> run RHEL on them.

Is there a reason for not having those dts files in mainline then?
If nothing else, it would document what machines are still being
used with future kernels.

Also, if that's the only 405 based product that is still relevant with
a 5.7+ kernel, it may be useful to know at which point they
move to a 476 core and stop updating kernels on the old ones.

  Arnd


[PATCH V2 3/5] selftests/powerpc: Add NX-GZIP engine compress testcase

2020-04-08 Thread Raphael Moreira Zinsly
Daniel Axtens  writes:
> Raphael Moreira Zinsly  writes:
...
>> +#define hwsync()({ asm volatile("hwsync" ::: "memory"); })
>
> This doesn't compile on the clang version I tried as it doesn't
> recognise 'hwsync'.  Does
> asm volatile("sync" ::: "memory");
> do the same thing?

Both hwsync and sync are extended mnemonics to 'sync 0'.
I just replaced hwsync with sync in this patch, but I'm
surprised that this is not recognized by clang.

--- >8 ---
Add a compression testcase for the powerpc NX-GZIP engine.

Signed-off-by: Bulent Abali 
Signed-off-by: Raphael Moreira Zinsly 
---
 .../selftests/powerpc/nx-gzip/Makefile|  21 +
 .../selftests/powerpc/nx-gzip/gzfht_test.c| 489 ++
 .../selftests/powerpc/nx-gzip/gzip_vas.c  | 259 ++
 3 files changed, 769 insertions(+)
 create mode 100644 tools/testing/selftests/powerpc/nx-gzip/Makefile
 create mode 100644 tools/testing/selftests/powerpc/nx-gzip/gzfht_test.c
 create mode 100644 tools/testing/selftests/powerpc/nx-gzip/gzip_vas.c

diff --git a/tools/testing/selftests/powerpc/nx-gzip/Makefile b/tools/testing/selftests/powerpc/nx-gzip/Makefile
new file mode 100644
index ..ab903f63bbbd
--- /dev/null
+++ b/tools/testing/selftests/powerpc/nx-gzip/Makefile
@@ -0,0 +1,21 @@
+CC = gcc
+CFLAGS = -O3
+INC = ./inc
+SRC = gzfht_test.c
+OBJ = $(SRC:.c=.o)
+TESTS = gzfht_test
+EXTRA_SOURCES = gzip_vas.c
+
+all:   $(TESTS)
+
+$(OBJ): %.o: %.c
+   $(CC) $(CFLAGS) -I$(INC) -c $<
+
+$(TESTS): $(OBJ)
+   $(CC) $(CFLAGS) -I$(INC) -o $@ $@.o $(EXTRA_SOURCES)
+
+run_tests: $(TESTS)
+   ./gzfht_test gzip_vas.c
+
+clean:
+   rm -f $(TESTS) *.o *~ *.gz
diff --git a/tools/testing/selftests/powerpc/nx-gzip/gzfht_test.c b/tools/testing/selftests/powerpc/nx-gzip/gzfht_test.c
new file mode 100644
index ..7a21c25f5611
--- /dev/null
+++ b/tools/testing/selftests/powerpc/nx-gzip/gzfht_test.c
@@ -0,0 +1,489 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+/* P9 gzip sample code for demonstrating the P9 NX hardware interface.
+ * Not intended for productive uses or for performance or compression
+ * ratio measurements.  For simplicity of demonstration, this sample
+ * code compresses in to fixed Huffman blocks only (Deflate btype=1)
+ * and has very simple memory management.  Dynamic Huffman blocks
+ * (Deflate btype=2) are more involved as detailed in the user guide.
+ * Note also that /dev/crypto/gzip, VAS and skiboot support are
+ * required.
+ *
+ * Copyright 2020 IBM Corp.
+ *
+ * https://github.com/libnxz/power-gzip for zlib api and other utils
+ *
+ * Author: Bulent Abali 
+ *
+ * Definitions of acronyms used here. See
+ * P9 NX Gzip Accelerator User's Manual for details:
+ * https://github.com/libnxz/power-gzip/blob/develop/doc/power_nx_gzip_um.pdf
+ *
+ * adler/crc: 32 bit checksums appended to stream tail
+ * ce:   completion extension
+ * cpb:  coprocessor parameter block (metadata)
+ * crb:  coprocessor request block (command)
+ * csb:  coprocessor status block (status)
+ * dht:  dynamic huffman table
+ * dde:  data descriptor element (address, length)
+ * ddl:  list of ddes
+ * dh/fh:dynamic and fixed huffman types
+ * fc:   coprocessor function code
+ * histlen:  history/dictionary length
+ * history:  sliding window of up to 32KB of data
+ * lzcount:  Deflate LZ symbol counts
+ * rembytecnt: remaining byte count
+ * sfbt: source final block type; last block's type during decomp
+ * spbc: source processed byte count
+ * subc: source unprocessed bit count
+ * tebc: target ending bit count; valid bits in the last byte
+ * tpbc: target processed byte count
+ * vas:  virtual accelerator switch; the user mode interface
+ */
+
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "nxu.h"
+#include "nx.h"
+
+int nx_dbg;
+FILE *nx_gzip_log;
+void *nx_fault_storage_address;
+
+#define NX_MIN(X, Y) (((X) < (Y)) ? (X) : (Y))
+#define FNAME_MAX 1024
+#define FEXT ".nx.gz"
+
+/*
+ * LZ counts returned in the user supplied nx_gzip_crb_cpb_t structure.
+ */
+static int compress_fht_sample(char *src, uint32_t srclen, char *dst,
+   uint32_t dstlen, int with_count,
+   struct nx_gzip_crb_cpb_t *cmdp, void *handle)
+{
+   int cc;
+   uint32_t fc;
+
+   assert(!!cmdp);
+
+   put32(cmdp->crb, gzip_fc, 0);  /* clear */
+   fc = (with_count) ? GZIP_FC_COMPRESS_RESUME_FHT_COUNT :
+   GZIP_FC_COMPRESS_RESUME_FHT;
+   putnn(cmdp->crb, gzip_fc, fc);
+   putnn(cmdp->cpb, in_histlen, 0); /* resuming with no history */
+   memset((void *) &cmdp->crb.csb, 0, sizeof(cmdp->crb.csb));
+
+   /* Section 6.6 programming notes; spbc may be in two different
+* places depending on FC.
+*/
+   if (!with_count)
+   

Re: [PATCH 02/28] staging: android: ion: use vmap instead of vm_map_ram

2020-04-08 Thread Hillf Danton


On Wed,  8 Apr 2020 13:59:00 +0200
> 
> vm_map_ram can keep mappings around after the vm_unmap_ram.  Using that
> with non-PAGE_KERNEL mappings can lead to all kinds of aliasing issues.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/staging/android/ion/ion_heap.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/staging/android/ion/ion_heap.c b/drivers/staging/android/ion/ion_heap.c
> index 473b465724f1..a2d5c6df4b96 100644
> --- a/drivers/staging/android/ion/ion_heap.c
> +++ b/drivers/staging/android/ion/ion_heap.c
> @@ -99,12 +99,12 @@ int ion_heap_map_user(struct ion_heap *heap, struct ion_buffer *buffer,
>  
>  static int ion_heap_clear_pages(struct page **pages, int num, pgprot_t pgprot)
>  {
> - void *addr = vm_map_ram(pages, num, -1, pgprot);
> + void *addr = vmap(pages, num, VM_MAP);

A merge glitch?

void *vmap(struct page **pages, unsigned int count,
   unsigned long flags, pgprot_t prot)
>  
>   if (!addr)
>   return -ENOMEM;
>   memset(addr, 0, PAGE_SIZE * num);
> - vm_unmap_ram(addr, num);
> + vunmap(addr);
>  
>   return 0;
>  }
> -- 
> 2.25.1



Re: [PATCH 18/28] mm: enforce that vmap can't map pages executable

2020-04-08 Thread Mark Rutland
On Wed, Apr 08, 2020 at 01:59:16PM +0200, Christoph Hellwig wrote:
> To help enforce W^X protection, don't allow remapping existing
> pages as executable.
> 
> Based on patch from Peter Zijlstra .
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/x86/include/asm/pgtable_types.h | 6 ++
>  include/asm-generic/pgtable.h| 4 
>  mm/vmalloc.c | 2 +-
>  3 files changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
> index 947867f112ea..2e7c442cc618 100644
> --- a/arch/x86/include/asm/pgtable_types.h
> +++ b/arch/x86/include/asm/pgtable_types.h
> @@ -282,6 +282,12 @@ typedef struct pgprot { pgprotval_t pgprot; } pgprot_t;
>  
>  typedef struct { pgdval_t pgd; } pgd_t;
>  
> +static inline pgprot_t pgprot_nx(pgprot_t prot)
> +{
> + return __pgprot(pgprot_val(prot) | _PAGE_NX);
> +}
> +#define pgprot_nx pgprot_nx
> +
>  #ifdef CONFIG_X86_PAE

I reckon for arm64 we can do similar in our :

#define pgprot_nx(prot) \
	__pgprot_modify(prot, 0, PTE_PXN)

... matching the style of our existing pgprot_*() modifier helpers.

Mark.

>  
>  /*
> diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
> index 329b8c8ca703..8c5f9c29698b 100644
> --- a/include/asm-generic/pgtable.h
> +++ b/include/asm-generic/pgtable.h
> @@ -491,6 +491,10 @@ static inline int arch_unmap_one(struct mm_struct *mm,
>  #define flush_tlb_fix_spurious_fault(vma, address) flush_tlb_page(vma, address)
>  #endif
>  
> +#ifndef pgprot_nx
> +#define pgprot_nx(prot)  (prot)
> +#endif
> +
>  #ifndef pgprot_noncached
>  #define pgprot_noncached(prot)   (prot)
>  #endif
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 7356b3f07bd8..334c75251ddb 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2390,7 +2390,7 @@ void *vmap(struct page **pages, unsigned int count,
>   if (!area)
>   return NULL;
>  
> - if (map_kernel_range((unsigned long)area->addr, size, prot,
> + if (map_kernel_range((unsigned long)area->addr, size, pgprot_nx(prot),
>   pages) < 0) {
>   vunmap(area->addr);
>   return NULL;
> -- 
> 2.25.1
> 
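The override pattern in this patch works as follows. The arch header defines the helper and then `#define pgprot_nx pgprot_nx`, so the generic `#ifndef pgprot_nx` fallback (an identity macro) is skipped. The sketch below is a user-space model. `pgprot_t`, `__pgprot` and `pgprot_val` are local stand-ins, `DEMO_PAGE_NX` assumes the x86-64 NX bit position (bit 63), and `demo_vmap_prot()` is a hypothetical mirror of the `map_kernel_range()` argument change.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space stand-ins for the kernel types and helpers. */
typedef struct { uint64_t pgprot; } pgprot_t;
#define pgprot_val(p)	((p).pgprot)
#define __pgprot(v)	((pgprot_t){ (v) })
#define DEMO_PAGE_NX	(UINT64_C(1) << 63)	/* assumed NX bit, as on x86-64 */

/* "Arch" override: define the helper, then the name to itself. */
static inline pgprot_t pgprot_nx(pgprot_t prot)
{
	return __pgprot(pgprot_val(prot) | DEMO_PAGE_NX);
}
#define pgprot_nx pgprot_nx

/* Generic fallback: only used when no arch override was seen. */
#ifndef pgprot_nx
#define pgprot_nx(prot) (prot)
#endif

/* Mirrors vmap() passing pgprot_nx(prot) instead of prot. */
static pgprot_t demo_vmap_prot(pgprot_t prot)
{
	return pgprot_nx(prot);
}
```

Because the object-like macro expands to its own name, `pgprot_nx(p)` still calls the inline function, and any caller of the generic name picks up the arch version automatically.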


Re: [PATCH 19/28] gpu/drm: remove the powerpc hack in drm_legacy_sg_alloc

2020-04-08 Thread Daniel Vetter
On Wed, Apr 08, 2020 at 01:59:17PM +0200, Christoph Hellwig wrote:
> If this code was broken for non-coherent caches a crude powerpc hack
> isn't going to help anyone else.  Remove the hack as it is the last
> user of __vmalloc passing a page protection flag other than PAGE_KERNEL.

Well Ben added this to make stuff work on ppc, and of course the home-grown dma
layer in drm from back then isn't going to work in other places. I guess we
should have at least an ack from him, in case anyone still cares about
this on ppc. Adding Ben to cc.
-Daniel

> 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/gpu/drm/drm_scatter.c | 11 +--
>  1 file changed, 1 insertion(+), 10 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_scatter.c b/drivers/gpu/drm/drm_scatter.c
> index ca520028b2cb..f4e6184d1877 100644
> --- a/drivers/gpu/drm/drm_scatter.c
> +++ b/drivers/gpu/drm/drm_scatter.c
> @@ -43,15 +43,6 @@
>  
>  #define DEBUG_SCATTER 0
>  
> -static inline void *drm_vmalloc_dma(unsigned long size)
> -{
> -#if defined(__powerpc__) && defined(CONFIG_NOT_COHERENT_CACHE)
> - return __vmalloc(size, GFP_KERNEL, pgprot_noncached_wc(PAGE_KERNEL));
> -#else
> - return vmalloc_32(size);
> -#endif
> -}
> -
>  static void drm_sg_cleanup(struct drm_sg_mem * entry)
>  {
>   struct page *page;
> @@ -126,7 +117,7 @@ int drm_legacy_sg_alloc(struct drm_device *dev, void *data,
>   return -ENOMEM;
>   }
>  
> - entry->virtual = drm_vmalloc_dma(pages << PAGE_SHIFT);
> + entry->virtual = vmalloc_32(pages << PAGE_SHIFT);
>   if (!entry->virtual) {
>   kfree(entry->busaddr);
>   kfree(entry->pagelist);
> -- 
> 2.25.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: decruft the vmalloc API

2020-04-08 Thread Peter Zijlstra
On Wed, Apr 08, 2020 at 01:58:58PM +0200, Christoph Hellwig wrote:
> Hi all,
> 
> Peter noticed that with some dumb luck you can toast the kernel address
> space with exported vmalloc symbols.
> 
> I used this as an opportunity to decruft the vmalloc.c API and make it
> much more systematic.  This also removes any chance to create vmalloc
> mappings outside the designated areas or using executable permissions
> from modules.  Besides that it removes more than 300 lines of code.
> 

Looks great, thanks for doing this!

Acked-by: Peter Zijlstra (Intel) 


Re: [PATCH 17/28] mm: remove the prot argument from vm_map_ram

2020-04-08 Thread Christoph Hellwig
On Wed, Apr 08, 2020 at 02:21:04PM +0200, Peter Zijlstra wrote:
> On Wed, Apr 08, 2020 at 01:59:15PM +0200, Christoph Hellwig wrote:
> > This is always GFP_KERNEL - for long term mappings with other properties
> > vmap should be used.
> 
>  PAGE_KERNEL != GFP_KERNEL :-)

Yep.  The compiler complained about that a few times :)


Re: [PATCH 17/28] mm: remove the prot argument from vm_map_ram

2020-04-08 Thread Peter Zijlstra
On Wed, Apr 08, 2020 at 01:59:15PM +0200, Christoph Hellwig wrote:
> This is always GFP_KERNEL - for long term mappings with other properties
> vmap should be used.

 PAGE_KERNEL != GFP_KERNEL :-)

> - return vm_map_ram(mock->pages, mock->npages, 0, PAGE_KERNEL);
> + return vm_map_ram(mock->pages, mock->npages, 0);


Re: [PATCH v3 1/1] ppc/crash: Reset spinlocks during crash

2020-04-08 Thread Michael Ellerman
Leonardo Bras  writes:
> Hello Nick, Michael,
>
> On Fri, 2020-04-03 at 16:41 +1000, Nicholas Piggin wrote:
> [...]
>> > > PAPR says we are not allowed to have multiple CPUs calling RTAS at once,
>> > > except for a very small list of RTAS calls. So if we bust the RTAS lock
>> > > there's a risk we violate that part of PAPR and crash even harder.
>> > 
>> > Interesting, I was not aware.
>> > 
>> > > Also it's not specific to kdump, we can't even get through a normal
>> > > reboot if we crash with the RTAS lock held.
>> > > 
>> > > Anyway here's a patch with some ideas. That allows me to get from a
>> > > crash with the RTAS lock held through kdump into the 2nd kernel. But it
>> > > only works if it's the crashing CPU that holds the RTAS lock.
>> > > 
>> > 
>> > Nice idea. 
>> > But my test environment is just triggering a crash from sysrq, so I
>> > think it would not improve the result, given that this thread is
>> > probably not holding the lock by the time.
>> 
>> Crash paths should not take that RTAS lock, it's a massive pain. I'm 
>> fixing it for machine check, for other crashes I think it can be removed 
>> too, it just needs to be unpicked. The good thing with crashing is that 
>> you can reasonably *know* that you're single threaded, so you can 
>> usually reason through situations like above.
>> 
>> > I noticed that when rtas is locked, irqs and preemption are also
>> > disabled.
>> > 
>> > Should the IPI send by crash be able to interrupt a thread with
>> > disabled irqs?
>> 
>> Yes. It's been a bit painful, but in the long term it means that a CPU 
>> which hangs with interrupts off can be debugged, and it means we can 
>> take it offline to crash without risking that it will be clobbering what 
>> we're doing.
>> 
>> Arguably what I should have done is try a regular IPI first, wait a few 
>> seconds, then NMI IPI.
>> 
>> A couple of problems with that. Firstly it probably avoids this issue 
>> you hit almost all the time, so it won't get fixed. So when we really 
>> need the NMI IPI in the field, it'll still be riddled with deadlocks.
>> 
>> Secondly, sending the IPI first in theory can be more intrusive to the 
>> state that we want to debug. It uses the currently running stack, paca 
>> save areas, etc. NMI IPI uses its own stack and save regions so it's a 
>> little more isolated. Maybe this is only a small advantage but I'd like 
>> to have it if we can.  
>
> I think the printk issue is solved (sent a patch on that), now what is
> missing is the rtas call spinlock.
>
> I noticed that rtas.lock is taken on machine_kexec_mask_interrupts(),
> which happens after crashing the other threads and getting into
> realmode. 
>
> The following RTAS calls are made for each IRQ with a valid interrupt descriptor:
> ibm,int-off : Reset mask bit for that interrupt
> ibm,set_xive : Set XIVE priority to 0xff
>
> From what I could understand, these RTAS calls serve to put the next
> kexec kernel (kdump kernel) in a safer environment, so I think it's not
> safe to just remove them.

Yes.

> (See commit d6c1a9081080c6c4658acf2a06d851feb2855933)

In hindsight the person who wrote that commit was being lazy. We
*should* have made the 2nd kernel robust against the IRQ state being
messed up.

> On the other hand, busting the rtas.lock could be dangerous, because
> it's code we can't control.
>
> According to LoPAR, for both of these RTAS calls, we have:
>
> For the PowerPC External Interrupt option: The call must be reentrant
> to the number of processors on the platform.
> For the PowerPC External Interrupt option: The argument call buffer for
> each simultaneous call must be physically unique.

Oh well spotted. Where is that in the doc?

> Which I think means these RTAS calls can be made simultaneously.

I think so too. I'll read PAPR in the morning and make sure.

> Would it mean that busting the rtas.lock for these calls would be safe?

What would be better is to make those specific calls not take the global
RTAS lock to begin with.

We should be able to just allocate the rtas_args on the stack, it's only
~80 odd bytes. And then we can use rtas_call_unlocked() which doesn't
take the global lock.
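The "~80 odd bytes" estimate can be checked against the structure layout. The sketch below mirrors `struct rtas_args`, assuming the arch/powerpc/include/asm/rtas.h layout of that era: three 32-bit header words, 16 `rtas_arg_t` (`__be32`) slots, and a `rets` pointer into `args[]`. Plain `uint32_t` replaces `__be32` because byte order does not affect size. `demo_fill_args()` is hypothetical and only illustrates that each caller fills its own stack copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t demo_rtas_arg_t;	/* stand-in for __be32 rtas_arg_t */

struct demo_rtas_args {
	uint32_t token;
	uint32_t nargs;
	uint32_t nret;
	demo_rtas_arg_t args[16];
	demo_rtas_arg_t *rets;		/* points into args[] */
};

/* Each caller fills its own (stack) buffer: no shared state, no global lock. */
static void demo_fill_args(struct demo_rtas_args *a, uint32_t token,
			   uint32_t nargs, uint32_t nret)
{
	assert(nargs + nret <= 16);
	a->token = token;
	a->nargs = nargs;
	a->nret = nret;
	a->rets = &a->args[nargs];	/* return values follow the inputs */
}
```

On a 64-bit build the mirror is 88 bytes (76 bytes of words, padding to 80, then an 8-byte pointer); on 32-bit it is 80.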

cheers


Re: [PATCH 26/28] arm64: use __vmalloc_node in arch_alloc_vmap_stack

2020-04-08 Thread Mark Rutland
On Wed, Apr 08, 2020 at 01:59:24PM +0200, Christoph Hellwig wrote:
> arch_alloc_vmap_stack can use a slightly higher level vmalloc function.
> 
> Signed-off-by: Christoph Hellwig 

Acked-by: Mark Rutland 

Mark.

> ---
>  arch/arm64/include/asm/vmap_stack.h | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/vmap_stack.h b/arch/arm64/include/asm/vmap_stack.h
> index 0a12115d9638..0cc6636e3f15 100644
> --- a/arch/arm64/include/asm/vmap_stack.h
> +++ b/arch/arm64/include/asm/vmap_stack.h
> @@ -19,10 +19,8 @@ static inline unsigned long *arch_alloc_vmap_stack(size_t stack_size, int node)
>  {
>   BUILD_BUG_ON(!IS_ENABLED(CONFIG_VMAP_STACK));
>  
> - return __vmalloc_node_range(stack_size, THREAD_ALIGN,
> - VMALLOC_START, VMALLOC_END,
> - THREADINFO_GFP, PAGE_KERNEL, 0, node,
> - __builtin_return_address(0));
> + return __vmalloc_node(stack_size, THREAD_ALIGN, THREADINFO_GFP, node,
> + __builtin_return_address(0));
>  }
>  
>  #endif /* __ASM_VMAP_STACK_H */
> -- 
> 2.25.1
> 


Re: [PATCH V2 0/3] mm/debug: Add more arch page table helper tests

2020-04-08 Thread Gerald Schaefer
On Wed, 8 Apr 2020 12:41:51 +0530
Anshuman Khandual  wrote:

[...]
> >   
> >>
> >> Some thing like this instead.
> >>
> >> pte_t pte = READ_ONCE(*ptep);
> >> pte = pte_mkhuge(__pte((pte_val(pte) | RANDOM_ORVALUE) & PMD_MASK));
> >>
> >> We cannot use mk_pte_phys() as it is defined only on some platforms
> >> without any generic fallback for others.  
> > 
> > Oh, didn't know that, sorry. What about using mk_pte() instead, at least
> > it would result in a present pte:
> > 
> > pte = pte_mkhuge(mk_pte(phys_to_page(RANDOM_ORVALUE & PMD_MASK), prot));  
> 
> Let's use mk_pte() here, but can we do this instead:
> 
> paddr = (__pfn_to_phys(pfn) | RANDOM_ORVALUE) & PMD_MASK;
> pte = pte_mkhuge(mk_pte(phys_to_page(paddr), prot));
> 

Sure, that will also work.

BTW, this RANDOM_ORVALUE is not really very random, the way it is
defined. For s390 we already changed it to mask out some arch bits,
but I guess there are other archs and bits that would always be
set with this "not so random" value, and I wonder if/how that would
affect all the tests using this value, see also below.
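To see how little randomness survives, the proposed `(__pfn_to_phys(pfn) | RANDOM_ORVALUE) & PMD_MASK` expression can be worked through with concrete numbers. The values below are hypothetical and chosen only for illustration: 4K pages, 2M PMDs, and an OR mask covering bits 63..4. The real RANDOM_ORVALUE may differ per architecture.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration values (not taken from mm/debug_vm_pgtable.c). */
#define DEMO_PAGE_SHIFT	12
#define DEMO_PMD_SHIFT	21
#define DEMO_PMD_MASK	(~((UINT64_C(1) << DEMO_PMD_SHIFT) - 1))
#define DEMO_GENMASK64(h, l) \
	(((~UINT64_C(0)) >> (63 - (h))) & ((~UINT64_C(0)) << (l)))
#define DEMO_RANDOM_ORVALUE	DEMO_GENMASK64(63, 4)

/* (pfn_to_phys(pfn) | RANDOM_ORVALUE) & PMD_MASK with the values above. */
static uint64_t demo_huge_paddr(uint64_t pfn)
{
	assert(pfn < (UINT64_C(1) << (64 - DEMO_PAGE_SHIFT)));
	return ((pfn << DEMO_PAGE_SHIFT) | DEMO_RANDOM_ORVALUE) & DEMO_PMD_MASK;
}
```

With such a wide mask, every pfn collapses to the same address, 0xFFFFFFFFFFE00000. The low "arch" bits are cleared by PMD_MASK, but all high bits are always set, whatever the pfn.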

> > 
> > And if you also want to do some with the existing value, which seems
> > to be an empty pte, then maybe just check if writing and reading that
> > value with set_huge_pte_at() / huge_ptep_get() returns the same,
> > i.e. initially w/o RANDOM_ORVALUE.
> > 
> > So, in combination, like this (BTW, why is the barrier() needed, it
> > is not used for the other set_huge_pte_at() calls later?):  
> 
> Ahh missed, will add them. Earlier we faced problem without it after
> set_pte_at() for a test on powerpc (64) platform. Hence just added it
> here to be extra careful.
> 
> > 
> > @@ -733,24 +733,28 @@ static void __init hugetlb_advanced_test
> > struct page *page = pfn_to_page(pfn);
> > pte_t pte = READ_ONCE(*ptep);
> >  
> > -   pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
> > +   set_huge_pte_at(mm, vaddr, ptep, pte);
> > +   WARN_ON(!pte_same(pte, huge_ptep_get(ptep)));
> > +
> > +   pte = pte_mkhuge(mk_pte(phys_to_page(RANDOM_ORVALUE & PMD_MASK), prot));
> > set_huge_pte_at(mm, vaddr, ptep, pte);
> > barrier();
> > WARN_ON(!pte_same(pte, huge_ptep_get(ptep)));
> > 
> > This would actually add a new test "write empty pte with
> > set_huge_pte_at(), then verify with huge_ptep_get()", which happens
> > to trigger a warning on s390 :-)  
> 
> On arm64 as well which checks for pte_present() in set_huge_pte_at().
> But PTE present check is not really present in each set_huge_pte_at()
> implementation especially without __HAVE_ARCH_HUGE_SET_HUGE_PTE_AT.
> Hence wondering if we should add this new test here which will keep
> giving warnings on s390 and arm64 (at the least).

Hmm, interesting. I forgot about huge swap / migration, which is not
(and probably cannot be) supported on s390. The pte_present() check
on arm64 seems to check for such huge swap / migration entries,
according to the comment.

The new test "write empty pte with set_huge_pte_at(), then verify
with huge_ptep_get()" would then probably trigger the
WARN_ON(!pte_present(pte)) in arm64 code. So I guess "writing empty
ptes with set_huge_pte_at()" is not really a valid use case in practice,
or else you would have seen this warning before. In that case, it
might not be a good idea to add this test.

I also do wonder now, why the original test with
"pte = __pte(pte_val(pte) | RANDOM_ORVALUE);"
did not also trigger that warning on arm64. On s390 this test failed
exactly because the constructed pte was not present (initially empty,
or'ing RANDOM_ORVALUE does not make it present for s390). I guess this
just worked by chance on arm64, because the bits from RANDOM_ORVALUE
also happened to mark the pte present for arm64.

This brings us back to the question above, regarding the "randomness"
of RANDOM_ORVALUE. Not really sure what the intention behind that was,
but maybe it would make sense to restrict this RANDOM_ORVALUE to
non-arch-specific bits, i.e. only bits that would be part of the
address value within a page table entry? Or was it intentionally
chosen to also mess with other bits?

Regards,
Gerald



Re: [PATCH 1/1] powerpc/crash: Use NMI context for printk after crashing other CPUs

2020-04-08 Thread Michael Ellerman
Leonardo Bras  writes:
> Currently, if printk lock (logbuf_lock) is held by other thread during
> crash, there is a chance of deadlocking the crash on next printk, and
> blocking a possibly desired kdump.
>
> After sending IPI to all other CPUs, make printk enter in NMI context,
> as it will use per-cpu buffers to store the message, and avoid locking
> logbuf_lock.
>
> Signed-off-by: Leonardo Bras 
> ---
>  arch/powerpc/kexec/crash.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/arch/powerpc/kexec/crash.c b/arch/powerpc/kexec/crash.c
> index d488311efab1..9b73e3991bf4 100644
> --- a/arch/powerpc/kexec/crash.c
> +++ b/arch/powerpc/kexec/crash.c
> @@ -115,6 +115,7 @@ static void crash_kexec_prepare_cpus(int cpu)

Added context:

printk(KERN_EMERG "Sending IPI to other CPUs\n");

if (crash_wake_offline)
ncpus = num_present_cpus() - 1;

>  
>   crash_send_ipi(crash_ipi_callback);
>   smp_wmb();
> + printk_nmi_enter();
  
Why did you decide to put it there, rather than at the start of
default_machine_crash_shutdown() like I did?

The printk() above could have already deadlocked if another CPU is stuck
with the logbuf lock held.

cheers


Re: [PATCH 0/2] powerpc: Remove support for ppc405/440 Xilinx platforms

2020-04-08 Thread Michael Ellerman
Benjamin Herrenschmidt  writes:
> On Fri, 2020-04-03 at 15:59 +1100, Michael Ellerman wrote:
>> Benjamin Herrenschmidt  writes:
>> > On Tue, 2020-03-31 at 16:30 +1100, Michael Ellerman wrote:
>> > > I have no attachment to 40x, and I'd certainly be happy to have less
>> > > code in the tree, we struggle to keep even the modern platforms well
>> > > maintained.
>> > > 
>> > > At the same time I don't want to render anyone's hardware obsolete
>> > > unnecessarily. But if there's really no one using 40x then we should
>> > > remove it, it could well be broken already.
>> > > 
>> > > So I guess post a series to do the removal and we'll see if anyone
>> > > speaks up.
>> > 
>> > We shouldn't remove 40x completely. Just remove the Xilinx 405 stuff.
>> 
>> Congratulations on becoming the 40x maintainer!
>
> Didn't I give you my last 40x system ? :-)

Probably, but my desk is nearly as messy as yours so it's probably
buried under some even more obscure hardware :P

> IBM still put 40x cores inside POWER chips no ?

Oh yeah that's true. I guess most folks don't know that, or that they
run RHEL on them.

cheers


[PATCH 28/28] s390: use __vmalloc_node in stack_alloc

2020-04-08 Thread Christoph Hellwig
stack_alloc can use a slightly higher level vmalloc function.

Signed-off-by: Christoph Hellwig 
---
 arch/s390/kernel/setup.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/arch/s390/kernel/setup.c b/arch/s390/kernel/setup.c
index 36445dd40fdb..0f0b140b5558 100644
--- a/arch/s390/kernel/setup.c
+++ b/arch/s390/kernel/setup.c
@@ -305,12 +305,9 @@ void *restart_stack __section(.data);
 unsigned long stack_alloc(void)
 {
 #ifdef CONFIG_VMAP_STACK
-   return (unsigned long)
-   __vmalloc_node_range(THREAD_SIZE, THREAD_SIZE,
-VMALLOC_START, VMALLOC_END,
-THREADINFO_GFP,
-PAGE_KERNEL, 0, NUMA_NO_NODE,
-__builtin_return_address(0));
+   return (unsigned long)__vmalloc_node(THREAD_SIZE, THREAD_SIZE,
+   THREADINFO_GFP, NUMA_NO_NODE,
+   __builtin_return_address(0));
 #else
return __get_free_pages(GFP_KERNEL, THREAD_SIZE_ORDER);
 #endif
-- 
2.25.1



[PATCH 25/28] mm: remove vmalloc_user_node_flags

2020-04-08 Thread Christoph Hellwig
Open code it in __bpf_map_area_alloc, which is the only caller.  Also
clean up __bpf_map_area_alloc to have a single vmalloc call with
slightly different flags instead of the current two different calls.

For this to compile for the nommu case add a __vmalloc_node_range stub
to nommu.c.

Signed-off-by: Christoph Hellwig 
---
 include/linux/vmalloc.h |  1 -
 kernel/bpf/syscall.c| 23 +--
 mm/nommu.c  | 14 --
 mm/vmalloc.c| 20 
 4 files changed, 21 insertions(+), 37 deletions(-)

diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 108f49b47756..f90f2946aac2 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -106,7 +106,6 @@ extern void *vzalloc(unsigned long size);
 extern void *vmalloc_user(unsigned long size);
 extern void *vmalloc_node(unsigned long size, int node);
 extern void *vzalloc_node(unsigned long size, int node);
-extern void *vmalloc_user_node_flags(unsigned long size, int node, gfp_t flags);
 extern void *vmalloc_exec(unsigned long size);
 extern void *vmalloc_32(unsigned long size);
 extern void *vmalloc_32_user(unsigned long size);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 48d98ea8fad6..249d9bd43321 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -281,26 +281,29 @@ static void *__bpf_map_area_alloc(u64 size, int numa_node, bool mmapable)
 * __GFP_RETRY_MAYFAIL to avoid such situations.
 */
 
-   const gfp_t flags = __GFP_NOWARN | __GFP_ZERO;
+   const gfp_t gfp = __GFP_NOWARN | __GFP_ZERO;
+   unsigned int flags = 0;
+   unsigned long align = 1;
void *area;
 
if (size >= SIZE_MAX)
return NULL;
 
/* kmalloc()'ed memory can't be mmap()'ed */
-   if (!mmapable && size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
-   area = kmalloc_node(size, GFP_USER | __GFP_NORETRY | flags,
+   if (mmapable) {
+   BUG_ON(!PAGE_ALIGNED(size));
+   align = SHMLBA;
+   flags = VM_USERMAP;
+   } else if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
+   area = kmalloc_node(size, gfp | GFP_USER | __GFP_NORETRY,
numa_node);
if (area != NULL)
return area;
}
-   if (mmapable) {
-   BUG_ON(!PAGE_ALIGNED(size));
-   return vmalloc_user_node_flags(size, numa_node, GFP_KERNEL |
-  __GFP_RETRY_MAYFAIL | flags);
-   }
-   return __vmalloc_node(size, 1, GFP_KERNEL | __GFP_RETRY_MAYFAIL | flags,
- numa_node, __builtin_return_address(0));
+
+   return __vmalloc_node_range(size, align, VMALLOC_START, VMALLOC_END,
+   gfp | GFP_KERNEL | __GFP_RETRY_MAYFAIL, PAGE_KERNEL,
+   flags, numa_node, __builtin_return_address(0));
 }
 
 void *bpf_map_area_alloc(u64 size, int numa_node)
diff --git a/mm/nommu.c b/mm/nommu.c
index 81a86cd85893..b42cd6003d7d 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -150,6 +150,14 @@ void *__vmalloc(unsigned long size, gfp_t gfp_mask)
 }
 EXPORT_SYMBOL(__vmalloc);
 
+void *__vmalloc_node_range(unsigned long size, unsigned long align,
+   unsigned long start, unsigned long end, gfp_t gfp_mask,
+   pgprot_t prot, unsigned long vm_flags, int node,
+   const void *caller)
+{
+   return __vmalloc(size, flags);
+}
+
 void *__vmalloc_node(unsigned long size, unsigned long align, gfp_t gfp_mask,
int node, const void *caller)
 {
@@ -180,12 +188,6 @@ void *vmalloc_user(unsigned long size)
 }
 EXPORT_SYMBOL(vmalloc_user);
 
-void *vmalloc_user_node_flags(unsigned long size, int node, gfp_t flags)
-{
-   return __vmalloc_user_flags(size, flags | __GFP_ZERO);
-}
-EXPORT_SYMBOL(vmalloc_user_node_flags);
-
 struct page *vmalloc_to_page(const void *addr)
 {
return virt_to_page(addr);
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 333fbe77255a..f6f2acdaf70c 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2658,26 +2658,6 @@ void *vzalloc_node(unsigned long size, int node)
 }
 EXPORT_SYMBOL(vzalloc_node);
 
-/**
- * vmalloc_user_node_flags - allocate memory for userspace on a specific node
- * @size: allocation size
- * @node: numa node
- * @flags: flags for the page level allocator
- *
- * The resulting memory area is zeroed so it can be mapped to userspace
- * without leaking data.
- *
- * Return: pointer to the allocated memory or %NULL on error
- */
-void *vmalloc_user_node_flags(unsigned long size, int node, gfp_t flags)
-{
-   return __vmalloc_node_range(size, SHMLBA,  VMALLOC_START, VMALLOC_END,
-   flags | __GFP_ZERO, PAGE_KERNEL,
-   VM_USERMAP, node,
-   __builtin_return_address(0));
-}

[PATCH 26/28] arm64: use __vmalloc_node in arch_alloc_vmap_stack

2020-04-08 Thread Christoph Hellwig
arch_alloc_vmap_stack can use a slightly higher level vmalloc function.

Signed-off-by: Christoph Hellwig 
---
 arch/arm64/include/asm/vmap_stack.h | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/vmap_stack.h b/arch/arm64/include/asm/vmap_stack.h
index 0a12115d9638..0cc6636e3f15 100644
--- a/arch/arm64/include/asm/vmap_stack.h
+++ b/arch/arm64/include/asm/vmap_stack.h
@@ -19,10 +19,8 @@ static inline unsigned long *arch_alloc_vmap_stack(size_t stack_size, int node)
 {
BUILD_BUG_ON(!IS_ENABLED(CONFIG_VMAP_STACK));
 
-   return __vmalloc_node_range(stack_size, THREAD_ALIGN,
-   VMALLOC_START, VMALLOC_END,
-   THREADINFO_GFP, PAGE_KERNEL, 0, node,
-   __builtin_return_address(0));
+   return __vmalloc_node(stack_size, THREAD_ALIGN, THREADINFO_GFP, node,
+   __builtin_return_address(0));
 }
 
 #endif /* __ASM_VMAP_STACK_H */
-- 
2.25.1



[PATCH 27/28] s390: use __vmalloc_node in alloc_vm_stack

2020-04-08 Thread Christoph Hellwig
alloc_vm_stack can use a slightly higher level vmalloc function.

Signed-off-by: Christoph Hellwig 
---
 arch/powerpc/kernel/irq.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index a25ed47087ee..4518fb1d6bf4 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -735,9 +735,8 @@ void do_IRQ(struct pt_regs *regs)
 
 static void *__init alloc_vm_stack(void)
 {
-   return __vmalloc_node_range(THREAD_SIZE, THREAD_ALIGN, VMALLOC_START,
-   VMALLOC_END, THREADINFO_GFP, PAGE_KERNEL,
-0, NUMA_NO_NODE, (void*)_RET_IP_);
+   return __vmalloc_node(THREAD_SIZE, THREAD_ALIGN, THREADINFO_GFP,
+ NUMA_NO_NODE, (void *)_RET_IP_);
 }
 
 static void __init vmap_irqstack_init(void)
-- 
2.25.1



[PATCH 23/28] mm: remove __vmalloc_node_flags_caller

2020-04-08 Thread Christoph Hellwig
Just use __vmalloc_node instead, which gets an extra argument.  To be
able to use __vmalloc_node in all callers, make it available outside
of vmalloc.c and implement it in nommu.c.

Signed-off-by: Christoph Hellwig 
---
 include/linux/vmalloc.h |  4 ++--
 kernel/bpf/syscall.c|  5 ++---
 mm/nommu.c  |  4 ++--
 mm/util.c   |  2 +-
 mm/vmalloc.c| 10 +-
 5 files changed, 8 insertions(+), 17 deletions(-)

diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 4a46d296e70d..108f49b47756 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -115,8 +115,8 @@ extern void *__vmalloc_node_range(unsigned long size, unsigned long align,
unsigned long start, unsigned long end, gfp_t gfp_mask,
pgprot_t prot, unsigned long vm_flags, int node,
const void *caller);
-extern void *__vmalloc_node_flags_caller(unsigned long size,
-int node, gfp_t flags, void *caller);
+void *__vmalloc_node(unsigned long size, unsigned long align, gfp_t gfp_mask,
+   int node, const void *caller);
 
 extern void vfree(const void *addr);
 extern void vfree_atomic(const void *addr);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 64783da34202..48d98ea8fad6 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -299,9 +299,8 @@ static void *__bpf_map_area_alloc(u64 size, int numa_node, bool mmapable)
return vmalloc_user_node_flags(size, numa_node, GFP_KERNEL |
   __GFP_RETRY_MAYFAIL | flags);
}
-   return __vmalloc_node_flags_caller(size, numa_node,
-  GFP_KERNEL | __GFP_RETRY_MAYFAIL |
-  flags, __builtin_return_address(0));
+   return __vmalloc_node(size, 1, GFP_KERNEL | __GFP_RETRY_MAYFAIL | flags,
+ numa_node, __builtin_return_address(0));
 }
 
 void *bpf_map_area_alloc(u64 size, int numa_node)
diff --git a/mm/nommu.c b/mm/nommu.c
index 9553efa59787..81a86cd85893 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -150,8 +150,8 @@ void *__vmalloc(unsigned long size, gfp_t gfp_mask)
 }
 EXPORT_SYMBOL(__vmalloc);
 
-void *__vmalloc_node_flags_caller(unsigned long size, int node, gfp_t flags,
-   void *caller)
+void *__vmalloc_node(unsigned long size, unsigned long align, gfp_t gfp_mask,
+   int node, const void *caller)
 {
return __vmalloc(size, flags);
 }
diff --git a/mm/util.c b/mm/util.c
index 988d11e6c17c..6d5868adbe18 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -580,7 +580,7 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
if (ret || size <= PAGE_SIZE)
return ret;
 
-   return __vmalloc_node_flags_caller(size, node, flags,
+   return __vmalloc_node(size, 1, flags, node,
__builtin_return_address(0));
 }
 EXPORT_SYMBOL(kvmalloc_node);
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 3d59d848ad48..ae8249ef5821 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2400,8 +2400,6 @@ void *vmap(struct page **pages, unsigned int count,
 }
 EXPORT_SYMBOL(vmap);
 
-static void *__vmalloc_node(unsigned long size, unsigned long align,
-   gfp_t gfp_mask, int node, const void *caller);
 static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 pgprot_t prot, int node)
 {
@@ -2552,7 +2550,7 @@ EXPORT_SYMBOL_GPL(__vmalloc_node_range);
  *
  * Return: pointer to the allocated memory or %NULL on error
  */
-static void *__vmalloc_node(unsigned long size, unsigned long align,
+void *__vmalloc_node(unsigned long size, unsigned long align,
gfp_t gfp_mask, int node, const void *caller)
 {
return __vmalloc_node_range(size, align, VMALLOC_START, VMALLOC_END,
@@ -2566,12 +2564,6 @@ void *__vmalloc(unsigned long size, gfp_t gfp_mask)
 }
 EXPORT_SYMBOL(__vmalloc);
 
-void *__vmalloc_node_flags_caller(unsigned long size, int node, gfp_t flags,
- void *caller)
-{
-   return __vmalloc_node(size, 1, flags, node, caller);
-}
-
 /**
  * vmalloc - allocate virtually contiguous memory
  * @size:allocation size
-- 
2.25.1



[PATCH 24/28] mm: switch the test_vmalloc module to use __vmalloc_node

2020-04-08 Thread Christoph Hellwig
No need to export the very low-level __vmalloc_node_range when the
test module can use a slightly higher level variant.

Signed-off-by: Christoph Hellwig 
---
 lib/test_vmalloc.c | 26 +++---
 mm/vmalloc.c   | 17 -
 2 files changed, 15 insertions(+), 28 deletions(-)

diff --git a/lib/test_vmalloc.c b/lib/test_vmalloc.c
index 8bbefcaddfe8..cd6aef05dfb4 100644
--- a/lib/test_vmalloc.c
+++ b/lib/test_vmalloc.c
@@ -91,12 +91,8 @@ static int random_size_align_alloc_test(void)
 */
size = ((rnd % 10) + 1) * PAGE_SIZE;
 
-   ptr = __vmalloc_node_range(size, align,
-  VMALLOC_START, VMALLOC_END,
-  GFP_KERNEL | __GFP_ZERO,
-  PAGE_KERNEL,
-  0, 0, __builtin_return_address(0));
-
+   ptr = __vmalloc_node(size, align, GFP_KERNEL | __GFP_ZERO,
+   __builtin_return_address(0));
if (!ptr)
return -1;
 
@@ -118,12 +114,8 @@ static int align_shift_alloc_test(void)
for (i = 0; i < BITS_PER_LONG; i++) {
align = ((unsigned long) 1) << i;
 
-   ptr = __vmalloc_node_range(PAGE_SIZE, align,
-   VMALLOC_START, VMALLOC_END,
-   GFP_KERNEL | __GFP_ZERO,
-   PAGE_KERNEL,
-   0, 0, __builtin_return_address(0));
-
+   ptr = __vmalloc_node(PAGE_SIZE, align, GFP_KERNEL | __GFP_ZERO,
+   __builtin_return_address(0));
if (!ptr)
return -1;
 
@@ -139,13 +131,9 @@ static int fix_align_alloc_test(void)
int i;
 
for (i = 0; i < test_loop_count; i++) {
-   ptr = __vmalloc_node_range(5 * PAGE_SIZE,
-   THREAD_ALIGN << 1,
-   VMALLOC_START, VMALLOC_END,
-   GFP_KERNEL | __GFP_ZERO,
-   PAGE_KERNEL,
-   0, 0, __builtin_return_address(0));
-
+   ptr = __vmalloc_node(5 * PAGE_SIZE, THREAD_ALIGN << 1,
+   GFP_KERNEL | __GFP_ZERO,
+   __builtin_return_address(0));
if (!ptr)
return -1;
 
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index ae8249ef5821..333fbe77255a 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2522,15 +2522,6 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
return NULL;
 }
 
-/*
- * This is only for performance analysis of vmalloc and stress purpose.
- * It is required by vmalloc test module, therefore do not use it other
- * than that.
- */
-#ifdef CONFIG_TEST_VMALLOC_MODULE
-EXPORT_SYMBOL_GPL(__vmalloc_node_range);
-#endif
-
 /**
  * __vmalloc_node - allocate virtually contiguous memory
  * @size:  allocation size
@@ -2556,6 +2547,14 @@ void *__vmalloc_node(unsigned long size, unsigned long align,
return __vmalloc_node_range(size, align, VMALLOC_START, VMALLOC_END,
gfp_mask, PAGE_KERNEL, 0, node, caller);
 }
+/*
+ * This is only for performance analysis of vmalloc and stress purpose.
+ * It is required by vmalloc test module, therefore do not use it other
+ * than that.
+ */
+#ifdef CONFIG_TEST_VMALLOC_MODULE
+EXPORT_SYMBOL_GPL(__vmalloc_node);
+#endif
 
 void *__vmalloc(unsigned long size, gfp_t gfp_mask)
 {
-- 
2.25.1
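The property the reworked alignment tests check is that each returned address is a multiple of the requested power-of-two alignment. A userspace analogue of align_shift_alloc_test() is sketched below; it substitutes POSIX posix_memalign() for __vmalloc_node(), skips alignments below sizeof(void *) that posix_memalign() rejects, and the toy_* names are hypothetical.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* True if p is a multiple of align; align must be a power of two. */
static bool toy_is_aligned(const void *p, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return ((uintptr_t)p & (align - 1)) == 0;
}

/*
 * Request one 4096-byte block at each alignment 1 << i for
 * i = 0..max_shift and check the returned address.
 * Returns 0 on success and -1 on failure, like the kernel test.
 */
static int toy_align_shift_test(unsigned int max_shift)
{
	for (unsigned int i = 0; i <= max_shift; i++) {
		size_t align = (size_t)1 << i;
		void *ptr;
		bool ok;

		if (align < sizeof(void *))
			continue;	/* below posix_memalign()'s minimum */
		if (posix_memalign(&ptr, align, 4096) != 0)
			return -1;
		ok = toy_is_aligned(ptr, align);
		free(ptr);
		if (!ok)
			return -1;
	}
	return 0;
}
```

Keep max_shift well below the width of size_t; the kernel test goes up to BITS_PER_LONG, where large alignments are expected to fail rather than return misaligned memory.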



[PATCH 20/28] mm: remove the pgprot argument to __vmalloc

2020-04-08 Thread Christoph Hellwig
The pgprot argument to __vmalloc is always PAGE_KERNEL now, so remove
it.

Signed-off-by: Christoph Hellwig 
---
 arch/x86/hyperv/hv_init.c  |  3 +--
 arch/x86/include/asm/kvm_host.h|  3 +--
 arch/x86/kvm/svm.c |  3 +--
 drivers/block/drbd/drbd_bitmap.c   |  4 +---
 drivers/gpu/drm/etnaviv/etnaviv_dump.c |  4 ++--
 drivers/lightnvm/pblk-init.c   |  5 ++---
 drivers/md/dm-bufio.c  |  4 ++--
 drivers/mtd/ubi/io.c   |  4 ++--
 drivers/scsi/sd_zbc.c  |  3 +--
 fs/gfs2/dir.c  |  9 -
 fs/gfs2/quota.c|  2 +-
 fs/nfs/blocklayout/extent_tree.c   |  2 +-
 fs/ntfs/malloc.h   |  2 +-
 fs/ubifs/debug.c   |  2 +-
 fs/ubifs/lprops.c  |  2 +-
 fs/ubifs/lpt_commit.c  |  4 ++--
 fs/ubifs/orphan.c  |  2 +-
 fs/xfs/kmem.c  |  2 +-
 include/linux/vmalloc.h|  2 +-
 kernel/bpf/core.c  |  6 +++---
 kernel/groups.c|  2 +-
 kernel/module.c|  3 +--
 mm/nommu.c | 15 +++
 mm/page_alloc.c|  2 +-
 mm/percpu.c|  2 +-
 mm/vmalloc.c   |  4 ++--
 net/bridge/netfilter/ebtables.c|  6 ++
 sound/core/memalloc.c  |  2 +-
 sound/core/pcm_memory.c|  2 +-
 29 files changed, 47 insertions(+), 59 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 5a4b363ba67b..a3d689dfc745 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -95,8 +95,7 @@ static int hv_cpu_init(unsigned int cpu)
 * not be stopped in the case of CPU offlining and the VM will hang.
 */
if (!*hvp) {
-   *hvp = __vmalloc(PAGE_SIZE, GFP_KERNEL | __GFP_ZERO,
-PAGE_KERNEL);
+   *hvp = __vmalloc(PAGE_SIZE, GFP_KERNEL | __GFP_ZERO);
}
 
if (*hvp) {
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 42a2d0d3984a..71bc09bff01a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1280,8 +1280,7 @@ extern struct kmem_cache *x86_fpu_cache;
 #define __KVM_HAVE_ARCH_VM_ALLOC
 static inline struct kvm *kvm_arch_alloc_vm(void)
 {
-   return __vmalloc(kvm_x86_ops.vm_size,
-GFP_KERNEL_ACCOUNT | __GFP_ZERO, PAGE_KERNEL);
+   return __vmalloc(kvm_x86_ops.vm_size, GFP_KERNEL_ACCOUNT | __GFP_ZERO);
 }
 void kvm_arch_free_vm(struct kvm *kvm);
 
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 851e9cc79930..83e8323ba4f2 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1927,8 +1927,7 @@ static struct page **sev_pin_memory(struct kvm *kvm, unsigned long uaddr,
/* Avoid using vmalloc for smaller buffers. */
size = npages * sizeof(struct page *);
if (size > PAGE_SIZE)
-   pages = __vmalloc(size, GFP_KERNEL_ACCOUNT | __GFP_ZERO,
- PAGE_KERNEL);
+   pages = __vmalloc(size, GFP_KERNEL_ACCOUNT | __GFP_ZERO);
else
pages = kmalloc(size, GFP_KERNEL_ACCOUNT);
 
diff --git a/drivers/block/drbd/drbd_bitmap.c b/drivers/block/drbd/drbd_bitmap.c
index 15e99697234a..df53dca5d02c 100644
--- a/drivers/block/drbd/drbd_bitmap.c
+++ b/drivers/block/drbd/drbd_bitmap.c
@@ -396,9 +396,7 @@ static struct page **bm_realloc_pages(struct drbd_bitmap *b, unsigned long want)
bytes = sizeof(struct page *)*want;
new_pages = kzalloc(bytes, GFP_NOIO | __GFP_NOWARN);
if (!new_pages) {
-   new_pages = __vmalloc(bytes,
-   GFP_NOIO | __GFP_ZERO,
-   PAGE_KERNEL);
+   new_pages = __vmalloc(bytes, GFP_NOIO | __GFP_ZERO);
if (!new_pages)
return NULL;
}
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_dump.c b/drivers/gpu/drm/etnaviv/etnaviv_dump.c
index 648cf0207309..706af0304ca4 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_dump.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_dump.c
@@ -154,8 +154,8 @@ void etnaviv_core_dump(struct etnaviv_gem_submit *submit)
file_size += sizeof(*iter.hdr) * n_obj;
 
/* Allocate the file in vmalloc memory, it's likely to be big */
-   iter.start = __vmalloc(file_size, GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY,
-  PAGE_KERNEL);
+   iter.start = __vmalloc(file_size, GFP_KERNEL | __GFP_NOWARN |
+   __GFP_NORETRY);
if (!iter.start) {
mutex_unlock(&gpu->mmu_context->lock);
dev_warn(gpu->dev, "failed to allocate devcoredump file\n");
diff --git a/drivers/lightnvm/pblk-init.c 

[PATCH 21/28] mm: remove the prot argument to __vmalloc_node

2020-04-08 Thread Christoph Hellwig
This is always PAGE_KERNEL now.

Signed-off-by: Christoph Hellwig 
---
 mm/vmalloc.c | 35 ++-
 1 file changed, 14 insertions(+), 21 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 466a449b3a15..de7952959e82 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2401,8 +2401,7 @@ void *vmap(struct page **pages, unsigned int count,
 EXPORT_SYMBOL(vmap);
 
 static void *__vmalloc_node(unsigned long size, unsigned long align,
-   gfp_t gfp_mask, pgprot_t prot,
-   int node, const void *caller);
+   gfp_t gfp_mask, int node, const void *caller);
 static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 pgprot_t prot, int node)
 {
@@ -2420,7 +2419,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
/* Please note that the recursion is strictly bounded. */
if (array_size > PAGE_SIZE) {
pages = __vmalloc_node(array_size, 1, nested_gfp|highmem_mask,
-   PAGE_KERNEL, node, area->caller);
+   node, area->caller);
} else {
pages = kmalloc_node(array_size, nested_gfp, node);
}
@@ -2539,13 +2538,11 @@ EXPORT_SYMBOL_GPL(__vmalloc_node_range);
  * @size:  allocation size
  * @align: desired alignment
  * @gfp_mask:  flags for the page level allocator
- * @prot:  protection mask for the allocated pages
  * @node:  node to use for allocation or NUMA_NO_NODE
  * @caller:caller's return address
  *
- * Allocate enough pages to cover @size from the page level
- * allocator with @gfp_mask flags.  Map them into contiguous
- * kernel virtual space, using a pagetable protection of @prot.
+ * Allocate enough pages to cover @size from the page level allocator with
+ * @gfp_mask flags.  Map them into contiguous kernel virtual space.
  *
  * Reclaim modifiers in @gfp_mask - __GFP_NORETRY, __GFP_RETRY_MAYFAIL
  * and __GFP_NOFAIL are not supported
@@ -2556,16 +2553,15 @@ EXPORT_SYMBOL_GPL(__vmalloc_node_range);
  * Return: pointer to the allocated memory or %NULL on error
  */
 static void *__vmalloc_node(unsigned long size, unsigned long align,
-   gfp_t gfp_mask, pgprot_t prot,
-   int node, const void *caller)
+   gfp_t gfp_mask, int node, const void *caller)
 {
return __vmalloc_node_range(size, align, VMALLOC_START, VMALLOC_END,
-   gfp_mask, prot, 0, node, caller);
+   gfp_mask, PAGE_KERNEL, 0, node, caller);
 }
 
 void *__vmalloc(unsigned long size, gfp_t gfp_mask)
 {
-   return __vmalloc_node(size, 1, gfp_mask, PAGE_KERNEL, NUMA_NO_NODE,
+   return __vmalloc_node(size, 1, gfp_mask, NUMA_NO_NODE,
__builtin_return_address(0));
 }
 EXPORT_SYMBOL(__vmalloc);
@@ -2573,15 +2569,15 @@ EXPORT_SYMBOL(__vmalloc);
 static inline void *__vmalloc_node_flags(unsigned long size,
int node, gfp_t flags)
 {
-   return __vmalloc_node(size, 1, flags, PAGE_KERNEL,
-   node, __builtin_return_address(0));
+   return __vmalloc_node(size, 1, flags, node,
+   __builtin_return_address(0));
 }
 
 
 void *__vmalloc_node_flags_caller(unsigned long size, int node, gfp_t flags,
  void *caller)
 {
-   return __vmalloc_node(size, 1, flags, PAGE_KERNEL, node, caller);
+   return __vmalloc_node(size, 1, flags, node, caller);
 }
 
 /**
@@ -2656,8 +2652,8 @@ EXPORT_SYMBOL(vmalloc_user);
  */
 void *vmalloc_node(unsigned long size, int node)
 {
-   return __vmalloc_node(size, 1, GFP_KERNEL, PAGE_KERNEL,
-   node, __builtin_return_address(0));
+   return __vmalloc_node(size, 1, GFP_KERNEL, node,
+   __builtin_return_address(0));
 }
 EXPORT_SYMBOL(vmalloc_node);
 
@@ -2670,9 +2666,6 @@ EXPORT_SYMBOL(vmalloc_node);
  * allocator and map them into contiguous kernel virtual space.
  * The memory allocated is set to zero.
  *
- * For tight control over page level allocator and protection flags
- * use __vmalloc_node() instead.
- *
  * Return: pointer to the allocated memory or %NULL on error
  */
 void *vzalloc_node(unsigned long size, int node)
@@ -2745,8 +2738,8 @@ void *vmalloc_exec(unsigned long size)
  */
 void *vmalloc_32(unsigned long size)
 {
-   return __vmalloc_node(size, 1, GFP_VMALLOC32, PAGE_KERNEL,
- NUMA_NO_NODE, __builtin_return_address(0));
+   return __vmalloc_node(size, 1, GFP_VMALLOC32, NUMA_NO_NODE,
+   __builtin_return_address(0));
 }
 EXPORT_SYMBOL(vmalloc_32);
 
-- 
2.25.1



[PATCH 22/28] mm: remove both instances of __vmalloc_node_flags

2020-04-08 Thread Christoph Hellwig
The real version just had a few callers that can open code it and
remove one layer of indirection.  The nommu stub was public but only
had a single caller, so remove it and avoid a CONFIG_MMU ifdef in
vmalloc.h.

Signed-off-by: Christoph Hellwig 
---
 include/linux/vmalloc.h |  9 -
 mm/nommu.c  |  3 ++-
 mm/vmalloc.c| 20 ++--
 3 files changed, 8 insertions(+), 24 deletions(-)

diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index c1b9d6eca05f..4a46d296e70d 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -115,17 +115,8 @@ extern void *__vmalloc_node_range(unsigned long size, unsigned long align,
unsigned long start, unsigned long end, gfp_t gfp_mask,
pgprot_t prot, unsigned long vm_flags, int node,
const void *caller);
-#ifndef CONFIG_MMU
-extern void *__vmalloc_node_flags(unsigned long size, int node, gfp_t flags);
-static inline void *__vmalloc_node_flags_caller(unsigned long size, int node,
-   gfp_t flags, void *caller)
-{
-   return __vmalloc_node_flags(size, node, flags);
-}
-#else
 extern void *__vmalloc_node_flags_caller(unsigned long size,
 int node, gfp_t flags, void *caller);
-#endif
 
 extern void vfree(const void *addr);
 extern void vfree_atomic(const void *addr);
diff --git a/mm/nommu.c b/mm/nommu.c
index 2df549adb22b..9553efa59787 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -150,7 +150,8 @@ void *__vmalloc(unsigned long size, gfp_t gfp_mask)
 }
 EXPORT_SYMBOL(__vmalloc);
 
-void *__vmalloc_node_flags(unsigned long size, int node, gfp_t flags)
+void *__vmalloc_node_flags_caller(unsigned long size, int node, gfp_t flags,
+   void *caller)
 {
return __vmalloc(size, flags);
 }
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index de7952959e82..3d59d848ad48 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2566,14 +2566,6 @@ void *__vmalloc(unsigned long size, gfp_t gfp_mask)
 }
 EXPORT_SYMBOL(__vmalloc);
 
-static inline void *__vmalloc_node_flags(unsigned long size,
-   int node, gfp_t flags)
-{
-   return __vmalloc_node(size, 1, flags, node,
-   __builtin_return_address(0));
-}
-
-
 void *__vmalloc_node_flags_caller(unsigned long size, int node, gfp_t flags,
  void *caller)
 {
@@ -2594,8 +2586,8 @@ void *__vmalloc_node_flags_caller(unsigned long size, int node, gfp_t flags,
  */
 void *vmalloc(unsigned long size)
 {
-   return __vmalloc_node_flags(size, NUMA_NO_NODE,
-   GFP_KERNEL);
+   return __vmalloc_node(size, 1, GFP_KERNEL, NUMA_NO_NODE,
+   __builtin_return_address(0));
 }
 EXPORT_SYMBOL(vmalloc);
 
@@ -2614,8 +2606,8 @@ EXPORT_SYMBOL(vmalloc);
  */
 void *vzalloc(unsigned long size)
 {
-   return __vmalloc_node_flags(size, NUMA_NO_NODE,
-   GFP_KERNEL | __GFP_ZERO);
+   return __vmalloc_node(size, 1, GFP_KERNEL | __GFP_ZERO, NUMA_NO_NODE,
+   __builtin_return_address(0));
 }
 EXPORT_SYMBOL(vzalloc);
 
@@ -2670,8 +2662,8 @@ EXPORT_SYMBOL(vmalloc_node);
  */
 void *vzalloc_node(unsigned long size, int node)
 {
-   return __vmalloc_node_flags(size, node,
-GFP_KERNEL | __GFP_ZERO);
+   return __vmalloc_node(size, 1, GFP_KERNEL | __GFP_ZERO, node,
+   __builtin_return_address(0));
 }
 EXPORT_SYMBOL(vzalloc_node);
 
-- 
2.25.1



[PATCH 19/28] gpu/drm: remove the powerpc hack in drm_legacy_sg_alloc

2020-04-08 Thread Christoph Hellwig
If this code was broken for non-coherent caches, a crude powerpc hack
isn't going to help anyone else.  Remove the hack, as it is the last
user of __vmalloc passing a page protection flag other than PAGE_KERNEL.

Signed-off-by: Christoph Hellwig 
---
 drivers/gpu/drm/drm_scatter.c | 11 +--
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/drm_scatter.c b/drivers/gpu/drm/drm_scatter.c
index ca520028b2cb..f4e6184d1877 100644
--- a/drivers/gpu/drm/drm_scatter.c
+++ b/drivers/gpu/drm/drm_scatter.c
@@ -43,15 +43,6 @@
 
 #define DEBUG_SCATTER 0
 
-static inline void *drm_vmalloc_dma(unsigned long size)
-{
-#if defined(__powerpc__) && defined(CONFIG_NOT_COHERENT_CACHE)
-   return __vmalloc(size, GFP_KERNEL, pgprot_noncached_wc(PAGE_KERNEL));
-#else
-   return vmalloc_32(size);
-#endif
-}
-
 static void drm_sg_cleanup(struct drm_sg_mem * entry)
 {
struct page *page;
@@ -126,7 +117,7 @@ int drm_legacy_sg_alloc(struct drm_device *dev, void *data,
return -ENOMEM;
}
 
-   entry->virtual = drm_vmalloc_dma(pages << PAGE_SHIFT);
+   entry->virtual = vmalloc_32(pages << PAGE_SHIFT);
if (!entry->virtual) {
kfree(entry->busaddr);
kfree(entry->pagelist);
-- 
2.25.1



[PATCH 18/28] mm: enforce that vmap can't map pages executable

2020-04-08 Thread Christoph Hellwig
To help enforce W^X protection, don't allow remapping existing pages
as executable.

Based on a patch from Peter Zijlstra.

Signed-off-by: Christoph Hellwig 
---
 arch/x86/include/asm/pgtable_types.h | 6 ++
 include/asm-generic/pgtable.h| 4 
 mm/vmalloc.c | 2 +-
 3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 947867f112ea..2e7c442cc618 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -282,6 +282,12 @@ typedef struct pgprot { pgprotval_t pgprot; } pgprot_t;
 
 typedef struct { pgdval_t pgd; } pgd_t;
 
+static inline pgprot_t pgprot_nx(pgprot_t prot)
+{
+   return __pgprot(pgprot_val(prot) | _PAGE_NX);
+}
+#define pgprot_nx pgprot_nx
+
 #ifdef CONFIG_X86_PAE
 
 /*
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 329b8c8ca703..8c5f9c29698b 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -491,6 +491,10 @@ static inline int arch_unmap_one(struct mm_struct *mm,
 #define flush_tlb_fix_spurious_fault(vma, address) flush_tlb_page(vma, address)
 #endif
 
+#ifndef pgprot_nx
+#define pgprot_nx(prot)(prot)
+#endif
+
 #ifndef pgprot_noncached
 #define pgprot_noncached(prot) (prot)
 #endif
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 7356b3f07bd8..334c75251ddb 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2390,7 +2390,7 @@ void *vmap(struct page **pages, unsigned int count,
if (!area)
return NULL;
 
-   if (map_kernel_range((unsigned long)area->addr, size, prot,
+   if (map_kernel_range((unsigned long)area->addr, size, pgprot_nx(prot),
pages) < 0) {
vunmap(area->addr);
return NULL;
-- 
2.25.1
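Patch 18 uses the usual kernel override idiom: an architecture header defines pgprot_nx() plus a same-named macro, and asm-generic only supplies an identity fallback when that macro is absent. The sketch below reproduces the idiom in plain C with hypothetical toy_* names; TOY_ARCH_HAS_NX plays the role of including the x86 header, and bit 63 mirrors the x86-64 NX bit.

```c
#include <stdint.h>

/* Delete this line to get the generic identity fallback instead. */
#define TOY_ARCH_HAS_NX 1

typedef struct { uint64_t val; } toy_pgprot_t;

#define TOY_PAGE_NX	(1ULL << 63)	/* x86-64 keeps NX in bit 63 */

#ifdef TOY_ARCH_HAS_NX
/* "Arch header": real helper, plus a macro of the same name so the
 * generic header can detect the override with #ifndef. */
static inline toy_pgprot_t toy_pgprot_nx(toy_pgprot_t prot)
{
	prot.val |= TOY_PAGE_NX;
	return prot;
}
#define toy_pgprot_nx toy_pgprot_nx
#endif

/* "Generic header": arches without an NX bit keep prot unchanged. */
#ifndef toy_pgprot_nx
#define toy_pgprot_nx(prot) (prot)
#endif

/* What vmap() now does with the caller's protection. */
static inline toy_pgprot_t toy_vmap_prot(toy_pgprot_t prot)
{
	return toy_pgprot_nx(prot);
}
```

The self-referential `#define toy_pgprot_nx toy_pgprot_nx` is harmless because the preprocessor does not re-expand a macro inside its own expansion, so calls still reach the inline function.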



[PATCH 17/28] mm: remove the prot argument from vm_map_ram

2020-04-08 Thread Christoph Hellwig
This is always PAGE_KERNEL; for long-term mappings with other
properties, vmap should be used.

Signed-off-by: Christoph Hellwig 
---
 drivers/gpu/drm/i915/gem/selftests/mock_dmabuf.c   | 2 +-
 drivers/media/common/videobuf2/videobuf2-dma-sg.c  | 3 +--
 drivers/media/common/videobuf2/videobuf2-vmalloc.c | 3 +--
 fs/erofs/decompressor.c| 2 +-
 fs/xfs/xfs_buf.c   | 2 +-
 include/linux/vmalloc.h| 3 +--
 mm/nommu.c | 2 +-
 mm/vmalloc.c   | 4 ++--
 8 files changed, 9 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/selftests/mock_dmabuf.c b/drivers/gpu/drm/i915/gem/selftests/mock_dmabuf.c
index 9272bef57092..debaf7b18ab5 100644
--- a/drivers/gpu/drm/i915/gem/selftests/mock_dmabuf.c
+++ b/drivers/gpu/drm/i915/gem/selftests/mock_dmabuf.c
@@ -66,7 +66,7 @@ static void *mock_dmabuf_vmap(struct dma_buf *dma_buf)
 {
struct mock_dmabuf *mock = to_mock(dma_buf);
 
-   return vm_map_ram(mock->pages, mock->npages, 0, PAGE_KERNEL);
+   return vm_map_ram(mock->pages, mock->npages, 0);
 }
 
 static void mock_dmabuf_vunmap(struct dma_buf *dma_buf, void *vaddr)
diff --git a/drivers/media/common/videobuf2/videobuf2-dma-sg.c b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
index 6db60e9d5183..92072a08af25 100644
--- a/drivers/media/common/videobuf2/videobuf2-dma-sg.c
+++ b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
@@ -309,8 +309,7 @@ static void *vb2_dma_sg_vaddr(void *buf_priv)
if (buf->db_attach)
buf->vaddr = dma_buf_vmap(buf->db_attach->dmabuf);
else
-   buf->vaddr = vm_map_ram(buf->pages,
-   buf->num_pages, -1, PAGE_KERNEL);
+   buf->vaddr = vm_map_ram(buf->pages, buf->num_pages, -1);
}
 
/* add offset in case userptr is not page-aligned */
diff --git a/drivers/media/common/videobuf2/videobuf2-vmalloc.c b/drivers/media/common/videobuf2/videobuf2-vmalloc.c
index 1a4f0ca87c7c..c66fda4a65e4 100644
--- a/drivers/media/common/videobuf2/videobuf2-vmalloc.c
+++ b/drivers/media/common/videobuf2/videobuf2-vmalloc.c
@@ -107,8 +107,7 @@ static void *vb2_vmalloc_get_userptr(struct device *dev, unsigned long vaddr,
buf->vaddr = (__force void *)
ioremap(__pfn_to_phys(nums[0]), size + offset);
} else {
-   buf->vaddr = vm_map_ram(frame_vector_pages(vec), n_pages, -1,
-   PAGE_KERNEL);
+   buf->vaddr = vm_map_ram(frame_vector_pages(vec), n_pages, -1);
}
 
if (!buf->vaddr)
diff --git a/fs/erofs/decompressor.c b/fs/erofs/decompressor.c
index 5d2d81940679..7628816f2453 100644
--- a/fs/erofs/decompressor.c
+++ b/fs/erofs/decompressor.c
@@ -274,7 +274,7 @@ static int z_erofs_decompress_generic(struct z_erofs_decompress_req *rq,
 
i = 0;
while (1) {
-   dst = vm_map_ram(rq->out, nrpages_out, -1, PAGE_KERNEL);
+   dst = vm_map_ram(rq->out, nrpages_out, -1);
 
/* retry two more times (totally 3 times) */
if (dst || ++i >= 3)
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index f880141a2268..940af9da6db1 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -474,7 +474,7 @@ _xfs_buf_map_pages(
nofs_flag = memalloc_nofs_save();
do {
bp->b_addr = vm_map_ram(bp->b_pages, bp->b_page_count,
-   -1, PAGE_KERNEL);
+   -1);
if (bp->b_addr)
break;
vm_unmap_aliases();
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 15ffbd8e8e65..9273b1a91ca5 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -88,8 +88,7 @@ struct vmap_area {
  * Highlevel APIs for driver use
  */
 extern void vm_unmap_ram(const void *mem, unsigned int count);
-extern void *vm_map_ram(struct page **pages, unsigned int count,
-   int node, pgprot_t prot);
+extern void *vm_map_ram(struct page **pages, unsigned int count, int node);
 extern void vm_unmap_aliases(void);
 
 #ifdef CONFIG_MMU
diff --git a/mm/nommu.c b/mm/nommu.c
index 318df4e236c9..4f07b7ef0297 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -351,7 +351,7 @@ void vunmap(const void *addr)
 }
 EXPORT_SYMBOL(vunmap);
 
-void *vm_map_ram(struct page **pages, unsigned int count, int node, pgprot_t prot)
+void *vm_map_ram(struct page **pages, unsigned int count, int node)
 {
BUG();
return NULL;
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 258220b203f1..7356b3f07bd8 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1834,7 +1834,7 @@ EXPORT_SYMBOL(vm_unmap_ram);
  *
  * Returns: a 

[PATCH 16/28] mm: remove unmap_vmap_area

2020-04-08 Thread Christoph Hellwig
This function just has a single caller, open code it there.

Signed-off-by: Christoph Hellwig 
---
 mm/vmalloc.c | 10 +-
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index b0c7cdc8701a..258220b203f1 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1247,14 +1247,6 @@ int unregister_vmap_purge_notifier(struct notifier_block *nb)
 }
 EXPORT_SYMBOL_GPL(unregister_vmap_purge_notifier);
 
-/*
- * Clear the pagetable entries of a given vmap_area
- */
-static void unmap_vmap_area(struct vmap_area *va)
-{
-   unmap_kernel_range_noflush(va->va_start, va->va_end - va->va_start);
-}
-
 /*
  * lazy_max_pages is the maximum amount of virtual address space we gather up
  * before attempting to purge with a TLB flush.
@@ -1416,7 +1408,7 @@ static void free_vmap_area_noflush(struct vmap_area *va)
 static void free_unmap_vmap_area(struct vmap_area *va)
 {
flush_cache_vunmap(va->va_start, va->va_end);
-   unmap_vmap_area(va);
+   unmap_kernel_range_noflush(va->va_start, va->va_end - va->va_start);
if (debug_pagealloc_enabled_static())
flush_tlb_kernel_range(va->va_start, va->va_end);
 
-- 
2.25.1



[PATCH 15/28] mm: remove map_vm_range

2020-04-08 Thread Christoph Hellwig
Switch all callers to map_kernel_range, which is symmetric to the unmap
side (as well as the _noflush versions).

Signed-off-by: Christoph Hellwig 
---
 Documentation/core-api/cachetlb.rst |  2 +-
 include/linux/vmalloc.h | 10 --
 mm/vmalloc.c| 21 +++--
 mm/zsmalloc.c   |  4 +++-
 net/ceph/ceph_common.c  |  3 +--
 5 files changed, 16 insertions(+), 24 deletions(-)

diff --git a/Documentation/core-api/cachetlb.rst b/Documentation/core-api/cachetlb.rst
index 93cb65d52720..a1582cc79f0f 100644
--- a/Documentation/core-api/cachetlb.rst
+++ b/Documentation/core-api/cachetlb.rst
@@ -213,7 +213,7 @@ Here are the routines, one by one:
there will be no entries in the cache for the kernel address
space for virtual addresses in the range 'start' to 'end-1'.
 
-   The first of these two routines is invoked after map_vm_area()
+   The first of these two routines is invoked after map_kernel_range()
has installed the page table entries.  The second is invoked
before unmap_kernel_range() deletes the page table entries.
 
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 3070b4dbc2d9..15ffbd8e8e65 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -168,11 +168,11 @@ extern struct vm_struct *__get_vm_area_caller(unsigned long size,
 extern struct vm_struct *remove_vm_area(const void *addr);
 extern struct vm_struct *find_vm_area(const void *addr);
 
-extern int map_vm_area(struct vm_struct *area, pgprot_t prot,
-   struct page **pages);
 #ifdef CONFIG_MMU
 extern int map_kernel_range_noflush(unsigned long start, unsigned long size,
pgprot_t prot, struct page **pages);
+int map_kernel_range(unsigned long start, unsigned long size, pgprot_t prot,
+   struct page **pages);
 extern void unmap_kernel_range_noflush(unsigned long addr, unsigned long size);
 extern void unmap_kernel_range(unsigned long addr, unsigned long size);
 static inline void set_vm_flush_reset_perms(void *addr)
@@ -189,14 +189,12 @@ map_kernel_range_noflush(unsigned long start, unsigned long size,
 {
return size >> PAGE_SHIFT;
 }
+#define map_kernel_range map_kernel_range_noflush
 static inline void
 unmap_kernel_range_noflush(unsigned long addr, unsigned long size)
 {
 }
-static inline void
-unmap_kernel_range(unsigned long addr, unsigned long size)
-{
-}
+#define unmap_kernel_range unmap_kernel_range_noflush
 static inline void set_vm_flush_reset_perms(void *addr)
 {
 }
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index ca8dc5d42580..b0c7cdc8701a 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -272,8 +272,8 @@ int map_kernel_range_noflush(unsigned long addr, unsigned long size,
return 0;
 }
 
-static int map_kernel_range(unsigned long start, unsigned long size,
-  pgprot_t prot, struct page **pages)
+int map_kernel_range(unsigned long start, unsigned long size, pgprot_t prot,
+   struct page **pages)
 {
int ret;
 
@@ -2027,16 +2027,6 @@ void unmap_kernel_range(unsigned long addr, unsigned long size)
flush_tlb_kernel_range(addr, end);
 }
 
-int map_vm_area(struct vm_struct *area, pgprot_t prot, struct page **pages)
-{
-   unsigned long addr = (unsigned long)area->addr;
-   int err;
-
-   err = map_kernel_range(addr, get_vm_area_size(area), prot, pages);
-
-   return err > 0 ? 0 : err;
-}
-
 static inline void setup_vmalloc_vm_locked(struct vm_struct *vm,
struct vmap_area *va, unsigned long flags, const void *caller)
 {
@@ -2408,7 +2398,8 @@ void *vmap(struct page **pages, unsigned int count,
if (!area)
return NULL;
 
-   if (map_vm_area(area, prot, pages)) {
+   if (map_kernel_range((unsigned long)area->addr, size, prot,
+   pages) < 0) {
vunmap(area->addr);
return NULL;
}
@@ -2471,8 +2462,10 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
}
atomic_long_add(area->nr_pages, &nr_vmalloc_pages);
 
-   if (map_vm_area(area, prot, pages))
+   if (map_kernel_range((unsigned long)area->addr, get_vm_area_size(area),
+   prot, pages) < 0)
goto fail;
+
return area->addr;
 
 fail:
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index ac0524330b9b..f6dc0673e62c 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1138,7 +1138,9 @@ static inline void __zs_cpu_down(struct mapping_area *area)
 static inline void *__zs_map_object(struct mapping_area *area,
struct page *pages[2], int off, int size)
 {
-   BUG_ON(map_vm_area(area->vm, PAGE_KERNEL, pages));
+   unsigned long addr = (unsigned long)area->vm->addr;
+
+   BUG_ON(map_kernel_range(addr, PAGE_SIZE * 2, PAGE_KERNEL, pages) < 0);
area->vm_addr = area->vm->addr;

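This is a minimal userspace sketch of two points in this patch, assuming 4 KiB pages.

First, the !CONFIG_MMU header now aliases map_kernel_range and unmap_kernel_range to their _noflush stubs with a #define, rather than duplicating an inline stub. Second, converted callers such as vmap() treat only a negative return as failure. As a result, the stub's positive size >> PAGE_SHIFT result still counts as success.

All model_* names are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>

/* Assumed page shift (4 KiB pages); the model_* names are hypothetical. */
#define MODEL_PAGE_SHIFT 12

/* Mirrors the shape of the !CONFIG_MMU stub: maps nothing, returns a page count. */
static inline int model_map_kernel_range_noflush(unsigned long start,
						 unsigned long size)
{
	(void)start;
	return (int)(size >> MODEL_PAGE_SHIFT);
}

/* One alias instead of a second, identical inline stub. */
#define model_map_kernel_range model_map_kernel_range_noflush

/* Caller pattern after the patch: only a negative value means failure. */
static int model_vmap(unsigned long start, unsigned long size)
{
	if (model_map_kernel_range(start, size) < 0)
		return -1;
	return 0;
}
```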
[PATCH 14/28] mm: don't return the number of pages from map_kernel_range{, _noflush}

2020-04-08 Thread Christoph Hellwig
None of the callers needs the number of pages, and a 0 / -errno return
value is a lot more intuitive.

Signed-off-by: Christoph Hellwig 
---
 mm/vmalloc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index a3d810def567..ca8dc5d42580 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -249,7 +249,7 @@ static int vmap_p4d_range(pgd_t *pgd, unsigned long addr,
  * function.
  *
  * RETURNS:
- * The number of pages mapped on success, -errno on failure.
+ * 0 on success, -errno on failure.
  */
 int map_kernel_range_noflush(unsigned long addr, unsigned long size,
 pgprot_t prot, struct page **pages)
@@ -269,7 +269,7 @@ int map_kernel_range_noflush(unsigned long addr, unsigned long size,
return err;
} while (pgd++, addr = next, addr != end);
 
-   return nr;
+   return 0;
 }
 
 static int map_kernel_range(unsigned long start, unsigned long size,
-- 
2.25.1
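A small userspace model can illustrate the return-value change. It assumes 4 KiB pages and uses hypothetical model_* names rather than the real vmalloc internals.

Under the old convention, a successful map returned a positive page count. That is why map_vm_area() needed its "err > 0 ? 0 : err" fixup. Under the new convention, success is simply 0, and a "< 0" test works either way.

```c
#include <assert.h>
#include <errno.h>

/* Assumed 4 KiB pages; all model_* names are hypothetical. */
#define MODEL_PAGE_SIZE 4096UL

/* Stands in for the page-table walk: counts one "pte" per page. */
static int model_walk(unsigned long addr, unsigned long size, int *nr)
{
	unsigned long end = addr + size;

	/* The range must be a whole number of pages. */
	assert((size & (MODEL_PAGE_SIZE - 1)) == 0);
	*nr = 0;
	if (addr >= end)
		return -EINVAL;	/* the kernel walk BUG()s here instead */
	for (; addr != end; addr += MODEL_PAGE_SIZE)
		(*nr)++;
	return 0;
}

/* Old convention: number of pages mapped on success, -errno on failure. */
static int model_map_old(unsigned long addr, unsigned long size)
{
	int nr;
	int err = model_walk(addr, size, &nr);

	return err ? err : nr;
}

/* New convention: 0 on success, -errno on failure. */
static int model_map_new(unsigned long addr, unsigned long size)
{
	int nr;

	return model_walk(addr, size, &nr);
}

/* What map_vm_area() had to do to turn the old count into 0. */
static int model_map_vm_area_old(unsigned long addr, unsigned long size)
{
	int err = model_map_old(addr, size);

	return err > 0 ? 0 : err;
}
```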



[PATCH 10/28] mm: only allow page table mappings for built-in zsmalloc

2020-04-08 Thread Christoph Hellwig
This allows us to unexport map_vm_area and unmap_kernel_range, which are
rather deep internals and should not be available to modules.

Signed-off-by: Christoph Hellwig 
---
 mm/Kconfig   | 2 +-
 mm/vmalloc.c | 2 --
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/mm/Kconfig b/mm/Kconfig
index 36949a9425b8..614cc786b519 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -702,7 +702,7 @@ config ZSMALLOC
 
 config ZSMALLOC_PGTABLE_MAPPING
bool "Use page table mapping to access object in zsmalloc"
-   depends on ZSMALLOC
+   depends on ZSMALLOC=y
help
  By default, zsmalloc uses a copy-based object mapping method to
  access allocations that span two pages. However, if a particular
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 3375f9508ef6..9183fc0d365a 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2046,7 +2046,6 @@ void unmap_kernel_range(unsigned long addr, unsigned long size)
vunmap_page_range(addr, end);
flush_tlb_kernel_range(addr, end);
 }
-EXPORT_SYMBOL_GPL(unmap_kernel_range);
 
 int map_vm_area(struct vm_struct *area, pgprot_t prot, struct page **pages)
 {
@@ -2058,7 +2057,6 @@ int map_vm_area(struct vm_struct *area, pgprot_t prot, struct page **pages)
 
return err > 0 ? 0 : err;
 }
-EXPORT_SYMBOL_GPL(map_vm_area);
 
 static inline void setup_vmalloc_vm_locked(struct vm_struct *vm,
struct vmap_area *va, unsigned long flags, const void *caller)
-- 
2.25.1



[PATCH 13/28] mm: rename vmap_page_range to map_kernel_range

2020-04-08 Thread Christoph Hellwig
This matches the map_kernel_range_noflush API.  Also change to pass
a size instead of the end, similar to the noflush version.

Signed-off-by: Christoph Hellwig 
---
 mm/vmalloc.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 55df5dc6a9fc..a3d810def567 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -272,13 +272,13 @@ int map_kernel_range_noflush(unsigned long addr, unsigned long size,
return nr;
 }
 
-static int vmap_page_range(unsigned long start, unsigned long end,
+static int map_kernel_range(unsigned long start, unsigned long size,
   pgprot_t prot, struct page **pages)
 {
int ret;
 
-   ret = map_kernel_range_noflush(start, end - start, prot, pages);
-   flush_cache_vmap(start, end);
+   ret = map_kernel_range_noflush(start, size, prot, pages);
+   flush_cache_vmap(start, start + size);
return ret;
 }
 
@@ -1866,7 +1866,7 @@ void *vm_map_ram(struct page **pages, unsigned int count, int node, pgprot_t pro
 
kasan_unpoison_vmalloc(mem, size);
 
-   if (vmap_page_range(addr, addr + size, prot, pages) < 0) {
+   if (map_kernel_range(addr, size, prot, pages) < 0) {
vm_unmap_ram(mem, count);
return NULL;
}
@@ -2030,10 +2030,9 @@ void unmap_kernel_range(unsigned long addr, unsigned long size)
 int map_vm_area(struct vm_struct *area, pgprot_t prot, struct page **pages)
 {
unsigned long addr = (unsigned long)area->addr;
-   unsigned long end = addr + get_vm_area_size(area);
int err;
 
-   err = vmap_page_range(addr, end, prot, pages);
+   err = map_kernel_range(addr, get_vm_area_size(area), prot, pages);
 
return err > 0 ? 0 : err;
 }
-- 
2.25.1
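The switch from an end address to a size only moves where the end is computed.

This minimal sketch uses hypothetical model_* helpers. They record the range they would flush instead of calling flush_cache_vmap(). The sketch shows that both conventions flush the same half-open range [start, start + size).

```c
#include <assert.h>

/* Last range "flushed"; stands in for flush_cache_vmap() (hypothetical). */
static unsigned long flushed_start, flushed_end;

static void model_flush_cache_vmap(unsigned long start, unsigned long end)
{
	flushed_start = start;
	flushed_end = end;
}

/* Old convention: [start, end); returns the size handed to the noflush helper. */
static unsigned long model_vmap_page_range(unsigned long start,
					   unsigned long end)
{
	model_flush_cache_vmap(start, end);
	return end - start;
}

/* New convention: start plus size; the end is derived once, for the flush. */
static unsigned long model_map_kernel_range(unsigned long start,
					    unsigned long size)
{
	model_flush_cache_vmap(start, start + size);
	return size;
}
```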



[PATCH 12/28] mm: remove vmap_page_range_noflush and vunmap_page_range

2020-04-08 Thread Christoph Hellwig
These have non-static aliases called map_kernel_range_noflush and
unmap_kernel_range_noflush that differ only slightly in their calling
convention, passing addr + size instead of an end.

Signed-off-by: Christoph Hellwig 
---
 mm/vmalloc.c | 98 +---
 1 file changed, 40 insertions(+), 58 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index aada9e9144bd..55df5dc6a9fc 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -127,10 +127,24 @@ static void vunmap_p4d_range(pgd_t *pgd, unsigned long addr, unsigned long end)
} while (p4d++, addr = next, addr != end);
 }
 
-static void vunmap_page_range(unsigned long addr, unsigned long end)
+/**
+ * unmap_kernel_range_noflush - unmap kernel VM area
+ * @addr: start of the VM area to unmap
+ * @size: size of the VM area to unmap
+ *
+ * Unmap PFN_UP(@size) pages at @addr.  The VM area @addr and @size specify
+ * should have been allocated using get_vm_area() and its friends.
+ *
+ * NOTE:
+ * This function does NOT do any cache flushing.  The caller is responsible
+ * for calling flush_cache_vunmap() on to-be-mapped areas before calling this
+ * function and flush_tlb_kernel_range() after.
+ */
+void unmap_kernel_range_noflush(unsigned long addr, unsigned long size)
 {
-   pgd_t *pgd;
+   unsigned long end = addr + size;
unsigned long next;
+   pgd_t *pgd;
 
BUG_ON(addr >= end);
pgd = pgd_offset_k(addr);
@@ -219,18 +233,30 @@ static int vmap_p4d_range(pgd_t *pgd, unsigned long addr,
return 0;
 }
 
-/*
- * Set up page tables in kva (addr, end). The ptes shall have prot "prot", and
- * will have pfns corresponding to the "pages" array.
+/**
+ * map_kernel_range_noflush - map kernel VM area with the specified pages
+ * @addr: start of the VM area to map
+ * @size: size of the VM area to map
+ * @prot: page protection flags to use
+ * @pages: pages to map
  *
- * Ie. pte at addr+N*PAGE_SIZE shall point to pfn corresponding to pages[N]
+ * Map PFN_UP(@size) pages at @addr.  The VM area @addr and @size specify should
+ * have been allocated using get_vm_area() and its friends.
+ *
+ * NOTE:
+ * This function does NOT do any cache flushing.  The caller is responsible for
+ * calling flush_cache_vmap() on to-be-mapped areas before calling this
+ * function.
+ *
+ * RETURNS:
+ * The number of pages mapped on success, -errno on failure.
  */
-static int vmap_page_range_noflush(unsigned long start, unsigned long end,
-  pgprot_t prot, struct page **pages)
+int map_kernel_range_noflush(unsigned long addr, unsigned long size,
+pgprot_t prot, struct page **pages)
 {
-   pgd_t *pgd;
+   unsigned long end = addr + size;
unsigned long next;
-   unsigned long addr = start;
+   pgd_t *pgd;
int err = 0;
int nr = 0;
 
@@ -251,7 +277,7 @@ static int vmap_page_range(unsigned long start, unsigned long end,
 {
int ret;
 
-   ret = vmap_page_range_noflush(start, end, prot, pages);
+   ret = map_kernel_range_noflush(start, end - start, prot, pages);
flush_cache_vmap(start, end);
return ret;
 }
@@ -1226,7 +1252,7 @@ EXPORT_SYMBOL_GPL(unregister_vmap_purge_notifier);
  */
 static void unmap_vmap_area(struct vmap_area *va)
 {
-   vunmap_page_range(va->va_start, va->va_end);
+   unmap_kernel_range_noflush(va->va_start, va->va_end - va->va_start);
 }
 
 /*
@@ -1686,7 +1712,7 @@ static void vb_free(unsigned long addr, unsigned long size)
rcu_read_unlock();
BUG_ON(!vb);
 
-   vunmap_page_range(addr, addr + size);
+   unmap_kernel_range_noflush(addr, size);
 
if (debug_pagealloc_enabled_static())
flush_tlb_kernel_range(addr, addr + size);
@@ -1984,50 +2010,6 @@ void __init vmalloc_init(void)
vmap_initialized = true;
 }
 
-/**
- * map_kernel_range_noflush - map kernel VM area with the specified pages
- * @addr: start of the VM area to map
- * @size: size of the VM area to map
- * @prot: page protection flags to use
- * @pages: pages to map
- *
- * Map PFN_UP(@size) pages at @addr.  The VM area @addr and @size
- * specify should have been allocated using get_vm_area() and its
- * friends.
- *
- * NOTE:
- * This function does NOT do any cache flushing.  The caller is
- * responsible for calling flush_cache_vmap() on to-be-mapped areas
- * before calling this function.
- *
- * RETURNS:
- * The number of pages mapped on success, -errno on failure.
- */
-int map_kernel_range_noflush(unsigned long addr, unsigned long size,
-pgprot_t prot, struct page **pages)
-{
-   return vmap_page_range_noflush(addr, addr + size, prot, pages);
-}
-
-/**
- * unmap_kernel_range_noflush - unmap kernel VM area
- * @addr: start of the VM area to unmap
- * @size: size of the VM area to unmap
- *
- * Unmap PFN_UP(@size) pages at @addr.  The VM area @addr and @size
- * specify should have been 

[PATCH 09/28] mm: rename CONFIG_PGTABLE_MAPPING to CONFIG_ZSMALLOC_PGTABLE_MAPPING

2020-04-08 Thread Christoph Hellwig
Rename the Kconfig variable to clarify the scope.

Signed-off-by: Christoph Hellwig 
---
 arch/arm/configs/omap2plus_defconfig | 2 +-
 include/linux/zsmalloc.h | 2 +-
 mm/Kconfig   | 2 +-
 mm/zsmalloc.c| 8 
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/arm/configs/omap2plus_defconfig b/arch/arm/configs/omap2plus_defconfig
index 3cc3ca5fa027..583d8abd80a4 100644
--- a/arch/arm/configs/omap2plus_defconfig
+++ b/arch/arm/configs/omap2plus_defconfig
@@ -81,7 +81,7 @@ CONFIG_PARTITION_ADVANCED=y
 CONFIG_BINFMT_MISC=y
 CONFIG_CMA=y
 CONFIG_ZSMALLOC=m
-CONFIG_PGTABLE_MAPPING=y
+CONFIG_ZSMALLOC_PGTABLE_MAPPING=y
 CONFIG_NET=y
 CONFIG_PACKET=y
 CONFIG_UNIX=y
diff --git a/include/linux/zsmalloc.h b/include/linux/zsmalloc.h
index 2219cce81ca4..0fdbf653b173 100644
--- a/include/linux/zsmalloc.h
+++ b/include/linux/zsmalloc.h
@@ -20,7 +20,7 @@
  * zsmalloc mapping modes
  *
  * NOTE: These only make a difference when a mapped object spans pages.
- * They also have no effect when PGTABLE_MAPPING is selected.
+ * They also have no effect when ZSMALLOC_PGTABLE_MAPPING is selected.
  */
 enum zs_mapmode {
ZS_MM_RW, /* normal read-write mapping */
diff --git a/mm/Kconfig b/mm/Kconfig
index 691021492e78..36949a9425b8 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -700,7 +700,7 @@ config ZSMALLOC
  returned by an alloc().  This handle must be mapped in order to
  access the allocated space.
 
-config PGTABLE_MAPPING
+config ZSMALLOC_PGTABLE_MAPPING
bool "Use page table mapping to access object in zsmalloc"
depends on ZSMALLOC
help
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 2f836a2b993f..ac0524330b9b 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -293,7 +293,7 @@ struct zspage {
 };
 
 struct mapping_area {
-#ifdef CONFIG_PGTABLE_MAPPING
+#ifdef CONFIG_ZSMALLOC_PGTABLE_MAPPING
struct vm_struct *vm; /* vm area for mapping object that span pages */
 #else
char *vm_buf; /* copy buffer for objects that span pages */
@@ -1113,7 +1113,7 @@ static struct zspage *find_get_zspage(struct size_class *class)
return zspage;
 }
 
-#ifdef CONFIG_PGTABLE_MAPPING
+#ifdef CONFIG_ZSMALLOC_PGTABLE_MAPPING
 static inline int __zs_cpu_up(struct mapping_area *area)
 {
/*
@@ -1151,7 +1151,7 @@ static inline void __zs_unmap_object(struct mapping_area *area,
unmap_kernel_range(addr, PAGE_SIZE * 2);
 }
 
-#else /* CONFIG_PGTABLE_MAPPING */
+#else /* CONFIG_ZSMALLOC_PGTABLE_MAPPING */
 
 static inline int __zs_cpu_up(struct mapping_area *area)
 {
@@ -1233,7 +1233,7 @@ static void __zs_unmap_object(struct mapping_area *area,
pagefault_enable();
 }
 
-#endif /* CONFIG_PGTABLE_MAPPING */
+#endif /* CONFIG_ZSMALLOC_PGTABLE_MAPPING */
 
 static int zs_cpu_prepare(unsigned int cpu)
 {
-- 
2.25.1



[PATCH 11/28] mm: pass addr as unsigned long to vb_free

2020-04-08 Thread Christoph Hellwig
Every use of addr in vb_free casts to unsigned long first, and the caller
has an unsigned long version of the address available anyway.  Just pass
that and avoid all the casts.

Signed-off-by: Christoph Hellwig 
---
 mm/vmalloc.c | 16 +++-
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 9183fc0d365a..aada9e9144bd 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1664,7 +1664,7 @@ static void *vb_alloc(unsigned long size, gfp_t gfp_mask)
return vaddr;
 }
 
-static void vb_free(const void *addr, unsigned long size)
+static void vb_free(unsigned long addr, unsigned long size)
 {
unsigned long offset;
unsigned long vb_idx;
@@ -1674,24 +1674,22 @@ static void vb_free(const void *addr, unsigned long size)
BUG_ON(offset_in_page(size));
BUG_ON(size > PAGE_SIZE*VMAP_MAX_ALLOC);
 
-   flush_cache_vunmap((unsigned long)addr, (unsigned long)addr + size);
+   flush_cache_vunmap(addr, addr + size);
 
order = get_order(size);
 
-   offset = (unsigned long)addr & (VMAP_BLOCK_SIZE - 1);
-   offset >>= PAGE_SHIFT;
+   offset = (addr & (VMAP_BLOCK_SIZE - 1)) >> PAGE_SHIFT;
 
-   vb_idx = addr_to_vb_idx((unsigned long)addr);
+   vb_idx = addr_to_vb_idx(addr);
rcu_read_lock();
vb = radix_tree_lookup(&vmap_block_tree, vb_idx);
rcu_read_unlock();
BUG_ON(!vb);
 
-   vunmap_page_range((unsigned long)addr, (unsigned long)addr + size);
+   vunmap_page_range(addr, addr + size);
 
if (debug_pagealloc_enabled_static())
-   flush_tlb_kernel_range((unsigned long)addr,
-   (unsigned long)addr + size);
+   flush_tlb_kernel_range(addr, addr + size);
 
spin_lock(&vb->lock);
 
@@ -1791,7 +1789,7 @@ void vm_unmap_ram(const void *mem, unsigned int count)
 
if (likely(count <= VMAP_MAX_ALLOC)) {
debug_check_no_locks_freed(mem, size);
-   vb_free(mem, size);
+   vb_free(addr, size);
return;
}
 
-- 
2.25.1
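The folded offset computation in vb_free() is equivalent to the old two-step version.

This sketch assumes 4 KiB pages and a hypothetical VMAP_BLOCK_SIZE of 64 pages. The kernel derives the real value from its configuration.

```c
#include <assert.h>

/* Assumed values for illustration; the kernel derives both from its config. */
#define MODEL_PAGE_SHIFT 12
#define MODEL_VMAP_BLOCK_SIZE (64UL << MODEL_PAGE_SHIFT)

/* Before: pointer argument, cast, mask, then shift in a second step. */
static unsigned long model_offset_old(const void *addr)
{
	unsigned long offset;

	offset = (unsigned long)addr & (MODEL_VMAP_BLOCK_SIZE - 1);
	offset >>= MODEL_PAGE_SHIFT;
	return offset;
}

/* After: the address is already an unsigned long; mask and shift at once. */
static unsigned long model_offset_new(unsigned long addr)
{
	return (addr & (MODEL_VMAP_BLOCK_SIZE - 1)) >> MODEL_PAGE_SHIFT;
}
```

Either way, the result is the page index of addr within its vmap block.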



[PATCH 08/28] mm: unexport unmap_kernel_range_noflush

2020-04-08 Thread Christoph Hellwig
There are no modular users of this function.

Signed-off-by: Christoph Hellwig 
---
 mm/vmalloc.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index d1534d610b48..3375f9508ef6 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2029,7 +2029,6 @@ void unmap_kernel_range_noflush(unsigned long addr, unsigned long size)
 {
vunmap_page_range(addr, addr + size);
 }
-EXPORT_SYMBOL_GPL(unmap_kernel_range_noflush);
 
 /**
  * unmap_kernel_range - unmap kernel VM area and flush cache and TLB
-- 
2.25.1



[PATCH 07/28] mm: remove __get_vm_area

2020-04-08 Thread Christoph Hellwig
Switch the two remaining callers to use __get_vm_area_caller instead.

Signed-off-by: Christoph Hellwig 
---
 arch/powerpc/kernel/pci_64.c | 3 ++-
 arch/sh/kernel/cpu/sh4/sq.c  | 3 ++-
 include/linux/vmalloc.h  | 2 --
 mm/vmalloc.c | 8 
 4 files changed, 4 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c
index 8e86bd9c1eca..155e2ef60053 100644
--- a/arch/powerpc/kernel/pci_64.c
+++ b/arch/powerpc/kernel/pci_64.c
@@ -132,7 +132,8 @@ void __iomem *ioremap_pbh(phys_addr_t paddr, unsigned long size)
 * address decoding but I'd rather not deal with those outside of the
 * reserved 64K legacy region.
 */
-   area = __get_vm_area(size, 0, PHB_IO_BASE, PHB_IO_END);
+   area = __get_vm_area_caller(size, 0, PHB_IO_BASE, PHB_IO_END,
+   __builtin_return_address(0));
if (!area)
return NULL;
 
diff --git a/arch/sh/kernel/cpu/sh4/sq.c b/arch/sh/kernel/cpu/sh4/sq.c
index 934ff84844fa..d432164b23b7 100644
--- a/arch/sh/kernel/cpu/sh4/sq.c
+++ b/arch/sh/kernel/cpu/sh4/sq.c
@@ -103,7 +103,8 @@ static int __sq_remap(struct sq_mapping *map, pgprot_t prot)
 #if defined(CONFIG_MMU)
struct vm_struct *vma;
 
-   vma = __get_vm_area(map->size, VM_ALLOC, map->sq_addr, SQ_ADDRMAX);
+   vma = __get_vm_area_caller(map->size, VM_ALLOC, map->sq_addr,
+   SQ_ADDRMAX, __builtin_return_address(0));
if (!vma)
return -ENOMEM;
 
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 0507a162ccd0..3070b4dbc2d9 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -161,8 +161,6 @@ static inline size_t get_vm_area_size(const struct vm_struct *area)
 extern struct vm_struct *get_vm_area(unsigned long size, unsigned long flags);
 extern struct vm_struct *get_vm_area_caller(unsigned long size,
unsigned long flags, const void *caller);
-extern struct vm_struct *__get_vm_area(unsigned long size, unsigned long flags,
-   unsigned long start, unsigned long end);
 extern struct vm_struct *__get_vm_area_caller(unsigned long size,
unsigned long flags,
unsigned long start, unsigned long end,
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 399f219544f7..d1534d610b48 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2127,14 +2127,6 @@ static struct vm_struct *__get_vm_area_node(unsigned long size,
return area;
 }
 
-struct vm_struct *__get_vm_area(unsigned long size, unsigned long flags,
-   unsigned long start, unsigned long end)
-{
-   return __get_vm_area_node(size, 1, flags, start, end, NUMA_NO_NODE,
- GFP_KERNEL, __builtin_return_address(0));
-}
-EXPORT_SYMBOL_GPL(__get_vm_area);
-
 struct vm_struct *__get_vm_area_caller(unsigned long size, unsigned long flags,
   unsigned long start, unsigned long end,
   const void *caller)
-- 
2.25.1
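The caller address recorded for a vm area is what /proc/vmallocinfo reports. A call site that goes through a helper therefore passes __builtin_return_address(0) explicitly.

This small sketch assumes GCC or Clang, since the builtin is a compiler extension. It uses hypothetical model_* names standing in for __get_vm_area_caller() and a call site like ioremap_pbh().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct vm_struct: records who asked for the area. */
struct model_area {
	unsigned long size;
	const void *caller;
};

/* Like __get_vm_area_caller(): the caller address is an explicit argument. */
static struct model_area model_get_area_caller(unsigned long size,
					       const void *caller)
{
	struct model_area area = { .size = size, .caller = caller };

	return area;
}

/*
 * Like the converted call sites: pass this function's own return address,
 * so the recorded caller is whoever called model_ioremap().  noinline keeps
 * that return address meaningful when optimizing.
 */
static __attribute__((noinline)) struct model_area
model_ioremap(unsigned long size)
{
	return model_get_area_caller(size, __builtin_return_address(0));
}
```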



[PATCH 05/28] powerpc: add an ioremap_phb helper

2020-04-08 Thread Christoph Hellwig
Factor code shared between pci_64 and electra_cf into an ioremap_pbh
helper that follows the normal ioremap semantics and returns a
useful __iomem pointer.  Note that it open-codes __ioremap_at, as
we know from the callers that the slab is available.  Switch pci_64
to also store the result as an __iomem pointer, and unmap the result
using iounmap instead of force-casting and using vmalloc APIs.

Signed-off-by: Christoph Hellwig 
---
 arch/powerpc/include/asm/io.h |  2 +
 arch/powerpc/include/asm/pci-bridge.h |  2 +-
 arch/powerpc/kernel/pci_64.c  | 53 ++-
 drivers/pcmcia/electra_cf.c   | 45 ---
 4 files changed, 54 insertions(+), 48 deletions(-)

diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h
index 635969b5b58e..71f1c5d69839 100644
--- a/arch/powerpc/include/asm/io.h
+++ b/arch/powerpc/include/asm/io.h
@@ -719,6 +719,8 @@ void __iomem *ioremap_coherent(phys_addr_t address, unsigned long size);
 
 extern void iounmap(volatile void __iomem *addr);
 
+void __iomem *ioremap_pbh(phys_addr_t paddr, unsigned long size);
+
 int early_ioremap_range(unsigned long ea, phys_addr_t pa,
unsigned long size, pgprot_t prot);
 void __iomem *do_ioremap(phys_addr_t pa, phys_addr_t offset, unsigned long size,
diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 69f4cb3b7c56..b92e81b256e5 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -66,7 +66,7 @@ struct pci_controller {
 
void __iomem *io_base_virt;
 #ifdef CONFIG_PPC64
-   void *io_base_alloc;
+   void __iomem *io_base_alloc;
 #endif
resource_size_t io_base_phys;
resource_size_t pci_io_size;
diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c
index f83d1f69b1dd..8e86bd9c1eca 100644
--- a/arch/powerpc/kernel/pci_64.c
+++ b/arch/powerpc/kernel/pci_64.c
@@ -109,23 +109,46 @@ int pcibios_unmap_io_space(struct pci_bus *bus)
/* Get the host bridge */
hose = pci_bus_to_host(bus);
 
-   /* Check if we have IOs allocated */
-   if (hose->io_base_alloc == NULL)
-   return 0;
-
pr_debug("IO unmapping for PHB %pOF\n", hose->dn);
pr_debug("  alloc=0x%p\n", hose->io_base_alloc);
 
-   /* This is a PHB, we fully unmap the IO area */
-   vunmap(hose->io_base_alloc);
-
+   iounmap(hose->io_base_alloc);
return 0;
 }
 EXPORT_SYMBOL_GPL(pcibios_unmap_io_space);
 
-static int pcibios_map_phb_io_space(struct pci_controller *hose)
+void __iomem *ioremap_pbh(phys_addr_t paddr, unsigned long size)
 {
struct vm_struct *area;
+   unsigned long addr;
+
+   WARN_ON_ONCE(paddr & ~PAGE_MASK);
+   WARN_ON_ONCE(size & ~PAGE_MASK);
+
+   /*
+* Let's allocate some IO space for that guy. We don't pass VM_IOREMAP
+* because we don't care about alignment tricks that the core does in
+* that case.  Maybe we should due to stupid card with incomplete
+* address decoding but I'd rather not deal with those outside of the
+* reserved 64K legacy region.
+*/
+   area = __get_vm_area(size, 0, PHB_IO_BASE, PHB_IO_END);
+   if (!area)
+   return NULL;
+
+   addr = (unsigned long)area->addr;
+   if (ioremap_page_range(addr, addr + size, paddr,
+   pgprot_noncached(PAGE_KERNEL))) {
+   unmap_kernel_range(addr, size);
+   return NULL;
+   }
+
+   return (void __iomem *)addr;
+}
+EXPORT_SYMBOL_GPL(ioremap_pbh);
+
+static int pcibios_map_phb_io_space(struct pci_controller *hose)
+{
unsigned long phys_page;
unsigned long size_page;
unsigned long io_virt_offset;
@@ -146,12 +169,11 @@ static int pcibios_map_phb_io_space(struct pci_controller *hose)
 * with incomplete address decoding but I'd rather not deal with
 * those outside of the reserved 64K legacy region.
 */
-   area = __get_vm_area(size_page, 0, PHB_IO_BASE, PHB_IO_END);
-   if (area == NULL)
+   hose->io_base_alloc = ioremap_pbh(phys_page, size_page);
+   if (!hose->io_base_alloc)
return -ENOMEM;
-   hose->io_base_alloc = area->addr;
-   hose->io_base_virt = (void __iomem *)(area->addr +
- hose->io_base_phys - phys_page);
+   hose->io_base_virt = hose->io_base_alloc +
+   hose->io_base_phys - phys_page;
 
pr_debug("IO mapping for PHB %pOF\n", hose->dn);
pr_debug("  phys=0x%016llx, virt=0x%p (alloc=0x%p)\n",
@@ -159,11 +181,6 @@ static int pcibios_map_phb_io_space(struct pci_controller *hose)
pr_debug("  size=0x%016llx (alloc=0x%016lx)\n",
 hose->pci_io_size, size_page);
 
-   /* Establish the mapping */
-   if (__ioremap_at(phys_page, area->addr, size_page,
-

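The io_base_virt arithmetic in pcibios_map_phb_io_space() maps from a page-aligned physical address. It then adds the sub-page offset of the real I/O base back on the virtual side.

This sketch makes three assumptions:
- pages are 4 KiB;
- the __iomem pointers can be modelled as plain uintptr_t values;
- phys_page is io_base_phys rounded down to a page boundary. ioremap_pbh() warns on unaligned input, but the computation of phys_page is not shown in this excerpt.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 4 KiB pages; the mask is 64-bit so high physical bits survive. */
#define MODEL_PAGE_SIZE 4096ULL
#define MODEL_PAGE_MASK (~(uint64_t)(MODEL_PAGE_SIZE - 1))

/* Hypothetical: round the physical I/O base down to a page boundary. */
static uint64_t model_phys_page(uint64_t io_base_phys)
{
	return io_base_phys & MODEL_PAGE_MASK;
}

/*
 * io_base_virt = io_base_alloc + io_base_phys - phys_page: the mapping
 * starts at the page boundary, so the sub-page offset is added back.
 */
static uintptr_t model_io_base_virt(uintptr_t io_base_alloc,
				    uint64_t io_base_phys)
{
	return io_base_alloc +
	       (uintptr_t)(io_base_phys - model_phys_page(io_base_phys));
}
```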
[PATCH 06/28] powerpc: remove __ioremap_at and __iounmap_at

2020-04-08 Thread Christoph Hellwig
These helpers are only used for remapping the ISA I/O base.  Replace
the mapping side with a remap_isa_base helper in isa-bridge.c that
hard-codes all the known arguments, and just remove __iounmap_at in
favour of open-coding it in the only caller.

Signed-off-by: Christoph Hellwig 
---
 arch/powerpc/include/asm/io.h|  8 -
 arch/powerpc/kernel/isa-bridge.c | 28 +-
 arch/powerpc/mm/ioremap_64.c | 50 
 3 files changed, 21 insertions(+), 65 deletions(-)

diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h
index 71f1c5d69839..4fdbb9e45dd7 100644
--- a/arch/powerpc/include/asm/io.h
+++ b/arch/powerpc/include/asm/io.h
@@ -699,10 +699,6 @@ static inline void iosync(void)
  *
  * * iounmap undoes such a mapping and can be hooked
  *
- * * __ioremap_at (and the pending __iounmap_at) are low level functions to
- *   create hand-made mappings for use only by the PCI code and cannot
- *   currently be hooked. Must be page aligned.
- *
  * * __ioremap_caller is the same as above but takes an explicit caller
  *   reference rather than using __builtin_return_address(0)
  *
@@ -729,10 +725,6 @@ void __iomem *do_ioremap(phys_addr_t pa, phys_addr_t offset, unsigned long size,
 extern void __iomem *__ioremap_caller(phys_addr_t, unsigned long size,
  pgprot_t prot, void *caller);
 
-extern void __iomem * __ioremap_at(phys_addr_t pa, void *ea,
-  unsigned long size, pgprot_t prot);
-extern void __iounmap_at(void *ea, unsigned long size);
-
 /*
  * When CONFIG_PPC_INDIRECT_PIO is set, we use the generic iomap implementation
  * which needs some additional definitions here. They basically allow PIO
diff --git a/arch/powerpc/kernel/isa-bridge.c b/arch/powerpc/kernel/isa-bridge.c
index 773671b512df..2257d24e6a26 100644
--- a/arch/powerpc/kernel/isa-bridge.c
+++ b/arch/powerpc/kernel/isa-bridge.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -38,6 +39,22 @@ EXPORT_SYMBOL_GPL(isa_bridge_pcidev);
 #define ISA_SPACE_MASK 0x1
 #define ISA_SPACE_IO 0x1
 
+static void remap_isa_base(phys_addr_t pa, unsigned long size)
+{
+   WARN_ON_ONCE(ISA_IO_BASE & ~PAGE_MASK);
+   WARN_ON_ONCE(pa & ~PAGE_MASK);
+   WARN_ON_ONCE(size & ~PAGE_MASK);
+
+   if (slab_is_available()) {
+   if (ioremap_page_range(ISA_IO_BASE, ISA_IO_BASE + size, pa,
+   pgprot_noncached(PAGE_KERNEL)))
+   unmap_kernel_range(ISA_IO_BASE, size);
+   } else {
+   early_ioremap_range(ISA_IO_BASE, pa, size,
+   pgprot_noncached(PAGE_KERNEL));
+   }
+}
+
 static void pci_process_ISA_OF_ranges(struct device_node *isa_node,
  unsigned long phb_io_base_phys)
 {
@@ -105,15 +122,13 @@ static void pci_process_ISA_OF_ranges(struct device_node *isa_node,
if (size > 0x10000)
size = 0x10000;
 
-   __ioremap_at(phb_io_base_phys, (void *)ISA_IO_BASE,
-size, pgprot_noncached(PAGE_KERNEL));
+   remap_isa_base(phb_io_base_phys, size);
return;
 
 inval_range:
printk(KERN_ERR "no ISA IO ranges or unexpected isa range, "
   "mapping 64k\n");
-   __ioremap_at(phb_io_base_phys, (void *)ISA_IO_BASE,
-0x1, pgprot_noncached(PAGE_KERNEL));
+   remap_isa_base(phb_io_base_phys, 0x1);
 }
 
 
@@ -248,8 +263,7 @@ void __init isa_bridge_init_non_pci(struct device_node *np)
 * and map it
 */
isa_io_base = ISA_IO_BASE;
-   __ioremap_at(pbase, (void *)ISA_IO_BASE,
-size, pgprot_noncached(PAGE_KERNEL));
+   remap_isa_base(pbase, size);
 
pr_debug("ISA: Non-PCI bridge is %pOF\n", np);
 }
@@ -297,7 +311,7 @@ static void isa_bridge_remove(void)
isa_bridge_pcidev = NULL;
 
/* Unmap the ISA area */
-   __iounmap_at((void *)ISA_IO_BASE, 0x1);
+   unmap_kernel_range(ISA_IO_BASE, 0x1);
 }
 
 /**
diff --git a/arch/powerpc/mm/ioremap_64.c b/arch/powerpc/mm/ioremap_64.c
index 50a99d9684f7..ba5cbb0d66bd 100644
--- a/arch/powerpc/mm/ioremap_64.c
+++ b/arch/powerpc/mm/ioremap_64.c
@@ -4,56 +4,6 @@
 #include 
 #include 
 
-/**
- * Low level function to establish the page tables for an IO mapping
- */
-void __iomem *__ioremap_at(phys_addr_t pa, void *ea, unsigned long size, pgprot_t prot)
-{
-   int ret;
-   unsigned long va = (unsigned long)ea;
-
-   /* We don't support the 4K PFN hack with ioremap */
-   if (pgprot_val(prot) & H_PAGE_4K_PFN)
-   return NULL;
-
-   if ((ea + size) >= (void *)IOREMAP_END) {
-   pr_warn("Outside the supported range\n");
-   return NULL;
-   }
-
-   WARN_ON(pa & ~PAGE_MASK);
-   WARN_ON(((unsigned long)ea) & ~PAGE_MASK);
-   WARN_ON(size & ~PAGE_MASK);
-
-   if 
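remap_isa_base() in the isa-bridge.c hunk warns (once) when the ISA window base, the physical address or the size is not page aligned, using the `x & ~PAGE_MASK` idiom. The userspace sketch below restates that test so the arithmetic can be checked outside the kernel. It assumes 64K pages, and every SKETCH_/sketch_ name is hypothetical; it does not model the slab_is_available() choice between ioremap_page_range() and early_ioremap_range().

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 64K pages, a common ppc64 configuration. */
#define SKETCH_PAGE_SHIFT 16
#define SKETCH_PAGE_SIZE  (UINT64_C(1) << SKETCH_PAGE_SHIFT)
/* Like the kernel's PAGE_MASK: every bit at or above the page boundary. */
#define SKETCH_PAGE_MASK  (~(SKETCH_PAGE_SIZE - 1))

/* Same test as WARN_ON_ONCE(x & ~PAGE_MASK): any bit set below the
 * page boundary means x is not page aligned. */
static bool sketch_misaligned(uint64_t x)
{
	return (x & ~SKETCH_PAGE_MASK) != 0;
}

/* True when none of the three alignment checks in remap_isa_base()
 * would fire for this base/pa/size triple. */
static bool sketch_isa_args_ok(uint64_t base, uint64_t pa, uint64_t size)
{
	return !sketch_misaligned(base) && !sketch_misaligned(pa) &&
	       !sketch_misaligned(size);
}
```

With 64K pages a 64K ISA window at an aligned address passes, while a physical address offset by 0x10 bytes, or a 32K size, would trigger a warning.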

[PATCH 04/28] dma-mapping: use vmap instead of reimplementing it

2020-04-08 Thread Christoph Hellwig
Replace the open coded instance of vmap with the actual function.  In
the non-contiguous (IOMMU) case this requires an extra find_vm_area,
but given that this isn't a fast path function that is a small price
to pay.

Signed-off-by: Christoph Hellwig 
---
 kernel/dma/remap.c | 48 --
 1 file changed, 12 insertions(+), 36 deletions(-)

diff --git a/kernel/dma/remap.c b/kernel/dma/remap.c
index d14cbc83986a..7a8ba60951e8 100644
--- a/kernel/dma/remap.c
+++ b/kernel/dma/remap.c
@@ -20,23 +20,6 @@ struct page **dma_common_find_pages(void *cpu_addr)
return area->pages;
 }
 
-static struct vm_struct *__dma_common_pages_remap(struct page **pages,
-   size_t size, pgprot_t prot, const void *caller)
-{
-   struct vm_struct *area;
-
-   area = get_vm_area_caller(size, VM_DMA_COHERENT, caller);
-   if (!area)
-   return NULL;
-
-   if (map_vm_area(area, prot, pages)) {
-   vunmap(area->addr);
-   return NULL;
-   }
-
-   return area;
-}
-
 /*
  * Remaps an array of PAGE_SIZE pages into another vm_area.
  * Cannot be used in non-sleeping contexts
@@ -44,15 +27,12 @@ static struct vm_struct *__dma_common_pages_remap(struct page **pages,
 void *dma_common_pages_remap(struct page **pages, size_t size,
 pgprot_t prot, const void *caller)
 {
-   struct vm_struct *area;
+   void *vaddr;
 
-   area = __dma_common_pages_remap(pages, size, prot, caller);
-   if (!area)
-   return NULL;
-
-   area->pages = pages;
-
-   return area->addr;
+   vaddr = vmap(pages, PAGE_ALIGN(size) >> PAGE_SHIFT, VM_DMA_COHERENT, prot);
+   if (vaddr)
+   find_vm_area(vaddr)->pages = pages;
+   return vaddr;
 }
 
 /*
@@ -62,24 +42,20 @@ void *dma_common_pages_remap(struct page **pages, size_t size,
 void *dma_common_contiguous_remap(struct page *page, size_t size,
pgprot_t prot, const void *caller)
 {
-   int i;
+   int count = size >> PAGE_SHIFT;
struct page **pages;
-   struct vm_struct *area;
+   void *vaddr;
+   int i;
 
-   pages = kmalloc(sizeof(struct page *) << get_order(size), GFP_KERNEL);
+   pages = kmalloc_array(count, sizeof(struct page *), GFP_KERNEL);
if (!pages)
return NULL;
-
-   for (i = 0; i < (size >> PAGE_SHIFT); i++)
+   for (i = 0; i < count; i++)
pages[i] = nth_page(page, i);
-
-   area = __dma_common_pages_remap(pages, size, prot, caller);
-
+   vaddr = vmap(pages, count, VM_DMA_COHERENT, prot);
kfree(pages);
 
-   if (!area)
-   return NULL;
-   return area->addr;
+   return vaddr;
 }
 
 /*
-- 
2.25.1
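The dma_common_contiguous_remap() hunk replaces kmalloc(sizeof(struct page *) << get_order(size)) with kmalloc_array(count, sizeof(struct page *)). The sketch below compares the two request sizes in userspace. get_order() is re-implemented here under an assumed 4K page size, sizeof(void *) stands in for sizeof(struct page *), and the sketch_ names are hypothetical. kmalloc_array() fails when the multiplication would overflow; the sketch models that by returning 0.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12   /* assumption: 4K pages */

/* Userspace stand-in for get_order(): the smallest order such that
 * (PAGE_SIZE << order) >= size. Assumes size > 0. */
static unsigned int sketch_get_order(size_t size)
{
	size_t page = (size_t)1 << SKETCH_PAGE_SHIFT;
	size_t pages = (size + page - 1) >> SKETCH_PAGE_SHIFT;
	unsigned int order = 0;

	while (((size_t)1 << order) < pages)
		order++;
	return order;
}

/* Bytes requested by the old code: pointer size shifted by the order,
 * i.e. the page count rounded up to a power of two. */
static size_t sketch_old_array_bytes(size_t size)
{
	return sizeof(void *) << sketch_get_order(size);
}

/* Bytes requested by the new code: exactly count pointers, with an
 * overflow check like kmalloc_array() (0 means "refused"). */
static size_t sketch_new_array_bytes(size_t size)
{
	size_t count = size >> SKETCH_PAGE_SHIFT;

	if (count > SIZE_MAX / sizeof(void *))
		return 0;
	return count * sizeof(void *);
}
```

For a three-page buffer the old expression allocates room for four pointers, while the new one asks for exactly three. Both are large enough for the loop that fills the array; the new form is simply exact and overflow-checked.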



[PATCH 03/28] staging: media: ipu3: use vmap instead of reimplementing it

2020-04-08 Thread Christoph Hellwig
Just use vmap instead of messing with vmalloc internals.

Signed-off-by: Christoph Hellwig 
---
 drivers/staging/media/ipu3/ipu3-css-pool.h |  4 +--
 drivers/staging/media/ipu3/ipu3-dmamap.c   | 30 ++
 2 files changed, 9 insertions(+), 25 deletions(-)

diff --git a/drivers/staging/media/ipu3/ipu3-css-pool.h b/drivers/staging/media/ipu3/ipu3-css-pool.h
index f4a60b41401b..a8ccd4f70320 100644
--- a/drivers/staging/media/ipu3/ipu3-css-pool.h
+++ b/drivers/staging/media/ipu3/ipu3-css-pool.h
@@ -15,14 +15,12 @@ struct imgu_device;
  * @size:  size of the buffer in bytes.
  * @vaddr: kernel virtual address.
  * @daddr: iova dma address to access IPU3.
- * @vma:   private, a pointer to  vm_struct,
- * used for imgu_dmamap_free.
  */
 struct imgu_css_map {
size_t size;
void *vaddr;
dma_addr_t daddr;
-   struct vm_struct *vma;
+   struct page **pages;
 };
 
 /**
diff --git a/drivers/staging/media/ipu3/ipu3-dmamap.c b/drivers/staging/media/ipu3/ipu3-dmamap.c
index 7431322379f6..8a19b0024152 100644
--- a/drivers/staging/media/ipu3/ipu3-dmamap.c
+++ b/drivers/staging/media/ipu3/ipu3-dmamap.c
@@ -96,6 +96,7 @@ void *imgu_dmamap_alloc(struct imgu_device *imgu, struct imgu_css_map *map,
unsigned long shift = iova_shift(>iova_domain);
struct device *dev = >pci_dev->dev;
size_t size = PAGE_ALIGN(len);
+   int count = size >> PAGE_SHIFT;
struct page **pages;
dma_addr_t iovaddr;
struct iova *iova;
@@ -114,7 +115,7 @@ void *imgu_dmamap_alloc(struct imgu_device *imgu, struct imgu_css_map *map,
 
/* Call IOMMU driver to setup pgt */
iovaddr = iova_dma_addr(>iova_domain, iova);
-   for (i = 0; i < size / PAGE_SIZE; ++i) {
+   for (i = 0; i < count; ++i) {
rval = imgu_mmu_map(imgu->mmu, iovaddr,
page_to_phys(pages[i]), PAGE_SIZE);
if (rval)
@@ -123,33 +124,23 @@ void *imgu_dmamap_alloc(struct imgu_device *imgu, struct imgu_css_map *map,
iovaddr += PAGE_SIZE;
}
 
-   /* Now grab a virtual region */
-   map->vma = __get_vm_area(size, VM_USERMAP, VMALLOC_START, VMALLOC_END);
-   if (!map->vma)
+   map->vaddr = vmap(pages, count, VM_USERMAP, PAGE_KERNEL);
+   if (!map->vaddr)
goto out_unmap;
 
-   map->vma->pages = pages;
-   /* And map it in KVA */
-   if (map_vm_area(map->vma, PAGE_KERNEL, pages))
-   goto out_vunmap;
-
+   map->pages = pages;
map->size = size;
map->daddr = iova_dma_addr(>iova_domain, iova);
-   map->vaddr = map->vma->addr;
 
dev_dbg(dev, "%s: allocated %zu @ IOVA %pad @ VA %p\n", __func__,
-   size, >daddr, map->vma->addr);
-
-   return map->vma->addr;
+   size, >daddr, map->vaddr);
 
-out_vunmap:
-   vunmap(map->vma->addr);
+   return map->vaddr;
 
 out_unmap:
imgu_dmamap_free_buffer(pages, size);
imgu_mmu_unmap(imgu->mmu, iova_dma_addr(>iova_domain, iova),
   i * PAGE_SIZE);
-   map->vma = NULL;
 
 out_free_iova:
__free_iova(>iova_domain, iova);
@@ -177,8 +168,6 @@ void imgu_dmamap_unmap(struct imgu_device *imgu, struct imgu_css_map *map)
  */
 void imgu_dmamap_free(struct imgu_device *imgu, struct imgu_css_map *map)
 {
-   struct vm_struct *area = map->vma;
-
dev_dbg(>pci_dev->dev, "%s: freeing %zu @ IOVA %pad @ VA %p\n",
__func__, map->size, >daddr, map->vaddr);
 
@@ -187,11 +176,8 @@ void imgu_dmamap_free(struct imgu_device *imgu, struct imgu_css_map *map)
 
imgu_dmamap_unmap(imgu, map);
 
-   if (WARN_ON(!area) || WARN_ON(!area->pages))
-   return;
-
-   imgu_dmamap_free_buffer(area->pages, map->size);
vunmap(map->vaddr);
+   imgu_dmamap_free_buffer(map->pages, map->size);
map->vaddr = NULL;
 }
 
-- 
2.25.1
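imgu_dmamap_alloc() now computes `count = size >> PAGE_SHIFT` once, after `size = PAGE_ALIGN(len)`, and reuses it both for the IOMMU mapping loop and for vmap(). Because size is already page aligned, the shift gives the same number as the old `size / PAGE_SIZE` loop bound. A userspace sketch of that arithmetic, assuming 4K pages; the SKETCH_/sketch_ names are hypothetical stand-ins for the kernel macros:

```c
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12                       /* assumption: 4K pages */
#define SKETCH_PAGE_SIZE  ((size_t)1 << SKETCH_PAGE_SHIFT)
/* Round x up to the next page boundary, like the kernel's PAGE_ALIGN(). */
#define SKETCH_PAGE_ALIGN(x) \
	(((x) + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1))

/* Page count as the patched code derives it: align first, then shift. */
static size_t sketch_page_count(size_t len)
{
	size_t size = SKETCH_PAGE_ALIGN(len);

	return size >> SKETCH_PAGE_SHIFT;
}

/* The expression the old mapping loop used as its bound. */
static size_t sketch_page_count_div(size_t len)
{
	return SKETCH_PAGE_ALIGN(len) / SKETCH_PAGE_SIZE;
}
```

A 10000-byte request rounds up to 12288 bytes, i.e. three pages, with either expression.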



[PATCH 02/28] staging: android: ion: use vmap instead of vm_map_ram

2020-04-08 Thread Christoph Hellwig
vm_map_ram can keep mappings around after vm_unmap_ram.  Using that
with non-PAGE_KERNEL mappings can lead to all kinds of aliasing issues.

Signed-off-by: Christoph Hellwig 
---
 drivers/staging/android/ion/ion_heap.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/android/ion/ion_heap.c b/drivers/staging/android/ion/ion_heap.c
index 473b465724f1..a2d5c6df4b96 100644
--- a/drivers/staging/android/ion/ion_heap.c
+++ b/drivers/staging/android/ion/ion_heap.c
@@ -99,12 +99,12 @@ int ion_heap_map_user(struct ion_heap *heap, struct ion_buffer *buffer,
 
 static int ion_heap_clear_pages(struct page **pages, int num, pgprot_t pgprot)
 {
-   void *addr = vm_map_ram(pages, num, -1, pgprot);
+   void *addr = vmap(pages, num, VM_MAP, pgprot);
 
if (!addr)
return -ENOMEM;
memset(addr, 0, PAGE_SIZE * num);
-   vm_unmap_ram(addr, num);
+   vunmap(addr);
 
return 0;
 }
-- 
2.25.1



[PATCH 01/28] x86/hyperv: use vmalloc_exec for the hypercall page

2020-04-08 Thread Christoph Hellwig
Use the designated helper for allocating executable kernel memory, and
remove the now unused PAGE_KERNEL_RX define.

Signed-off-by: Christoph Hellwig 
---
 arch/x86/hyperv/hv_init.c| 2 +-
 arch/x86/include/asm/pgtable_types.h | 2 --
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index b0da5320bcff..5a4b363ba67b 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -355,7 +355,7 @@ void __init hyperv_init(void)
guest_id = generate_guest_id(0, LINUX_VERSION_CODE, 0);
wrmsrl(HV_X64_MSR_GUEST_OS_ID, guest_id);
 
-   hv_hypercall_pg  = __vmalloc(PAGE_SIZE, GFP_KERNEL, PAGE_KERNEL_RX);
+   hv_hypercall_pg = vmalloc_exec(PAGE_SIZE);
if (hv_hypercall_pg == NULL) {
wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
goto remove_cpuhp_state;
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index b6606fe6cfdf..947867f112ea 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -194,7 +194,6 @@ enum page_cache_mode {
 #define _PAGE_TABLE_NOENC   (__PP|__RW|_USR|___A|   0|___D|   0|   0)
 #define _PAGE_TABLE (__PP|__RW|_USR|___A|   0|___D|   0|   0| _ENC)
 #define __PAGE_KERNEL_RO(__PP|   0|   0|___A|__NX|___D|   0|___G)
-#define __PAGE_KERNEL_RX(__PP|   0|   0|___A|   0|___D|   0|___G)
 #define __PAGE_KERNEL_NOCACHE   (__PP|__RW|   0|___A|__NX|___D|   0|___G| __NC)
 #define __PAGE_KERNEL_VVAR  (__PP|   0|_USR|___A|__NX|___D|   0|___G)
 #define __PAGE_KERNEL_LARGE (__PP|__RW|   0|___A|__NX|___D|_PSE|___G)
@@ -220,7 +219,6 @@ enum page_cache_mode {
 #define PAGE_KERNEL_RO __pgprot_mask(__PAGE_KERNEL_RO | _ENC)
 #define PAGE_KERNEL_EXEC   __pgprot_mask(__PAGE_KERNEL_EXEC   | _ENC)
 #define PAGE_KERNEL_EXEC_NOENC __pgprot_mask(__PAGE_KERNEL_EXEC   |0)
-#define PAGE_KERNEL_RX __pgprot_mask(__PAGE_KERNEL_RX | _ENC)
 #define PAGE_KERNEL_NOCACHE__pgprot_mask(__PAGE_KERNEL_NOCACHE| _ENC)
 #define PAGE_KERNEL_LARGE  __pgprot_mask(__PAGE_KERNEL_LARGE  | _ENC)
 #define PAGE_KERNEL_LARGE_EXEC __pgprot_mask(__PAGE_KERNEL_LARGE_EXEC | _ENC)
-- 
2.25.1
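The removed __PAGE_KERNEL_RX combination is present, accessed, dirty and global, with neither the RW bit nor NX set: read-only and executable, whereas __PAGE_KERNEL_RO additionally sets NX. The sketch below spells that out with the architectural x86-64 PTE bit positions. It deliberately leaves out _ENC and the __pgprot_mask() filtering, and the PTE_/sketch_ names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural x86-64 page-table entry bits. */
#define PTE_PRESENT  (UINT64_C(1) << 0)
#define PTE_RW       (UINT64_C(1) << 1)
#define PTE_USER     (UINT64_C(1) << 2)
#define PTE_ACCESSED (UINT64_C(1) << 5)
#define PTE_DIRTY    (UINT64_C(1) << 6)
#define PTE_GLOBAL   (UINT64_C(1) << 8)
#define PTE_NX       (UINT64_C(1) << 63)

/* Same composition as the removed __PAGE_KERNEL_RX line: P|A|D|G. */
static const uint64_t sketch_kernel_rx =
	PTE_PRESENT | PTE_ACCESSED | PTE_DIRTY | PTE_GLOBAL;

/* Same composition as __PAGE_KERNEL_RO: P|A|NX|D|G. */
static const uint64_t sketch_kernel_ro =
	PTE_PRESENT | PTE_ACCESSED | PTE_NX | PTE_DIRTY | PTE_GLOBAL;

static bool sketch_writable(uint64_t flags)
{
	return (flags & PTE_RW) != 0;
}

/* A mapping is executable when the no-execute bit is clear. */
static bool sketch_executable(uint64_t flags)
{
	return (flags & PTE_NX) == 0;
}
```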



decruft the vmalloc API

2020-04-08 Thread Christoph Hellwig
Hi all,

Peter noticed that with some dumb luck you can toast the kernel address
space with exported vmalloc symbols.

I used this as an opportunity to decruft the vmalloc.c API and make it
much more systematic.  This also removes any chance to create vmalloc
mappings outside the designated areas or using executable permissions
from modules.  Besides that it removes more than 300 lines of code.

A git tree is also available here:

git://git.infradead.org/users/hch/misc.git sanitize-vmalloc-api

Gitweb:


http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/sanitize-vmalloc-api


Re: [PATCH v2 0/2] Don't generate thousands of new warnings when building docs

2020-04-08 Thread Mauro Carvalho Chehab
Em Tue, 07 Apr 2020 13:46:23 +1000
Michael Ellerman  escreveu:

> Mauro Carvalho Chehab  writes:
> > This small series addresses a regression caused by a new patch in
> > docs-next (and in linux-next).
> >

...

> > This solves almost all problems we have. Still, there are a few places
> > where we have two chapters in the same document with the
> > same name. The first patch addresses this problem.  
> 
> I'm still seeing a lot of warnings. Am I doing something wrong?
> 
> cheers
> 
> /linux/Documentation/powerpc/cxl.rst:406: WARNING: duplicate label powerpc/cxl:open, other instance in /linux/Documentation/powerpc/cxl.rst
...
> /linux/Documentation/powerpc/syscall64-abi.rst:86: WARNING: duplicate label powerpc/syscall64-abi:parameters and return value, other instance in /linux/Documentation/powerpc/syscall64-abi.rst
...
> /linux/Documentation/powerpc/ultravisor.rst:339: WARNING: duplicate label powerpc/ultravisor:syntax, other instance in /linux/Documentation/powerpc/ultravisor.rst
...

I can't reproduce your issue here on linux-next (+ my pending doc patches).

So, I can only provide you some hints.

If you look at the logs you posted, all of them are related to duplicated
labels inside the same file.

-

The new Sphinx extension we're using (sphinx.ext.autosectionlabel) generates
references for the first two title levels within each document file (after
this patch).


Looking at the first document (the linux-next version), it has:

1) A first level document title:

   Coherent Accelerator Interface (CXL)

2) Several second level titles:

   Introduction
   Hardware overview
   AFU Modes
   MMIO space
   Interrupts
   Work Element Descriptor (WED)
   User API
   Sysfs Class
   Udev rules

Right now, there's no duplication, but if someone adds, for example, 
another first-level or second-level title called "Interrupts", then 
the file will produce a duplicated label and Sphinx will warn.

The same would happen if someone adds another title (either first
level or second level) called "Coherent Accelerator Interface (CXL)",
as this will conflict with the document title.

-

Now, if the title "Coherent Accelerator Interface (CXL)" got removed,
then "Introduction".."Udev rules" will become first level titles.

Then the sections under "User API" ("open", "ioctl", ...) will become
second-level titles, and since names such as "open" appear in both of its
sub-chapters, it will produce lots of warnings.
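The collision rule described above can be checked with a small model. Assuming, as the warnings quoted earlier suggest, that each label is `<document>:<lowercased title>` and that only titles down to the configured depth (two here) get one, the sketch below counts collisions. It is a hypothetical model, not Sphinx code; the real extension may normalise titles differently, and the sketch_ names are illustrative only.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

struct sketch_title {
	const char *text;
	int level;          /* 1 = document title, 2 = its sections, ... */
};

/* Builds "<doc>:<lowercased title>" into buf (always NUL-terminated). */
static void sketch_make_label(const char *doc, const char *title,
			      char *buf, size_t len)
{
	size_t i = 0;

	for (const char *p = doc; *p && i + 1 < len; p++)
		buf[i++] = *p;
	if (i + 1 < len)
		buf[i++] = ':';
	for (const char *p = title; *p && i + 1 < len; p++)
		buf[i++] = (char)tolower((unsigned char)*p);
	buf[i] = '\0';
}

/* Counts titles whose label repeats an earlier one, considering only
 * titles at level <= maxdepth (deeper titles get no label at all). */
static int sketch_count_duplicates(const char *doc,
				   const struct sketch_title *t, size_t n,
				   int maxdepth)
{
	char a[256], b[256];
	int dups = 0;

	for (size_t i = 0; i < n; i++) {
		if (t[i].level > maxdepth)
			continue;
		sketch_make_label(doc, t[i].text, a, sizeof(a));
		for (size_t j = 0; j < i; j++) {
			if (t[j].level > maxdepth)
				continue;
			sketch_make_label(doc, t[j].text, b, sizeof(b));
			if (strcmp(a, b) == 0) {
				dups++;
				break;
			}
		}
	}
	return dups;
}
```

With the document title in place, the two "open" entries sit at level 3 and get no label. Removing the document title promotes every title by one level, which moves both "open" entries into the labelled range and produces exactly one collision, matching the powerpc/cxl:open warning.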

-

That said, IMHO, this document needs section titles for the two
sub-chapters under "User API". Adding them would allow removing the
document title. See enclosed.

Thanks,
Mauro

powerpc: docs: cxl.rst: mark two section titles as such

The User API chapter contains two sub-chapters. Mark them as
such.

Signed-off-by: Mauro Carvalho Chehab 


diff --git a/Documentation/powerpc/cxl.rst b/Documentation/powerpc/cxl.rst
index 920546d81326..d2d77057610e 100644
--- a/Documentation/powerpc/cxl.rst
+++ b/Documentation/powerpc/cxl.rst
@@ -133,6 +133,7 @@ User API
 
 
 1. AFU character devices
+
 
 For AFUs operating in AFU directed mode, two character device
 files will be created. /dev/cxl/afu0.0m will correspond to a
@@ -395,6 +396,7 @@ read
 
 
 2. Card character device (powerVM guest only)
+^
 
 In a powerVM guest, an extra character device is created for the
 card. The device is only used to write (flash) a new image on the



Re: [PATCH v2] powerpc/ptrace: Do not return ENOSYS if invalid syscall

2020-04-08 Thread Michael Ellerman
Hi Cascardo,

Thanks for following-up on this.

Unfortunately I don't think I can merge this fix.

Thadeu Lima de Souza Cascardo  writes:
> If a tracer sets the syscall number to an invalid one, allow the return
> value set by the tracer to be returned to the tracee.

The problem is this patch not only *allows* the tracer to set the return
value, but it also *requires* the tracer to set the return value. That
would be a change to the ABI.

Currently if a tracer sets the syscall number to -1, that's all they
need to do, and the kernel will make sure ENOSYS is returned to the
tracee.

With this patch applied the tracer can set the syscall to -1 but they
also must set the return value explicitly. Otherwise the syscall will
just return with whatever value happens to be in r3.

I confirmed this patch breaks the strace testsuite:

  # cd strace/tests/
  # bash qual_inject-retval.test
  ../../strace: Failed to tamper with process 13301: unexpectedly got no error (return value 0x10001090, error 0)
  expected retval 0, got retval 268439696
  chdir("..") = 268439696 (INJECTED)
  +++ exited with 1 +++
  qual_inject-retval.test: failed test: ../../strace -a12 -echdir -einject=chdir:retval=0 ../qual_inject-retval 0 failed with code 1

The return value 0x10001090 is the address of the ".." string passed to
the syscall.

> The test for NR_syscalls is already at entry_64.S, and it's at
> do_syscall_trace_enter only to skip audit and trace.
>
> After this, two failures from seccomp_bpf selftests complete just fine,
> as the failing test was using ptrace to change the syscall to return an
> error or a fake value, but were failing as it was always returning
> -ENOSYS.

This test wants to change the syscall number and the return value, and
do both from the syscall enter hook.

We don't support that, because we have no way of knowing if the tracer
set the return value, so we always return ENOSYS. Our ptrace ABI has
been that way forever.

We could possibly do something like compare r3 and orig_gpr3 and assume
that if they're different then the tracer has set r3 to the return
value. But I worry that will break something and/or just be very subtle
and bug prone.
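To make the r3/orig_gpr3 idea concrete, here is a minimal sketch of the decision it would imply for a skipped syscall. It is not kernel code and not something the kernel does today; struct sketch_regs is a hypothetical, heavily simplified stand-in for struct pt_regs.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical, simplified register snapshot. */
struct sketch_regs {
	uint64_t gpr0;       /* syscall number (set to -1 by the tracer) */
	uint64_t gpr3;       /* first argument on entry, return value on exit */
	uint64_t orig_gpr3;  /* first argument as saved at syscall entry */
};

/* Heuristic: if r3 still equals the saved first argument, assume the
 * tracer did not set a return value and report -ENOSYS; otherwise
 * treat r3 as the value the tracer wants returned. */
static int64_t sketch_skipped_result(const struct sketch_regs *regs)
{
	if (regs->gpr3 == regs->orig_gpr3)
		return -ENOSYS;
	return (int64_t)regs->gpr3;
}
```

The weak spot is the case where a tracer deliberately injects a return value equal to the first argument: it is indistinguishable from a tracer that never touched r3, so the sketch would wrongly report -ENOSYS.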

I think the right way to fix it is for the test case to change the
return value from the syscall exit hook. That will work on all existing
kernels AFAIK. It's also what strace does.

cheers


> diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
> index 25c0424e8868..557ae4bc2331 100644
> --- a/arch/powerpc/kernel/ptrace.c
> +++ b/arch/powerpc/kernel/ptrace.c
> @@ -3314,7 +3314,7 @@ long do_syscall_trace_enter(struct pt_regs *regs)
> 
>   /* Avoid trace and audit when syscall is invalid. */
>   if (regs->gpr[0] >= NR_syscalls)
> - goto skip;
> + return regs->gpr[0];
> 
>   if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
>   trace_sys_enter(regs, regs->gpr[0]);


[PATCH] powerpc/powernv: Add a print indicating when an IODA PE is released

2020-04-08 Thread Oliver O'Halloran
Quite useful to know in some cases.

Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 3d81c01..82e5098 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -3475,6 +3475,8 @@ static void pnv_ioda_release_pe(struct pnv_ioda_pe *pe)
struct pnv_phb *phb = pe->phb;
struct pnv_ioda_pe *slave, *tmp;
 
+   pe_info(pe, "Releasing PE\n");
+
mutex_lock(>ioda.pe_list_mutex);
list_del(>list);
mutex_unlock(>ioda.pe_list_mutex);
-- 
2.9.5


