Re: [RFC PATCH v3 2/6] swiotlb: Add restricted DMA pool

2021-01-24 Thread Jon Masters

On 1/7/21 1:09 PM, Florian Fainelli wrote:

On 1/7/21 9:57 AM, Konrad Rzeszutek Wilk wrote:

On Fri, Jan 08, 2021 at 01:39:18AM +0800, Claire Chang wrote:

Hi Greg and Konrad,

This change is intended to be non-arch specific. Any arch that lacks DMA access
control and has devices not behind an IOMMU can make use of it. Could you share
why you think this should be arch specific?


The idea behind non-arch-specific code is for it to be generic. The devicetree
is specific to PowerPC, Sparc, and ARM, and not to x86 - hence it should
be in arch-specific code.


In principle the same code could be used with an ACPI enabled system with
an appropriate service to identify the restricted DMA regions and unlock
them.

More than one architecture requiring this function (ARM and ARM64 are the
two I can think of needing this immediately) sort of calls for making
the code architecture-agnostic, since past two you need something that scales.

There is already code today under kernel/dma/contiguous.c that is only
activated on a CONFIG_OF=y && CONFIG_OF_RESERVED_MEM=y system, this is
no different.
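
[Editorial aside: the generic mechanism Florian points at looks roughly like
the following, modeled on rmem_cma_setup() in kernel/dma/contiguous.c. This is
a hedged sketch only: the "restricted-dma-pool" compatible string and the init
body are assumptions for illustration, not the series' actual code.]

/*
 * Sketch of a reserved-memory hook, only built when CONFIG_OF=y &&
 * CONFIG_OF_RESERVED_MEM=y, exactly like the CMA case cited above.
 */
#include <linux/of_reserved_mem.h>
#include <linux/printk.h>
#include <linux/sizes.h>

static int __init rmem_restricted_dma_setup(struct reserved_mem *rmem)
{
	pr_info("restricted DMA pool: base %pa, size %lu MiB\n",
		&rmem->base, (unsigned long)(rmem->size / SZ_1M));
	return 0;	/* a real implementation would register a SWIOTLB pool */
}
RESERVEDMEM_OF_DECLARE(restricted_dma, "restricted-dma-pool",
		       rmem_restricted_dma_setup);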




Just a note for history/archives that this approach would not be 
appropriate on general purpose Arm systems, such as SystemReady-ES 
edge/non-server platforms seeking to run general purpose distros. I want 
to have that in the record before someone at Arm (or NVidia, or a bunch 
of others that come to mind who have memory firewalls) gets an idea.


If you're working at an Arm vendor and come looking at this later 
thinking "wow, what a great idea!", please fix your hardware to have a 
real IOMMU/SMMU and real PCIe. You'll be pointed at this reply.


Jon.

--
Computer Architect


Re: [PATCH] lib/sstep: Fix incorrect return from analyse_instr()

2021-01-24 Thread Ananth N Mavinakayanahalli

On 1/23/21 6:03 AM, Michael Ellerman wrote:

Ananth N Mavinakayanahalli  writes:

We currently just percolate the return value from analyse_instr()
to the caller of emulate_step(), especially if it is a -1.

For one particular case (opcode = 4) for instructions that
aren't currently emulated, we are returning 'should not be
single-stepped' while we should have returned 0, which says
'did not emulate, may have to single-step'.

Signed-off-by: Ananth N Mavinakayanahalli 
Tested-by: Naveen N. Rao 
---
  arch/powerpc/lib/sstep.c |   49 +-
  1 file changed, 27 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index 5a425a4a1d88..a3a0373843cd 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -1445,34 +1445,39 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 
 #ifdef __powerpc64__
 
 	case 4:
-		if (!cpu_has_feature(CPU_FTR_ARCH_300))
-			return -1;
-
-		switch (word & 0x3f) {
-		case 48:	/* maddhd */
-			asm volatile(PPC_MADDHD(%0, %1, %2, %3) :
-				     "=r" (op->val) : "r" (regs->gpr[ra]),
-				     "r" (regs->gpr[rb]), "r" (regs->gpr[rc]));
-			goto compute_done;
+		/*
+		 * There are very many instructions with this primary opcode
+		 * introduced in the ISA as early as v2.03. However, the ones
+		 * we currently emulate were all introduced with ISA 3.0
+		 */
+		if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+			switch (word & 0x3f) {
+			case 48:	/* maddhd */
+				asm volatile(PPC_MADDHD(%0, %1, %2, %3) :
+					     "=r" (op->val) : "r" (regs->gpr[ra]),
+					     "r" (regs->gpr[rb]), "r" (regs->gpr[rc]));
+				goto compute_done;


Indenting everything makes this patch harder to read, and I think makes
the resulting code harder to read too. We already have two levels of
switch here, and we're inside a ~1700 line function, so keeping things
simple is important I think.

Doesn't this achieve the same result?

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index bf7a7d62ae8b..d631baaf1da2 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -1443,8 +1443,10 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
  
  #ifdef __powerpc64__

case 4:
-   if (!cpu_has_feature(CPU_FTR_ARCH_300))
-   return -1;
+   if (!cpu_has_feature(CPU_FTR_ARCH_300)) {
+   op->type = UNKNOWN;
+   return 0;
+   }
  
 		switch (word & 0x3f) {
 		case 48:	/* maddhd */
@@ -1470,7 +1472,8 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 * There are other instructions from ISA 3.0 with the same
 * primary opcode which do not have emulation support yet.
 */
-   return -1;
+   op->type = UNKNOWN;
+   return 0;
  #endif
  
  	case 7:		/* mulli */
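
[Editorial note: to make the return-value convention concrete, here is a
hedged caller-side sketch. single_step_fallback() is a made-up stand-in,
not a real kernel helper.]

/* Caller-side view of emulate_step()'s contract:
 *  > 0: emulated, regs (including NIP) already updated
 *    0: not emulated, caller may fall back to hardware single-step
 *  < 0: instruction must not be single-stepped at all
 */
static int step_or_emulate(struct pt_regs *regs, struct ppc_inst instr)
{
	int ret = emulate_step(regs, instr);

	if (ret > 0)
		return 0;				/* done, NIP advanced */
	if (ret == 0)
		return single_step_fallback(regs);	/* hypothetical helper */
	return -EFAULT;					/* e.g. rfid: never single-step */
}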




Looks good to me.

Acked-by: Ananth N Mavinakayanahalli 


--
Ananth


[powerpc:merge] BUILD SUCCESS 44158b256b30415079588d0fcb1bccbdc2ccd009

2021-01-24 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git merge
branch HEAD: 44158b256b30415079588d0fcb1bccbdc2ccd009  Automatic merge of 'fixes' into merge (2021-01-24 09:52)

elapsed time: 954m

configs tested: 140
configs skipped: 2

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
arm          defconfig
arm64        allyesconfig
arm64        defconfig
arm          allyesconfig
arm          allmodconfig
mips         tb0287_defconfig
mips         mpc30x_defconfig
arm          h5000_defconfig
sh           rsk7264_defconfig
powerpc      linkstation_defconfig
arm          pxa255-idp_defconfig
arm          am200epdkit_defconfig
mips         pistachio_defconfig
xtensa       cadence_csp_defconfig
powerpc      chrp32_defconfig
arm          mxs_defconfig
mips         cu1000-neo_defconfig
powerpc      tqm8560_defconfig
powerpc64    alldefconfig
sh           sh7757lcr_defconfig
sh           kfr2r09_defconfig
arm          cns3420vb_defconfig
powerpc      ppa8548_defconfig
m68k         multi_defconfig
sh           rts7751r2d1_defconfig
mips         tb0219_defconfig
mips         ip27_defconfig
m68k         apollo_defconfig
arc          nsimosci_defconfig
powerpc      mpc885_ads_defconfig
s390         debug_defconfig
arm          iop32x_defconfig
arm          tango4_defconfig
mips         nlm_xlr_defconfig
arm          pxa3xx_defconfig
arm          hackkit_defconfig
arm          pcm027_defconfig
sh           shmin_defconfig
powerpc      mpc512x_defconfig
arm          integrator_defconfig
h8300        h8s-sim_defconfig
powerpc      mgcoge_defconfig
arm          aspeed_g5_defconfig
sh           sh7710voipgw_defconfig
arm          imote2_defconfig
mips         loongson1b_defconfig
arm          dove_defconfig
arm          mps2_defconfig
sh           rts7751r2dplus_defconfig
mips         workpad_defconfig
powerpc      walnut_defconfig
arm          sama5_defconfig
mips         ath79_defconfig
sh           se7751_defconfig
mips         bigsur_defconfig
csky         alldefconfig
arm          pxa168_defconfig
ia64         allmodconfig
ia64         defconfig
ia64         allyesconfig
m68k         allmodconfig
m68k         defconfig
m68k         allyesconfig
nios2        defconfig
arc          allyesconfig
nds32        allnoconfig
c6x          allyesconfig
nds32        defconfig
nios2        allyesconfig
csky         defconfig
alpha        defconfig
alpha        allyesconfig
xtensa       allyesconfig
h8300        allyesconfig
arc          defconfig
sh           allmodconfig
parisc       defconfig
s390         allyesconfig
parisc       allyesconfig
s390         defconfig
i386         allyesconfig
sparc        allyesconfig
sparc        defconfig
i386         tinyconfig
i386         defconfig
mips         allyesconfig
mips         allmodconfig
powerpc      allyesconfig
powerpc      allmodconfig
powerpc      allnoconfig
i386         randconfig-a001-20210124
i386         randconfig-a002-20210124
i386         randconfig-a003-20210124
i386         randconfig-a001-20210125
i386         randconfig-a002-20210125
i386         randconfig-a004-20210125
i386         randconfig-a006-20210125
i386         randconfig-a005-20210125
i386         randconfig-a003-20210125
i386         randconfig-a004-20210124
i386

[powerpc:fixes-test] BUILD SUCCESS 4025c784c573cab7e3f84746cc82b8033923ec62

2021-01-24 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git fixes-test
branch HEAD: 4025c784c573cab7e3f84746cc82b8033923ec62  powerpc/64s: prevent recursive replay_soft_interrupts causing superfluous interrupt

elapsed time: 956m

configs tested: 147
configs skipped: 2

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
arm          defconfig
arm64        allyesconfig
arm64        defconfig
arm          allyesconfig
arm          allmodconfig
mips         tb0287_defconfig
mips         mpc30x_defconfig
arm          h5000_defconfig
sh           rsk7264_defconfig
powerpc      linkstation_defconfig
arm          pxa255-idp_defconfig
arm          am200epdkit_defconfig
mips         pistachio_defconfig
xtensa       cadence_csp_defconfig
arm          aspeed_g5_defconfig
powerpc      mpc832x_mds_defconfig
arm          pcm027_defconfig
mips         qi_lb60_defconfig
mips         decstation_64_defconfig
powerpc      chrp32_defconfig
arm          mxs_defconfig
mips         cu1000-neo_defconfig
powerpc      tqm8560_defconfig
powerpc64    alldefconfig
sh           sh7757lcr_defconfig
sh           kfr2r09_defconfig
arm          cns3420vb_defconfig
powerpc      ppa8548_defconfig
m68k         multi_defconfig
sh           rts7751r2d1_defconfig
mips         tb0219_defconfig
mips         ip27_defconfig
m68k         apollo_defconfig
arc          nsimosci_defconfig
powerpc      mpc885_ads_defconfig
s390         debug_defconfig
arm          iop32x_defconfig
arm          tango4_defconfig
mips         nlm_xlr_defconfig
arm          pxa3xx_defconfig
arm          hackkit_defconfig
sh           shmin_defconfig
powerpc      mpc512x_defconfig
arm          integrator_defconfig
arm          cm_x300_defconfig
powerpc      mpc8540_ads_defconfig
sh           r7785rp_defconfig
arm          sunxi_defconfig
h8300        h8s-sim_defconfig
powerpc      mgcoge_defconfig
sh           sh7710voipgw_defconfig
arm          imote2_defconfig
mips         loongson1b_defconfig
arm          dove_defconfig
arm          mps2_defconfig
sh           rts7751r2dplus_defconfig
mips         workpad_defconfig
powerpc      walnut_defconfig
arm          sama5_defconfig
mips         ath79_defconfig
sh           se7751_defconfig
mips         bigsur_defconfig
csky         alldefconfig
arm          pxa168_defconfig
ia64         allmodconfig
ia64         defconfig
ia64         allyesconfig
m68k         allmodconfig
m68k         defconfig
m68k         allyesconfig
nios2        defconfig
arc          allyesconfig
nds32        allnoconfig
c6x          allyesconfig
nds32        defconfig
nios2        allyesconfig
csky         defconfig
alpha        defconfig
alpha        allyesconfig
xtensa       allyesconfig
h8300        allyesconfig
arc          defconfig
sh           allmodconfig
parisc       defconfig
s390         allyesconfig
parisc       allyesconfig
s390         defconfig
i386         allyesconfig
sparc        allyesconfig
sparc        defconfig
i386         tinyconfig
i386         defconfig
mips         allmodconfig
mips         allyesconfig
powerpc      allyesconfig
powerpc      allmodconfig
powerpc      allnoconfig
i386         randconfig-a001-20210124
i386         randconfig-a002-20210124
i386         randconfig

Re: [PATCH v10 11/12] mm/vmalloc: Hugepage vmalloc mappings

2021-01-24 Thread Nicholas Piggin
Excerpts from Christoph Hellwig's message of January 25, 2021 1:07 am:
> On Sun, Jan 24, 2021 at 06:22:29PM +1000, Nicholas Piggin wrote:
>> diff --git a/arch/Kconfig b/arch/Kconfig
>> index 24862d15f3a3..f87feb616184 100644
>> --- a/arch/Kconfig
>> +++ b/arch/Kconfig
>> @@ -724,6 +724,16 @@ config HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
>>  config HAVE_ARCH_HUGE_VMAP
>>  bool
>>  
>> +config HAVE_ARCH_HUGE_VMALLOC
>> +depends on HAVE_ARCH_HUGE_VMAP
>> +bool
>> +help
>> +  Archs that select this would be capable of PMD-sized vmaps (i.e.,
>> +  arch_vmap_pmd_supported() returns true), and they must make no
>> +  assumptions that vmalloc memory is mapped with PAGE_SIZE ptes. The
>> +  VM_NOHUGE flag can be used to prohibit arch-specific allocations from
>> +  using hugepages to help with this (e.g., modules may require it).
> 
> help texts don't make sense for options that aren't user visible.

Yeah, it was supposed to just be a comment, but even if it were user
visible this kind of thing would not make sense in help text, so I'll
just turn it into a real comment as per Randy's suggestion.

> More importantly, is there any good reason to keep the option and not
> just go the extra step and enable huge page vmalloc for arm64 and x86
> as well?

Yes they need to ensure they exclude vmallocs that can't be huge one
way or another (VM_ flag or prot arg).

After they're converted we can fold it into HUGE_VMAP.

>> +static inline bool is_vm_area_hugepages(const void *addr)
>> +{
>> +/*
>> + * This may not 100% tell if the area is mapped with > PAGE_SIZE
>> + * page table entries, if for some reason the architecture indicates
>> + * larger sizes are available but decides not to use them, nothing
>> + * prevents that. This only indicates the size of the physical page
>> + * allocated in the vmalloc layer.
>> + */
>> +return (find_vm_area(addr)->page_order > 0);
> 
> No need for the braces here.
> 
>>  }
>>  
>> +static int vmap_pages_range_noflush(unsigned long addr, unsigned long end,
>> +pgprot_t prot, struct page **pages, unsigned int page_shift)
>> +{
>> +unsigned int i, nr = (end - addr) >> PAGE_SHIFT;
>> +
>> +WARN_ON(page_shift < PAGE_SHIFT);
>> +
>> +if (page_shift == PAGE_SHIFT)
>> +return vmap_small_pages_range_noflush(addr, end, prot, pages);
> 
> This begs for a IS_ENABLED check to disable the hugepage code for
> architectures that don't need it.

Yeah good point.

>> +int map_kernel_range_noflush(unsigned long addr, unsigned long size,
>> + pgprot_t prot, struct page **pages)
>> +{
>> +	return vmap_pages_range_noflush(addr, addr + size, prot, pages, PAGE_SHIFT);
>> +}
> 
> Please just kill off map_kernel_range_noflush and map_kernel_range
> off entirely in favor of the vmap versions.

I can do a cleanup patch on top of it.

>> +for (i = 0; i < area->nr_pages; i += 1U << area->page_order) {
> 
> Maybe using a helper that takes the vm_area_struct and either returns
> area->page_order or always 0 based on IS_ENABLED?

I'll see how it looks.
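
[Editorial note: a minimal sketch of the helper Christoph suggests; the name
vm_area_page_order() is made up here and later revisions may spell it
differently.]

/* Returns the area's allocation order, constant-folded to 0 on
 * architectures that don't select HAVE_ARCH_HUGE_VMALLOC. */
static inline unsigned int vm_area_page_order(struct vm_struct *vm)
{
	return IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC) ? vm->page_order : 0;
}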

Thanks,
Nick


Re: [PATCH v4 2/2] powerpc/mce: Remove per cpu variables from MCE handlers

2021-01-24 Thread kernel test robot
Hi Ganesh,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on v5.11-rc4 next-20210122]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Ganesh-Goudar/powerpc-mce-Reduce-the-size-of-event-arrays/20210124-191230
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-randconfig-r005-20210124 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project bd3a387ee76f58caa0d7901f3f84e9bb3d006f27)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install powerpc cross compiling tool for clang build
        # apt-get install binutils-powerpc-linux-gnu
        # https://github.com/0day-ci/linux/commit/fab6401db419da33d1757ebf519f030ab758ae7a
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Ganesh-Goudar/powerpc-mce-Reduce-the-size-of-event-arrays/20210124-191230
        git checkout fab6401db419da33d1757ebf519f030ab758ae7a
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=powerpc

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

>> arch/powerpc/kernel/setup-common.c:940:2: error: implicit declaration of function 'mce_init' [-Werror,-Wimplicit-function-declaration]
   mce_init();
   ^
   1 error generated.


vim +/mce_init +940 arch/powerpc/kernel/setup-common.c

   847  
   848  /*
   849   * Called into from start_kernel this initializes memblock, which is used
   850   * to manage page allocation until mem_init is called.
   851   */
   852  void __init setup_arch(char **cmdline_p)
   853  {
   854  kasan_init();
   855  
   856  *cmdline_p = boot_command_line;
   857  
   858  /* Set a half-reasonable default so udelay does something sensible */
   859  loops_per_jiffy = 500000000 / HZ;
   860  
   861  /* Unflatten the device-tree passed by prom_init or kexec */
   862  unflatten_device_tree();
   863  
   864  /*
   865   * Initialize cache line/block info from device-tree (on ppc64) or
   866   * just cputable (on ppc32).
   867   */
   868  initialize_cache_info();
   869  
   870  /* Initialize RTAS if available. */
   871  rtas_initialize();
   872  
   873  /* Check if we have an initrd provided via the device-tree. */
   874  check_for_initrd();
   875  
   876  /* Probe the machine type, establish ppc_md. */
   877  probe_machine();
   878  
   879  /* Setup panic notifier if requested by the platform. */
   880  setup_panic();
   881  
   882  /*
   883   * Configure ppc_md.power_save (ppc32 only, 64-bit machines do
   884   * it from their respective probe() function.
   885   */
   886  setup_power_save();
   887  
   888  /* Discover standard serial ports. */
   889  find_legacy_serial_ports();
   890  
   891  /* Register early console with the printk subsystem. */
   892  register_early_udbg_console();
   893  
   894  /* Setup the various CPU maps based on the device-tree. */
   895  smp_setup_cpu_maps();
   896  
   897  /* Initialize xmon. */
   898  xmon_setup();
   899  
   900  /* Check the SMT related command line arguments (ppc64). */
   901  check_smt_enabled();
   902  
   903  /* Parse memory topology */
   904  mem_topology_setup();
   905  
   906  /*
   907   * Release secondary cpus out of their spinloops at 0x60 now that
   908   * we can map physical -> logical CPU ids.
   909   *
   910   * Freescale Book3e parts spin in a loop provided by firmware,
   911   * so smp_release_cpus() does nothing for them.
   912   */
   913  #ifdef CONFIG_SMP
   914  smp_setup_pacas();
   915  
   916  /* On BookE, setup per-core TLB data structures. */
   917  setup_tlb_core_data();
   918  #endif
   919  /* Print various info about the machine that has been gathered so far. */
   920  print_system_info();
   921  
   922  /* Reserve large chunks of memory for use by CMA for KVM. */
   923  kvm_cma_reserve();
   924  
   925  /*  Reserve large chunks of memory for us by CMA for hugetlb */
   926  gigantic_hugetlb_cma_reserve();
   927  
   928  klp_init_threa

Re: [GIT PULL] Please pull powerpc/linux.git powerpc-5.11-5 tag

2021-01-24 Thread pr-tracker-bot
The pull request you sent on Sun, 24 Jan 2021 23:15:52 +1100:

> https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git tags/powerpc-5.11-5

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/14c50a66183856672d822f25dbb73ad26d1e8f11

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html


Re: [PATCH v10 11/12] mm/vmalloc: Hugepage vmalloc mappings

2021-01-24 Thread Randy Dunlap
On 1/24/21 7:07 AM, Christoph Hellwig wrote:
>> +config HAVE_ARCH_HUGE_VMALLOC
>> +depends on HAVE_ARCH_HUGE_VMAP
>> +bool
>> +help
>> +  Archs that select this would be capable of PMD-sized vmaps (i.e.,
>> +  arch_vmap_pmd_supported() returns true), and they must make no
>> +  assumptions that vmalloc memory is mapped with PAGE_SIZE ptes. The
>> +  VM_NOHUGE flag can be used to prohibit arch-specific allocations from
>> +  using hugepages to help with this (e.g., modules may require it).
> help texts don't make sense for options that aren't user visible.

It's good that the Kconfig symbol is documented and it's better here
than having to dig thru git commit logs IMO.

It could be done as "# Archs that select" style comments instead
of Kconfig help text.


-- 
~Randy



Re: [PATCH v10 11/12] mm/vmalloc: Hugepage vmalloc mappings

2021-01-24 Thread Christoph Hellwig
On Sun, Jan 24, 2021 at 06:22:29PM +1000, Nicholas Piggin wrote:
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 24862d15f3a3..f87feb616184 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -724,6 +724,16 @@ config HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
>  config HAVE_ARCH_HUGE_VMAP
>   bool
>  
> +config HAVE_ARCH_HUGE_VMALLOC
> + depends on HAVE_ARCH_HUGE_VMAP
> + bool
> + help
> +   Archs that select this would be capable of PMD-sized vmaps (i.e.,
> +   arch_vmap_pmd_supported() returns true), and they must make no
> +   assumptions that vmalloc memory is mapped with PAGE_SIZE ptes. The
> +   VM_NOHUGE flag can be used to prohibit arch-specific allocations from
> +   using hugepages to help with this (e.g., modules may require it).

help texts don't make sense for options that aren't user visible.

More importantly, is there any good reason to keep the option and not
just go the extra step and enable huge page vmalloc for arm64 and x86
as well?

> +static inline bool is_vm_area_hugepages(const void *addr)
> +{
> + /*
> +  * This may not 100% tell if the area is mapped with > PAGE_SIZE
> +  * page table entries, if for some reason the architecture indicates
> +  * larger sizes are available but decides not to use them, nothing
> +  * prevents that. This only indicates the size of the physical page
> +  * allocated in the vmalloc layer.
> +  */
> + return (find_vm_area(addr)->page_order > 0);

No need for the braces here.

>  }
>  
> +static int vmap_pages_range_noflush(unsigned long addr, unsigned long end,
> + pgprot_t prot, struct page **pages, unsigned int page_shift)
> +{
> + unsigned int i, nr = (end - addr) >> PAGE_SHIFT;
> +
> + WARN_ON(page_shift < PAGE_SHIFT);
> +
> + if (page_shift == PAGE_SHIFT)
> + return vmap_small_pages_range_noflush(addr, end, prot, pages);

This begs for a IS_ENABLED check to disable the hugepage code for
architectures that don't need it.

> +int map_kernel_range_noflush(unsigned long addr, unsigned long size,
> +  pgprot_t prot, struct page **pages)
> +{
> +	return vmap_pages_range_noflush(addr, addr + size, prot, pages, PAGE_SHIFT);
> +}

Please just kill off map_kernel_range_noflush and map_kernel_range
off entirely in favor of the vmap versions.

> + for (i = 0; i < area->nr_pages; i += 1U << area->page_order) {

Maybe using a helper that takes the vm_area_struct and either returns
area->page_order or always 0 based on IS_ENABLED?


Re: [PATCH v10 10/12] mm/vmalloc: add vmap_range_noflush variant

2021-01-24 Thread Christoph Hellwig
On Sun, Jan 24, 2021 at 06:22:28PM +1000, Nicholas Piggin wrote:
> As a side-effect, the order of flush_cache_vmap() and
> arch_sync_kernel_mappings() calls are switched, but that now matches
> the other callers in this file.
> 
> Signed-off-by: Nicholas Piggin 

Looks good,

Reviewed-by: Christoph Hellwig 


Re: [PATCH v10 09/12] mm: Move vmap_range from mm/ioremap.c to mm/vmalloc.c

2021-01-24 Thread Christoph Hellwig
On Sun, Jan 24, 2021 at 06:22:27PM +1000, Nicholas Piggin wrote:
> This is a generic kernel virtual memory mapper, not specific to ioremap.

Looks good:

Reviewed-by: Christoph Hellwig 

Although it would be nice if you could fix up the > 80 lines while
you're at it.


Re: [PATCH v10 05/12] mm: HUGE_VMAP arch support cleanup

2021-01-24 Thread Nicholas Piggin
Excerpts from Christoph Hellwig's message of January 24, 2021 9:40 pm:
>> diff --git a/arch/arm64/include/asm/vmalloc.h b/arch/arm64/include/asm/vmalloc.h
>> index 2ca708ab9b20..597b40405319 100644
>> --- a/arch/arm64/include/asm/vmalloc.h
>> +++ b/arch/arm64/include/asm/vmalloc.h
>> @@ -1,4 +1,12 @@
>>  #ifndef _ASM_ARM64_VMALLOC_H
>>  #define _ASM_ARM64_VMALLOC_H
>>  
>> +#include 
>> +
>> +#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
>> +bool arch_vmap_p4d_supported(pgprot_t prot);
>> +bool arch_vmap_pud_supported(pgprot_t prot);
>> +bool arch_vmap_pmd_supported(pgprot_t prot);
>> +#endif
> 
> Shouldn't these be inlines or macros?  Also it would be useful
> if the architectures would not have to override all functions
> but just those that they actually implement?

It gets better in the next patches. I did it this way again to avoid 
moving a lot of code at the same time as changing name / prototype
slightly.

I didn't see individual generic fallbacks being all that useful really 
at this scale. I don't mind keeping the explicit false.

> Also lots of > 80 char lines in the patch.

Yeah there's a few, I can reduce those.

Thanks,
Nick


[GIT PULL] Please pull powerpc/linux.git powerpc-5.11-5 tag

2021-01-24 Thread Michael Ellerman
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Hi Linus,

Please pull some more powerpc fixes for 5.11:

The following changes since commit 41131a5e54ae7ba5a2bb8d7b30d1818b3f5b13d2:

  powerpc/vdso: Fix clock_gettime_fallback for vdso32 (2021-01-14 15:56:44 +1100)

are available in the git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git tags/powerpc-5.11-5

for you to fetch changes up to 08685be7761d69914f08c3d6211c543a385a5b9c:

  powerpc/64s: fix scv entry fallback flush vs interrupt (2021-01-20 15:58:19 +1100)

- --
powerpc fixes for 5.11 #5

Fix a bad interaction between the scv handling and the fallback L1D flush, which
could lead to user register corruption. Only affects people using scv (~no one)
on machines with old firmware that are missing the L1D flush.

Two small selftest fixes.

Thanks to Eirik Fuller, Libor Pechacek, Nicholas Piggin, Sandipan Das, Tulio
Magno Quites Machado Filho.

- --
Michael Ellerman (1):
  selftests/powerpc: Only test lwm/stmw on big endian

Nicholas Piggin (1):
  powerpc/64s: fix scv entry fallback flush vs interrupt

Sandipan Das (1):
  selftests/powerpc: Fix exit status of pkey tests


 arch/powerpc/include/asm/exception-64s.h                      | 13 +++
 arch/powerpc/include/asm/feature-fixups.h                     | 10 ++++
 arch/powerpc/kernel/entry_64.S                                |  2 +-
 arch/powerpc/kernel/exceptions-64s.S                          | 19 ++++++++++++
 arch/powerpc/kernel/vmlinux.lds.S                             |  7 ++
 arch/powerpc/lib/feature-fixups.c                             | 24 +++++++++---
 tools/testing/selftests/powerpc/alignment/alignment_handler.c |  5 +++-
 tools/testing/selftests/powerpc/mm/pkey_exec_prot.c           |  2 +-
 tools/testing/selftests/powerpc/mm/pkey_siginfo.c             |  2 +-
 9 files changed, 77 insertions(+), 7 deletions(-)
-----BEGIN PGP SIGNATURE-----

iQIzBAEBCAAdFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmANYaoACgkQUevqPMjh
pYDyFRAAqwsxxbbCe+AlggURQi7nap5JL4qHV0bEYPR34IEIPs9blDOb5ECQNbNt
fbxDK9y3ij5ceETsdzM6d3gkocBo/O8JMa9scfmHNFpQLWQk013MUg3YJQnycDkE
vpmaXPMdkcZv82VXdYe4DonhlS3FBTpbL1jPVZn6KIJGpiWfuS7vgptLeBqtMMZz
Mz4lAkzMKbSw/NmKe+Iq3Rc8zsw4C6gXPIhkNsD32s5U+lVMKLpFpxtwhxcGFxDy
sTUBWXJn+mW4+XJVNHQOvLN3gTPNgEcg2xoKkQiwB5/y+GKgPco24Ep6bUalYfNG
dViUAEgzpyhwTfkBxwwV8bpxSaw9HAQRjVC18QJ7sLM+ogHEJm7ejipAOmAfAzuf
+BwQgkSZ2I/peJJDNvVjC3vRIDl29LEA73ZORcp4ynDP/cKuhgvaYBTPCVCzcc0r
+bPXFEfS0OofLBkLekHIdSRfCLQjmQF/TB3CVkDAlDKjiMwTJk/khTn0+0RD6DRK
i/iBkCXjOBuizXkIzRUAit6YMMoO6Yt/nuyrPhDetBFpMPmZgAuLZCs1UI+qUR/L
lS4jOSUQnZqLXsDJqT7uUIdaWZPODdV1U8XEl1+C9xAZ5A4Juy9fFr2K91OtBa2e
/45tUCpDCmtt5aXZXWgwghJeQteBI0Ng5U4NH0asH2W8oVDFyRM=
=f+xY
-----END PGP SIGNATURE-----


Re: [PATCH v10 04/12] mm/ioremap: rename ioremap_*_range to vmap_*_range

2021-01-24 Thread Nicholas Piggin
Excerpts from Christoph Hellwig's message of January 24, 2021 9:36 pm:
> On Sun, Jan 24, 2021 at 06:22:22PM +1000, Nicholas Piggin wrote:
>> This will be used as a generic kernel virtual mapping function, so
>> re-name it in preparation.
> 
> The new name looks ok, but shouldn't it also move to vmalloc.c with
> the more generic name and purpose?
> 

Yes, I moved it in a later patch to make reviewing easier. Rename in 
this one then the move patch is cut and paste.

Thanks,
Nick


Re: [PATCH] powerpc/64s: fix scv entry fallback flush vs interrupt

2021-01-24 Thread Michael Ellerman
On Mon, 11 Jan 2021 16:24:08 +1000, Nicholas Piggin wrote:
> The L1D flush fallback functions are not recoverable vs interrupts,
> yet the scv entry flush runs with MSR[EE]=1. This can result in a
> timer (soft-NMI) or MCE or SRESET interrupt hitting here and overwriting
> the EXRFI save area, which ends up corrupting userspace registers for
> scv return.
> 
> Fix this by disabling RI and EE for the scv entry fallback flush.

Applied to powerpc/fixes.

[1/1] powerpc/64s: fix scv entry fallback flush vs interrupt
  https://git.kernel.org/powerpc/c/08685be7761d69914f08c3d6211c543a385a5b9c

cheers


Re: [PATCH v10 05/12] mm: HUGE_VMAP arch support cleanup

2021-01-24 Thread Christoph Hellwig
> diff --git a/arch/arm64/include/asm/vmalloc.h b/arch/arm64/include/asm/vmalloc.h
> index 2ca708ab9b20..597b40405319 100644
> --- a/arch/arm64/include/asm/vmalloc.h
> +++ b/arch/arm64/include/asm/vmalloc.h
> @@ -1,4 +1,12 @@
>  #ifndef _ASM_ARM64_VMALLOC_H
>  #define _ASM_ARM64_VMALLOC_H
>  
> +#include 
> +
> +#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
> +bool arch_vmap_p4d_supported(pgprot_t prot);
> +bool arch_vmap_pud_supported(pgprot_t prot);
> +bool arch_vmap_pmd_supported(pgprot_t prot);
> +#endif

Shouldn't these be inlines or macros?  Also it would be useful
if the architectures would not have to override all functions
but just those that they actually implement?

Also lots of > 80 char lines in the patch.


Re: [PATCH v10 04/12] mm/ioremap: rename ioremap_*_range to vmap_*_range

2021-01-24 Thread Christoph Hellwig
On Sun, Jan 24, 2021 at 06:22:22PM +1000, Nicholas Piggin wrote:
> This will be used as a generic kernel virtual mapping function, so
> re-name it in preparation.

The new name looks ok, but shouldn't it also move to vmalloc.c with
the more generic name and purpose?


Re: [PATCH v10 03/12] mm/vmalloc: rename vmap_*_range vmap_pages_*_range

2021-01-24 Thread Christoph Hellwig
On Sun, Jan 24, 2021 at 06:22:21PM +1000, Nicholas Piggin wrote:
> The vmalloc mapper operates on a struct page * array rather than a
> linear physical address, re-name it to make this distinction clear.
> 
> Signed-off-by: Nicholas Piggin 

Looks good,

Reviewed-by: Christoph Hellwig 


Re: [PATCH v10 02/12] mm: apply_to_pte_range warn and fail if a large pte is encountered

2021-01-24 Thread Christoph Hellwig
On Sun, Jan 24, 2021 at 06:22:20PM +1000, Nicholas Piggin wrote:
> apply_to_pte_range might mistake a large pte for bad, or treat it as a
> page table, resulting in a crash or corruption. Add a test to warn and
> return error if large entries are found.

Looks good,

Reviewed-by: Christoph Hellwig 


Re: [PATCH v10 01/12] mm/vmalloc: fix vmalloc_to_page for huge vmap mappings

2021-01-24 Thread Christoph Hellwig
On Sun, Jan 24, 2021 at 06:22:19PM +1000, Nicholas Piggin wrote:
> vmalloc_to_page returns NULL for addresses mapped by larger pages[*].
> Whether or not a vmap is huge depends on the architecture details,
> alignments, boot options, etc., which the caller can not be expected
> to know. Therefore HUGE_VMAP is a regression for vmalloc_to_page.
> 
> This change teaches vmalloc_to_page about larger pages, and returns
> the struct page that corresponds to the offset within the large page.
> This makes the API agnostic to mapping implementation details.

Maybe enable instead of fix would be better in the subject line?

Otherwise this looks good:

Reviewed-by: Christoph Hellwig 
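
[Editorial illustration: the patch hunk itself is not quoted in this thread.
The core of what the changelog describes is roughly the following, using the
existing pmd_leaf()/pmd_page() helpers; the surrounding page-table walk is
elided and the function name is made up.]

/* Inside the vmalloc_to_page() walk: a leaf PMD maps PMD_SIZE bytes, so
 * return the sub-page at the offset within the huge mapping. */
static struct page *vmalloc_leaf_pmd_page(pmd_t *pmd, unsigned long addr)
{
	if (!pmd_leaf(*pmd))
		return NULL;	/* not a huge mapping; keep walking */
	return pmd_page(*pmd) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
}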


[PATCH v10 12/12] powerpc/64s/radix: Enable huge vmalloc mappings

2021-01-24 Thread Nicholas Piggin
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Nicholas Piggin 
---
 Documentation/admin-guide/kernel-parameters.txt |  2 ++
 arch/powerpc/Kconfig|  1 +
 arch/powerpc/kernel/module.c| 13 +++--
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a10b545c2070..d62df53e5200 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3225,6 +3225,8 @@
 
nohugeiomap [KNL,X86,PPC,ARM64] Disable kernel huge I/O mappings.
 
+   nohugevmalloc   [PPC] Disable kernel huge vmalloc mappings.
+
nosmt   [KNL,S390] Disable symmetric multithreading (SMT).
Equivalent to smt=1.
 
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 107bb4319e0e..781da6829ab7 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -181,6 +181,7 @@ config PPC
select GENERIC_GETTIMEOFDAY
select HAVE_ARCH_AUDITSYSCALL
	select HAVE_ARCH_HUGE_VMAP		if PPC_BOOK3S_64 && PPC_RADIX_MMU
+   select HAVE_ARCH_HUGE_VMALLOC   if HAVE_ARCH_HUGE_VMAP
select HAVE_ARCH_JUMP_LABEL
select HAVE_ARCH_KASAN  if PPC32 && PPC_PAGE_SHIFT <= 14
select HAVE_ARCH_KASAN_VMALLOC  if PPC32 && PPC_PAGE_SHIFT <= 14
diff --git a/arch/powerpc/kernel/module.c b/arch/powerpc/kernel/module.c
index a211b0253cdb..bc2695eeeb4c 100644
--- a/arch/powerpc/kernel/module.c
+++ b/arch/powerpc/kernel/module.c
@@ -92,8 +92,17 @@ void *module_alloc(unsigned long size)
 {
BUILD_BUG_ON(TASK_SIZE > MODULES_VADDR);
 
-	return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END, GFP_KERNEL,
-			PAGE_KERNEL_EXEC, VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
+   /*
+* Don't do huge page allocations for modules yet until more testing
+* is done. STRICT_MODULE_RWX may require extra work to support this
+* too.
+*/
+
+   return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
+   GFP_KERNEL,
+   PAGE_KERNEL_EXEC,
+   VM_NOHUGE | VM_FLUSH_RESET_PERMS,
+   NUMA_NO_NODE,
__builtin_return_address(0));
 }
 #endif
-- 
2.23.0



[PATCH v10 11/12] mm/vmalloc: Hugepage vmalloc mappings

2021-01-24 Thread Nicholas Piggin
Support huge page vmalloc mappings. Config option HAVE_ARCH_HUGE_VMALLOC
enables support on architectures that define HAVE_ARCH_HUGE_VMAP and
supports PMD sized vmap mappings.

vmalloc will attempt to allocate PMD-sized pages if allocating PMD size
or larger, and fall back to small pages if that was unsuccessful.

Architectures must ensure that any arch specific vmalloc allocations
that require PAGE_SIZE mappings (e.g., module allocations vs strict
module rwx) use the VM_NOHUGE flag to inhibit larger mappings.

When hugepage vmalloc mappings are enabled in the next patch, this
reduces TLB misses by nearly 30x on a `git diff` workload on a 2-node
POWER9 (59,800 -> 2,100) and reduces CPU cycles by 0.54%.

This can result in more internal fragmentation and memory overhead for a
given allocation; an option, nohugevmalloc, is added to disable it at boot.

Signed-off-by: Nicholas Piggin 
---
 arch/Kconfig|  10 +++
 include/linux/vmalloc.h |  18 
 mm/page_alloc.c |   5 +-
 mm/vmalloc.c| 192 ++--
 4 files changed, 177 insertions(+), 48 deletions(-)
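
[Editorial note: the mm/vmalloc.c hunks implementing the fallback are
truncated below, so here is a hedged sketch of the allocation-side behaviour
the changelog describes. __vmalloc_area_node() stands in for the real
internals; this is not the patch's exact code.]

static void *vmalloc_maybe_huge(unsigned long size, gfp_t gfp, pgprot_t prot,
				unsigned long vm_flags, int node)
{
	unsigned int shift = PAGE_SHIFT;
	void *addr;

	/* Try PMD-sized pages when the request is big enough and allowed. */
	if (IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC) &&
	    size >= PMD_SIZE && !(vm_flags & VM_NOHUGE) &&
	    arch_vmap_pmd_supported(prot))
		shift = PMD_SHIFT;
retry:
	addr = __vmalloc_area_node(size, gfp, prot, shift, node); /* stand-in */
	if (!addr && shift != PAGE_SHIFT) {
		shift = PAGE_SHIFT;	/* fall back to small pages */
		goto retry;
	}
	return addr;
}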

diff --git a/arch/Kconfig b/arch/Kconfig
index 24862d15f3a3..f87feb616184 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -724,6 +724,16 @@ config HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
 config HAVE_ARCH_HUGE_VMAP
bool
 
+config HAVE_ARCH_HUGE_VMALLOC
+   depends on HAVE_ARCH_HUGE_VMAP
+   bool
+   help
+ Archs that select this would be capable of PMD-sized vmaps (i.e.,
+ arch_vmap_pmd_supported() returns true), and they must make no
+ assumptions that vmalloc memory is mapped with PAGE_SIZE ptes. The
+ VM_NOHUGE flag can be used to prohibit arch-specific allocations from
+ using hugepages to help with this (e.g., modules may require it).
+
 config ARCH_WANT_HUGE_PMD_SHARE
bool
 
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 40649c4bb5a2..2ba023daf188 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -25,6 +25,7 @@ struct notifier_block;	/* in notifier.h */
 #define VM_NO_GUARD0x0040  /* don't add guard page */
 #define VM_KASAN		0x0080	/* has allocated kasan shadow memory */
 #define VM_MAP_PUT_PAGES	0x0100	/* put pages and free array in vfree */
+#define VM_NOHUGE		0x0200	/* force PAGE_SIZE pte mapping */
 
 /*
  * VM_KASAN is used slighly differently depending on CONFIG_KASAN_VMALLOC.
@@ -59,6 +60,7 @@ struct vm_struct {
unsigned long   size;
unsigned long   flags;
struct page **pages;
+	unsigned int		page_order;
 	unsigned int		nr_pages;
phys_addr_t phys_addr;
const void  *caller;
@@ -194,6 +196,18 @@ static inline void set_vm_flush_reset_perms(void *addr)
if (vm)
vm->flags |= VM_FLUSH_RESET_PERMS;
 }
+
+static inline bool is_vm_area_hugepages(const void *addr)
+{
+   /*
+* This may not 100% tell if the area is mapped with > PAGE_SIZE
+* page table entries, if for some reason the architecture indicates
+* larger sizes are available but decides not to use them, nothing
+* prevents that. This only indicates the size of the physical page
+* allocated in the vmalloc layer.
+*/
+   return (find_vm_area(addr)->page_order > 0);
+}
 #else
 static inline int
 map_kernel_range_noflush(unsigned long start, unsigned long size,
@@ -210,6 +224,10 @@ unmap_kernel_range_noflush(unsigned long addr, unsigned long size)
 static inline void set_vm_flush_reset_perms(void *addr)
 {
 }
+static inline bool is_vm_area_hugepages(const void *addr)
+{
+   return false;
+}
 #endif
 
 /* for /dev/kmem */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 027f6481ba59..b7a9661fa232 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -72,6 +72,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -8238,6 +8239,7 @@ void *__init alloc_large_system_hash(const char *tablename,
void *table = NULL;
gfp_t gfp_flags;
bool virt;
+   bool huge;
 
/* allow the kernel cmdline to have a say */
if (!numentries) {
@@ -8305,6 +8307,7 @@ void *__init alloc_large_system_hash(const char *tablename,
} else if (get_order(size) >= MAX_ORDER || hashdist) {
table = __vmalloc(size, gfp_flags);
virt = true;
+   huge = is_vm_area_hugepages(table);
} else {
/*
 * If bucketsize is not a power-of-two, we may free
@@ -8321,7 +8324,7 @@ void *__init alloc_large_system_hash(const char *tablename,
 
pr_info("%s hash table entries: %ld (order: %d, %lu bytes, %s)\n",
tablename, 1UL << log2qty, ilog2(size) - PAGE_SHIFT, 

[PATCH v10 10/12] mm/vmalloc: add vmap_range_noflush variant

2021-01-24 Thread Nicholas Piggin
As a side-effect, the order of flush_cache_vmap() and
arch_sync_kernel_mappings() calls are switched, but that now matches
the other callers in this file.

Signed-off-by: Nicholas Piggin 
---
 mm/vmalloc.c | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 5d79148b7fa7..0377e1d059e5 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -235,7 +235,7 @@ static int vmap_p4d_range(pgd_t *pgd, unsigned long addr, unsigned long end,
return 0;
 }
 
-int vmap_range(unsigned long addr, unsigned long end,
+static int vmap_range_noflush(unsigned long addr, unsigned long end,
phys_addr_t phys_addr, pgprot_t prot,
unsigned int max_page_shift)
 {
@@ -257,14 +257,24 @@ int vmap_range(unsigned long addr, unsigned long end,
break;
} while (pgd++, phys_addr += (next - addr), addr = next, addr != end);
 
-   flush_cache_vmap(start, end);
-
if (mask & ARCH_PAGE_TABLE_SYNC_MASK)
arch_sync_kernel_mappings(start, end);
 
return err;
 }
 
+int vmap_range(unsigned long addr, unsigned long end,
+   phys_addr_t phys_addr, pgprot_t prot,
+   unsigned int max_page_shift)
+{
+   int err;
+
+   err = vmap_range_noflush(addr, end, phys_addr, prot, max_page_shift);
+   flush_cache_vmap(addr, end);
+
+   return err;
+}
+
 static void vunmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 pgtbl_mod_mask *mask)
 {
-- 
2.23.0



[PATCH v10 09/12] mm: Move vmap_range from mm/ioremap.c to mm/vmalloc.c

2021-01-24 Thread Nicholas Piggin
This is a generic kernel virtual memory mapper, not specific to ioremap.

Signed-off-by: Nicholas Piggin 
---
 include/linux/vmalloc.h |   3 +
 mm/ioremap.c| 197 
 mm/vmalloc.c| 196 +++
 3 files changed, 199 insertions(+), 197 deletions(-)
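
[Editorial usage note: with vmap_range() now declared in vmalloc.h, a hedged
example call, mapping 2 MiB of physical address space with leaves up to PMD
size, might look like this; the wrapper name is illustrative.]

#include <linux/sizes.h>
#include <linux/vmalloc.h>

static int map_2m_window(unsigned long vaddr, phys_addr_t phys)
{
	/* max_page_shift = PMD_SHIFT lets vmap_range() use a single PMD
	 * leaf when vaddr, phys and size are all 2 MiB aligned. */
	return vmap_range(vaddr, vaddr + SZ_2M, phys, PAGE_KERNEL, PMD_SHIFT);
}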

diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 00bd62bd701e..40649c4bb5a2 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -178,6 +178,9 @@ extern struct vm_struct *remove_vm_area(const void *addr);
 extern struct vm_struct *find_vm_area(const void *addr);
 
 #ifdef CONFIG_MMU
+int vmap_range(unsigned long addr, unsigned long end,
+   phys_addr_t phys_addr, pgprot_t prot,
+   unsigned int max_page_shift);
 extern int map_kernel_range_noflush(unsigned long start, unsigned long size,
pgprot_t prot, struct page **pages);
 int map_kernel_range(unsigned long start, unsigned long size, pgprot_t prot,
diff --git a/mm/ioremap.c b/mm/ioremap.c
index c67f91164401..d1dcc7e744ac 100644
--- a/mm/ioremap.c
+++ b/mm/ioremap.c
@@ -28,203 +28,6 @@ early_param("nohugeiomap", set_nohugeiomap);
 static const bool iomap_max_page_shift = PAGE_SHIFT;
 #endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */
 
-static int vmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
-   phys_addr_t phys_addr, pgprot_t prot,
-   pgtbl_mod_mask *mask)
-{
-   pte_t *pte;
-   u64 pfn;
-
-   pfn = phys_addr >> PAGE_SHIFT;
-   pte = pte_alloc_kernel_track(pmd, addr, mask);
-   if (!pte)
-   return -ENOMEM;
-   do {
-   BUG_ON(!pte_none(*pte));
-	set_pte_at(&init_mm, addr, pte, pfn_pte(pfn, prot));
-   pfn++;
-   } while (pte++, addr += PAGE_SIZE, addr != end);
-   *mask |= PGTBL_PTE_MODIFIED;
-   return 0;
-}
-
-static int vmap_try_huge_pmd(pmd_t *pmd, unsigned long addr, unsigned long end,
-   phys_addr_t phys_addr, pgprot_t prot,
-   unsigned int max_page_shift)
-{
-   if (max_page_shift < PMD_SHIFT)
-   return 0;
-
-   if (!arch_vmap_pmd_supported(prot))
-   return 0;
-
-   if ((end - addr) != PMD_SIZE)
-   return 0;
-
-   if (!IS_ALIGNED(addr, PMD_SIZE))
-   return 0;
-
-   if (!IS_ALIGNED(phys_addr, PMD_SIZE))
-   return 0;
-
-   if (pmd_present(*pmd) && !pmd_free_pte_page(pmd, addr))
-   return 0;
-
-   return pmd_set_huge(pmd, phys_addr, prot);
-}
-
-static int vmap_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
-   phys_addr_t phys_addr, pgprot_t prot,
-   unsigned int max_page_shift, pgtbl_mod_mask *mask)
-{
-   pmd_t *pmd;
-   unsigned long next;
-
-	pmd = pmd_alloc_track(&init_mm, pud, addr, mask);
-   if (!pmd)
-   return -ENOMEM;
-   do {
-   next = pmd_addr_end(addr, end);
-
-		if (vmap_try_huge_pmd(pmd, addr, next, phys_addr, prot, max_page_shift)) {
-   *mask |= PGTBL_PMD_MODIFIED;
-   continue;
-   }
-
-   if (vmap_pte_range(pmd, addr, next, phys_addr, prot, mask))
-   return -ENOMEM;
-   } while (pmd++, phys_addr += (next - addr), addr = next, addr != end);
-   return 0;
-}
-
-static int vmap_try_huge_pud(pud_t *pud, unsigned long addr, unsigned long end,
-   phys_addr_t phys_addr, pgprot_t prot,
-   unsigned int max_page_shift)
-{
-   if (max_page_shift < PUD_SHIFT)
-   return 0;
-
-   if (!arch_vmap_pud_supported(prot))
-   return 0;
-
-   if ((end - addr) != PUD_SIZE)
-   return 0;
-
-   if (!IS_ALIGNED(addr, PUD_SIZE))
-   return 0;
-
-   if (!IS_ALIGNED(phys_addr, PUD_SIZE))
-   return 0;
-
-   if (pud_present(*pud) && !pud_free_pmd_page(pud, addr))
-   return 0;
-
-   return pud_set_huge(pud, phys_addr, prot);
-}
-
-static int vmap_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
-   phys_addr_t phys_addr, pgprot_t prot,
-   unsigned int max_page_shift, pgtbl_mod_mask *mask)
-{
-   pud_t *pud;
-   unsigned long next;
-
-	pud = pud_alloc_track(&init_mm, p4d, addr, mask);
-   if (!pud)
-   return -ENOMEM;
-   do {
-   next = pud_addr_end(addr, end);
-
-		if (vmap_try_huge_pud(pud, addr, next, phys_addr, prot, max_page_shift)) {
-   *mask |= PGTBL_PUD_MODIFIED;
-   continue;
-   }
-
-		if (vmap_pmd_range(pud, addr, next, phys_addr, prot, max_page_shift, mask))
-   return -ENOMEM;
-   } while 

[PATCH v10 08/12] x86: inline huge vmap supported functions

2021-01-24 Thread Nicholas Piggin
This allows unsupported levels to be constant folded away, and so
p4d_free_pud_page can be removed because it's no longer linked to.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: x...@kernel.org
Cc: "H. Peter Anvin" 
Signed-off-by: Nicholas Piggin 
---
 arch/x86/include/asm/vmalloc.h | 22 +++---
 arch/x86/mm/ioremap.c  | 21 -
 arch/x86/mm/pgtable.c  | 13 -
 3 files changed, 19 insertions(+), 37 deletions(-)

diff --git a/arch/x86/include/asm/vmalloc.h b/arch/x86/include/asm/vmalloc.h
index 094ea2b565f3..e714b00fc0ca 100644
--- a/arch/x86/include/asm/vmalloc.h
+++ b/arch/x86/include/asm/vmalloc.h
@@ -1,13 +1,29 @@
 #ifndef _ASM_X86_VMALLOC_H
 #define _ASM_X86_VMALLOC_H
 
+#include 
 #include 
 #include 
 
 #ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
-bool arch_vmap_p4d_supported(pgprot_t prot);
-bool arch_vmap_pud_supported(pgprot_t prot);
-bool arch_vmap_pmd_supported(pgprot_t prot);
+static inline bool arch_vmap_p4d_supported(pgprot_t prot)
+{
+   return false;
+}
+
+static inline bool arch_vmap_pud_supported(pgprot_t prot)
+{
+#ifdef CONFIG_X86_64
+   return boot_cpu_has(X86_FEATURE_GBPAGES);
+#else
+   return false;
+#endif
+}
+
+static inline bool arch_vmap_pmd_supported(pgprot_t prot)
+{
+   return boot_cpu_has(X86_FEATURE_PSE);
+}
 #endif
 
 #endif /* _ASM_X86_VMALLOC_H */
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index fbaf0c447986..12c686c65ea9 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -481,27 +481,6 @@ void iounmap(volatile void __iomem *addr)
 }
 EXPORT_SYMBOL(iounmap);
 
-#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
-bool arch_vmap_p4d_supported(pgprot_t prot)
-{
-   return false;
-}
-
-bool arch_vmap_pud_supported(pgprot_t prot)
-{
-#ifdef CONFIG_X86_64
-   return boot_cpu_has(X86_FEATURE_GBPAGES);
-#else
-   return false;
-#endif
-}
-
-bool arch_vmap_pmd_supported(pgprot_t prot)
-{
-   return boot_cpu_has(X86_FEATURE_PSE);
-}
-#endif
-
 /*
  * Convert a physical pointer to a virtual kernel pointer for /dev/mem
  * access
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index f6a9e2e36642..d27cf69e811d 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -780,14 +780,6 @@ int pmd_clear_huge(pmd_t *pmd)
return 0;
 }
 
-/*
- * Until we support 512GB pages, skip them in the vmap area.
- */
-int p4d_free_pud_page(p4d_t *p4d, unsigned long addr)
-{
-   return 0;
-}
-
 #ifdef CONFIG_X86_64
 /**
  * pud_free_pmd_page - Clear pud entry and free pmd page.
@@ -861,11 +853,6 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 
 #else /* !CONFIG_X86_64 */
 
-int pud_free_pmd_page(pud_t *pud, unsigned long addr)
-{
-   return pud_none(*pud);
-}
-
 /*
  * Disable free page handling on x86-PAE. This assures that ioremap()
  * does not update sync'd pmd entries. See vmalloc_sync_one().
-- 
2.23.0



[PATCH v10 07/12] arm64: inline huge vmap supported functions

2021-01-24 Thread Nicholas Piggin
This allows unsupported levels to be constant folded away, and so
p4d_free_pud_page can be removed because it's no longer linked to.

Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: linux-arm-ker...@lists.infradead.org
Acked-by: Catalin Marinas 
Signed-off-by: Nicholas Piggin 
---
 arch/arm64/include/asm/vmalloc.h | 23 ---
 arch/arm64/mm/mmu.c  | 26 --
 2 files changed, 20 insertions(+), 29 deletions(-)

diff --git a/arch/arm64/include/asm/vmalloc.h b/arch/arm64/include/asm/vmalloc.h
index 597b40405319..fc9a12d6cc1a 100644
--- a/arch/arm64/include/asm/vmalloc.h
+++ b/arch/arm64/include/asm/vmalloc.h
@@ -4,9 +4,26 @@
 #include 
 
 #ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
-bool arch_vmap_p4d_supported(pgprot_t prot);
-bool arch_vmap_pud_supported(pgprot_t prot);
-bool arch_vmap_pmd_supported(pgprot_t prot);
+static inline bool arch_vmap_p4d_supported(pgprot_t prot)
+{
+   return false;
+}
+
+static inline bool arch_vmap_pud_supported(pgprot_t prot)
+{
+   /*
+* Only 4k granule supports level 1 block mappings.
+* SW table walks can't handle removal of intermediate entries.
+*/
+   return IS_ENABLED(CONFIG_ARM64_4K_PAGES) &&
+  !IS_ENABLED(CONFIG_PTDUMP_DEBUGFS);
+}
+
+static inline bool arch_vmap_pmd_supported(pgprot_t prot)
+{
+   /* See arch_vmap_pud_supported() */
+   return !IS_ENABLED(CONFIG_PTDUMP_DEBUGFS);
+}
 #endif
 
 #endif /* _ASM_ARM64_VMALLOC_H */
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index f6614c378792..ab9ba7c36dae 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1313,27 +1313,6 @@ void *__init fixmap_remap_fdt(phys_addr_t dt_phys, int *size, pgprot_t prot)
return dt_virt;
 }
 
-bool arch_vmap_p4d_supported(pgprot_t prot)
-{
-   return false;
-}
-
-bool arch_vmap_pud_supported(pgprot_t prot)
-{
-   /*
-* Only 4k granule supports level 1 block mappings.
-* SW table walks can't handle removal of intermediate entries.
-*/
-   return IS_ENABLED(CONFIG_ARM64_4K_PAGES) &&
-  !IS_ENABLED(CONFIG_PTDUMP_DEBUGFS);
-}
-
-bool arch_vmap_pmd_supported(pgprot_t prot)
-{
-   /* See arch_vmap_pud_supported() */
-   return !IS_ENABLED(CONFIG_PTDUMP_DEBUGFS);
-}
-
 int pud_set_huge(pud_t *pudp, phys_addr_t phys, pgprot_t prot)
 {
pud_t new_pud = pfn_pud(__phys_to_pfn(phys), mk_pud_sect_prot(prot));
@@ -1425,11 +1404,6 @@ int pud_free_pmd_page(pud_t *pudp, unsigned long addr)
return 1;
 }
 
-int p4d_free_pud_page(p4d_t *p4d, unsigned long addr)
-{
-   return 0;   /* Don't attempt a block mapping */
-}
-
 #ifdef CONFIG_MEMORY_HOTPLUG
 static void __remove_pgd_mapping(pgd_t *pgdir, unsigned long start, u64 size)
 {
-- 
2.23.0



[PATCH v10 06/12] powerpc: inline huge vmap supported functions

2021-01-24 Thread Nicholas Piggin
This allows unsupported levels to be constant folded away, and so
p4d_free_pud_page can be removed because it's no longer linked to.

Cc: linuxppc-dev@lists.ozlabs.org
Acked-by: Michael Ellerman 
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/vmalloc.h   | 19 ---
 arch/powerpc/mm/book3s64/radix_pgtable.c | 21 -
 2 files changed, 16 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/include/asm/vmalloc.h b/arch/powerpc/include/asm/vmalloc.h
index 105abb73f075..3f0c153befb0 100644
--- a/arch/powerpc/include/asm/vmalloc.h
+++ b/arch/powerpc/include/asm/vmalloc.h
@@ -1,12 +1,25 @@
 #ifndef _ASM_POWERPC_VMALLOC_H
 #define _ASM_POWERPC_VMALLOC_H
 
+#include 
 #include 
 
 #ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
-bool arch_vmap_p4d_supported(pgprot_t prot);
-bool arch_vmap_pud_supported(pgprot_t prot);
-bool arch_vmap_pmd_supported(pgprot_t prot);
+static inline bool arch_vmap_p4d_supported(pgprot_t prot)
+{
+   return false;
+}
+
+static inline bool arch_vmap_pud_supported(pgprot_t prot)
+{
+   /* HPT does not cope with large pages in the vmalloc area */
+   return radix_enabled();
+}
+
+static inline bool arch_vmap_pmd_supported(pgprot_t prot)
+{
+   return radix_enabled();
+}
 #endif
 
 #endif /* _ASM_POWERPC_VMALLOC_H */
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
index 743807fc210f..8da62afccee5 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -1082,22 +1082,6 @@ void radix__ptep_modify_prot_commit(struct vm_area_struct *vma,
set_pte_at(mm, addr, ptep, pte);
 }
 
-bool arch_vmap_pud_supported(pgprot_t prot)
-{
-   /* HPT does not cope with large pages in the vmalloc area */
-   return radix_enabled();
-}
-
-bool arch_vmap_pmd_supported(pgprot_t prot)
-{
-   return radix_enabled();
-}
-
-int p4d_free_pud_page(p4d_t *p4d, unsigned long addr)
-{
-   return 0;
-}
-
 int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
 {
pte_t *ptep = (pte_t *)pud;
@@ -1181,8 +1165,3 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 
return 1;
 }
-
-bool arch_vmap_p4d_supported(pgprot_t prot)
-{
-   return false;
-}
-- 
2.23.0



[PATCH v10 05/12] mm: HUGE_VMAP arch support cleanup

2021-01-24 Thread Nicholas Piggin
This changes the awkward approach where architectures provide init
functions to determine which levels they can provide large mappings for,
to one where the arch is queried for each call.

This removes code and indirection, and allows constant-folding of dead
code for unsupported levels.

This also adds a prot argument to the arch query. This is unused
currently but could help with some architectures (e.g., some powerpc
processors can't map uncacheable memory with large pages).
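
[Editorial illustration of the parenthetical above: a hypothetical powerpc
implementation could consult the prot argument along these lines. The
_PAGE_NO_CACHE test is an assumption for illustration, not part of this
series.]

static inline bool arch_vmap_pmd_supported(pgprot_t prot)
{
	/* Assumed flag name: refuse huge mappings for uncacheable memory. */
	if (pgprot_val(prot) & _PAGE_NO_CACHE)
		return false;
	return radix_enabled();
}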

Cc: linuxppc-dev@lists.ozlabs.org
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: linux-arm-ker...@lists.infradead.org
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: x...@kernel.org
Cc: "H. Peter Anvin" 
Acked-by: Catalin Marinas  [arm64]
Signed-off-by: Nicholas Piggin 
---
 arch/arm64/include/asm/vmalloc.h |  8 +++
 arch/arm64/mm/mmu.c  | 10 +--
 arch/powerpc/include/asm/vmalloc.h   |  8 +++
 arch/powerpc/mm/book3s64/radix_pgtable.c |  8 +--
 arch/x86/include/asm/vmalloc.h   |  7 ++
 arch/x86/mm/ioremap.c| 12 ++--
 include/linux/io.h   |  9 ---
 include/linux/vmalloc.h  |  6 ++
 init/main.c  |  1 -
 mm/ioremap.c | 88 +---
 10 files changed, 79 insertions(+), 78 deletions(-)

diff --git a/arch/arm64/include/asm/vmalloc.h b/arch/arm64/include/asm/vmalloc.h
index 2ca708ab9b20..597b40405319 100644
--- a/arch/arm64/include/asm/vmalloc.h
+++ b/arch/arm64/include/asm/vmalloc.h
@@ -1,4 +1,12 @@
 #ifndef _ASM_ARM64_VMALLOC_H
 #define _ASM_ARM64_VMALLOC_H
 
+#include 
+
+#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
+bool arch_vmap_p4d_supported(pgprot_t prot);
+bool arch_vmap_pud_supported(pgprot_t prot);
+bool arch_vmap_pmd_supported(pgprot_t prot);
+#endif
+
 #endif /* _ASM_ARM64_VMALLOC_H */
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index ae0c3d023824..f6614c378792 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1313,12 +1313,12 @@ void *__init fixmap_remap_fdt(phys_addr_t dt_phys, int *size, pgprot_t prot)
return dt_virt;
 }
 
-int __init arch_ioremap_p4d_supported(void)
+bool arch_vmap_p4d_supported(pgprot_t prot)
 {
-   return 0;
+   return false;
 }
 
-int __init arch_ioremap_pud_supported(void)
+bool arch_vmap_pud_supported(pgprot_t prot)
 {
/*
 * Only 4k granule supports level 1 block mappings.
@@ -1328,9 +1328,9 @@ int __init arch_ioremap_pud_supported(void)
   !IS_ENABLED(CONFIG_PTDUMP_DEBUGFS);
 }
 
-int __init arch_ioremap_pmd_supported(void)
+bool arch_vmap_pmd_supported(pgprot_t prot)
 {
-   /* See arch_ioremap_pud_supported() */
+   /* See arch_vmap_pud_supported() */
return !IS_ENABLED(CONFIG_PTDUMP_DEBUGFS);
 }
 
diff --git a/arch/powerpc/include/asm/vmalloc.h b/arch/powerpc/include/asm/vmalloc.h
index b992dfaaa161..105abb73f075 100644
--- a/arch/powerpc/include/asm/vmalloc.h
+++ b/arch/powerpc/include/asm/vmalloc.h
@@ -1,4 +1,12 @@
 #ifndef _ASM_POWERPC_VMALLOC_H
 #define _ASM_POWERPC_VMALLOC_H
 
+#include 
+
+#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
+bool arch_vmap_p4d_supported(pgprot_t prot);
+bool arch_vmap_pud_supported(pgprot_t prot);
+bool arch_vmap_pmd_supported(pgprot_t prot);
+#endif
+
 #endif /* _ASM_POWERPC_VMALLOC_H */
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
index 98f0b243c1ab..743807fc210f 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -1082,13 +1082,13 @@ void radix__ptep_modify_prot_commit(struct vm_area_struct *vma,
set_pte_at(mm, addr, ptep, pte);
 }
 
-int __init arch_ioremap_pud_supported(void)
+bool arch_vmap_pud_supported(pgprot_t prot)
 {
/* HPT does not cope with large pages in the vmalloc area */
return radix_enabled();
 }
 
-int __init arch_ioremap_pmd_supported(void)
+bool arch_vmap_pmd_supported(pgprot_t prot)
 {
return radix_enabled();
 }
@@ -1182,7 +1182,7 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
return 1;
 }
 
-int __init arch_ioremap_p4d_supported(void)
+bool arch_vmap_p4d_supported(pgprot_t prot)
 {
-   return 0;
+   return false;
 }
diff --git a/arch/x86/include/asm/vmalloc.h b/arch/x86/include/asm/vmalloc.h
index 29837740b520..094ea2b565f3 100644
--- a/arch/x86/include/asm/vmalloc.h
+++ b/arch/x86/include/asm/vmalloc.h
@@ -1,6 +1,13 @@
 #ifndef _ASM_X86_VMALLOC_H
 #define _ASM_X86_VMALLOC_H
 
+#include 
 #include 
 
+#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
+bool arch_vmap_p4d_supported(pgprot_t prot);
+bool arch_vmap_pud_supported(pgprot_t prot);
+bool arch_vmap_pmd_supported(pgprot_t prot);
+#endif
+
 #endif /* _ASM_X86_VMALLOC_H */
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 9e5ccc56f8e0..fbaf0c447986 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -481,24 +481,26 @@ void iounmap(volatile void __iomem *addr)
 }
 

[PATCH v10 04/12] mm/ioremap: rename ioremap_*_range to vmap_*_range

2021-01-24 Thread Nicholas Piggin
This will be used as a generic kernel virtual mapping function, so
re-name it in preparation.

Signed-off-by: Nicholas Piggin 
---
 mm/ioremap.c | 64 +++-
 1 file changed, 33 insertions(+), 31 deletions(-)

diff --git a/mm/ioremap.c b/mm/ioremap.c
index 5fa1ab41d152..3f4d36f9745a 100644
--- a/mm/ioremap.c
+++ b/mm/ioremap.c
@@ -61,9 +61,9 @@ static inline int ioremap_pud_enabled(void) { return 0; }
 static inline int ioremap_pmd_enabled(void) { return 0; }
 #endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */
 
-static int ioremap_pte_range(pmd_t *pmd, unsigned long addr,
-   unsigned long end, phys_addr_t phys_addr, pgprot_t prot,
-   pgtbl_mod_mask *mask)
+static int vmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
+   phys_addr_t phys_addr, pgprot_t prot,
+   pgtbl_mod_mask *mask)
 {
pte_t *pte;
u64 pfn;
@@ -81,9 +81,8 @@ static int ioremap_pte_range(pmd_t *pmd, unsigned long addr,
return 0;
 }
 
-static int ioremap_try_huge_pmd(pmd_t *pmd, unsigned long addr,
-   unsigned long end, phys_addr_t phys_addr,
-   pgprot_t prot)
+static int vmap_try_huge_pmd(pmd_t *pmd, unsigned long addr, unsigned long end,
+   phys_addr_t phys_addr, pgprot_t prot)
 {
if (!ioremap_pmd_enabled())
return 0;
@@ -103,9 +102,9 @@ static int ioremap_try_huge_pmd(pmd_t *pmd, unsigned long addr,
return pmd_set_huge(pmd, phys_addr, prot);
 }
 
-static inline int ioremap_pmd_range(pud_t *pud, unsigned long addr,
-   unsigned long end, phys_addr_t phys_addr, pgprot_t prot,
-   pgtbl_mod_mask *mask)
+static int vmap_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
+   phys_addr_t phys_addr, pgprot_t prot,
+   pgtbl_mod_mask *mask)
 {
pmd_t *pmd;
unsigned long next;
@@ -116,20 +115,19 @@ static inline int ioremap_pmd_range(pud_t *pud, unsigned long addr,
do {
next = pmd_addr_end(addr, end);
 
-   if (ioremap_try_huge_pmd(pmd, addr, next, phys_addr, prot)) {
+   if (vmap_try_huge_pmd(pmd, addr, next, phys_addr, prot)) {
*mask |= PGTBL_PMD_MODIFIED;
continue;
}
 
-   if (ioremap_pte_range(pmd, addr, next, phys_addr, prot, mask))
+   if (vmap_pte_range(pmd, addr, next, phys_addr, prot, mask))
return -ENOMEM;
} while (pmd++, phys_addr += (next - addr), addr = next, addr != end);
return 0;
 }
 
-static int ioremap_try_huge_pud(pud_t *pud, unsigned long addr,
-   unsigned long end, phys_addr_t phys_addr,
-   pgprot_t prot)
+static int vmap_try_huge_pud(pud_t *pud, unsigned long addr, unsigned long end,
+   phys_addr_t phys_addr, pgprot_t prot)
 {
if (!ioremap_pud_enabled())
return 0;
@@ -149,9 +147,9 @@ static int ioremap_try_huge_pud(pud_t *pud, unsigned long addr,
return pud_set_huge(pud, phys_addr, prot);
 }
 
-static inline int ioremap_pud_range(p4d_t *p4d, unsigned long addr,
-   unsigned long end, phys_addr_t phys_addr, pgprot_t prot,
-   pgtbl_mod_mask *mask)
+static int vmap_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
+   phys_addr_t phys_addr, pgprot_t prot,
+   pgtbl_mod_mask *mask)
 {
pud_t *pud;
unsigned long next;
@@ -162,20 +160,19 @@ static inline int ioremap_pud_range(p4d_t *p4d, unsigned long addr,
do {
next = pud_addr_end(addr, end);
 
-   if (ioremap_try_huge_pud(pud, addr, next, phys_addr, prot)) {
+   if (vmap_try_huge_pud(pud, addr, next, phys_addr, prot)) {
*mask |= PGTBL_PUD_MODIFIED;
continue;
}
 
-   if (ioremap_pmd_range(pud, addr, next, phys_addr, prot, mask))
+   if (vmap_pmd_range(pud, addr, next, phys_addr, prot, mask))
return -ENOMEM;
} while (pud++, phys_addr += (next - addr), addr = next, addr != end);
return 0;
 }
 
-static int ioremap_try_huge_p4d(p4d_t *p4d, unsigned long addr,
-   unsigned long end, phys_addr_t phys_addr,
-   pgprot_t prot)
+static int vmap_try_huge_p4d(p4d_t *p4d, unsigned long addr, unsigned long end,
+   phys_addr_t phys_addr, pgprot_t prot)
 {
if (!ioremap_p4d_enabled())
return 0;
@@ -195,9 +192,9 @@ static int ioremap_try_huge_p4d(p4d_t *p4d, unsigned long addr,
return p4d_set_huge(p4d, phys_addr, prot);
 }
 
-static inline int ioremap_p4d_range(pgd_t *pgd, unsigned long addr,
-   

[PATCH v10 03/12] mm/vmalloc: rename vmap_*_range vmap_pages_*_range

2021-01-24 Thread Nicholas Piggin
The vmalloc mapper operates on a struct page * array rather than a
linear physical address; rename it to make this distinction clear.
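
For contrast, a minimal sketch of the two call shapes this distinguishes
(signatures as declared in the kernel around this series; shown only for
illustration):

    /* Page-array mapper: backs vmalloc(); the backing pages may be
     * physically scattered, so it takes one struct page * per small page. */
    int map_kernel_range_noflush(unsigned long start, unsigned long size,
                                 pgprot_t prot, struct page **pages);

    /* Linear mapper: backs ioremap(); it takes one physically contiguous
     * range, which is what makes huge mappings possible without an array. */
    int ioremap_page_range(unsigned long addr, unsigned long end,
                           phys_addr_t phys_addr, pgprot_t prot);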

Signed-off-by: Nicholas Piggin 
---
 mm/vmalloc.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 62372f9e0167..7f2f36116980 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -189,7 +189,7 @@ void unmap_kernel_range_noflush(unsigned long start, unsigned long size)
arch_sync_kernel_mappings(start, end);
 }
 
-static int vmap_pte_range(pmd_t *pmd, unsigned long addr,
+static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr,
unsigned long end, pgprot_t prot, struct page **pages, int *nr,
pgtbl_mod_mask *mask)
 {
@@ -217,7 +217,7 @@ static int vmap_pte_range(pmd_t *pmd, unsigned long addr,
return 0;
 }
 
-static int vmap_pmd_range(pud_t *pud, unsigned long addr,
+static int vmap_pages_pmd_range(pud_t *pud, unsigned long addr,
unsigned long end, pgprot_t prot, struct page **pages, int *nr,
pgtbl_mod_mask *mask)
 {
@@ -229,13 +229,13 @@ static int vmap_pmd_range(pud_t *pud, unsigned long addr,
return -ENOMEM;
do {
next = pmd_addr_end(addr, end);
-   if (vmap_pte_range(pmd, addr, next, prot, pages, nr, mask))
+   if (vmap_pages_pte_range(pmd, addr, next, prot, pages, nr, mask))
return -ENOMEM;
} while (pmd++, addr = next, addr != end);
return 0;
 }
 
-static int vmap_pud_range(p4d_t *p4d, unsigned long addr,
+static int vmap_pages_pud_range(p4d_t *p4d, unsigned long addr,
unsigned long end, pgprot_t prot, struct page **pages, int *nr,
pgtbl_mod_mask *mask)
 {
@@ -247,13 +247,13 @@ static int vmap_pud_range(p4d_t *p4d, unsigned long addr,
return -ENOMEM;
do {
next = pud_addr_end(addr, end);
-   if (vmap_pmd_range(pud, addr, next, prot, pages, nr, mask))
+   if (vmap_pages_pmd_range(pud, addr, next, prot, pages, nr, mask))
return -ENOMEM;
} while (pud++, addr = next, addr != end);
return 0;
 }
 
-static int vmap_p4d_range(pgd_t *pgd, unsigned long addr,
+static int vmap_pages_p4d_range(pgd_t *pgd, unsigned long addr,
unsigned long end, pgprot_t prot, struct page **pages, int *nr,
pgtbl_mod_mask *mask)
 {
@@ -265,7 +265,7 @@ static int vmap_p4d_range(pgd_t *pgd, unsigned long addr,
return -ENOMEM;
do {
next = p4d_addr_end(addr, end);
-   if (vmap_pud_range(p4d, addr, next, prot, pages, nr, mask))
+   if (vmap_pages_pud_range(p4d, addr, next, prot, pages, nr, mask))
return -ENOMEM;
} while (p4d++, addr = next, addr != end);
return 0;
@@ -306,7 +306,7 @@ int map_kernel_range_noflush(unsigned long addr, unsigned long size,
next = pgd_addr_end(addr, end);
if (pgd_bad(*pgd))
mask |= PGTBL_PGD_MODIFIED;
-   err = vmap_p4d_range(pgd, addr, next, prot, pages, &nr, &mask);
+   err = vmap_pages_p4d_range(pgd, addr, next, prot, pages, &nr, &mask);
if (err)
return err;
} while (pgd++, addr = next, addr != end);
-- 
2.23.0



[PATCH v10 02/12] mm: apply_to_pte_range warn and fail if a large pte is encountered

2021-01-24 Thread Nicholas Piggin
apply_to_pte_range might mistake a large pte for a bad one, or treat it
as a page table, resulting in a crash or corruption. Add a test that
warns and returns an error if a large entry is encountered.
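
In outline, the check added at each level looks like this (pmd level
shown; the hunks below repeat the pattern for pud, p4d, and pgd,
open-coding the old pmd_none_or_clear_bad() so the leaf test can sit
between the none and bad checks):

    if (pmd_none(*pmd) && !create)
        continue;                  /* empty and not populating: skip */
    if (WARN_ON_ONCE(pmd_leaf(*pmd)))
        return -EINVAL;            /* huge entry: cannot descend into it */
    if (!pmd_none(*pmd) && WARN_ON_ONCE(pmd_bad(*pmd))) {
        if (!create)
            continue;              /* corrupt entry: skip when only walking */
        pmd_clear_bad(pmd);        /* clear it before repopulating */
    }
    err = apply_to_pte_range(mm, pmd, addr, next, fn, data, create, mask);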

Signed-off-by: Nicholas Piggin 
---
 mm/memory.c | 66 +++--
 1 file changed, 49 insertions(+), 17 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index feff48e1465a..672e39a72788 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2440,13 +2440,21 @@ static int apply_to_pmd_range(struct mm_struct *mm, pud_t *pud,
}
do {
next = pmd_addr_end(addr, end);
-   if (create || !pmd_none_or_clear_bad(pmd)) {
-   err = apply_to_pte_range(mm, pmd, addr, next, fn, data,
-create, mask);
-   if (err)
-   break;
+   if (pmd_none(*pmd) && !create)
+   continue;
+   if (WARN_ON_ONCE(pmd_leaf(*pmd)))
+   return -EINVAL;
+   if (!pmd_none(*pmd) && WARN_ON_ONCE(pmd_bad(*pmd))) {
+   if (!create)
+   continue;
+   pmd_clear_bad(pmd);
}
+   err = apply_to_pte_range(mm, pmd, addr, next,
+fn, data, create, mask);
+   if (err)
+   break;
} while (pmd++, addr = next, addr != end);
+
return err;
 }
 
@@ -2468,13 +2476,21 @@ static int apply_to_pud_range(struct mm_struct *mm, p4d_t *p4d,
}
do {
next = pud_addr_end(addr, end);
-   if (create || !pud_none_or_clear_bad(pud)) {
-   err = apply_to_pmd_range(mm, pud, addr, next, fn, data,
-create, mask);
-   if (err)
-   break;
+   if (pud_none(*pud) && !create)
+   continue;
+   if (WARN_ON_ONCE(pud_leaf(*pud)))
+   return -EINVAL;
+   if (!pud_none(*pud) && WARN_ON_ONCE(pud_bad(*pud))) {
+   if (!create)
+   continue;
+   pud_clear_bad(pud);
}
+   err = apply_to_pmd_range(mm, pud, addr, next,
+fn, data, create, mask);
+   if (err)
+   break;
} while (pud++, addr = next, addr != end);
+
return err;
 }
 
@@ -2496,13 +2512,21 @@ static int apply_to_p4d_range(struct mm_struct *mm, pgd_t *pgd,
}
do {
next = p4d_addr_end(addr, end);
-   if (create || !p4d_none_or_clear_bad(p4d)) {
-   err = apply_to_pud_range(mm, p4d, addr, next, fn, data,
-create, mask);
-   if (err)
-   break;
+   if (p4d_none(*p4d) && !create)
+   continue;
+   if (WARN_ON_ONCE(p4d_leaf(*p4d)))
+   return -EINVAL;
+   if (!p4d_none(*p4d) && WARN_ON_ONCE(p4d_bad(*p4d))) {
+   if (!create)
+   continue;
+   p4d_clear_bad(p4d);
}
+   err = apply_to_pud_range(mm, p4d, addr, next,
+fn, data, create, mask);
+   if (err)
+   break;
} while (p4d++, addr = next, addr != end);
+
return err;
 }
 
@@ -2522,9 +2546,17 @@ static int __apply_to_page_range(struct mm_struct *mm, unsigned long addr,
pgd = pgd_offset(mm, addr);
do {
next = pgd_addr_end(addr, end);
-   if (!create && pgd_none_or_clear_bad(pgd))
+   if (pgd_none(*pgd) && !create)
continue;
-   err = apply_to_p4d_range(mm, pgd, addr, next, fn, data, create, &mask);
+   if (WARN_ON_ONCE(pgd_leaf(*pgd)))
+   return -EINVAL;
+   if (!pgd_none(*pgd) && WARN_ON_ONCE(pgd_bad(*pgd))) {
+   if (!create)
+   continue;
+   pgd_clear_bad(pgd);
+   }
+   err = apply_to_p4d_range(mm, pgd, addr, next,
+fn, data, create, &mask);
if (err)
break;
} while (pgd++, addr = next, addr != end);
-- 
2.23.0



[PATCH v10 01/12] mm/vmalloc: fix vmalloc_to_page for huge vmap mappings

2021-01-24 Thread Nicholas Piggin
vmalloc_to_page returns NULL for addresses mapped by larger pages[*].
Whether or not a vmap is huge depends on architecture details,
alignments, boot options, etc., which the caller cannot be expected
to know. Therefore HUGE_VMAP is a regression for vmalloc_to_page.

This change teaches vmalloc_to_page about larger pages, and returns
the struct page that corresponds to the offset within the large page.
This makes the API agnostic to mapping implementation details.

[*] As explained by commit 029c54b095995 ("mm/vmalloc.c: huge-vmap:
fail gracefully on unexpected huge vmap mappings")
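
As a worked example of the new offset arithmetic, consider a 2M PMD
mapping with 4K base pages (a standalone userspace sketch; the constants
mirror the kernel's and the address is made up):

    #include <stdio.h>

    #define PAGE_SHIFT 12                          /* 4K base pages */
    #define PMD_SHIFT  21                          /* 2M PMD mappings */
    #define PMD_MASK   (~((1UL << PMD_SHIFT) - 1))

    int main(void)
    {
        /* A hypothetical vmalloc address inside a PMD-mapped region. */
        unsigned long addr = 0xffffc90000345678UL;

        /* The patch returns pmd_page(*pmd) + idx: the offset within the
         * 2M mapping, counted in small pages. */
        unsigned long idx = (addr & ~PMD_MASK) >> PAGE_SHIFT;

        printf("tail page index: 0x%lx (%lu)\n", idx, idx);  /* 0x145 (325) */
        return 0;
    }

The pud and p4d cases are the same computation with PUD_MASK and P4D_MASK.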

Signed-off-by: Nicholas Piggin 
---
 mm/vmalloc.c | 41 ++---
 1 file changed, 26 insertions(+), 15 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index e6f352bf0498..62372f9e0167 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -34,7 +34,7 @@
 #include 
 #include 
 #include 
-
+#include 
 #include 
 #include 
 #include 
@@ -343,7 +343,9 @@ int is_vmalloc_or_module_addr(const void *x)
 }
 
 /*
- * Walk a vmap address to the struct page it maps.
+ * Walk a vmap address to the struct page it maps. Huge vmap mappings will
+ * return the tail page that corresponds to the base page address, which
+ * matches small vmap mappings.
  */
 struct page *vmalloc_to_page(const void *vmalloc_addr)
 {
@@ -363,25 +365,33 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
 
if (pgd_none(*pgd))
return NULL;
+   if (WARN_ON_ONCE(pgd_leaf(*pgd)))
+   return NULL; /* XXX: no allowance for huge pgd */
+   if (WARN_ON_ONCE(pgd_bad(*pgd)))
+   return NULL;
+
p4d = p4d_offset(pgd, addr);
if (p4d_none(*p4d))
return NULL;
-   pud = pud_offset(p4d, addr);
+   if (p4d_leaf(*p4d))
+   return p4d_page(*p4d) + ((addr & ~P4D_MASK) >> PAGE_SHIFT);
+   if (WARN_ON_ONCE(p4d_bad(*p4d)))
+   return NULL;
 
-   /*
-* Don't dereference bad PUD or PMD (below) entries. This will also
-* identify huge mappings, which we may encounter on architectures
-* that define CONFIG_HAVE_ARCH_HUGE_VMAP=y. Such regions will be
-* identified as vmalloc addresses by is_vmalloc_addr(), but are
-* not [unambiguously] associated with a struct page, so there is
-* no correct value to return for them.
-*/
-   WARN_ON_ONCE(pud_bad(*pud));
-   if (pud_none(*pud) || pud_bad(*pud))
+   pud = pud_offset(p4d, addr);
+   if (pud_none(*pud))
+   return NULL;
+   if (pud_leaf(*pud))
+   return pud_page(*pud) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
+   if (WARN_ON_ONCE(pud_bad(*pud)))
return NULL;
+
pmd = pmd_offset(pud, addr);
-   WARN_ON_ONCE(pmd_bad(*pmd));
-   if (pmd_none(*pmd) || pmd_bad(*pmd))
+   if (pmd_none(*pmd))
+   return NULL;
+   if (pmd_leaf(*pmd))
+   return pmd_page(*pmd) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
+   if (WARN_ON_ONCE(pmd_bad(*pmd)))
return NULL;
 
ptep = pte_offset_map(pmd, addr);
@@ -389,6 +399,7 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
if (pte_present(pte))
page = pte_page(pte);
pte_unmap(ptep);
+
return page;
 }
 EXPORT_SYMBOL(vmalloc_to_page);
-- 
2.23.0



[PATCH v10 00/12] huge vmalloc mappings

2021-01-24 Thread Nicholas Piggin
Fixed a couple of bugs that Ding noticed in review and testing.

Thanks,
Nick

Since v9:
- Fixed intermediate build breakage on x86-32 !PAE [thanks Ding]
- Fixed small page fallback case vm_struct double-free [thanks Ding]

Since v8:
- Fixed nommu compile.
- Added Kconfig option help text
- Added VM_NOHUGE which should help archs implement it [suggested by Rick]

Since v7:
- Rebase, added some acks, compile fix
- Removed "order=" from vmallocinfo, it's a bit confusing (nr_pages
  is in small page size for compatibility).
- Added arch_vmap_pmd_supported() test before starting to allocate
  the large page, rather than only testing it when doing the map, to
  avoid unsupported configs trying to allocate huge pages for no
  reason (see the sketch below).
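
A minimal sketch of that gating, under assumed names (vmap_allow_huge
and its call site are illustrative, not the series' exact code):

    /* Decide up front whether a huge mapping is worth attempting, so
     * unsupported configs never allocate huge pages only to fall back
     * to small pages at map time. */
    static bool vmap_allow_huge(unsigned long size, pgprot_t prot)
    {
        return size >= PMD_SIZE && arch_vmap_pmd_supported(prot);
    }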

Since v6:
- Fixed a false positive warning introduced in patch 2, found by
  kbuild test robot.

Since v5:
- Split arch changes out better and make the constant folding work
- Avoid most of the 80 column wrap, fix a reference to lib/ioremap.c
- Fix compile error on some archs

Since v4:
- Fixed an off-by-page-order bug in v4
- Several minor cleanups.
- Added page order to /proc/vmallocinfo
- Added hugepage to alloc_large_system_hash output.
- Made an architecture config option, powerpc only for now.

Since v3:
- Fixed an off-by-one bug in a loop
- Fix !CONFIG_HAVE_ARCH_HUGE_VMAP build fail

Nicholas Piggin (12):
  mm/vmalloc: fix vmalloc_to_page for huge vmap mappings
  mm: apply_to_pte_range warn and fail if a large pte is encountered
  mm/vmalloc: rename vmap_*_range vmap_pages_*_range
  mm/ioremap: rename ioremap_*_range to vmap_*_range
  mm: HUGE_VMAP arch support cleanup
  powerpc: inline huge vmap supported functions
  arm64: inline huge vmap supported functions
  x86: inline huge vmap supported functions
  mm: Move vmap_range from mm/ioremap.c to mm/vmalloc.c
  mm/vmalloc: add vmap_range_noflush variant
  mm/vmalloc: Hugepage vmalloc mappings
  powerpc/64s/radix: Enable huge vmalloc mappings

 .../admin-guide/kernel-parameters.txt |   2 +
 arch/Kconfig  |  10 +
 arch/arm64/include/asm/vmalloc.h  |  25 +
 arch/arm64/mm/mmu.c   |  26 -
 arch/powerpc/Kconfig  |   1 +
 arch/powerpc/include/asm/vmalloc.h|  21 +
 arch/powerpc/kernel/module.c  |  13 +-
 arch/powerpc/mm/book3s64/radix_pgtable.c  |  21 -
 arch/x86/include/asm/vmalloc.h|  23 +
 arch/x86/mm/ioremap.c |  19 -
 arch/x86/mm/pgtable.c |  13 -
 include/linux/io.h|   9 -
 include/linux/vmalloc.h   |  27 ++
 init/main.c   |   1 -
 mm/ioremap.c  | 225 +
 mm/memory.c   |  66 ++-
 mm/page_alloc.c   |   5 +-
 mm/vmalloc.c  | 455 +++---
 18 files changed, 563 insertions(+), 399 deletions(-)

-- 
2.23.0